KX Community

  • Heap is a lot larger than used, how to find the cause?

    Posted by nick_mospan on March 13, 2023 at 12:00 am

    I’ve got a process that does some calculations on a timer and sends an updated table to another process. Its heap is more than 3x used, even after manually triggering .Q.gc[].

     

    key value
    used 567774096
    heap 1946157056
    peak 2617245696

     

    I’m using KDB+ 4.0 2021.04.26

    Is memory fragmentation the only possible cause? How do I find which operation contributes to it the most?

    Are there any other cases where kdb+ accumulates internal memory, or known bugs that lead to memory leaks?

    Thanks

  • 9 Replies
  • gyorokpeter-kx

    Member
    March 13, 2023 at 12:00 am

    As a first step you could insert printouts of .Q.w[] between the actual operations in the query, even breaking expressions down into single operator invocations if necessary. Additionally, .Q.ts can be used to measure the time and space used by an operation, similarly to \ts, but it also returns the result (it is parameterized like . (dot) for multi-parameter apply).
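
    For example, a minimal sketch – the table, column names and aggregation below are placeholders, not anything from the original process:

    q)t:([]s:1000000?`3;v:1000000?1f)
    q)0N!(`before;.Q.w[]`used`heap)        / log used/heap before the step
    q)r:select sum v by s from t
    q)0N!(`after;.Q.w[]`used`heap)         / and again after it
    q)/ .Q.ts is applied like . (dot) and returns (time space;result)
    q).Q.ts[{select sum v by s from x};enlist t]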

     

  • Laura

    Administrator
    March 14, 2023 at 12:00 am

    Hi Nick,

    The previous suggestion of using .Q.w[] is a good start for isolating which parts of the calculation are memory intensive and require a large heap allocation from the OS. Printing to standard out with 0N! after each expected memory-intensive line will pinpoint that spot in your code.

    On the more under-the-hood side, this article by AquaQ is quite helpful for understanding what is going on. To summarise and add some additional points:

    • kdb+ allocates memory in powers of two, meaning a vector is placed in a memory block that is the next power of 2 up from the size of the raw data, so an object can occupy up to 2x the memory it strictly needs (see the short sketch after this list).
    • Memory fragmentation may also be an issue depending on your aggregations – example here
    • The q process starts with a heap allocation that is larger than the used space (this can be seen by starting a fresh q session and running .Q.w[] straight away). The heap will never drop below this initial allocation from the OS.
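
    As a rough sketch of the first bullet (the sizes are illustrative and the exact figures will vary slightly):

    q)before:.Q.w[]`used
    q)v:til 2100000            / ~16.8MB of raw long data
    q)(.Q.w[]`used)-before     / used grows by a full 2^25 (~33.5MB) block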

    If you don’t think a combination of these points is enough to explain the heap being this much larger than used after calling .Q.gc[], I’d recommend invoking the timer’s script manually and investigating with .Q.w[] from there, as the heap does appear rather large even given the above. This removes the risk of garbage collection running, or the timer function firing again mid-investigation, making the .Q.w[] numbers misleading.
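
    Something along these lines (timerFunc stands in for whatever your .z.ts handler actually calls; it is an assumed name):

    q)\t 0                          / stop the timer so it cannot fire mid-investigation
    q).Q.gc[]
    q)0N!.Q.w[]`used`heap           / baseline
    q)timerFunc[]                   / one manual run of the calculation
    q)0N!.Q.w[]`used`heap           / after the run
    q).Q.gc[];0N!.Q.w[]`used`heap   / after collecting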

  • nick_mospan

    Member
    March 14, 2023 at 12:00 am

    Thanks, I found one of the causes – code that brings in and refreshes a large table from another process.

    I’m starting a fresh process and bringing in a table of 107MB. The heap settles at 268MB after .Q.gc[].

    However, after refreshing this table the heap jumps up to 469MB and stays there.

    What’s different between the first and second call to position:h"position"? Why does the heap not go back to the initial 268MB?

    Here’s the console output:

    q).Q.w[] 
    used| 360512 
    heap| 67108864 
    peak| 67108864 
    wmax| 0 
    mmap| 0 
    mphy| 34359267328 
    syms| 686 
    symw| 37328 
    
    q)position:h"position" 
    q).Q.w[] 
    used| 226930848 
    heap| 402653184 
    peak| 402653184 
    wmax| 0 
    mmap| 0 
    mphy| 34359267328 
    syms| 1833 
    symw| 95932 
    
    q).Q.gc[] 
    134217728 
    
    q).Q.w[] 
    used| 226930848 
    heap| 268435456 
    peak| 402653184 
    wmax| 0 
    mmap| 0 
    mphy| 34359267328 
    syms| 1834 
    symw| 95962 
    
    q)position:h"position" 
    q).Q.gc[] 
    134217728 
    
    q).Q.w[] 
    used| 226933216 
    heap| 469762048 
    peak| 603979776 
    wmax| 0 
    mmap| 0 
    mphy| 34359267328 
    syms| 1834 
    symw| 95962 
    
    q).Q.gc[] 
    0 
    
    q)count position 
    276765 
    
    q)-22!position 
    107637762
  • Laura

    Administrator
    March 15, 2023 at 12:00 am

    Hi Nick,

    Here are the steps I took to try to reproduce your issue:

    Host Machine (Port 5000):

     

    q)n:50000000 
    q)position:([]time:n?.z.p;sym:n?`ABC`APPL`WOW;x:n?10f)

     

    Client Machine:

     

    q)h:hopen`::5000 
    q).Q.w[] 
    used| 357632 
    heap| 67108864 
    peak| 67108864 
    wmax| 0 
    mmap| 0 
    mphy| 8335175680 
    syms| 668 
    symw| 28560 
    
    q)position:h"position" 
    q).Q.w[] 
    used| 1610970544 
    heap| 2751463424 
    peak| 2751463424 
    wmax| 0 
    mmap| 0 
    mphy| 8335175680 
    syms| 672 
    symw| 28678 
    
    q).Q.gc[] 
    1073741824 
    
    q).Q.w[] 
    used| 1610969232 
    heap| 1677721600 
    peak| 2751463424 
    wmax| 0 
    mmap| 0 
    mphy| 8335175680 
    syms| 673 
    symw| 28708 
    
    q)position:h"position" 
    q).Q.w[] 
    used| 1610969232 
    heap| 4362076160 
    peak| 4362076160 
    wmax| 0 
    mmap| 0 
    mphy| 8335175680 
    syms| 673 
    symw| 28708 
    
    q).Q.gc[] 
    2684354560 
    
    q).Q.w[] 
    used| 1610969232 
    heap| 1677721600 
    peak| 4362076160 
    wmax| 0 
    mmap| 0 
    mphy| 8335175680 
    syms| 673 
    symw| 28708

     

    As you can see, in trying to replicate your issue my example releases the expected amount of memory back to the OS. Given the number of records you have and the relative size of the table, I think the issue you’re encountering is the data structure of position leading to memory fragmentation. As per my other reply, the reference on code.kx.com gives an example of this, stating that “nested data, e.g. columns of char vectors, or much grouping” will fragment memory heavily – does this reflect your data?

    To fix this I’d suggest the approach from the reference: serialise, release, deserialise. Or, extending it to your case: serialise, release, deserialise, release, IPC reassign, release. This keeps the memory footprint low and works around the fragmentation, but you may still unavoidably have heap greater than used purely because of the data structure (although to a lesser extent than you’re currently seeing).
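
    Roughly, the serialise/release/deserialise idea looks like this (a sketch only – -8! serialises an object to a byte vector and -9! deserialises it):

    b:-8!position            / serialise the table to a byte vector
    delete position from `.  / release the fragmented copy
    .Q.gc[]
    position:-9!b            / deserialise into freshly allocated blocks
    b:()                     / release the byte vector
    .Q.gc[]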

    If memory fragmentation isn’t the cause, can you give a bit more insight into the data structure of position, as my attempt to replicate suggests this problem is data specific.

  • davidcrossey

    Member
    March 15, 2023 at 12:00 am

    might be worth checking if the objects are <64MB too

    “During that return of memory, q checks if the capacity of the object is ≥64MB. If it is and g is 1, the memory is returned immediately to the OS; otherwise, the memory is returned to the thread-local heap for reuse.

    Executing .Q.gc[] additionally attempts to coalesce pieces of the heap into their original allocation units and returns any units ≥64MB to the OS.” – System commands in q | Basics | kdb+ and q documentation (kx.com)
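
    As a rough illustration of that threshold (sizes are approximate; start the process with -g 1 to get the immediate return of the large block):

    q)big:10000000?1f        / ~80MB raw, so its allocation is >=64MB
    q)small:2000000?1f       / ~16MB raw, allocation <64MB
    q)delete big from `.     / under -g 1 this block goes straight back to the OS
    q)delete small from `.   / this one stays on the heap until .Q.gc[]
    q).Q.w[]`used`heap
    q).Q.gc[]                / coalesces and returns any >=64MB units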

  • nick_mospan

    Member
    March 15, 2023 at 12:00 am

    My table has 54 columns of various simple types, mainly floats, symbols, ints and timestamps. Each column is around 2MB in size.

    I can reproduce it with your code by dropping n to 2000000, which makes the columns similar in size to my case. .Q.gc[] does not help release the excess heap to the OS:

     

    q).Q.w[] 
    used| 50694464 
    heap| 134217728 
    peak| 201326592 
    wmax| 0 
    mmap| 0 
    mphy| 34359267328 
    syms| 696 
    symw| 37613

     

    Each column with n:2000000 should be allocated 16777216 bytes of heap.

     

    q)(-22!) each value flip position 
    16000014 8667837 16000014

     

    What is the reason for this behaviour? Are these columns small enough to lead to memory fragmentation, or is there something else going on?

  • Laura

    Administrator
    March 17, 2023 at 12:00 am

    I wasn’t able to replicate the issue on my local machine running on KDB+ 4.0 2020.07.15:

     

    My heap returned to the level it was at the start of the q session once released, as expected.

    However I was able to recreate the issue running KDB+ 4.0 Cloud Edition 2022.01.31.

     

    So the issue seems to lie with QCE releasing memory back to the OS. I’ll follow up internally to see if it’s a known issue and what can be done to minimise the heap used.

    However, per the screenshot, I wasn’t able to recreate the case where re-assigning position via the IPC call fails to lower the heap after running .Q.gc[] (the heap is the same after the re-assign and GC as after the initial assign and GC).

    As a potential fix, can you try purging position from memory before your second assignment of it:

     

    delete position from `.
    .Q.gc[]
    .Q.w[]                / to inspect
    position:h"position"
    .Q.w[]                / to inspect
    .Q.gc[]
    .Q.w[]                / to inspect

     

  • nick_mospan

    Member
    March 21, 2023 at 12:00 am

    To replicate the issue please copy the position table twice, as you did with the Cloud Edition. It’s the second copy that takes the memory and does not release it. I’m not running the Cloud Edition but the Windows version:

     

    KDB+ 4.0 2021.04.26 Copyright (C) 1993-2021 Kx Systems w64/ 8()core 32767MB

     

    My theory is that the first copy creates the object in the first 64MB block. The second invocation of h"position" has to create a second block, and the assignment then repoints the columns from the first block to the second. But because the first block already contains other objects, it cannot be freed. When the process is constantly updating this position table while also serving other queries, this situation repeats over and over, slowly leading to memory fragmentation that looks like a memory leak.
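
    A very rough sketch of the scenario being described (h is an open handle from the earlier snippets; otherState is just an illustrative stand-in for any other live objects the process holds between refreshes):

    otherState:()
    do[100;position:h"position";otherState,:enlist 10000?1f]
    .Q.gc[]
    .Q.w[]`used`heap    / heap can stay well above used once blocks are pinned by live objects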

    Is it possible to control the minimum block size from the command line? Knowing that a process frequently creates “small” objects, I could then start it with a 1MB minimum block size instead of 64MB.

  • Laura

    Administrator
    March 22, 2023 at 12:00 am

    Hi Nick,

    Understood on the QCE version not being the issue. In my initial response I wasn’t able to replicate the problem with n:50000000 – if you look back at it you’ll see I call position twice and the heap returns to normal.

    For n:2000000, however, I do see the issue, so we’re on the same page now:

     

    Regardless, did you try the fix I suggested in my latest response? It works for both QCE and q:

     

    Notice how, if I delete position from the local namespace before reassigning it, the heap returns to normal after GC.

    I think your theory about the first block being allocated and a second block then being used on the second IPC call is correct. The reason I didn’t see this in the n=50000000 case is that the data was large enough that the memory already allocated could hold both the IPC read and what was currently in memory without allocating another block. For the data you’re using, or the n=2000000 case, the memory allocated was much closer to the amount taken up by the object in memory.

    So my suggestion of deleting from the local namespace before calling again reduces the used memory in the process enough to contain the second assignment and avoid allocating a second block. It’s important to note that if you delete from the local namespace immediately before the second assignment this shouldn’t affect your code, since the reassignment would overwrite the variable anyway.
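
    In other words, a minimal version of the pattern (h being the open handle):

    delete position from `.    / drop the old copy so its blocks can be reused
    .Q.gc[]                    / coalesce and hand back any >=64MB units
    position:h"position"       / the fresh read can now land in the freed space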
