-
Heap is a lot larger than used, how to find the cause?
Posted by nick_mospan on March 13, 2023 at 12:00 am
I've got a process doing some calculations on a timer and sending an updated table to another process. Its heap is more than 3x its used memory, even after a manual trigger of .Q.gc[].
used | 567774096
heap | 1946157056
peak | 2617245696
I'm using KDB+ 4.0 2021.04.26.
Is memory fragmentation the only possible cause? How do I find which operation contributes to it the most?
Are there any other cases where kdb+ accumulates internal memory, or any known bugs leading to memory leaks?
Thanks
-
As a first step you could insert printouts of .Q.w[] in between the actual operations in the query, even breaking expressions down into single operator invocations if necessary. Additionally, .Q.ts can be used to figure out the time and space used by an operation, similarly to \ts, but it also returns the result (it is parameterized like . (dot) for multi-parameter apply).
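For example, a rough sketch of both approaches (the trades table and the query here are made up purely for illustration):
0N!(`before;.Q.w[])                      / heap snapshot before the suspected step
r:select sum qty by sym from trades      / hypothetical memory-intensive step
0N!(`after;.Q.w[])                       / heap snapshot after it
/ time/space a single step while keeping its result; .Q.ts applies like . (dot)
res:.Q.ts[{select sum qty by sym from x};enlist trades]
res 0    / time (ms) and space (bytes), as \ts would report
res 1    / the result of the application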
-
Hi Nick,
The previous suggestion of using .Q.w[] is a good start for isolating which parts of the calculation are memory intensive and require a large heap allocation from the OS. Printing to standard out with 0N! after each expected memory-intensive line will pinpoint that point in your code.
On the more under-the-hood side, this article by AquaQ is quite helpful. But to summarise and add some additional points:
- kdb+ allocates memory in powers of two, meaning a vector of data is placed in a memory block one power of 2 above its raw size, leading to at most 2x the memory actually needed.
- Memory fragmentation may also be an issue depending on your aggregations – example here
- The q process starts with a certain amount of heap allocated that is larger than the used space (this can be seen by starting a q session and running .Q.w[] straight away). The heap won't drop below this initial allocation from the OS.
If you don't think a combination of these points is enough to explain the heap being this much larger than used after calling .Q.gc[], I'd recommend invoking the timer function manually and investigating with .Q.w from there, as the heap does appear rather large even given the above; see the sketch below. This eliminates the risk of the timer function firing again while you run garbage collection and inspect .Q.w, which would make the numbers misleading.
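Something along these lines (a sketch only; runCalc stands in for whatever your timer callback is actually called):
\t 0                     / stop the timer so it can't fire mid-investigation
0N!(`before;.Q.w[])
runCalc[]                / run the callback by hand
0N!(`afterRun;.Q.w[])
.Q.gc[]
0N!(`afterGC;.Q.w[])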
-
Thanks, I found one of the causes – code that brings and refreshes a large table from another process.
I'm starting a fresh process and bringing in a table of 107MB. The heap settles to 268MB after .Q.gc[].
However, after refreshing this table the heap jumps up to 469MB and stays there.
What's different between the first and second call to position:h"position"? Why does the heap not go back to the initial 268MB?
Here’s the console output:
q).Q.w[]
used| 360512
heap| 67108864
peak| 67108864
wmax| 0
mmap| 0
mphy| 34359267328
syms| 686
symw| 37328
q)position:h"position"
q).Q.w[]
used| 226930848
heap| 402653184
peak| 402653184
wmax| 0
mmap| 0
mphy| 34359267328
syms| 1833
symw| 95932
q).Q.gc[]
134217728
q).Q.w[]
used| 226930848
heap| 268435456
peak| 402653184
wmax| 0
mmap| 0
mphy| 34359267328
syms| 1834
symw| 95962
q)position:h"position"
q).Q.gc[]
134217728
q).Q.w[]
used| 226933216
heap| 469762048
peak| 603979776
wmax| 0
mmap| 0
mphy| 34359267328
syms| 1834
symw| 95962
q).Q.gc[]
0
q)count position
276765
q)-22!position
107637762
-
Hi Nick,
Here are the steps I took to try to reproduce your issue:
Host Machine (Port 5000):
q)n:50000000
q)position:([]time:n?.z.p;sym:n?`ABC`APPL`WOW;x:n?10f)
Client Machine:
q)h:hopen`::5000
q).Q.w[]
used| 357632
heap| 67108864
peak| 67108864
wmax| 0
mmap| 0
mphy| 8335175680
syms| 668
symw| 28560
q)position:h"position"
q).Q.w[]
used| 1610970544
heap| 2751463424
peak| 2751463424
wmax| 0
mmap| 0
mphy| 8335175680
syms| 672
symw| 28678
q).Q.gc[]
1073741824
q).Q.w[]
used| 1610969232
heap| 1677721600
peak| 2751463424
wmax| 0
mmap| 0
mphy| 8335175680
syms| 673
symw| 28708
q)position:h"position"
q).Q.w[]
used| 1610969232
heap| 4362076160
peak| 4362076160
wmax| 0
mmap| 0
mphy| 8335175680
syms| 673
symw| 28708
q).Q.gc[]
2684354560
q).Q.w[]
used| 1610969232
heap| 1677721600
peak| 4362076160
wmax| 0
mmap| 0
mphy| 8335175680
syms| 673
symw| 28708
As you can see from my attempt to replicate your issue, my example releases the expected amount of memory back to the OS. Given the number of records you have and the relative size of the table, I think the issue you're encountering is the data structure of position leading to memory fragmentation. As per my other reply, the reference on code.kx.com gives an example of this, stating that "nested data, e.g. columns of char vectors, or much grouping" will fragment memory heavily. Does this reflect your data?
To fix this I'd suggest the approach from the reference: serialise, release, deserialise. Or, extending it to your case: serialise, release, deserialise, release, IPC reassign, release (see the sketch below). This will maintain a low memory footprint and help remedy the memory fragmentation, but you may still unavoidably have heap greater than used purely due to the data structure (although to a lesser extent than what you're experiencing).
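A minimal sketch of that serialise/release/deserialise pattern, assuming the fragmented object is the global position and h is the IPC handle:
b:-8!position             / serialise to one contiguous byte vector
delete position from `.   / release the fragmented original
.Q.gc[]
position:-9!b             / deserialise back into (mostly) contiguous memory
b:`byte$()                / release the serialised copy
.Q.gc[]
For the IPC case you can skip the serialise step entirely: delete position from `., run .Q.gc[], then re-fetch with position:h"position" so the new copy lands in the freed space.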
If memory fragmentation isn't the cause, can you give a bit more insight into the data structure of position, as my attempt to replicate suggests this problem is data specific?
-
Might be worth checking if the objects are <64MB too:
"During that return of memory, q checks if the capacity of the object is ≥64MB. If it is and \g is 1, the memory is returned immediately to the OS; otherwise, the memory is returned to the thread-local heap for reuse. Executing .Q.gc[] additionally attempts to coalesce pieces of the heap into their original allocation units and returns any units ≥64MB to the OS." – System commands in q | Basics | kdb+ and q documentation (kx.com)
-
My table has 54 columns of various simple types, mainly floats, symbols, ints and timestamps. Each column is around 2MB in size.
I can reproduce it with your code by dropping n to 2000000, which makes the columns similar in size to my case. .Q.gc[] does not help release the excess heap to the OS:
q).Q.w[]
used| 50694464
heap| 134217728
peak| 201326592
wmax| 0
mmap| 0
mphy| 34359267328
syms| 696
symw| 37613
Each column with n:2000000 should be allocated 16777216 bytes of heap.
q)(-22!) each value flip position
16000014 8667837 16000014
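As a quick sanity check on the powers-of-two allocation mentioned above, here is the block size a 16,000,014-byte vector rounds up to (illustrative arithmetic only):
q)"j"$2 xexp ceiling 2 xlog 16000014
16777216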
What is the reason for this behaviour? Are these columns small enough to lead to memory fragmentation, or is there something else going on?
-
I wasn’t able to replicate the issue on my local machine running on KDB+ 4.0 2020.07.15:
My heap returned to its level from the start of the q session once released, as expected.
However I was able to recreate the issue running KDB+ 4.0 Cloud Edition 2022.01.31.
So the issue seems to lie with QCE releasing back to OS. I’ll follow up internally on this to see if it’s a known issue and what can be done to minimise the heap used.
However, per the screenshot, I wasn't able to recreate the case where re-assigning position via the IPC call doesn't lower the heap after running .Q.gc[] (the heap is the same after the re-assign and GC as it was after the initial assign and GC).
As a potential fix, can you try purging position from memory before your second assignment:
delete position from `.
.Q.gc[]
.Q.w[]                // to inspect
position:h"position"
.Q.w[]                // to inspect
.Q.gc[]
.Q.w[]                // to inspect
-
To replicate the issue, please copy the position table twice, like you did with the cloud edition. It's the second copy that takes the memory and does not release it. I'm not running a cloud edition but the Windows version:
KDB+ 4.0 2021.04.26 Copyright (C) 1993-2021 Kx Systems w64/ 8()core 32767MB
My theory is that the first copy creates the object in the first 64MB block. For the second invocation of h"position" it has to create a second block, and the assignment then repoints the columns from the first to the second block. But because the first block already holds other objects, it cannot be freed. When the process is constantly updating this position table while also serving other queries, this situation repeats over and over, slowly producing memory fragmentation that looks like a memory leak.
Is it possible to control the minimum block size from the command line? Knowing that a process frequently creates "small" objects, I could then start it with a 1MB minimum block size instead of 64MB.
-
Hi Nick,
Understood on the QCE version not being the issue. In my initial response I wasn't able to replicate the problem with n:50000000; if you look at that, you'll see I call position twice and the heap returns to normal.
For n:2000000 I do see the issue, however, so we're on the same page now.
Regardless, did you try the fix I suggested in my latest response? It works for both QCE and q:
See how, if I delete position from the local namespace before reassigning it, the heap returns to normal after GC.
I think your theory about the first block being allocated and a second block then being used on the second IPC call is correct. The reason I didn't see this for the n:50000000 case is that the data was of a size where the memory already allocated was large enough to hold both the IPC read and what was currently in memory without allocating another block. For your data, or the n:2000000 case, the memory allocated was much nearer to the amount actually taken up by the object in memory.
So my solution of deleting from the local namespace before fetching again reduces the used memory in the process enough for it to contain the second assignment, avoiding the allocation of a second block. It's important to note that deleting from the local namespace immediately before the second assignment shouldn't affect your code, since the reassignment would overwrite the variable anyway.
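Put together, the refresh step might look something like this (a sketch only; h and the global position are assumed from the earlier examples):
delete position from `.      / drop the stale copy so its heap blocks can be reused
.Q.gc[]                      / coalesce and return whatever can be returned to the OS
position:h"position"         / the fresh copy can now land in the freed space
If the refresh runs inside a lambda (e.g. a timer callback), the functional form ![`.;();0b;enlist`position] is commonly used in place of the delete template.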