Forum Replies Created

  • Laura

    Administrator
    April 5, 2023 at 12:00 am in reply to: How to download attachments from *.eml file

    Hi KPC,

Looking at an example .eml file here. If you wanted to parse the attachments purely in KDB/Q without the use of Python libs (although I'd suggest using Python libs), I'd suggest something along the lines of:

1. read0 the *.eml file. Depending on the contents, and whether you want to interpret new lines literally or not, you may find "c"$read1 a more appropriate solution
2. Use regex to locate the contents of the attachment, the content type and the encoding type (from the example this looks to default to base64)
3. Decode the body of the attachment. For base64 decoding in KDB/Q, this looks to be a solution:
  b64Decode:{c:sum x="=";neg[c]_"c"$raze 256 vs'64 sv'0N 4#.Q.b6?x}
4. Post-process the data further into Q objects if it's suitable. E.g. if the filetype is JSON you may want to utilise the .j.k JSON deserialiser for Q (a sketch tying steps 1-3 together follows below)
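
As a minimal sketch of steps 1-3, assuming the base64 body lines have already been located by the regex of step 2 (the file name and line range here are purely illustrative):

b64Decode:{c:sum x="=";neg[c]_"c"$raze 256 vs'64 sv'0N 4#.Q.b6?x}
raw:read0 `:example.eml             / step 1: the .eml as a list of lines
b64lines:raw 10+til 5               / step 2 stand-in: the attachment's base64 line range
bytes:"x"$b64Decode raze b64lines   / step 3: decoded attachment bytes
`:attachment.bin 1: bytes           / write the attachment back out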

The solution provided with embedPy should be the preferred one. Adding to this, there is a PyPI lib that claims to handle attachments too:
    https://pypi.org/project/eml-parser/

  • Laura

    Administrator
    March 31, 2023 at 12:00 am in reply to: Dashboards Streaming (For Dummies)

    Hey Roc,

    Thanks. Don’t think the Platform stuff is going to fit the bill given I already have a backend with the TP stuff going on.

I've been working off the UI options, since I'm using Dashboards, and have managed to get data streaming in, but it appears to stop after 20-30 seconds.


    Do I need to add anything to the TP sym.q file?

    Once I’ve added the:

.u.snap:{tablename}                                                 / snapshot function returning the table to new subscribers

.ringBuffer.read:{[t;i] $[i<=count t; i#t; i rotate t]};            / return buffer contents in insertion order
.ringBuffer.write:{[t;r;i] @[t;(i mod count value t)+til 1;:;r];};  / amend the global buffer t at slot i, wrapping via mod
.stream.i:-1;                                                       / current write index
.stream.tablename:20000#tablename;                                  / pre-allocated 20000-row ring buffer

Are there any other prerequisites to stream data into Dashboards?

  • Laura

    Administrator
    March 22, 2023 at 12:00 am in reply to: Heap is a lot larger than used, how to find the cause?

    Hi Nick,

Understood on the QCE version not being an issue. In my initial response I wasn't able to replicate the issue with n:50000000; if you look at that, you'll see I call position twice and the heap returns to normal.

For n:2000000, however, I do see the issue, so we're on the same page now:

[screenshot]

Regardless, did you try the fix I suggested in my latest response? It works for both QCE and Q:

[screenshot]

See how, if I delete position from the local namespace before reassigning it, the heap returns to normal after GC.

I think your theory about the first block being allocated and then the second block being used on the second IPC call is correct. The reason I didn't see this for the n=50000000 case was that the data was of a size where the allocated block was large enough to hold both the IPC read and what was currently in memory, without allocating another block. For the data you're using, or the n=2000000 case, the memory allocated was nearer to the amount taken up by the object in memory.

So my solution of deleting from the local namespace before calling again reduces the used memory in the process enough to contain the second assignment, and stops the invocation of the second block. It's important to note that deleting from the local namespace immediately before the second assignment shouldn't affect your code, since the reassignment would overwrite the variable anyway.
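
For reference, the pattern as a minimal q sketch (h being the IPC handle and position the table from this thread):

delete position from `.   / free the old copy before re-fetching
.Q.gc[]                   / return the freed block to the OS
position:h"position"      / re-fetch over IPC into the reclaimed space
.Q.w[]                    / heap should now match the first assign-and-GC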

  • Laura

    Administrator
    March 17, 2023 at 12:00 am in reply to: Heap is a lot larger than used, how to find the cause?

I wasn't able to replicate the issue on my local machine running KDB+ 4.0 2020.07.15:

[screenshot]

My heap returned to the level it was at the start of the Q session on release, as expected.

However, I was able to recreate the issue running KDB+ 4.0 Cloud Edition 2022.01.31.

[screenshot]

So the issue seems to lie with QCE not releasing memory back to the OS. I'll follow up internally to see if it's a known issue and what can be done to minimise the heap used.

However, per the screenshot, I wasn't able to recreate the behaviour where re-assigning position via an IPC call fails to lower the heap after running .Q.gc[] (the heap after the re-assign and GC is the same as after the initial assign and GC).

As a potential fix, can you try purging position from memory before your second assignment:


delete position from `.
.Q.gc[]
.Q.w[]               / to inspect
position:h"position"
.Q.w[]               / to inspect
.Q.gc[]
.Q.w[]               / to inspect


  • Laura

    Administrator
    March 16, 2023 at 12:00 am in reply to: Type error from Q chk

As a simple starting point before investigating the data: a type error in Q means you have passed the wrong datatype to a function. In the case of .Q.chk, make sure you're passing a file path which, as per the documentation, must be a symbol. If you could share the line of code you're calling .Q.chk in and the argument you're providing, as well as an ls of the directory you're performing the check on, that would help in isolating the problem. An example call of .Q.chk is:


    .Q.chk[`:/path/to/dir]


    N.B. You will get a type error if you call .Q.chk with a string:

q).Q.chk["/path/to/dir"]
'type
  [0]  .Q.chk["/path/to/dir"]

    Hope this helps!

  • Laura

    Administrator
    March 15, 2023 at 12:00 am in reply to: Heap is a lot larger than used, how to find the cause?

    Hi Nick,

Here are the steps I took to try to reproduce your issue:

    Host Machine (Port 5000):


    q)n:50000000 
    q)position:([]time:n?.z.p;sym:n?`ABC`APPL`WOW;x:n?10f)


    Client Machine:
     

    q)h:hopen`::5000 
    q).Q.w[] 
    used| 357632 
    heap| 67108864 
    peak| 67108864 
    wmax| 0 
    mmap| 0 
    mphy| 8335175680 
    syms| 668 
    symw| 28560 
    
    q)position:h"position" 
    q).Q.w[] 
    used| 1610970544 
    heap| 2751463424 
    peak| 2751463424 
    wmax| 0 
    mmap| 0 
    mphy| 8335175680 
    syms| 672 
    symw| 28678 
    
    q).Q.gc[] 
    1073741824 
    
    q).Q.w[] 
    used| 1610969232 
    heap| 1677721600 
    peak| 2751463424 
    wmax| 0 
    mmap| 0 
    mphy| 8335175680 
    syms| 673 
    symw| 28708 
    
    q)position:h"position" 
    q).Q.w[] 
    used| 1610969232 
    heap| 4362076160 
    peak| 4362076160 
    wmax| 0 
    mmap| 0 
    mphy| 8335175680 
    syms| 673 
    symw| 28708 
    
    q).Q.gc[] 
    2684354560 
    
    q).Q.w[] 
    used| 1610969232 
    heap| 1677721600 
    peak| 4362076160 
    wmax| 0 
    mmap| 0 
    mphy| 8335175680 
    syms| 673 
    symw| 28708


As you can see from my attempt to replicate your issue, my example releases the expected amount of memory back to the OS. Given the number of records you have and the relative size of the table afterwards, I think the issue you're encountering is the data structure of position leading to memory fragmentation. As per my other reply, the reference on code.kx.com gives an example of this, stating that "nested data, e.g. columns of char vectors, or much grouping" will fragment memory heavily. Does this reflect your data?

To fix this I'd suggest the approach from the reference: serialise, release, deserialise. Or, extended to your case: serialise, release, deserialise, release, IPC reassign, release. This will maintain a low memory footprint and help remedy the memory fragmentation, but you may still unavoidably have heap greater than used purely due to the data structure (though to a lesser extent than what you're experiencing).
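
As a minimal sketch of the serialise-release-deserialise step (position as in the thread; -8! and -9! are the built-in serialise/deserialise primitives):

b:-8!position             / serialise the table to a byte vector
delete position from `.   / release the fragmented original
.Q.gc[]                   / hand the freed blocks back to the OS
position:-9!b             / deserialise into fresh, contiguous allocations
delete b from `.          / release the byte vector
.Q.gc[]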

If memory fragmentation isn't the cause, can you give a bit more insight into the data structure of position, as my attempt to replicate shows this problem is data-specific.

  • Laura

    Administrator
    March 14, 2023 at 12:00 am in reply to: Heap is a lot larger than used, how to find the cause?

    Hi Nick,

The previous comment about using .Q.w[] is a good start for isolating which part of the calculations is memory intensive and requires a large heap allocation from the OS. Printing .Q.w[] to standard out with 0N! after each suspected memory-intensive line will pinpoint that point in your code (see the sketch after the list below).

    On the more under-the-hood side, this article by AquaQ is quite helpful to help understand. But to summarise and add some additional points:

• KDB allocates memory in powers of two, meaning a vector of data is placed in the smallest power-of-two memory block that holds it, leading to at most 2x the raw data size being used.
• Memory fragmentation may also be an issue depending on your aggregations (example here).
• The Q process starts with a heap allocation that is larger than the used space (this can be seen by starting a Q session and running .Q.w[] straight away). The process won't go below this startup allocation.
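
A minimal instrumentation sketch (the aggregation line is a hypothetical stand-in for whichever line you suspect):

0N!.Q.w[];                            / baseline used/heap
r:select sum x by sym from position   / hypothetical memory-intensive line
0N!.Q.w[];                            / compare used/heap against the baseline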

If you don't think a combination of these points contributes enough to make the heap this much larger than used after calling .Q.gc[], I'd recommend invoking the script manually rather than from the timer and investigating with .Q.w[] from there, as the heap does appear rather large even given the above. This eliminates the risk of garbage collection, or the timer function firing again while you investigate, making the .Q.w[] numbers misleading.

  • Laura

    Administrator
    March 8, 2023 at 12:00 am in reply to: Parallelising .Q.dpft with default compression enabled

Tacking on here some further improvements Alex and I discussed:

funcMem:{[d;p;f;t]
    i:iasc t f;                              / sort order of the parted column
    c:cols t;
    is:(ceiling count[i]%count c) cut i;     / slice rows so each chunk holds about one column's worth of cells
    tab:.Q.en[d;`. t];                       / enumerate symbol columns against the database
    {[d;tab;c;t;f;i].[{[d;t;i;c;a]@[d;c;,;a t[c]i]}[d;tab;i;;]]peach flip(c;)(::;`p#)f=c:cols t}[d:.Q.par[d;p;t];tab;c;t;f;]each is;  / append each chunk column-by-column in parallel, applying `p# to the parted column
    @[d;`.d;:;f,c where not f=c];            / write the .d file with the parted column first
    t};

This reduces the memory drawback; theoretically this will be more memory efficient than the standard .Q.dpft. The code slices the sorted rows of the table into chunks such that a chunk in memory holds at most the same number of entries as a single column of the full table (which is the maximum amount of data .Q.dpft holds in memory, since it writes column by column).
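
For example (illustrative numbers): with 1,000,000 rows and 20 columns, each chunk takes ceiling[1000000%20] = 50,000 rows, and a 50,000-row slice across 20 columns holds 50,000 * 20 = 1,000,000 cells, the same count as one full column.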

The result is the parallelisation benefit described above without the memory drawback we saw from simply adding peach.

I claim "more memory efficient than standard .Q.dpft" because the chunks are sized by matching the element count of a single column. Since .Q.dpft writes column by column, its maximum memory use is set by the biggest (in bytes) datatype column. This new method only ever holds part of that large column at any one time, alongside parts of the smaller-datatype columns, so at worst it matches .Q.dpft's memory usage, in the case where all columns have the same-sized datatype.

Preliminary tests showed the speed improvement was maintained, with no memory drawback. However, these tests were not standardised or conducted in an official unit-testing framework. I'd love to know the official results of this at some point, whether generated by me or by someone else who is curious.
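
For illustration, a hedged usage sketch mirroring .Q.dpft's argument order (the database root, partition value, parted field and table name are all hypothetical):

funcMem[`:/path/to/hdb;2023.03.08;`sym;`trade]   / cf. .Q.dpft[d;p;f;t]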

  • Laura

    Administrator
    March 6, 2023 at 12:00 am in reply to: Report Management in kx dashboard

Hi, thanks for your question.

You'll need to set up the report template to connect, and then use report management in Dashboards to connect to the source data populating the reports, e.g. ds_gw_report.

    See the documentation here for more information: Report Manager – KX Dashboards
    Hope that helps!

    Laura

  • Laura

    Administrator
    January 25, 2023 at 12:00 am in reply to: Multiple Chart, Single Cursor — Chart GL

Basics > Hover worked.

Y-Axis > Range > Selection Min and Max hasn't worked so far. Do I need to click the min/max button too?

  • Laura

    Administrator
    January 6, 2023 at 12:00 am in reply to: Beef with apply (@ and .)

    Or, for that matter,


    metaTbl . `ref`m


    So, if you are thinking of defining objects by their paths, in this case the path would be `ref`m.
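
A quick illustration of the equivalence (this metaTbl is a hypothetical stand-in, since the original isn't shown):

metaTbl:enlist[`ref]!enlist `m`n!("meta m";"meta n")   / hypothetical nested dictionary
metaTbl . `ref`m       / returns "meta m": dot-apply indexes along the path
metaTbl[`ref;`m]       / the equivalent bracket form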

  • Laura

    Administrator
    January 3, 2023 at 12:00 am in reply to: workspace are not showing default

Hi,

    You have two options to get workspaces back:

1. Force a fresh launch of the learn.kx.com sandbox by opening it in a new incognito/private browsing window, in a different browser from the one you previously used to access it.

2. Use the new Academy sandbox, which will replace learn.kx.com as the default Academy sandbox very soon. The three workspaces are:

    Introduction to KX Developer

    Introduction to SQL Interface

    Fundamentals Capstone


Let us know if you try option 2 and have any feedback on the user experience! More on the new sandbox here.

    Thanks,

    Michaela

  • Laura

    Administrator
    December 2, 2022 at 12:00 am in reply to: Large Scale WindowJoins Questions

I'm working in memory.

I load the file in; it's 800,000 x 7. Then I run a bunch of updates to make the remaining 23 columns (30 or so total).

Then I run the wj and pass back the resulting table.

I'm trying to simulate what it would be doing in a TP: as more data comes in, the wj will run progressively slower until it hits the max file size for the day, 800k rows. So I figured loading the whole file in and doing it all at once would be a decent enough way to test what it'd be doing.

  • Laura

    Administrator
    December 1, 2022 at 12:00 am in reply to: Large Scale WindowJoins Questions

It took me significantly longer. But I'm also dealing with 30 columns; would that matter, even though I'm just using mmm3 for the wj?

In testing, when I make the data table that I'm searching (data;(max;`mmm3)) smaller, things speed up. For example, I ran a 1-minute xbar on that table and the window join now takes a couple of seconds.

I don't understand what I could be doing wrong. At full scale, 800,000 rows x 30 columns, it took 30-40 minutes to complete a 5-minute lookback.

  • Laura

    Administrator
    November 30, 2022 at 12:00 am in reply to: Large Scale WindowJoins Questions

What if I'm not using a sym column? I'm going datetime to datetime for the window join.

    Would I get speed improvements by using a sym column?
