KX Community

Re: Parallelising .Q.dpft with default compression enabled

  • Laura

    Administrator
    March 8, 2023 at 12:00 am

    Tacking on here some further improvements that Alex and I discussed:

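    funcMem:{[d;p;f;t] i:iasc t f;                   / row indices that sort the table by the parted field f
        c:cols t;
        is:(ceiling count[i]%count c) cut i;         / split the sorted indices into chunks of rows%columns rows
        tab:.Q.en[d;`. t];                           / enumerate symbol columns against the sym file under d
        / for each chunk, append every column slice in parallel, applying `p# to the parted field only
        {[d;tab;c;t;f;i].[{[d;t;i;c;a]@[d;c;,;a t[c]i]}[d;tab;i;;]]peach flip(c;)(::;`p#)f=c:cols t}[d:.Q.par[d;p;t];tab;c;t;f;]each is;
        @[d;`.d;:;f,c where not f=c]; t };           / write .d with f as the first column, then return the table name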
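
funcMem is declared with the same four arguments as .Q.dpft, so it would presumably be called the same way. A minimal sketch of such a call, shown with .Q.dpft itself; the table `trade`, the directory `:db` and the date are made up for illustration and do not come from the thread:

```q
/ hypothetical toy table and database root, for illustration only
trade:([]sym:`b`a`c`a;price:1 2 3 4f;size:10 20 30 40)
/ d: database root, p: partition value, f: parted field, t: table name as a symbol
.Q.dpft[`:db;2023.03.08;`sym;`trade]
/ funcMem as posted has the same signature, so the equivalent call would be
/ funcMem[`:db;2023.03.08;`sym;`trade]
```

One difference worth keeping in mind: the posted function writes each column with append (`,`) rather than assign (`:`), so it presumably expects the partition directory not to hold the table already.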

    This reduces the memory drawback; in theory it should be more memory efficient than the standard .Q.dpft. The function sorts the row indices by the parted column and slices them into chunks of ceiling(rows % columns) rows, so that a chunk of the table held in memory contains roughly the same number of entries, summed over all its columns, as a single full column. A single full column is the most data .Q.dpft holds in memory at once, because it writes column-by-column.
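
The chunking step can be tried on its own. The sketch below uses a made-up 12-row, 3-column in-memory table; the names `t`, `idx`, `n` and `chunks` are illustrative, and `t` is the table itself here rather than its name as in funcMem:

```q
/ toy table with 3 columns and 12 rows (column names are illustrative)
t:([]sym:12?`a`b`c;price:12?100f;size:12?1000)
idx:iasc t`sym                       / row indices that sort the table by the parted field
n:ceiling count[idx]%count cols t    / rows per chunk: ceiling 12%3 gives 4
chunks:n cut idx                     / three chunks of 4 row indices each
count each chunks                    / 4 4 4
(count cols t)*count each chunks     / 12 12 12 entries per chunk, the same as one full 12-row column
t chunks 0                           / the first chunk: 4 rows across all 3 columns
```

When the row count is not a multiple of the column count, ceiling rounds the chunk size up, so a chunk can hold slightly more entries than one column.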

    The result should be the parallelisation benefits described above, without the memory drawback we saw from simply adding peach.

    I claimed “more memory efficient than standard .Q.dpft” because the chunks are sized to match the number of elements in one column. Since .Q.dpft writes column-by-column, its peak memory is set by the column with the biggest (in bytes) datatype. With this method, the largest chunk holds only part of that big-datatype column at any one time, together with parts of the columns with smaller datatypes. Peak memory is therefore at most that of .Q.dpft, with equality when all columns have the same-sized datatype.
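
As a rough back-of-envelope check of that argument, the two peaks can be estimated from element widths alone. This is only a sketch under stated assumptions: the column names and row count are invented, and it ignores vector headers, kdb+ memory allocation granularity, symbol enumeration and any overhead from peach:

```q
/ assumed bytes per element: timestamp 8, float 8, int 4, boolean 1 (column names are hypothetical)
w:`time`price`size`flag!8 8 4 1
n:1000000                            / assumed row count
colByCol:n*max w                     / one full column of the widest type: 8000000 bytes
chunked:(ceiling n%count w)*sum w    / 250000 rows * 21 bytes per row: 5250000 bytes
/ when every column has the same width the two estimates coincide
same:(ceiling n%4)*sum 4#8           / 250000 rows * 32 bytes per row: 8000000 bytes
```

Under these assumptions the chunked estimate is lower whenever the columns differ in width and equal when they do not, which matches the reasoning above; it is not a measurement.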

    Preliminary tests showed that the speed improvement was maintained, with no memory drawback. However, these tests were not standardised or conducted in an official unit testing framework. I would love to know the official results of this at some point, whether generated by myself or by someone else who is curious.