Parallelising .Q.dpft with default compression enabled
.Q.dpft and why to parallelise it
.Q.dpft can be split into 4 steps:
- Find the index ordering of the table with iasc
- Enumerate table against sym file
- Save down table 1 column at a time, reordering and applying attributes as it goes
- Set .d file down
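For a concrete feel for the steps, here is roughly what each one looks like in isolation. This is an illustrative sketch, not the library code: dir, trade and the date are hypothetical placeholders, dir is assumed to exist, and attribute application is omitted for brevity.

/ illustrative sketch of the 4 steps; dir, trade and the date are placeholders
dir:`:/tmp/HDB;
trade:([]sym:`a`b`a;px:1 2 3f);
i:iasc trade`sym;                    / step 1: index ordering by the partition field
tab:.Q.en[dir;trade];                / step 2: enumerate syms against dir's sym file
pd:.Q.par[dir;2023.01.01;`trade];    / partition directory, e.g. dir/2023.01.01/trade
{[c]@[pd;c;:;tab[c]i]}each cols tab; / step 3: save each column reordered (attributes omitted)
@[pd;`.d;:;cols tab];                / step 4: set the .d file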
Since writing data down to disk is an IO-bound operation, attempting to parallelise any of the above steps would normally not yield any speed increase. But if you are saving data down with a default compression algorithm set, .Q.dpft will spend a significant amount of time per column compressing the data before it is written to disk. Parallelising this compression allows the IO channels to be saturated more frequently than a single thread can manage.
Parallelising .Q.dpft
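As background on the compression settings used later: .z.zd is a 3-element list of logical block size (as a power of 2), algorithm and compression level; 17 2 6 means 128kB blocks, algorithm 2 (gzip) at level 6. A quick sketch of checking that a file was written compressed (the /tmp path is a placeholder):

/ 17 2 6 = logical block size 2 xexp 17 bytes, algorithm 2 (gzip), level 6
.z.zd:17 2 6;
`:/tmp/ztest set til 1000000;   / written compressed because .z.zd is set
-21!`:/tmp/ztest                / returns a dict of compression stats for the file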
Very similar to .Q.dpft, except I replace the each-both with a peach:

{[d;p;f;t]
  i:iasc t f;
  tab:.Q.en[d;`. t];
  .[{[d;t;i;c;a]@[d;c;:;a t[c]i]}[d:.Q.par[d;p;t];tab;i;;]]peach flip(c;(::;`p#)f=c:cols t);
  @[d;`.d;:;f,c where not f=c];
  t};
Testing
Before each test I would:
- Start a new q session
- Clear out the HDB directory
- Define default compression with .z.zd
- Create table to be saved down
By running the following code:

// Set default compression and delete HDB
.z.zd:17 2 6;
system"rm -r /home/alivingston/HDB/*";
dir:`:/home/alivingston/HDB;

// define parallelised .Q.dpft
func:{[d;p;f;t]
  i:iasc t f;
  tab:.Q.en[d;`. t];
  .[{[d;t;i;c;a]@[d;c;:;a t[c]i]}[d:.Q.par[d;p;t];tab;i;;]]peach flip(c;(::;`p#)f=c:cols t);
  @[d;`.d;:;f,c where not f=c];
  t};

// Create table
n:10000000;
trade:([]timestamp:.z.p+til n;sym:n?`2;a:n?0;b:n?1f;c:string n?`3;d:n?0b;e:n?0;f:n?1f;g:string n?`3;h:n?0b);
I then test the original .Q.dpft and the new function with slaves set to 0, 2, 4 and 8, while logging RAM usage with top.

\ts func[dir;.z.d;`sym;`trade]
\ts .Q.dpft[dir;.z.d;`sym;`trade]
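Note that peach only distributes work when q is started with slave threads, e.g. q -s 8; the available count can be checked from within the session. A small sketch (assuming the session was launched with -s):

/ q must be started with slaves, e.g.: q -s 8
system"s"                          / number of slave threads available to peach
\ts func[dir;.z.d;`sym;`trade]     / \ts reports elapsed milliseconds and bytes allocated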
Results
For the following tables, time and space have been normalised against a reference run of .Q.dpft.

threads| time  space
-------| -----------
0      | 0.992 1
2      | 1.52  1.17
4      | 1.8   1.32
8      | 2.61  1.66
In an attempt to limit memory usage I repeated this testing with automatic garbage collection enabled with -g 1.

threads| time  space
-------| -----------
0      | 0.981 1
2      | 1.56  1.08
4      | 1.84  1.2
8      | 2.63  1.49
The parallelised .Q.dpft func with 2 threads ran 56% faster using 8% more RAM, while with 8 threads it ran 163% faster using 50% more RAM.
Conclusion
Due to the extra memory required, this would likely not be sensible to run on an RDB at EOD. I think the best use case would be when attempting to ingest years of historical data into kdb+ as fast as possible, where RAM isn't an issue.

Comments or critiques are more than welcome. I'd be interested to know whether this should be avoided for the above use case, or whether there are any other issues with this that I have missed.