How do column files in a partitioned table locate the sym?
Hey all,
I asked a question of a friend with strong KDB Fu. My question was ‘how do the column files in a partitioned table know where the sym is?’
Ever the wise KDB Fu Master, rather than answering, his suggestion was that I post the question on KX Community. However, full of ‘Frontiersman Spirit’ I decided to charge off into the wilderness to try answering that question myself. The following is the result of my investigations. I hope you find it as useful as I did. I’m keen to hear back if anybody has understanding that might provide deeper insight.
Prerequisites
First point – sym files work as described here:
https://code.kx.com/q4m3/14_Introduction_to_Kdb%2B/#14621-the-sym-file
Partitioned tables in Q work as described here:
https://code.kx.com/q4m3/14_Introduction_to_Kdb%2B/#1431-partitions
There is a lot to read in the above – for now the key points are that sym files provide a way of efficiently storing much duplicated data and partitioned tables have a nested layout as given in the link. Second point – for this example to work, I opened a Q session and didn’t load the HDB. That means the sym file associated with my HDB wasn’t loaded.
Third point – recall for I/O:
- read0 will read text files
- read1 will read binary files.
- get will read Q data files. This is the one we use here.
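As a rough illustration of the three readers above, here is a minimal sketch. The /tmp paths are hypothetical scratch files of my own choosing, not part of the HDB in this post.

```q
/ hypothetical scratch files; any writable location would do
`:/tmp/ioDemo set 1 2 3                               / write a q data file
get `:/tmp/ioDemo                                     / reads it back as a q object
read1 `:/tmp/ioDemo                                   / the same file as raw bytes
`:/tmp/ioDemo.txt 0: ("first line";"second line")     / write a plain text file
read0 `:/tmp/ioDemo.txt                               / list of strings, one per line
```

The point is that get deserialises the file into a q object, which is why it is the right tool for inspecting column files.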
The experiment
In a particular partition, the table is represented by a folder with the table name containing one file for each column and a .d file which lists the columns in the table.
So using get we can have a look at the content of the files in that table folder.
I find that collecting the content of .d in dContnt gives me that list of columns in the table I had expected.
dContnt: get `:/pathToHDB/pathToTable/.d
gives:
`ric`tStamp`ref`cnt`vwap`av`mi`mx`sumVol`pctlPrice`pctlVol`vwapBidPrice`sumBidSize`vwapAskPrice`sumAskSize
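If you want to reproduce this layout without touching a real HDB, something like the following sketch should do. The root /tmp/demoHDB, the partition 2024.01.01 and the table name trade are all made up for illustration; .Q.en is the standard utility that enumerates symbol columns against the sym file in the given root.

```q
/ hypothetical scratch HDB root, partition and table name
t:([] ric:`DJI`DJI`FTSE`FTSE; cnt:10 20 30 40)
/ enumerate symbol columns against /tmp/demoHDB/sym, then splay into the partition
`:/tmp/demoHDB/2024.01.01/trade/ set .Q.en[`:/tmp/demoHDB;t]
key `:/tmp/demoHDB/2024.01.01/trade      / files in the table folder, including .d
get `:/tmp/demoHDB/2024.01.01/trade/.d   / column order of the table
get `:/tmp/demoHDB/sym                   / the sym list written by .Q.en
```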
Also, each of the column files basically just contains a list (as expected). However, here we get to see the magic. Now I load my column containing enumerated syms (here the column ric) using
ricContnt: get `:/pathToHDB/pathToTable/ric
I see:
`p#`sym!2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3
Here note that I’ve not loaded my HDB so my sym isn’t in memory. The returned result is just a standard enumerated list with a partition attribute applied (`p#). As with any other enumerated list, if the data that it is enumerated against isn’t already in memory, we see the underlying position in that list rather than the value at that position. If I load the sym file to memory that the list was enumerated against:
load `:/pathToHDB/sym
then get the same data from that ‘ric’ column:
ricContnt: get `:/pathToHDB/pathToTable/ric
This time I get this:
`p#`sym!`DJI`DJI`DJI`DJI`DJI`DJI`DJI`DJI`DJI`DJI`DJI`FTSE`FTSE`FTSE`FTSE`FTSE`FTSE`FTSE`FTSE`FTSE`FTSE
So now, with the data that the list was enumerated against in memory, we see the values at the position in the sym that the enumeration points at rather than just a numeric reference to the position.
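The same behaviour can be seen entirely in memory, with no files involved. This is a small sketch with a toy sym list I made up to match the first few entries of mine. It assumes a reasonably recent kdb+ version, where the indices of an enumeration can be cast to long.

```q
sym:`ATOI`HSI`DJI`FTSE          / toy stand-in for the HDB sym list
e:`sym$`DJI`DJI`FTSE`FTSE       / enumerate the values against the variable named sym
`long$e                         / the underlying positions in sym
value e                         / de-enumerate back to a plain symbol list
`sym!2 2 3 3                    / build an enumeration directly from positions
```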
Looking at the content of my sym, using
`sym[]
I see:
`ATOI`HSI`DJI`FTSE`.HSI`.HSLR`.LMQUI`.LMSER`.LMTEL`.LMTELE15`.N225`.NDX`.SPBLSPT`.STI`.STIA08`MAPL.SI`0001.HK`0001.HKC15`0002.HK`0003.HK`0004.HK`0005.HK`0006.HK`0008.HK`0010.HK`0011.HK`0012.HK`0013.HK
plus a lot more syms. You can see here that positions 2 and 3 do correspond to the values we see returned (remember that Q indexes from 0, not 1). All seems well.
Conclusion
The enumerated columns in the partitioned tables are literally just enumerated lists. Any attributes of that list are applied at the front of the list exactly as they would be if you applied them in Analyst/Developer. If the HDB (and most importantly the sym file it contains) isn’t already loaded to memory, the enumerated columns are returned as bare indices into the list they were enumerated against. As soon as you load the sym file the column is enumerated against, the elements of that sym list are returned instead of the indices.
I can’t find the part of the table that tells it where the sym is located on disk because the table doesn’t need or have that information.
Each enumerated column file contains the name of the object it is enumerated against (here sym), and for the enumerated items to be displayed, an object with that name has to be available in memory when the table is queried.
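One way to check which name a column is enumerated against is key, which returns the name of the domain when applied to an enumerated list. The sketch below uses a toy in-memory column of my own rather than the file on disk. I would expect the same call on the result of get against the ric file to behave the same way, but treat that as an assumption.

```q
sym:`ATOI`HSI`DJI`FTSE              / toy stand-in for the HDB sym list
ric:`p#`sym$`DJI`DJI`FTSE`FTSE      / toy column, same shape as the one read from disk
key ric                             / name of the list it is enumerated against
attr ric                            / the attribute carried by the list
```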
If you want to change the name of a partitioned table, you have to be sure to consistently rename the folder containing the column files across all partitions and reload it. Also, the sym file location is arbitrary provided you have a process in place to load it to memory to use with the table. In theory, you could also call the sym file anything, provided that name matches the one referenced by the enumeration at the start of the column list and a list of that name is in memory.
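To illustrate the naming point, here is a sketch of enumerating against a list that isn’t called sym. The name mysym is purely hypothetical, and I haven’t tried running a full HDB this way.

```q
mysym:`symbol$()              / hypothetical domain name, starts empty
e:`mysym?`DJI`FTSE`DJI        / enum extend: appends unseen values to mysym
key e                         / the enumeration records the name mysym
mysym                         / the domain now holds the distinct values
```

Any process reading a column enumerated this way would need a list called mysym in memory, just as ours needed sym.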
Good practice is to locate the sym file in the root of the HDB and call it sym so it will be picked up and loaded without further thought.