KX Community

  • How do column files in a partitioned table locate the sym?

    Posted by simon_watson_sj on January 11, 2022 at 12:00 am

    Hey all,

    I asked a question of a friend with strong KDB Fu. My question was ‘how do the column files in a partitioned table know where the sym is?’
    Ever the wise KDB Fu Master, he didn’t answer directly but suggested that I post the question on KX Community. However, full of ‘Frontiersman Spirit’, I decided to charge off into the wilderness and try to answer it myself. The following is the result of my investigations. I hope you find it as useful as I did. I’m keen to hear back from anybody whose understanding might provide deeper insight.

    Prerequisites

    First things to note, sym files work as here:

    https://code.kx.com/q4m3/14_Introduction_to_Kdb%2B/#14621-the-sym-file

    Partitioned tables in Q work as here:

    https://code.kx.com/q4m3/14_Introduction_to_Kdb%2B/#1431-partitions

    There is a lot to read in the above. For now, the key points are that sym files provide an efficient way of storing heavily duplicated data, and that partitioned tables have the nested layout described in the link.

    Second point – for this example to work, I opened a Q session and didn’t load the HDB. That means the sym file associated with my HDB wasn’t loaded.

    Third point – recall for I/O:

    • read0 will read text files
    • read1 will read binary files.
    • get will read Q data files. This is the one we use here.
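
    As a quick illustration of the three readers listed above, here is a minimal sketch. The scratch paths and variable names are made up for the example; any writable location should do.

    ```q
    / Scratch paths are hypothetical; any writable location works.
    txt:`:/tmp/ioSketch.txt
    dat:`:/tmp/ioSketch.dat

    txt 0: ("first line";"second line")   / save two lines of text
    dat set 1 2 3                         / save a q list in kdb+ format

    lines:read0 txt     / list of strings, one per line
    bytes:read1 txt     / byte vector holding the raw file content
    vals:get dat        / the q list that was saved
    ```

    read1 also works on the data file, but it returns the raw bytes of the kdb+ file format rather than the list, which is why get is the one to use on column files.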

    The experiment

    In a particular partition, the table is represented by a folder with the table name containing one file for each column and a .d file which lists the columns in the table.
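
    To see that layout on a toy example, the sketch below splays a small table with no symbol columns (so no enumeration is needed yet) and then lists the folder. The HDB root, partition date, table name and column names are all hypothetical.

    ```q
    / Hypothetical scratch HDB root, partition and table name.
    tdir:`:/tmp/hdbSketch/2022.01.11/quote/
    tdir set ([] bid:1 2 3f; ask:4 5 6f)   / the trailing slash splays the table to a folder

    files:key `:/tmp/hdbSketch/2022.01.11/quote       / folder listing: .d plus one file per column
    colOrder:get `:/tmp/hdbSketch/2022.01.11/quote/.d / column names in table order
    ```

    On a fresh directory, files should hold .d, ask and bid, and colOrder should hold bid and ask in the order the table was defined.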

    So using get we can have a look at the content of the files in that table folder.

    Collecting the content of .d in dContnt gives me the list of columns in the table, as I had expected.

    dContnt: get `:/pathToHDB/pathToTable/.d

    gives:

    `ric`tStamp`ref`cnt`vwap`av`mi`mx`sumVol`pctlPrice`pctlVol`vwapBidPrice`sumBidSize`vwapAskPrice`sumAskSize

    Also, each of the column files basically just contains a list (as expected). However, here we get to see the magic. Now I load my column containing enumerated syms, (here the column ric) using

    ricContnt: get `:/pathToHDB/pathToTable/ric

    I see:

    `p#`sym!2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3

    Here note that I’ve not loaded my HDB so my sym isn’t in memory. The returned result is just a standard enumerated list with a partition attribute applied (`p#). As with any other enumerated list, if the data that it is enumerated against isn’t already in memory, we see the underlying position in that list rather than the value at that position. If I load the sym file to memory that the list was enumerated against:

    load `:/pathToHDB/sym

    then get the same data from that ‘ric’ column:

    ricContnt: get `:/pathToHDB/pathToTable/ric

    This time I get this:

    `p#`sym$`DJI`DJI`DJI`DJI`DJI`DJI`DJI`DJI`DJI`DJI`DJI`FTSE`FTSE`FTSE`FTSE`FTSE`FTSE`FTSE`FTSE`FTSE`FTSE

    So now, with the data that the list was enumerated against in memory, we see the values at the position in the sym that the enumeration points at rather than just a numeric reference to the position.

    Looking at the content of my sym, using

    `sym[]

    I see:

    `ATOI`HSI`DJI`FTSE`.HSI`.HSLR`.LMQUI`.LMSER`.LMTEL`.LMTELE15`.N225`.NDX`.SPBLSPT`.STI`.STIA08`MAPL.SI`0001.HK`0001.HKC15`0002.HK`0003.HK`0004.HK`0005.HK`0006.HK`0008.HK`0010.HK`0011.HK`0012.HK`0013.HK

    plus a lot more syms. You can see here that positions 2 and 3 do correspond to the values we see returned (remember that indexing starts at 0, not 1). All seems well.
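
    The same correspondence can be reproduced entirely in memory. This is a minimal sketch that uses only the first four entries of the sym shown above as a toy domain; the variable e is made up for the example.

    ```q
    sym:`ATOI`HSI`DJI`FTSE        / toy domain: the first four entries of the sym shown above
    e:`sym!2 2 3 3                / enumeration built directly from positions
    value e                       / positions resolved against sym
    e~`sym$`DJI`DJI`FTSE`FTSE     / compare with enumerating the symbols themselves
    sym 2 3                       / plain zero-based indexing into the domain
    ```

    Here value e resolves the positions to DJI, DJI, FTSE, FTSE, which is the same lookup that happens to the ric column once sym is in memory.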

    Conclusion

    The enumerated columns in the partitioned tables are literally just enumerated lists. Any attributes of that list are applied at the front of the list exactly as they would be if you applied them in Analyst/Developer. If the HDB (and, most importantly, the sym file it contains) hasn’t been loaded into memory, the enumerated columns are returned as the naked pointers to the positions in the list. As soon as you load the sym file the column is enumerated against, the elements in that sym file are returned instead of the pointers.

    I can’t find the part of the table that tells it where the sym is located on disk because the table doesn’t need or have that information.

    Each enumerated column file contains the name of the object it is enumerated against, and for the enumerated items to be displayed, that object has to be available in memory when the table is queried.

    If you want to change the name of a partitioned table, you have to rename the folder containing the column files consistently across all partitions and then reload the HDB. Also, the sym file location is arbitrary, provided you have a process in place to load it into memory for use with the table. In theory, you could also call the sym file anything, provided the variable it is loaded into has the name referenced by the enumeration at the start of the column list.
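
    A small sketch of that last point, assuming a domain deliberately not called sym. The names mysym, c and c2 and the scratch path are hypothetical.

    ```q
    / mysym and the scratch path are hypothetical names for illustration.
    mysym:`DJI`FTSE
    c:`mysym$`FTSE`DJI`FTSE       / enumerate against a domain that is not called sym
    key c                         / name of the domain this list is enumerated against
    `:/tmp/enumSketch set c       / save the enumerated list on its own
    c2:get `:/tmp/enumSketch      / read it back
    key c2                        / the domain name comes back with the data
    ```

    key returns mysym in both cases, so the name is carried by the saved list itself, and the values resolve as long as a variable of that name is in memory.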

    Good practice is to locate the sym file in the root of the HDB and call it sym so it will be picked up and loaded without further thought.
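
    One common way to end up with that layout is the .Q.en utility, which enumerates a table's symbol columns against sym and keeps the sym file in the directory you pass it. The sketch below assumes a scratch HDB root, partition date and table name that are purely illustrative.

    ```q
    / The HDB root, partition date and table name are hypothetical.
    t:([] ric:`DJI`DJI`FTSE; vwap:1 2 3f)
    `:/tmp/hdbSketch/2022.01.11/trade/ set .Q.en[`:/tmp/hdbSketch; t]   / enumerate, then splay
    rootFiles:key `:/tmp/hdbSketch      / root listing, including the sym file
    domain:get `:/tmp/hdbSketch/sym     / the symbols the ric column points into
    ```

    Loading the root with \l /tmp/hdbSketch should then pick up sym and the table together, which is the "without further thought" behaviour described above.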

     

  • leahs

    Member
    November 1, 2022 at 12:00 am

    Super content here @simon_watson_sj

    Thanks for sharing with our fellow community members!

    Happy Coding,

    Leah
