KX Community



  • Lists, dictionaries, tables and lists of dictionaries

    Posted by simon_watson_sj on September 14, 2021 at 12:00 am

    Team,

    I’ve got an interest in data processing for higher dimensional objects so have been putting in a bit of thought about how best to represent sometimes quite nested data structures in KDB/Q.

    Using the Apply function has become part of my routine but as I’ve used it, I’ve come to realize that maybe the way it works at present could be a limited case of a more general model. I think these are the nubs of my thought bubble:

    1. lists and dictionaries are distinct and separate objects in KDB. However, even though they might be implemented as separate objects, wouldn’t it make more sense to consider a list as a special case of a dictionary where the keys are numbers?
    2. we can use ‘flip’ to move between a dictionary of lists and a table but actually, shouldn’t a table just be a special case of a dictionary of lists where all the lists are the same length? In that case, if we think back to what a function actually offers us, should we consider ‘flip’ as a function that primarily allows us to move from using dictionary type syntax to table type syntax even though the semantic representations are fundamentally of the same structure?

    The reason this is an issue for me is that in my nested structures I can have dictionaries, lists or even tables at various depths. The Apply function mostly works well, but I’ve found situations where it fails, for instance where one of the layers branches off to a list of strings.

    What I’m thinking is, if the underlying object is equivalent, shouldn’t we regard the distinction between lists of dictionaries and tables as just a matter of the approach you choose to manipulate the object rather than a property of the object itself? In that case, wouldn’t a function such as Apply be better if it was agnostic about such things? Basically, you provide a set of keys and it should just operate on those keys, indifferent to whether it is traversing between dictionaries, lists or tables?

    I’ve had a crack at a generic ‘Apply’ that kind of does this by incrementally traversing a set of keys using ‘over’ and flipping the structure at that level when needed. I’d be happy to share my efforts (I can’t currently start Q since I rebuilt my computer on the weekend and for some reason, I’m not getting a new license when I submit my email for the Q install).

    However, before I disappear too far down this rabbit hole – are there arguments why it might make more sense to keep this separation between lists and dictionaries or tables and dictionaries of lists?

  • 4 Replies
  • rocuinneagain

    Member
    September 14, 2021 at 12:00 am

    1. I’d suggest thinking about it the other way around: dictionaries are more like a special pair of lists.

    q)dict:`a`b!1 2
    q)lists:{(key x;value x)} dict
    q)dict
    a| 1
    b| 2
    q)lists
    a b
    1 2
    q)dict `a
    1
    q)lists[0]?`a
    0
    q)lists[1] lists[0]?`a / The same as: dict `a
    1

    https://code.kx.com/q/ref/find/

     

    2. Yes, a list of conforming dictionaries is promoted to a table.

    q)(`a`b!1 2;`a`b!1 2)
    a b
    ---
    1 2
    1 2

    Importantly in memory the way it is actually stored is ‘flipped’ so it is a dictionary of lists. (no longer a list of dictionaries)

    q).Q.s1 (`a`b!1 2;`a`b!1 2)
    "+`a`b!(1 1;2 2)"

     

    This way the keys/column-names only need to be stored once for the whole table and not for each row.

    The columns then are vectors which is more efficient and performant.
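    A minimal sketch of that flip relationship, using the same two-row table as above. The names d and t are illustrative only and do not come from the thread.

```q
/ a dictionary of equal-length lists (a column dictionary)
d:`a`b!(1 1;2 2)
/ flipping it gives a table (type 98h)
t:flip d
/ the table matches the list of conforming dictionaries
t~(`a`b!1 2;`a`b!1 2)
/ index by column name to get a vector
t[`a]
/ index by row number to get a dictionary
t[0]
/ flipping the table recovers the column dictionary
d~flip t
```

    The two match expressions should both return 1b, which is one way to see that the table and the flipped dictionary of lists hold the same data.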

     

    There are more details on indexing at depth here :

    https://code.kx.com/q4m3/3_Lists/#38-iterated-indexing-and-indexing-at-depth

     

    The “querying unstructured data” section of this blog may be of interest:
    https://kx.com/blog/kdb-q-insights-parsing-json-files/

    The code in it focuses on tables but can be adapted to lists/dictionaries as well:

    q)asLists:sample cols sample
    q)asLists[0;;`expiry]
    17682D19:58:45.000000000
    `
    `long$()
    ,""
    `long$()
    0N
    ,""

    q)@[`asLists;0;{(enlist[`]!enlist (::))(,)/:x}]
    `asLists
    q)asLists[0;;`expiry]
    17682D19:58:45.000000000
    ::
    ::
    ::
    ::
    ::
    ::
    q)fill:{n:count i:where (::)~/:y;@[y;i;:;n#x]}
    q)fill[0Wn]asLists[0;;`expiry]
    17682D19:58:45.000000000 0W 0W 0W 0W 0W 0W
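    Since the session above depends on the sample table from the blog, here is a self-contained sketch of the same fill idea on a small made-up list. The names raw and filled are hypothetical.

```q
/ fill as defined above: replace generic nulls (::) in list y with value x
fill:{n:count i:where (::)~/:y;@[y;i;:;n#x]}
/ hypothetical list with two missing entries
raw:(1;::;3;::)
/ positions that hold the generic null
where (::)~/:raw
/ fill those positions with the long null
filled:fill[0N;raw]
/ no generic nulls should remain after filling
where (::)~/:filled
```

    Filling with a typed null rather than leaving :: in place means the items all share one type afterwards.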

    
    

     

     

  • sstantoncook

    Member
    September 15, 2021 at 12:00 am

    Hi Simon,

     

    I agree with Rian in terms of the generalisation of k data types, i.e.

    • Atom is a scalar representation of a data type.
    • List is a vector representation of an Atom.
    • Dictionary is a keyed set of lists. The key can be a List of any type. The values can be a list of any type, a list of lists, or a list of Dictionaries.
    • Table is a List of commonly keyed Dictionaries. You can see this easily when you put two dictionaries in a list, or enlist one of them.

    Your point about apply (@;.) – in both cases, dictionaries and lists, it works by indexing.

    Dictionaries require the key value to index and apply the function.

    Lists require the position (an integer index) to index and apply the function.
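    A short sketch of that point, with made-up values (l, d and k are illustrative names): the call has the same shape for a list and a dictionary, and only the index domain differs.

```q
l:10 20 30
d:`x`y`z!10 20 30
/ indexing: a position for the list, a key for the dictionary
l[1]
d[`y]
/ Apply At with a function: same form, different index domain
@[l;1;+;5]
@[d;`y;+;5]
/ a dictionary keyed by til count indexes like the list itself
k:(til count l)!l
l[0 2]~k[0 2]
```

    The last line should return 1b, which is the sense in which a list can be read as a dictionary keyed by position, even though the two remain distinct types.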

  • simon_watson_sj

    Member
    December 28, 2021 at 12:00 am

    Hey Rian/Sam,

    I finally got around to investigating this more fully.

    The issue I have comes down to the below example of a nested data structure called dsEg here.

    dsEg: (`doctype`html)!(enlist "html";`text`body!(enlist"test";enlist ([]a: `d`f`g;b: 23 43 777)));

    My problem is that I don’t know how to use apply (@;.) to get to the columns on that nested table ([]a: `d`f`g;b: 23 43 777).

    I feel like it should be

    cols .[dsEg;(`html;`body;0)]

    since the table is enlisted which means it’s in a single element list.

    However, I can’t get that, or any other approach, to work using Apply alone, so I can’t use Apply to traverse a general data structure in cases like the above, where the structure descends into a nested table. The best I can do is get to the layer above and apply raze. I know that sounds like a small thing, but the problem comes when the nesting continues down into the table: there is no way to use Apply with a list of keys to get past that application of raze at the table level.

    Building a function which will allow that to work was ultimately the motivation for this whole tangent. I think I now appreciate that Apply is intended to be fully generic so I’m basically reinventing the wheel. However, to help me abandon my method, can you advise how Apply might be used with a list of keys to get to the column names or any other nested elements within that table?

    Regards,

    Simon

  • rocuinneagain

    Member
    January 6, 2022 at 12:00 am

    Because the example contains a nested generic list, its items need to be dealt with one at a time, as the list could hold many different tables or even different datatypes.

    q)cols each .[dsEg;(`html;`body)] 
    a b 
    q).[dsEg;(`html;`body);{cols each x}] 
    doctype| ,"html" 
    html   | `text`body!(,"test";,`a`b)

    The use of :: may be useful to you if you have not been using it:

    https://code.kx.com/q/ref/apply/#nulls-in-i

    It allows you to skip levels:

    q).[dsEg;(`html;`body;::;`a)]
    d f g
    //Better shown on an item with multiple entries in the list
    q)dsEg2:(`doctype`html)!(enlist "html";`text`body!(enlist"test";2#enlist ([]a: `d`f`g;b: 23 43 777)));
    q).[dsEg2;(`html;`body;::;`a)]
    d f g d f g

     

    .Q.s1 may also be useful to you as it can help show the underlying structure of an item better than the console at times.

    https://code.kx.com/q/ref/dotq/#qs1-string-representation

    q).[dsEg;(`html;`body;::;`a)]
    d f g
    //Looks like a symbol list type 11h but is in fact a single item generic list type 0h
    q){-1 .Q.s1 x;} .[dsEg;(`html;`body;::;`a)]
    ,`d`f`g
    //.Q.s1 output can be ugly but always shows exact structure
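    As a sketch under one assumption, namely that dsEg is entered with plain straight quotes, an explicit position in the index path should also reach the single table, because Index At Depth accepts a mixed list of keys and positions. The name t is illustrative.

```q
dsEg:(`doctype`html)!(enlist "html";`text`body!(enlist "test";enlist ([]a:`d`f`g;b:23 43 777)))
/ a path ending in 0 picks the only item of the enlisted list: the table itself
t:.[dsEg;(`html;`body;0)]
cols t
/ one level further reaches a column as a plain symbol vector (type 11h)
.[dsEg;(`html;`body;0;`a)]
/ with :: the list level is kept, so the result is a one-item general list (type 0h)
type .[dsEg;(`html;`body;::;`a)]
```

    The explicit 0 only suits the case where a single item is wanted. Where the list may hold several tables, :: or each remains the more general choice.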

     

     
