KX Community

Find answers, ask questions, and connect with our KX Community around the world.
KX Community Guidelines

Home Forums kdb Insights Pagination on query from on-disk/partitioned data (HDB)

  • Pagination on query from on-disk/partitioned data (HDB)

    Posted by sidpatel on February 14, 2024 at 12:00 am

    Hello All,

    Paginating through a table when querying in batches is quite a useful feature in SQL-like DBs. Understand that kdb being a column-based database, it is not the same expected behaviour as in SQL.

    In kdb+, however, there are a couple of similar implementations:

    The above 2 options have their own limitations. The first one only works on in-memory tables, and in the second one, any filters used are applied after row numbers (i) are selected, i.e. the number of required rows are filtered first and then the other where conditions are applied.

    My use case: I am trying to do pagination over a HDB table that is partitioned on-disk.

    One of the alternatives, although not ideal, I’ve found is to query the data from HDB first and once in memory, use the limit-offset like qsql query to achieve the pagination result, but this means, I have to query the entire dataset for every page. Other than creating a middle layer to store the entire data I need to page over in memory and create other APIs to query from this, I am happy to adopt any other more simpler approach.

    Has anyone run into this problem and has found any solution or using any alternatives that they can share? Maybe its just me, but I feel like this could be a very helpful feature.

    Thanks & Regards,

    Sid

    sidpatel replied 10 months, 3 weeks ago 2 Members · 2 Replies
  • 2 Replies
  • megan_mcp

    Administrator
    February 14, 2024 at 12:00 am

    Hi ,

    A suggestion could be to potentially go with a 2 step approach where you first find what the indices are (per partition) for the filter clause being applied – so not pulling the full result into memory. Then, based on all the indices returned, split these up into appropriate pages:

    q)n:1000000 
    q)tbl:([]time:n?.z.t;sym:n?`3;price:n?100f) 
    q).Q.dpft[`:.;;`sym;`tbl]@' 2024.02.12 2024.02.13 / write out two partitions `tbl`tbl 
    q)delete tbl from `. `. 
    q)l . 
    q)s:-100?sym / filter on 100 random symbols 
    q) q)pages:10 
    q)pageFilters:ungroup select idx:{ceiling[count[x]%y] cut x}[x;pages] by date from select date,i from tbl where sym in s / simple where clause used here on sym 
    q) 
    q).Q.cn tbl 
    q).Q.pn tbl| 10000000 10000000 
    q) 
    q)pageTable:{.Q.ind[tbl;(sum .Q.pn[`tbl] where date=x[`date])+x[`idx]]} 
    q)pageTable pageFilters[0] / page 0 
    date sym time price 
    ------------------------------------ 
    2024.02.12 bbb 03:28:48.269 62.03014 
    2024.02.12 ghl 05:24:11.985 19.59907 
    2024.02.12 iah 11:03:44.330 8.724017 
    q)pageTable pageFilters[1] / page 1 etc 
    date sym time price 
    ------------------------------------ 
    2024.02.12 iog 07:02:54.903 77.50292 
    2024.02.12 kec 08:49:43.130 41.2317 
    2024.02.12 kgi 10:16:09.595 34.10485 
    q) 
    q)pageTable each pageFilters / page all filters

     

    https://code.kx.com/q/ref/dotq/#qind-partitioned-index
    https://code.kx.com/q/ref/dotq/#cn-count-partitioned-table
    https://code.kx.com/q/ref/dotq/#pn-partition-counts 

    Let me know if this works!

    Many thanks,

    Megan 

  • sidpatel

    Member
    February 19, 2024 at 12:00 am

    Hi Megan,

    Thanks for the prompt response. I will note this and try to integrate in our system.

    Regards,

    Sid

Log in to reply.