Pagination on query from on-disk/partitioned data (HDB)

kdb Insights

Pagination on query from on-disk/partitioned data (HDB)

Posted by sidpatel on February 14, 2024 at 12:00 am
Hello All,

Paginating through a table when querying in batches is quite a useful feature in SQL-like DBs. Understand that kdb being a column-based database, it is not the same expected behaviour as in SQL.

In kdb+, however, there are a couple of similar implementations:
- limit-offset like behaviour using syntax shown here: https://code.kx.com/q/ref/select/#limit-expression
- using a virtual column i, which refers to the row number, as shown here: https://code.kx.com/q4m3/9_Queries_q-sql/#9332-the-virtual-column-i-in-where
The above 2 options have their own limitations. The first one only works on in-memory tables, and in the second one, any filters used are applied after row numbers (i) are selected, i.e. the number of required rows are filtered first and then the other where conditions are applied.

My use case: I am trying to do pagination over a HDB table that is partitioned on-disk.

One of the alternatives, although not ideal, I’ve found is to query the data from HDB first and once in memory, use the limit-offset like qsql query to achieve the pagination result, but this means, I have to query the entire dataset for every page. Other than creating a middle layer to store the entire data I need to page over in memory and create other APIs to query from this, I am happy to adopt any other more simpler approach.

Has anyone run into this problem and has found any solution or using any alternatives that they can share? Maybe its just me, but I feel like this could be a very helpful feature.

Thanks & Regards,

Sid
sidpatel replied 1 year, 2 months ago 2 Members · 2 Replies
2 Replies

megan_mcp

Administrator

February 14, 2024 at 12:00 am

Hi ,

A suggestion could be to potentially go with a 2 step approach where you first find what the indices are (per partition) for the filter clause being applied – so not pulling the full result into memory. Then, based on all the indices returned, split these up into appropriate pages:

q)n:1000000 
q)tbl:([]time:n?.z.t;sym:n?`3;price:n?100f) 
q).Q.dpft[`:.;;`sym;`tbl]@' 2024.02.12 2024.02.13 / write out two partitions `tbl`tbl 
q)delete tbl from `. `. 
q)l . 
q)s:-100?sym / filter on 100 random symbols 
q) q)pages:10 
q)pageFilters:ungroup select idx:{ceiling[count[x]%y] cut x}[x;pages] by date from select date,i from tbl where sym in s / simple where clause used here on sym 
q) 
q).Q.cn tbl 
q).Q.pn tbl| 10000000 10000000 
q) 
q)pageTable:{.Q.ind[tbl;(sum .Q.pn[`tbl] where date=x[`date])+x[`idx]]} 
q)pageTable pageFilters[0] / page 0 
date sym time price 
------------------------------------ 
2024.02.12 bbb 03:28:48.269 62.03014 
2024.02.12 ghl 05:24:11.985 19.59907 
2024.02.12 iah 11:03:44.330 8.724017 
q)pageTable pageFilters[1] / page 1 etc 
date sym time price 
------------------------------------ 
2024.02.12 iog 07:02:54.903 77.50292 
2024.02.12 kec 08:49:43.130 41.2317 
2024.02.12 kgi 10:16:09.595 34.10485 
q) 
q)pageTable each pageFilters / page all filters

https://code.kx.com/q/ref/dotq/#qind-partitioned-index
https://code.kx.com/q/ref/dotq/#cn-count-partitioned-table
https://code.kx.com/q/ref/dotq/#pn-partition-counts

Let me know if this works!

Many thanks,

Megan

sidpatel

Member
February 19, 2024 at 12:00 am

Hi Megan,

Thanks for the prompt response. I will note this and try to integrate in our system.

Regards,

Sid

KX Community

Pagination on query from on-disk/partitioned data (HDB)

megan_mcp

sidpatel