Partitioning Tables Intraday by Custom Fields?

kdb+

Partitioning Tables Intraday by Custom Fields?

Posted by leahs on November 4, 2021 at 12:00 am

Hello all,

Sharing a query from a client and would appreciate some feedback from the community:

Is it possible to partition data based on a time/date field other than the date in the TP log file?

Is this a feasible request and a feature that can be easily incorporated?

Looking forward to your input,

Leah

leahs replied 2 months ago 3 Members · 6 Replies
6 Replies

davidcrossey

Member
November 5, 2021 at 12:00 am
Hi Leah,

A kdb logfile typically has the form of (func;tabName;tabData), which can be as simple as f:{upsert[x;y]}. There isn’t anything special in terms of replaying the data into the kdb process, but more so to do with saving down the data.

Assuming you mean to save to another date, you could simply pull out a date from the data (after replay) and use it as a parameter for your save down logic.

If you mean at a higher level on partitioning the data differently across the entire database, data can be partitioned by any column of types date, month, year and int (1, 2), so depending on the requirements you could map values to a specific int based partition schema (3).

Hope this helps!

Kind regards,

David

References:
rocuinneagain

Member
November 5, 2021 at 12:00 am

Hi Leah,

This blog post will likely be of interest: https://kx.com/blog/partitioning-data-in-kdb/

It covers the basics of looking at hourly partitions and also fixed size partitions.

Regards,

Rian
rocuinneagain

Member
November 10, 2021 at 12:00 am

The Platform codebase is designed to write date partitioned HDB’s only.

The Intraday database – KX Platform exists to so that 24hrs of data does not need to be kept in memory, but it will again store to the HDB on a date partition only.
leahs

Member
November 10, 2021 at 12:00 am

Hi all,

The client has since provided feedback and has stated that their question was specifically aimed at DS_LOG_RECOVERY contained within KX Control.

They have already implemented their own version of intraday partitioning. However, they are wondering if DS_LOG_RECOVERY supports this?

In the instance parameters, they observed the following description under “date”:

“Date to save to in HDB, if 1970.01.01 then it takes the date from the TP logfiles”

Does this suggest that the DS_LOG_RECOVERY process supports or does not support intraday partitioning?

Any feedback here would be great.

Thanks,

Leah
leahs

Member
November 11, 2021 at 12:00 am

Hi all,

A follow up question here: is it possible to disable querying for a SINGLE table in the query router (disable instances with GW and QR)?

DS_LOG_REPLAY template uses these function to stop queries during writedown, but is it possible to do this on table by table basis to reduce downtime.

disableServers[];

// disable local services with QR

.ds.serv.disableLocal[.ds.serv.rdbClasses, .ds.serv.ihdbClasses, .ds.serv.hdbClasses];

Thanks,

Leah

rocuinneagain

Member

November 12, 2021 at 12:00 am

That would require for deep level changes to the codebase.

For example when a HDB is written to a new:

A new date folder is created.
Tables are written.
The HDB process is reloaded

If you reloaded the HDB before all tables were written you would have errors.

q)mkdir HDB 
q)cd HDB 
q)`:2021.01.01/tab1/ set ([] a:1 2 3) 
`:2021.01.01/tab1/ 
q)`:2021.01.01/tab2/ set ([] b:1 2 3) 
`:2021.01.01/tab2/ 
q)\l . 
q)select from tab1 
date       a 
------------ 
2021.01.01 1 
2021.01.01 2 
2021.01.01 3 

q)select from tab2 
date       b 
------------ 
2021.01.01 1 
2021.01.01 2 
2021.01.01 3 

q)`:2021.01.02/tab1/ set ([] a:4 5 6) 
`:2021.01.02/tab1/ 
q)\l . 
q)select from tab1 
date       a 
------------ 
2021.01.01 1 
2021.01.01 2 
2021.01.01 3 
2021.01.02 4 
2021.01.02 5 
2021.01.02 6 

q)select from tab2 
'./2021.01.02/tab2/b. 
OS reports: No such file or directory 
[0] select from tab2 ^

.Q.bv – would be one possible helpful extension which can in memory fill tables missing from partitions (using ` as argument to fill using first partition as a template)

q)\l HDB 
q)select from tab1 
date       a 
------------ 
2021.01.01 1 
2021.01.01 2 
2021.01.01 3 
2021.01.02 4 
2021.01.02 5 
2021.01.02 6 

q)select from tab2 //Table missing from latest partition is not found 
'tab2 
[0] select from tab2 ^ 
q).Q.bv` //Using ` first partition used as prototype and table is found 
q)select from tab2 
date       b 
------------ 
2021.01.01 1 
2021.01.01 2 
2021.01.01 3

(.Q.chk is unsuitable as it uses most recent partition as template to fills tables missing from partitions)

This is only one item that would be needed to implement what you asked.

Currently EOD is a single action for all tables, all code and processes involved would need updates to operate table by table. Meaning changes in RDB/IDB/HDB and others.

KX Community

Partitioning Tables Intraday by Custom Fields?

davidcrossey

rocuinneagain

rocuinneagain

leahs

leahs

rocuinneagain