KX Community

Find answers, ask questions, and connect with our KX Community around the world.
KX Community Guidelines

Home Forums kdb+ xbar & timestamp manipulation

  • xbar & timestamp manipulation

    Posted by jp on November 16, 2022 at 12:00 am

    Hi,

    I have a minute <DT(timestamp),O,H,L,C> table <tt>, from which I need to aggregate/process data for different intervals. In my case, as I need the equivalent of a <binr> (eg. hourly data from 10:00 to 11:00 must show on the output DT line yyyy.mm.ddD11:00:00), I use an offset value in the <select> statement.

    No problem for any interval <= daily and weekly, however I don’t understand what is going on for a ‘non-standard’ 2-day interval. Below is some basic data around 2013 new-year (01/01/13 is a Tuesday).

    /Daily interval... OK 
    10#select first O,last C by DT:(1 xbar DT.date)+0D16:00 from tt 
    DT                           | O       C 
    -----------------------------| --------------- 
    2012.12.31D16:00:00.000000000| 1405.22 1425.69 
    2013.01.02D16:00:00.000000000| 1426.19 1461.36 
    2013.01.03D16:00:00.000000000| 1462.42 1459.07 
    2013.01.04D16:00:00.000000000| 1459.37 1466.1 
    2013.01.07D16:00:00.000000000| 1466.47 1461.77 
    2013.01.08D16:00:00.000000000| 1461.89 1457.05 
    2013.01.09D16:00:00.000000000| 1457.15 1461.04 
    2013.01.10D16:00:00.000000000| 1461.02 1471.99 
    2013.01.11D16:00:00.000000000| 1472.12 1471.71 
    2013.01.14D16:00:00.000000000| 1472.05 1470.79 
    
    /2-Day interval... odd output 
    5#select first O, last C by DT:(2 xbar DT.date)+1D16:00 from tt 
    DT                           | O       C 
    -----------------------------| --------------- 
    2012.12.31D16:00:00.000000000| 1405.22 1425.69 /ok 
    2013.01.02D16:00:00.000000000| 1426.19 1459.07 /close?... should be 1461.36 (from 01.02) 
    2013.01.04D16:00:00.000000000| 1459.37 1466.1 /open? ... should be 1462.42 (from 01.03) 
    2013.01.06D16:00:00.000000000| 1466.47 1461.77 /Sunday data? taken from following Monday? 
    2013.01.08D16:00:00.000000000| 1461.89 1461.04 /open? ... should be 1466.37 (from 01.07)

    Questions:
    – Isn’t <xbar> expected to use ONLY the available DT.dates to build its bins, like it does for minute/hourly/daily/weekly intervals?

    – How should the <select> be built to yield, for either one of the following cases:

    a) if using 2-day buckets of calendar days, including when market is closed (no data): 
    DT                           | O       C 
    -----------------------------| --------------- 
    2012.12.31D16:00:00.000000000| 1405.22 1425.69 /from 12.31 to 01.01 (nothing on 01.01, so shows on 12.31) 
    2013.01.03D16:00:00.000000000| 1426.19 1459.07 /from 01.02 to 01.03 
    2013.01.04D16:00:00.000000000| 1459.37 1466.1 /from 01.04 to 01.05 (nothing on 01.05, so shows on 01.04) 
    2013.01.07D16:00:00.000000000| 1466.47 1461.77 /from 01.06 to 01.07 (nothing on 01.06) 
    2013.01.09D16:00:00.000000000| 1461.89 1461.04 /from 01.08 to 01.09 NOTE: In the above example, 2012.12.31 is 'day 0'; if 2012.12.31 is 'day 1', obviously different results, but same concept would apply 
    
    b) OR... what I would prefer, if using 2-day buckets of available data: 
    DT                           | O       C 
    -----------------------------| --------------- 
    2013.01.02D16:00:00.000000000| 1405.22 1461.36 /from 12.31 to 01.02 (nothing on 01.01) 
    2013.01.04D16:00:00.000000000| 1462.42 1466.1 /from 01.03 to 01.04 
    2013.01.08D16:00:00.000000000| 1466.47 1457.05 /from 01.07 to 01.08 (01.05 and 06 is a week-end) 
    2013.01.10D16:00:00.000000000| 1457.15 1471.99 /from 01.09 to 01.10 
    2013.01.14D16:00:00.000000000| 1472.12 1470.79 /from 01.11 to 01.14 (01.12 and 13 is a week-end)

    Unless there is a relatively standard way to present data for ‘n-day’ intervals that I am unaware of, either one would be fine with me, as long as I understand how it works(!)

    Thx
    JP

     

    jp replied 8 months, 2 weeks ago 2 Members · 2 Replies
  • 2 Replies
  • cillianreilly

    Member
    November 16, 2022 at 12:00 am

    I’m not sure what your complete data set looks like – but using a best guess, I get an expected result.

    q)tt:([]DT:`timestamp$2012.12.31 2013.01.02 2013.01.03 2013.01.04 2013.01.07 2013.01.08 2013.01.09 2013.01.10 2013.01.11 2013.01.14;O:1405.22 1426.19 1462.42 1459.37 1466.47 1461.89 1457.15 1461.02 1472.12 1472.05;C:1425.69 1461.36 1459.07 1466.1 1461.77 1457.05 1461.04 1471.99 1471.71 1470.79) 
    q)select first O,last C by DT:(2 xbar DT.date)+1D16:00 from tt 
    DT                           | O       C 
    -----------------------------| --------------- 
    2013.01.01D16:00:00.000000000| 1405.22 1425.69 // open 31st, close 31st 
    2013.01.03D16:00:00.000000000| 1426.19 1459.07 // open 2nd, close 3rd 
    2013.01.05D16:00:00.000000000| 1459.37 1466.1 // open 4th, close 4th 
    2013.01.07D16:00:00.000000000| 1466.47 1461.77 // open 7th, close 7th 
    2013.01.09D16:00:00.000000000| 1461.89 1461.04 // open 8th, close 9th 
    2013.01.11D16:00:00.000000000| 1461.02 1471.71 // open 10th, close 11th 
    2013.01.15D16:00:00.000000000| 1472.05 1470.79 // open 14th, close 14th

    I also don’t see how you can have 2012.12.31D16:00:00 in both results (1 day/2 day buckets), as it shouldn’t appear in the 2-day bucket query (unless you have some other row dated 2012.12.30 in your table you’re not showing us?)

    q)(2 xbar 2012.12.31)+0D16:00 1D16:00 
    2012.12.31D16:00:00.000000000 2013.01.01D16:00:00.000000000

    This looks like some data quality issue somewhere rather than issue with xbar – if you can provide a small stand alone sample of your data that replicates the problem, I can take a look using that.

  • jp

    Member
    November 16, 2022 at 12:00 am

    Feeling embarrassed in here… This is entirely my mistake; no data missing, but faulty statement transcription in the forum’s editor, while playing with the offset. The 2-day interval statement select does indeed yield the correct output.

    Thank you for your time.

Log in to reply.