Forum Replies Created

  • rocuinneagain

    Member
    March 22, 2023 at 12:00 am in reply to: KDB+ and hyperthreading

    Hyperthreading does not cause problems for kdb+ but you should be aware of your core allocations to get the most out of your kdb+ licence.

    Example:

    • I have a 4 core 8 thread machine.
    • I have a 4 core kdb+ licence.
    • How best to allocate my licence?

    Your machine will list cores 0,1,2,3,4,5,6,7.

    For best performance you want your 4 core licence to use only one logical CPU per physical core.

    Inspecting /proc/cpuinfo gives you the information you need.

     

    cpuinfo:.Q.id {{{(`$x[0])!x[1]}flip {ssr[;"\t";""] each trim ":" vs x}each x y}[x] each {{x[0]+til 1+x[1]-x[0]}each flip (0^1+prev x;-1+x)}where x~\:""}system"cat /proc/cpuinfo"
    select processor,physicalid,siblings,coreid,cpucores from cpuinfo
    processor physicalid siblings coreid cpucores
    ---------------------------------------------
    ,"0"      ,"0"       ,"8"     ,"0"   ,"4"
    ,"1"      ,"0"       ,"8"     ,"0"   ,"4"
    ,"2"      ,"0"       ,"8"     ,"1"   ,"4"
    ,"3"      ,"0"       ,"8"     ,"1"   ,"4"
    ,"4"      ,"0"       ,"8"     ,"2"   ,"4"
    ,"5"      ,"0"       ,"8"     ,"2"   ,"4"
    ,"6"      ,"0"       ,"8"     ,"3"   ,"4"
    ,"7"      ,"0"       ,"8"     ,"3"   ,"4"

     

     

    Here the output suggests I should use taskset with 0,2,4,6 or 1,3,5,7 so that each coreid is used only once.

    This way you get the best usage from your kdb+ licence, as each licensed core is taxing a full physical core without overlap.

    It’s possible on another OS/machine the layout may suggest 0-3 or 4-7.
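
    Once a process is running you can also sanity-check its affinity from inside q by shelling out to taskset (a sketch assuming Linux with the util-linux taskset on the PATH; .z.i is the current process PID):

    q)system"taskset -cp ",string .z.i   / prints something like "pid 1234's current affinity list: 0,2,4,6"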

     

    In this example, if only kdb+ is running on the machine, you may see a fractional performance benefit from disabling hyperthreading. It can depend on various hardware and workload variables.

     

    Lots of discussion available online on the topic:

    https://unix.stackexchange.com/questions/57920/how-do-i-know-which-processors-are-physical-cores

     

    Note: some extra checks and verification should be done if using virtual machines, as they may not pass all of the detailed CPU information through to the VM; you may need to verify at the hypervisor level.

     

    If you have an unlimited-cores licence, the answer is a little more nuanced and use-case specific in how you may choose to taskset your processes. You may also choose not to pin individual processes to cores at all and instead let the OS move tasks around for best efficiency.

    For machines that heavily tax IO, hyperthreading can help, specifically with reading data from disk.

     

    Once you move on to multi-CPU systems, NUMA also comes into play and should be paid attention to:

    https://code.kx.com/q/kb/linux-production/#non-uniform-memory-access-numa-hardware

     

     

     

  • rocuinneagain

    Member
    March 16, 2023 at 12:00 am in reply to: Type error from Q chk

    There’s a thread with some info on stackoverflow:

    https://stackoverflow.com/questions/67272148/q-chk-gives-a-type-error-when-ran-on-a-hdb-how-can-i-debug-or-what-should-i

     

    Running strace -p pid may help show where .Q.chk is falling down.

     

    This is a basic outline of a helper function I created before to debug a part error when trying to load a HDB.

    It may be a good starting point for finding the root cause issue for you too.

    // https://code.kx.com/q/ref/system/#capture-stderr-output
    q)tmp:first system"mktemp"
    q){d:1_string x;{y:string y;(y;"D"$y),{r:system x;show last r;$[1~"J"$last r;(1b;-1_r;"");(0b;();-1_r)]} "ls ",x,"/",y," > ",tmp," 2>&1;echo $? >> ",tmp,";cat ",tmp}[d] each key x} `:badHDB
    q)tab:flip `part`date`osError`files`error!flip {d:1_string x;{y:string y;(y;"D"$y),{r:system x;$[0~"J"$last r;(0b;-1_r;"");(1b;();first r)]} "ls ",x,"/",y," > ",tmp," 2>&1;echo $? >> ",tmp,";cat ",tmp}[d] each key x} `:badHDB

    The resulting table:

    part         date       osError files   error
    ----------------------------------------------------------------------------------------------------------
    "2001.01.01" 2001.01.01 0       ,"tab1" ""
    "2001.01.01"            0       ,"tab2" ""
    "2002.01.01" 2002.01.01 1       ()      "ls: cannot open directory 'badHDB/2002.01.01': Permission denied"

    For a larger HDB filter down to partitions with issues:

    select from tab where or[null date;osError]

     

  • rocuinneagain

    Member
    March 15, 2023 at 12:00 am in reply to: ODBC setup for KDB+ to query other databases

    ODBC:

    JDBC:

     

    EmbedPy is also an option which has a lot of flexibility.

    Some examples of Pandas/PyODBC/SQLAlchemy usage are below:

     

    // https://github.com/KxSystems/embedPy
    system"\l p.q";
    // https://github.com/KxSystems/ml
    system"\l ml/ml.q";
    .ml.loadfile`:init.q;
    odbc:.p.import[`pyodbc];
    pd:.p.import[`pandas];
    // Build the ODBC connection string and connect
    connectString:";" sv {string[x],"=",y}(.)/:( (`Driver;"{ODBC Driver 17 for SQL Server}"); (`Server;"server.domain.com\DB01"); (`Database;"Data"); (`UID;"KX"); (`PWD;"password") );
    connSqlServer:odbc[`:connect][connectString];
    // Read a table into q via pandas
    data:.ml.df2tab pd[`:read_sql]["SELECT * FROM tableName";connSqlServer];
    // Execute a statement through a cursor
    cursor:connSqlServer[`:cursor][];
    cursor[`:execute]["TRUNCATE TABLE tableName"];
    connSqlServer[`:commit][];
    // Write a q table back out via SQLAlchemy
    sa:.p.import`sqlalchemy;
    engine:sa[`:create_engine]["mssql+pyodbc://KX:password@server.domain.com\DB01/Data?driver=ODBC+Driver+17+for+SQL+Server"];
    df:.ml.tab2df[data]; //Data to publish
    df[`:to_sql]["tableName";engine; `if_exists pykw `append;`index pykw 0b];

     

     

  • rocuinneagain

    Member
    March 14, 2023 at 12:00 am in reply to: SSL_CIPHER_LIST RESET ciphers configured.

    Are all the ciphers you added available on the machine?

    Are they all contained in:

    /usr/bin/openssl ciphers -v
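
    As a rough cross-check, a sketch of comparing the two lists from within q (assuming a TLS-enabled session so -26! is available, and that both sides use OpenSSL cipher names; treat the result as indicative only):

    kx:":" vs string (-26!)[]`SSL_CIPHER_LIST                        / ciphers kdb+ reports
    os:{first " " vs x} each system"/usr/bin/openssl ciphers -v"     / ciphers OpenSSL offers
    kx where not kx in os                                            / configured but not available locally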
  • rocuinneagain

    Member
    February 10, 2023 at 12:00 am in reply to: kdb+ and websockets

    To make this reusable as more of a library it would need a few extensions.

    For example, it should handle more than one / in the URL.

    This version works with one or more:

    q)wsConn:{i:first where "/"=x;x:(i#x;i _x);(`$":ws://",x 0;"GET ",x[1]," HTTP/1.1\r\nHost: ",x[0],"\r\n\r\n")}
    q)wsConn"localhost:8000/ws-replay/somethingelse?exchange=bitmex&from=2019-10-01&to=2019-10-02"
    `:ws://localhost:8000
    "GET /ws-replay/somethingelse?exchange=bitmex&from=2019-10-01&to=2019-10-02 HTTP/1.1\r\nHost: localhost:8000\r\n\r\n"

    It would also need to account for:

    a) No / in the URL (a sketch of this case follows below)

    b) Embedded / in the options (if allowed)
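
    A minimal sketch of how case (a) might be handled, defaulting the request path to / when the URL has no path component; this is an illustrative extension rather than tested library code:

    wsConn:{i:first where "/"=x;x:$[null i;(x;enlist"/");(i#x;i _x)];(`$":ws://",x 0;"GET ",x[1]," HTTP/1.1\r\nHost: ",x[0],"\r\n\r\n")}
    wsConn"localhost:8000"                                / no path component, defaults to GET /
    wsConn"localhost:8000/ws-replay?exchange=bitmex"      / existing behaviour unchanged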

  • rocuinneagain

    Member
    February 7, 2023 at 12:00 am in reply to: kdb+ and websockets

    q connection:

    q)wsConn:{x:"/" vs x;h:(`$":ws://",x 0) "GET /",x[1]," HTTP/1.1\r\nHost: ",x[0],"\r\n\r\n";-1 h 1;h 0}
    q)wsConn"localhost:8000/ws-replay?exchange=bitmex&from=2019-10-01&to=2019-10-02"
    HTTP/1.1 101 Switching Protocols
    Upgrade: websocket
    Connection: Upgrade
    Sec-WebSocket-Accept: HSmrc0sMlYUkAGmm5OPpG2HaGWk=
    Date: Tue, 07 Feb 2023 12:12:27 GMT
    Server: Python/3.8 websockets/10.4
    796i

    Basic Python websocket server:

    import asyncio
    import logging

    import websockets

    logging.basicConfig(format="%(message)s", level=logging.DEBUG)

    async def handler(websocket, path):
        data = await websocket.recv()
        reply = f"Data received as: {data}!"
        await websocket.send(reply)

    start_server = websockets.serve(handler, "localhost", 8000)
    asyncio.get_event_loop().run_until_complete(start_server)
    asyncio.get_event_loop().run_forever()

    Produces logs:

    = connection is CONNECTING
    < GET /ws-replay?exchange=bitmex&from=2019-10-01&to=2019-10-02 HTTP/1.1
    < Host: localhost:8000
    < Connection: Upgrade
    < Upgrade: websocket
    < Sec-WebSocket-Version: 13
    < Sec-WebSocket-Key: x3JJHMbDL1EzLkh9GBhXDw==
    > HTTP/1.1 101 Switching Protocols
    > Upgrade: websocket
    > Connection: Upgrade
    > Sec-WebSocket-Accept: HSmrc0sMlYUkAGmm5OPpG2HaGWk=
    > Date: Tue, 07 Feb 2023 12:15:48 GMT
    > Server: Python/3.8 websockets/10.4
    connection open
    = connection is OPEN

     

  • rocuinneagain

    Member
    February 3, 2023 at 12:00 am in reply to: How to instantly recognize schemas added to Ticker in HDB

    If you only have partitioned tables in your HDB, you could load the ticker schema file and then reload the HDB. Any existing tables in the HDB will be reloaded as normal, and for the remaining in-memory-only tables loaded from the schema file you can add a dummy date column as a placeholder:

     

    \l sym.q
    \l /path/to/HDB
    {update date:.z.d from x} each tables[] where not tables[] in .Q.pt

     

    Really in practice I would avoid this.

     

    The best way is to just add empty copies of the new table to the DB and then call \l again as normal:

    https://github.com/KxSystems/kdb/blob/master/utils/dbmaint.md#addtable

    You will want this anyway long term to prevent queries failing if older partitions are missing.
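
    A minimal sketch of that approach, assuming dbmaint.q from the link above is loaded and using a hypothetical newtab schema (check the exact addtable signature against dbmaint.md):

    \l dbmaint.q
    addtable[`:/path/to/HDB;`newtab;([] sym:`symbol$();price:`float$();size:`long$())]
    \l /path/to/HDB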

     

  • rocuinneagain

    Member
    February 3, 2023 at 12:00 am in reply to: SSL_CIPHER_LIST RESET ciphers configured.

    I see 31 ciphers by default on my machine, but by exporting the variable I can control it down to 3.

    https://code.kx.com/q/kb/ssl/#tls-cipher-list

    $ q
    KDB+ 4.0 2021.07.12 Copyright (C) 1993-2021 Kx Systems
    q)count ":" vs string (-26!)[]`SSL_CIPHER_LIST
    31

    $ export SSL_CIPHER_LIST="TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256"
    $ q
    KDB+ 4.0 2021.07.12 Copyright (C) 1993-2021 Kx Systems
    q)count ":" vs string (-26!)[]`SSL_CIPHER_LIST
    3

    Can you replicate this on your system?

  • If you wanted to touch the files as little as possible you could work out the ranges and overlaps:

    //Create a spec ensuring to include some overlaps and gaps to test code better 
    q)spec:flip `inst`startDate`endDate!(`A`B`C;2022.01.01 2022.02.01 2022.06.01;2022.03.31 2022.04.30 2022.07.31) 
    q)spec 
    inst startDate  endDate
    --------------------------
    A    2022.01.01 2022.03.31
    B    2022.02.01 2022.04.30
    C    2022.06.01 2022.07.31
    
    //Explode the ranges into their individual dates 
    q)ranges:ungroup select inst,date:startDate+til each 1+endDate-startDate from spec 
    
    //Now regroup to gather instruments by date 
    q)ranges:0!select inst by date from ranges 
    
    //Find cases where there are gaps or the instruments change 
    q)ranges:update dDate:deltas date,dInst:differ inst from ranges 
    
    //Grab the indexes from your ranges table for the beginning and ending of our needed queries 
    q)rInds:{-1_x,'-1+next x}(exec i from ranges where (dDate>1) or dInst),count ranges 
    q)rInds 
    0   30
    31  89
    90  119
    120 180
    
    //Pulling out the indexes you can see each pair of rows forms a query 
    q)select from ranges where i in raze rInds 
    date       inst dDate dInst
    ---------------------------
    2022.01.01 ,`A  8036  1
    2022.01.31 ,`A  1     0
    2022.02.01 `A`B 1     1
    2022.03.31 `A`B 1     0
    2022.04.01 ,`B  1     1
    2022.04.30 ,`B  1     0
    2022.06.01 ,`C  32    1
    2022.07.31 ,`C  1     0
    
    //Sample table of data 
    q)trades:([] date:2022.01.01+til 365;sym:365?`A`B`C)
    
    //Build and execute functional selects to query from disk 
    q)result:raze {?[`trades;((within;`date;x`date);(in;`sym;enlist x[`inst]0));0b;()]} each ranges each rInds 
    
    //Visually inspect result to see months and instruments pulled 
    q)select count i by sym,date.month from result 
    sym month | x 
    ----------| -- 
    A 2022.01 | 8 
    A 2022.02 | 11 
    A 2022.03 | 12 
    B 2022.02 | 8 
    B 2022.03 | 9 
    B 2022.04 | 10 
    C 2022.06 | 15 
    C 2022.07 | 11

     

  • kdb+ is already automatically using multithreaded map-reduce for each of your subqueries without any changes, as long as you started your process with -s.

    https://code.kx.com/q/wp/multi-thread/#map-reduce-with-multi-threading

    //I'll create a test HDB to test against - you can use real data instead
    .z.zd:17 2 6   //Turning compression on
    n:1000000
    //Write some partitions
    {.Q.dd[`:HDB;x,`tab`] set .Q.en[`:HDB] ([] sym:`p#asc n?`3;p:n?100.0;t:n?1000)} each 2022.01.01 + til 365
    //Load the HDB
    \l HDB
    //q was started with -s 4 for 4 secondary threads
    q)system"s"
    4i
    //Run our sample query 100 times for each of 0-4 secondary threads available
    q)results:([] secondaryThreads:til 1+system"s";
        timings:{system"s ",string x;value"\\t:100 select from tab where date within 2022.01.01 2022.03.31,sym=`aaa"}each til 1+system"s")
    //Results show that the threads are being used and our query is speeding up with more threads available
    q)results
    secondaryThreads timings
    ------------------------
    0                12153
    1                12225
    2                7919
    3                6222
    4                5767

    You can see the benefit of threads by running through some sample queries

     

     

    If you wanted, you could experiment with using the threads per subquery rather than within each subquery.

    This would only require changing each to peach in your existing code to have threads execute each subquery:

    result: raze loadContract peach spec

    For your query, though, I expect the existing code is probably the better balance of compute and memory.

  • rocuinneagain

    Member
    January 25, 2023 at 12:00 am in reply to: Most optimum way to search a column containing a list of values

    Can you make up a sample table with a few rows of data and share the q code?

    It’ll help people understand your use-case and make it easier for them to answer the question.

    You can see details on how to include a code block:

    https://community.kx.com/t5/Community-Blogs/Creating-q-code-block-with-syntax-highlighting-in-a-message/ba-p/11094

     

    1. What datatype are you using for ids?

    2. How many rows do you expect to be in the final table?

    3. Will the table live on disk or in memory?

    4. Will it receive a lot of updates or be mostly static?

    5. What will the average length of a chain be?

    6. You name the column ‘child’, so each node can only have one child? Or would ‘children’, as a list, make more sense?

     

    One basic, unoptimized example using .z.s (self-reference) to get the parent chain of each row:

     

     

    q)tree:([] row:til 5;parent:0N 0 0 1 1) 
    q)tree 
    row parent
    ----------
    0
    1 0
    2 0
    3 1
    4 1
    
    q)getChain:{[c;r] $[null p:tree[r]`parent;c,p;.z.s[c,p;p]]} 
    q)getChain[();4] 
    1 0 0N 
    q)update chain:getChain[()] each row from tree 
    row parent chain
    ----------------
    0          ,0N
    1   0      0 0N
    2   0      0 0N
    3   1      1 0 0N
    4   1      1 0 0N

     

     

    These links to existing resources may also be of use:

  • rocuinneagain

    Member
    January 20, 2023 at 12:00 am in reply to: Installation for ARM64

    q may have been quarantined by the OS. Can you try running:

    xattr -d com.apple.quarantine path/to/m64/q

     

  • rocuinneagain

    Member
    January 17, 2023 at 12:00 am in reply to: Installation for ARM64

    There is a note on this page which will be of help:

    https://code.kx.com/q/learn/install/#step-4-confirm-success

     

    “If the spctl and xattr commands above have not authorized the OS to run q, open System Preferences > Security & Privacy.

    You should see a notification that q has been blocked and a button to override the block.

     Safely open apps on your Mac”

  • rocuinneagain

    Member
    January 2, 2023 at 12:00 am in reply to: the advantage of random segmentation

    It depends on the data, access patterns and hardware. It may make sense in your use case but not in others.

    If you have a DB with a very large number of unique values in the `p# field, it can make sense to group them together to make the attribute more efficient and to have the disk doing more sequential reads, which are faster.

    There are some notes on estimating attribute overheads on:

    https://aquaq.co.uk/adventure-in-retrieving-memory-size-of-kdb-object/

     

    In some IoT schemas where there can be millions of unique identifiers, even these approaches would not be enough, and introducing a hash column/lookup can be needed so that the DB is not slowed by large attributes and small random reads.
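
    As an illustration of the hash-column idea only (a hypothetical in-memory sketch; a real IoT schema would do this on disk, with the bucket column as part of the sort order):

    n:1000000
    t:([] id:n?10000000; val:n?100.0)                                / millions of mostly-unique ids
    t:update bucket:`p#bucket from `bucket xasc update bucket:id mod 997 from t
    target:first t`id
    select from t where bucket=target mod 997,id=target             / bucket narrows the scan before the raw id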

  • rocuinneagain

    Member
    January 2, 2023 at 12:00 am in reply to: Beef with apply (@ and .)

    A more direct path to that dictionary in this case would be:

    .[metaTbl;(`ref;`m)]
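
    For illustration only (metaTbl below is a made-up stand-in, not your actual structure), . indexes at depth, and the bracket form is equivalent:

    q)metaTbl:enlist[`ref]!enlist `m`n!("meta for m";"meta for n")
    q).[metaTbl;(`ref;`m)]
    "meta for m"
    q)metaTbl[`ref;`m]        / equivalent bracket indexing at depth
    "meta for m"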

     
