KX Community

Find answers, ask questions, and connect with our KX Community around the world.
KX Community Guidelines

Home Forums kdb+ Transform Specific Column Values Re: Transform Specific Column Values

  • rocuinneagain

    Member
    August 17, 2021 at 12:00 am

    ssr is a much slower operation than the above examples.
    Using wildcards such as “*#” is also difficult as it matches more than one suffix (.e.g “#” and “^#”).
    To work around this you can choose the longest matching suffix.

    Using ‘like’ and ‘@’ where possible rather than ‘ssr’ as below is a big speed improvement.

    Searching for “*” is also difficult as it is a wildcard. Instead, I use tab “t” in it’s place.

    If “t” is possibly in your data you would need to change this for another character.

    symbology:.Q.id ("****";enlist ",")0:`:symbology.csv
    update searchNASDAQ:{"*",@[x;where x="*";:;"t"]} each NASDAQ from `symbology
    func:{s:string x;
     m:select from symbology where @[s;where s="*";:;"t"] like/:searchNASDAQ;
     l:max count each m`NASDAQ;
     c:first exec CMS from m where l=count each NASDAQ;
     `$$[c~();s;(neg[l]_s),c]
    };

    In a test it does seem to operate correctly.

    Overall it still has risks as if bad data is sent in the function cannot truly validate what it is doing. Ideally you would have root and suffix separated by a known delimiter such as a space ” ” in the source data.

    -6 sublist {([] symNASDAQ:n;symCMS:func each n:`$"AAPL",/:x)}symbology`NASDAQ
    symNASDAQ symCMS 
    ------------------
    AAPL# AAPLWI 
    AAPL^# AAPLRTWI
    AAPL-# AAPLPRWI
    AAPL.A# AAPLAWI 
    AAPL+# AAPLWSWI
    AAPL~ AAPLTEST

    The extra complexity does come at a cost of speed.

    \ts func each 10000#`$"AAPL+#"
    41 553776

    symbologyOld:`NASDAQ xkey .Q.id (“****”;enlist “,”)0:`:symbology.csv
    \ts {s:string x;`$(4#s),symbologyOld[4 _ s]`CQSSuffix} each 10000#`$”AAPL+#”
    12 554448
    \ts {s:string x;r:first where not s in .Q.A;`$(r#s),symbologyOld[r _ s]`CQSSuffix} each 10000#`$”AAPL+#”
    16 554464

    As you used in your original example .Q.fu is a great tool when performing an intensive task repeatedly.

    Extreme example with only one unique input:

    \ts .Q.fu[func each] 10000#`$"AAPL+#"
    0 394032

    One limitation of .Q.fu is that it has no memory between executions.

    One example of a way to bypass this would be a memory cache.

    The library https://github.com/gitrj95/q-memo could be used.

    In this use-case your function is fast enough that the cache is overkill and should not actually be used but would be useful for very slow operations you may have to run many times.

    l memo.k
    .memo.init[`.;10000h] //Create a cache with 10k limit
    .memo.mk[`func;`memoFunc;`cache.0] //Create a wrapped function to use the cache

    ts memoFunc each 10000#`$(“AAPL~”;”AAPL+#”)
    126 553936

    //The cache pre stores the results
    cache.0
    f a                | r 
    -------------------| ---------
    ::                 | :: 
    ..memoFunc ,`AAPL~ | `AAPLTEST
    ..memoFunc ,`AAPL+#| `AAPLWSWI