Replies – Forums – darrenwsun

darrenwsun

Member

March 19, 2024 at 11:01 am in reply to: What is a script? What is a library? How should I structure my project?

> when you are debugging and want to see all references to a function, how are you achieving this? Almost all q IDEs will be of no help

There are actually plenty of IDEs that support looking up variable/function references, for example IntelliJ IDEA with the KdbInsideBrains or q plugin.

darrenwsun

Member

March 19, 2024 at 11:01 am in reply to: How long should a name be?

As a dev using this language and a practitioner of the technology, I’m proud of its unprecedented performance and expressiveness in manipulating data. But when people from other tech backgrounds complain about the steep learning curve and the difficulty of reading (most) q code, I usually remain silent: I have felt that pain, and sometimes still feel it, at how succinctly one can do something in q with very short names and so many chained expressions.

Don’t get me wrong – the language is beautiful, especially when viewed from a mathematical perspective. It’s just that q lives in an era/ecosystem where a complex system usually involves multiple languages/stacks, and the others take a very different view of what readable/clean code means.

darrenwsun

Member

June 15, 2023 at 12:00 am in reply to: md5 – getting the original string back.

I don’t think the reverse is readily available in q. MD5 is a hash rather than an encoding: it is designed to be difficult to reverse (though not impossible, as hardware develops and new algorithms are found), and since many inputs can map to the same hash, there is no single answer for the original input.

I assume you have a column of such MD5 hashes stored and you wish to get the original values back. Either the original values should also be persisted, or, if they are not supposed to be persisted as plain text (say they are passwords), then perhaps we’re not supposed to recover them anyway.
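
To illustrate the point, here is a minimal sketch. The sample strings and the names vals and lookup are made up for illustration, and the lookup approach only applies when the originals are allowed to be stored alongside their hashes.

```q
h:md5 "secret"        / a byte vector; the digest is always 16 bytes
count h               / 16
count md5 1000#"x"    / 16, whatever the input length

/ assumption: the original values may be kept next to their hashes
vals:("alice";"bob";"carol")
lookup:(md5 each vals)!vals
lookup md5 "bob"      / "bob"
```

The dictionary is just a precomputed mapping from hash to original; nothing here reverses md5 itself.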

darrenwsun

Member

June 13, 2023 at 12:00 am in reply to: How does nested columns/lists fragment memory?

Note that the second query generates a table with compound/nested columns qty and price.

I don’t know why the second .Q.gc call takes significantly longer, given that the memory usage of the two queries is comparable (although the second takes slightly more). But I don’t think it’s related to fragmented memory; after all, the space of the whole temporary result is released rather than only part of it, which is the case described in the .Q.gc doc. My suspicion is that garbage collection simply takes longer when it involves nested columns (i.e. lists of vectors) than simple columns (i.e. vectors).

darrenwsun

Member

June 13, 2023 at 12:00 am in reply to: How does nested columns/lists fragment memory?

For 2, it depends. If we tweak the example so that the first element is an integer (or any other value of atomic type), we will see some effective garbage collection.

q)v:{(10;10000#"b")} each til 100000 
q).Q.w[] 
used| 1643008048 
heap| 1677721600 
peak| 1677721600 
wmax| 0 
mmap| 0 
mphy| 7978725376 
syms| 665 
symw| 28405 

q).glob.t:([]a:`long$()) 
q)`.glob.t upsert flip enlist[`a]!enlist v[;0] 
`.glob.t 

q).Q.gc[] 
1543503872 

q).Q.w[] 
used| 1406896 
heap| 134217728 
peak| 1677721600 
wmax| 0 
mmap| 0 
mphy| 7978725376 
syms| 668 
symw| 28496

My explanation for the different behaviors is that in the above example, since v[;0] is a simple vector of integers, its elements have to be in contiguous memory and thus it is a value copy from the original list. As such, deleting v allows the memory taken by v to be recycled. In the earlier example, by contrast, v[;0] is a list of “references” to the elements in the original list v, so deleting v doesn’t remove all references to the memory blocks used by v (the references are now in .glob.t).
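
A quick way to see the difference between the two cases is to check the type of the extracted column. This is only a sketch: v1 mirrors the example above at a smaller size, while v2 is my stand-in for the earlier example, on the assumption that its first element was a string.

```q
v1:{(10;100#"b")} each til 1000        / first element is an atom
v2:{(100#"a";100#"b")} each til 1000   / assumed shape of the earlier example
type v1[;0]   / 7h: a simple long vector, so the values are copied into one block
type v2[;0]   / 0h: a general list whose items are the original strings
```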

For 3, the short answer is that kdb uses copy-on-write.
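
From the user's point of view, copy-on-write looks like plain value semantics. A minimal sketch (the variable names are arbitrary):

```q
a:til 5
b:a          / no data is copied yet; b shares a's block
b[0]:100     / the write is what triggers the copy
a            / 0 1 2 3 4
b            / 100 1 2 3 4
```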

darrenwsun

Member

June 6, 2023 at 12:00 am in reply to: Running user defined aggregation on partitioned tables

To my knowledge, the best you can do is to fetch the data into memory and apply the custom aggregate function to the retrieved data, as below:

update percentile:getPercentile price from select sym, price from trade where date>=.z.d-7 // and other filters

Certain functions like sum and prd can be reduced in a memory-efficient way, as they don’t need to keep the original values from each partition. However, med or your getPercentile function cannot.
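
A small sketch of why that is, using two made-up vectors p1 and p2 standing in for the price column of two partitions: combining per-partition sums gives the overall sum, but combining per-partition medians does not give the overall median.

```q
p1:1 2 3 100f        / hypothetical prices in partition 1
p2:4 5 6f            / hypothetical prices in partition 2
whole:raze (p1;p2)

(sum whole)=sum sum each (p1;p2)    / 1b: partial sums combine
(med whole)=med med each (p1;p2)    / 0b: 4 vs 3.75, partial medians do not
```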

darrenwsun

Member

January 30, 2023 at 12:00 am in reply to: What does it actually mean to reload a HDB? why is this needed?

Reloading the HDB is as if “\l .” were run by the HDB. For a partitioned database this step, among other things, loads serialized objects in the current directory (e.g. the sym file) and caches things like splayed tables in the latest partition. It is necessary for the HDB to pick up changes introduced by an external process, e.g. a new symbol may be introduced while saving the tables.

The relevant doc is https://code.kx.com/q/basics/syscmds/#l-load-file-or-directory
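
For illustration, this is roughly how an external process could trigger the reload after writing to the database directory. It is only a sketch: the port 5012 is made up, and it assumes the HDB was started inside its database directory so that “.” refers to it.

```q
/ run from the process that has just written to the HDB directory
/ assumption: the HDB listens on localhost:5012 (hypothetical port)
h:hopen `::5012
h "\\l ."          / same as typing \l . in the HDB console
/ equivalent form: h (system; "l .")
hclose h
```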

darrenwsun

Member

January 30, 2023 at 12:00 am in reply to: Why are feedhandlers not usually kdb?

The short answer, to my understanding, is that a feed handler isn’t a good use case for KDB.

A feed handler needs to deal with data sources of various forms, and more general-purpose languages like Java and C++ usually have richer support for those. Another consideration is parallel processing of those data feeds, which is cumbersome in KDB: in a usual setup there is only one main thread processing incoming requests, so parallelism would call for a cluster of KDB processes.

darrenwsun

Member

September 17, 2022 at 12:00 am in reply to: .Q.qp for splayed table

Thanks @sujoy13. What confused me is “directory of a splayed table”, which I interpreted as directory t as in your example.

A follow-up question: what is the real runtime difference between \l . and \l t for this particular example? The latter form also maps the table without copying data from disk to memory. Given the way t is laid out on disk, it is a splayed table…

darrenwsun

Member

August 23, 2022 at 12:00 am in reply to: Unpack nested column in table

A generalized solution that doesn’t use hardcoded column names (the column is provided as input) or assume that the cells in the nested column have the same length (shorter cells are padded with nulls).

unpack:{[t;c] maxLen:max count each t[c]; 
              newCols:`$string[c] ,/: string 1+til maxLen; // new column names: c suffixed with 1..maxLen 
              // concatenate the parts other than the specified column, with the unpacked parts 
              // (x;::;y) is the parsed form of x[;y] 
              ![t; (); 0b; enlist c] ,' ?[t; (); 0b; newCols!{(x;::;y)}'[c;til maxLen]] 
}
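
A usage sketch, assuming unpack as defined above is already loaded. The table t and its columns id and px are made up for illustration; the cells of px deliberately have different lengths.

```q
t:([] id:1 2 3; px:(10 11f; enlist 12f; 13 14 15f))
r:unpack[t;`px]
cols r     / expected: `id`px1`px2`px3
r`px2      / expected: 11 0n 14f, the short cell is padded with a null
```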

darrenwsun

Member

August 19, 2022 at 12:00 am in reply to: KDB table with an array as element

Both work, as (()) is interpreted the same as (). For this reason the first form is preferred. Note that the data type of price is determined by the data type of the cell in the first row.

q)t:([] time:`time$(); price:()) 
q)meta t 
c    | t f a 
-----| ----- 
time | t 
price| 

q)`t upsert (.z.t; 10 11f) 
`t 
q)t 
time price 
------------------ 
21:42:07.028 10 11 

q)meta t 
c    | t f a 
-----| ----- 
time | t 
price| F 

q)`t upsert (.z.t; 10 11) 
`t 
q)t 
time price 
------------------ 
21:42:07.028 10 11 
21:44:30.724 10 11 

q)meta t 
c    | t f a 
-----| ----- 
time | t 
price| F
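
The equivalence of the two forms can be checked directly. A minimal sketch; t1 and t2 are throwaway names:

```q
(()) ~ ()                            / 1b: the extra parentheses only group
t1:([] time:`time$(); price:())
t2:([] time:`time$(); price:(()))
t1 ~ t2                              / 1b: identical empty tables
```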

darrenwsun

Member

July 1, 2022 at 12:00 am in reply to: How to make externally-changed files tracked by Kx Developer

Thanks David, kindly let me know once you hear back.

I like many of the features from Kx Developer (notably qcumber and qdoc), but it being a web IDE isn’t on the list…

darrenwsun

Member

June 30, 2022 at 12:00 am in reply to: select avg (ignore null)

Thanks David for following up.

Ignoring null makes perfect sense, especially for avg/max/sum. Otherwise the result would be null whenever the input contains a single null, effectively forcing everyone to coalesce explicitly.
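
For reference, this is how the built-in aggregates treat a null in a simple float vector, followed by one way to exclude nulls by hand. The vector x is made up, and this is a sketch for simple vectors only; other shapes of input may behave differently.

```q
x:1 0n 2 3f
avg x                     / 2f: the null is skipped
sum x                     / 6f
max x                     / 3f
avg x where not null x    / 2f: explicit exclusion
```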

I understand such a change is not backward compatible, and given the ability to manually exclude nulls like what you shared, I’d agree it’s not worth doing. However, I’m curious what the language designers would decide if we could leave the legacy burden behind.

darrenwsun

Member

June 10, 2022 at 12:00 am in reply to: publish data to kdb uisng qpython

1. Doing this via qpython is no different from doing it via other interfaces (including q itself): you open a connection (e.g. to the tickerplant, assuming a tp+rdb+hdb setup) and call a function that will get your data persisted (e.g. .u.upd[`trade; data]).

2. Generally speaking, backpopulating historical data correctly isn’t straightforward. It can be as simple as splaying data using set, once for each partition (date), if you have complete data to write; if you’ve got partial data to write (e.g. trades for a new instrument type), you may have to re-sort the combined data and re-apply attributes wherever needed. The q language itself lacks higher-level APIs that handle these nuances, but I find TorQ has a good collection of them, which makes the task easier.
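
The two points above can be sketched in q itself; the qpython calls follow the same shape. Everything specific here is an assumption for illustration: the host and port, the trade schema, the database root /data/hdb and the date. The backfill part covers only the simple case of writing one complete partition.

```q
/ 1. publish to a tickerplant (hypothetical host/port and schema)
h:hopen `:localhost:5010
data:(2#.z.n; `AAPL`MSFT; 101.5 250.25; 100 200)   / columns: time, sym, price, size
neg[h] (`.u.upd; `trade; data)                     / async call to .u.upd
neg[h][]                                           / flush pending async messages
hclose h

/ 2. simplest backfill: write one complete date partition as a splayed table
trade:([] time:2#.z.n; sym:`AAPL`MSFT; price:101.5 250.25; size:100 200)
`:/data/hdb/2022.06.09/trade/ set .Q.en[`:/data/hdb] `sym xasc trade
@[`:/data/hdb/2022.06.09/trade/; `sym; `p#]        / re-apply the parted attribute on disk
```

.Q.en enumerates the symbol columns against the sym file in the database root, which is why the HDB needs a reload afterwards to see any new symbols.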

darrenwsun

Member

May 24, 2022 at 12:00 am in reply to: What is the role played by key columns in a keyed table [ query join/performance ] ?

With qsql, the full column is searched before the result is presented. With key lookup, the search stops when it finds the first match. This is where the performance gain from key lookup comes from.

> The keys are basically nodes on a BST ( binary search tree ) which should look up in O(log(n)) time given n is size of the table

This isn’t true. KDB’s keyed table, or more generally a plain dictionary, does not use hashing techniques like Java’s HashMap. Lookup is done by searching through the table/list linearly. If one wants faster lookup, the grouped (conceptually the same as an index) or parted (conceptually a space-optimized index that assumes data are sorted) attributes help.
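
A small sketch of the two access styles and of applying the attributes mentioned above. The tables kt, t and s and their contents are made up for illustration.

```q
kt:([sym:`a`b`c] px:10 20 30f)     / hypothetical keyed table
kt `b                              / key lookup: returns the matching row as a dictionary
select from kt where sym=`b        / qsql: compares every value in the sym column

t:([] sym:`a`b`a`c; px:10 20 30 40f)
update `g#sym from `t;             / grouped attribute on sym
meta t                             / the a column shows g for sym

s:`sym xasc t                      / parted needs equal values to be contiguous
update `p#sym from `s;
```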
