

darrenwsun
Forum Replies Created
-
darrenwsun
Member, March 19, 2024 at 11:01 am, in reply to: What is a script? What is a library? How should I structure my project?
> when you are debugging and want to see all references to a function, how are you achieving this? Almost all q IDEs will be of no help
There are actually plenty of IDEs that support looking up variable/function references, for example IntelliJ IDEA with the plugin KdbInsideBrains or q.
-
As a dev using this language and a practitioner of the technology, I’m proud of its unprecedented performance and expressiveness in manipulating data. But when someone from another tech background complains about the steep learning curve and the difficulty of reading (most) q code, I usually remain silent: I have felt the pain, and sometimes I still feel it, at how tersely one can do something in q with very short names and so many chained expressions.
Don’t get me wrong: the language is beautiful, especially when viewed from a mathematical perspective. It’s just that q lives in an era/ecosystem where a complex system usually involves multiple languages/stacks, and the others take a very different view of what readable, clean code means.
-
I don’t think the reverse is readily available in q. MD5 is a hash rather than an encoding: it is designed to be difficult to reverse (though not impossible, as hardware develops and new algorithms are found), and since many inputs map to the same digest there won’t be a single answer for the original input.
I assume you have a column of such MD5 values stored and you wish to get the original values. Either the original values should also be persisted, or, if they are not supposed to be persisted as plain text (say they are passwords), then perhaps we’re not supposed to decode them anyway.
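A minimal sketch of the point above, using q's built-in md5. The candidate list and the name lookup are made up for illustration: the digest is always 16 bytes whatever the input length, so the only practical "reverse" is a lookup over values you already know.

```q
/ the digest has a fixed size regardless of the input length
h1:md5 "a"
h2:md5 1000#"a"
count each (h1;h2)            / 16 16

/ hypothetical: if the possible original values are known up front,
/ build a digest -> value dictionary and look the stored digest up
cands:("alice";"bob";"carol")
lookup:(md5 each cands)!cands
lookup md5 "bob"              / "bob"
```

This only works for values in the candidate list; it is a dictionary lookup, not a true inverse of md5.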
-
darrenwsun
Member, June 13, 2023 at 12:00 am, in reply to: How does nested columns/lists fragment memory?
For 2, strictly speaking, it depends. If we tweak the example so that the first element is an integer (or any other value of atomic type), we will see some effective garbage collection.
q)v:{(10;10000#"b")} each til 100000
q).Q.w[]
used| 1643008048
heap| 1677721600
peak| 1677721600
wmax| 0
mmap| 0
mphy| 7978725376
syms| 665
symw| 28405
q).glob.t:([]a:`long$())
q)`.glob.t upsert flip enlist[`a]!enlist v[;0]
`.glob.t
q).Q.gc[]
1543503872
q).Q.w[]
used| 1406896
heap| 134217728
peak| 1677721600
wmax| 0
mmap| 0
mphy| 7978725376
syms| 668
symw| 28496
My explanation for the different behaviors is that in the above example, since v[;0] is a simple integer vector, its elements have to sit in consecutive memory and thus it is a value copy from the original list. As such, deleting v allows the memory taken by v to be recycled. In the earlier example, by contrast, v[;0] is a list of “references” to the elements in the original list v, so deleting v doesn’t remove all references to the memory blocks used by v (the references are now in .glob.t).
For 3, the shorter answer is that kdb uses copy-on-write.
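A rough sketch of what copy-on-write looks like from the console. It uses the internal function -16! (reference count); the exact counts can differ depending on where the code runs, so they are left without expected output. The variable names are only for illustration.

```q
a:til 5
-16!a          / reference count of the vector behind a
b:a            / no copy yet: b refers to the same vector
-16!a          / the count is expected to go up
b[0]:100       / the write to b is what triggers the copy
a              / 0 1 2 3 4, a is untouched
b              / 100 1 2 3 4
```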
-
darrenwsun
Member, June 13, 2023 at 12:00 am, in reply to: How does nested columns/lists fragment memory?
Note that the second query generates a table with compound/nested columns qty and price.
I don’t know why the second .Q.gc call takes significantly longer, given that the memory usage of the two queries is comparable (although the second takes slightly more). But I don’t think it’s related to fragmented memory; after all, the space of the whole temporary result is released rather than part of it, unlike the case described in the .Q.gc doc. My suspicion is that garbage collection simply takes longer when it involves nested columns (aka lists of vectors) than simple columns (aka vectors).
-
darrenwsun
Member, June 6, 2023 at 12:00 am, in reply to: Running user defined aggregation on partitioned tables
To my knowledge, the best you can do is to fetch the data into memory and apply the custom aggregate function to the retrieved data, like the below:
update percentile:getPercentile price from select sym, price from trade where date>=.z.d-7 // and other filters
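As a toy illustration of why the raw values have to be pulled into memory for this kind of function: partial sums can be combined across partitions, but partial medians cannot. The two vectors below are made-up stand-ins for the prices in two partitions.

```q
/ hypothetical prices from two partitions
p1:1 2 3f
p2:10 20 30 40f

/ the sum of the per-partition sums equals the sum over all the data
(sum (sum p1;sum p2))=sum p1,p2     / 1b

/ the median of the per-partition medians is not the overall median
(med (med p1;med p2))=med p1,p2     / 0b (13.5 vs 10)
```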
Certain functions like sum and prd can reduce in a memory-efficient way, as they don’t need to keep the original values from each partition. However, med or your getPercentile function cannot.
-
The short answer, to my understanding, is that a feed handler isn’t really a good use case for KDB.
A feed handler needs to deal with data sources of various forms, and more general-purpose languages like Java and C++ usually have richer support for those. Another thing to consider is parallel processing of those data feeds, which would be cumbersome in KDB given that, in a usual setup, there is only one main thread processing incoming requests, so parallelism would call for a cluster of KDB processes.
-
darrenwsun
Member, January 30, 2023 at 12:00 am, in reply to: What does it actually mean to reload a HDB? why is this needed?
Reloading the HDB is as if “\l .” were run by the HDB. For a partitioned database this step, among other things, loads serialized objects in the current directory (e.g. the sym file) and caches things like splayed tables in the latest partition. It is necessary so that the HDB picks up changes introduced by an external process, e.g. a new symbol introduced while saving the tables.
The relevant doc is https://code.kx.com/q/basics/syscmds/#l-load-file-or-directory
-
Thanks @sujoy13. What confused me is “directory of a splayed table”, which I interpreted as directory t as in your example.
A follow-up question: what is the real runtime difference between \l . and \l t for this particular example? The latter form also maps the table without copying data from disk to memory. By the way t is laid out on the disk, it is a splayed table…
-
A generalized solution that takes the column name as input rather than hardcoding it, and doesn’t assume that the cells in the nested column all have the same length (shorter ones are padded with nulls).
unpack:{[t;c]
  maxLen:max count each t[c];
  newCols:`$string[c] ,/: string 1+til maxLen;
  // concatenate the parts other than the specified column, with the unpacked parts
  // (x;::;y) is the parsed form of x[;y]
  ![t; (); 0b; enlist c] ,' ?[t; (); 0b; newCols!{(x;::;y)}'[c;til maxLen]]
  }
-
Both work, as (()) is interpreted the same as (). For this reason the first form is preferred. Note that the data type of price is determined by the data type of the cell in the first row.
q)t:([] time:`time$(); price:())
q)meta t
c    | t f a
-----| -----
time | t
price|
q)`t upsert (.z.t; 10 11f)
`t
q)t
time         price
------------------
21:42:07.028 10 11
q)meta t
c    | t f a
-----| -----
time | t
price| F
q)`t upsert (.z.t; 10 11)
`t
q)t
time         price
------------------
21:42:07.028 10 11
21:44:30.724 10 11
q)meta t
c    | t f a
-----| -----
time | t
price| F
-
darrenwsun
Member, July 1, 2022 at 12:00 am, in reply to: How to make externally-changed files tracked by Kx Developer
Thanks David, kindly let me know once you hear back.
I like many of the features from Kx Developer (notably qcumber and qdoc), but it being a web IDE isn’t on the list…
-
Thanks David for following up.
Ignoring nulls makes perfect sense, especially for avg/max/sum. Imagine otherwise: the result would be null whenever the input contains a single null, effectively forcing everyone to coalesce explicitly.
I understand such a change is not backward compatible, and given the ability to manually exclude nulls as you showed, I’d agree it’s not worth doing. However, I’m curious what the language designers would decide if we could leave the legacy burden behind.
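For reference, a small sketch of the null-ignoring behaviour of avg/max/sum on a made-up vector:

```q
x:1 0N 3
sum x      / 4,  the null contributes nothing
avg x      / 2f, averaged over the two non-null items
max x      / 3
count x    / 3,  count still includes the null
```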
-
1. Doing this via qpython is no different from doing it via other interfaces (including q itself): you open a connection (e.g. to the tickerplant, assuming a tp+rdb+hdb setup) and call a function that will get your data persisted (e.g. .u.upd[`trade; data]).
2. Generally speaking, backpopulating historical data the right way isn’t straightforward. It can be as simple as splaying data using set, once per partition (date), if you have complete data to write; if you’ve only got partial data to write (e.g. trades for a new instrument type), you may have to re-sort the combined data and re-apply attributes wherever needed. The q language itself is short of higher-level APIs that handle these nuances, but I find TorQ has a good collection of them, which makes the task easier.
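A minimal sketch of the simple case in point 2 (complete data for one date), assuming a date-partitioned HDB. The database root, the date and the trade table below are all hypothetical, and this skips the merge/re-sort work needed when a partition already holds data.

```q
/ hypothetical database root, partition date and data
db:`:/tmp/hdbsketch
d:2022.05.01
trade:([] sym:`b`a`b`a; time:09:00:03 09:00:01 09:00:02 09:00:04; price:1 2 3 4f)

/ sort by sym (then time) so the parted attribute can be applied to sym
trade:`sym`time xasc trade

/ partition directory for the table: `:/tmp/hdbsketch/2022.05.01/trade
dir:` sv db,(`$string d),`trade

/ enumerate symbol columns against the sym file under db, then write splayed
/ (the trailing slash added by ` sv dir,` is what makes set splay the table)
(` sv dir,`) set .Q.en[db] trade

/ apply the parted attribute to the sym column on disk
@[dir;`sym;`p#]
```

For partial data you would first read the existing partition, append, and re-sort before writing, which is where helper libraries such as TorQ come in.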
-
darrenwsun
Member, May 24, 2022 at 12:00 am, in reply to: What is the role played by key columns in a keyed table [ query join/performance ] ?
Regarding why q-sql doesn’t do a key lookup internally when the search is on the key, the short answer is that the two are not equivalent. Consider a keyed table with duplicate keys (yes, this is allowed in KDB; there is no such thing as a “primary key constraint” in this world): q-sql may return a table of multiple rows, while the key lookup form returns a dictionary representing the first entry (you know why: the search stops at the first matching entry).
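A small sketch of the difference, on a made-up keyed table where key `a appears twice:

```q
/ hypothetical keyed table with a duplicated key
kt:([] k:`a`a`b)!([] v:1 2 3)

/ q-sql returns every matching row
select from kt where k=`a     / two rows, v = 1 and 2

/ key lookup stops at the first match and returns a dictionary
kt `a                         / v| 1
```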