Does KDB have a string data type? What is a string in KDB?
Hi,
I am trying to understand what is meant by the concept of a String in KDB.
Initially when I read the documentation on datatypes, which is related to the serialization and deserialization of data to/from disk or network sockets, I read that there is no String datatype in KDB.
Please find the referenced page below.
https://code.kx.com/q/basics/datatypes/
The page states: “There is no string datatype. On this site, string is a synonym for character vector (type 10h). In q, the nearest equivalent to an atomic string is the symbol.”
I understand that symbols are atomic and interned, and that they are distinct from strings.
My question is really in two parts. Firstly what is a String?
- Is there nothing more to it than “a string is the same as a list of characters”?
- This seems unlikely, given that a string could be Unicode. Since a character is an 8-bit piece of data, a single character clearly cannot hold every Unicode code point. So it would seem there should be some distinction?
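To illustrate the point about 8-bit characters, here is a small Python sketch. It assumes (this is my assumption, not something the datatypes page states) that non-ASCII text would be stored in a character vector as UTF-8 bytes, so one code point may occupy several chars. The sample text is arbitrary.

```python
# Assumption: a q char vector holds raw bytes, and Unicode text is kept as UTF-8.
text = "naïve €5"                  # 8 Unicode code points
encoded = text.encode("utf-8")     # the bytes that would fill the char vector

code_points = len(text)            # number of code points
byte_count = len(encoded)          # number of 8-bit chars needed

print(code_points, byte_count)

# Decoding the bytes as UTF-8 recovers the original text.
assert encoded.decode("utf-8") == text
```

Under that assumption, the length of the char vector would be a byte count rather than a count of user-visible characters.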
Secondly:
- In relation to serialization, KDB defines “char” (atom), “list of char”, “symbol” (atom) and “list of symbol”. It does not discuss strings. I wonder if anyone can comment on that?
- Finally, assuming that a string is the same as a list of char, it seems slightly strange that there is no datatype for a list of strings, in other words a list of lists of char. As far as I can see there are 5 things which can be serialized, rather than 4: char, list of char, list of string, symbol and list of symbol.
- I am further confused because the serialization format for a list of char appears to be much the same as that for a symbol, apart from the code used for the datatype (10 for list of char, -11 for symbol). From the point of view of another application which deserializes data that KDB has serialized, I struggle to understand how a symbol would differ from a list of char. The fact that symbols are atomic and interned is an internal implementation detail of KDB, and is not relevant once the data has been written to a file as a stream of bytes.
Sorry if my question isn’t particularly coherent. I’m just a bit confused by various factors.
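To make the comparison concrete, here is a minimal Python sketch of the encodings as I understand them from the IPC serialization examples in the kdb+ documentation. The function names are my own and not part of any kdb+ API. The sketch assumes little-endian byte order, omits the 8-byte message header, and always writes an attribute byte of 0. If this reading is right, a list of strings goes out as a general list (type 0) of char vectors, and a symbol differs from a char vector in more than the type code: the char vector carries an attribute byte and a 4-byte length, while the symbol is simply null-terminated.

```python
import struct

def encode_symbol(name: str) -> bytes:
    # Symbol atom: type byte -11, then the bytes, then a terminating NUL.
    return struct.pack("<b", -11) + name.encode("utf-8") + b"\x00"

def encode_char_vector(text: str) -> bytes:
    # Char vector: type byte 10, attribute byte, 4-byte length, raw bytes.
    data = text.encode("utf-8")
    return struct.pack("<bBi", 10, 0, len(data)) + data

def encode_general_list(items: list) -> bytes:
    # General list: type byte 0, attribute byte, 4-byte count,
    # followed by each already-serialized item in order.
    return struct.pack("<bBi", 0, 0, len(items)) + b"".join(items)

def decode(buf: bytes, pos: int = 0):
    """Decode one object starting at pos; return (value, next_pos)."""
    type_code = struct.unpack_from("<b", buf, pos)[0]
    if type_code == -11:
        end = buf.index(b"\x00", pos + 1)
        return ("symbol", buf[pos + 1:end].decode("utf-8")), end + 1
    if type_code == 10:
        _attr, length = struct.unpack_from("<Bi", buf, pos + 1)
        start = pos + 6
        return ("chars", buf[start:start + length].decode("utf-8")), start + length
    if type_code == 0:
        _attr, count = struct.unpack_from("<Bi", buf, pos + 1)
        pos += 6
        items = []
        for _ in range(count):
            item, pos = decode(buf, pos)
            items.append(item)
        return ("list", items), pos
    raise ValueError(f"type {type_code} is not handled in this sketch")

symbol_bytes = encode_symbol("abc")
chars_bytes = encode_char_vector("abc")
strings_bytes = encode_general_list(
    [encode_char_vector("ab"), encode_char_vector("cde")]
)

print(symbol_bytes.hex())
print(chars_bytes.hex())
print(decode(strings_bytes)[0])
```

One consequence of the two layouts, if I have them right: a char vector can contain a zero byte because its length is explicit, whereas a symbol cannot, because the zero byte marks its end.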
To provide some context on what I am actually trying to achieve – I am writing a serialization and deserialization library for another language which is not directly supported by KDB.
This is why I am looking into detail at how KDB performs serialization and what the precise semantics are.
Thank you in advance for any feedback and comments.