Compression for null string column
Hi,
We’re seeing compressed null string columns take up more space on disk than expected. Could anyone shed some light on this behaviour?
Example:
q)n:10000000;tab:([]time:n#.z.p;val:n?1000;str:n#enlist "");(`:tab/;17;2;5) set tab
`:tab/
q)-21!`:tab/str
compressedLength  | 14074225
uncompressedLength| 80004096
algorithm         | 2i
logicalBlockSize  | 17i
zipLevel          | 5i
q)-21!`$":tab/str#"
compressedLength  | 24189
uncompressedLength| 20004096
algorithm         | 2i
logicalBlockSize  | 17i
zipLevel          | 5i
According to this page, “the non-sharp file is a serialized q list of integers representing the lengths of each sublist of the original list.”
For a null string column we’d expect the non-sharp file to just contain zeroes, which should compress better than what we’re seeing.
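As a rough sanity check of that expectation, a plain long vector of zeroes could be written with the same compression parameters and its stats compared with those of the str file. This is only a sketch: the file name `:zeros is made up for illustration and is not part of the example above.

```q
/ hypothetical comparison file; `:zeros is an illustrative name only
n:10000000
/ write n zero longs with the same parameters as tab: block size 17, gzip (2), level 5
(`:zeros;17;2;5) set n#0
/ compression stats, to compare compressedLength against -21!`:tab/str
-21!`:zeros
```

If the non-sharp file really held only zero lengths, its compressedLength should presumably be in the same ballpark as this file's.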
Using kdb+ 4.0 2020.06.18
Thanks,
Eoghan