Fast file reading
Posted by roryk on June 4, 2024 at 4:41 am
I was testing the speed of loading in a file with one word on each line, and found reading it as a CSV is faster and uses less memory than read0, despite being a more roundabout method. Out of interest, is there a reason why? And is there anything faster?
\ts:100 read0 words
2297 18874480
\ts:100 first (1#"*";" ")0: words
1003 14680704
6 Replies
-
Hi @roryk
If the file contains spaces, 0: with a space delimiter only loads a subset of the data: the first column, i.e. the text before the first space on each line.
It could also just be the read vs mmap performance.
Hope this helps.
Thanks,
Megan
-
Hi, the file doesn’t contain spaces, it’s just one word per line. So the results are the same. If mmap is faster, is there a reason read0 doesn’t use it, or is it just a potential optimisation that hasn’t been implemented?
It doesn’t really matter, as it’s only a couple of milliseconds even for a fairly large file, but it would be nice to have a deeper understanding of the performance of various operations.
-
Hi @roryk
I reached out to one of our developers on this and this was their response:
“Actually looks like read0 is using a load of memcmp calls & scanning for \n
where 0: is using memchr to find it in a single call:

q)\ts:1 (1#"*";"-")0:`:testf
221 36800

 seconds  usecs/call     calls function
-------- ----------- --------- --------------------
0.095358          95      1003 memchr
0.060194         106       564 memmove

240799 0.000319 memchr("qwertyuiopasdfghjklzxcvbnm\nqwert"..., '\n', 13311) = 0x7fa1eaf3d0d7
240799 0.005258 memchr("qwertyuiopasdfghjklzxcvbnm\nqwert"..., '-', 26) = 0
240799 0.005147 memmove(0x7fa1e6b121d0, "qwertyuiopasdfghjklzxcvbnm", 26) = 0x7fa1e6b121d0

q)\ts:1 read0 `:testf
2161 52624

 seconds  usecs/call     calls function
-------- ----------- --------- --------------------
1.332051          98     13505 memcmp
0.102397          97      1046 memmove

240799 0.000144 memcmp(0x4bf876, 0x7fa1e6b7f410, 1, 113) = 0xffffff93
240799 0.000152 memcmp(0x4bf876, 0x7fa1e6b7f411, 1, 119) = 0xffffffa5
240799 0.000172 memcmp(0x4bf876, 0x7fa1e6b7f412, 1, 101) = 0xffffff98
240799 0.000160 memcmp(0x4bf876, 0x7fa1e6b7f413, 1, 114) = 0xffffff96
240799 0.000144 memcmp(0x4bf876, 0x7fa1e6b7f414, 1, 116) = 0xffffff91
240799 0.000144 memcmp(0x4bf876, 0x7fa1e6b7f415, 1, 121) = 0xffffff95
240799 0.000144 memcmp(0x4bf876, 0x7fa1e6b7f416, 1, 117) = 0xffffffa1
240799 0.000154 memcmp(0x4bf876, 0x7fa1e6b7f417, 1, 105) = 0xffffff9b
240799 0.000162 memcmp(0x4bf876, 0x7fa1e6b7f418, 1, 111) = 0xffffff9a
240799 0.000236 memcmp(0x4bf876, 0x7fa1e6b7f419, 1, 112) = 0xffffffa9
240799 0.000150 memcmp(0x4bf876, 0x7fa1e6b7f41a, 1, 97) = 0xffffff97
240799 0.000149 memcmp(0x4bf876, 0x7fa1e6b7f41b, 1, 115) = 0xffffffa6
240799 0.000149 memcmp(0x4bf876, 0x7fa1e6b7f41c, 1, 100) = 0xffffffa4
240799 0.000222 memcmp(0x4bf876, 0x7fa1e6b7f41d, 1, 102) = 0xffffffa3
240799 0.000175 memcmp(0x4bf876, 0x7fa1e6b7f41e, 1, 103) = 0xffffffa2
240799 0.000313 memcmp(0x4bf876, 0x7fa1e6b7f41f, 1, 104) = 0xffffffa0
240799 0.000311 memcmp(0x4bf876, 0x7fa1e6b7f420, 1, 106) = 0xffffff9f
240799 0.000312 memcmp(0x4bf876, 0x7fa1e6b7f421, 1, 107) = 0xffffff9e
240799 0.000239 memcmp(0x4bf876, 0x7fa1e6b7f422, 1, 108) = 0xffffff90
240799 0.000149 memcmp(0x4bf876, 0x7fa1e6b7f423, 1, 122) = 0xffffff92
240799 0.000148 memcmp(0x4bf876, 0x7fa1e6b7f424, 1, 120) = 0xffffffa7
240799 0.000218 memcmp(0x4bf876, 0x7fa1e6b7f425, 1, 99) = 0xffffff94
240799 0.000156 memcmp(0x4bf876, 0x7fa1e6b7f426, 1, 118) = 0xffffffa8
240799 0.000148 memcmp(0x4bf876, 0x7fa1e6b7f427, 1, 98) = 0xffffff9c
240799 0.000208 memcmp(0x4bf876, 0x7fa1e6b7f428, 1, 110) = 0xffffff9d
240799 0.000326 memcmp(0x4bf876, 0x7fa1e6b7f429, 1, 109) = 0
240799 0.000270 memmove(0x7fa1e6b121d0, "qwertyuiopasdfghjklzxcvbnm", 26) = 0x7fa1e6b121d0
240799 0.000174 memmove(0x7fa1e6b75f78, "\300!\261\346\241\177\0\0", 8) = 0x7fa1e6b75f78
-
I can follow up on this further if you would like to know why read0 doesn’t use memchr (&/ mmap)?
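
For anyone reading along, the difference the developer describes can be sketched in C. This is not kdb+ source and makes no claim about how read0 or 0: are actually implemented; the function names count_lines_memchr and count_lines_memcmp are hypothetical and only mirror the call pattern visible in the trace: one memchr call per line versus one single-byte memcmp call per character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: find line ends with one memchr call per line.
   memchr scans the remaining buffer internally and returns the next '\n'. */
size_t count_lines_memchr(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    size_t lines = 0;
    const char *p = buf;
    const char *end = buf + len;

    while (p < end) {
        const char *nl = memchr(p, '\n', (size_t)(end - p));
        lines++;
        if (nl == NULL)
            break;              /* final line has no trailing newline */
        p = nl + 1;             /* continue after the newline */
    }
    return lines;
}

/* Hypothetical sketch: find line ends by comparing one byte at a time,
   which costs one memcmp call per character in the file. */
size_t count_lines_memcmp(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    static const char newline = '\n';
    size_t lines = 0;
    size_t start = 0;           /* start of the current line */

    for (size_t i = 0; i < len; i++) {
        if (memcmp(&newline, buf + i, 1) == 0) {
            lines++;
            start = i + 1;
        }
    }
    if (start < len)
        lines++;                /* unterminated final line */
    return lines;
}
```

Both functions return the same count for the same buffer; the second simply makes far more library calls, which is consistent with the 13505 memcmp calls versus 1003 memchr calls in the summaries above.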
-
Hi Megan,
Thanks for the information!
It would be interesting to know the reason, but I don’t want to take up too much of a dev’s time for a random curiosity. If someone is free to look into it and would want to, then it would be interesting to know but no problem if not.
-
I think read0 (0::) reads one line at a time, while 0: picks out columns. 0: will be faster if you know exactly which position you are looking at.
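
A rough C sketch of what "picking up columns" could look like, assuming a delimited parser that finds the end of the line with memchr and then searches for the delimiter inside that line. This is an illustration only, not kdb+ code; first_field is a hypothetical helper. It also shows why a file whose lines contain the delimiter would load only part of each line when just the first column is requested.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the length of the first delimited field of
   the line starting at buf, and set *next to the start of the following
   line (or to the end of the buffer if this was the last line). */
size_t first_field(const char *buf, size_t len, char delim, const char **next)
{
    assert(buf != NULL && next != NULL);
    if (len == 0) {
        *next = buf;
        return 0;
    }

    /* One memchr call to find where the line ends. */
    const char *nl = memchr(buf, '\n', len);
    size_t line_len = nl ? (size_t)(nl - buf) : len;
    *next = nl ? nl + 1 : buf + len;

    if (line_len == 0)
        return 0;               /* empty line, empty field */

    /* One memchr call to look for the delimiter inside that line. */
    const char *d = memchr(buf, delim, line_len);
    return d ? (size_t)(d - buf) : line_len;
}
```

With a space delimiter, a line such as "hello world" yields a first field of length 5 ("hello"), while a line with no space yields the whole line.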