KX Community


  • Fast file reading

    Posted by roryk on June 4, 2024 at 4:41 am

    I was testing the speed of loading in a file with one word on each line, and found reading it as a CSV is faster and uses less memory than read0, despite being a more roundabout method. Out of interest, is there a reason why? And is there anything faster?

    \ts:100 read0 words
    2297 18874480
    \ts:100 first (1#"*";" ")0: words
    1003 14680704

  • megan_mcp

    Administrator
    June 5, 2024 at 11:27 am

    Hi @roryk

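    If the file contains spaces, 0: loads only a subset of the data: with " " as the delimiter and a single column type, only the text before the first space on each line is kept.

    The difference could also just come down to read vs. mmap performance.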

    Hope this helps.

    Thanks,

    Megan

    • roryk

      Member
      June 6, 2024 at 5:59 am

      Hi, the file doesn’t contain spaces, it’s just one word per line. So the results are the same. If mmap is faster, is there a reason read0 doesn’t use it, or is it just a potential optimisation that hasn’t been implemented?

      It doesn’t really matter, as the difference is only a couple of milliseconds even for a fairly large file, but it would be nice to have a deeper understanding of the performance of various operations.

      • megan_mcp

        Administrator
        June 7, 2024 at 12:00 pm

        Hi @roryk

        I reached out to one of our developers on this and this was their response:

        “Actually looks like read0 is using a load of memcmp calls & scanning for \n
        where 0: is using memchr to find it in a single call:

        q)\ts:1 (1#"*";"-")0:`:testf
        221 36800
        seconds  usecs/call     calls      function
        -------- ----------- --------- --------------------
        0.095358          95      1003 memchr
        0.060194         106       564 memmove
        240799   0.000319 memchr("qwertyuiopasdfghjklzxcvbnm\nqwert"..., '\n', 13311) = 0x7fa1eaf3d0d7
        240799   0.005258 memchr("qwertyuiopasdfghjklzxcvbnm\nqwert"..., '-', 26) = 0
        240799   0.005147 memmove(0x7fa1e6b121d0, "qwertyuiopasdfghjklzxcvbnm", 26) = 0x7fa1e6b121d0
        
        q)\ts:1 read0 `:testf
        2161 52624
        seconds  usecs/call     calls      function
        -------- ----------- --------- --------------------
        1.332051          98     13505 memcmp
        0.102397          97      1046 memmove
        240799   0.000144 memcmp(0x4bf876, 0x7fa1e6b7f410, 1, 113) = 0xffffff93
        240799   0.000152 memcmp(0x4bf876, 0x7fa1e6b7f411, 1, 119) = 0xffffffa5
        240799   0.000172 memcmp(0x4bf876, 0x7fa1e6b7f412, 1, 101) = 0xffffff98
        240799   0.000160 memcmp(0x4bf876, 0x7fa1e6b7f413, 1, 114) = 0xffffff96
        240799   0.000144 memcmp(0x4bf876, 0x7fa1e6b7f414, 1, 116) = 0xffffff91
        240799   0.000144 memcmp(0x4bf876, 0x7fa1e6b7f415, 1, 121) = 0xffffff95
        240799   0.000144 memcmp(0x4bf876, 0x7fa1e6b7f416, 1, 117) = 0xffffffa1
        240799   0.000154 memcmp(0x4bf876, 0x7fa1e6b7f417, 1, 105) = 0xffffff9b
        240799   0.000162 memcmp(0x4bf876, 0x7fa1e6b7f418, 1, 111) = 0xffffff9a
        240799   0.000236 memcmp(0x4bf876, 0x7fa1e6b7f419, 1, 112) = 0xffffffa9
        240799   0.000150 memcmp(0x4bf876, 0x7fa1e6b7f41a, 1, 97) = 0xffffff97
        240799   0.000149 memcmp(0x4bf876, 0x7fa1e6b7f41b, 1, 115) = 0xffffffa6
        240799   0.000149 memcmp(0x4bf876, 0x7fa1e6b7f41c, 1, 100) = 0xffffffa4
        240799   0.000222 memcmp(0x4bf876, 0x7fa1e6b7f41d, 1, 102) = 0xffffffa3
        240799   0.000175 memcmp(0x4bf876, 0x7fa1e6b7f41e, 1, 103) = 0xffffffa2
        240799   0.000313 memcmp(0x4bf876, 0x7fa1e6b7f41f, 1, 104) = 0xffffffa0
        240799   0.000311 memcmp(0x4bf876, 0x7fa1e6b7f420, 1, 106) = 0xffffff9f
        240799   0.000312 memcmp(0x4bf876, 0x7fa1e6b7f421, 1, 107) = 0xffffff9e
        240799   0.000239 memcmp(0x4bf876, 0x7fa1e6b7f422, 1, 108) = 0xffffff90
        240799   0.000149 memcmp(0x4bf876, 0x7fa1e6b7f423, 1, 122) = 0xffffff92
        240799   0.000148 memcmp(0x4bf876, 0x7fa1e6b7f424, 1, 120) = 0xffffffa7
        240799   0.000218 memcmp(0x4bf876, 0x7fa1e6b7f425, 1, 99) = 0xffffff94
        240799   0.000156 memcmp(0x4bf876, 0x7fa1e6b7f426, 1, 118) = 0xffffffa8
        240799   0.000148 memcmp(0x4bf876, 0x7fa1e6b7f427, 1, 98) = 0xffffff9c
        240799   0.000208 memcmp(0x4bf876, 0x7fa1e6b7f428, 1, 110) = 0xffffff9d
        240799   0.000326 memcmp(0x4bf876, 0x7fa1e6b7f429, 1, 109) = 0
        240799   0.000270 memmove(0x7fa1e6b121d0, "qwertyuiopasdfghjklzxcvbnm", 26) = 0x7fa1e6b121d0
        240799   0.000174 memmove(0x7fa1e6b75f78, "\300!\261\346\241\177\0\0", 8) = 0x7fa1e6b75f78
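
The two call patterns in the traces can be pictured with a minimal C sketch. This is an illustration only, not kdb+ source: the function names `count_lines_bytewise` and `count_lines_memchr` are hypothetical, and the sketch assumes the goal is simply to locate each `\n` in a buffer. The first function makes one `memcmp` call per byte, as the read0 trace appears to; the second makes one `memchr` call per line, as the 0: trace appears to.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count newlines by comparing one byte at a time.
   Each iteration is a separate library call of length 1, which is the
   shape of the memcmp calls in the read0 trace. */
size_t count_lines_bytewise(const char *buf, size_t len) {
    static const char nl = '\n';
    size_t lines = 0;
    for (size_t i = 0; i < len; i++) {
        if (memcmp(&nl, buf + i, 1) == 0)
            lines++;
    }
    return lines;
}

/* Hypothetical helper: count newlines by letting memchr scan ahead to
   the next '\n', so there is one library call per line, not per byte. */
size_t count_lines_memchr(const char *buf, size_t len) {
    size_t lines = 0;
    const char *p = buf;
    const char *end = buf + len;
    while (p < end) {
        const char *hit = memchr(p, '\n', (size_t)(end - p));
        if (hit == NULL)
            break;          /* no further newline in the buffer */
        lines++;
        p = hit + 1;        /* resume just past the newline found */
    }
    return lines;
}
```

Both functions return the same count for the same buffer; only the number of library calls differs. An optimising compiler may inline a 1-byte `memcmp`, so a timing gap like the one traced above is not guaranteed to reproduce with this sketch.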
        • megan_mcp

          Administrator
          June 7, 2024 at 12:10 pm

          @roryk

          I can follow up on this further if you would like to know why read0 doesn’t use memchr (and/or mmap)?

          • roryk

            Member
            June 9, 2024 at 7:24 pm

            Hi Megan,
            Thanks for the information!
            It would be interesting to know the reason, but I don’t want to take up too much of a dev’s time for a random curiosity. If someone is free to look into it and would want to, then it would be interesting to know but no problem if not.

  • sujoy

    Member
    June 16, 2024 at 3:46 am

    I think read0 (0::) reads one line at a time, while 0: picks out columns. 0: will be faster if you know exactly which position you are looking for.
