
  • Trouble With Huge CSVs

    Posted by Laura on July 9, 2021 at 12:00 am

    Greetings All,

    I’ve got a couple of 40 GB CSVs that I’m hoping to perform some joins on.

    I do not know the column format or the headers, or whether the CSV even has headers.

    I’m working with a good bit of memory, with 256 GB accessible.

    Loading the files into memory clearly doesn’t work — as expected the program crashes.

    So I made my way here (the loading-from-large-files page). I understand I’ll have to convert my CSVs to splayed tables, save those tables down, and then work from there instead of using the CSVs.

    I’m able to see the rows inside the CSV with .Q.fs[0N!]`:file.csv, but I still don’t know the entirety of what’s inside.
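
    For what it’s worth, something like this also lets me peek at just the start of the file without loading it all (read0 with an offset and length; the 2000-byte cutoff is arbitrary):

    lines:"\n" vs read0(`:file.csv;0;2000)   / read only the first 2000 bytes, split into lines
    5#lines                                  / eyeball the first few rows / possible header line
    count "," vs first lines                 / number of fields in the first line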

    I go through this little bit,

     

    and obviously it’s too big and crashes the program. I try to insert the rows directly into a table on disk with .Q.fs[{`:newfile upsert flip colnames!("DFFFFIS";",")0:x}]`:file.csv and that crashes too.

    Should I be chunking this and going at it from that angle, or is there a better way to do this?

     

  • 2 Replies
  • Laura

    Administrator
    July 9, 2021 at 12:00 am

    I’ve chunked with .Q.fs[{`trade insert flip colnames!("**********";",")0:x}]`:filename and it runs until it crashes.

    I did some more research and thought it could be a garbage-collection issue, so I added a gc call, but that didn’t help either.
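
    Roughly like this, assuming the gc call is .Q.gc[] after each chunk (the "DFFFFIS" type string is from my earlier attempt and the column names are placeholders):

    colnames:`c1`c2`c3`c4`c5`c6`c7                                             / placeholder names, 7 to match "DFFFFIS"
    .Q.fs[{`trade insert flip colnames!("DFFFFIS";",")0:x; .Q.gc[]}]`:file.csv / gc after every chunk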

    Dumb question: is this because I’m using w32 instead of w64?

  • rocuinneagain

    Member
    July 11, 2021 at 12:00 am

    Yes, the w32 version has a limit on how much memory it can address; w64 does not have this restriction.
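
    You can confirm which build you are on, and keep an eye on memory usage, with standard q commands:

    .z.o     / reports the build/OS, e.g. `w32 or `w64
    \w       / workspace memory stats: used, heap, peak, wmax, mmap, ...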

     

    You could also stream the data to an on-disk table:

    .Q.fs[{`:trade/ upsert flip colnames!("**********";",")0:x}]`:filename  / parse each chunk and append it to the table on disk
    trade:get `:trade/                                                      / map the on-disk table back into the session
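
    The trailing slash on `:trade/ is what makes it a splayed (directory) table rather than a single flat file. As a rough end-to-end sketch, using the "DFFFFIS" type string from the original post and made-up column names (note that symbol columns generally need enumerating, e.g. with .Q.en, before they can be splayed):

    colnames:`date`p1`p2`p3`p4`qty`sym                                              / 7 hypothetical names to match "DFFFFIS"
    .Q.fs[{`:trade/ upsert .Q.en[`:.] flip colnames!("DFFFFIS";",")0:x}]`:file.csv  / enumerate syms, append each chunk on disk
    trade:get `:trade/                                                              / map the splayed table back into the session
    select count i by date from trade                                               / example query on the loaded data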
