KX Community

Re: Need help to read hdf. file written in python to kdb+?

  • rocuinneagain

    Member
    June 2, 2022 at 12:00 am

    This code creates a basic table in a file written by KX:

     

    t:([] AA:1 2;BB:3 4;CC:5 6) 
    .hdf5.createFile["byKX.h5"] 
    .hdf5.createGroup["byKX.h5";"project"] 
    .hdf5.writeData["byKX.h5";"project/table";t]

     

    If we expand what that code does to match the documentation, we can create exactly the same file with:

    https://code.kx.com/q/interfaces/hdf5/hdf5-types/#tables-and-dictionaries

     

    t:([] AA:1 2;BB:3 4;CC:5 6) 
    .hdf5.createFile["diy.h5"] 
    .hdf5.createGroup["diy.h5";"project"] 
    .hdf5.createGroup["diy.h5";"project/table"] 
    {.hdf5.writeData["diy.h5";"project/table/",string x;t x]} each cols t 
    .hdf5.writeAttr["diy.h5";"project/table";"datatype_kdb";"table"] 
    .hdf5.writeAttr["diy.h5";"project/table";"kdb_columns";cols t]

     

    Finally, this would be the Python equivalent:

     

    import h5py as h5
    import pandas as pd
    import numpy as np

    df = pd.DataFrame({"AA": [1, 2], "BB": [3, 4], "CC": [5, 6]})
    f = h5.File('forKX.h5', 'w')
    project = f.create_group("project")
    table = project.create_group("table")
    # Attributes the kdb+ interface uses to recognise the group as a table
    table.attrs["datatype_kdb"] = np.array([ord(c) for c in 'table'], dtype=np.int8)
    table.attrs["kdb_columns"] = [x.encode('ascii') for x in df.columns]
    # One dataset per column, named after the column
    for col in df.columns:
        table[col] = df[col].to_numpy()
    f.close()

     

    All three read in the same way:

     

    q).hdf5.readData["byKX.h5";"project/table"]
    AA BB CC
    --------
    1  3  5
    2  4  6
    q).hdf5.readData["diy.h5";"project/table"]
    AA BB CC
    --------
    1  3  5
    2  4  6
    q).hdf5.readData["forKX.h5";"project/table"]
    AA BB CC
    --------
    1  3  5
    2  4  6
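Before pointing readData at a file written outside kdb+, it can help to confirm that the layout described above is actually present: the two attributes on the group, plus one dataset per listed column. A minimal sketch in Python, assuming h5py and NumPy are installed. `check_kdb_table_layout` is a hypothetical helper name, not part of h5py or the KX interface, and it checks only the structure, not whether each column's type is supported.

```python
import h5py
import numpy as np


def check_kdb_table_layout(path, group):
    """Return a list of problems found; an empty list means the layout matches."""
    problems = []
    with h5py.File(path, "r") as f:
        table = f[group]
        # Both attributes must be present on the table group.
        for attr in ("datatype_kdb", "kdb_columns"):
            if attr not in table.attrs:
                problems.append(f"missing attribute {attr}")
        if problems:
            return problems
        # Assumption: column names come back as bytes or str depending on the writer.
        names = [c.decode("ascii") if isinstance(c, bytes) else str(c)
                 for c in np.atleast_1d(table.attrs["kdb_columns"])]
        # Every listed column needs a dataset of the same name in the group.
        for name in names:
            if name not in table or not isinstance(table[name], h5py.Dataset):
                problems.append(f"no dataset for column {name}")
    return problems
```

For example, `check_kdb_table_layout("forKX.h5", "project/table")` should return an empty list for the file written by the Python code above.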

     

     

    h5dump is useful to inspect h5 files.

    https://support.hdfgroup.org/HDF5/doc/RM/Tools/h5dump.htm

    Run against a file, it prints the shapes and types of the file's contents:

     

    h5dump diy.h5
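Where h5dump is not installed, a similar (much less detailed) listing can be produced from Python. A minimal sketch, assuming h5py is available; `describe` is a hypothetical helper name, not part of h5py or the KX interface.

```python
import h5py


def describe(path):
    """Return one line per group or dataset, followed by its attributes."""
    lines = []

    def visit(name, obj):
        # Datasets carry a shape and dtype; groups are just containers.
        if isinstance(obj, h5py.Dataset):
            lines.append(f"{name}: dataset shape={obj.shape} dtype={obj.dtype}")
        else:
            lines.append(f"{name}: group")
        for key, value in obj.attrs.items():
            lines.append(f"  @{key} = {value!r}")

    with h5py.File(path, "r") as f:
        # visititems walks every group and dataset below the root.
        f.visititems(visit)
    return lines
```

Calling `print("\n".join(describe("diy.h5")))` lists each group and dataset in the file with its shape, dtype, and attributes.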

     

    The three main takeaways are:

    1. Only supported types are available with this interface: https://code.kx.com/q/interfaces/hdf5/hdf5-types/#type-mapping

    2. In the real world, tabular data in a .h5 file from another source will not read straight into a kdb+ table. You will need to extract the data column by column, as I showed in a previous example.

    3. If you have a file the interface is unable to read, you can still use embedPy to manipulate the data and transfer it to kdb+.
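The column-by-column extraction in takeaway 2 can also be sketched on the Python side, for instance as a function to call through embedPy. This is only a sketch under assumptions: it expects every column to be a one-dimensional dataset sitting directly under one group, `read_columns` is a hypothetical helper name, and the embedPy transfer into kdb+ is not shown.

```python
import h5py


def read_columns(path, group):
    """Read every 1-D dataset directly under `group` into a dict of NumPy arrays."""
    columns = {}
    with h5py.File(path, "r") as f:
        for name, obj in f[group].items():
            # Skip nested groups and anything that is not a flat column.
            if isinstance(obj, h5py.Dataset) and obj.ndim == 1:
                columns[name] = obj[()]  # read the whole dataset into memory
    return columns
```

The resulting dict maps column names to arrays, which is close to the shape of a kdb+ column dictionary and can be turned into a table once it is on the q side.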