Re: Need help to read hdf. file written in python to kdb+?
This code uses the KX HDF5 interface to write a basic table to a file:
t:([] AA:1 2;BB:3 4;CC:5 6)                   / sample 2-row table
.hdf5.createFile["byKX.h5"]                   / create the HDF5 file
.hdf5.createGroup["byKX.h5";"project"]        / parent group
.hdf5.writeData["byKX.h5";"project/table";t]  / write the whole table in one call
If we expand out what it is doing to match the layout described in the documentation (https://code.kx.com/q/interfaces/hdf5/hdf5-types/#tables-and-dictionaries), we can create the same file step by step:
t:([] AA:1 2;BB:3 4;CC:5 6)
.hdf5.createFile["diy.h5"]
.hdf5.createGroup["diy.h5";"project"]
/ the table itself becomes a group
.hdf5.createGroup["diy.h5";"project/table"]
/ one dataset per column, named after the column
{.hdf5.writeData["diy.h5";"project/table/",string x;t x]} each cols t
/ attributes that tell the interface to read the group back as a table
.hdf5.writeAttr["diy.h5";"project/table";"datatype_kdb";"table"]
.hdf5.writeAttr["diy.h5";"project/table";"kdb_columns";cols t]
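To make that layout explicit, here is a small Python sketch that only describes (and does not write) what the expanded q code creates for a table: one group for the table, one dataset per column named after the column, and the two attributes datatype_kdb and kdb_columns on the group. The helper name kdb_table_layout is hypothetical and not part of any KX or h5py API; it assumes the table is given as a dict of column name to values, with dict order taken as column order.

```python
# Hypothetical helper: describes (does not write) the HDF5 objects the
# expanded q code creates for one table. Dict order is taken as column order.

def kdb_table_layout(group_path, table):
    """Describe the layout used for `table` (a dict of column name -> values)
    when it is stored under `group_path`."""
    return {
        # the table itself is a group
        "group": group_path,
        # one dataset per column, named after the column
        "datasets": {f"{group_path}/{name}": list(values)
                     for name, values in table.items()},
        # attributes set on the group with .hdf5.writeAttr
        "attrs": {"datatype_kdb": "table", "kdb_columns": list(table)},
    }


if __name__ == "__main__":
    t = {"AA": [1, 2], "BB": [3, 4], "CC": [5, 6]}
    layout = kdb_table_layout("project/table", t)
    for path, values in layout["datasets"].items():
        print(path, values)
    print(layout["attrs"])
```

Comparing this description with the h5dump output of diy.h5 is a quick way to check a file produced by another tool against the same layout.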
Finally, this would be the Python equivalent:
import h5py as h5
import numpy as np
import pandas as pd

df = pd.DataFrame({"AA": [1, 2], "BB": [3, 4], "CC": [5, 6]})

with h5.File("forKX.h5", "w") as f:
    project = f.create_group("project")
    table = project.create_group("table")
    # attributes the KX interface uses to recognise the group as a table
    table.attrs["datatype_kdb"] = np.array([ord(c) for c in "table"], dtype=np.int8)
    table.attrs["kdb_columns"] = [x.encode("ascii") for x in df.columns]
    # one dataset per column
    for col in df.columns:
        table[col] = df[col].to_numpy()
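The only non-obvious part of the Python version is how the two attributes are encoded: datatype_kdb is stored as an int8 array of character codes and kdb_columns as a list of ASCII byte strings. Below is a minimal stdlib-only sketch of that encoding with hypothetical helper names, assuming the type tag and column names are plain ASCII so that every character code fits in an int8.

```python
# Hypothetical helpers mirroring the attribute encoding in the Python example.
# Assumption: the type tag and column names are plain ASCII.

def encode_datatype(tag):
    """Character codes for datatype_kdb (stored with dtype int8 in the example)."""
    codes = [ord(c) for c in tag]
    # int8 holds -128..127, so only ASCII codes (0..127) fit
    if any(code > 127 for code in codes):
        raise ValueError(f"{tag!r} is not ASCII")
    return codes


def decode_datatype(codes):
    """Turn the stored character codes back into the tag string."""
    return "".join(chr(code) for code in codes)


def encode_columns(columns):
    """ASCII byte strings for kdb_columns; raises UnicodeEncodeError otherwise."""
    return [name.encode("ascii") for name in columns]


if __name__ == "__main__":
    print(encode_datatype("table"))                   # [116, 97, 98, 108, 101]
    print(decode_datatype([116, 97, 98, 108, 101]))   # table
    print(encode_columns(["AA", "BB", "CC"]))          # [b'AA', b'BB', b'CC']
```

These lists are what the example above passes to np.array(..., dtype=np.int8) and to table.attrs["kdb_columns"].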
All three files read back into the same kdb+ table:
q).hdf5.readData["byKX.h5";"project/table"]
AA BB CC
--------
1  3  5
2  4  6
q).hdf5.readData["diy.h5";"project/table"]
AA BB CC
--------
1  3  5
2  4  6
q).hdf5.readData["forKX.h5";"project/table"]
AA BB CC
--------
1  3  5
2  4  6
h5dump is useful to inspect h5 files.
https://support.hdfgroup.org/HDF5/doc/RM/Tools/h5dump.htm
When run on a file, it prints the structure of the contents, including the shape and type of each dataset and attribute:
h5dump diy.h5
The three main takeaways are:
1. Only the types listed in the type mapping are supported by this interface: https://code.kx.com/q/interfaces/hdf5/hdf5-types/#type-mapping
2. In the real world, tabular data written to a .h5 file by another source will not read straight into a kdb+ table. You will need to extract the data column by column, as I showed in a previous example.
3. If you have a file the interface is unable to read, you can still use embedPy to manipulate the data in Python and transfer it to kdb+.
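For the column-by-column route in point 2, one check worth doing before handing the data to kdb+ is that the extracted columns actually form a table: a kdb+ table needs every column to have the same length. Below is a hypothetical stdlib-only sketch (assemble_columns is not an existing API). It assumes each column has already been read out of the .h5 file into a Python list keyed by name, for example with h5py, possibly called through embedPy.

```python
# Hypothetical helper: checks that columns extracted one at a time from a
# foreign .h5 file form a valid table before they are passed on to kdb+.

def assemble_columns(columns, order):
    """Return `columns` (name -> list of values) as a dict in `order`.

    Raises ValueError if a column in `order` is missing or if the
    columns have different lengths, since a kdb+ table cannot be built
    from ragged columns.
    """
    missing = [name for name in order if name not in columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    lengths = {name: len(columns[name]) for name in order}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"column lengths differ: {lengths}")
    return {name: list(columns[name]) for name in order}


if __name__ == "__main__":
    extracted = {"CC": [5, 6], "AA": [1, 2], "BB": [3, 4]}
    print(assemble_columns(extracted, ["AA", "BB", "CC"]))
    try:
        assemble_columns({"AA": [1, 2], "BB": [3]}, ["AA", "BB"])
    except ValueError as err:
        print("rejected:", err)
```

Passing an explicit order also lets you fix the column order you want on the kdb+ side, independent of the order in which the datasets were stored in the file.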