Need help reading an HDF5 file written in Python into kdb+
Posted by mshk on May 30, 2022 at 12:00 am
Hi Community,
I am using the HDF5 interface (https://code.kx.com/q/interfaces/hdf5/) and trying to read an HDF file (written in Python) into kdb+ (q).
I have attached the HDF file and the error message.
Please advise how to read this file.
Regards,
Marion
3 Replies
-
The error message JPEG looks like source code, and I don't see an HDF file attached. Did I miss something?
-
If you run through the example in q and then inspect the created file in Python, you will see what the interface expects:
https://code.kx.com/q/interfaces/hdf5/examples/#create-a-dataset
Each table column is stored as its own dataset inside a group:
>>> import h5py as h5
>>> data = h5.File('experiments.h5', 'r')
>>> data['experiment2_tables']
<HDF5 group "/experiment2_tables" (1 members)>
>>> data['experiment2_tables/tab_dset']
<HDF5 group "/experiment2_tables/tab_dset" (5 members)>
>>> data['experiment2_tables/tab_dset/class']
<HDF5 dataset "class": shape (10000,), type "<i2">
(The filename must end in ‘.h5’)
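The h5py output above reports the dataset type as "<i2". This is NumPy type-string notation: a byte-order character, a kind character, then the item size in bytes. The small sketch below decodes a few common cases; the function and lookup tables are illustrative names, not part of h5py or the KX interface, and only a handful of kinds are covered. A 2-byte signed integer has the same width as a kdb+ short, but check the type-mapping page for how the interface actually maps each type.

```python
# Illustrative decoder for NumPy-style type strings such as "<i2",
# as printed by h5py for a dataset. Only a few kinds are covered.
BYTE_ORDER = {"<": "little-endian", ">": "big-endian",
              "|": "not applicable", "=": "native"}
KIND = {"b": "boolean", "i": "signed integer", "u": "unsigned integer",
        "f": "floating point", "S": "byte string"}


def describe_typestr(typestr: str) -> tuple[str, str, int]:
    """Return (byte order, kind, item size in bytes) for e.g. '<i2'."""
    order, kind, size = typestr[0], typestr[1], int(typestr[2:])
    return BYTE_ORDER[order], KIND[kind], size


print(describe_typestr("<i2"))  # ('little-endian', 'signed integer', 2)
```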
To store data from Python, match the layout the interface produces: one dataset per column, all inside a group.
import h5py as h5
import pandas as pd

df = pd.DataFrame({"AA":[1, 2], "BB":[3, 4], "CC":[5, 6]})
f = h5.File('forKX.h5','w')
project = f.create_group("project")      # top-level group
table = project.create_group("table")    # one group per table
for col in df.columns:
    table[col] = df[col].to_numpy()      # one dataset per column
f.close()
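The loop above writes one dataset per column, at the path group/table/column. As a stdlib-only sketch (the helper name dataset_paths is hypothetical), this lists the paths such a layout creates; each one is the path you would pass to .hdf5.readData to get a single column back.

```python
# Hypothetical helper: list the dataset paths produced by a
# one-dataset-per-column layout like the h5py loop above.
def dataset_paths(group: str, table: str,
                  columns: dict[str, list]) -> dict[str, list]:
    """Map 'group/table/<column>' -> that column's values."""
    return {f"{group}/{table}/{name}": values
            for name, values in columns.items()}


paths = dataset_paths("project", "table",
                      {"AA": [1, 2], "BB": [3, 4], "CC": [5, 6]})
for path in paths:
    print(path)
# project/table/AA
# project/table/BB
# project/table/CC
```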
kdb+ still does not know that you intend this data to be a table.
As outlined in the docs (https://code.kx.com/q/interfaces/hdf5/hdf5-types/#tables-and-dictionaries), attributes would be needed for that.
Without the attributes, you can reshape the columns into a table like so:
q){flip x!{.hdf5.readData["forKX.h5";"project/table/",string x]} each x}`AA`BB`CC
AA BB CC
--------
1  3  5
2  4  6
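For readers less familiar with q, here is a rough plain-Python analogue of that one-liner. It is a sketch under stated assumptions: read_column is a hypothetical stand-in for .hdf5.readData, backed here by an in-memory dict, and the result is shown as a list of row dicts, whereas q keeps a table column-oriented. The x!...each x part builds a dictionary from column names to column vectors, and flip turns that dictionary into a table, which requires all columns to have the same length.

```python
from typing import Callable, Sequence


def read_table(read_column: Callable[[str], list], prefix: str,
               names: Sequence[str]) -> list[dict]:
    """Rough analogue of q's  flip x!{readData[...]} each x."""
    # x!{...} each x : map each column name to the vector read from its dataset
    columns = {name: read_column(prefix + name) for name in names}
    # flip needs equal-length columns (q signals 'length otherwise)
    if len({len(values) for values in columns.values()}) > 1:
        raise ValueError("columns differ in length")
    # flip : present the column dictionary as rows
    return [dict(zip(columns, row)) for row in zip(*columns.values())]


# Hypothetical in-memory stand-in for the datasets in forKX.h5
store = {"project/table/AA": [1, 2],
         "project/table/BB": [3, 4],
         "project/table/CC": [5, 6]}

rows = read_table(store.__getitem__, "project/table/", ["AA", "BB", "CC"])
print(rows)  # [{'AA': 1, 'BB': 3, 'CC': 5}, {'AA': 2, 'BB': 4, 'CC': 6}]
```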
-
This q code uses the interface to write a basic table to a file:
t:([] AA:1 2;BB:3 4;CC:5 6)
.hdf5.createFile["byKX.h5"]
.hdf5.createGroup["byKX.h5";"project"]
.hdf5.writeData["byKX.h5";"project/table";t]
If we expand what it is doing to match the documentation (https://code.kx.com/q/interfaces/hdf5/hdf5-types/#tables-and-dictionaries), we can create an identical file by hand:
t:([] AA:1 2;BB:3 4;CC:5 6)
.hdf5.createFile["diy.h5"]
.hdf5.createGroup["diy.h5";"project"]
.hdf5.createGroup["diy.h5";"project/table"]
{.hdf5.writeData["diy.h5";"project/table/",string x;t x]} each cols t
.hdf5.writeAttr["diy.h5";"project/table";"datatype_kdb";"table"]
.hdf5.writeAttr["diy.h5";"project/table";"kdb_columns";cols t]
Finally, this would be the Python equivalent:
import h5py as h5
import pandas as pd
import numpy as np

df = pd.DataFrame({"AA":[1, 2], "BB":[3, 4], "CC":[5, 6]})
f = h5.File('forKX.h5','w')
project = f.create_group("project")
table = project.create_group("table")
table.attrs["datatype_kdb"] = np.array([ord(c) for c in 'table'], dtype=np.int8)
table.attrs["kdb_columns"] = [x.encode('ascii') for x in df.columns]
for col in df.columns:
    table[col] = df[col].to_numpy()
f.close()
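The two attributes are what mark the group as a table: datatype_kdb holds the characters of "table" (written from Python as int8 character codes) and kdb_columns lists the column names as ASCII bytes. The stdlib-only sketch below builds the same values, using array("b") in place of np.int8, and adds a hypothetical check that a group's attributes and members look like a kdb+ table. The helper names are illustrative, and the rule that every listed column must exist as a dataset is an assumption rather than something taken from the KX docs.

```python
from array import array


def encode_attrs(columns: list[str]) -> dict:
    # Same values as the h5py example: int8 char codes and ASCII bytes
    return {
        "datatype_kdb": array("b", (ord(c) for c in "table")),
        "kdb_columns": [c.encode("ascii") for c in columns],
    }


def is_kdb_table(attrs: dict, member_names: set[str]) -> bool:
    """Hypothetical check: attributes mark a table and list existing datasets."""
    datatype = "".join(chr(code) for code in attrs.get("datatype_kdb", []))
    columns = [c.decode("ascii") for c in attrs.get("kdb_columns", [])]
    # Assumption: every listed column must exist as a dataset in the group
    return datatype == "table" and bool(columns) and set(columns) <= member_names


attrs = encode_attrs(["AA", "BB", "CC"])
print(list(attrs["datatype_kdb"]))              # [116, 97, 98, 108, 101]
print(is_kdb_table(attrs, {"AA", "BB", "CC"}))  # True
print(is_kdb_table({}, {"AA", "BB", "CC"}))     # False
```

With h5py you could, in principle, call is_kdb_table(dict(table.attrs), set(table.keys())) on an opened group; that combination is untested here.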
All three read in the same way:
q).hdf5.readData["byKX.h5";"project/table"]
AA BB CC
--------
1  3  5
2  4  6
q).hdf5.readData["diy.h5";"project/table"]
AA BB CC
--------
1  3  5
2  4  6
q).hdf5.readData["forKX.h5";"project/table"]
AA BB CC
--------
1  3  5
2  4  6
h5dump is useful for inspecting .h5 files:
https://support.hdfgroup.org/HDF5/doc/RM/Tools/h5dump.htm
Run it on a file to print the shapes and types of its contents:
h5dump diy.h5
The three main takeaways are:
1. Only supported types are available with this interface: https://code.kx.com/q/interfaces/hdf5/hdf5-types/#type-mapping
2. In practice, tabular data written to a .h5 file by another source will not read straight into a kdb+ table. You will need to extract the data column by column, as shown in my earlier example.
3. If you have a file that the interface is unable to read, you can still use embedPy to manipulate the data in Python and transfer it to kdb+.