rocuinneagain
Forum Replies Created
-
rocuinneagain
MemberJune 17, 2022 at 12:00 am in reply to: for hdb query, why is it fast when including where date>$date$Hi renbright –
There are some posts showing how to embed code which will make your question more clear:
- Adding inline code to your questions & messages
- Creating q code block with syntax highlighting in a message
Can you share more complete example?
The second of these queries is faster because it reads less data:
//Retrieves data from all dates select from table //Retrieves only the last 5 days of data select from table where date>(.z.d-5)
-
1.
An example data publisher is included in documentation https://qpython.readthedocs.io/en/latest/usage-examples.html#data-publisher
2.
For going straight to HDB you can write a custom
.u.upd
yourself to write direct- https://code.kx.com/q/ref/upsert/#splayed-table
- https://code.kx.com/q/ref/asc/#sorting-data-on-disk
- https://code.kx.com/q/ref/set-attribute/#grouped-and-parted
Or use a file based batch load approach to backfill.
Articles:
- https://code.kx.com/q/wp/data-loaders/
- https://kx.com/blog/kx-product-insights-parsing-data-in-kdb/
- https://kx.com/blog/kdb-q-insights-parsing-json-files/
Documentation:
- https://code.kx.com/q/ref/dotj/
- https://code.kx.com/q/ref/file-text/#load-csv
- https://code.kx.com/q/interfaces/arrow/
- https://code.kx.com/q/interfaces/hdf5/
-
rocuinneagain
MemberJune 2, 2022 at 12:00 am in reply to: Need help to read hdf. file written in python to kdb+?This code creates a basic table in a file written by KX:
t:([] AA:1 2;BB:3 4;CC:5 6) .hdf5.createFile["byKX.h5"] .hdf5.createGroup["byKX.h5";"project"] .hdf5.writeData[fname;"project/table";t]
If we expand out what it is doing to match the documentation we can create the exact same file with:
https://code.kx.com/q/interfaces/hdf5/hdf5-types/#tables-and-dictionaries
t:([] AA:1 2;BB:3 4;CC:5 6) .hdf5.createFile["diy.h5"] .hdf5.createGroup["diy.h5";"project"] .hdf5.createGroup["diy.h5";"project/table"] {.hdf5.writeData["diy.h5";"project/table/",string x;t x]} each cols t .hdf5.writeAttr["diy.h5";"project/table";"datatype_kdb";"table"] .hdf5.writeAttr["diy.h5";"project/table";"kdb_columns";cols t]
Finally this would be the python equivalent:
import h5py as h5 import pandas as pd import numpy as np df = pd.DataFrame({"AA":[1, 2], "BB":[3, 4], "CC":[5, 6]}) f = h5.File('forKX.h5','w') project = f.create_group("project") table = project.create_group("table") table.attrs["datatype_kdb"] = np.array( [ord(c) for c in 'table'], dtype=np.int8) table.attrs["kdb_columns"] = [x.encode('ascii') for x in df.columns] for col in df.columns: table[col] = df[col].to_numpy() f.close()
All three read in the same way:
q).hdf5.readData[“byKX.h5″;”project/table”] AA BB CC ——– 1 3 5 2 4 6 q).hdf5.readData[“diy.h5″;”project/table”] AA BB CC ——– 1 3 5 2 4 6 q).hdf5.readData[“forKX.h5″;”project/table”] AA BB CC ——– 1 3 5 2 4 6
h5dump is useful to inspect h5 files.
https://support.hdfgroup.org/HDF5/doc/RM/Tools/h5dump.htm
Used it will print the shapes and types of the contents of your file
h5dump diy.h5
The main 3 takeaways are:
1. Only supported types are available with this interface https://code.kx.com/q/interfaces/hdf5/hdf5-types/#type-mapping
2. In the real world tabular data you have from another source in a .h5 will not read straight in to a kdb+ table. You will need to extract the data column by column as I showed in a previous example
3. If you have a file the interface is unable to read you can still use embedPy to manipulate the data and transfer to kdb+
-
rocuinneagain
MemberJune 2, 2022 at 12:00 am in reply to: Need help to read hdf. file written in python to kdb+?If you run through the example in q then inspect the created file in python you will see what the interface expects
https://code.kx.com/q/interfaces/hdf5/examples/#create-a-dataset
The table columns are stored individually inside groups
>>>data = h5.File('experiments.h5', 'r') >>> data['experiment2_tables'] <HDF5 group "/experiment2_tables" (1 members)> >>> data['experiment2_tables/tab_dset'] <HDF5 group "/experiment2_tables/tab_dset" (5 members)> >>> data['experiment2_tables/tab_dset/class'] <HDF5 dataset "class": shape (10000,), type "<i2">
(The filename must end in ‘.h5’)
For you to store data from Python you should match this style using groups for columns.
import h5py as h5 import pandas as pd df = pd.DataFrame({"AA":[1, 2], "BB":[3, 4], "CC":[5, 6]}) f = h5.File('forKX.h5','w') project = f.create_group("project") table = project.create_group("table") for col in df.columns: table[col] = df[col].to_numpy() f.close()
kdb+ still does not know you intend this data to be a table.
As outline in the docs https://code.kx.com/q/interfaces/hdf5/hdf5-types/#tables-and-dictionaries
Attributes would be needed.
Without the attributes you can reshape in to a table like so:
q){flip x!{.hdf5.readData["forKX.h5";"project/table/",string x]} each x}`AA`BB`CC AA BB CC -------- 1 3 5 2 4 6
-
rocuinneagain
MemberMay 24, 2022 at 12:00 am in reply to: What is the role played by key columns in a keyed table [ query join/performance ] ?For optimisations in qsql queries you should look to attributes.
Here applying the unique attribute to the Name column:
q)t3:update `u#Name from t1 q)\ts do[100000;select from t3 where Name=`Anand] 64 1808
-
rocuinneagain
MemberMay 24, 2022 at 12:00 am in reply to: What is the role played by key columns in a keyed table [ query join/performance ] ?You can see the speed improvement if you index rather than query using qsql:
q)select from t1 where Name=`Anand Name Age Salary ---------------- Anand 10 1000 q)select from t2 where Name=`Anand Name | Age Salary -----| ---------- Anand| 10 1000 q)t2`Anand Age | 10 Salary| 1000 q)\ts do[100000;select from t1 where Name=`Anand] 164 1808 q)\ts do[100000;select from t2 where Name=`Anand] 174 1808 q)\ts do[100000;t2`Anand] 71 960
-
This becomes easier by splitting the problem in 2 parts.
- Finding the indexes where needed number of 1b is hit
- Finding the indexes where needed number of 0b is hit
These are the only indexes that matter. Then a prototype list can be populated with nulls before having the important indexes overlaid. Then the prevailing values are carried forward by fills.
q)f:{o:count[z]#0N;o:@[o;where x=x msum z;:;1];y:y+1;"b"$0^fills @[o;where y=y msum not z;:;0]} q)f[3;2] 00001111000111001101000b 00000011110001111111110b q)f[3;2] 110010111001111b 000000001111111b
-
Is the
/
for ofdo
what you are looking for?https://code.kx.com/q/ref/accumulators/#do
q)750 +/2 2 754
-
Something more like:
q)750{(x[0]+1;x[1]+x[2];x[2])}/0
0 2 750 1500 2
Or use a dictionary for readability:
q)750{x[`ID]+:1;x[`y]:sum x`y`z;x}/`ID`y`z!0 0 2
ID| 750
y | 1500
z | 2
-
Are you on Windows/Linux/Mac?
Esc
works in Windows (PowerShell + CMD)- In Linux (Bash)
Ctrl
+E
,Ctrl
+U
https://stackoverflow.com/a/16687377/4256419
On Linux/Mac rlwrap is suggested.
-
rocuinneagain
MemberMay 5, 2022 at 12:00 am in reply to: Creating q code block with syntax highlighting in a messageInstructions now also added for adding inline code to your questions & messages.
-
rocuinneagain
MemberApril 29, 2022 at 12:00 am in reply to: Question about query that stuck the KDB processhttps://code.kx.com/q4m3/6_Functions/#677-more-over-iteration
The final overload of
/
is equivalent to a while loop in imperative programming. It provides a declarative way to specify a test to end the iteration. Thinking functionally, we provide a predicate function that is applied to the result at each step. The iteration continues as long as the predicate result is1b
and stops otherwise. For example, we stop the computation of the Fibonacci sequence once it exceeds 1000 as follows.q)fib:{x,sum -2#x} q)fib/[{1000>last x}; 1 1] 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597
Your code accidentally boiled down to an infinite loop as the result never moves to 0b
1 1/[sums;1]
/Each iteration it runs 1 1[1] / Which returns 1 /Then sums 1 /This returns 1 /This is a non zero value which gets treated as 1b so the loop starts again /Infinite loop
-
rocuinneagain
MemberApril 29, 2022 at 12:00 am in reply to: Question about query that stuck the KDB processUsing parse helps to see what is happening.
Parsing a simplified version of your query helped to show the order of execution in the parse tree:
q)-1 .Q.s1 last value last parse "select sums(size)/sum(size) from tab"; ((/;`size);+;(sum;`size))
The q equivalent can then be shown to have the same parse tree
q)-1 .Q.s1 parse"size/[sums;sum size]"; ((/;`size);+;(sum;`size))
-
rocuinneagain
MemberApril 29, 2022 at 12:00 am in reply to: [checkHost.sh] Failed error when installing Platform on Ubuntu 20.04https://code.kx.com/platform/deployment/lin_prerequisites/#host-configuration
Kdb+ licenses (
k4.lic
) are tied to the hostname of a machine and specifically the hostname in FQDN (<HOSTNAME>.<DOMAINNAME>
) format.The installer expects the hostname to be the the required FQDN format .i.e
server1.company.com
-
rocuinneagain
MemberApril 22, 2022 at 12:00 am in reply to: Insert a not matching dictionary to a tableq){k:key y;x upsert enlist (k where k in cols x)#y}[x;td2] a b c ------- 1 I 10 2 J 20 3 K 30 99 a