rocuinneagain
Forum Replies Created
-
rocuinneagain
MemberMay 11, 2023 at 12:00 am in reply to: Orphan memory in KDB process using Rserve
The src is available at https://github.com/KxSystems/embedR/blob/fa5101b64e15f9ba0aa5c20affc0cd041fb41bc0/src/rserver.c#L458
Have you tried calling
Rcmd "rm(temp)"
Rcmd "gc()"
q does not manage the memory for R – you must still delete R variables and call garbage collection yourself.
-
rocuinneagain
MemberMay 5, 2023 at 12:00 am in reply to: Async broadcast to websocket handles using internal function -25!
-25! is for use with IPC handles only, not websocket handles.
The reason is that for IPC handles there is a serialization step, and -25! is efficient because it allows serialization to run only once for many handles.
For websocket handles data is sent directly without any serialization step, so -25! would offer no benefit there.
- https://code.kx.com/q/basics/internal/#-25x-async-broadcast
- https://code.kx.com/q/basics/internal/#-38x-socket-table
q){([]h)!-38!h:.z.H}[]
h  | p f
---| ---
612| w t
580| q t
q)-25!(enlist 612i;"test")
'612 is not an ipc handle
  [0]  -25!(enlist 612i;"test")
       ^
q)-25!(enlist 580i;"test")
For websockets, if there is a large operation such as converting a table to JSON, you can ensure this is done only once before sending to multiple websockets:
neg[webSockethandles]@:.j.j bigTable
Wrapping the same in a helper:
wsBroadcast:{[handles;data] neg[(),handles]@:data}
wsBroadcast[myWebsocketHandles] .j.j bigTable
-
rocuinneagain
MemberMay 5, 2023 at 12:00 am in reply to: mmap increasing every time table is queried
All columns in a splayed table should have the same number of rows, so there was some issue with the writedown of this data. This is most likely the source of the issue.
Can you recreate the data from source/backup/TP-logs?
When you read/write you are losing 33199-22210=10989 rows of data from the "good" columns.
-
rocuinneagain
MemberMay 3, 2023 at 12:00 am in reply to: Issues with Outputs of Strings and Symbols
q strikes a balance in how much detail it displays.
Always giving full, exact information would make all output very messy – more like code than nicely displayed information for a human to view.
There are some tools and tips you can use to inspect items in more detail during development and debugging.
Using
.Q.s1
is the one I find most useful to drill down (along with the usual type, count, etc.)
https://code.kx.com/q/ref/dotq/#s1-string-representation
-1 .Q.s1 itemToInspect
A script designed to display items in detail nicely:
https://github.com/LeslieGoldsmith/dpy
-
The locking is done by the system call lockf (see link in previous message); this locks the file at the system level. No other process can get a lock on the file until the first process releases the lock.
-
rocuinneagain
MemberApril 21, 2023 at 12:00 am in reply to: mmap increasing every time table is queried
- Do you know what version of kdb+ wrote the files?
- Is the data compressed?
- Are there attributes on any of the affected columns?
- Are any of the columns linked columns?
- If you go through the columns one at a time do their counts all match?
- Can you read the bad partitions in q and write them to a temp HDB location? Do these rewritten files still show the same memory behaviour when you load the temp HDB?
-
rocuinneagain
MemberApril 14, 2023 at 12:00 am in reply to: mmap increasing every time table is queried
First suggestion would be to test against the latest version of 3.5.
Several fixes were released after the version you are using.
-
.Q.en
uses ? (enum extend), which does locking.
- https://code.kx.com/q/ref/dotq/#en-enumerate-varchar-cols
- https://code.kx.com/q/ref/enum-extend/#filepath
"The file is locked at a process level for writing during .Q.en only. Avoid reading from any file which may be being written to."
"The system call used is https://linux.die.net/man/3/lockf."
-
rocuinneagain
MemberApril 5, 2023 at 12:00 am in reply to: How to download attachments from *.eml file
kdb+ can parse binary files, as nicely shown at the recent KX meetup:
https://community.kx.com/t5/Events/KX-Community-Meetup-New-York/ba-p/13880
File formats can get complicated though so this can be a lot of work.
If I wanted to do this task quickly I would either:
a) Use a system call to a command-line tool to extract the files to disk and then read them in from there, writing them to the current directory or using the mktemp command to write in /var/tmp.
b) Wrap some existing Python code using embedPy to extract the email and attachments to JSON and read it into kdb+ this way.
Similar to how I did for XML with https://github.com/rianoc/qXML
Some discussions on the topic in the Python world:
(Note: I have not tested these for functionality or safety)
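As an illustration of option (b), here is a minimal sketch using only Python's standard-library email package (my own example, not from embedPy or the discussions above; the function name and JSON shape are assumptions):

```python
import email
import email.policy
import json

def extract_attachments(eml_bytes):
    """Parse raw .eml bytes and return attachments as a list of
    {"filename": ..., "content": hex-encoded bytes} dicts,
    ready to serialise to JSON for the kdb+ side."""
    msg = email.message_from_bytes(eml_bytes, policy=email.policy.default)
    attachments = []
    for part in msg.iter_attachments():
        payload = part.get_payload(decode=True) or b""
        attachments.append({
            "filename": part.get_filename() or "unnamed",
            "content": payload.hex(),
        })
    return attachments

# Build a small test email with one binary attachment
from email.message import EmailMessage
m = EmailMessage()
m["Subject"] = "test"
m.set_content("body text")
m.add_attachment(b"\x01\x02\x03", maintype="application",
                 subtype="octet-stream", filename="data.bin")
result = extract_attachments(m.as_bytes())
print(json.dumps(result))
```

The JSON string could then be read into kdb+ with .j.k, with the hex content decoded back to bytes on the q side.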
-
For KX Platform Dashboards on the backend follow:
The UI options** are covered on:
https://code.kx.com/dashboards/datasources/#streaming
(**as well as backend if instead using Dashboards Direct)
-
Most modern shells will be UTF-8 encoded rather than ASCII.
kdb+ is storing the information correctly – it is just being displayed differently than you expect.
On my machine:
$ locale
LANG=C.UTF-8
LANGUAGE=
LC_CTYPE="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_TIME="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_PAPER="C.UTF-8"
LC_NAME="C.UTF-8"
LC_ADDRESS="C.UTF-8"
LC_TELEPHONE="C.UTF-8"
LC_MEASUREMENT="C.UTF-8"
LC_IDENTIFICATION="C.UTF-8"
LC_ALL=
UTF-8 encoding for ô: 0xC3 0xB4
Testing this in kdb+
q)`char$0xC3B4
"ô"
q)`$`char$0xC3B4
`ô
For your shell to print the extended ASCII table characters you will need to make some changes outside of kdb+.
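The same byte-level check can be reproduced in Python (an illustrative aside, not part of the original post):

```python
# "ô" is a single Unicode code point (U+00F4) that UTF-8
# encodes as the two bytes 0xC3 0xB4
encoded = "ô".encode("utf-8")
print(encoded)  # b'\xc3\xb4'

# Decoding the same two bytes recovers the character
decoded = bytes([0xC3, 0xB4]).decode("utf-8")
print(decoded)  # ô
```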
-
rocuinneagain
MemberMarch 28, 2023 at 12:00 am in reply to: Orphan memory in KDB process using Rserve
- Is it the embedR project that you are using?
- Are you on the latest release of the code? https://github.com/KxSystems/embedR/releases
- Is it possible to share a small generic piece of code to reproduce the issue?
-
Hyperthreading does not cause problems for kdb+ but you should be aware of your core allocations to get the most out of your kdb+ licence.
Example:
- I have a 4 core 8 thread machine.
- I have a 4 core kdb+ licence.
- How best to allocate my licence?
Your machine will list cores 0,1,2,3,4,5,6,7.
You want to make sure you use your 4 core licence only once on each logical CPU core for best performance.
Inspecting
/proc/cpuinfo
gives you the information you need.
cpuinfo:.Q.id {{{(`$x[0])!x[1]}flip {ssr[;"\t";""] each trim ":" vs x}each x y}[x] each {{x[0]+til 1+x[1]-x[0]}each flip (0^1+prev x;-1+x)}where x~\:""}system"cat /proc/cpuinfo"
select processor,physicalid,siblings,coreid,cpucores from cpuinfo
processor physicalid siblings coreid cpucores
---------------------------------------------
,"0"      ,"0"       ,"8"     ,"0"   ,"4"
,"1"      ,"0"       ,"8"     ,"0"   ,"4"
,"2"      ,"0"       ,"8"     ,"1"   ,"4"
,"3"      ,"0"       ,"8"     ,"1"   ,"4"
,"4"      ,"0"       ,"8"     ,"2"   ,"4"
,"5"      ,"0"       ,"8"     ,"2"   ,"4"
,"6"      ,"0"       ,"8"     ,"3"   ,"4"
,"7"      ,"0"       ,"8"     ,"3"   ,"4"
Here the output suggests I should taskset to 0,2,4,6 or 1,3,5,7 so that I use each coreid only once.
This way I am getting the best usage from my kdb+ licence, as each q process is taxing a full core without overlap.
It’s possible on another OS/machine the layout may suggest 0-3 or 4-7.
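The same core-selection logic can be sketched in Python (a hypothetical helper, not from the original post; it parses the processor, physical id and core id fields as they appear in /proc/cpuinfo):

```python
def pick_one_logical_per_core(cpuinfo_text):
    """Given /proc/cpuinfo contents, return the first logical CPU
    for each (physical id, core id) pair - one entry per real core,
    suitable for passing to taskset."""
    chosen = {}
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            if ":" in line:
                k, _, v = line.partition(":")
                fields[k.strip()] = v.strip()
        key = (fields.get("physical id"), fields.get("core id"))
        # keep only the first logical processor seen for each real core
        chosen.setdefault(key, int(fields["processor"]))
    return sorted(chosen.values())

# Minimal 4-processor / 2-core example (hyperthread pairs 0&1 and 2&3)
sample = "\n\n".join(
    "processor\t: {p}\nphysical id\t: 0\ncore id\t: {c}".format(p=p, c=c)
    for p, c in [(0, 0), (1, 0), (2, 1), (3, 1)]
)
print(pick_one_logical_per_core(sample))  # [0, 2]
```

On the 4-core/8-thread layout shown above, this approach would select 0,2,4,6, matching the taskset suggestion.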
In this example, if only kdb+ is running on the machine, you may see a fractional performance benefit from disabling hyperthreading. It can depend on various hardware and workload variables.
Lots of discussion available online on the topic:
https://unix.stackexchange.com/questions/57920/how-do-i-know-which-processors-are-physical-cores
Note: some extra checks and verifications should be done if using Virtual Machines, as they may not pass all the correct detailed information through to the VM, and you may need to verify at the hypervisor level.
If you have an unlimited-cores licence the answer is a little more nuanced and use-case specific in how you may choose to taskset your processes. Or you may choose not to pin individual processes to cores at all, and instead let the OS move tasks around for best efficiency.
For machines heavily taxing IO, hyperthreading can help, specifically with reading of data from disk.
Once on to multi-CPU systems, NUMA also comes into play and should be paid attention to.
https://code.kx.com/q/kb/linux-production/#non-uniform-memory-access-numa-hardware
-
There’s a thread with some info on stackoverflow:
Running
strace -p pid
may help show where .Q.chk is falling down.
This is a basic outline of a helper function I created before to debug a 'part error when trying to load a HDB. It may be a good starting point for finding the root-cause issue for you too.
// https://code.kx.com/q/ref/system/#capture-stderr-output
q)tmp:first system"mktemp"
q){d:1_string x;{y:string y;(y;"D"$y),{r:system x;show last r;$[1~"J"$last r;(1b;-1_r;"");(0b;();-1_r)]} "ls ",x,"/",y," > ",tmp," 2>&1;echo $? >> ",tmp,";cat ",tmp}[d] each key x} `:badHDB
q)tab:flip `part`date`osError`files`error!flip {d:1_string x;{y:string y;(y;"D"$y),{r:system x;$[0~"J"$last r;(0b;-1_r;"");(1b;();first r)]} "ls ",x,"/",y," > ",tmp," 2>&1;echo $? >> ",tmp,";cat ",tmp}[d] each key x} `:badHDB
The resulting table:
part         date       osError files   error
---------------------------------------------------------------------------------------------
"2001.01.01" 2001.01.01 0       ,"tab1" ""
"2001.01.01"            0       ,"tab2" ""
"2002.01.01" 2002.01.01 1       ()      "ls: cannot open directory 'badHDB/2002.01.01': Permission denied"
For a larger HDB filter down to partitions with issues:
select from tab where or[null date;osError]
-
rocuinneagain
MemberMarch 15, 2023 at 12:00 am in reply to: ODBC setup for KDB+ to query other databases
ODBC:
- kdb/odbc.pdf at master johnanthonyludlow/kdb GitHub
- kdb/odbc.txt at master KxSystems/kdb GitHub
- kdb/odbc.k at master KxSystems/kdb GitHub
JDBC:
EmbedPy is also an option which has a lot of flexibility.
Some examples of Pandas/PyODBC/SQLAlchemy usage below:
// https://github.com/KxSystems/embedPy
system"\l p.q";
// https://github.com/KxSystems/ml
system"\l ml/ml.q";
.ml.loadfile`:init.q;
odbc:.p.import[`pyodbc];
pd:.p.import[`pandas];
connectString:";" sv {string[x],"=",y}(.)/:(
  (`Driver;"{ODBC Driver 17 for SQL Server}");
  (`Server;"server.domain.com\\DB01");
  (`Database;"Data");
  (`UID;"KX");
  (`PWD;"password")
  );
connSqlServer:odbc[`:connect][connectString];
data:.ml.df2tab pd[`:read_sql]["SELECT * FROM tableName";connSqlServer];
cursor:connSqlServer[`:cursor][];
cursor[`:execute]["TRUNCATE TABLE tableName"];
connSqlServer[`:commit][];
sa:.p.import`sqlalchemy;
engine:sa[`:create_engine]["mssql+pyodbc://KX:password@server.domain.com\\DB01/Data?driver=ODBC+Driver+17+for+SQL+Server"];
df:.ml.tab2df[data]; // Data to publish
df[`:to_sql]["tableName";engine;`if_exists pykw `append;`index pykw 0b];