

pcarroll
Forum Replies Created
-
pcarroll
MemberMay 24, 2022 at 12:00 am in reply to: What is the role played by key columns in a keyed table [ query join/performance ] ?Just wanted to codify some of the above:
q)tab:([]sym:-50000?`6;px:50000?10) q)ktab:`sym xkey tab // wanted to show the last record to convince that if kdb // has to search through entire list, no speed gain is evident q)-1#tab sym px --------- obafmn 6 q)-1#ktab sym | px ------| -- obafmn| 6 q)\ts do[100000;select from tab where sym=`obafmn] 2034 66240 q)\ts do[100000;select from ktab where sym=`obafmn] 2051 66240 // below no speed gain, but memory usage is reduced q)\ts do[100000;ktab`obafmn] 2079 960 // however this does mean any other *valid* request would be faster q)rand key ktab sym| kfpfmd q)\ts do[100000;ktab`kfpfmd] 1169 960 // showing point above with grouped attribute applied to keyed column q)gktab:`sym xkey update `g#sym from tab q)\ts do[100000;select from gktab where sym=`obafmn] 89 1808 // final point I wanted to make showing duplicate keys q)`a`a!(1;1) a| 1 a| 1
Thanks and for contributing here.
-
If you included the full command it would help more . Generally speaking there might be permission issues for you to access the bucket and/or issues with the loading of the table. Some good points to confirm on this would be…
- Can you use aws cli to copy the table outside of q prompt?
- Can you load the table if it were stored on file storage rather than local?
-
Folks, I might need to add this to the JIRA board as it is now becoming priority numero uno, but I have an interesting Friday conclusion!
1) So, last we left off, we had counted the letter usage in a given position, I decided to divide this by the number of words to assign scores to each letter in each position
// some sanity checks q)sum each letFreq 3145 3145 3145 3145 3145 q)count wordz 3145 // looks good q)letScore:letFreq%3145 q)letScore "sbcaptmdfglrehwkvnojiquyzx"!0.1383148 0.08426073 0.08330684 0.06868045 0.061.. "aoeriluhntpywcmsdbgxvkfqzj"!0.1558029 0.1236884 0.117965 0.1033386 0.0848966.. "aiorenlutsdgmbcpvywfkzhxjq"!0.1224165 0.1023847 0.0899841 0.08744038 0.07599.. "eainrsctolugdmkpvfhbwzyxj"!0.1243243 0.08139905 0.07980922 0.07567568 0.0705.. "eytnlrhasdkopmcgifxwbzuv"!0.1799682 0.1357711 0.09189189 0.08076312 0.062957..
I feel like this makes senses as an e in position five should be more valuable based on frequency than an a in position three. Although I might be putting the in front of the .2) I then compared all the 5 letter words to find out the score of each letter in each wordq)wordzScore:wordz!@'[letScore;]each wordz "aaron"| 0.06868045 0.1558029 0.08744038 0.06104928 0.08076312 "ababa"| 0.06868045 0.0063593 0.1224165 0.01240064 0.05373609 "aback"| 0.06868045 0.0063593 0.1224165 0.0645469 0.04038156 "abase"| 0.06868045 0.0063593 0.1224165 0.06486486 0.1799682 "abash"| 0.06868045 0.0063593 0.1224165 0.06486486 0.05532591
3) And if we sum these and desc sort we get the following
q)desc sum each wordzScore "sauce"| 0.6 "saute"| 0.6 "salle"| 0.5974563 "caine"| 0.5971383 "slate"| 0.5879173
So there we have it! Are sauce and saute the optimum starting words? If not they are certainly big contenders for “most popular baby names for twins in 2022!” I will work on how we solve after line 1, but for now I must take a break before MB Games and Hasbro take action against me.
Paulyc
-
Here is my morning!1) Get me some words…
q)wordz:system "curl http://wiki.puzzlers.org/pub/wordlists/unixdict.txt";
2) Think about any any obvious misdirections?q)words where words like "*'*" "aren't" "d'etat"
Okay we need to lose these guysq)wordz:wordz where all each wordz in .Q.a;
And we only want to get the 5 letter words so…q)wordz:wordz where 5=count each words
3) So my next thought was to look at most popular letters in each location for 5 letter wordsq)letFreq:desc each count each 'group each flip wordz q)letFreq "sbcaptmdfglrehwkvnojiquyzx"!435 265 262 216 194 179 174 159 148 146 131 115 .. "aoeriluhntpywcmsdbgxvkfqzj"!490 389 371 325 267 260 247 174 99 94 67 57 50 4.. "aiorenlutsdgmbcpvywfkzhxjq"!385 322 283 275 239 205 198 193 161 114 103 99 8.. "eainrsctolugdmkpvfhbwzyxj"!391 256 251 238 222 204 203 203 192 190 117 101 9.. "eytnlrhasdkopmcgifxwbzuv"!566 427 289 254 198 180 174 169 167 128 127 104 65..
I decided to flip these just on the off chance it spells out something humorous , also I had to take 26 of each because as you can imagine, no 5 letter word ends in q (for example) so the flipping would give a ‘length error
q)flip 26#'key each letFreq "saaee" "boiay" "ceoit" "arrnn" "pierl"
Alas Earwax! It would have pretty cool if that first line was a word right?
Anyway, my next thought would be to assign scores to letters in words in a given location i.e. if a word starts with s it gets the most points for that location, b is the second most points? Hopefully then we would find the optimum real word? Would love to know if anyone is thinking differently on this? -
Some updates,
I did find some minor updates that can be made here:
1) I found a script of words that are being used in wordle, makes clean-up easier!q)wordz:system "curl https://gist.githubusercontent.com/cfreshman/a03ef2cba789d8cf00c08f767e0fad7b/raw/a9e55d7e0c08100ce62133a1fa0d9c4f0f542f2c/wordle-answers-alphabetical.txt";
2) I was also informed that we could use the following method as curl is linux specific.
q)wordz:"n" vs .Q.hg"https://gist.githubusercontent.com/cfreshman/a03ef2cba789d8cf00c08f767e0fad7b/raw/a9e55d7e0c08100ce62133a1fa0d9c4f0f542f2c/wordle-answers-alphabetical.txt"
-
Also, this just brings up good questions about symbols versus strings & when and why we use them.
Many languages including q intern strings so that only one copy of any distinct value is stored. Any other references to the same string are stored as just that, references.
This helps with memory management, and improves performance on equality comparison i.e.// assuming col1 is symbol column q)select from tab where col1 in `a`b
Storing the same string e.g. “New York Stock Exchange” repeatedly in memory would require a greater amount of memory than its symbol counterpart.
Because q typically stores the string pool entirely in memory (that’s what your sym file is in a typical kdb+ database) consideration should be given as to what columns are symbol and which are string.
If the string list contains many short repeated values this may be a good candidate for casting to a symbol