KX Community


  • Wordle kdb

    Posted by pcarroll on January 21, 2022 at 12:00 am

    Hey all,
    Like it no doubt has been for most of you, my last month has been a hazy fever dream of playing Wordle with anyone and everyone who will compete with me. For those of you who do not know what I am talking about, I have only one question for you: how is everything going under that rock?

    If you go to the reading room section of code.kx.com, people have supplied cool tools for solving other puzzles out there, like Scrabble or (what I am now learning some people refer to as) Klondike #mindblown.

    I wanted to check if anyone is working on a solution for Wordle. If you check out the Scrabble link, there is a cool txt file you can curl to get a word list (supplied by wiki.puzzlers.org).

    system "curl http://wiki.puzzlers.org/pub/wordlists/unixdict.txt"

    That should get us started, the idea being to 'build a tool kit to give you the optimum word to submit at every point' and ultimately solve the puzzle in the fewest iterations.

    Obviously, there is some element of luck involved at the opening level, so I am wishing you all bonne chance and bon courage!

    PaulyC

  • 8 Replies
  • pcarroll

    Member
    January 21, 2022 at 11:00 am

    Some updates,

    I did find some minor improvements that can be made here:
    1) I found a list of the words that are being used as Wordle answers, which makes clean-up easier!

    q)wordz:system "curl https://gist.githubusercontent.com/cfreshman/a03ef2cba789d8cf00c08f767e0fad7b/raw/a9e55d7e0c08100ce62133a1fa0d9c4f0f542f2c/wordle-answers-alphabetical.txt";

    2) I was also informed that we could use the following method instead, as curl is not available on every platform.

    q)wordz:"\n" vs .Q.hg"https://gist.githubusercontent.com/cfreshman/a03ef2cba789d8cf00c08f767e0fad7b/raw/a9e55d7e0c08100ce62133a1fa0d9c4f0f542f2c/wordle-answers-alphabetical.txt"
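
    A minimal sketch of the split-and-clean step, using a made-up string in place of the downloaded text (the sample words are an assumption, not real data). Splitting on "\n" leaves an empty string after the final newline, which the length filter drops along with anything that is not five letters long:

    ```q
    / toy stand-in for the text returned by .Q.hg (assumed sample, not real data)
    raw:"aback\nabase\ncan't\nab\nabate\n"
    wordz:"\n" vs raw                           / split on newlines
    wordz:wordz where 5=count each wordz        / drops "ab" and the trailing ""
    wordz:wordz where all each wordz in\: .Q.a  / drops "can't" (apostrophe)
    wordz                                       / "aback", "abase" and "abate" remain
    ```

    The each-left form `in\:` is used here only to make the per-word membership test explicit.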

     

  • dbaker

    Member
    January 21, 2022 at 11:00 am

    I love this post, what a fun way to use the power of q to solve the daily challenges of Wordle, great work all!

  • pcarroll

    Member
    January 22, 2022 at 11:00 am
    Here is my morning!
    1) Get me some words…
    q)wordz:system "curl http://wiki.puzzlers.org/pub/wordlists/unixdict.txt";
    2) Think about any obvious misdirections?
    q)wordz where wordz like "*'*"
    "aren't"
    "d'etat"
    Okay, we need to lose these guys
    q)wordz:wordz where all each wordz in .Q.a;
    And we only want the 5 letter words, so…
    q)wordz:wordz where 5=count each wordz
    3) So my next thought was to look at most popular letters in each location for 5 letter words
    q)letFreq:desc each count each 'group each flip wordz 
    q)letFreq 
    "sbcaptmdfglrehwkvnojiquyzx"!435 265 262 216 194 179 174 159 148 146 131 115 .. 
    "aoeriluhntpywcmsdbgxvkfqzj"!490 389 371 325 267 260 247 174 99 94 67 57 50 4.. 
    "aiorenlutsdgmbcpvywfkzhxjq"!385 322 283 275 239 205 198 193 161 114 103 99 8.. 
    "eainrsctolugdmkpvfhbwzyxj"!391 256 251 238 222 204 203 203 192 190 117 101 9.. 
    "eytnlrhasdkopmcgifxwbzuv"!566 427 289 254 198 180 174 169 167 128 127 104 65..

    I decided to flip these just on the off chance it spells out something humorous. I also had to take 26 of each because, as you can imagine, no 5 letter word ends in q (for example), so the flipping would give a 'length error.

    q)flip 26#'key each letFreq
    "saaee"
    "boiay"
    "ceoit"
    "arrnn"
    "pierl"

    Alas, Earwax! It would have been pretty cool if that first line was a word, right?

    Anyway, my next thought is to assign scores to the letters of a word by location, i.e. if a word starts with s it gets the most points for that location, b gets the second most points, and so on. Hopefully we would then find the optimum real word? Would love to know if anyone is thinking differently on this.
  • jbetz34

    Member
    January 22, 2022 at 11:00 am

    Hi Paul,

    Awesome brain teaser!
    I was inspired by your scoring method to continue development beyond line 1.

    To begin, I used the same import and sanitize functions that you did:

    wordz:system "curl http://wiki.puzzlers.org/pub/wordlists/unixdict.txt";
    wordz:wordz where all (5=count each wordz;all each wordz in .Q.a);

    However, instead of using a dictionary format to calculate letter scores, I used a table format. Table format would simplify the query/filtering process as we continue to guess possible words.

    I started with a base table of all the wordz, separating each letter into a positional column named with roman numerals (i.e. I,II,III –> 1,2,3).

    // create base word table
    w:([] word:wordz; I:wordz[;0]; II:wordz[;1]; III:wordz[;2]; IV:wordz[;3]; V:wordz[;4]);
    // note that word is a column of strings; I to V are char columns
    q)w
    word    I II III IV V
    ---------------------
    "aaron" a a  r   o  n
    "ababa" a b  a   b  a
    "aback" a b  a   c  k
    "abase" a b  a   s  e
    "abash" a b  a   s  h

    Next, to replicate the wordzScore using the table format, I created a generic probability function that accepts a table of words shaped like the base table and a column to analyze. It returns a dictionary of letter probabilities within that positional column.

    // generic wordzScore function 
    // x - words table 
    // y - column to analyze 
    prb:{{x%sum x} ?[x;();y;(count;`i)]}; 
    q)prb[w;`I] 
    a| 0.06868045 
    b| 0.08426073 
    c| 0.08330684 
    d| 0.05055644 
    e| 0.03561208
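
    For readers less used to functional select, here is a qSQL sketch of what prb computes for a single column, on an assumed four-word toy table (the words are illustrative and not taken from the word list above):

    ```q
    / toy word table (assumed sample words)
    wordz:("slate";"sauce";"crane";"water")
    w:([] word:wordz; I:wordz[;0]; II:wordz[;1])
    / count rows per first letter, then divide by the total
    {x%sum x} exec count i by I from w
    ```

    With two of the four words starting with s, the s entry comes out as 0.5. The functional form in prb is what lets the column name be passed in as an argument.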

    I applied that function to each column, summed across columns and applied the result back to the original table:

    // start from the full base table
    t:w;
    // apply probability func across columns
    s:?[t;();0b;n!{(@;x y;y)}[prb[t]] each n:`I`II`III`IV`V];
    // sum across columns and apply to base table
    t:update score:(exec sum (I;II;III;IV;V) from s) from t;
    // show the highest scores first
    q)5#`score xdesc t
    word    I II III IV V score
    -------------------------------
    "sauce" s a  u   c  e 0.6
    "saute" s a  u   t  e 0.6
    "salle" s a  l   l  e 0.5974563
    "caine" c a  i   n  e 0.5971383
    "slate" s l  a   t  e 0.5879173

    Great! Now we are all caught up to where you left off, but in table format.

    One of the most important aspects to wordle is understanding and properly responding to the clues that the game will give after each guess. There are 3 options for each letter:

    • it is not in the word
    • it is in the word, but not in correct position
    • it is in the word and in the correct position

    We can use these clues to limit our words table and recalculate the wordzScore at each step to make sure we take the best guess at each opportunity. I viewed this in 3 parts:

    1. Record guesses and clues from game
    2. Build filter clauses from these clues
    3. Use these filters to recalculate wordzScore

    Starting with the easiest step, #3. Let's put the wordzScore calculation in a function that accepts a list of 'where' clauses in parse-tree form as an argument:

    // where w exists globally and is base word table 
    // where prb exists globally and is letter probability function 
    topList:{[wc]
      t:?[`w;wc;0b;()];
      s:?[t;();0b;n!{(@;x y;y)}[prb[t]] each n:`I`II`III`IV`V];
      t:update score:(exec sum (I;II;III;IV;V) from s) from t;
      `score xdesc t }

    Then, let's create a table where we can record our guesses, the clues from the game and the resulting word filters from each guess (can you tell I love tables?). Simple enough:

    guessTable:([]guess:();clues:();wc:())

    Now comes the tricky part: how do we build a function to generate filters based on the clues? Here's what I came up with. It is not the most elegant and can definitely be optimized, but it gets the job done.

    // expecting string guess (g)
    // list of longs corresponding to clues (c)
    // 0 - not in word;
    // -1 - correct letter, wrong spot;
    // 1 - correct letter, correct spot;
    guess:{[g;c]
      // if letter is in wrong spot, filter out words where that letter is in that position column
      wc:{(not;(in;`I`II`III`IV`V@y;x y))}[g] each where c=-1;
      // if letter is in wrong spot, also filter for words where that letter is in the word column
      wc,:{(in/:;x y;`word)}[g] each where c=-1;
      // if letter is not in word, filter out words with that letter in the word column
      wc,:{(not;(in/:;x y;`word))}[g] each where c=0;
      // if letter is in the right spot, filter for words with that letter in that position column
      wc,:{(in;`I`II`III`IV`V@y;x y)}[g] each where c=1;
      // upsert the guess, clues and filters to the guessTable and display
      `guessTable upsert enlist (g;c;wc);
      :guessTable }
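
    When testing against a known answer, the clue vector does not have to be typed by hand. The sketch below uses a hypothetical helper named clue (not part of the post) that produces the same 0 / -1 / 1 encoding. It is a simplification: repeated letters are not treated the way the real game treats them.

    ```q
    / hypothetical helper: derive clues from a guess g and a known answer a
    / 1 = right letter, right spot; -1 = letter elsewhere in the answer; 0 = absent
    / simplification: repeated letters are not handled as the real game does
    clue:{[g;a] hit:g=a; "j"$hit-(not hit)&g in a}
    clue["sauce";"water"]   / 0 1 0 0 -1
    ```

    The result could then be passed straight through, e.g. guess["sauce";clue["sauce";"water"]].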

    Tying it all together, let's try it out on a sample 5 letter word like “water”:

    q)\l qWordle.q
    q)1#topList()
    word    I II III IV V score
    ---------------------------
    "sauce" s a  u   c  e 0.6
    // manually entering clues
    q)guess["sauce";0 1 0 0 -1]
    guess   clues      wc
    -----------------------------------------------------------------------------------------------------------------------------------
    "sauce" 0 1 0 0 -1 (~:;(in;`V;"e")) (in/:;"e";`word) (~:;(in/:;"s";`word)) (~:;(in/:;"u";`word)) (~:;(in/:;"c";`word)) (in;`II;"a")
    q)1#topList raze guessTable`wc
    word    I II III IV V score
    -----------------------------
    "hater" h a  t   e  r 2.62069
    // manually entering clues
    q)guess["hater";0 1 1 1 1]
    guess   clues      wc
    -------------------------------------------------------------------------------------------------------------------------------------
    "sauce" 0 1 0 0 -1 ((~:;(in;`V;"e"));(in/:;"e";`word);(~:;(in/:;"s";`word));(~:;(in/:;"u";`word));(~:;(in/:;"c";`word));(in;`II;"a"))
    "hater" 0 1 1 1 1  ((~:;(in/:;"h";`word));(in;`II;"a");(in;`III;"t");(in;`IV;"e");(in;`V;"r"))
    q)topList raze guessTable`wc
    word    I II III IV V score
    ------------------------------
    "bater" b a  t   e  r 4.111111
    "dater" d a  t   e  r 4.111111
    "eater" e a  t   e  r 4.111111
    "later" l a  t   e  r 4.111111
    "mater" m a  t   e  r 4.111111
    "pater" p a  t   e  r 4.111111
    "rater" r a  t   e  r 4.111111
    "tater" t a  t   e  r 4.111111
    "water" w a  t   e  r 4.111111

     

    At this rate, I don’t think we will get it in 6 guesses. But who knew there were so many “*ater” words?
    I am sure there are many improvements to be made here, but I have also spent too much time on this today.

    Thanks Paul, looking forward to more awesome content like this.

    – James

  • rocuinneagain

    Member
    January 22, 2022 at 11:00 am

    I didn’t fully read the question, got distracted by the idea of a q implementation of the game.

    https://github.com/rianoc/qWordle

  • pcarroll

    Member
    January 23, 2022 at 11:00 am

    Folks, I might need to add this to the JIRA board as it is now becoming priority numero uno, but I have an interesting Friday conclusion!

    1) So, last we left off, we had counted the letter usage in a given position, I decided to divide this by the number of words to assign scores to each letter in each position

    // some sanity checks 
    q)sum each letFreq
    3145 3145 3145 3145 3145
    q)count wordz
    3145
    // looks good
    q)letScore:letFreq%3145
    q)letScore 
    "sbcaptmdfglrehwkvnojiquyzx"!0.1383148 0.08426073 0.08330684 0.06868045 0.061.. 
    "aoeriluhntpywcmsdbgxvkfqzj"!0.1558029 0.1236884 0.117965 0.1033386 0.0848966.. 
    "aiorenlutsdgmbcpvywfkzhxjq"!0.1224165 0.1023847 0.0899841 0.08744038 0.07599.. 
    "eainrsctolugdmkpvfhbwzyxj"!0.1243243 0.08139905 0.07980922 0.07567568 0.0705.. 
    "eytnlrhasdkopmcgifxwbzuv"!0.1799682 0.1357711 0.09189189 0.08076312 0.062957..
    I feel like this makes sense, as an e in position five should be more valuable based on frequency than an a in position three. Although I might be putting the cart in front of the horse.
    2) I then looked up the score of each letter, by position, for every 5 letter word
    q)wordzScore:wordz!@'[letScore;]each wordz
    q)wordzScore
    "aaron"| 0.06868045 0.1558029 0.08744038 0.06104928 0.08076312 
    "ababa"| 0.06868045 0.0063593 0.1224165 0.01240064 0.05373609 
    "aback"| 0.06868045 0.0063593 0.1224165 0.0645469 0.04038156 
    "abase"| 0.06868045 0.0063593 0.1224165 0.06486486 0.1799682 
    "abash"| 0.06868045 0.0063593 0.1224165 0.06486486 0.05532591

    3) And if we sum these and desc sort we get the following

    q)desc sum each wordzScore 
    "sauce"| 0.6 
    "saute"| 0.6 
    "salle"| 0.5974563 
    "caine"| 0.5971383 
    "slate"| 0.5879173

    So there we have it! Are sauce and saute the optimum starting words? If not they are certainly big contenders for “most popular baby names for twins in 2022!”  I will work on how we solve after line 1, but for now I must take a break before MB Games and Hasbro take action against me.

    Paulyc

  • rocuinneagain

    Member
    January 24, 2022 at 11:00 am

    Another kdb implementation of the game:  https://github.com/psaris/wordle

  • psaris

    Member
    January 25, 2022 at 11:00 am

    https://github.com/psaris/wordle demonstrates how wordle is an extension of the mastermind game (https://github.com/psaris/mm).

    the README.md describes the mastermind algorithm and how it can be used to solve wordle.

    an example of the algorithm guessing a random word is shown below:

     

    q)\l mm/mm.q
    q)\l wordle.q
    q).mm.score:.mm.veca .wordle.scr
    q)C:asc upper read0 `:answers.txt
    q)G:asc C,upper read0 `:guesses.txt
    q)g:"SOARE"
    q)a:.mm.onestep `.mm.maxent
    q).mm.summary each .mm.game[a;G;C;g] rand C
    n    guess   score
    --------------------
    2309 "SOARE" "  G G"
    28   "GLITZ" "     "
    8    "HEAVE" " GG G"
    1    "PEACE" "GGGGG"

     

    the speed at finding the optimal solution depends critically on the scoring function.

    this is my implementation.  eager for improvements!

     

    / redefine the mastermind scoring functions
    .mm.scr:{[g;c]
      g[w:(i:group e:g=c) 1b]:" ";      / identify and skip where equal
      i@:where count[c]>i:g ? c i 0b;   / identify where misplaced
      s:@[" G" e;i except w;:;"Y"];     / generate score
      s}

     

    sample usage:

    q).mm.scr["RIGHT";"RIGHT"]
    "GGGGG"
    q).mm.scr["RIGHT";"WRONG"]
    "Y Y  "

     

    and an example with duplicate letters:

    q).mm.scr["RIITE";"RIGHT"]
    "GG Y "
    q).mm.scr["RIGHT";"RIITE"]
    "GG  Y"
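
    a rough sketch of how a scorer like this can prune the candidate list after a guess, in the spirit of mastermind. the scorer is restated under the local name scr so the block stands alone; reading the garbled token in the post as e:g=c is an assumption, and the three-word candidate list is made up for illustration. this is not the mm library's own filtering code.

    ```q
    / scorer restated from the post (guess g, code c); e:g=c is an assumed reading
    scr:{[g;c] g[w:(i:group e:g=c) 1b]:" "; i@:where count[c]>i:g?c i 0b; @[" G" e;i except w;:;"Y"]}
    / made-up candidate list
    C:("PEACE";"HEAVE";"GLITZ")
    / keep only candidates that would have produced the observed score for guess "SOARE"
    C where (scr["SOARE"] each C)~\:"  G G"   / keeps "PEACE" and "HEAVE"
    ```

    each guess shrinks C this way, so a scorer that is cheap to call matters when it runs over every guess and candidate pair.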

     
