KX Community


  • Issues with Filling Missing Values in Dictionaries

    Posted by mn_12 on April 15, 2023 at 12:00 am

    Hi,

    I am new to q/kdb+ and am working through the exercises for the Dictionaries module (more specifically, the "Dictionary mapping" segment).

    I am still unsure about a few things, even after spending a lot of time thinking about them.


    1.

    newDict is:

    Italy        | "eu" 
    Spain        | "eu" 
    Norway       | "eu" 
    Brazil       | "sa" 
    United States| "us" 
    Yemen        | "as" 
    Mexico       | "sa" 
    Albania      | "eu" 
    Japan        | "as" 
    Australia    | ""

    The question asks us to replace any missing values in newDict with "na".

    I tried "na"^newDict, but it results in evaluation error: length. After reading the documentation for the fill operator ^, I still do not understand why, and I do not know whether this operator can be used here at all.

    (The provided solution is newDict[where 0=count each newDict]:enlist"na".)


    2.

    To better understand the solution provided, I created a dictionary myDict:

    1| "eu" 
    2| "" 
    3| "us" 
    4| ""

    I noticed that myDict[where 0=count each myDict]:enlist"na" results in evaluation error: length.

    It only works when myDict has exactly one missing value, and I do not understand why.

    (This means that if newDict in 1. had more than one missing value, the provided solution would not work.)

    myDict[where 0=count each myDict]:("na";"na") works. However, I cannot find an efficient way to do this when myDict has many entries and many missing values.


    3.

    My guess as to why the fill operator ^ does not work for either newDict or myDict is that their values are strings.

    I created one more dictionary mySecondDict:

    col1|      
    col2|  5 6 
    col3|

    99^mySecondDict works and I get:

    col1| 99 99 99 
    col2| 99 5  6  
    col3| 99 99 99

    However, I am not sure whether my guess is correct.

    I would really appreciate any help with these issues.

    Thank you very much.

  • gyorokpeter-kx

    Member
    April 15, 2023 at 12:00 am

    1. The fill operator ^ is atomic: it pairs up corresponding items and replaces null atoms. When you use it between a string and a dictionary, it tries to match each character of the string with the corresponding value in the dictionary, i.e. "n"^"eu" for the first value and "a"^"eu" for the second. It cannot go any further because the string on the left has only 2 items while the dictionary has 10, hence the length error; but even if the counts matched, this is not the operation you want. An empty string "" is an empty list of characters, not a null atom (the null character is " "), so there is nothing in it for ^ to fill, unlike with atoms. So your only option is to check the length of each string and replace it if it is zero.

    2. When assigning by index, the right-hand side must have as many items as there are indices being assigned. enlist"na" is a list with one item, so it works when there is exactly one index and fails with a length error when there are more. Again, atoms are handled specially in that a single atom can be assigned to several indices at once, but a string is a list, so that does not work. Instead you need to provide as many copies of the string as there are indices. This can be simplified a bit by assigning the index list to an intermediate variable whose count you can take:

    ind:where 0=count each myDict; 
    myDict[ind]:count[ind]#enlist"na"

    3. As explained in 1., fill works element-wise, so the atom 99 is extended to match the shape of the dictionary's values (3×3) and every null element is then filled with 99 individually. This works because an atom can extend to any size, whereas lists do not auto-extend and must therefore match in length.
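    A minimal q sketch of the three points above, run against myDict from the question. The names r1, r2, e1, e2, n and r3 are illustrative only, and the definition of mySecondDict is an assumption: its original display does not show which nulls it held, so long nulls (0N) are used here. Errors are caught with protected evaluation (@ and .) so the script keeps running.

```q
/ myDict from the question: long keys, string values, two of them empty
myDict:1 2 3 4!("eu";"";"us";"")

/ 1. ^ replaces null atoms only; the char null is " " (a space), not ""
r1:"z"^"a b"                        / "azb": the space is filled
r2:"z"^""                           / empty: an empty string has nothing to fill
e1:@[{"na"^x};myDict;{x}]           / "length": 2 chars vs 4 dictionary values

/ 2. indexed assignment needs one right-hand item per index (atoms are reused)
ind:where 0=count each myDict       / keys of the empty values: 2 4
e2:.[{x[y]:enlist"na";x};(myDict;ind);{x}]   / "length": 1 item for 2 indices
myDict[ind]:count[ind]#enlist"na"   / 2#enlist"na" gives ("na";"na")
n:`a`b`c!10 0N 0N
n[where null n]:0                   / a single atom fills every selected index

/ 3. an atom extends to any shape; assumes mySecondDict held long nulls
mySecondDict:`col1`col2`col3!(0N 0N 0N;0N 5 6;0N 0N 0N)
r3:99^mySecondDict                  / col2 becomes 99 5 6
```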

  • roryk

    Member
    April 15, 2023 at 12:00 am

    You can use fill; you just have to convert to symbols and back (here d is the dictionary of strings):

    string `na^`$d

    The provided solution is probably better though.
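    A sketch of the same one-liner split into its three steps and applied to myDict from the question (the intermediate names s, f and r are illustrative). The cast turns each empty string into the null symbol `, which ^ can then fill like any other null atom.

```q
/ myDict from the question: long keys, string values, two of them empty
myDict:1 2 3 4!("eu";"";"us";"")
s:`$myDict                          / string -> symbol; "" becomes the null symbol
f:`na^s                             / fill the null symbols with `na
r:string f                          / symbol -> string again
```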

  • gyorokpeter-kx

    Member
    April 17, 2023 at 12:00 am

    Note that you should consider whether a symbol is appropriate for the data in question. Chances are that if it is, you should have been using symbols in the first place. Casting arbitrary strings to symbols interns them, so they persist in memory until the process exits. This may not matter when you are just trying out toy examples, but it can lead to a “symbol leak” on a production system that processes large amounts of data.
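    A sketch of both sides of this trade-off. fillEmpty is a hypothetical helper, not something from the thread or built into q: it replaces empty strings using each, so nothing is cast to a symbol. The second part reads the `syms entry of .Q.w[], which reports how many symbols have been interned, before and after casting 1000 distinct strings; the increase may be slightly larger than 1000 because parsing the script interns variable names as well.

```q
/ myDict from the question: long keys, string values, two of them empty
myDict:1 2 3 4!("eu";"";"us";"")

/ hypothetical helper (not from the thread): replace each empty string in d
/ with s, working on strings directly so no new symbols are interned
fillEmpty:{[d;s] {[s;x] $[0=count x;s;x]}[s] each d}
r:fillEmpty[myDict;"na"]

/ every distinct string cast to a symbol stays in the symbol table;
/ .Q.w[] reports the current symbol count under the key `syms
before:.Q.w[]`syms
`$string 1000000+til 1000          / intern 1000 new, distinct symbols
grown:(.Q.w[]`syms)-before
```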
