KX Community


  • KDB.AI Cosine similarity

    Posted by hari__ on January 18, 2024 at 12:00 am

    When vectors contain negative values, the cosine similarity is calculated incorrectly. More often than not, the distance is greater than 1 or less than -1 if the vector contains negative values. The vector dimension we use is 768.
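For reference, the standard definition of cosine similarity and its bound can be written as below. This is textbook material rather than anything specific to KDB.AI: the Cauchy-Schwarz inequality keeps the value in the range -1 to 1 whatever the signs of the components, so a result outside that range points to a missing normalisation step rather than to the negative values themselves.

```latex
\[
\cos(\mathbf{a}, \mathbf{b})
  = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert},
\qquad
\lvert \mathbf{a} \cdot \mathbf{b} \rvert \le \lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert
\;\Longrightarrow\;
-1 \le \cos(\mathbf{a}, \mathbf{b}) \le 1 .
\]
```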

  • 4 Replies
  • Laura

    Administrator
    January 19, 2024 at 12:00 am

    * editing after speaking to KDB.AI team *

    Hi,

    Thanks for your message. Can you confirm you’re using ‘metric’: ‘CS’ in your definition to assign cosine similarity?

    And can you give us some more information so we can try to recreate this?

    • index config
    • server version
    • client version
    • descriptive stats on the inserted data (dimension, outliers, presence of constant vectors)
    • descriptive stats on the query data (dimension, outliers, presence of constant vectors)
    • whether there are any outlier values in the data
    We would also like to know:
    • whether the indexes look off, or just the distances
    • how far outside the range -1 to 1 the distances are
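The descriptive stats requested above could be gathered with something like the following. This is only a sketch using NumPy; `describe_vectors` is a hypothetical helper name, not part of the KDB.AI client, and the random `sample` array is a stand-in for real embeddings.

```python
import numpy as np

def describe_vectors(vectors):
    """Summarise a batch of embeddings given as a 2-D array-like (one row per vector)."""
    x = np.asarray(vectors, dtype=np.float64)
    norms = np.linalg.norm(x, axis=1)
    return {
        "count": int(x.shape[0]),
        "dims": int(x.shape[1]),
        "min_value": float(x.min()),
        "max_value": float(x.max()),
        "min_norm": float(norms.min()),
        "max_norm": float(norms.max()),
        # A constant vector has the same value in every component.
        "constant_vectors": int(np.sum(x.max(axis=1) == x.min(axis=1))),
        # Cosine similarity is undefined for zero-length vectors.
        "zero_vectors": int(np.sum(norms == 0)),
    }

# Stand-in data: 100 random 768-dimensional vectors (an assumption, not real embeddings).
rng = np.random.default_rng(0)
sample = rng.normal(scale=0.05, size=(100, 768))
print(describe_vectors(sample))
```

Running the same helper on both the inserted data and the query data gives the two sets of stats asked for.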

    Feel free to email me at lkerr@kx.com, or send me a DM in the community if you’d prefer to.

    Thanks,

    Laura

  • hari__

    Member
    January 19, 2024 at 12:00 am

    the dimension size for the above one is 768

  • hari__

    Member
    January 19, 2024 at 12:00 am

    Schema config:

    'columns': [
        {'name': 'stock_name', 'pytype': 'str'},
        {'name': 'Date', 'pytype': 'str'},
        {'name': 'embeddings',
         'vectorIndex': {'dims': len(tables[table_name][stock_names[0]]['embeddings'].iloc[0]),
                         'metric': 'CS',
                         'type': 'flat'}}
    ]

    I’m using the KDB.AI Cloud version. The outliers are negative values below -30e-3; there are no major outliers, as the vectors are DistilBERT output. The distances look off: when I calculate them manually, it seems the result is not scaled (dot product and cosine similarity give me similar distances). The distance is as high as 60.
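A manual check of the kind described above might look like this. It is a minimal NumPy sketch, not KDB.AI's implementation: the function names and the random vectors `a` and `b` are assumptions made for illustration. The raw dot product grows with the length of the vectors, while the cosine similarity divides by both norms and so stays between -1 and 1.

```python
import numpy as np

def dot_product(a, b):
    """Unscaled inner product; its magnitude depends on the vector norms."""
    return float(np.dot(a, b))

def cosine_similarity(a, b):
    """Inner product divided by both norms; always within [-1, 1]."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return float(np.dot(a, b) / denom)

# Hypothetical stand-ins for two 768-dimensional embeddings with negative components.
rng = np.random.default_rng(42)
a = rng.normal(size=768)
b = a + rng.normal(scale=0.1, size=768)

print("dot product:      ", dot_product(a, b))
print("cosine similarity:", cosine_similarity(a, b))
```

If a search configured with 'metric': 'CS' returns values close to the first number rather than the second, that is consistent with the normalisation being skipped.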
  • Laura

    Administrator
    January 22, 2024 at 12:00 am

    Hi,

    Thanks for reaching out about this topic!

    After investigation from the KDB.AI team, I can confirm they were able to replicate your issue, and this will be addressed in an upcoming release.

    Keep an eye on KDB.AI Cloud Release Notes to see when the next release is.

    Thanks again for reaching out!

    Laura
