KX Community

  • md5 – getting the original string back.

    Posted by bindhusri on June 15, 2023 at 12:00 am

    Hello,

     

    Based on this link https://code.kx.com/q/ref/md5/

    I understand that the md5 keyword converts a string into an encoded value. I was wondering whether there is a way to get the original string back.

     

    Any inputs on this would be great. Thanks in advance!

     

  • 4 Replies
  • darrenwsun

    Member
    June 15, 2023 at 12:00 am

    I don’t think the reverse is readily available in q. MD5 is designed to be difficult to reverse (though not impossible, as hardware develops and new attacks are found), and because many different inputs produce the same hash, there won’t be a single answer for the original input.

    I assume you have a column of such md5 encoded values stored and you wish to get the original values. Either the original values should also be persisted, or if they are not supposed to be persisted as plain text (say they are passwords), then perhaps we’re not supposed to decode them anyway.

  • gyorokpeter-kx

    Member
    June 15, 2023 at 12:00 am

    It would be helpful if you could describe your use case.

    MD5 is a hash function. The main characteristic of a hash function is that it reduces the amount of information contained in the input. There is no way to get back the exact original data as there are infinitely many possible byte sequences that lead to the same hash. Usually when you want to use the hash for anything, it’s for comparing against another hash to check if the two are equal (an example is downloading a file, where you can check if your download is corrupted by comparing its hash to the one found on the website).
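    As a minimal sketch of that comparison use case in q: the names a and b below are only illustrative, and the only built-in assumed is the md5 keyword from the page linked in the question. Inputs of very different lengths both come out as 16 bytes, which is why the input cannot be recovered from the digest alone.

```q
/ md5 returns a 16-byte vector whatever the input length
a:md5 "hello"
b:md5 1000#"hello world "     / a 1000-character input
show count a                  / 16
show count b                  / 16
/ equal inputs give equal digests; ~ (match) compares the byte vectors
show a~md5 "hello"            / 1b
show a~md5 "Hello"            / 0b
```

    Checking a download works the same way: hash the bytes you received and match the result against the published digest.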

  • Laura

    Administrator
    June 16, 2023 at 12:00 am

    The other comments explain well that md5 is a hashing function, which is inherently non-reversible. It’s also not a very secure hashing algorithm, as it’s vulnerable to both collision attacks and length extension attacks (more reading here for interest).

    Depending very much on your use case, if there’s a known, fixed list of strings that the original messages can be, you can “decode” the data. Take the scenario where there are 2 users in a message chat, Alice and Bob, and we have a table in which those users are md5 hashed to hide their identity, but we know beforehand that the users are Alice and Bob. You can see who sent what like this:

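    / sample chat table with two known users
    t:([]time:10#.z.p;users:10?`Alice`Bob;message:10?10);
    / replace each user name with its MD5 digest (a 16-byte vector)
    t:update users:md5 each string users from t;
    / dictionary from each known digest back to the user name
    lookup:(md5 "Alice";md5 "Bob")!`Alice`Bob;
    / "decode" the column by looking up each digest
    t:update users:lookup[users] from t;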

    This can be extended to any number of users (and other use cases) but has the prerequisite of there being a known fixed list of users.
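    One way to extend the lookup, sketched under the assumption that the full list of possible users is available as a symbol list (the names known and decoded, and the sample data, are hypothetical): build the dictionary from the list itself rather than writing out each digest by hand.

```q
/ hypothetical list of known users; any symbol list works
known:`Alice`Bob`Carol
/ dictionary: digest of each name -> name
lookup:(md5 each string known)!known
/ a table whose users column holds only the digests
t:([]users:md5 each string `Bob`Carol`Alice`Bob;message:1 2 3 4)
/ map the digests back to names
decoded:update users:lookup users from t
show decoded
```

    A digest that is not in the dictionary comes back as a null symbol, so values outside the known list are easy to spot but cannot be recovered this way.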

    If, however, your use case is storing large amounts of data and you don’t want a lookup like this, I’d suggest not relying on MD5 to protect data at rest, and instead encrypting with public/private keys so that you can decrypt your data while still having security at rest. You could use, say, OpenSSL and integrate it with kdb+/q (example of encrypting/decrypting data here).

    Unfortunately (or rather, fortunately), if you don’t know what the values in your table are and you are trying to retrieve them knowing only that they’ve been md5 hashed, this isn’t possible short of brute-force attacks, the likes of which underpin password cracking.
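    To illustrate what brute force means here, a toy sketch that assumes the unknown value is a 2-letter lowercase string (target, cands and found are hypothetical names): every candidate is hashed and compared with the target digest. Nothing is reversed; the input is guessed and the guess is checked.

```q
/ hypothetical target: digest of an unknown 2-letter lowercase string
target:md5 "kx"
/ enumerate every 2-letter lowercase candidate (26*26 = 676 strings)
cands:raze .Q.a,/:\:.Q.a
/ hash each candidate and keep those whose digest matches the target
found:cands where target~/:md5 each cands
show found
```

    The candidate space grows as 26 to the power of the string length for lowercase letters alone, so this approach stops being practical very quickly for longer or less constrained inputs.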

    I hope one of these responses addresses your use case!

  • bindhusri

    Member
    June 20, 2023 at 12:00 am

    Thanks for your responses. I got a new understanding regarding md5.
