Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

Sign In with Facebook Sign In with Google Sign In with OpenID

Categories

We have migrated to a new platform! Please note that you will need to reset your password to log in (your credentials are still in-tact though). Please contact lee@programmersheaven.com if you have questions.
Welcome to the new platform of Programmer's Heaven! We apologize for the inconvenience caused, if you visited us from a broken link of the previous version. The main reason to move to a new platform is to provide more effective and collaborative experience to you all. Please feel free to experience the new platform and use its exciting features. Contact us for any issue that you need to get clarified. We are more than happy to help you.

compressing a short string

iapiap Posts: 6Member
Is there a way to compress a formatted 20 character string, with limited number of possible characters (64 characters: a-z A-z 0-9 space and comma, to be exact)?
I think the LZH algorithm, or any other indexing algorithm, would not help us here, because the string is not long enough to have patterns.

Do you know a way?

Comments

  • JonathanJonathan Posts: 2,914Member
    : Is there a way to compress a formatted 20 character string, with limited
    : number of possible characters (64 characters: a-z A-z 0-9 space and
    : comma, to be exact)?
    If you want to do it simply, then if you've got 64 possible characters, then you can obviously represent them in 6 bits, saving you 2 bits per character = total saving of 40 bits which is 5 bytes. So you can represent it in 15 bytes rather than 20 without even using a compression algorithm, as such.

    : I think the LZH algorithm, or any other indexing algorithm, would not
    : help us here, because the string is not long enough to have patterns.
    :
    Yeah, Lempel Ziv probably isn't the way if repetitions are unlikely. Have you considered something as simple as Huffman coding? If you can come up with a constant encoding it may work out OK - a variable encoding gives better compression, but then you need to store the encodings table, which will be relatively expensive even if done efficiently! If you can assume vowels are more common you can give them shorter codes.

    Run length encoding? Don't know much about this to be honest. Burrows Wheeler could also be worth a look, but it's kinda complicated from what I've seen.

    Jonathan

    ###
    for(74,117,115,116){$::a.=chr};(($_.='qwertyui')&&
    (tr/yuiqwert/her anot/))for($::b);for($::c){$_.=$^X;
    /(p.{2}l)/;$_=$1}$::b=~/(..)$/;print("$::a$::b $::c hack$1.");

  • iapiap Posts: 6Member
    : If you want to do it simply, then if you've got 64 possible characters, then you can obviously represent them in 6 bits, saving you 2 bits per character = total saving of 40 bits which is 5 bytes. So you can represent it in 15 bytes rather than 20 without even using a compression algorithm, as such.
    :

    I was thinking aboput this solution... But i think I want to explore the algorithmic approach. Just for the sake of learning...

    : Yeah, Lempel Ziv probably isn't the way if repetitions are unlikely. Have you considered something as simple as Huffman coding? If you can come up with a constant encoding it may work out OK - a variable encoding gives better compression, but then you need to store the encodings table, which will be relatively expensive even if done efficiently! If you can assume vowels are more common you can give them shorter codes.
    :
    : Run length encoding? Don't know much about this to be honest. Burrows Wheeler could also be worth a look, but it's kinda complicated from what I've seen.
    :
    : Jonathan
    :
    : ###
    : for(74,117,115,116){$::a.=chr};(($_.='qwertyui')&&
    : (tr/yuiqwert/her anot/))for($::b);for($::c){$_.=$^X;
    : /(p.{2}l)/;$_=$1}$::b=~/(..)$/;print("$::a$::b $::c hack$1.");

    Can you explain briefly about the huffman code? I always thought it relays on patterns and indexes like LZH (acctually, LZH is Lemple Ziv Huffman), Wich we agreed, does not fits here...
Sign In or Register to comment.