compressing a short string - Programmers Heaven

Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

Categories

compressing a short string

iapiap Posts: 6Member
Is there a way to compress a formatted 20 character string, with limited number of possible characters (64 characters: a-z A-z 0-9 space and comma, to be exact)?
I think the LZH algorithm, or any other indexing algorithm, would not help us here, because the string is not long enough to have patterns.

Do you know a way?

Comments

  • JonathanJonathan Posts: 2,914Member
    : Is there a way to compress a formatted 20 character string, with limited
    : number of possible characters (64 characters: a-z A-z 0-9 space and
    : comma, to be exact)?
    If you want to do it simply, then if you've got 64 possible characters, then you can obviously represent them in 6 bits, saving you 2 bits per character = total saving of 40 bits which is 5 bytes. So you can represent it in 15 bytes rather than 20 without even using a compression algorithm, as such.

    : I think the LZH algorithm, or any other indexing algorithm, would not
    : help us here, because the string is not long enough to have patterns.
    :
    Yeah, Lempel Ziv probably isn't the way if repetitions are unlikely. Have you considered something as simple as Huffman coding? If you can come up with a constant encoding it may work out OK - a variable encoding gives better compression, but then you need to store the encodings table, which will be relatively expensive even if done efficiently! If you can assume vowels are more common you can give them shorter codes.

    Run length encoding? Don't know much about this to be honest. Burrows Wheeler could also be worth a look, but it's kinda complicated from what I've seen.

    Jonathan

    ###
    for(74,117,115,116){$::a.=chr};(($_.='qwertyui')&&
    (tr/yuiqwert/her anot/))for($::b);for($::c){$_.=$^X;
    /(p.{2}l)/;$_=$1}$::b=~/(..)$/;print("$::a$::b $::c hack$1.");

  • iapiap Posts: 6Member
    : If you want to do it simply, then if you've got 64 possible characters, then you can obviously represent them in 6 bits, saving you 2 bits per character = total saving of 40 bits which is 5 bytes. So you can represent it in 15 bytes rather than 20 without even using a compression algorithm, as such.
    :

    I was thinking aboput this solution... But i think I want to explore the algorithmic approach. Just for the sake of learning...

    : Yeah, Lempel Ziv probably isn't the way if repetitions are unlikely. Have you considered something as simple as Huffman coding? If you can come up with a constant encoding it may work out OK - a variable encoding gives better compression, but then you need to store the encodings table, which will be relatively expensive even if done efficiently! If you can assume vowels are more common you can give them shorter codes.
    :
    : Run length encoding? Don't know much about this to be honest. Burrows Wheeler could also be worth a look, but it's kinda complicated from what I've seen.
    :
    : Jonathan
    :
    : ###
    : for(74,117,115,116){$::a.=chr};(($_.='qwertyui')&&
    : (tr/yuiqwert/her anot/))for($::b);for($::c){$_.=$^X;
    : /(p.{2}l)/;$_=$1}$::b=~/(..)$/;print("$::a$::b $::c hack$1.");

    Can you explain briefly about the huffman code? I always thought it relays on patterns and indexes like LZH (acctually, LZH is Lemple Ziv Huffman), Wich we agreed, does not fits here...
Sign In or Register to comment.