Deletion of repetition words in a file - Programmers Heaven

meshteb (Posts: 47, Member)
Compliments Everyone!!

How do I delete repeated words in a file? If my file looks like this:

ABE
ABE
TWO
TWO
TWO
IT
IT
IT
IT

So I want it to have only:
ABE
TWO
IT

Thanks in advance.

Comments

  • Jonathan (Posts: 2,914, Member)
    Put the words in an array, one word in each element. Then use a hash to keep track of words you've seen and use grep to only copy those you haven't. For example:-

    my %temp = ();
    @uniques = grep { $temp{$_}++ == 0 } @allwords;
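A self-contained sketch of that approach follows. It assumes the file may hold more than one word per line, so each line is split on whitespace to give one word per array element; an in-memory filehandle over a sample string stands in for the real file.

```perl
use strict;
use warnings;

# Stand-in for the real file (an assumption of this sketch). With a real file:
#   open my $fh, '<', $path or die "Can't open $path: $!";
my $text = "ABE ABE\nTWO\nTWO TWO\nIT\nIT IT IT\n";
open my $fh, '<', \$text or die "Can't open in-memory file: $!";

# One word per element: split every line on whitespace.
my @allwords = map { split ' ', $_ } <$fh>;
close $fh;

# Keep a word only the first time it is seen, preserving the original order.
my %temp = ();
my @uniques = grep { $temp{$_}++ == 0 } @allwords;

print "$_\n" for @uniques;
```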

    Jonathan

    ###
    for(74,117,115,116){$::a.=chr};(($_.='qwertyui')&&
    (tr/yuiqwert/her anot/))for($::b);for($::c){$_.=$^X;
    /(p.{2}l)/;$_=$1}$::b=~/(..)$/;print("$::a$::b $::c hack$1.");

  • Xfactor (Posts: 343, Member)

    Correct me if I'm wrong, but the way Jonathan has it, it will only select the words that have repetitions. The way I understand it, every word should show up exactly once in the file, including words that are only used once.
    If this is the case, then you can take out the grep part of that code and have:
    @uniques = { $temp{$_}++ == 0 } @allwords;
    I think that will work. If not, then I know someone will be able to set you right. If what Jonathan gave you is what you are looking for, then I guess you have another way to look at it.
  • Jonathan (Posts: 2,914, Member)
    : Correct me if I'm wrong but the way Jonathan has it, it will only
    : select the words that have repetitions. The way I understand it is
    : to have every word only show up once in the file, even if the word
    : is used once.
    It will show up if it's only used once. Remember that the postfix ++ operator returns the old value and only then increments. If you evaluate a (currently) non-existent hash value in numeric context you get 0, and 0 == 0 is true, so the item is copied into @uniques. If the word is seen again, the hash value for that word exists and is 1; 1 == 0 is false, so it's not included in the list again (and the same goes for every later occurrence, when the value is 2, 3 and so on).
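The postfix behaviour described above can be checked on its own. In this small sketch the key IT is just sample data; each evaluation of $temp{IT}++ yields the value before the increment, starting from 0 for a key that does not exist yet, so only the first one passes an == 0 test.

```perl
use strict;
use warnings;

my %temp;
my @old_values;
my @first_time;

for (1 .. 3) {
    # Postfix ++ returns the value *before* incrementing;
    # a missing hash entry is treated as 0 (without a warning).
    my $old = $temp{IT}++;
    push @old_values, $old;
    push @first_time, ($old == 0 ? 'yes' : 'no');
}

print "old values:  @old_values\n";   # old values:  0 1 2
print "first time?: @first_time\n";   # first time?: yes no no
print "final count: $temp{IT}\n";     # final count: 3
```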

    As a bonus, this method also leaves you with a hash that tells you how many times each word appears.
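A quick illustration of that bonus, using the words from the original question as hard-coded sample data: once the grep has run, %temp maps each word to the number of times it occurred.

```perl
use strict;
use warnings;

my @allwords = qw(ABE ABE TWO TWO TWO IT IT IT IT);   # sample data

my %temp = ();
my @uniques = grep { $temp{$_}++ == 0 } @allwords;

# Every occurrence bumped its counter, so %temp now holds word frequencies.
# Looping over @uniques reports them in first-seen order.
print "$_: $temp{$_}\n" for @uniques;   # ABE: 2, TWO: 3, IT: 4
```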

    I didn't test it before posting because I was fairly sure it was right, but I have since tested it with this script:-

    my @allwords = qw(it it and it be poo and hey);
    my %temp = ();
    my @uniques = grep { $temp{$_}++ == 0 } @allwords;
    print "$_\n" for (@uniques);

    And when I run that I get:-

    it
    and
    be
    poo
    hey

    : If this is the case, then you can take out the grep part of that
    : code and have:
    : @uniques = { $temp{$_}++ == 0 } @allwords;
    : I think that will work. If not, then I know someone will be able to
    : tell you right. If what Jonathan gave you is what you are looking
    : for, then I guess you have another way to look at it.
    Tried that out of curiosity (didn't think it looked right)...and...

    Array found where operator expected at test.pl line 4, near "} "
    (Missing operator before ?)
    syntax error at test.pl line 4, near "} @allwords"
    Execution of test.pl aborted due to compilation errors.

    I *think* what I wrote does solve the problem - but I may still be missing something obvious. :-)

    Jonathan


  • Weirdofreak (Posts: 439, Member)
    Personally, I'd use a regex - something like s/(\w+)(?=.*?\1)//g
  • Jonathan (Posts: 2,914, Member)
    : Personally, I'd use a regex - something like s/(\w+)(?=.*?\1)//g
    Then split on whitespace if you want the words in an array...nice. :-) But you'd have to get them into a single scalar first; you can do that when reading the file, by slurping it with the diamond operator (with $/ set to undef). The words are then separated by newline characters, so to use your method you'd have to add the s modifier so that . can match across them.
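A sketch of the regex route as described here: slurp the input into one scalar, delete repeated words, then split. Two adjustments to the one-liner are assumptions of this sketch rather than part of the post: \b word boundaries, so that a word such as IT is not matched inside a longer word, and the /s modifier so .*? can cross newlines. Note that this approach deletes every occurrence except the last one, so the survivors come out in order of their final appearance; for the sample input that is still ABE, TWO, IT.

```perl
use strict;
use warnings;

# Stand-in for the file contents (sample data from the question). With a real
# file you could slurp it:  my $text = do { local $/; <$fh> };
my $text = "ABE\nABE\nTWO\nTWO\nTWO\nIT\nIT\nIT\nIT\n";

# Delete any word that occurs again later in the string.
#   \b...\b : match whole words only
#   /s      : let . match newlines, so the lookahead can see later lines
$text =~ s/\b(\w+)\b(?=.*?\b\1\b)//gs;

# Deleted words leave blank lines behind; splitting on whitespace drops them.
my @words = split ' ', $text;
print "$_\n" for @words;
```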

    Either way works; the context of the problem will determine which is more suitable.

    TMTOWTDI.

    Jonathan


  • meshteb (Posts: 47, Member)
    Thanks for your responses. I've attached my code below, but it does not give me the required results: it does not delete repeated words. I don't know where I went wrong.

    #!/usr/local/bin/perl -w

    my $words="C:/WINNT/Profiles/Administrator/Desktop/testwordlist.txt";

    my $out="D:/output.txt";

    # open prompt file
    open my $fh, "< $words" || die ("Unable to open prompt file $prompt");

    open my $wfh, "> $out" || die ("Can't create output file!");

    while ( $words = <$fh>)
    {
    my %temp = ();
    @uniques = grep { $temp{$_}++ == 0 } $words;

    print $wfh "$_" for (@uniques);
    }

    close $fh;
    close $wfh;
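Separately from the grep problem, the two open lines in this script contain a precedence bug that hides a failed open: || binds more tightly than the list operator open, so die is attached to the filename string (which is always true) instead of to open's return value. A small demonstration, using a deliberately nonexistent path chosen for this sketch:

```perl
use strict;
use warnings;

my $missing = '/no/such/dir/testwordlist.txt';   # hypothetical path that should not exist

# Parsed as  open(my $fh, ("< $missing" || die ...)): the string is true,
# so die is never evaluated and the failed open goes unnoticed.
my $ok = open my $fh, "< $missing" || die "never reached";
print $ok ? "opened\n" : "open failed, but nothing died\n";

# Low-precedence 'or' (plus three-argument open) tests open's result instead.
my $died = !eval {
    open my $fh2, '<', $missing or die "Can't open $missing: $!\n";
    1;
};
print $died ? "died as expected: $@" : "opened\n";
```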



  • Jonathan (Posts: 2,914, Member)
    : Thanks for your responses. I've attached my code below, but it does
    : not give me the required results: it does not delete repeated words.
    : I don't know where I went wrong.
    Hmmm...let's see...

    : while ( $words = <$fh>)
    : {
    : my %temp = ();
    : @uniques = grep { $temp{$_}++ == 0 } $words;
    :
    : print $wfh "$_" for (@uniques);
    : }
    Here's your problem. grep works through a list, but inside your loop it only ever gets one line (the scalar $words), and %temp is reset on every pass, so no word is ever seen twice. Collect the words into an array first:-

    my @words = <$fh>; # Read every word into an array
    close $fh; # We're done reading now, close the file.
    my %temp = ();
    @uniques = grep { $temp{$_}++ == 0 } $words;
    print $wfh "$_" for (@uniques);

    You don't use a while loop to read the file here; you grab it all in one go.
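The line-by-line loop from the earlier script can also be made to work, and it avoids holding the whole file in memory, provided the hash is declared once outside the loop and each line is tested against it directly. A sketch assuming one word per line; the sub name dedupe_stream and the in-memory filehandles are illustrative, not from the thread:

```perl
use strict;
use warnings;

# Copy each line from $in to $out the first time it is seen.
sub dedupe_stream {
    my ($in, $out) = @_;
    my %seen;                          # declared once, so it remembers earlier lines
    while (my $line = <$in>) {
        chomp $line;                   # "IT" and "IT\n" should count as the same word
        # Same test as $temp{$_}++ == 0: true only on the first sighting.
        print {$out} "$line\n" unless $seen{$line}++;
    }
}

# Demo with in-memory filehandles standing in for the real files.
my $input  = "ABE\nABE\nTWO\nTWO\nTWO\nIT\nIT\nIT\nIT";   # no trailing newline
my $output = '';
open my $in,  '<', \$input  or die "Can't open input: $!";
open my $out, '>', \$output or die "Can't open output: $!";
dedupe_stream($in, $out);
close $in;
close $out;
print $output;    # ABE, TWO and IT, one per line
```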

    Jonathan


  • meshteb (Posts: 47, Member)
    The code you sent me only outputs the following line as the content of the output file:

    C:/WINNT/Profiles/Administrator/Desktop/testwordlist.txt

    I was using the following code. What is wrong here?

    #!/usr/local/bin/perl -w

    my $words="C:/WINNT/Profiles/Administrator/Desktop/testwordlist.txt";

    my $out="D:/output.txt";

    # open prompt file
    open my $fh, "< $words" || die ("Unable to open prompt file $prompt");

    open my $wfh, "> $out" || die ("Can't create output file!");

    # Read every word into an array
    my @words = <$fh>;
    close $fh;

    my %temp = ();
    @uniques = grep { $temp{$_}++ == 0 } $words;
    print $wfh "$_" for (@uniques);

    close $wfh;








  • Jonathan (Posts: 2,914, Member)
    : @uniques = grep { $temp{$_}++ == 0 } $words;
    ARGH! Sorry, I blindly copied your smaller mistake while fixing your bigger one. This line should be:-
    @uniques = grep { $temp{$_}++ == 0 } @words;

    : print $wfh "$_" for (@uniques);
    :
    : close $wfh;
    Should work now.
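Putting the pieces of the thread together, the complete script might look like the sketch below. It reads the file into an array, runs the grep over @words, and uses three-argument open with or die so that a failed open is actually reported. The chomp is an addition not present in the thread, so that a last line without a trailing newline still matches its earlier copies; the sub name unique_lines and the two command-line arguments are assumptions for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the distinct lines read from $fh: first occurrences only, in order.
sub unique_lines {
    my ($fh) = @_;
    my @words = <$fh>;             # read every line into an array
    chomp @words;                  # strip newlines so "IT" and "IT\n" compare equal
    my %temp = ();
    return grep { $temp{$_}++ == 0 } @words;
}

# Usage (hypothetical):  perl dedupe.pl testwordlist.txt output.txt
if (@ARGV == 2) {
    my ($in_path, $out_path) = @ARGV;
    open my $fh, '<', $in_path or die "Unable to open input file $in_path: $!";
    my @uniques = unique_lines($fh);
    close $fh;

    open my $wfh, '>', $out_path or die "Can't create output file $out_path: $!";
    print {$wfh} "$_\n" for @uniques;
    close $wfh or die "Can't close output file $out_path: $!";
}
```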

    Jonathan
