Perl script to check doc_b against doc_a for inconsistencies

Hi folks,

I'd like to write a script that checks two documents, say doc_a and doc_b, for inconsistencies, and I have no idea how to start.

doc_b is reproduced from doc_a (the original document), but not by 'copy and paste'.

To keep it simple at first, take a one-line document, as in the following example:

Original document "doc_a"[code]Check this link to sea what scannars are supported by SANE[/code]
It already has 2 typing mistakes.

The reproduced document "doc_b" must keep these 2 mistakes for consistency.
[code]check thes link to sea what scannars are suppurted by SeNE[/code]
Unfortunately, another 3 typing mistakes were made.

What I expect to see in the printout is:[code]
Original   Mistake    Line No.  Word No.
this       thes       1         2
supported  suppurted  1         9
SANE       SeNE       1         11[/code]
and not just a printout of their contents saying "differ".

Kindly advise how to start. TIA



  • Well, you may want to check out diff first, if you're doing this because you need a tool rather than because you feel like it. It's more suited to large documents with big differences though, rather than trying to catch individual words.

    First you'd want to split each line into separate words. If the words at the same position are different, print them out with the line/word number. You may want to use tab stops to align them, or pad with [grey]" " x (10 - length $word)[/grey]. If the array from the original file is longer than the other one, print out [grey]splice @arr1, scalar @arr2[/grey] with a message saying that that's what's missing, and if @arr2 is longer, do [grey]splice @arr2, scalar @arr1[/grey] and say that it shouldn't be there. It won't cope well if a word is added in the middle of a line ("foo bar baz quux" becoming "foo bar baz blech quux" will tell you that quux shouldn't be there, rather than blech), but it should suffice.

    You'll probably want to avoid storing the whole file in an array, to save memory. Instead, do [grey]while (<$file>) { ... }[/grey] unless you need to keep it for some reason.
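The word-by-word comparison described above could be sketched roughly as follows. This is only a minimal sketch: the sub name compare_words, the 12-character column width and the choice to split on whitespace are all assumptions made for illustration, not something fixed by the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: compare two lines word by word. Returns three array
# refs: differing words as [original, mistake, word number], the words only
# the original has, and the words only the copy has.
sub compare_words {
    my ($line_a, $line_b) = @_;
    my @a = split ' ', $line_a;    # split ' ' also drops the trailing newline
    my @b = split ' ', $line_b;

    # Number of word positions that exist in both lines.
    my $shared = scalar(@a) < scalar(@b) ? scalar(@a) : scalar(@b);

    my @diffs;
    for my $j (0 .. $shared - 1) {
        push @diffs, [ $a[$j], $b[$j], $j + 1 ] if $a[$j] ne $b[$j];
    }

    # Using the shared word count as the offset returns only the surplus words.
    my @missing = splice @a, $shared;    # in the original, not in the copy
    my @extra   = splice @b, $shared;    # in the copy, not in the original
    return (\@diffs, \@missing, \@extra);
}

my ($diffs, $missing, $extra) = compare_words(
    "Check this link to sea what scannars are supported by SANE",
    "check thes link to sea what scannars are suppurted by SeNE",
);
printf "%-12s %-12s %s\n", 'Original', 'Mistake', 'Word No.';
printf "%-12s %-12s %d\n", @$_ for @$diffs;
```

Note that ne is case-sensitive, so with the two example lines exactly as posted this also reports "Check" against "check" at word 1, in addition to the three mistakes in the expected table.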
  • Hi

    Tks for your advice.

    I'm only a newbie at Perl. I'll use the following as a starting point.

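    open (FILE, "doc_a.txt") or die "Cannot open doc_a.txt: $!";
    @doc_a = <FILE>;    # read all lines of the original
    close FILE;

    open (FILE, "doc_b.txt") or die "Cannot open doc_b.txt: $!";
    @doc_b = <FILE>;    # read all lines of the copy
    close FILE;

    $n_a = @doc_a;      # number of lines in each document
    $n_b = @doc_b;

    if ($n_a != $n_b) {
        print "Error: documents are not the same length\n";
    }
    else {
        for my $i (0 .. $#doc_a) {
            # split ' ' splits on any whitespace and drops the trailing newline
            my @line_a = split(' ', $doc_a[$i]);
            my @line_b = split(' ', $doc_b[$i]);
            # (word-by-word comparison still to be written)
        }
    }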

    - End -

    I have not worked out how to print the mistakes (mistyped words in doc_b that are inconsistent with doc_a) in a table with line number and word number, as shown in my first post. Could you please give me some suggestions? Tks.

    : You'll probably want to not store the file in an array to save memory. Instead, do [grey]while (<$file>) { ... }[/grey] unless you need to keep it for some reason.

    Could you please explain in more detail how to do this? TIA
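On the while (<$file>) question: one way it might look is to read both files in lockstep, one line from each per iteration, so neither file is held in memory as a whole. This is a hedged sketch rather than a finished tool. The sub name compare_handles and the wording of the messages are made up for illustration, it reports only differing words and a mismatch in line count (not surplus words within a line), and the in-memory filehandles merely stand in for doc_a.txt and doc_b.txt.

```perl
use strict;
use warnings;

# Hypothetical helper: read two open filehandles in lockstep and return one
# "original mistake line word" row for every word that differs.
sub compare_handles {
    my ($fh_a, $fh_b) = @_;
    my @rows;
    my $line_no = 0;
    while (defined(my $line_a = <$fh_a>)) {
        my $line_b = <$fh_b>;
        $line_no++;
        unless (defined $line_b) {
            push @rows, "File b ends before line $line_no";
            last;
        }
        my @words_a = split ' ', $line_a;
        my @words_b = split ' ', $line_b;
        # Last index that exists in both lines.
        my $last = $#words_a < $#words_b ? $#words_a : $#words_b;
        for my $j (0 .. $last) {
            next if $words_a[$j] eq $words_b[$j];
            push @rows, join ' ', $words_a[$j], $words_b[$j], $line_no, $j + 1;
        }
    }
    push @rows, "File b has extra lines after line $line_no" unless eof $fh_b;
    return @rows;
}

# In-memory filehandles used as stand-ins for the real files.
my $doc_a = "foo bar\nbaz quux\n";
my $doc_b = "foo bxr\nbaz quux\n";
open my $fh_a, '<', \$doc_a or die "Cannot open doc_a: $!";
open my $fh_b, '<', \$doc_b or die "Cannot open doc_b: $!";
print "$_\n" for compare_handles($fh_a, $fh_b);
```

With real files you would open them with something like open my $fh_a, '<', 'doc_a.txt' and pass the handles in the same way.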

  • Untested, but try this.

    [code]print "Original Mistake Line Word\n"; # header: print this once, before the loop over lines
    my $l = $i + 1; # line number, counted from 1
    my $short = ($#line_a > $#line_b ? $#line_b : $#line_a);
    for my $j (0 .. $short) {
        my ($a, $b, $w) = ($line_a[$j], $line_b[$j], $j + 1); # redundant, but looks better
        print "$a $b $l $w\n" if $a ne $b;
    }
    if ($#line_a > $#line_b) {
        # the range operator puts @line_b in scalar context, so no explicit scalar() is needed
        print "File b is missing '", join(' ', @line_a[@line_b .. $#line_a]), "' on line $l\n";
    } elsif ($#line_b > $#line_a) {
        print "File b should not have '", join(' ', @line_b[@line_a .. $#line_b]), "' on line $l\n";
    }[/code]

    The formatting will probably get messed up with that method, but it looks the nicest in code form :-). You may want to look at formats - this sort of thing is what I think they were made for, although they're slightly archaic. I think you'd want something like
    [code]format STDOUT =
    @<<<<<<<<<<<< @<<<<<<<<<<<< @<<<<<< @<<<<<<
    $a, $b, $l, $w
    .[/code]
    But I don't know much about them at all, including how to actually use them, so you're on your own there.
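For what it's worth, here is a rough sketch of how a format can be put to use. format, write and the STDOUT_TOP header format are standard Perl; the variable names, the print_row helper and the sample rows are assumptions for illustration only.

```perl
use strict;
use warnings;

# Package variables (illustrative names) that the format reads its values from.
our ($orig, $typo, $line_no, $word_no);

# Header, printed automatically at the top of each page of output.
format STDOUT_TOP =
Original      Mistake       Line    Word
.

# Picture line: '@' starts a field, each '<' widens it by one left-justified column.
format STDOUT =
@<<<<<<<<<<<< @<<<<<<<<<<<< @<<<<<< @<<<<<<
$orig,        $typo,        $line_no, $word_no
.

# Hypothetical helper: load the variables, then let write() fill in the picture.
sub print_row {
    ($orig, $typo, $line_no, $word_no) = @_;
    write;
}

print_row('this',      'thes',      1, 2);
print_row('supported', 'suppurted', 1, 9);
print_row('SANE',      'SeNE',      1, 11);
```

The line with a single dot ends each format, and write (not print) is what sends a formatted row to STDOUT.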