Please help: how can I write these functions?

Hello,

I have a task to write two PHP functions:

1. Strip all HTML tags from an HTML file and separate all words by at most one blank (space)
2. Remove some 'ignore words' from the result of the first function

For example, if somebody calls:

$content = function1 ("test.html");
$content = function2 ($content, "ignore.dat");

And if the content of test.html is:



This is title


Some code here

This is text

After function1, the result in $content has to be:

This is title
This is text
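A minimal sketch of function1, assuming PHP 7 or later, assuming that the dropped "Some code here" sat inside a `<script>` or `<style>` block (strip_tags() removes those tags but keeps the text between them, so the blocks are cut out with a regex first), and assuming the line breaks between text blocks should be kept as in the expected output. The helper name `clean_html()` is hypothetical; `function1()` only reads the file and passes its contents on.

```php
<?php
// Hypothetical helper: removes all markup from an HTML string and
// leaves at most one blank between words, one text block per line.
function clean_html(string $html): string
{
    // strip_tags() keeps the text inside <script>/<style>, so drop those blocks first.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1\s*>#is', '', $html);
    // Remove the remaining tags (strip_tags also drops HTML comments and PHP blocks).
    $text = strip_tags($html);
    $lines = [];
    foreach (preg_split('/\R/', $text) as $line) {
        // Collapse every run of spaces/tabs to a single blank and trim the ends.
        $line = trim(preg_replace('/[ \t]+/', ' ', $line));
        if ($line !== '') {
            $lines[] = $line; // keep only lines that still contain words
        }
    }
    return implode("\n", $lines);
}

// function1 as called in the question: read the file and clean it.
function function1(string $filename): string
{
    $html = file_get_contents($filename);
    if ($html === false) {
        throw new RuntimeException("Cannot read $filename");
    }
    return clean_html($html);
}
```

Working on the whole file at once, rather than line by line, means a tag that is split over two lines is still removed in one piece.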

And if I define the ignore words in ignore.dat as "This" and "is", the result in $content has to be:

title
text
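A matching sketch of function2, assuming PHP 7 or later, assuming ignore.dat holds one ignore word per line, and assuming matching is case-sensitive and on whole words only (so removing "is" leaves "This" and "island" intact). `remove_words()` is a hypothetical helper that takes a string and an array, so it can be tried without any files.

```php
<?php
// Hypothetical helper: removes whole-word occurrences of each ignore word,
// then re-collapses the blanks left behind.
// Note: \b only works for words that start and end with a letter, digit or "_".
function remove_words(string $content, array $words): string
{
    $words = array_filter(array_map('trim', $words), function ($w) { return $w !== ''; });
    if ($words) {
        // Escape each word for use in a regex and join them, e.g. /\b(?:This|is)\b/
        $quoted = array_map(function ($w) { return preg_quote($w, '/'); }, $words);
        $pattern = '/\b(?:' . implode('|', $quoted) . ')\b/';
        $content = preg_replace($pattern, '', $content);
    }
    $lines = [];
    foreach (preg_split('/\R/', $content) as $line) {
        // Collapse leftover blanks and drop lines that became empty.
        $line = trim(preg_replace('/[ \t]+/', ' ', $line));
        if ($line !== '') {
            $lines[] = $line;
        }
    }
    return implode("\n", $lines);
}

// function2 as called in the question: load the ignore list from a file.
function function2(string $content, string $ignoreFile): string
{
    $words = file($ignoreFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($words === false) {
        throw new RuntimeException("Cannot read $ignoreFile");
    }
    return remove_words($content, $words);
}
```

Building one alternation pattern means each line is scanned once, however many ignore words there are; add the `i` modifier to the pattern if "this" should also match "This".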

I could solve the problem the classic way, by splitting each line on characters such as <, >, ..., but I can always miss something, and it can take a lot of time. Can somebody show me how to solve this problem using regular expressions?
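If you want to avoid strip_tags() and rely on regular expressions alone, the cleaning step of function1 can be sketched with a few preg_replace() calls on the complete file contents (PHP 7 or later assumed). This is rough: `[^>]*` assumes there is no `>` inside attribute values, which real HTML does not guarantee, and this variant puts all words on a single line. The function name `strip_with_regex()` is hypothetical.

```php
<?php
// Hypothetical regex-only variant of the HTML cleaning step.
function strip_with_regex(string $html): string
{
    // 1. Drop comments, and script/style blocks together with their contents.
    $html = preg_replace('#<!--.*?-->#s', ' ', $html);
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1\s*>#is', ' ', $html);
    // 2. Replace every remaining tag with a blank; [^>] also matches newlines,
    //    so a tag split over several lines is caught too.
    $text = preg_replace('#<[^>]*>#', ' ', $html);
    // 3. Collapse all whitespace (including line breaks) to a single blank.
    return trim(preg_replace('#\s+#', ' ', $text));
}
```

Replacing each tag with a blank instead of an empty string keeps words from adjacent elements apart ("title</title><p>text" does not become "titletext"), at the cost of splitting a word that has a tag in its middle.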

Thanks a lot

LJUBA

Comments

  • LJUBA,

    PHP comes with a manual...:


    try
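    [code=pre]
    // $IgnoreWord is optional: pass "" (or nothing) to skip the replacement.
    function RemoveHTMLandPHPtags($IgnoreWord = "") {
        $file1 = fopen("inputfile.html", "r"); // Open a file for reading...
        $file2 = fopen("outputfile.txt", "w"); // Open a file for writing...
        while (($line = fgets($file1, 1024)) !== false) { // read lines until the end of the file
            // strip_tags() returns the cleaned string; it does not change $line in place.
            // Note: a tag that spans two lines is not removed cleanly when working line by line.
            $line = strip_tags($line);
            if ($IgnoreWord !== "" && strpos($line, $IgnoreWord) !== false) {
                // removes the $IgnoreWord string from the line (also inside longer words)...
                $line = str_replace($IgnoreWord, "", $line);
            }
            fputs($file2, $line); // write it to the output file
        }
        fclose($file1);
        fclose($file2);
    }
    [/code]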

    I haven't tested this, but it is what I shook out of my sleeve while looking at the manual. By the way, you can download the manual from www.php.net (you have to look hard, but it is there...).
    If you can't find it, I have a Dutch version here...

    ;-)
    -mac-
    mailto:mac@mac-doggie.nl
    The Netherlands


