Please help: how can I write these functions?


I have a task to write two PHP functions:

1. To strip all HTML tags from an HTML file and separate all words with at most one blank (space)
2. To remove some 'ignore words' from the result of the first function

For example, if somebody calls:

$content = function1 ("test.html");
$content = function2 ($content, "ignore.dat");

And if the content of test.html is:

This is title

Some code here

This is text

After function1, the result in $content has to be:

This is title
This is text

And if I define the ignore words in ignore.dat as "This" and "is", the result in $content has to be:

title
text

I can solve the problem the classic way, splitting line by line on characters such as < and >, but I could always miss something, and it could take a lot of time. Can somebody help me solve this problem using regular expressions?

Thanks a lot



  • LJUBA,

    PHP comes with a manual...:

    function RemoveHTMLandPHPtags($IgnoreWord = "") {
        $file1 = fopen("inputfile.html", "r"); // Open a file for reading...
        $file2 = fopen("outputfile.txt", "w"); // Open a file for writing...
        while (($line = fgets($file1, 1024)) !== false) { // read a line until the end of the file is found
            $line = strip_tags($line); // remove HTML and PHP tags that open and close within this line
            if ($IgnoreWord !== "" && strpos($line, $IgnoreWord) !== false) {
                $line = str_replace($IgnoreWord, "", $line); // removes the $IgnoreWord string from the line...
            }
            fputs($file2, $line); // write it to the outputfile
        }
        fclose($file1);
        fclose($file2);
    }

    I haven't tested this, but it is what I shook out of my sleeve while looking at the manual. By the way, you can download it from (you have to look hard, but it is there...)
    If you can't find it, I have a Dutch version here...
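
    Since the question asks about regular expressions, here is a minimal sketch of that approach, not a tested solution. The names html_to_text and remove_ignore_words are made up for illustration (the question only calls them function1 and function2), and they take strings rather than file names. The sketch assumes that the "Some code here" part of the example sits in a script or style element, and that ignore words should match whole words, case-sensitively. A plain pattern like this can still be fooled by a ">" inside an attribute value or a comment.

```php
<?php
// Hypothetical helpers; the names and signatures are assumptions, not from the question.

// Strip tags from an HTML string and separate the words with single spaces.
function html_to_text(string $html): string
{
    // Drop <script> and <style> elements together with their contents;
    // removing only the tags would leave the code between them in the text.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1\s*>#is', ' ', $html);
    // Replace each remaining tag with a space so neighbouring words stay apart.
    $text = preg_replace('/<[^>]*>/', ' ', $html);
    // Collapse every run of whitespace into one space and trim both ends.
    return trim(preg_replace('/\s+/', ' ', $text));
}

// Remove whole-word, case-sensitive occurrences of the ignore words.
function remove_ignore_words(string $text, array $ignore): string
{
    if (count($ignore) === 0) {
        return $text;
    }
    // Escape each word so it is treated literally inside the pattern.
    $quoted = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $ignore);
    $text = preg_replace('/\b(?:' . implode('|', $quoted) . ')\b/', ' ', $text);
    // Removing words leaves extra blanks behind, so collapse them again.
    return trim(preg_replace('/\s+/', ' ', $text));
}
```

    To work from files as in the question, you could pass file_get_contents("test.html") to the first function and, assuming ignore.dat holds one word per line, file("ignore.dat", FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as the second argument of the second function.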

    The Netherlands

