Parsing HTML with RegExp: could you help me?

File 1:

import urllib

hdl = urllib.urlopen("file://localhost/D:/page.htm")
html = hdl.read()
hdl.close()

text_file = open("file.txt", "w")
text_file.write(html)
text_file.close()

File 2:

import re

text_file = open("file.txt", "r")
contents = text_file.read()
text_file.close()

p = re.compile('(?<=starting_html_tag).*(?=ending_html_tag)')
m = p.search(contents)
if m:
    print 'Match found: ', m.group()
else:
    print 'No match'

The first script opens a web page and saves its contents to a text file. The second script opens that text file and looks for the content between the tags 'starting_html_tag' and 'ending_html_tag'.

The problem is that the second script doesn't find anything at all. It prints 'No match'. What's the matter?
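
One possible cause, assuming the two tags sit on different lines of the saved page: by default `.` does not match a newline, so `.*` cannot stretch from one line to the next and the search fails. Below is a minimal sketch of the same lookbehind/lookahead pattern with `re.DOTALL` and a non-greedy `.*?`. The sample HTML, the tag strings, and the helper name `extract_between` are placeholders made up for illustration; they are not taken from the actual page.

```python
import re

# Hypothetical sample standing in for the contents of file.txt;
# the real page and its tags are not shown in the post.
contents = "<html>\n<div id='x'>\nfirst line\nsecond line\n</div>\n</html>"

start_tag = "<div id='x'>"  # placeholder for starting_html_tag
end_tag = "</div>"          # placeholder for ending_html_tag


def extract_between(text, start, end):
    """Return the text between start and end, or None if there is no match."""
    # re.escape keeps characters such as '?' or '.' in the tags literal.
    # re.DOTALL lets '.' match newlines; '.*?' stops at the first end tag.
    pattern = re.compile(
        '(?<=' + re.escape(start) + ').*?(?=' + re.escape(end) + ')',
        re.DOTALL)
    m = pattern.search(text)
    return m.group() if m else None


# With re.DOTALL the multi-line content is found.
result = extract_between(contents, start_tag, end_tag)
print(repr(result))

# The same pattern without re.DOTALL finds nothing on this sample.
without_dotall = re.search(
    '(?<=' + re.escape(start_tag) + ').*(?=' + re.escape(end_tag) + ')',
    contents)
print(without_dotall)
```

If the tags in the real page contain regex metacharacters, or differ in case or whitespace from what is typed in the pattern, the search can also fail for that reason, so it may be worth printing a slice of `contents` to check exactly what was saved.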
