I need help parsing Unicode web pages and downloading JPEG image files with Perl scripts.
I read http://www.cs.utk.edu/cs594ipm/perl/crawltut.html
about using the LWP and HTTP libraries and the get($url) function, but the content returned for Unicode pages is always garbled. When I use get($url) on a non-Unicode web page, the content comes back as clean ASCII.
But now I want to parse http://www.tom365.com/movie_2004/html/5507.html
and the page I get back is garbled. I have read about the Encode module but don't know how to use it.
I need a Perl script that parses the page above and extracts the URL of the image matching this pattern:
If anyone knows how to parse Unicode web pages like this, I'd be very grateful.
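
For reference, here is a rough sketch of the kind of script I have in mind, not a tested solution for that site. It assumes the page is encoded as gb2312 (a guess; the real charset should be read from the Content-Type header or the page's meta tag) and that the image is an ordinary img tag whose src ends in .jpg or .jpeg. The helper names decode_page and jpeg_urls are made up for illustration. The idea is to let LWP's decoded_content() apply the declared charset, fall back to Encode's decode() by hand, and then pull the image URLs out of the decoded text.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Turn raw bytes into Perl characters. The charset name is an assumption;
# check the page's Content-Type header or <meta> tag for the real one.
sub decode_page {
    my ($bytes, $charset) = @_;
    return decode($charset, $bytes);
}

# Return the src value of every <img> tag that points at a .jpg/.jpeg file.
# A regex is enough for a sketch; HTML::Parser is more robust on messy markup.
sub jpeg_urls {
    my ($html) = @_;
    my @urls;
    while ($html =~ /<img\b[^>]*?\bsrc\s*=\s*["']?([^"'\s>]+\.jpe?g)/gi) {
        push @urls, $1;
    }
    return @urls;
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    require LWP::UserAgent;
    require URI;

    my $ua  = LWP::UserAgent->new(timeout => 30);
    my $res = $ua->get($ARGV[0]);
    die "Fetch failed: " . $res->status_line . "\n" unless $res->is_success;

    # decoded_content() decodes using the charset the server or page declares.
    # If that yields nothing, fall back to decoding by hand (assumed gb2312).
    my $html = $res->decoded_content;
    $html = decode_page($res->content, 'gb2312') unless defined $html;

    binmode STDOUT, ':encoding(UTF-8)';
    for my $src (jpeg_urls($html)) {
        my $abs  = URI->new_abs($src, $res->base);   # resolve relative links
        my $file = ($abs->path_segments)[-1];        # last path component
        print "$abs -> $file\n";
        # Save the image bytes straight to disk under that file name.
        my $img = $ua->get($abs, ':content_file' => $file);
        warn "Could not save $abs: " . $img->status_line . "\n"
            unless $img->is_success;
    }
}
```

Run it with the page URL as the only argument. With no argument it does nothing, so the two helpers can be tried on their own against a saved copy of the page.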