Welcome to the new platform of Programmer's Heaven!

I'm having trouble parsing a string

laxori666 (Member, 4 posts)
OK, I have a string that I got from a website. (Once the program is finished it will read the page from the site every time; for now I've pasted a copy in.)

Here is the program:
--- START ---

import urllib2

# file = urllib2.urlopen("http://www.tremorseven.com/aim/deepaim.php?job=view")

# print "URL Opened: " + file.geturl()
# URLInfo = file.read()
URLInfo = """
Deep Thoughts by Jack Handey

#235: When this girl at the museum asked me whom I liked better, Monet or Manet, I said, "I like mayonnaise." She just stared at me, so I said it again, louder. Then she left. I guess she went to try to find some mayonnaise for me.

Refresh | add Deep Thoughts to your aim

please support this service.

a service of tremorseven.com
"""
print "Contents of URL: "
print URLInfo

for x in range(0, len(URLInfo)):
    if (URLInfo[x] == '#'):
        for y in range(x, x + 5):
            if (URLInfo[y] == ':'):
                NumberStr = URLInfo[x+1:y]
                print "Number of Deep Thought: " + NumberStr
                StartOfThought = y+2
                break

for z in range(StartOfThought, len(URLInfo)):
    if (URLInfo[z] == '<'):
        EndOfThought = z

print "Contents of Deep Thought:"
print URLInfo[StartOfThought: EndOfThought]

---- END ----

I search for the "#" (this works), then I search for the ":" (this works) and retrieve the number of this deep thought. Then, starting just after the ":", I search for a "<" (this doesn't work).

And it does not work (dum dum dum). Any help would be appreciated.
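A hedged reading of why the last step fails: the loop that scans for "<" never breaks, so EndOfThought ends up holding the position of the last "<" in the whole page rather than the first one after the thought. And if the text contains no "<" at all (as in the copy pasted above, presumably because the forum swallowed the HTML tags), EndOfThought is never assigned and the final print raises a NameError. Below is a minimal sketch of the same find-by-position approach using str.find, which takes an optional start position and returns -1 instead of raising when nothing is found. The function name parse_thought and the sample string are made up for illustration, and the code is written to run under both Python 2 and Python 3.

```python
def parse_thought(page):
    """Return (number, thought) for the first "#NNN:" marker in page, or None.

    Same steps as the original loops: find '#', then the ':' after it, then
    the first '<' after the colon (or the end of the text if there is no '<').
    """
    hash_pos = page.find('#')
    while hash_pos != -1:
        colon_pos = page.find(':', hash_pos)
        # Only accept "#<digits>:"; skip stray '#' characters (e.g. in URLs or CSS).
        if colon_pos != -1 and page[hash_pos + 1:colon_pos].isdigit():
            number = page[hash_pos + 1:colon_pos]
            start = colon_pos + 1
            # Stop at the FIRST '<' after the colon; find() returns -1 if there is none.
            end = page.find('<', start)
            if end == -1:
                end = len(page)
            return number, page[start:end].strip()
        hash_pos = page.find('#', hash_pos + 1)
    return None


sample = "<b>Deep Thoughts</b><br>#235: I like mayonnaise.<br>Refresh |"
number, thought = parse_thought(sample)
print("Number of Deep Thought: " + number)
print("Contents of Deep Thought: " + thought)
```

Stopping at the first "<" after the colon is the key change; falling back to the end of the text covers pages where no tag follows the thought.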

Comments

  • laxori666 (Member, 4 posts)
    Sorry, apparently this message board strips tabs, so you'll have to add the indentation back yourself if you decide to run it.
  • laxori666 (Member, 4 posts)
    This isn't the problem, but replace "StartOfThoughts" with "StartOfThought" and add the ")" after "'<'"; those didn't show up.
  • infidel (Member, 2,900 posts)
    First off, HTML collapses whitespace, so tabs and leading spaces get lost; you need special tags to keep preformatted text properly formatted. On Programmer's Heaven (PH), those tags are [code] and [/code].

    Here's my first stab (note that I broke the text arbitrarily because preformatted text does not wrap and makes this page scroll far to the right if not forced to break):

    [code]
    import urllib2
    import re

    text = """<b>Deep Thoughts by Jack Handey

    #235: When this girl at the
    museum asked me whom I liked better, Monet or Manet, I said, "I like mayonnaise."
    She just stared at me, so I said it again, louder. Then she left. I guess she went
    to try to find some mayonnaise for me.


    Refresh |
    add Deep Thoughts
    to your aim


    please support this service.


    a service of
    tremorseven.com
    """

    for match in re.finditer(r"#[0-9]+:", text):
        # The number sits between the '#' and the ':' of the match
        thought_number = text[match.start()+1 : match.end()-1]
        try:
            # The thought runs from just after the colon up to the next '<'
            thought = text[match.end()+1 : text.index("<", match.end() + 1)].strip()
        except ValueError: # '<' character not found in text
            thought = text[match.end()+1 : ].strip()
        print thought_number
        print thought
    [/code]

    You may not be familiar with regular expressions (regex). Python has an "re" module that lets you use them, and they are perfect for searching text for patterns. The re.finditer() function takes a pattern and a string and returns an iterator of match objects, so you can step through the matches with a for loop; each match's start() and end() methods give its position in the string, which is how the code slices out the number. The pattern "#[0-9]+:" is quite simple as far as regexen go; they can get much more complex. This one says "find a substring that starts with a hash (#), is followed by one or more (+) digits ([0-9]), and ends with a colon (:)."

    There are entire books written about regular expressions, and I highly recommend you learn at least the basics. I tried to come up with a regular expression that would pick out the "deep thought" as well, but that was beyond my ability, so I opted for the string method "index", which returns the position of a substring you specify (you can optionally specify start and end points for the search as well). When the substring isn't there, index raises a ValueError, which is why the code catches that exception and falls back to taking the rest of the text.
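For what it's worth, one way to pull out the thought with the regex itself is to use capture groups: "(\d+)" captures the number, and "([^<]*)" captures every character up to (but not including) the next "<", or to the end of the text if there is none. This is a sketch that assumes a thought never contains a literal "<" character; the pattern and the function name find_thoughts are illustrative, not taken from the site or the code above. It runs under both Python 2 and Python 3.

```python
import re

# Group 1: the digits after '#'. Group 2: everything after the colon up to the
# next '<' (or the end of the string). [^<] also matches newlines, so a thought
# that wraps across lines is captured whole.
THOUGHT_RE = re.compile(r"#(\d+):\s*([^<]*)")


def find_thoughts(text):
    """Return a list of (number, thought) pairs found in text."""
    results = []
    for match in THOUGHT_RE.finditer(text):
        results.append((match.group(1), match.group(2).strip()))
    return results


page = "<b>Deep Thoughts by Jack Handey</b>\n#235: I like mayonnaise.\n<br>Refresh |"
for number, thought in find_thoughts(page):
    print("#" + number + ": " + thought)
```

The trade-off versus the index approach is that a single pattern handles both the number and the text, but it relies on "<" (the start of the next HTML tag) being the only terminator.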

    Try this out and let me know if you have any other questions.


    infidel
