Python's Regular Expression

jmconnell Posts: 4 Member
I came across this when reading Chapter 18 of How To Think Like A Computer Scientist (in Python):


>>> import re
>>> re.split("([^0-9])", "123+456*/")
['123', '+', '456', '*', '', '/', '']


Can anyone tell me why the result isn't ['123', '456'] instead?

Comments

  • infidel Posts: 2,900 Member
    : I came across this when reading Chapter 18 of How To Think Like A Computer Scientist (in Python):
    :
    :
    : >>> import re
    : >>> re.split("([^0-9])", "123+456*/")
    : ['123', '+', '456', '*', '', '/', '']
    :

    :
    : Can anyone tell me why the result isn't ['123', '456'] instead?

    My first guess is that since you're splitting on any character that is not a digit, you would be losing a great deal of data by not including the non-digit characters in the result. I'm not familiar with the "split" notion of regexen. If all you want is ['123', '456'] then perhaps you should use a function like match()? I'll look into it more if you want me to.
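    If the goal is only the runs of digits, one option is to match the numbers directly instead of splitting on everything else. This is a minimal sketch that swaps in re.findall() rather than the match() guessed at above; the variable names are just for illustration.

```python
import re

text = "123+456*/"

# findall() returns every non-overlapping match of the pattern,
# so asking for runs of digits skips the operators entirely.
numbers = re.findall(r"[0-9]+", text)

print(numbers)  # ['123', '456']
```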


    [size=5][italic][blue][RED]i[/RED]nfidel[/blue][/italic][/size]

    [code]
    $ select * from users where clue > 0
    no rows returned
    [/code]

  • infidel Posts: 2,900 Member
    Ok, I thought that just maybe the standard documentation might say something about this. The module docs come with Python, so you should be able to find this description of the re.split() function:

    [blue]split( pattern, string[, maxsplit = 0])

    Split string by the occurrences of pattern. [italic]If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list[/italic]. If maxsplit is nonzero, at most maxsplit splits occur, and the remainder of the string is returned as the final element of the list. (Incompatibility note: in the original Python 1.5 release, maxsplit was ignored. This has been fixed in later releases.)

    >>> re.split('\W+', 'Words, words, words.')
    ['Words', 'words', 'words', '']
    >>> re.split('(\W+)', 'Words, words, words.')
    ['Words', ', ', 'words', ', ', 'words', '.', '']
    >>> re.split('\W+', 'Words, words, words.', 1)
    ['Words', 'words, words.']

    This function combines and extends the functionality of the old regsub.split() and regsub.splitx(). [/blue]

    The italics are mine. I don't understand groups yet in relation to regex, but it appears that the presence of the parentheses in your original pattern is the cause of the split characters being part of the result, as the second example in the documentation also illustrates.
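    To see the effect of the parentheses side by side, here is a small sketch that runs the original string through re.split() with and without a capturing group. The variable names are made up for the example, and filtering out the empty strings at the end is just one possible way to get back to ['123', '456'].

```python
import re

text = "123+456*/"

# Capturing group: each separator is returned between the pieces.
with_group = re.split(r"([^0-9])", text)

# No group: the separators are dropped, but empty strings remain where
# two separators are adjacent or a separator ends the string.
without_group = re.split(r"[^0-9]", text)

# A non-capturing group (?:...) behaves like no group at all here.
non_capturing = re.split(r"(?:[^0-9])", text)

# maxsplit=1 stops after the first split and keeps the rest intact.
first_only = re.split(r"[^0-9]", text, maxsplit=1)

# Dropping the empty strings leaves only the numbers.
numbers_only = [piece for piece in without_group if piece]

print(with_group)     # ['123', '+', '456', '*', '', '/', '']
print(without_group)  # ['123', '456', '', '']
print(first_only)     # ['123', '456*/']
print(numbers_only)   # ['123', '456']
```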

    Get to know the docs that come with Python; they are immensely helpful.



  • jmconnell Posts: 4 Member
    Thanks a lot! :)

    I began picking up Python a few months ago. I sourced function docs mainly by querying the convenient built-in help(), and that turned out to be sufficient most of the time.

    I compared those churned out by help() with those in the full-fledged Documentation from time to time, and got the impression that the former is very close to the latter (that of os.walk() is one good example).

    But all I got from help(), on re.split(), before posting my question here, was this:
    [code]
    >>> help(re.split)
    Help on function split in module sre:

    split(pattern, string, maxsplit=0)
    Split the source string by the occurrences of the pattern,
    returning a list containing the resulting substrings.
    [/code]

    [blue]
    Thanks to your reply. I now know that help()'s just kind of doing random sampling of the full-fledged.
    [/blue]

    As for "groups" in regex, after doing a little research, I found that groups are specified by parentheses, and they can be usefully divided into two types: capturing and non-capturing. After performing a match, only the text matched by the former can be referenced, by means of "backreferences" that take the form "\[italic]number[/italic]".

    One possible combination of the two is to use the former to mark the part of the match to be retained during substitution (by including backreferences in the replacement pattern), and the latter to specify extra criteria that must be met to trigger the substitution.
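    As a rough illustration of that combination, the sketch below strips a unit suffix from numbers with re.sub(). The CSS-like sample string and the px/pt units are invented for the example: the capturing group (\d+) holds the digits to keep and is referenced as \1 in the replacement, while the non-capturing group (?:px|pt) only decides whether a substitution happens.

```python
import re

sample = "width: 10px; height: 20pt; depth: 30em"

# (\d+)      capturing group: the digits we want to keep, available as \1
# (?:px|pt)  non-capturing group: must be present for the match to succeed,
#            but it is not numbered and cannot be backreferenced
pattern = r"(\d+)(?:px|pt)"

stripped = re.sub(pattern, r"\1", sample)

# Only the capturing group shows up in the match object's groups.
captured = re.match(pattern, "10px").groups()

print(stripped)  # width: 10; height: 20; depth: 30em
print(captured)  # ('10',)
```

    "30em" is left alone because the extra criterion in the non-capturing group is not met.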

    Back to re.split(). I was introduced to "split()" first through Java's String, then through Python's str. Then re.split() struck me as something that's quite ugly because it can include the separators in its return value, polluting the notion of "split".

    But I got over it because I think it "pollutes" for practicality.
  • infidel Posts: 2,900 Member
    : [blue]
    : Thanks to your reply. I now know that help()'s just kind of doing random sampling of the full-fledged.
    : [/blue]

    Actually, help() looks for docstrings in the code, which is probably where the documentation is initially generated from before extra explanations are added. If you're not familiar with docstrings, every class and function can have documentation "built in" to it. All you have to do is make a string the first statement of the body.

    [code]
    >>> def myfunc():
    ...     "myfunc exists to demonstrate docstrings"
    ...     pass
    ...
    [/code]

    The interpreter automatically puts such a string in an attribute named __doc__ and you can see for yourself:

    [code]
    >>> myfunc.__doc__
    'myfunc exists to demonstrate docstrings'
    [/code]

    Now you can see how the help() function is working...

    [code]
    >>> help(myfunc)
    Help on function myfunc in module __main__:

    myfunc()
        myfunc exists to demonstrate docstrings

    >>>
    [/code]
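    The same thing can be checked from a script rather than the interactive prompt. This is only a sketch: it assumes a Python 3 interpreter, uses inspect.getdoc() to read the docstring, and uses pydoc.render_doc() (the pydoc module is what help() is built on) to get the help text as a string instead of paging it to the screen.

```python
import inspect
import pydoc


def myfunc():
    "myfunc exists to demonstrate docstrings"
    pass


# The raw docstring, exactly as stored on the function object.
raw = myfunc.__doc__

# inspect.getdoc() returns the same text with indentation cleaned up.
cleaned = inspect.getdoc(myfunc)

# render_doc() builds the text that help() would display; the plaintext
# renderer avoids the bold overstrike characters used for terminals.
rendered = pydoc.render_doc(myfunc, renderer=pydoc.plaintext)

print(rendered)
```

    The printed text includes the docstring under the function signature, which is why help() can show no more than what the docstring itself contains.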




  • jmconnell Posts: 4 Member
    : Actually, help() looks for docstrings in the code, which is ...

    Thanks again.

    Btw, may I know where you got this kind of "insider's" info from? I tried the standard doc but didn't find it there.
  • infidel Posts: 2,900 Member
    : : Actually, help() looks for docstrings in the code, which is ...
    :
    : Thanks again.
    :
    : Btw, may I know where you got this kind of "insider's" info from? I tried the standard doc but didn't find it there.

    Which insider's info? The help() function? Don't really remember. I have "Practical Python" by Magnus Lie Hetland and it's mentioned there, so I may have picked it up from that. It may have been a newsgroup posting, or perhaps just some blog entry or tutorial online somewhere.

    Here's the Daily Python URL link:

    http://www.pythonware.com/daily/

    Every work day there are a few links you can check out. Most probably won't be useful, but every now and then there's a good one. Also, you can use Google to read postings to the comp.lang.python newsgroup. I check this board a few times each regular work day, so if you're stuck you can always ask here. I don't use Python in any professional sense, but I like to think I'm pretty good with it.



  • jmconnell Posts: 4 Member
    You're indeed good at it. I'll come again when I'm stuck.

    Good day!