An inquiry for the true geeks out there

We're trying to estimate the time needed to convert our application to VB.NET. The upgrade wizard that comes with VS.NET does a pretty fair job with our 200-ish DLL projects but chokes hard on the GUI. We killed our first run of the wizard after four hours, during which it used 100% of the CPU the entire time. When we killed it, the progress bar was only at about 50% (but who knows what that really means).

We first thought about breaking up the GUI (150-ish forms, many dozens of modules/classes) into smaller projects. Even these seem to take an inordinate amount of time to convert with the wizard, and apparently the conversion leaves some kind of "compatibility layer" between the Windows.Forms classes and the code. Don't ask me; the other guy discovered that.

While he was playing with that option, I sat down to think about converting the code ourselves. I started thinking about how to parse, for example, a medium-sized .frm file: not only the code, but the form properties, all the controls, and the "hidden" stuff you don't see in the IDE. I started working on a state machine, but the more I looked at it, the more it resembled an XML file without tags. So I hacked together (in Python, of course) a parser class that reads through .frm files and acts very much like a SAX parser, calling overridable methods (startElement, endElement, characters, etc.) when it encounters certain keywords. This works very well, but handling all the permutations of keywords and punctuation that make up VB6 syntax was getting less and less fun.
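For anyone wondering what "acts like a SAX parser" looks like in practice, here is a minimal sketch of the idea. It is not the class described above: the handler class, its method names, and the regexes are hypothetical. It only recognises the nested "Begin <Type> <Name>" / "End" blocks, "BeginProperty"/"EndProperty", and "Name = Value" lines of a .frm form description, and passes everything outside that (VERSION, Attribute lines, the code section) to code_line.

```python
import re


class FrmHandler:
    """Hypothetical callback interface; subclass and override, SAX-style."""

    def start_element(self, kind, name):
        pass

    def end_element(self):
        pass

    def property(self, name, value):
        pass

    def code_line(self, line):
        pass


BEGIN_RE = re.compile(r"^\s*Begin\s+(\S+)\s+(\S+)\s*$")           # Begin VB.Form Form1
BEGIN_PROP_RE = re.compile(r"^\s*BeginProperty\s+(\S+)")           # BeginProperty Font {...}
PROP_RE = re.compile(r"^\s*([A-Za-z_][\w.()]*)\s*=\s*(.*?)\s*$")  # Caption = "OK"


def parse_frm(lines, handler):
    """Fire handler callbacks for each line of a .frm file."""
    depth = 0  # nesting level inside Begin/BeginProperty blocks
    for raw in lines:
        line = raw.rstrip("\r\n")
        stripped = line.strip()
        if depth == 0 and not stripped.startswith("Begin "):
            handler.code_line(line)  # VERSION, Object, Attribute, code
            continue
        m = BEGIN_RE.match(line)
        if m:
            depth += 1
            handler.start_element(m.group(1), m.group(2))
            continue
        m = BEGIN_PROP_RE.match(line)
        if m:
            depth += 1
            handler.start_element("BeginProperty", m.group(1))
            continue
        if stripped in ("End", "EndProperty"):
            depth -= 1
            handler.end_element()
            continue
        m = PROP_RE.match(line)
        if m:
            handler.property(m.group(1), m.group(2))


class Recorder(FrmHandler):
    """Example handler that just records the events it receives."""

    def __init__(self):
        self.events = []

    def start_element(self, kind, name):
        self.events.append(("start", kind, name))

    def end_element(self):
        self.events.append(("end",))

    def property(self, name, value):
        self.events.append(("prop", name, value))

    def code_line(self, line):
        self.events.append(("code", line))


SAMPLE = '''VERSION 5.00
Begin VB.Form Form1
   Caption         =   "Form1"
   Begin VB.CommandButton Command1
      Caption         =   "OK"
   End
End
Attribute VB_Name = "Form1"
Private Sub Command1_Click()
End Sub'''

recorder = Recorder()
parse_frm(SAMPLE.splitlines(), recorder)
```

Real .frm files carry more than this (Object references, .frx offsets in property values, and so on), so treat it as a starting point rather than a complete reader.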

So to keep things interesting, I started looking into using an existing lexer/parser to tear apart the VB6 source files and produce a tree that I could use to spit out VB.NET syntax myself. I couldn't get yacc, flex, bison, et al. working via Cygwin, but I found a great Python module named SimpleParse.

Ok, so now I have a parser that understands a particular form of EBNF grammar, but I haven't been able to find a grammar for VB6. I've started cobbling my own together and I finally feel I've made it some way up the learning curve. Here is the grammar I have so far:

[code]
#SimpleParse EBNF grammar for Visual Basic 6
code := statement*
statement := (comment / if / select / simpleact)

#If
if := (if_inline / if_block)
if_inline := if_then, ts, simpleact
if_block := if_then, tail, statement*, else?, end_if
if_then := 'If', ts, condition, ts, 'Then'
else := 'Else', tail, statement*
end_if := 'End If', tail

#Select Case
select := 'Select Case', ts, expression, tail, case*, case_else?, end_select
case := 'Case', ts, condition, tail, statement*
case_else := 'Case Else', tail, statement*
end_select := 'End Select', tail

#Conditional expressions
condition := 'cond', digit+

#Expressions
expression := 'expr', digit+

#Simple Action
simpleact := 'act', digit+, tail

#trailing whitespace/comments
tail := (comment / ws)
comment := ts, "'", visible*, ws

#character classes
visible := [ -~]
newline := '\n'
ts := [ ]+
ws := [\n\f\v]+
digit := [0-9]
lowercase := [a-z]
uppercase := [A-Z]
alpha := (lowercase / uppercase)
alphanumeric := (alpha / digit)
[/code]

Here's the example input I created to test the grammar thus far:

[code]
act0
If cond1 Then act1 'here is a comment
If cond2 Then
act2
If cond3 Then
act3
End If
Else 'another comment
Select Case expr1
Case cond4
act4
Case cond5
act5
Case Else
act99
End Select 'end of select case
End If
[/code]
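SimpleParse isn't in the standard library, so for comparison here is a rough hand-written recursive-descent parser for the same toy grammar in plain Python. It is a sketch, not a SimpleParse replacement: it assumes one statement per line (true of the test input above), treats condN/exprN/actN as placeholder tokens just as the grammar does, assumes there are no string literals (so the first apostrophe starts a comment), and returns nested tuples whose shape is made up for illustration.

```python
import re

# Placeholder-token patterns mirroring the toy grammar (condN, exprN, actN).
ACT = re.compile(r"^act\d+$")
IF_INLINE = re.compile(r"^If (cond\d+) Then (act\d+)$")
IF_BLOCK = re.compile(r"^If (cond\d+) Then$")
SELECT = re.compile(r"^Select Case (expr\d+)$")
CASE = re.compile(r"^Case (cond\d+)$")


class ToyParser:
    def __init__(self, text):
        # Strip comments (the toy input has no string literals) and blank lines.
        lines = (ln.split("'", 1)[0].strip() for ln in text.splitlines())
        self.lines = [ln for ln in lines if ln]
        self.pos = 0

    def peek(self):
        return self.lines[self.pos] if self.pos < len(self.lines) else None

    def advance(self):
        line = self.peek()
        self.pos += 1
        return line

    def expect(self, text):
        line = self.advance()
        if line != text:
            raise SyntaxError(f"expected {text!r}, got {line!r}")

    def block(self, stop):
        # statement*: keep parsing statements until stop(next_line) is true.
        body = []
        while self.peek() is not None and not stop(self.peek()):
            body.append(self.statement())
        return body

    def statement(self):
        line = self.advance()
        if ACT.match(line):
            return ("act", line)
        m = IF_INLINE.match(line)
        if m:
            return ("if", m.group(1), [("act", m.group(2))], [])
        m = IF_BLOCK.match(line)
        if m:
            then_part = self.block(lambda ln: ln in ("Else", "End If"))
            else_part = []
            if self.peek() == "Else":
                self.advance()
                else_part = self.block(lambda ln: ln == "End If")
            self.expect("End If")
            return ("if", m.group(1), then_part, else_part)
        m = SELECT.match(line)
        if m:
            cases = []
            while self.peek() is not None and self.peek().startswith("Case "):
                head = self.advance()
                c = CASE.match(head)
                if head != "Case Else" and not c:
                    raise SyntaxError(f"bad Case: {head!r}")
                label = "else" if head == "Case Else" else c.group(1)
                body = self.block(lambda ln: ln.startswith("Case ") or ln == "End Select")
                cases.append((label, body))
            self.expect("End Select")
            return ("select", m.group(1), cases)
        raise SyntaxError(f"unexpected line: {line!r}")

    def parse(self):
        return self.block(lambda ln: False)


EXAMPLE = """act0
If cond1 Then act1 'here is a comment
If cond2 Then
act2
If cond3 Then
act3
End If
Else 'another comment
Select Case expr1
Case cond4
act4
Case cond5
act5
Case Else
act99
End Select 'end of select case
End If"""

tree = ToyParser(EXAMPLE).parse()
```

Each grammar production maps onto one branch of statement(), and statement* maps onto block() with a "stop" test, which is roughly what a recursive-descent tool like Parse::RecDescent generates for you.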

Obviously I have a long way to go to get all of VB6 syntax covered (and I just noticed a mistake in the 'comment' pattern as I was typing this), so I was wondering if any of you hardcore geeks out there have or know of a grammar that I can use to get past this nitty-gritty stuff.


[size=5][italic][blue][RED]i[/RED]nfidel[/blue][/italic][/size]

[code]
$ select * from users where clue > 0
no rows returned
[/code]

Comments

  • Read this ages back, and never got round to replying...

    : So to keep things interesting, I started looking into using an existing lexer/parser to tear apart the VB6 source files to produce a tree that I could use to spit out VB.NET syntax myself. I couldn't get yacc, flex, bison, et al. working via cygwin, but I found a great Python module named SimpleParse.
    :
    Sounds like a very interesting project. Perl has SAX modules too, though a very popular parser in the Perl community is Damian Conway's infamous Parse::RecDescent (a recursive descent parser).

    : Ok, so now I have a parser that understands a particular form of EBNF grammar but I haven't been able to find a grammar for VB6. I've started cobbling my own together and I finally feel I've jumped up on the learning curve. Here is the grammar I have so far:
    :
    : [grammar and test input snipped]
    :
    : Obviously I have a long way to go to get all of VB6 syntax covered (and I just noticed a mistake in the 'comment' pattern as I was typing this), so I was wondering if any of you hardcore geeks out there have or know of a grammar that I can use to get past this nitty-gritty stuff.
    :
    'fraid I don't know of any existing VB grammars out there - did you find any or did you end up creating your own? Or did you abandon the project?

    Jonathan

    ###
    for(74,117,115,116){$::a.=chr};(($_.='qwertyui')&&
    (tr/yuiqwert/her anot/))for($::b);for($::c){$_.=$^X;
    /(p.{2}l)/;$_=$1}$::b=~/(..)$/;print("$::a$::b $::c hack$1.");

  • : 'fraid I don't know of any existing VB grammars out there - did you find any or did you end up creating your own? Or did you abandon the project?

    It's been put on hold for a bit while I do actual work :-) Here's the grammar I have thus far:

    [code]
    root := statement*
    statement := (comment_line / option / declaration / conditional / simpleact)
    conditional := (if / select)
    declaration := (struct / enum / method)

    #Options
    option := option_keyword, option_type, tail
    option_keyword := spacechar*, 'Option', inline_space
    option_type := (option_explicit / option_base / option_compare / option_private)
    option_explicit := 'Explicit'
    option_base := 'Base', inline_space, ('0' / '1')
    option_compare := 'Compare', inline_space, ('Binary' / 'Text' / 'Database')
    option_private := 'Private'

    #Struct (Type in VB6)
    struct := struct_start, struct_element+, struct_end
    struct_start := publicity?, 'Type', inline_space, name, tail
    struct_element := name, subscript?, inline_space, as, tail
    struct_end := 'End', inline_space, 'Type', tail

    #Enumeration
    enum := enum_start, enum_element+, enum_end
    enum_start := publicity?, 'Enum', inline_space, name, tail
    enum_element := name, enum_value?, tail
    enum_value := inline_space, '=', inline_space, integer
    enum_end := 'End', inline_space, 'Enum', tail

    #Methods
    method := method_publicity?, static?, (property / sub / function)
    property := property_keyword, (property_get / property_set / property_let)
    property_get := 'Get', inline_space, method_signature, inline_space, as, tail
    property_set := 'Set', inline_space, method_signature, tail
    property_let := 'Let', inline_space, method_signature, tail
    property_keyword := 'Property', inline_space
    sub := 'Sub', inline_space, name, arglist, tail
    function := 'Function', inline_space, method_signature, inline_space, as, tail
    method_signature := name, arglist

    #Method arguments
    arglist := '(foo)'
    args := arg, (',', arg)*
    arg := inline_space?, 'foo', inline_space?

    #Publicity
    publicity := ('Public' / 'Private'), inline_space
    method_publicity := ('Public' / 'Private' / 'Friend'), inline_space

    #Static keyword
    static := 'Static', inline_space

    #Array subscript
    subscript := '(', ')'

    #As
    as := 'As', inline_space, typename

    #TypeName
    typename := (dotname / name)

    #Values
    integer := '-'?, digit+

    #Names
    name := alpha, (alphanum / '_')*
    dotname := name, ('.', name)+

    #If
    if := (if_inline / if_block)
    if_inline := if_then, inline_space, simpleact
    if_block := if_then, tail, statement*, elseif*, else?, end_if
    if_then := 'If', inline_space, condition, inline_space, 'Then'
    elseif := (elseif_inline / elseif_block)
    elseif_inline := 'Else', if_inline
    elseif_block := 'Else', if_then, tail, statement*
    else := 'Else', tail, statement*
    end_if := 'End', inline_space, 'If', tail

    #Select Case
    select := 'Select', inline_space, 'Case', inline_space, expression, tail, case*, case_else?, end_select
    case := 'Case', inline_space, condition, tail, statement*
    case_else := 'Case', inline_space, 'Else', tail, statement*
    end_select := 'End', inline_space, 'Select', tail

    #Conditional expressions
    condition := 'cond', digit+

    #Expressions
    expression := 'expr', digit+

    #Simple Action (really just a placeholder for now)
    simpleact := 'act', digit+, tail

    #trailing whitespace/comments
    tail := (trailing_comment / trailing_ws)
    trailing_comment := inline_space, comment
    comment_line := spacechar*, comment
    comment := "'", visible*, trailing_ws

    #character classes
    visible := [ -~]
    newline := '\n'
    inline_space := spacechar+, ('_', spacechar*, newline, spacechar*)?
    ts := spacechar+
    trailing_ws := spacechar*, newline, spacechar*
    spacechar := [ ]
    digit := [0-9]
    lowercase := [a-z]
    uppercase := [A-Z]
    alpha := (lowercase / uppercase)
    alphanum := (alpha / digit)
    [/code]
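The order of alternatives matters in these grammars. With first-match ordered choice (which is how I understand SimpleParse's "/" to behave, as in a PEG), "name / dotname" can never produce a dotname: name succeeds on the "ADODB" prefix of "ADODB.Recordset", the choice is finished, and whatever follows then fails on the ".". The tiny PEG-style combinators below are hypothetical plain Python (not SimpleParse's API) and just demonstrate the effect.

```python
import string

# Hypothetical PEG-style combinators. Each parser takes (text, pos) and
# returns the end position on success, or None on failure.

def lit(s):
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse


def charset(chars):
    def parse(text, pos):
        return pos + 1 if pos < len(text) and text[pos] in chars else None
    return parse


def seq(*parsers):
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse


def alt(*parsers):
    # Ordered choice: the first alternative that succeeds wins; later
    # alternatives are never tried, even if they would match more input.
    def parse(text, pos):
        for p in parsers:
            end = p(text, pos)
            if end is not None:
                return end
        return None
    return parse


def star(p):
    def parse(text, pos):
        while True:
            end = p(text, pos)
            if end is None or end == pos:
                return pos
            pos = end
    return parse


def plus(p):
    return seq(p, star(p))


alpha = charset(string.ascii_letters)
alphanum = charset(string.ascii_letters + string.digits)

# name := alpha, (alphanum / '_')*    dotname := name, ('.', name)+
name = seq(alpha, star(alt(alphanum, lit("_"))))
dotname = seq(name, plus(seq(lit("."), name)))

typename_bad = alt(name, dotname)   # (name / dotname)
typename_good = alt(dotname, name)  # (dotname / name)

# A type name followed by end of line, roughly "As <typename>" plus a tail.
line_bad = seq(typename_bad, lit("\n"))
line_good = seq(typename_good, lit("\n"))
```

Here typename_bad stops after "ADODB", so line_bad fails on "ADODB.Recordset\n", while typename_good consumes the whole dotted name and still matches a plain "Long".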

    The "method" productions aren't working right so I decided to put it away for a while to focus on stuff I'm supposed to be doing instead.

    The Enum part threw me for a while. Because VB Enum members usually sit on a line by themselves as a single token, I had to rethink the way lines are separated; otherwise the parser couldn't tell when it had reached the End Enum.
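Working one line at a time also makes the End Enum problem easy to sidestep outside the grammar: test for the terminator before trying the member pattern. Below is a hypothetical sketch (parse_enum and its regexes are made up for illustration) that works on comment-free logical lines, accepts only optionally negative decimal integer values, and numbers members without a value the way VB6 does: the first defaults to 0, and each later one is the previous value plus one.

```python
import re

ENUM_START = re.compile(r"^(?:(Public|Private)\s+)?Enum\s+(\w+)$", re.I)
ENUM_MEMBER = re.compile(r"^(\w+)(?:\s*=\s*(-?\d+))?$")
ENUM_END = re.compile(r"^End\s+Enum$", re.I)


def parse_enum(lines):
    """Parse an Enum block given as logical lines (comments already removed).

    Returns (enum_name, [(member, value), ...]).
    """
    lines = iter(lines)
    m = ENUM_START.match(next(lines).strip())
    if not m:
        raise SyntaxError("not an Enum declaration")
    name, members, value = m.group(2), [], -1
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Check for the terminator first, so "End Enum" is never
        # mistaken for a member.
        if ENUM_END.match(line):
            return name, members
        mm = ENUM_MEMBER.match(line)
        if not mm:
            raise SyntaxError(f"bad Enum member: {line!r}")
        value = int(mm.group(2)) if mm.group(2) is not None else value + 1
        members.append((mm.group(1), value))
    raise SyntaxError("missing End Enum")
```

For example, parse_enum(["Public Enum Colors", "Red", "Green = 5", "Blue", "End Enum"]) gives ("Colors", [("Red", 0), ("Green", 5), ("Blue", 6)]). Real VB6 Enum values can also be expressions such as hex literals or other constants, which this sketch rejects.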



  • : : 'fraid I don't know of any existing VB grammars out there - did you
    : : find any or did you end up creating your own? Or did you abandon
    : : the project?
    :
    : Its been put on hold for a bit while I do actual work :-) Here's the
    : grammar I have thus far:
    :
    : [grammar snipped]
    :
    : The "method" productions aren't working right so I decided to put it
    : away for a while to focus on stuff I'm supposed to be doing instead.
    :
    : The Enum part threw me for a while. Having a single token on a line
    : as VB Enums tend to do forced me to rethink the way lines are
    : separated because the parser couldn't tell when it found the End Enum.
    :
    To my understanding of Visual Basic, only some things are allowed outside of a procedure (be that a sub, function or property accessor); basically those are options and type declarations. Further, I think some things are only allowed outside of a sub or function, such as a Declare statement (for importing API functions) or another sub or function definition. So I'd probably go for something more like:

    [code]
    root := unit*
    unit := procedure | type | enum | declare | option | dim | newline
    procedure := sub | function | proplet | propget

    ...

    sub := accesstype?, ws, 'Sub', ws, identifier, '(', arglist, ')', newline, statement*, 'End', ws, 'Sub', newline

    ...

    newline := (ws*, crlf) | (comment, crlf)
    comment := ws*, "'", visible*, ws*
    [/code]

    (This assumes greedy quantifiers.)

    Note that I've rolled comments into the newline declaration, because they can be on the end of any line (unless someone knows something I don't...)
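One wrinkle with treating a comment as "the rest of the line": an apostrophe inside a string literal doesn't start a comment. Here is a small sketch of splitting a line into code and trailing comment (split_comment is a hypothetical name) that tracks VB6 double-quoted strings, where "" is an embedded quote. It doesn't handle the Rem statement form of comments.

```python
def split_comment(line):
    """Split a VB6 line into (code, comment); comment is None if absent."""
    in_string = False
    for i, ch in enumerate(line):
        if ch == '"':
            # An embedded "" toggles twice, so we end up back inside the string.
            in_string = not in_string
        elif ch == "'" and not in_string:
            return line[:i].rstrip(), line[i + 1:]
    return line.rstrip(), None
```

For example, split_comment('MsgBox "It\'s done" \' tell the user') returns ('MsgBox "It\'s done"', ' tell the user'): the first apostrophe is inside the string, so only the second one starts the comment.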

    Have you considered compiling it straight down to .NET IL instead of just turning it into VB.NET? If you want to do such a thing, let me know, because it could be cool to get the compiler to target Parrot too. Whether I'll have time to play with this I don't know (I've got a lot of projects on the go already), but it's an interesting idea if nothing else.

    Jonathan


  • : To my understanding of Visual Basic, there are only some things that are allowed outside of a procedure, be that a sub, function or property accessor. Basically those are options and type declarations. Further, I think there are some things that are only allowed outside of a sub or function, such as a Declare statement (for including APIs for use) or even another sub or function.

    Basically, yes, but there are other "hidden" things too, especially in .frm files.

    :So I'd probably go for something that was more like:-
    :
    : [code]root := unit*
    : unit := procedure | type | enum | declare | option | dim | newline
    : procedure := sub | function | proplet | propget
    :
    : ...
    :
    : sub := accesstype?, ws, 'Sub', ws, identifier, '(', arglist, ')', newline, statement*, 'End', ws, 'Sub', newline
    :
    : ...
    :
    : newline := (ws*, crlf) | (comment, crlf)
    : comment := ws*, ''', visible*, ws*
    : [/code]
    :
    : (Assumption of greedy quantifiers)

    That's generally where I was going, though I started from the "bottom" with If statements and was building up from there.

    : Note that I've rolled comments into the newline declaration, because they can be on the end of any line (unless someone knows something I don't...)

    They cannot come after a line continuation, because VB6 has no inline comments: the continuation underscore has to be the last thing on the line.
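Since nothing can follow the continuation underscore, another option is to join continued lines before parsing, so the grammar only ever sees logical lines and doesn't need a continuation branch in inline_space. A sketch, assuming a continuation is a space followed by an underscore at the end of the line; logical_lines is a hypothetical name:

```python
def logical_lines(physical_lines):
    """Join VB6 physical lines ending in ' _' into logical lines.

    Yields (first_physical_line_number, text) pairs, numbered from 1.
    Continued pieces are joined with one space and lose their indentation.
    """
    pieces, start = [], None
    for number, line in enumerate(physical_lines, 1):
        text = line.rstrip()
        if start is None:
            start = number
        piece = text.lstrip() if pieces else text
        if text.endswith(" _"):
            # Drop the underscore and keep collecting pieces.
            pieces.append(piece[:-1].rstrip())
            continue
        pieces.append(piece)
        yield start, " ".join(pieces)
        pieces, start = [], None
    if pieces:  # the file ended in the middle of a continuation
        yield start, " ".join(pieces)
```

Keeping the first physical line number alongside each logical line makes it possible to report parse errors against the original source.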

    : Have you considered, instead of just trying to turn this into VB.NET, compiling it straight down to .NET IL? If you'd want to do such a thing, let me know, because it could be cool to get the compiler to target Parrot too. Whether I'll have time to play with this I don't know - I've got a lot of projects on the go already - but it's an interesting idea if nothing else.

    That would be pretty cool but way beyond my scope. This was really just a diversion from the conversion problem. For generating VB.NET code from VB6 code, I think the SAX-esque approach is sufficient and far easier to create.
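As a rough illustration of how far per-line callbacks can get, here is a hypothetical sketch of a few purely lexical VB6 to VB.NET rewrites: Set is dropped from object assignments, Wend becomes End While, and "As Integer", "As Long" and "As Variant" become "As Short", "As Integer" and "As Object", without touching string literals. It assumes comments have already been split off, and it deliberately ignores everything that needs real context (default properties, ByRef/ByVal defaults, array bounds, and so on).

```python
import re

# VB6 Integer is 16-bit and Long is 32-bit; in VB.NET those sizes are Short
# and Integer. Variant has no VB.NET equivalent and usually becomes Object.
TYPE_MAP = {"Integer": "Short", "Long": "Integer", "Variant": "Object"}
TYPE_RE = re.compile(r"\bAs\s+(Integer|Long|Variant)\b")
STRING_RE = re.compile(r'("(?:[^"]|"")*")')  # VB string literal; "" is a quote


def rewrite_types(code):
    # re.split with a capturing group alternates outside/inside-string parts,
    # so only the even-numbered parts (outside strings) are rewritten.
    parts = STRING_RE.split(code)
    for i in range(0, len(parts), 2):
        # A single sub() pass, so Long -> Integer is never re-mapped to Short.
        parts[i] = TYPE_RE.sub(lambda m: "As " + TYPE_MAP[m.group(1)], parts[i])
    return "".join(parts)


def convert_line(code):
    """Rewrite one VB6 code line (comment already removed) toward VB.NET."""
    indent = code[: len(code) - len(code.lstrip())]
    stmt = code.strip()
    if stmt == "Wend":
        return indent + "End While"
    # VB.NET has no Set keyword for object assignment.
    stmt = re.sub(r"^Set\s+", "", stmt)
    return indent + rewrite_types(stmt)
```

For example, convert_line("Dim n As Integer, total As Long") gives "Dim n As Short, total As Integer", while convert_line('MsgBox "As Long"') is left unchanged because the match is inside a string.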


