We're trying to estimate the time needed to convert our application to VB.NET. The upgrade wizard that comes with VS.NET does a pretty fair job with our 200 (ish) DLL projects but chokes hard on the GUI. We killed our first run of the wizard after four hours, during which it sucked up 100% of the CPU the whole time. When we killed it, the progress bar was only at about 50% (but who knows what that really means).
We first thought about breaking up the GUI (150-ish forms, many dozens of modules/classes) into smaller projects. Even these seem to take an inordinate amount of time to convert with the wizard, and apparently the conversion leaves some kind of "compatibility layer" between the Windows.Forms classes and the code. Don't ask me, the other guy discovered that.
While he was playing with that option, I sat down to think about converting the code ourselves. I started thinking about how to parse, for example, a .frm file of medium size. Not only the code, but the form properties and all the controls and the "hidden" stuff you don't see in the IDE. I started working on a state machine, but the more I looked at it, the more it started to resemble an XML file without tags. So I hacked together (in Python, of course) a parser class that reads through .frm files and acts very much like a SAX parser by calling inheritable methods (startElement, endElement, characters, etc.) when it encounters certain keywords. This works very well, but trying to handle all the permutations of keywords and punctuation that comprise VB6 syntax was getting less and less fun.
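For anyone who wants to picture the SAX-style idea, here is a minimal sketch in Python. It is not the actual parser class described above: the names `FrmHandler`, `parse_frm` and the extra `property` callback are made up for illustration, and it assumes the usual .frm layout of nested `Begin <class> <name>` ... `End` blocks with `Key = Value` lines inside them. `BeginProperty`/`EndProperty` sections and the code that follows the form block are simply passed through to `characters`.

```python
import re

class FrmHandler:
    """Base handler; subclass and override, as with a SAX ContentHandler."""
    def startElement(self, kind, name): pass   # e.g. ('VB.Form', 'Form1')
    def endElement(self, kind, name): pass
    def property(self, key, value): pass       # hypothetical extra callback
    def characters(self, text): pass           # anything not recognized

BEGIN_RE = re.compile(r'^Begin\s+(\S+)\s+(\S+)\s*$')
PROP_RE = re.compile(r'^(\w+)\s*=\s*(.*)$')

def parse_frm(lines, handler):
    """Walk .frm lines and fire handler callbacks for Begin/End blocks."""
    stack = []  # currently open (kind, name) blocks
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        begin = BEGIN_RE.match(line)
        if begin:
            stack.append(begin.groups())
            handler.startElement(*begin.groups())
        elif line == 'End' and stack:
            handler.endElement(*stack.pop())
        else:
            # Key = Value only counts as a property inside an open block.
            prop = PROP_RE.match(line) if stack else None
            if prop:
                handler.property(prop.group(1), prop.group(2))
            else:
                handler.characters(raw)
```

A subclass that overrides `startElement` and `property` could, for example, collect every control and its properties before any code is emitted.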
So to keep things interesting, I started looking into using an existing lexer/parser to tear apart the VB6 source files to produce a tree that I could use to spit out VB.NET syntax myself. I couldn't get yacc, flex, bison, et al. working via cygwin, but I found a great Python module named SimpleParse.
Ok, so now I have a parser that understands a particular form of EBNF grammar but I haven't been able to find a grammar for VB6. I've started cobbling my own together and I finally feel I've jumped up on the learning curve. Here is the grammar I have so far:
[code]
#SimpleParse EBNF grammar for Visual Basic 6
code := statement*
statement := (comment / if / select / simpleact)
#If
if := (if_inline / if_block)
if_inline := if_then, ts, simpleact
if_block := if_then, tail, statement*, else?, end_if
if_then := 'If', ts, condition, ts, 'Then'
else := 'Else', tail, statement*
end_if := 'End If', tail
#Select Case
select := 'Select Case', ts, expression, tail, case*, case_else?, end_select
case := 'Case', ts, condition, tail, statement*
case_else := 'Case Else', tail, statement*
end_select := 'End Select', tail
#Conditional expressions
condition := 'cond', digit+
#Expressions
expression := 'expr', digit+
#Simple Action
simpleact := 'act', digit+, tail
#trailing whitespace/comments
tail := (comment / ws)
comment := ts, "'", visible*, ws
#character classes
visible := [ -~]
newline := '\n'
ts := [ ]+
ws := [ \t\n\f\v]+
digit := [0-9]
lowercase := [a-z]
uppercase := [A-Z]
alpha := (lowercase / uppercase)
alphanumeric := (alpha / digit)
[/code]
Here's the example input I created to test the grammar thus far:
[code]
act0
If cond1 Then act1 'here is a comment
If cond2 Then
act2
If cond3 Then
act3
End If
Else 'another comment
Select Case expr1
Case cond4
act4
Case cond5
act5
Case Else
act99
End Select 'end of select case
End If
[/code]
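To show the tree this toy grammar is meant to produce, here is a small hand-written recursive-descent parser in plain Python. It is a stand-in for illustration only: it does not use SimpleParse, it works line by line rather than character by character, and the tuple-based tree shape and the function names are assumptions of this sketch.

```python
import re

IF_RE = re.compile(r'^If (cond\d+) Then(?: (act\d+))?$')
SELECT_RE = re.compile(r'^Select Case (expr\d+)$')

def parse(text):
    """Parse the toy language into nested tuples."""
    # The toy input has no string literals, so the first apostrophe
    # on a line always starts a comment.
    lines = [l.split("'", 1)[0].strip() for l in text.splitlines()]
    lines = [l for l in lines if l]
    tree, pos = parse_block(lines, 0, ())
    if pos != len(lines):
        raise SyntaxError('unexpected %r' % lines[pos])
    return tree

def parse_block(lines, pos, stop):
    """Parse statements until a line starts with one of the stop prefixes."""
    out = []
    while pos < len(lines) and not lines[pos].startswith(stop):
        node, pos = parse_statement(lines, pos)
        out.append(node)
    return out, pos

def expect(lines, pos, keyword):
    if pos >= len(lines) or lines[pos] != keyword:
        raise SyntaxError('expected %r' % keyword)

def parse_statement(lines, pos):
    line = lines[pos]
    m = IF_RE.match(line)
    if m:
        cond, inline = m.groups()
        if inline:  # if_inline: single action on the same line
            return ('if', cond, [('act', inline)], []), pos + 1
        then, pos = parse_block(lines, pos + 1, ('Else', 'End If'))
        orelse = []
        if pos < len(lines) and lines[pos] == 'Else':
            orelse, pos = parse_block(lines, pos + 1, ('End If',))
        expect(lines, pos, 'End If')
        return ('if', cond, then, orelse), pos + 1
    m = SELECT_RE.match(line)
    if m:
        cases, pos = [], pos + 1
        while pos < len(lines) and lines[pos].startswith('Case '):
            label = lines[pos][5:]  # 'cond4', 'cond5' or 'Else'
            body, pos = parse_block(lines, pos + 1, ('Case ', 'End Select'))
            cases.append((label, body))
        expect(lines, pos, 'End Select')
        return ('select', m.group(1), cases), pos + 1
    if re.match(r'^act\d+$', line):
        return ('act', line), pos + 1
    raise SyntaxError('unexpected %r' % line)
```

Each block-level rule (`if_block`, `select`) maps to one branch of `parse_statement`, and the `stop` prefixes play the role that `end_if` and `end_select` play in the grammar.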
Obviously I have a long way to go to get all of VB6 syntax covered (and I just noticed a mistake in the 'comment' pattern as I was typing this), so I was wondering if any of you hardcore geeks out there have or know of a grammar that I can use to get past this nitty-gritty stuff.
[size=5][italic][blue][RED]i[/RED]nfidel[/blue][/italic][/size]
[code]
$ select * from users where clue > 0
no rows returned
[/code]
Comments
: So to keep things interesting, I started looking into using an existing lexer/parser to tear apart the VB6 source files to produce a tree that I could use to spit out VB.NET syntax myself. I couldn't get yacc, flex, bison, et al. working via cygwin, but I found a great Python module named SimpleParse.
:
Sounds like a very interesting project. Perl has SAX modules too, though a very popular parser in the Perl community is Damian Conway's infamous Parse::RecDescent (a recursive descent parser).
: Ok, so now I have a parser that understands a particular form of EBNF grammar but I haven't been able to find a grammar for VB6. I've started cobbling my own together and I finally feel I've jumped up on the learning curve. Here is the grammar I have so far:
:
: [code]
: #SimpleParse EBNF grammar for Visual Basic 6
: code := statement*
: statement := (comment / if / select / simpleact)
:
: #If
: if := (if_inline / if_block)
: if_inline := if_then, ts, simpleact
: if_block := if_then, tail, statement*, else?, end_if
: if_then := 'If', ts, condition, ts, 'Then'
: else := 'Else', tail, statement*
: end_if := 'End If', tail
:
: #Select Case
: select := 'Select Case', ts, expression, tail, case*, case_else?, end_select
: case := 'Case', ts, condition, tail, statement*
: case_else := 'Case Else', tail, statement*
: end_select := 'End Select', tail
:
: #Conditional expressions
: condition := 'cond', digit+
:
: #Expressions
: expression := 'expr', digit+
:
: #Simple Action
: simpleact := 'act', digit+, tail
:
: #trailing whitespace/comments
: tail := (comment / ws)
: comment := ts, "'", visible*, ws
:
: #character classes
: visible := [ -~]
: newline := '\n'
: ts := [ ]+
: ws := [ \t\n\f\v]+
: digit := [0-9]
: lowercase := [a-z]
: uppercase := [A-Z]
: alpha := (lowercase / uppercase)
: alphanumeric := (alpha / digit)
: [/code]
:
: Here's the example input I created to test the grammar thus far:
:
: [code]
: act0
: If cond1 Then act1 'here is a comment
: If cond2 Then
: act2
: If cond3 Then
: act3
: End If
: Else 'another comment
: Select Case expr1
: Case cond4
: act4
: Case cond5
: act5
: Case Else
: act99
: End Select 'end of select case
: End If
: [/code]
:
: Obviously I have a long way to go to get all of VB6 syntax covered (and I just noticed a mistake in the 'comment' pattern as I was typing this), so I was wondering if any of you hardcore geeks out there have or know of a grammar that I can use to get past this nitty-gritty stuff.
:
'fraid I don't know of any existing VB grammars out there - did you find any or did you end up creating your own? Or did you abandon the project?
Jonathan
###
for(74,117,115,116){$::a.=chr};(($_.='qwertyui')&&
(tr/yuiqwert/her anot/))for($::b);for($::c){$_.=$^X;
/(p.{2}l)/;$_=$1}$::b=~/(..)$/;print("$::a$::b $::c hack$1.");
It's been put on hold for a bit while I do actual work :-) Here's the grammar I have thus far:
[code]
root := statement*
statement := (comment_line / option / declaration / conditional / simpleact)
conditional := (if / select)
declaration := (struct / enum / method)
#Options
option := option_keyword, option_type, tail
option_keyword := spacechar*, 'Option', inline_space
option_type := (option_explicit / option_base / option_compare / option_private)
option_explicit := 'Explicit'
option_base := 'Base', inline_space, ('0' / '1')
option_compare := 'Compare', inline_space, ('Binary' / 'Text' / 'Database')
option_private := 'Private'
#Struct (Type in VB6)
struct := struct_start, struct_element+, struct_end
struct_start := publicity?, 'Type', inline_space, name, tail
struct_element := name, subscript?, inline_space, as, tail
struct_end := 'End', inline_space, 'Type', tail
#Enumeration
enum := enum_start, enum_element+, enum_end
enum_start := publicity?, 'Enum', inline_space, name, tail
enum_element := name, enum_value?, tail
enum_value := inline_space, '=', inline_space, integer
enum_end := 'End', inline_space, 'Enum', tail
#Methods
method := method_publicity?, static?, (property / sub / function)
property := property_keyword, (property_get / property_set / property_let)
property_get := 'Get', inline_space, method_signature, inline_space, as, tail
property_set := 'Set', inline_space, method_signature, tail
property_let := 'Let', inline_space, method_signature, tail
property_keyword := 'Property', inline_space
sub := 'Sub', inline_space, name, arglist, tail
function := 'Function', inline_space, method_signature, inline_space, as, tail
method_signature := name, arglist
#Method arguments
arglist := '(foo)'
args := arg, (',', arg)*
arg := inline_space?, 'foo', inline_space?
#Publicity
publicity := ('Public' / 'Private'), inline_space
method_publicity := ('Public' / 'Private' / 'Friend'), inline_space
#Static keyword
static := 'Static', inline_space
#Array subscript
subscript := '(', ')'
#As
as := 'As', inline_space, typename
#TypeName
typename := (name / dotname)
#Values
integer := '-'?, digit+
#Names
name := alpha, (alphanum / '_')*
dotname := name, ('.', name)+
#If
if := (if_inline / if_block)
if_inline := if_then, inline_space, simpleact
if_block := if_then, tail, statement*, elseif*, else?, end_if
if_then := 'If', inline_space, condition, inline_space, 'Then'
elseif := (elseif_inline / elseif_block)
elseif_inline := 'Else', if_inline
elseif_block := 'Else', if_then, tail, statement*
else := 'Else', tail, statement*
end_if := 'End', inline_space, 'If', tail
#Select Case
select := 'Select', inline_space, 'Case', inline_space, expression, tail, case*, case_else?, end_select
case := 'Case', inline_space, condition, tail, statement*
case_else := 'Case', inline_space, 'Else', tail, statement*
end_select := 'End', inline_space, 'Select', tail
#Conditional expressions
condition := 'cond', digit+
#Expressions
expression := 'expr', digit+
#Simple Action (really just a placeholder for now)
simpleact := 'act', digit+, tail
#trailing whitespace/comments
tail := (trailing_comment / trailing_ws)
trailing_comment := inline_space, comment
comment_line := spacechar*, comment
comment := "'", visible*, trailing_ws
#character classes
visible := [ -~]
newline := '\n'
inline_space := spacechar+, ('_', spacechar*, newline, spacechar*)?
ts := spacechar+
trailing_ws := spacechar*, newline, spacechar*
spacechar := [ ]
digit := [0-9]
lowercase := [a-z]
uppercase := [A-Z]
alpha := (lowercase / uppercase)
alphanum := (alpha / digit)
[/code]
The "method" productions aren't working right so I decided to put it away for a while to focus on stuff I'm supposed to be doing instead.
The Enum part threw me for a while. Having a single token on a line, as VB Enums tend to do, forced me to rethink the way lines are separated, because the parser couldn't tell when it had reached the End Enum.
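One way to sidestep the line-separation problem is to normalize the source into logical lines before any grammar sees it. The sketch below is an assumption-laden illustration, not part of the grammar above: `strip_comment`, `logical_lines` and `parse_enum` are hypothetical helper names, `Rem` comments are ignored, and enum values are assumed to be plain decimal integers (no `&H` literals or expressions).

```python
def strip_comment(line):
    """Remove a trailing ' comment, ignoring apostrophes inside strings."""
    in_string = False
    for i, ch in enumerate(line):
        if ch == '"':
            in_string = not in_string  # "" inside a string toggles twice
        elif ch == "'" and not in_string:
            return line[:i].rstrip()
    return line.rstrip()

def logical_lines(text):
    """Yield logical lines: join ' _' continuations, drop comments and blanks."""
    pending = ''
    for raw in text.splitlines():
        line = pending + raw.strip()
        pending = ''
        if line.endswith(' _'):
            pending = line[:-1]  # keep the space, drop the underscore
            continue
        line = strip_comment(line)
        if line:
            yield line

def parse_enum(lines):
    """Parse one Enum block from an iterator of logical lines."""
    header = next(lines).split()  # e.g. ['Public', 'Enum', 'Color']
    name = header[header.index('Enum') + 1]
    members = []
    for line in lines:
        if line.split() == ['End', 'Enum']:
            return name, members
        member, _, value = line.partition('=')
        members.append((member.strip(), int(value) if value.strip() else None))
    raise SyntaxError('missing End Enum')
```

With the input already cut into logical lines, a bare member name such as `Red` and the terminator `End Enum` are distinct whole lines, so there is no ambiguity about where the element list stops.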
[size=5][italic][blue][RED]i[/RED]nfidel[/blue][/italic][/size]
: : find any or did you end up creating your own? Or did you abandon
: : the project?
:
: It's been put on hold for a bit while I do actual work :-) Here's the
: grammar I have thus far:
:
: [code]
: root := statement*
: statement := (comment_line / option / declaration / conditional / simpleact)
: conditional := (if / select)
: declaration := (struct / enum / method)
:
: #Options
: option := option_keyword, option_type, tail
: option_keyword := spacechar*, 'Option', inline_space
: option_type := (option_explicit / option_base / option_compare / option_private)
: option_explicit := 'Explicit'
: option_base := 'Base', inline_space, ('0' / '1')
: option_compare := 'Compare', inline_space, ('Binary' / 'Text' / 'Database')
: option_private := 'Private'
:
: #Struct (Type in VB6)
: struct := struct_start, struct_element+, struct_end
: struct_start := publicity?, 'Type', inline_space, name, tail
: struct_element := name, subscript?, inline_space, as, tail
: struct_end := 'End', inline_space, 'Type', tail
:
: #Enumeration
: enum := enum_start, enum_element+, enum_end
: enum_start := publicity?, 'Enum', inline_space, name, tail
: enum_element := name, enum_value?, tail
: enum_value := inline_space, '=', inline_space, integer
: enum_end := 'End', inline_space, 'Enum', tail
:
: #Methods
: method := method_publicity?, static?, (property / sub / function)
: property := property_keyword, (property_get / property_set / property_let)
: property_get := 'Get', inline_space, method_signature, inline_space, as, tail
: property_set := 'Set', inline_space, method_signature, tail
: property_let := 'Let', inline_space, method_signature, tail
: property_keyword := 'Property', inline_space
: sub := 'Sub', inline_space, name, arglist, tail
: function := 'Function', inline_space, method_signature, inline_space, as, tail
: method_signature := name, arglist
:
: #Method arguments
: arglist := '(foo)'
: args := arg, (',', arg)*
: arg := inline_space?, 'foo', inline_space?
:
: #Publicity
: publicity := ('Public' / 'Private'), inline_space
: method_publicity := ('Public' / 'Private' / 'Friend'), inline_space
:
: #Static keyword
: static := 'Static', inline_space
:
: #Array subscript
: subscript := '(', ')'
:
: #As
: as := 'As', inline_space, typename
:
: #TypeName
: typename := (name / dotname)
:
: #Values
: integer := '-'?, digit+
:
: #Names
: name := alpha, (alphanum / '_')*
: dotname := name, ('.', name)+
:
: #If
: if := (if_inline / if_block)
: if_inline := if_then, inline_space, simpleact
: if_block := if_then, tail, statement*, elseif*, else?, end_if
: if_then := 'If', inline_space, condition, inline_space, 'Then'
: elseif := (elseif_inline / elseif_block)
: elseif_inline := 'Else', if_inline
: elseif_block := 'Else', if_then, tail, statement*
: else := 'Else', tail, statement*
: end_if := 'End', inline_space, 'If', tail
:
: #Select Case
: select := 'Select', inline_space, 'Case', inline_space, expression, tail, case*, case_else?, end_select
: case := 'Case', inline_space, condition, tail, statement*
: case_else := 'Case', inline_space, 'Else', tail, statement*
: end_select := 'End', inline_space, 'Select', tail
:
: #Conditional expressions
: condition := 'cond', digit+
:
: #Expressions
: expression := 'expr', digit+
:
: #Simple Action (really just a placeholder for now)
: simpleact := 'act', digit+, tail
:
: #trailing whitespace/comments
: tail := (trailing_comment / trailing_ws)
: trailing_comment := inline_space, comment
: comment_line := spacechar*, comment
: comment := "'", visible*, trailing_ws
:
: #character classes
: visible := [ -~]
: newline := '\n'
: inline_space := spacechar+, ('_', spacechar*, newline, spacechar*)?
: ts := spacechar+
: trailing_ws := spacechar*, newline, spacechar*
: spacechar := [ ]
: digit := [0-9]
: lowercase := [a-z]
: uppercase := [A-Z]
: alpha := (lowercase / uppercase)
: alphanum := (alpha / digit)
: [/code]
:
: The "method" productions aren't working right so I decided to put it
: away for a while to focus on stuff I'm supposed to be doing instead.
:
: The Enum part threw me for a while. Having a single token on a line
: as VB Enums tend to do forced me to rethink the way lines are
: separated because the parser couldn't tell when it found the End Enum.
:
As I understand Visual Basic, only certain things are allowed outside of a procedure (be that a sub, function or property accessor); basically those are options and type declarations. Further, I think some things are allowed only outside of a sub or function, such as a Declare statement (for importing API functions) or another sub or function. So I'd probably go for something more like:
[code]root := unit*
unit := procedure | type | enum | declare | option | dim | newline
procedure := sub | function | proplet | propget
...
sub := accesstype?, ws, 'Sub', ws, identifier, '(', arglist, ')', newline, statement*, 'End', ws, 'Sub', newline
...
newline := (ws*, crlf) | (comment, crlf)
comment := ws*, "'", visible*, ws*
[/code]
(Assumption of greedy quantifiers)
Note that I've rolled comments into the newline declaration, because they can be on the end of any line (unless someone knows something I don't...)
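The top-down `root := unit*` idea can be sketched without a parser generator. The following Python is only an illustration under stated assumptions: `split_units` and `UNIT_RE` are hypothetical names, the input is assumed to be a list of logical lines with comments and continuations already handled, and module-level variables declared with only `Public` or `Private` fall into an `'other'` bucket.

```python
import re

# Optional access modifier, optional Static, then the keyword that
# decides which kind of top-level unit this line opens.
UNIT_RE = re.compile(
    r'^(?:(?:Public|Private|Friend)\s+)?(?:Static\s+)?'
    r'(Sub|Function|Property|Type|Enum|Declare|Option|Dim|Const)\b')

# Units that span several lines and close with 'End <keyword>'.
BLOCKS = {'Sub', 'Function', 'Property', 'Type', 'Enum'}

def split_units(lines):
    """Group module lines into (kind, [lines]) top-level units."""
    units = []
    it = iter(lines)
    for raw in it:
        line = raw.strip()
        m = UNIT_RE.match(line)
        if not m:
            units.append(('other', [line]))
            continue
        kind = m.group(1)
        body = [line]
        if kind in BLOCKS:
            end = ['End', kind]
            for inner in it:  # consume lines up to the matching End
                body.append(inner.strip())
                if inner.split() == end:
                    break
        units.append((kind.lower(), body))
    return units
```

Because `Declare` is tested before the body is scanned, a line such as `Private Declare Function ...` is classified as a one-line `declare` unit rather than as the start of a function block.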
Have you considered, instead of just trying to turn this into VB.NET, compiling it straight down to .NET IL? If you'd want to do such a thing, let me know, because it could be cool to get the compiler to target Parrot too. Whether I'll have time to play with this I don't know - I've got a lot of projects on the go already - but it's an interesting idea if nothing else.
Jonathan
Basically, but there are other "hidden" things too, especially in .frm files.
:So I'd probably go for something that was more like:-
:
: [code]root := unit*
: unit := procedure | type | enum | declare | option | dim | newline
: procedure := sub | function | proplet | propget
:
: ...
:
: sub := accesstype?, ws, 'Sub', ws, identifier, '(', arglist, ')', newline, statement*, 'End', ws, 'Sub', newline
:
: ...
:
: newline := (ws*, crlf) | (comment, crlf)
: comment := ws*, "'", visible*, ws*
: [/code]
:
: (Assumption of greedy quantifiers)
That's generally where I was going, though I started from the "bottom" with If statements and was building up from there.
: Note that I've rolled comments into the newline declaration, because they can be on the end of any line (unless someone knows something I don't...)
They cannot come after a line-continuation (because comments cannot be inline).
: Have you considered, instead of just trying to turn this into VB.NET, compiling it straight down to .NET IL? If you'd want to do such a thing, let me know, because it could be cool to get the compiler to target Parrot too. Whether I'll have time to play with this I don't know - I've got a lot of projects on the go already - but it's an interesting idea if nothing else.
That would be pretty cool but way beyond my scope. This was really just a diversion from the conversion problem. For generating VB.NET code from VB6 code, I think the SAX-esque approach is sufficient and far easier to create.
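As a small illustration of how little machinery source-to-source output needs, here is a hedged Python sketch that rewrites VB6 `Type ... End Type` blocks as VB.NET `Structure ... End Structure`. It is not the converter discussed in this thread: `convert_types` is a hypothetical name, members are simply given `Public` access (VB.NET structure members need an explicit modifier), and data-type changes such as VB6 `Long` versus VB.NET `Integer`, as well as fixed-size arrays, are deliberately left alone.

```python
import re

START_RE = re.compile(r'^(\s*)((?:Public|Private)\s+)?Type\s+(\w+)\s*$')
END_RE = re.compile(r'^(\s*)End\s+Type\s*$')

def convert_types(lines):
    """Rewrite Type blocks as Structure blocks; pass everything else through."""
    out = []
    inside = False
    for line in lines:
        start = START_RE.match(line)
        end = END_RE.match(line)
        stripped = line.strip()
        if start and not inside:
            indent, access, name = start.groups()
            out.append('%s%sStructure %s' % (indent, access or '', name))
            inside = True
        elif end and inside:
            out.append(end.group(1) + 'End Structure')
            inside = False
        elif inside and stripped and not stripped.startswith("'"):
            # Member line: keep the indentation, add an access modifier.
            indent = line[:len(line) - len(line.lstrip())]
            out.append(indent + 'Public ' + stripped)
        else:
            out.append(line)  # blank lines, comments, unrelated code
    return out
```

The same pattern of recognizing a start line, an end line and the lines between them is what each callback in an event-driven converter would do for its own construct.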
[size=5][italic][blue][RED]i[/RED]nfidel[/blue][/italic][/size]