I'm relatively new to Perl and I'm having trouble with a script that reads XML into a MySQL database.
The problem is that the following subroutine, char_handler, reads in the strings of data from an XML file. It treats a string as a single char event (which is fine), but it treats special characters in the XML (e.g. <, >) as separate char events, which is part of the problem. They are output normally in the DB (e.g. <, >), but because they arrive as separate char events, the endTag_handler sees each <, > while the current element is eq to 'class', adds a newline to each, and ruins the format, e.g.:
Instead of:
String
String
String < String
String
I'm getting
String
String
String
<
String
String
in the DB. There are very few instances of special characters in the DB, and my priority is the format of the info in the DB, so I can't simply remove the newlines in endTag unless I have an alternative. Any help would be appreciated.
# Process any character event.
# On encountering character data, it is read into an array.
# The array @lines has global scope.
# The contents of @lines are then available; the typical scenario is to use
# the contents of @lines in the end tag handler for the different elements.
sub char_handler {
    my( $expat, $data ) = @_;
    chomp($data);

    # Remove leading whitespace
    # Note the use of capture groups ($1, $2)
    $data =~ /(^\s*)(.*)/;

    # Place each line encountered into @lines.
    # Nesting level must be 2 deep (i.e. not root level)
    # and the line is added only if not empty.
    if(( $expat->element_index >= 2 ) && ( $2 ne "") ){
        push @lines, $2;
    }
}

# description endTag_handler
if( $element eq 'description') {
    if($expat->current_element eq 'class'){
        $parent_info{ $element } = join("class\n", @lines);
        $parent_info{ $element } =~ s//'/g;
    }
    elsif($expat->current_element eq 'exception') {
        $exceptions{ "exception" . $element } = join("\n", @lines);
        $exceptions{ "exception" . $element } =~ s//'/g;
    }
    elsif($expat->current_element eq 'exceptionParameter') {
        $exceptions{ "exceptionParameter" . $element } = join("\n", @lines);
        $exceptions{ "exceptionParameter" . $element } =~ s//'/g;
    }
    else {
        $currentElement{ $element } = join("\n", @lines);
        # =~ (bind), not = (assign), so the substitution acts on the hash value
        $currentElement{ $element } =~ s//'/g;
    }
    undef @lines;
}
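The symptom can be reproduced without the parser. This is only a minimal sketch: @events is a hypothetical stand-in for the character events the parser might deliver for the text "String < String", and collect_event is a made-up name that mirrors char_handler without the $expat nesting check.

```perl
use strict;
use warnings;

my @lines;

# Mirrors char_handler: every character event that is not empty
# after trimming becomes its own entry in @lines.
sub collect_event {
    my ($data) = @_;
    chomp $data;
    $data =~ /^(\s*)(.*)/;          # $2 is the text after leading whitespace
    push @lines, $2 if $2 ne "";
}

# Assumed event sequence: the special character arrives on its own.
my @events = ("String", "\n", "String ", "<", " String", "\n", "String");
collect_event($_) for @events;

# Joining with "\n" puts the "<" and the text after it on separate lines.
my $joined = join("\n", @lines);
print "$joined\n";
```

Because each event is stored as a separate entry, the later join("\n", @lines) cannot tell a real line break from a split inside a line.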
Comments
It looks to me like the problem is elsewhere in the script. The code you posted does nothing to transform characters into character entities, i.e. < into &lt;.
Are you using an XML parsing module like XML::Simple or XML::Parser?
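If it turns out to be XML::Parser, its documentation notes that a single run of character data may generate several calls to the Char handler, so one piece of text can arrive in chunks. One possible way around that is to append every chunk to a buffer and split it into lines only in the end tag handler. This is a rough sketch, not the original script: $buffer, on_char and on_end are hypothetical names, and the events are simulated rather than coming from a real parser.

```perl
use strict;
use warnings;

my $buffer = '';    # hypothetical replacement for the global @lines

# Char handler body: append the chunk as-is instead of treating it as a line.
sub on_char {
    my ($data) = @_;
    $buffer .= $data;
}

# End tag handler body: the text is complete here, so split on the real
# newlines, strip leading whitespace, and drop empty lines.
sub on_end {
    my @lines = grep { $_ ne '' }
                map  { my $l = $_; $l =~ s/^\s+//; $l }
                split /\n/, $buffer;
    $buffer = '';
    return join("\n", @lines);
}

# Simulated events: the "<" arrives as its own chunk, as in the question.
on_char($_) for ("String", "\n", "String ", "<", " String", "\n", "String");
my $description = on_end();
print "$description\n";
```

With this approach a chunk boundary no longer becomes a line boundary, so the newlines added in the end tag handler can stay.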