Preserving formatting in Perl

I'm relatively new to Perl and I'm having trouble with a script that reads XML into a MySQL database.

The problem is that the following subroutine, char_handler, reads in the strings of data from an XML file. It treats a string as a single char event (which is fine), but it treats special characters in the XML (e.g. &lt;, &gt;) as separate char events, which is part of the problem. They are output normally in the DB (e.g. <, >), but because they arrive as separate char events, the endTag_handler treats each of them as its own line under the eq 'class' branch and adds a newline to each, ruining the format. For example:

Instead of:

String
String
String < String
String

I'm getting

String
String
String
<
String
String

in the DB. There are very few instances of special characters in the DB, and since my priority is the format of the info in the DB, I can't simply remove the newlines in endTag unless I have an alternative. Any help would be appreciated.



# Process any character event.
# On encountering character data, it is read into an array.
# The array @lines has global scope.
# The contents of @lines are then available; the typical scenario is to use
# them in the end tag handler for the different elements.
sub char_handler {
    my ( $expat, $data ) = @_;

    chomp($data);

    # Remove leading whitespace
    # Note use of tags
    $data =~ /(^\s*)(.*)/;

    # Place each line encountered into @lines.
    # Nesting level must be 2 deep (i.e. not root level)
    # and the line is added only if not empty.
    if ( ( $expat->element_index >= 2 ) && ( $2 ne "" ) ) {
        push @lines, $2;
    }
}

# description endTag_handler
# The s/'/\\'/g lines escape single quotes; that pattern is an assumption,
# since the backslashes were lost when the code was posted.
if ( $element eq 'description' ) {
    if ( $expat->current_element eq 'class' ) {
        $parent_info{$element} = join( "class\n", @lines );
        $parent_info{$element} =~ s/'/\\'/g;
    }
    elsif ( $expat->current_element eq 'exception' ) {
        $exceptions{ "exception" . $element } = join( "\n", @lines );
        $exceptions{ "exception" . $element } =~ s/'/\\'/g;
    }
    elsif ( $expat->current_element eq 'exceptionParameter' ) {
        $exceptions{ "exceptionParameter" . $element } = join( "\n", @lines );
        $exceptions{ "exceptionParameter" . $element } =~ s/'/\\'/g;
    }
    else {
        $currentElement{$element} = join( "\n", @lines );
        # was "=", which assigns the substitution count instead of editing the string
        $currentElement{$element} =~ s/'/\\'/g;
    }
    undef @lines;
}

Comments

  •
    Looks to me like the problem is elsewhere in the script. The code you posted is not doing anything to transform characters into character entities (i.e. &lt; for <).

    Are you using an XML parsing module like XML::Simple or XML::Parser?
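
If it does turn out to be XML::Parser, its documentation notes that a single run of text can produce several calls to the Char handler, which would explain a lone "<" arriving as its own event. A minimal sketch of one common workaround, assuming the handlers are registered through the Handlers option (Char and End): append every char event to a string buffer and only split it into lines once the element ends. The names $buffer and flush_lines are hypothetical, and the char events are simulated here so the snippet runs without a parser or an XML file.

```perl
use strict;
use warnings;

my $buffer = '';    # raw character data for the current element (hypothetical name)
my @lines;          # finished lines, as in the original script

# Char handler: only append. One line of text may arrive in several
# pieces, for example around &lt; or &gt;.
sub char_handler {
    my ( $expat, $data ) = @_;
    $buffer .= $data;
    return;
}

# Meant to be called at the top of the end-tag handler (hypothetical helper):
# split the accumulated text on real newlines, trim leading whitespace,
# and drop empty lines.
sub flush_lines {
    for my $line ( split /\n/, $buffer ) {
        $line =~ s/^\s+//;
        push @lines, $line if $line ne '';
    }
    $buffer = '';
    return;
}

# Simulated char events for the text "String\nString &lt; String\nString"
for my $event ( "String\nString ", "<", " String\nString" ) {
    char_handler( undef, $event );
}
flush_lines();

print join( "\n", @lines ), "\n";
```

With the three simulated events above, @lines ends up holding "String", "String < String" and "String", so the "<" stays on its own line of text instead of becoming a separate entry. The join("\n", @lines) in the end-tag handler could then stay as it is.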