problem with LWP::Simple

Hello

I would be grateful for any help with this.

I want to pull an ID number (a UniProt protein accession number) from a file using a regex. This works OK.
I then want to use the number as part of a URL to fetch the relevant page, so that I can parse some information about the protein from it.
The code is very basic.

My perl script:

#!/usr/bin/perl

# A script to pull out an id number from a file using a regex.
#The id number(s) are put into an array @accnumber.
#The file I read in is html_test2.txt (attached to this mail).
#Then use the id number as part of a url to get and store a webpage.
#In this case to simplify things I just want to take the first
#element of the @accnumber array and use that in the url

use LWP::Simple;

$a = 0;

#ask for the file name

print "please enter file name", "\n";

#open and read the file


$filename1 = <>;

open fileone, "$filename1"
or die;

while (!eof(fileone))

{

my $line = <fileone>;


if ( $line =~/UNIPROT:?\w+\s(\w{6})\s/)

{

@accnumber[$a]= $1."\n";
$a++;

}

}


close fileone;


$query_number = @accnumber[0];

#as a sanity check I print the number to STDOUT

print $query_number;

#I call the subroutine to return the webpage

get_page($query_number);


sub get_page {

my $address = $_[0];


my $url = 'http://www.ebi.uniprot.org/uniprot-srv/xmlView.do?proteinId='
.$address
.'_ORYSA&pager.offset=0';



my $html_file = 'page.html';
my $status = getstore($url, $html_file);
die "No URL::Error" unless is_success($status);

}

exit;

and the text file I parse to get my id number using the regex:

BLASTP 2.0MP-WashU [13-Dec-2004] [decunix5.0a-ev6-IP32LF64 2004-12-15T17:03:39]

Copyright (C) 1996-2004 Washington University, Saint Louis, Missouri USA.
All Rights Reserved.

Reference: Gish, W. (1996-2004) http://blast.wustl.edu

Query= 24061 17154533 emb|CAC80823.1 (AJ251791) putative IAA1 protein [Oryza
sativa] 1e-130 235 236 99.5% top hit
(237 letters; record 1)

Database: uniprot
1,880,849 sequences; 604,459,357 total letters.
Searching....10....20....30....40....50....60....70....80....90....100% done

Smallest
Sum
High Probability
Sequences producing High-scoring Segment Pairs: Score P(N) N

UNIPROT:Q75KX3_ORYSA Q84PD9 Putative auxin-responsive pro... 1203 1.2e-121 1


##################################################################

Thanks for any help.


Comments

  • : @accnumber[$a]= $1."\n";
    : $a++;
    :
    See how here you put a newline character after what you capture...

    : $query_number = @accnumber[0];
    :
    ...which ends up in $query_number...

    : get_page($query_number);
    :
    ...and passed to get_page...

    : my $address = $_[0];
    :
    :
    : my $url = 'http://www.ebi.uniprot.org/uniprot-srv/xmlView.do?proteinId='
    : .$address
    : .'_ORYSA&pager.offset=0';
    :
    And ends up in the URL. I'm thinking the newline character making it into the URL is the problem.

    Hope this helps,

    Jonathan

    ###
    for(74,117,115,116){$::a.=chr};(($_.='qwertyui')&&
    (tr/yuiqwert/her anot/))for($::b);for($::c){$_.=$^X;
    /(p.{2}l)/;$_=$1}$::b=~/(..)$/;print("$::a$::b $::c hack$1.");

  • Thanks for taking time out to help John,

    I'll try to modify the code in line with your suggestion.


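Following the suggestion in the comments above, here is a minimal sketch of how the accession could be captured without the trailing newline before it goes into the URL. The helper names extract_accession and build_url are hypothetical (they are not in the original script), and the sample line is copied from the BLAST output in the question. The network call is left out so the sketch runs offline.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original post): return the six-character
# accession captured by the poster's regex, or undef if the line does not match.
sub extract_accession {
    my ($line) = @_;
    return $line =~ /UNIPROT:?\w+\s(\w{6})\s/ ? $1 : undef;
}

# Hypothetical helper: build the URL the same way the original script does,
# after stripping any stray whitespace (such as "\n") from the accession.
sub build_url {
    my ($accession) = @_;
    $accession =~ s/\s+//g;
    return 'http://www.ebi.uniprot.org/uniprot-srv/xmlView.do?proteinId='
         . $accession
         . '_ORYSA&pager.offset=0';
}

# Sample hit line taken from the BLAST output in the question.
my $line = "UNIPROT:Q75KX3_ORYSA Q84PD9 Putative auxin-responsive pro... 1203 1.2e-121 1\n";

my @accnumber;
my $acc = extract_accession($line);
push @accnumber, $acc if defined $acc;    # note: no "\n" appended here

die "no accession found\n" unless @accnumber;

my $url = build_url($accnumber[0]);
print "$url\n";    # the newline is added only when printing
```

The resulting $url can then be passed to getstore($url, 'page.html') as in the original script. Whether the server still answers at that address is a separate question; the sketch only shows that the URL no longer contains a newline.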
