Naming folders in flat file using description

I have an Excel file with a table as follows:

Genes  Family  Desc
A123   B2      Cytochrome p450 enzyme
A124   B2      Cytochrome p450 protease
B352   B1      lipid
C132   A1      heat shock 72
A331   A1      heat shock 70

I want to store these in a flat-file layout: "family"/genelist.txt
where genelist.txt contains all the genes in that family.
The family folders are all the "B2"s, "A1"s, "A2"s etc., BUT instead of calling the folder A1 or B2 etc. I want to look at the corresponding descriptions and name it accordingly.
I.e. the first gene, A123, would be grouped in the B2 family, but the folder it is contained in should be called Cytochrome p450.
So basically I want to be able to run through the descriptions of the genes and create a suitable name for the folder.
E.g. family A1 would contain a text file with C132 and A331 inside it, and this text file would be inside a folder called "Heat Shock".

I have managed to group the genes into folders, but using the family name to name the folders instead of the description. This is the Perl file so far:
use strict;
use warnings;
#-----------------
# Variables
#-----------------
my ($gene, $family);
my $fileID;
#-----------------
# Open file
#-----------------
print "Please type filename you wish to use\n";
chomp($fileID = <STDIN>);

open(my $in, '<', "$fileID.tab") or die "can't open input file\n$!\n";
while (<$in>) {
    # get rid of newline character at end of line
    chomp;
    # split up line (the .tab file is assumed to be tab-delimited)
    ($gene, $family) = split(/\t/);
    # remove " chars
    # $gene   =~ s/"//g;
    # $family =~ s/"//g;
    # create the family directory
    mkdir $family;    # don't care if it already exists
    # add gene to end of file
    open(my $out, '>>', "$family/Genelist.txt")
        or die "can't write genelist file\n$!\n";
    print $out "$gene\n";
    close($out);
}
close($in);


Much, much appreciated!! Thanks in advance.

Comments

  • Were you getting the "why is nobody replying to my message" feeling for a while there?

    : I have an Excel file with a table as follows:
    :
    : Genes Family Desc
    : A123 B2 Cytochrome p450 enzyme
    : A124 B2 Cytochrome p450 protease
    : B352 B1 lipid
    : C132 A1 heat shock 72
    : A331 A1 heat shock 70
    :
    : I want to store these in a flat-file layout: "family"/genelist.txt
    : where genelist.txt contains all the genes in that family.
    : The family folders are all the "B2"s, "A1"s, "A2"s etc., BUT instead
    : of calling the folder A1 or B2 etc. I want to look at the
    : corresponding descriptions and name it accordingly.
    : I.e. the first gene, A123, would be grouped in the B2 family, but the
    : folder it is contained in should be called Cytochrome p450.
    : So basically I want to be able to run through the descriptions of the
    : genes and create a suitable name for the folder.
    : E.g. family A1 would contain a text file with C132 and A331 inside
    : it, and this text file would be inside a folder called "Heat
    : Shock".
    My question to you here is - how do we know what bit of the description to take and what to leave? We need some reliable way of deciding this, e.g. "it's always the first 2 words".

    Let me know and I can try and work something out.
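
    For illustration only, the simplest rule mentioned above ("it's always the first 2 words") might be sketched like this. The helper name first_words is made up for the example, and it assumes the words in a description are separated by whitespace.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: keep only the first $n whitespace-separated words.
    sub first_words {
        my ($desc, $n) = @_;
        my @words = split ' ', $desc;
        $n = @words if $n > @words;    # description has fewer than $n words
        return join ' ', @words[0 .. $n - 1];
    }

    print first_words('Cytochrome p450 enzyme', 2), "\n";    # prints "Cytochrome p450"
    ```

    A fixed word count is easy to apply, but it only works if every description really does put the family name in its first few words.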

    Jonathan

    ###
    for(74,117,115,116){$::a.=chr};(($_.='qwertyui')&&
    (tr/yuiqwert/her anot/))for($::b);for($::c){$_.=$^X;
    /(p.{2}l)/;$_=$1}$::b=~/(..)$/;print("$::a$::b $::c hack$1.");

  • Hi, yeah, I was getting the feeling that no-one was around! :)
    OK, you work out which bit of the description to use by looking at the descriptions of all the genes in that particular family and picking out the words which occur in every one of them. E.g.
    a family B2 contains 3 genes: a, b and c.
    The descriptions for a, b and c are as follows:
    a) Cytochrome p450 B11 G12
    b) Cytochrome p450 B15 H23
    c) Cytochrome p450 C13 A13
    So the folder name for this family would be "Cytochrome p450", because these two words occur in all three of the gene descriptions.
    Hope this makes things a bit clearer.......
    Thanks! :)
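
    The rule described above (keep the words that occur in every description of a family) could be sketched roughly as follows. The helper name common_words is hypothetical, and the sketch assumes words are separated by whitespace and compared exactly, including case.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: return the words found in every description,
    # kept in the order they appear in the first description.
    sub common_words {
        my @descs = @_;
        return '' unless @descs;
        my @candidates = split ' ', shift @descs;
        for my $desc (@descs) {
            my %seen = map { $_ => 1 } split(' ', $desc);
            # drop any candidate word that this description lacks
            @candidates = grep { $seen{$_} } @candidates;
        }
        return join ' ', @candidates;
    }

    print common_words(
        'Cytochrome p450 B11 G12',
        'Cytochrome p450 B15 H23',
        'Cytochrome p450 C13 A13',
    ), "\n";    # prints "Cytochrome p450"
    ```

    With exact matching, "Heat" and "heat" would count as different words, so descriptions may need lower-casing first if the spreadsheet is inconsistent.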





  • : Hi, yeah, I was getting the feeling that no-one was around! :)
    Hehe...it was one of those "hmmm...I'm not quite sure what they want so I'll leave it for later when my brain is functioning". Eventually I figured the problem was just underspecified. :-)

    : OK, you work out which bit of the description to use by looking at
    : the descriptions of all the genes in that particular family and
    : picking out the words which occur in every one of them. E.g.
    : a family B2 contains 3 genes: a, b and c.
    : The descriptions for a, b and c are as follows:
    : a) Cytochrome p450 B11 G12
    : b) Cytochrome p450 B15 H23
    : c) Cytochrome p450 C13 A13
    : So the folder name for this family would be "Cytochrome p450",
    : because these two words occur in all three of the gene descriptions.
    I feared that might be what you wanted, but I didn't want to suggest it because it's the hardest one to do out of my list of possibilities.

    I'll go and have a hack at it and come back with a suggestion later.
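
    As a rough starting point, and not a finished answer, the whole job might be put together something like this. It assumes each row is "gene<TAB>family<TAB>description" with the header row already removed. The names common_words, write_families and %folder_for are invented for this sketch, and it writes into a temporary directory so it can be run without touching real data.

    ```perl
    use strict;
    use warnings;
    use File::Path qw(make_path);
    use File::Spec;
    use File::Temp qw(tempdir);

    # Words shared by every description, in first-description order
    # (hypothetical helper, exact case-sensitive matching).
    sub common_words {
        my @descs = @_;
        return '' unless @descs;
        my @candidates = split ' ', shift @descs;
        for my $desc (@descs) {
            my %seen = map { $_ => 1 } split(' ', $desc);
            @candidates = grep { $seen{$_} } @candidates;
        }
        return join ' ', @candidates;
    }

    # Group rows by family, then write <base>/<folder name>/Genelist.txt.
    sub write_families {
        my ($base, @rows) = @_;
        my (%genes, %descs);
        for my $row (@rows) {
            my ($gene, $family, $desc) = split /\t/, $row, 3;
            next unless defined $desc;    # skip malformed lines
            push @{ $genes{$family} }, $gene;
            push @{ $descs{$family} }, $desc;
        }
        my %folder_for;
        for my $family (sort keys %genes) {
            # Fall back to the family code when no word is shared.
            my $name = common_words(@{ $descs{$family} }) || $family;
            my $dir  = File::Spec->catdir($base, $name);
            make_path($dir);
            my $file = File::Spec->catfile($dir, 'Genelist.txt');
            open(my $out, '>', $file) or die "can't write $file: $!\n";
            print $out "$_\n" for @{ $genes{$family} };
            close($out);
            $folder_for{$family} = $name;
        }
        return %folder_for;
    }

    my $base = tempdir(CLEANUP => 1);
    my %folder_for = write_families(
        $base,
        "A123\tB2\tCytochrome p450 enzyme",
        "A124\tB2\tCytochrome p450 protease",
        "B352\tB1\tlipid",
        "C132\tA1\theat shock 72",
        "A331\tA1\theat shock 70",
    );
    print "$_ -> $folder_for{$_}\n" for sort keys %folder_for;
    ```

    One thing this does not handle: if two families end up with the same shared words, they would land in the same folder and the second gene list would overwrite the first, so appending the family code to the folder name may be safer.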

    Jonathan

