Hi all and thanks in advance for the help,
I am trying to learn Python but have little to no experience writing programs (I do use some SAS and Stata). A while back I took a crack at writing a Python program to extract some data from the International Trade Commission's website and ended up getting help from someone on this forum. They pretty much threw out my old code and rewrote it, so I used it without really knowing what it was doing. I am now trying to edit this code to fix a few problems, and I was wondering if someone could go through the somewhat menial task of telling me what each line actually does. I would greatly appreciate it. It is a very short loop, so I don't think it would take much time, but knowing what the syntax and operators mean would be invaluable to me. Thanks if you can help out!
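import re
import urllib2  # Python 2 only

outfile = open("1.Raw_Data.txt", "w")
# Case title: the text between "#" and ":" following "Inv."
reg_title = re.compile(r'Inv.\s+#(.*):')
# Fields shown in blue (#0000ff). Note: with nothing after the lazy group,
# (.*?) always matches an empty string.
reg_info = re.compile(r'#0000ff">(.*?)')
# Case ID: the word characters that follow the fixed path in each index link.
reg_url = re.compile(r'<a href="/ouii/public/337inv.nsf/56ff5fbca63b069e852565460078c0ae/(\w+)')
initseg = "http://info.usitc.gov/ouii/public/337inv.nsf/56ff5fbca63b069e852565460078c0ae/"
endseg = "?OpenDocument"
CASES = []
# Walk the index 30 entries at a time (Start=1, 31, 61, ..., 751).
for start in range(1, 752, 30):
    htmlAll = urllib2.urlopen("http://info.usitc.gov/ouii/public/337inv.nsf/All?OpenView&Start=%d" % start).read()
    found = reg_url.findall(htmlAll)
    CASES.extend(found)  # add this page's IDs to the list (my guess at a lost line)
# Fetch each case page and write one pipe-delimited row per case.
for case in CASES:
    complete_url = initseg + case + endseg
    page = urllib2.urlopen(complete_url).read()
    row = "|" + "|".join([reg_title.search(page).group(1)] + reg_info.findall(page))
    outfile.write(row + "\n")
    print row
outfile.close()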
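To make the question concrete without hitting the website, here is a minimal offline sketch of what the regex, join, and range pieces appear to do. The sample string, the case number in it, and the closing "<" in the info pattern are all assumptions made up for illustration, not real ITC output, and the real markup may differ.

```python
import re

# Hypothetical stand-in for one downloaded case page (not real ITC markup).
sample_page = (
    'Inv. #337-TA-001: <font color="#0000ff">Certain Widgets</font>'
    '<font color="#0000ff">Terminated</font>'
)

# Title pattern: capture whatever sits between "#" and ":" after "Inv.".
reg_title = re.compile(r'Inv.\s+#(.*):')
# Assumed closing "<" so the lazy group has something to stop at.
reg_info = re.compile(r'#0000ff">(.*?)<')

title = reg_title.search(sample_page).group(1)   # first match, first group
fields = reg_info.findall(sample_page)           # list of every captured group

# join() glues the list items together with "|" between them.
row = "|" + "|".join([title] + fields)

# range(1, 752, 30) yields 1, 31, 61, ... up to 751; "%d" drops each
# number into the URL text.
starts = list(range(1, 752, 30))
second_suffix = "All?OpenView&Start=%d" % starts[1]

print(row)            # |337-TA-001|Certain Widgets|Terminated
print(second_suffix)  # All?OpenView&Start=31
```

If this reading is right, each row in the output file is the case number followed by every blue field on that case's page, separated by pipes.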