Reading a PDF file with pyPdf: no contents are captured. Please help - Programmers Heaven

sujanaero Posts: 12 Member
I am trying to read a PDF file using pyPdf and write its text to a text file, but it's not working. The value of content in the code below is just u'\n\n\n\n\n'. The PDF file has 5 pages, so there are 5 newline characters, with a 'u' at the beginning. What is going wrong? Why are the contents not coming through? Any help is highly appreciated. Thanks, Sujan

[code]
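#!/usr/bin/python
import pyPdf
import sys


def getPDFContent(path):
    content = ""
    # Open in binary mode; pyPdf reads from the file object.
    p = open(path, "rb")
    pdf = pyPdf.PdfFileReader(p)
    # Append each page's extracted text, followed by a newline.
    for i in range(0, pdf.getNumPages()):
        content += pdf.getPage(i).extractText() + "\n"
    p.close()
    # Replace non-breaking spaces and collapse runs of whitespace.
    content = " ".join(content.replace(u"\xa0", " ").strip().split())
    return content


def main():
    f = open('test.txt', 'w')
    # Drop non-ASCII characters so the result can be written as plain bytes.
    pdfl = getPDFContent("test.pdf").encode("ascii", "ignore")
    f.write(pdfl)
    f.close()


if __name__ == "__main__":
    main()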
[/code]
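
One reading of the u'\n\n\n\n\n' value, assuming content was inspected before the whitespace-collapsing line ran: the loop appends one "\n" per page, so five newlines and nothing else means extractText() returned an empty string for each of the 5 pages. The leading 'u' is only how Python 2 prints a unicode string, not part of the data. The sketch below reproduces that without any PDF library. The names join_pages and empty_pages are hypothetical helpers made up for illustration, and the list of empty strings stands in for what the extraction apparently returned. It does not explain why the extraction came back empty for this particular file.

```python
# Hypothetical helpers for illustration; no PDF library is involved.

def join_pages(page_texts):
    """Concatenate per-page text the same way the loop in the post does."""
    content = u""
    for text in page_texts:
        content += text + u"\n"
    return content


def empty_pages(page_texts):
    """Return the 0-based indexes of pages whose extracted text is blank."""
    return [i for i, text in enumerate(page_texts) if not text.strip()]


# Stand-in for a 5-page PDF where extraction returned nothing on every page.
pages = [u""] * 5
joined = join_pages(pages)

print(repr(joined))        # five newline characters and nothing else
print(empty_pages(pages))  # indexes of the pages that produced no text
```

Running the same kind of per-page check on the real output of extractText() would show whether every page is empty or only some of them, which narrows down whether the problem is the whole file or particular pages.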
