Reading a PDF file with pyPdf: no contents are captured. Please help
I am trying to read a PDF file using pyPdf and write its text to a text file, but it's not working. In the code below, the value of content is just u'\n\n\n\n\n'. The PDF file has 5 pages, so I get 5 newline characters with the u prefix at the beginning, and none of the actual text. What is going wrong? Why are the contents not coming through? Any help is highly appreciated. Thanks, Sujan
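import pyPdf

def getPDFContent(path):
    content = ""
    p = open(path, "rb")  # binary mode; open() replaces the Python 2-only file()
    pdf = pyPdf.PdfFileReader(p)
    for i in range(0, pdf.getNumPages()):
        # A page with no extracted text contributes only the newline
        content += pdf.getPage(i).extractText() + "\n"
    p.close()
    # Swap non-breaking spaces for plain ones, then collapse runs of whitespace
    content = " ".join(content.replace(u"\xa0", " ").strip().split())
    return content

if __name__ == "__main__":
    pdfl = getPDFContent("test.pdf").encode("ascii", "ignore")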
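Five bare newlines suggest that every page came back empty, so one way to narrow this down is to check each page's extracted text separately. This is only a minimal sketch, assuming the per-page strings have already been collected into a list; `find_empty_pages` is a hypothetical helper name and not part of pyPdf.

```python
def find_empty_pages(page_texts):
    """Return the 0-based indexes of pages whose extracted text is blank."""
    empty = []
    for index, text in enumerate(page_texts):
        # Treat non-breaking spaces as ordinary whitespace before checking
        if not text.replace(u"\xa0", " ").strip():
            empty.append(index)
    return empty


# Made-up per-page results standing in for the extractText() output of 5 pages
sample = [u"", u"\xa0 \n", u"Hello world", u"", u"\n"]
print(find_empty_pages(sample))
```

If every page ends up in that list, the file may simply have no text layer for the extractor to read (a scanned, image-only PDF, for example), though that is a guess rather than a diagnosis.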