How to extract data from a PDF file?
Data is sometimes stored in PDF files, so we first need to extract the text from the PDF before we can use it for further analysis.
PyPDF2 is the library required for this recipe. Installing it on your computer is simple: use pip.
pip install PyPDF2
PDF Data Collection for Analysis
from PyPDF2 import PdfReader

# Open the PDF file in binary mode.
pdf = open("test.pdf", "rb")

# Create a PDF reader object. PdfReader replaces PdfFileReader, which was removed in PyPDF2 3.0.
pdf_reader = PdfReader(pdf)

# Check the total number of pages in the PDF file.
print("Total number of Pages:", len(pdf_reader.pages))

# Get a page object. Page indices are zero-based, so 200 must be less than the page count.
page = pdf_reader.pages[200]

# Extract the text from that specific page.
print(page.extract_text())

# Close the file object.
pdf.close()
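For analysis you usually want the text of every page, not just one. The following is a minimal sketch of how that could look, not part of the original recipe. The helper name extract_text_by_page and its first/last parameters are hypothetical. It assumes a reader object that exposes a pages sequence whose items have an extract_text() method, as PyPDF2's PdfReader does.

```python
def extract_text_by_page(reader, first=0, last=None):
    """Return {page_index: text} for pages first..last (zero-based, inclusive).

    `reader` is assumed to expose `.pages`, a sequence of page objects
    with an `extract_text()` method (for example a PyPDF2 PdfReader).
    """
    total = len(reader.pages)
    # Clamp the upper bound so an out-of-range page number cannot raise IndexError.
    if last is None or last >= total:
        last = total - 1
    texts = {}
    for index in range(max(first, 0), last + 1):
        # Pages without extractable text (for example scanned images) yield "".
        texts[index] = reader.pages[index].extract_text() or ""
    return texts


# Hypothetical usage with PyPDF2 installed and a local test.pdf:
#   from PyPDF2 import PdfReader
#   with open("test.pdf", "rb") as pdf:
#       pages = extract_text_by_page(PdfReader(pdf))
#       for index, text in pages.items():
#           print("Page", index + 1, ":", text[:80])
```

Clamping the range means a request for a page beyond the end of the file returns fewer pages instead of failing, which is convenient when processing many PDFs of unknown length.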
2019-04-29T19:41:31+05:30
Amit Arora