How to read or parse data from Web Pages?
Sometimes we need to extract text data from blogs and other HTML web pages for our analysis.
Beautiful Soup is the library required for this recipe. Installing it on your computer is simple: you only need to install it with pip.
pip install bs4
Blog or Web Page Data Collection for Analysis
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

# Send a browser-like User-Agent header; some sites reject urllib's default one
req = Request('http://www.cmegroup.com/trading/products/#sortField=oi&sortAsc=false&venues=3&page=1&cleared=1&group=1', headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req).read()

# Parse the downloaded HTML
soup = BeautifulSoup(webpage, 'html.parser')

# Format the parsed HTML as an indented string
strhtm = soup.prettify()

# Print the first 500 characters
print(strhtm[:500])

# Print the page title text and the og:description meta tag
print(soup.title.string)
print(soup.find('meta', attrs={'property':'og:description'}))

# Extract anchor tag text (x.string is None when a link contains nested tags)
for x in soup.find_all('a'):
    print(x.string)

# Extract paragraph tag text
for x in soup.find_all('p'):
    print(x.text)
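The same extraction calls can be tried without a network connection. The sketch below is only an illustration: the SAMPLE_HTML string and its contents are made up for this example and are not taken from the page above. It shows how to read an attribute value (the meta tag's content, a link's href) and how .string differs from get_text() when a tag contains nested tags.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, used so the example runs without network access.
SAMPLE_HTML = """
<html>
  <head>
    <title>Sample Page</title>
    <meta property="og:description" content="A short description.">
  </head>
  <body>
    <p>First <b>bold</b> paragraph.</p>
    <a href="/plain">Plain link</a>
    <a href="/nested"><span>Nested</span> link</a>
  </body>
</html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Title text
title = soup.title.string

# Read the meta tag's content attribute instead of printing the whole tag;
# guard against pages that do not define the tag at all.
meta = soup.find("meta", attrs={"property": "og:description"})
description = meta.get("content") if meta else None

# For each link, collect the href, .string, and get_text().
# .string is None when the tag has more than one child node,
# while get_text() joins the text of all nested nodes.
links = [
    (a.get("href"), a.string, a.get_text(" ", strip=True))
    for a in soup.find_all("a")
]

# Paragraph text, including text inside nested tags such as <b>
paragraphs = [p.get_text() for p in soup.find_all("p")]

print(title)
print(description)
for href, string_value, text in links:
    print(href, repr(string_value), repr(text))
print(paragraphs)
```

If the anchor or paragraph loops print None on a real page, switching from .string to get_text() is usually the first thing to try.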
2019-04-24T00:47:33+05:30
Amit Arora
Python Programming Tutorial
Python
Practical Solution