BeautifulSoup

From Code Self Study Wiki

HTML parsing:

Example of scraping the links out of a page and printing each link's text, all of its attributes, and its href:

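import requests
from bs4 import BeautifulSoup

page = requests.get('http://codeselfstudy.com/', timeout=10)
page.raise_for_status()  # stop early on HTTP errors such as 404 or 500

# The parser argument is optional, but naming one avoids a warning and keeps
# results consistent across machines. 'lxml' needs the lxml package installed;
# 'html.parser' is built into Python.
soup = BeautifulSoup(page.text, 'lxml')

print('Anchor Text\tAttributes\thref')
# href=True keeps only <a> tags that actually have an href attribute.
for anchor in soup.find_all('a', href=True):
    print('{}\t{}\t{}'.format(anchor.text, anchor.attrs, anchor['href']))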
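The same idea can be tried offline by parsing a string instead of a downloaded page. The sketch below is one possible approach, not part of the original example: `SAMPLE_HTML` and `extract_links` are hypothetical names, it uses the built-in `'html.parser'` so no extra parser package is needed, and it adds `urllib.parse.urljoin` (an assumption about what you might want) to turn relative hrefs into absolute URLs.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical sample page, standing in for a downloaded response body.
SAMPLE_HTML = """
<html><body>
  <a href="/wiki/Python" class="internal">Python</a>
  <a href="https://www.crummy.com/software/BeautifulSoup/" rel="nofollow">Beautiful Soup</a>
  <a name="top">No href here</a>
</body></html>
"""


def extract_links(html, base_url):
    """Return (text, absolute_url, attrs) for every <a> tag that has an href."""
    # 'html.parser' ships with Python, so this runs without lxml installed.
    soup = BeautifulSoup(html, 'html.parser')
    links = []
    # href=True skips anchors without an href attribute (like name="top").
    for anchor in soup.find_all('a', href=True):
        text = anchor.get_text(strip=True)
        # urljoin resolves relative hrefs such as "/wiki/Python" against base_url
        # and leaves absolute URLs unchanged.
        url = urljoin(base_url, anchor['href'])
        links.append((text, url, dict(anchor.attrs)))
    return links


for text, url, attrs in extract_links(SAMPLE_HTML, 'http://codeselfstudy.com/'):
    print(f'{text}\t{url}\t{attrs}')
```

Note that Beautiful Soup returns multi-valued attributes such as `class` and `rel` as lists (for example `['internal']`), while single-valued ones such as `href` come back as plain strings.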

References