Install the required packages.
# Debian-based
apt install python3-bs4
apt install python3-requests

# using pip (the BeautifulSoup package is published as beautifulsoup4)
pip install beautifulsoup4
pip install requests
Sample code for parsing:
import requests
from bs4 import BeautifulSoup

# obtain html using requests
response = requests.get('http://example.org')
html = BeautifulSoup(response.text, 'html.parser')

# select using a CSS selector (returns a list of elements, possibly empty)
elements = html.select('your.css selector[attr="value"]')

# examples on findings
if elements:
    first = elements[0]
    # get "value" attribute (None if the attribute is absent)
    print(first.get('value'))
    # get "href" or "src"
    print(first.get('href'))
    # get class (a list of class names; .get avoids a KeyError if absent)
    print(first.get('class'))
    # get text of the element
    print(first.get_text())
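The same calls can be tried without a network request by parsing an HTML string directly. The following is only a minimal sketch: the markup in SAMPLE_HTML and all variable names are made up for illustration and are not part of the original sample.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, invented for this illustration.
SAMPLE_HTML = """
<html><body>
  <a class="nav external" href="https://example.org/docs">Docs</a>
  <a class="nav" href="/about">About</a>
  <input type="text" name="q" value="search term">
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# select() returns a list of all matching elements (empty if none match)
links = soup.select("a.nav[href]")
hrefs = [link.get("href") for link in links]

# "class" is a multi-valued attribute, so it comes back as a list of strings
first_classes = links[0].get("class")

# get_text(strip=True) returns the element text without surrounding whitespace
first_text = links[0].get_text(strip=True)

# select_one() returns the first match, or None when nothing matches
field = soup.select_one('input[name="q"]')
value = field.get("value") if field is not None else None
missing = soup.select_one("div.does-not-exist")

print(hrefs, first_classes, first_text, value, missing)
```

select_one() is handy when only a single element is expected, since a None check replaces the length check on the list.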
Documentation on BeautifulSoup: https://www.crummy.com/software/BeautifulSoup/bs4/doc/