Install the required packages:
    # Debian-based (run as root or with sudo)
    apt install python3-bs4 python3-requests

    # using pip (the PyPI package for bs4 is named beautifulsoup4)
    pip install beautifulsoup4 requests
Sample code for parsing:
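    import requests
    from bs4 import BeautifulSoup

    # obtain html using requests
    response = requests.get('http://example.org')
    html = BeautifulSoup(response.text, 'html.parser')

    # get page title (prints the whole <title> tag; use html.title.string for its text)
    print(html.title)

    # select using a CSS selector (returns a list of elements)
    elements = html.select('#your-id .your-class a[href="value"]')

    # work with the first match, if any
    if elements:
        first = elements[0]

        # get "href" or "src"; .get() returns None if the attribute is missing
        print(first.get('href'))
        print(first.get('src'))

        # or get using dictionary access; raises KeyError if the attribute is missing
        print(first['class'])
        print(first['style'])

        # get text of the element
        print(first.get_text())  # all text inside the element, including nested tags
        print(first.string)      # None unless the element has a single text child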
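To try the same calls without a network request, the sketch below parses an inline HTML string. The markup in `SAMPLE_HTML` and the helper name `inspect_first_link` are made up for illustration and are not part of BeautifulSoup. It shows how `.get()` differs from dictionary access and how `.get_text()` differs from `.string`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used here instead of a live HTTP response.
SAMPLE_HTML = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <div id="your-id">
      <p class="your-class">
        <a href="value" class="link external">Read <b>more</b></a>
      </p>
    </div>
  </body>
</html>
"""


def inspect_first_link(markup):
    """Return a few properties of the first element matched by the selector."""
    soup = BeautifulSoup(markup, "html.parser")
    elements = soup.select('#your-id .your-class a[href="value"]')
    if not elements:
        return None  # nothing matched the selector

    link = elements[0]
    return {
        "title": soup.title.string,           # text inside <title>
        "href": link.get("href"),             # attribute value as a string
        "src": link.get("src"),               # missing attribute: None, no exception
        "classes": link["class"],             # "class" is multi-valued, so this is a list
        "has_style": link.has_attr("style"),  # check before using link["style"]
        "text": link.get_text(),              # text of the element and its descendants
        "string": link.string,                # None here: <a> has more than one child
    }


result = inspect_first_link(SAMPLE_HTML)
print(result)
```

Checking `has_attr()` or using `.get()` avoids the `KeyError` that `link["style"]` would raise when the attribute is absent.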
Documentation on BeautifulSoup: https://www.crummy.com/software/BeautifulSoup/bs4/doc/