==== Simple guide for HTML Web Scraping ====
\\
Install the required packages:
# Debian-based
apt install python3-bs4
apt install python3-requests
# using pip (the PyPI package for BeautifulSoup is named beautifulsoup4)
pip install beautifulsoup4
pip install requests
\\
Sample code for parsing:
import requests
from bs4 import BeautifulSoup

# obtain the HTML using requests
response = requests.get('http://example.org')
html = BeautifulSoup(response.text, 'html.parser')
# get the page title (prints the whole <title> tag; html.title.string gives only its text)
print(html.title)
# select using a CSS selector (returns a list of elements)
elements = html.select('#your-id .your-class a[href="value"]')
# examples of working with the results
if elements:
    # get "href" or "src"; .get() returns None if the attribute is missing
    print(elements[0].get('href'))
    print(elements[0].get('src'))
    # or use dictionary access, which raises KeyError if the attribute is missing
    print(elements[0]['class'])
    print(elements[0]['style'])
    # get the text of the element
    print(elements[0].get_text())
    print(elements[0].string)
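\\
The accessors used above behave differently when an attribute is missing or when an element has nested tags. The following is a minimal sketch that shows this on a small inline document, so it runs without network access. The HTML in ''SAMPLE_HTML'' and the helper name ''describe_links'' are made up for illustration and are not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical inline page, used so the example needs no network access.
SAMPLE_HTML = """
<html><head><title>Sample page</title></head>
<body>
  <div id="menu">
    <ul class="links">
      <li><a href="/home" class="nav main">Home</a></li>
      <li><a href="/docs">Read the <b>docs</b></a></li>
    </ul>
  </div>
</body></html>
"""


def describe_links(html_text):
    """Return one dict per <a> element found inside #menu .links."""
    soup = BeautifulSoup(html_text, 'html.parser')
    rows = []
    for a in soup.select('#menu .links a'):
        rows.append({
            'href': a.get('href'),          # None if the attribute is missing
            'classes': a.get('class', []),  # "class" is multi-valued, so it is a list
            'text': a.get_text(),           # text of the element and all its children
            'string': a.string,             # None when the element has several children
        })
    return rows


if __name__ == '__main__':
    for row in describe_links(SAMPLE_HTML):
        print(row)
```

In this sketch the second link has no ''class'' attribute and contains a nested ''<b>'' tag, so ''a.get('class', [])'' falls back to an empty list and ''a.string'' is ''None'', while ''get_text()'' still returns the full text. Using ''.get()'' with a default avoids the ''KeyError'' that dictionary-style access raises for a missing attribute.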
Documentation on BeautifulSoup: https://www.crummy.com/software/BeautifulSoup/bs4/doc/