NoBIGTech Wiki Técnico

Simple guide for HTML Web Scraping

Install the required packages:

# Debian-based distributions (run as root or with sudo)
apt install python3-bs4 python3-requests

# using pip (BeautifulSoup is published on PyPI as beautifulsoup4)
pip install beautifulsoup4 requests
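To confirm that the installation worked, you can check whether Python can locate both modules. This is a minimal sketch using only the standard library. `missing_modules` is a hypothetical helper name. Note that the import names are `bs4` and `requests`, which differ from the Debian/PyPI package names above.

```python
import importlib.util


def missing_modules(names):
    """Hypothetical helper: return the names in `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # import names of the packages installed above
    missing = missing_modules(["bs4", "requests"])
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("all packages available")
```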

Sample code for parsing:

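import requests
from bs4 import BeautifulSoup

# obtain the HTML using requests, fail early on HTTP errors, then parse it
response = requests.get('http://example.org', timeout=10)
response.raise_for_status()
html = BeautifulSoup(response.text, 'html.parser')

# select using a CSS selector (returns a list of elements, possibly empty)
elements = html.select('your.css selector[attr="value"]')

# examples of reading data from the matches
if elements:
    # get the "value" attribute (None if the attribute is absent)
    print(elements[0].get('value'))
    # get "href" or "src"
    print(elements[0].get('href'))

    # get the class attribute as a list of class names
    # (elements[0]['class'] would raise KeyError if the element has no class)
    print(elements[0].get('class'))

    # get the text of the element and all of its descendants
    print(elements[0].get_text())
    # .string is only set when the element has exactly one child string; otherwise None
    print(elements[0].string)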
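The same calls can be tried without network access by parsing an inline HTML string. This sketch assumes a small made-up document (`SAMPLE_HTML`). It shows that `class` comes back as a list, that `get_text()` joins nested text, that `.string` is `None` when an element has several children, and that `select_one` returns `None` when nothing matches.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document so the example runs offline
SAMPLE_HTML = """
<html><body>
  <a class="nav main" href="/about">About <b>us</b></a>
  <a class="nav" href="https://example.org/contact">Contact</a>
  <img src="/logo.png">
  <input name="q" value="search term">
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

links = soup.select("a.nav")      # both <a> elements carry the "nav" class
first = links[0]
print(first.get("href"))          # /about
print(first.get("class"))         # ['nav', 'main']
print(first.get_text())           # About us
print(first.string)               # None (the <a> has two children: text and <b>)
print(links[1].string)            # Contact

# select_one returns the first match, or None when nothing matches
print(soup.select_one("img").get("src"))                  # /logo.png
print(soup.select_one('input[name="q"]').get("value"))    # search term
print(soup.select_one("a.missing"))                        # None
```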

BeautifulSoup documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
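Values read from `href` or `src` are often relative (for example `/about`), so they usually need to be resolved against the page URL before being requested. A minimal sketch, assuming a hypothetical helper `absolute_links` and using the standard library's `urllib.parse.urljoin`. With requests, `response.url` (the final URL after redirects) is a natural choice for `base_url`.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def absolute_links(html_text, base_url):
    """Hypothetical helper: return every <a> href in html_text, resolved against base_url."""
    soup = BeautifulSoup(html_text, "html.parser")
    # the selector a[href] skips anchors that have no href attribute
    return [urljoin(base_url, a["href"]) for a in soup.select("a[href]")]


page = '<a href="/about">About</a> <a>no link</a> <a href="https://example.com/x">X</a>'
print(absolute_links(page, "http://example.org/docs/index.html"))
# ['http://example.org/about', 'https://example.com/x']
```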
