How I DIY’d my Budget Using Python for Selenium and Beautiful Soup


I’m an avid fan of personality tests. I love reading through my pre-packaged type description I swear is unique to me and only me. I love how they make me feel as though a pair of psychologists have stolen a glance at my soul using answers I myself provided. But I know not everyone agrees. For the naysayers, whether they say the mainstream options are too cliche, or discount the result because it’s very much self-reported, I present: the Budget Type Index.

For those now expecting to receive a pdf with your personalized profile, I hope you’re not too disappointed when I tell you this is something you’ll have to make for yourself. Okay, fine, this isn’t really a personality test in the traditional sense, but I do think a well-crafted budget can tell you just as much about yourself as any of those alluring online quizzes, if not more.

When I set out to do this project, I wanted to understand where my money was going, and by extension, understand what I prioritized. I was previously manually aggregating all my spending across my multiple bank accounts, including non-bank but frequently used services like Venmo. I had searched for a service that would not only automate this process and show me my historical data, but also do it without a monthly fee. There wasn’t quite anything that fit all these criteria, so I created my own using Python.

For anyone else also looking to measure and manage their spending, gathering the data is the first, and most important, step. I’ve broken down the rest of this article based on the two tools I’ve used:

  1. Selenium
  2. Beautiful Soup

I’m happy to help if you want to build your own budgeting tool — feel free to reach out at [email protected] even if we don’t know each other!

Selenium

Selenium automates browsers. Originally created to test web applications, it is now also widely used for web scraping. I’ve included the entirety of my code below before breaking down how I used this tool.

Getting set up

You’ll first need to install two pieces of software.

  • The Selenium package

Install by typing the following in your command prompt:

pip install selenium
  • The web driver of the browser you’re using

The Chrome driver (which is what I’m using) can be found here. There are different drivers for different versions of Chrome. To find out which version you’re running, click the three vertical dots at the top right of your browser and open Settings, then click “About Chrome” — this will display your Chrome version. Download the applicable driver and make sure it’s somewhere on your PATH so Selenium can find it.

A more thorough installation explanation, including links to drivers of other browsers, can be found in the docs here.

Size matters

Now that you have the necessary packages, you can start telling the driver which web elements to select. One thing that can affect the location of these elements is the size of your window. For maximum consistency, I like to maximize the window before starting any processes.

# from line 8
browser.maximize_window()
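
For context (since the full gist isn’t reproduced here), a minimal sketch of the setup those first lines perform, assuming the Chrome driver is on your PATH and using a placeholder URL:

from selenium import webdriver

# start a fresh Chrome instance (assumes chromedriver is on your PATH)
browser = webdriver.Chrome()

# maximize for consistent element positions
browser.maximize_window()

# placeholder URL -- swap in your bank's login page
browser.get("https://www.bankofamerica.com")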

Locating elements

To retrieve transactions, we first want Selenium to log into the bank’s website. We can identify which elements need to be selected by inspecting the website’s HTML. To pull it up, go to the website and find the login box. Right-click the Online ID field and select “Inspect.”

[Screenshot: right-clicking the Online ID field and selecting “Inspect”]

The element inspector will pop up, highlighted on the field you chose (in this case, Online ID).

[Screenshot: the element inspector highlighting the Online ID field]

There are eight different ways to locate your element in Selenium. These are by:

  • Name
elem = driver.find_element_by_name("INSERT-NAME")

This is the one I decided to use, as indicated by the poorly drawn red circle in the screenshot above.

  • ID
elem = driver.find_element_by_id("INSERT-ID")

This is considered to be the most accurate method, as each element’s ID is unique.

  • Link Text
[Screenshot: a news headline link on the page, circled in red]
elem = driver.find_element_by_link_text("INSERT-NAME-OF-LINK-ON-PAGE")

# example if I wanted to select the link circled above
elem = driver.find_element_by_link_text("Vote Is Likely to Fall Largely Along Party Lines")
  • Partial Link Text
elem = driver.find_element_by_partial_link_text("DONT-NEED-FULL-LINK")

# example if I still wanted to select the NYT link above
elem = driver.find_element_by_partial_link_text("Vote Is Likely to Fa")
  • CSS Selector
elem = driver.find_element_by_css_selector("INSERT-CSS-SYNTAX")

Good examples of CSS selectors can be found here: https://saucelabs.com/resources/articles/selenium-tips-css-selectors

  • Tag Name
# referring to an HTML tag. first element with tag is returned.
elem = driver.find_element_by_tag_name("INSERT-TAG-NAME")
  • Class Name
elem = driver.find_element_by_class_name("INSERT-CLASS-NAME")
  • XPath
elem = driver.find_element_by_xpath("INSERT-XPATH")

XPath is a language used for locating nodes in an XML doc. This is useful when there is no suitable id or name attribute for your target element. The basic format is as follows:

xpath = //tagname[@attribute = "value"]

You can read more on xpath here.

Be mindful that each of these methods will only select the first element it finds. To select multiple elements, use the same methods, but replace the word “element” with “elements” (e.g. driver.find_elements_by_name("INSERT-NAME")).
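
A quick note in case your Selenium install complains: newer Selenium releases (4.x) dropped the find_element_by_* helpers in favor of a single find_element call that takes a By locator. The equivalents of a few of the methods above look like this:

from selenium.webdriver.common.by import By

# same locator strategies, expressed with the By class
elem = driver.find_element(By.NAME, "INSERT-NAME")
elem = driver.find_element(By.ID, "INSERT-ID")
elems = driver.find_elements(By.CSS_SELECTOR, "INSERT-CSS-SYNTAX")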

Inputting keys

After you find the login element, the next step is to input your credentials. This is done with the function send_keys().

import time

browser.find_element_by_name("onlineId1").send_keys("YOUR-USERNAME")
time.sleep(2)
password = browser.find_element_by_name("passcode1")
password.send_keys("YOUR-PASSWORD")

Remember to protect yourself by not committing your password anywhere.

Using time.sleep(), I added a wait to tell Selenium to pause for two seconds between entering my username and password. I found that without it, Selenium moved too fast and the browser had a hard time keeping up.
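
If guessing at a fixed delay feels fragile, Selenium also offers explicit waits that block until an element actually shows up. A small sketch, reusing the passcode1 field name from above:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# wait up to 10 seconds for the password field to appear before continuing
password = WebDriverWait(browser, 10).until(
    EC.presence_of_element_located((By.NAME, "passcode1"))
)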

I would typically press the Enter key after typing in my credentials, so I wanted to do the same in Selenium. Luckily, Selenium has a list of standard keyboard keys. In this case, I used Keys.RETURN:

from selenium.webdriver.common.keys import Keys

password.send_keys(Keys.RETURN)

Now you’re in!

To see if you located the elements and inputted your credentials correctly, try running your code. A new Chrome instance will pop up and you can watch the browser run automatically. This instance is a different browser from the one you use regularly: it starts with no cookies and disappears after you are done. If you do need cookies, you can check out how to add them on this website.
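
For reference, adding a cookie yourself is a one-liner once you’ve navigated to the site. A small sketch with placeholder values (the names below are made up):

# cookies can only be added for the domain the browser is currently on
browser.get("https://www.example-bank.com")  # placeholder URL
browser.add_cookie({"name": "SOME-COOKIE-NAME", "value": "SOME-COOKIE-VALUE"})
browser.refresh()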

I can see that my code ran correctly when this Chrome instance takes me to my bank account home page. I see two links: one to my checking account and the other to my credit card. To click these links, I use find_element_by_link_text and select using the click() method.

browser.find_element_by_link_text('Bank of America Travel Rewards Visa Platinum Plus - ****').click()

Once you are on the page with the transactions you want, retrieve the page_source from the web driver and store it in a variable. This will be used for parsing later.

boa_travel_html = browser.page_source
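
One optional habit that helped me while developing the parsing step: dump that page source to a file so you can iterate on Beautiful Soup without logging in every time. The filename here is just an example:

# save the raw HTML so the parsing code can be developed offline
with open("boa_travel.html", "w", encoding="utf-8") as f:
    f.write(boa_travel_html)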

Now the only thing left to do is to repeat with your other bank accounts.

iFrames

The process was nearly the same for my other account at Barclays, aside from a pesky iFrame. An iFrame, or inline frame, is an HTML document embedded inside another HTML document on a website. I first suspected this might be getting in my way when I received an Element Not Found error despite clearly locating the element I wanted by its name. Luckily, Selenium has an easy way to navigate to an iFrame using the switch_to method.

# switch into the iframe first, then locate elements inside it as usual
browser.switch_to.frame(browser.find_element_by_tag_name("iframe"))
browser.find_element_by_name("uxLoginForm.username")

Continue to retrieve the page source using the same method as in the Bank of America example.
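
One detail my snippet above doesn’t show: after you switch into an iFrame, the driver stays focused on it, so if you need to interact with the main page again, switch back first.

# return focus to the main (top-level) document after working inside the iframe
browser.switch_to.default_content()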

Headless browser

Once you know that your code works, you can expedite the process by getting rid of the browser that pops up the next time you run your program.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
chrome_options = Options()

chrome_options.add_argument("--headless")

driver = webdriver.Chrome(options = chrome_options)
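
One caveat worth knowing: headless Chrome starts with a small default viewport, so maximize_window may not behave the way you expect without a visible window. If element positions matter, you can pin the size with another flag (a suggestion, not something my original script needed):

# headless Chrome defaults to a small viewport; set the size explicitly if layout matters
chrome_options.add_argument("--window-size=1920,1080")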

You now have all your necessary data. It may not be in a very readable format, but making it usable is what Beautiful Soup is for.

Beautiful Soup

Beautiful Soup is a Python package for parsing HTML files. Now that we have the necessary HTML pages, we can use Beautiful Soup to parse them for the information we need. Again, I’ve included the code in its entirety before diving in below.
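
As a quick orientation (since the full gist isn’t reproduced here), turning the page source saved earlier into a soup object looks something like this — a minimal sketch assuming the built-in html.parser (any installed parser, such as lxml, would also work):

from bs4 import BeautifulSoup

# parse the page source we saved from Selenium earlier
boa_travel_soup = BeautifulSoup(boa_travel_html, "html.parser")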

Parsing transaction information

It’s time to check out the HTML page you retrieved earlier. Because the jumbled plain text page is so… jumbled, I chose to navigate the HTML via the source itself, by right clicking each of the transactions on the bank website and selecting “Inspect.” This highlighted the transaction in the web page element inspector (used earlier to identify login boxes with Selenium).

[Screenshot: a transaction row’s HTML highlighted in the element inspector]

The data I wanted to gather included the date, the description of the transaction, and the dollar amount. As seen above, these pieces of information were nested in multiple “td” tags within the parent “tr” tags. I used a combination of find and find_all functions to move along the tree until I arrived at the tag containing the text I wanted. The snippet below is how I retrieved the date.

# narrowed down to largest parent container
containers = rows.find_all('tr', class_=['trans-first-row odd', 'trans-first-row even', 'even', 'odd'])
dateli = []
descli = []
amtli = []
pending_counter = 0
for container in containers:
    date = container.find('td', headers='transaction-date').get_text(strip=True)
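
To give a fuller picture of the loop, the description and amount came out the same way. A sketch below, where the 'transaction-description' and 'transaction-amount' header values are stand-ins for whatever your own inspector shows:

for container in containers:
    # 'transaction-date' comes from my page; the other two header values are placeholders
    date = container.find('td', headers='transaction-date').get_text(strip=True)
    desc = container.find('td', headers='transaction-description').get_text(strip=True)
    amt = container.find('td', headers='transaction-amount').get_text(strip=True)
    dateli.append(date)
    descli.append(desc)
    amtli.append(amt)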

Since how you use Beautiful Soup is so specific to the web page you’re looking at (as evidenced by the separate functions I made for each page I retrieved), instead of running through my code line by line, I wanted to point out irregularities and interesting tidbits I found to help your process be as efficient as possible.

Class is class_

All the Beautiful Soup find functions take HTML attributes as keyword arguments. While this is pretty straightforward for most attributes, class is a reserved keyword in Python, so Beautiful Soup uses class_ to represent its HTML counterpart.

containers = rows.find_all('tr', class_=['trans-first-row odd', 'trans-first-row even', 'even', 'odd'])

Lambda functions in soup.find()

The find functions can also take other functions as arguments. For a quick way to locate specific tags that fit multiple criteria, try inserting a lambda function.

# Example 1
rows = boa_travel_table.find(lambda tag: tag.name=='tbody')
# Example 2
boa_checking_table = boa_checking_soup.find(lambda tag: tag.name == 'table' and tag.has_attr('summary') and tag['summary'] == 'Account Activity table, made up of Date, Description, Type, Status, Dollar Amount, and Available balance columns.')

Example 1 is pretty simple. I could also have done without the lambda function to find the same element using this:

rows = boa_travel_table.find('tbody', class_ = 'trans-tbody-wrap')

Example 2 is where the lambda function’s power really shines. By combining multiple criteria and using the tag’s has_attr method, my power to search for exactly what I want increases exponentially. Another good example of lambda’s usefulness (and an explanation of lambda!) can be found here, where the author uses Python’s isinstance function to conduct Beautiful Soup searches.

Beautiful Soup’s text vs. string

In Rows 8–19 of my Beautiful Soup code above, I narrowed down the tags (or containers as I like to call them) to the largest one that contained all three pieces of information I wanted to extract (date, description, amount) for each transaction. To extract data from these drilled down containers, I used soup.tag.get_text().

date = container.find('td', headers = 'transaction-date').get_text(strip=True)

If you read through the Beautiful Soup documentation, you may have seen soup.tag.string used instead to extract text. This is what I first used, but I quickly found it did not work in this situation. soup.tag.string returns a NavigableString object, and it only does so when the tag contains exactly one child (a single string, or a single tag that itself contains a single string); otherwise it returns None.

soup.tag.get_text(), on the other hand, can access all of its children’s strings (even those that aren’t direct children) and returns a Unicode object. Therefore, if the text you want to extract lives within a child tag (you can see the a tag within the td tag in the screenshot below), you should use soup.tag.get_text().

[Screenshot: an a tag nested inside the td tag in the element inspector]

If you prefer slightly cleaner code, you can also use soup.tag.text. This calls get_text(), and basically does the same thing, but I prefer the original get_text() as it supports keyword arguments like separator, strip, and types. For this project, I included strip=True as a keyword argument to strip out any white spaces from the text.
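
To make the difference concrete, here’s a tiny, made-up example (not the actual bank HTML):

from bs4 import BeautifulSoup

# a made-up td that mixes plain text with a nested a tag
snippet = BeautifulSoup('<td>Posted <a href="#">01/15/2021</a></td>', 'html.parser')
td = snippet.find('td')

print(td.string)      # None -- the td holds more than one child
print(td.get_text())  # "Posted 01/15/2021" -- gathers text from every descendant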

You now have the power to retrieve all your financial data from your sources by running a single program. This is your start to creating your own Budget Type Index and finding out more about yourself through your spending habits. Head off to collect your data points, and become the best financial version of yourself!
