🔥 Budget-Friendly Big Data Analysis: Python & Google Colab On an Everyday Laptop (2024)

1. Introduction

As a software developer, I have faced many different challenges throughout my career. Python is becoming more and more widespread, and it now serves as a basis for web development, AI, crypto, and more.

Big Data, according to an Oracle article, encompasses the phenomenon of “larger and more complex data sets.” In software development with Big Data, the volume of data and the speed of processing are fundamental. I consider that a large part of the development life cycle depends on these two factors, and because of their nature, these processes are usually expected to run on hardware that can manage volumes ranging from thousands to billions of records quickly and efficiently.

However, in the middle of 2024, I think that circumstances have changed in favor of developers, so in this article I will show you how you can do Big Data on an everyday laptop (under $300 USD) using open-source tools and optimization techniques.

2. Realistic minimum system requirements

Hewlett-Packard, also known as HP, is a multinational company considered one of the leaders in technology and computer equipment. According to HP, these are the minimum requirements for doing data science with big data:

  • At least 16GB of RAM
  • A GPU with a minimum of 4GB of memory (they emphasize the use of NVIDIA as an option for GPUs)
  • An Intel® Core™ i7, i9, or Xeon® processor with at least 4 cores and a base speed of 2.0GHz
  • Windows 11 or Ubuntu operating system

However, not everyone has such equipment at their fingertips. In my case, I code on an ASUS Vivobook laptop that I bought for $295 USD, which has:

  • 8GB RAM
  • An Intel i5-1135G7 4-core processor
  • An integrated Intel Iris Xe Graphics GPU.

So, in this article, we will take my laptop's specifications as the minimum (and we will see whether we can go even lower) for Big Data development.

3. Work tools

These are the tools we are going to use:

  • Pandas and NumPy
  • Dask
  • Google Colab

3.1. Pandas and NumPy

Pandas and NumPy are two Python libraries popularly used in data science. They are used for data manipulation and scientific computing, respectively. We will use these because they can efficiently handle data structures and multidimensional arrays, which will help us deal with large amounts of data.

3.2. Dask

Dask is a library whose usage closely resembles that of Pandas and NumPy, with the difference that it focuses on large-scale parallel and distributed data processing. We will use it because it can process large data sets efficiently.

3.3. Google Colaboratory (Colab)

For the last use case, we will use Google Colaboratory to run Python code from the web browser. We will use it because it offers free access to cloud machines, including GPUs and TPUs, on which to run the aforementioned libraries. It also has subscription plans for access to more powerful cloud machines. Alternatively, you can run the code locally if you have the necessary hardware and want to try it anyway.

Finally, this is what our work environment will look like:

[Figure: work environment]

4. Exercise


We are going to build a simple ETL (extract, transform, load) pipeline using Google Colab and the libraries mentioned above.

4.1. Create a Google Colab Notebook

We will create a Google Colab notebook using the following link.

4.2. Import modules

4.3. Create dataset

We are going to create an example dataset for the exercise. For this, we will create a new code block in the notebook and execute the following script:
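
As a rough sketch of what such a script could look like: the column names, the sample values, and the row count below are all hypothetical, chosen only for illustration. Only the file name restaurant_reviews.csv comes from this article.

```python
import numpy as np
import pandas as pd

# All names and sizes below are illustrative assumptions.
N_ROWS = 100_000                       # raise this (e.g. to several million) for a larger file
OUTPUT_PATH = "restaurant_reviews.csv"

rng = np.random.default_rng(seed=42)   # fixed seed so the file is reproducible

# Random day offsets within one year, turned into dates
day_offsets = pd.to_timedelta(rng.integers(0, 365, size=N_ROWS), unit="D")

reviews = pd.DataFrame({
    "review_id": np.arange(1, N_ROWS + 1),
    "restaurant": rng.choice(["Pasta House", "Sushi Go", "Taco Loco", "Le Bistro"], size=N_ROWS),
    "rating": rng.integers(1, 6, size=N_ROWS),   # whole stars from 1 to 5 (upper bound is exclusive)
    "review_text": rng.choice(["Great food", "Too slow", "Would return", "Average"], size=N_ROWS),
    "review_date": pd.Timestamp("2023-01-01") + day_offsets,
})

reviews.to_csv(OUTPUT_PATH, index=False)
print(f"Wrote {len(reviews):,} rows to {OUTPUT_PATH}")
```

N_ROWS is kept small here so the cell finishes quickly; increasing it is the simplest way to stress an 8GB machine.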

This will create a new dummy dataset called restaurant_reviews.csv.


4.3.1. Check dataset size

In another block of code, we are going to execute the following script to validate the size of the created dataset.
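
One simple way to do this check, sketched here with a hypothetical helper named file_size_mb, is to ask the operating system for the file size and convert it to megabytes:

```python
import os


def file_size_mb(path):
    """Return the size of the file at `path` in megabytes (1 MB = 1024 * 1024 bytes)."""
    return os.path.getsize(path) / (1024 * 1024)


DATASET_PATH = "restaurant_reviews.csv"  # file name used earlier in the article

if os.path.exists(DATASET_PATH):
    print(f"{DATASET_PATH}: {file_size_mb(DATASET_PATH):.2f} MB")
else:
    print(f"{DATASET_PATH} not found; run the dataset creation step first")
```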


4.4. Extract using Pandas & Dask

In another code block, we will load the data into a data frame and compare the loading speeds of Pandas and Dask:
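
A minimal sketch of such a comparison follows; the helper names timed and load_with_dask are hypothetical. One design choice to be aware of: Dask's read_csv is lazy, so this sketch calls .compute() to force the data to be read, which makes the two timings comparable. Timing read_csv alone would measure only the planning step on the Dask side.

```python
import os
import time

import pandas as pd


def timed(loader, path):
    """Call loader(path) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = loader(path)
    return result, time.perf_counter() - start


def load_with_dask(path):
    # Imported here so the Pandas path still works where Dask is not installed.
    import dask.dataframe as dd
    # dd.read_csv only plans the read; .compute() actually loads the data.
    return dd.read_csv(path).compute()


DATASET_PATH = "restaurant_reviews.csv"  # file name used earlier in the article

if os.path.exists(DATASET_PATH):
    pandas_df, pandas_secs = timed(pd.read_csv, DATASET_PATH)
    dask_df, dask_secs = timed(load_with_dask, DATASET_PATH)
    print(f"pandas: {pandas_secs:.2f} s, {len(pandas_df):,} rows")
    print(f"dask:   {dask_secs:.2f} s, {len(dask_df):,} rows")
else:
    print(f"{DATASET_PATH} not found; run the dataset creation step first")
```

Timings vary with file size, the number of cores, and disk speed, so it is worth repeating the measurement a few times before drawing conclusions.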

[Figure: loading times for Pandas and Dask]

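We conclude that, in this run, Dask strongly outperformed Pandas when extracting information from the dataset. Keep in mind that Dask's read_csv is lazy: it plans the read and defers the actual loading until a computation is requested. If the Dask timing covers only read_csv without .compute(), part of the gap reflects deferred work rather than faster parsing.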

4.5. Transform & clean data using Pandas & NumPy

Now, we will perform some data cleaning and transformation using both Pandas and NumPy functions. We will start with Pandas functions:
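
The sketch below shows typical Pandas cleaning steps on a tiny stand-in data frame. The column names and sample rows are hypothetical, and the chosen steps (deduplication, dropping incomplete rows, text normalization, date parsing) are common examples rather than the exact ones from the original notebook.

```python
import pandas as pd

# Tiny stand-in for the reviews data; the column names are hypothetical.
raw = pd.DataFrame({
    "review_id": [1, 2, 2, 3, 4, 5],
    "restaurant": ["  pasta house ", "sushi go", "sushi go", "Taco Loco", None, "Le Bistro"],
    "rating": [5, 3, 3, 4, 2, None],
    "review_date": ["2023-01-05", "2023-02-10", "2023-02-10", "not a date", "2023-03-01", "2023-04-02"],
})

clean = (
    raw.drop_duplicates(subset="review_id")        # remove repeated reviews
       .dropna(subset=["restaurant", "rating"])    # drop rows missing key fields
       .assign(
           # trim whitespace and normalize capitalization
           restaurant=lambda d: d["restaurant"].str.strip().str.title(),
           # unparseable dates become NaT instead of raising an error
           review_date=lambda d: pd.to_datetime(d["review_date"], errors="coerce"),
           # ratings are whole stars, so store them as integers
           rating=lambda d: d["rating"].astype("int64"),
       )
       .reset_index(drop=True)
)
print(clean)
```

Chaining the steps keeps the raw data frame untouched, which makes it easy to rerun the cleaning with different rules.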

Then, we will use NumPy functions to continue with the data frame transformation process:
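
As an illustration of the kind of vectorized transformations NumPy offers, here is a sketch that assumes a hypothetical rating column on a 1 to 5 scale. The derived columns sentiment and rating_z are invented for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical cleaned data; "rating" is an assumed column name.
df = pd.DataFrame({"rating": [5, 3, 1, 4, 7]})

# Clamp out-of-range ratings into the valid 1-5 interval.
df["rating"] = np.clip(df["rating"].to_numpy(), 1, 5)

# Vectorized labelling: 4-5 -> positive, 3 -> neutral, anything else -> negative.
df["sentiment"] = np.select(
    [df["rating"] >= 4, df["rating"] == 3],
    ["positive", "neutral"],
    default="negative",
)

# Standardize ratings (z-score) so they are comparable across datasets.
ratings = df["rating"].to_numpy(dtype=float)
df["rating_z"] = (ratings - ratings.mean()) / ratings.std()

print(df)
```

Each of these operations works on whole arrays at once, which is usually much faster than looping over rows in Python.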

4.6. Load data into DB via API

Now, we are going to simulate an upload process using a fake normalization template. This will let us convert each data frame entry into a request to a dummy REST API that simulates uploading the data to a server.
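
A minimal sketch of this idea follows. The endpoint URL is a placeholder, and the payload fields and the helper names (row_to_payload, post_json, upload, fake_send) are hypothetical. The example performs a dry run with a stand-in sender so that no network access is needed; passing post_json instead would send real HTTP requests.

```python
import json
import urllib.request

import pandas as pd

API_URL = "https://example.com/api/reviews"  # hypothetical placeholder endpoint


def row_to_payload(row):
    """Normalization template: map one row (as a dict) to the JSON body we send."""
    return {
        "id": int(row["review_id"]),
        "restaurant": str(row["restaurant"]),
        "rating": int(row["rating"]),
    }


def post_json(url, payload):
    """POST the payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def upload(df, send=post_json, url=API_URL):
    """Send every row of df through `send` and return the list of status codes."""
    return [send(url, row_to_payload(row)) for row in df.to_dict(orient="records")]


# Dry run with a stand-in sender, so nothing leaves the machine.
sent = []


def fake_send(url, payload):
    sent.append(payload)
    return 201  # pretend the server answered "201 Created"


sample = pd.DataFrame({
    "review_id": [1, 2],
    "restaurant": ["Pasta House", "Sushi Go"],
    "rating": [5, 3],
})
statuses = upload(sample, send=fake_send)
print(statuses, sent[0])
```

Sending one request per row is simple but slow for large data frames; batching several rows per request is a common refinement.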

5. Conclusions

In summary, this article serves as a comprehensive guide for developers who want to address Big Data challenges efficiently on affordable laptops.

By using Python, Google Colab, and core libraries like Pandas, NumPy, and Dask, users can successfully manage huge data sets with ease. Performance comparison, data cleansing processes, and simulated data loads underscore the practicality and affordability of these tools, allowing developers to do complicated tasks seamlessly.

Article information

Author: Delena Feil
