Best Practices in Structuring Python Projects | Dagster Blog (2024)

Python is a versatile and widely-adopted programming language used for everything from data analysis to web development. However, as Python projects grow in complexity, it can become challenging to keep track of all the moving parts and ensure that everything stays organized.

That's where having an understanding of Python file and project directory structures comes in. In this article, we'll review some key concepts in structuring Python projects and how to best apply them.

If you're just starting out, these best practices will not only help you write better code, but they can also help you better maintain and scale your Python codebases or data pipelines.

Table of Contents

  • Why break up your projects?
  • 9 Best practices for structuring Python projects
  • Demo: A simple data engineering project
  • Structuring projects for easy collaboration
  • Python project folder structure and key files

Why break up your projects?

As Python programs become larger, they become harder to manage. This is especially true if a team is collaborating on the project and making changes to multiple aspects of the project at once.

In order to make the Python project more manageable and maintainable, we break large, unwieldy programming tasks into more manageable subtasks or modules. Modules, as their name suggests, can then be reused in multiple places.

Breaking a project down into modules is called "modular programming." As discussed in previous articles, functions, modules, and packages are all mechanisms used in modular programming. Modular programming provides many advantages:

  1. It simplifies your work by allowing you to focus on one module at a time.
  2. It makes your project more maintainable. If a team is working together on a project, adopting a modular approach reduces the likelihood your work will end up in version conflicts.
  3. It makes your code more reusable. If your project is one large monolith, anybody looking to reuse it has to parse through a lot of code. If your code is organized in modules, importing just the parts that are needed becomes easier.
  4. It reduces duplication. As per the point above, modular code is more reusable, meaning it is less likely you will end up duplicating functions.
  5. It helps avoid namespace conflicts as each module can define a separate namespace.
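The namespace point can be illustrated with two standard-library modules that both define a function named sqrt. This is a small sketch added for illustration, not part of the original series; the variable names are arbitrary.

```python
import cmath
import math

# Both modules define a function called sqrt, but each one lives in its own
# namespace, so the two names never collide.
real_root = math.sqrt(16)       # real-valued square root
complex_root = cmath.sqrt(-16)  # complex-valued square root

print(real_root)     # 4.0
print(complex_root)  # 4j
```

Because each call is qualified by its module name, a reader can always tell which sqrt is being used.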

9 Best practices for structuring Python projects

When you look for advice online about how to organize and write your Python code, you will find a lot of different ideas. But really, it comes down to a few basic things.

1. Organize your code

Properly organizing your code is crucial when working on a Python project. You can start by creating separate folders for different parts of the project, such as one for the code itself, one for data, one for testing, and one for project documentation. This structure will help you find what you need more quickly and make it easier for others to navigate your code.
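As a rough sketch of that folder layout, the script below creates one folder each for code, data, tests, and documentation. The folder names (my_project, data, tests, docs) and the helper create_project_skeleton are hypothetical choices for illustration, and the skeleton is built in a temporary directory so nothing on disk is affected.

```python
from pathlib import Path
import tempfile


def create_project_skeleton(root: Path) -> list[Path]:
    """Create one folder each for code, data, tests, and docs under root."""
    folders = [root / name for name in ("my_project", "data", "tests", "docs")]
    for folder in folders:
        folder.mkdir(parents=True, exist_ok=True)
    # A README at the top level gives newcomers a starting point.
    (root / "README.md").touch()
    return folders


# Build the skeleton in a throwaway directory so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    created = create_project_skeleton(Path(tmp) / "my-project")
    created_names = sorted(folder.name for folder in created)
    print(created_names)  # ['data', 'docs', 'my_project', 'tests']
```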

2. Use consistent naming

It's important to use consistent names for files and folders throughout your project. Within your code, try to follow the PEP 8 conventions: lowercase words separated by underscores for variables and functions, and capitalized words with no separators (CapWords) for classes. This will make it easier to read and understand your code.
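A short sketch of these conventions in one place; the names CURRENCY_SYMBOL, SalesReport, and format_report are invented for this example.

```python
CURRENCY_SYMBOL = "$"  # constant: uppercase words separated by underscores


class SalesReport:  # class: capitalized words, no separators
    def __init__(self, total_sales: float) -> None:
        self.total_sales = total_sales  # attribute: lowercase with underscores


def format_report(report: SalesReport) -> str:  # function: lowercase with underscores
    return f"Total sales: {CURRENCY_SYMBOL}{report.total_sales:.2f}"


print(format_report(SalesReport(1234.5)))  # Total sales: $1234.50
```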

3. Use version control

Even when working alone, using a tool like Git to keep track of changes to your code is a recommended best practice. It allows you to keep a record of your changes and easily back up your work to a cloud-based repository. Most cloud-based Git solutions offer a free tier for solo practitioners.

4. Use a package manager

Using a package manager like pip to manage dependencies will help you install and keep track of all the different pieces of software your project needs to run. This is especially important when working with large projects with many dependencies.

5. Create virtual environments

To keep your Python project isolated from other projects on your computer, you can use a virtual environment. A virtual environment will prevent conflicts between packages used in different projects.
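Virtual environments are usually created from the command line with python -m venv, but the same standard-library venv module can also be called from Python. The sketch below creates an environment in a temporary directory; the .venv folder name is a common convention rather than a requirement, and with_pip=False is used here only to keep the example quick and offline.

```python
from pathlib import Path
import tempfile
import venv

with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / ".venv"
    # with_pip=False keeps the example fast and offline; a real environment
    # would usually include pip so that dependencies can be installed into it.
    venv.create(env_dir, with_pip=False)
    # Every virtual environment has a pyvenv.cfg file at its root.
    has_config = (env_dir / "pyvenv.cfg").exists()

print(has_config)
```

Packages installed while an environment is active go into that environment's own folder, which is what keeps projects from interfering with each other.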

6. Comment your code

Adding comments to explain what your code is doing and how to use it is important for making your code more accessible to others. It also helps you remember what you were thinking when you wrote the code.
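As an illustration, here is a small function with a docstring that explains how to use it and an inline comment that explains why a check exists. The function average_order_value is a made-up example, not code from the demo project.

```python
def average_order_value(total_sales: float, order_count: int) -> float:
    """Return the mean value of an order.

    Args:
        total_sales: Sum of all order amounts.
        order_count: Number of orders; must be greater than zero.

    Raises:
        ValueError: If order_count is not positive.
    """
    # Guard against division by zero before doing the arithmetic.
    if order_count <= 0:
        raise ValueError("order_count must be greater than zero")
    return total_sales / order_count


print(average_order_value(600, 3))  # 200.0
```

The docstring tells a caller how to use the function, while the comment records a decision that would otherwise have to be rediscovered later.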

7. Test, test, test

Using automated tests to check that your code works as expected is essential for catching bugs early on. This will save you time and prevent issues down the line.

8. Lint and Style

Using a linter like Ruff or Flake8 helps keep your code consistent and catches common mistakes. These tools check your code for consistency with PEP 8, the official Python style guide. You can also use an automatic formatter like Black so that code is formatted the same way across your project, which makes it easier to read and understand.

9. Package your code

Using a tool like setuptools to package and distribute your Python code will make it easier to share your work with others. This will also help ensure that others can use your code without encountering any issues.

Demo: A simple data engineering project

Let’s apply these tips to a hands-on example. We’ll use several best practices, including organized code, consistent naming, helpful comments, and tests.

We’ll write a script to extract data from a database, transform it, and load it into another database. Let’s first define variables and functions in a module named my_module.py:

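```python
### Constants
SOURCE_DB = "source_db"
DESTINATION_DB = "destination_db"
TABLE_NAME = "table_name"
DATE_COLUMN = "date_column"
SALES_COLUMN = "sales_column"


### Functions
def extract_data(table_name):
    # Code to extract data from the source database
    raise NotImplementedError  # placeholder so the module can be imported


def transform_data(data_frame):
    # Code to transform the data
    raise NotImplementedError  # placeholder so the module can be imported


def load_data(data_frame):
    # Code to load the data into the destination database
    raise NotImplementedError  # placeholder so the module can be imported
```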

In this example, we use consistent names for the variables and functions. Constants like SOURCE_DB, DESTINATION_DB, TABLE_NAME, DATE_COLUMN, and SALES_COLUMN are all capitalized and separated by underscores. Functions like extract_data, transform_data, and load_data are all lowercase and separated by underscores.

Next, let’s write a simple unit test in a test.py file to make sure our code works the way we want it to:

```python
import unittest

import pandas as pd

from my_module import transform_data


class TestDataTransformation(unittest.TestCase):
    def test_transform_data(self):
        # Create a sample input DataFrame
        input_data = pd.DataFrame({
            "date_column": ["2022-01-01", "2022-02-01", "2022-03-01"],
            "sales_column": [100, 200, 300]
        })

        # Call the transform_data function
        output_data = transform_data(input_data)

        # Check the output DataFrame
        expected_output = pd.DataFrame({
            "year": [2022, 2022, 2022],
            "month": [1, 2, 3],
            "sales": [100, 200, 300]
        })
        pd.testing.assert_frame_equal(output_data, expected_output)
```

Here, we import the unittest and pandas modules, as well as the transform_data function from my_module.py.

Then, we define a test class called TestDataTransformation that inherits from unittest.TestCase. In this class, we define a single test function called test_transform_data. This function creates a sample input DataFrame with three rows of data, calls the transform_data function with this input data, and then checks the output DataFrame to make sure it has the expected values.

We define the expected output DataFrame with the same three rows of data and use the pd.testing.assert_frame_equal function to check that the output DataFrame matches the expected DataFrame. If the output DataFrame matches the expected DataFrame, the test will pass. If not, the test will fail and provide information on where the mismatch occurred.
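The module shown earlier leaves the body of transform_data unwritten, so the test cannot pass yet. One possible implementation that would satisfy it is sketched below. This is an assumption about what the transformation does, inferred only from the test's input and expected output: it splits the date column into year and month and renames the sales column. The explicit int64 casts are there because assert_frame_equal also compares column dtypes, and the datetime accessors can return 32-bit integers on recent pandas versions.

```python
import pandas as pd

DATE_COLUMN = "date_column"
SALES_COLUMN = "sales_column"


def transform_data(data_frame: pd.DataFrame) -> pd.DataFrame:
    """Split the date column into year and month, and rename the sales column.

    Hypothetical implementation inferred from the unit test above.
    """
    dates = pd.to_datetime(data_frame[DATE_COLUMN])
    return pd.DataFrame(
        {
            # Cast to int64 so the dtypes match a DataFrame built from plain ints.
            "year": dates.dt.year.astype("int64"),
            "month": dates.dt.month.astype("int64"),
            "sales": data_frame[SALES_COLUMN].astype("int64"),
        }
    )


sample = pd.DataFrame(
    {
        "date_column": ["2022-01-01", "2022-02-01", "2022-03-01"],
        "sales_column": [100, 200, 300],
    }
)
print(transform_data(sample))
```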

Structuring projects for easy collaboration

When you code with other developers, the most important things to consider are:

  • making sure you have the correct version of the code and
  • the correct software dependencies.

This is where version control comes in handy. Suppose a team of developers is working on a Python project that involves developing a web application. Each developer is working on a specific feature or module of the project. Without version control, the team would have to coordinate their work manually, which can be daunting and error-prone. One developer may accidentally overwrite another's work, leading to lost progress or conflicts.

However, by using a version control system like Git, the team can easily track changes, collaborate on code, and ensure that everyone is working with the same version of the codebase. With Git, each developer can work on their own branch, which allows them to make changes independently without affecting the main codebase.

When it's time to integrate everyone's changes, Git provides tools for merging code and resolving conflicts, making it much easier for the team to work together. Git also provides a complete history of the codebase, including all changes made by each developer, which can be useful for debugging and understanding how the code has evolved over time.

To ensure that changes to shared code don't break other projects, it's a good idea to use automated testing tools like pytest. It's also a good practice to review code changes before merging them into the main code base.

Finally, as mentioned in our list of best practices, it's important to document your code as you write it. This makes it easier for new team members to understand the code and get up to speed.

Recommended Python Project Structure: folder structure and key files

You might have wondered why Python projects have the structure my-project/my_project. This way of organizing a project is popular in Python because it keeps the two levels of a project distinct. The top folder, called "my-project," is the main folder for the entire project. The inner folder, called "my_project," is the Python package that holds the importable code. The dash and the underscore help differentiate between the two levels, and the underscore is not optional: because Python reads a dash as a minus sign, a name like my-project cannot be used in an import statement. The inner level, which defines a Python package, must therefore use the underscore.
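The naming rule can be checked directly with str.isidentifier and the standard keyword module. The helper is_importable_name below is a hypothetical convenience written for this example.

```python
import keyword


def is_importable_name(name: str) -> bool:
    """Return True if name is a valid Python identifier and not a keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)


print(is_importable_name("my_project"))  # True
print(is_importable_name("my-project"))  # False
```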

There are loose norms for how to organize files inside "my-project". The top level typically contains a README.md file and various configuration files. The most important part of your project is the one or more Python packages it contains. A Python package is a directory constituting a valid target for Python's import system, typically containing an __init__.py file. In this example, there is one Python package named "my_project", which is placed in the top level.
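To make the idea of a package as an import target concrete, the sketch below builds the my-project/my_project layout in a temporary directory, adds an __init__.py, and imports the package. The GREETING constant is invented for the example, and the sys.path manipulation stands in for what happens when Python is run from the project's top-level folder.

```python
import importlib
from pathlib import Path
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    project_root = Path(tmp) / "my-project"
    package_dir = project_root / "my_project"
    package_dir.mkdir(parents=True)
    # The __init__.py file marks the directory as a regular package.
    (package_dir / "__init__.py").write_text('GREETING = "hello from my_project"\n')

    # Make the project root importable, as running Python from that folder would.
    sys.path.insert(0, str(project_root))
    importlib.invalidate_caches()
    try:
        my_project = importlib.import_module("my_project")
        greeting = my_project.GREETING
    finally:
        # Clean up so the temporary package does not linger in this process.
        sys.path.remove(str(project_root))
        sys.modules.pop("my_project", None)

print(greeting)  # hello from my_project
```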


From here, the key files and subfolders found in most projects are:

  • A dependency management file (typically setup.py or pyproject.toml): This file is used to configure your project and its dependencies. We explored dependency management in part two of this blog series.
  • README.md: This file, written in Markdown, provides a brief introduction to your project, its purpose, and any installation or usage instructions.
  • LICENSE: This plain text file specifies the license under which your code is released. There are many open-source licenses to choose from, so be sure to select one that fits your needs.

In addition, many projects will have the following two folders:

  • src/: This directory is an alternative location for Python packages if you don't want to place them directly in the project root.
  • tests/: This directory contains test code that verifies the functionality of your project. You can use a testing framework like pytest or unittest to write and run tests.

Up next…

We’ve shared how organizing Python projects in a structured manner is crucial as the project's complexity grows, especially if you are part of a team or plan on sharing your work with other developers. By following the recommended Python project structure outlined in this article, you can maintain and scale your codebases or data pipelines more easily.

In part 4 of our guide, From Python Projects to Dagster Pipelines, we explore setting up a Dagster project, and the key concept of Data Assets.


We're always happy to hear your feedback, so please reach out to us! If you have any questions, ask them in the Dagster community Slack or start a GitHub discussion. If you run into any bugs, let us know with a GitHub issue. And if you're interested in working with us, check out our open roles!
