Using Mount Points in Databricks: A Practical Guide for Data Engineers (2024)


1. What are Mount Points in Databricks?
2. How Do Mount Points Work?
3. How Can I Mount a Cloud Object Storage on DBFS?
4. How Do I Access My Data Stored in a Cloud Object Storage Using Mount Points?
5. Why and When Do You Need Mount Points?
6. When Should You Use Unity Catalog Instead of Mount Points?
7. Best Practices for Using Mount Points

What are Mount Points in Databricks?

Mount points in Databricks serve as a bridge, linking the Databricks File System (DBFS) to cloud object storage such as Azure Data Lake Storage Gen2 (ADLS Gen2), Amazon S3, or Google Cloud Storage. This setup allows you to interact with your cloud storage using local file paths, as if the data were stored directly on DBFS.

How Do Mount Points Work?

Mounting creates a link between a Databricks workspace and your cloud object storage.

A mount point encapsulates:

  • The location of the cloud object storage.
  • Driver specifications for connecting to the storage account or container.
  • Security credentials for data access.

You can list your existing mount points using the following dbutils command:

# Also lists the Databricks built-in mount points (e.g., volume, databricks-datasets),
# which you can simply ignore
dbutils.fs.mounts()
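If the built-in entries get in the way, the list returned by dbutils.fs.mounts() can be filtered in plain Python. A minimal sketch; the MountInfo namedtuple and the sample list below are stand-ins for the real return value, which is only available inside a Databricks notebook:

```python
from collections import namedtuple

# Stand-in for the mount-info objects dbutils.fs.mounts() returns in a notebook
MountInfo = namedtuple("MountInfo", ["mountPoint", "source", "encryptionType"])

def user_mounts(mounts):
    """Keep only mounts under /mnt/, dropping built-ins like /databricks-datasets."""
    return [m for m in mounts if m.mountPoint.startswith("/mnt/")]

# Simulated output; in a notebook you would pass dbutils.fs.mounts() instead
sample = [
    MountInfo("/databricks-datasets", "databricks-datasets", ""),
    MountInfo("/mnt/raw", "abfss://raw@mystorage.dfs.core.windows.net/", ""),
]
print(user_mounts(sample))
```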

Alternatively, in the Databricks workspace UI, open the Catalog Explorer and click Browse DBFS:


In the tab that opens, simply click “mnt”. You will be asked to choose a cluster; choose or start one, and you will then see all your mount points (if there are any).

How Can I Mount a Cloud Object Storage on DBFS?

For Azure environments, mounting ADLS Gen2 using Azure Active Directory (AAD) OAuth, now renamed Microsoft Entra ID, is a common practice. Here’s how you can do this:

configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope-name>", key="<service-credential-key-name>"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<directory-id>/oauth2/token"
}

dbutils.fs.mount(
    source="abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
    mount_point="/mnt/<mount-name>",
    extra_configs=configs
)

When configuring your mount, it’s important to understand the configs dictionary and your Azure AD setup. Specifically, fs.azure.account.oauth2.client.id should be set to your Service Principal (SP) application ID, which acts as a unique identifier for your application in Azure AD. Similarly, fs.azure.account.oauth2.client.secret requires the secret associated with your SP. These credentials enable secure authentication and authorization, ensuring that only authorized entities can access your cloud object storage. Additionally, make sure you have assigned the appropriate roles and permissions to your Service Principal on the storage account. You can learn more about this process at https://learn.microsoft.com/en-us/azure/databricks/connect/storage/aad-storage-service-principal.

Remember, the configuration above is specific to an Azure ADLS Gen2 storage account; adjustments are necessary for other cloud providers.

To unmount simply:

dbutils.fs.unmount("/mnt/<mount-name>")
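Mounting a path that is already mounted raises an error, so a common defensive pattern is to check first. A hedged sketch: is_mounted is plain Python and can be exercised anywhere, while the commented lines show how it would combine with dbutils inside a notebook (the mount name and configs are placeholders):

```python
from types import SimpleNamespace

def is_mounted(mount_point, mounts):
    """Return True if mount_point already appears in the given mount list."""
    return any(m.mountPoint == mount_point for m in mounts)

# Inside a Databricks notebook (placeholders as in the mount example above):
# if not is_mounted("/mnt/<mount-name>", dbutils.fs.mounts()):
#     dbutils.fs.mount(
#         source="abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
#         mount_point="/mnt/<mount-name>",
#         extra_configs=configs,
#     )

# Stand-in data to exercise the check locally
existing = [SimpleNamespace(mountPoint="/mnt/raw")]
print(is_mounted("/mnt/raw", existing))
```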

How Do I Access My Data Stored in a Cloud Object Storage Using Mount Points?

Once mounted, accessing your data (e.g., a Delta table) is as straightforward as referencing the mount point in your data operations:

# Using Spark, read a Delta table by its path
df = spark.read.format("delta").load("/mnt/my_mount_point/my_data")

# Using Spark, write back to the mount point
df.write.format("delta").mode("overwrite").save("/mnt/my_mount_point/delta_table")
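Since mounted data is addressed with ordinary POSIX-style paths, path construction can stay in the Python standard library rather than string concatenation. A small sketch with hypothetical mount and table names:

```python
import posixpath

def table_path(mount_name, *parts):
    """Build a DBFS path under /mnt/<mount_name> from path components."""
    return posixpath.join("/mnt", mount_name, *parts)

print(table_path("my_mount_point", "delta_table"))
```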

Why and When Do You Need Mount Points?

Using mount points was the general practice for accessing cloud object storage before Unity Catalog was introduced. You may still need them when:

  • You want to access your cloud object storage as if it were on DBFS.
  • Unity Catalog is not enabled in your workspace.
  • Your cluster runs a Databricks Runtime (DBR) version older than 11.3 LTS.
  • You have no access to a premium workspace plan (i.e., you are on the Standard plan).
  • Note: if you want to avoid mount points and still cannot use Unity Catalog (UC), you can set your Service Principal (SP) credentials in the Spark configuration and access the ADLS Gen2 containers that way as well.
When Should You Use Unity Catalog Instead of Mount Points?

  • The above conditions don’t apply to you.
  • You can use a cluster with a later DBR version (>= 11.3 LTS) and have access to a premium plan.
  • Mounted data doesn’t work with Unity Catalog.
    - However, you can still see your tables and their referenced mount point paths in the old hive_metastore catalog if you migrated to UC.
Best Practices for Using Mount Points

  • When performing mount operations, manage your secrets using secret scopes and never expose raw secrets.
  • Keep your mount points up to date:
    - If a source no longer exists in the storage account, remove its mount point from Databricks as well.
  • Using the same mount point name as your container name makes things easier when you have many mount points; especially if you come back to your workspace after some time, you can easily match them in Azure Storage Explorer.
  • Don’t put non-mount-point folders or other files in the /mnt/ directory; they will only cause confusion.
  • If your SP credentials are updated, you might have to remount all your mount points:
    - You can loop through the mount points if they all still point to existing sources.
    - Otherwise, you will get AAD exceptions and have to unmount and remount each mount point manually.
  • If you can, use Unity Catalog (UC) instead of mount points for better data governance, centralized metadata management, fine-grained security controls, and a unified data catalog across different Databricks workspaces.
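The remount loop mentioned above can be sketched with the unmount and mount calls injected as callables, so the control flow can be exercised outside a notebook; inside Databricks you would pass dbutils.fs.unmount and a wrapper around dbutils.fs.mount (the stand-in mount data below is hypothetical):

```python
from types import SimpleNamespace

def remount_all(mounts, unmount, mount):
    """Unmount and remount every /mnt/ entry, e.g. after rotating SP credentials."""
    remounted = []
    for m in mounts:
        if not m.mountPoint.startswith("/mnt/"):
            continue  # skip built-in mounts like /databricks-datasets
        unmount(m.mountPoint)
        mount(m.source, m.mountPoint)
        remounted.append(m.mountPoint)
    return remounted

# Dry run with stand-ins to show which entries are touched
fake_mounts = [
    SimpleNamespace(mountPoint="/mnt/raw", source="abfss://raw@acct.dfs.core.windows.net/"),
    SimpleNamespace(mountPoint="/databricks-datasets", source="databricks-datasets"),
]
log = []
remount_all(fake_mounts,
            unmount=lambda p: log.append(("unmount", p)),
            mount=lambda s, p: log.append(("mount", p)))
print(log)
```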

Article information

Author: Greg Kuvalis
