Azure Databricks Level-400 Workshop - Agenda
Target Audience and Scenarios
The Azure Databricks Level-400 Workshop aims to upskill the following audiences:
- Data Engineers
- Data Scientists
- SQL Developers
- Developers
- Solution Architects
- Data Architects
The workshop content is useful in several scenarios:
- POCs / AI Hacks - Developers can learn how to connect to Blob Storage, submit jobs, persist and load ML models, etc. This material can help expedite development in the early POC stages.
- Self-learning through code samples
- Best practices for Databricks clusters (Interactive, Job, High-Concurrency)
- Best practices for ADLA to Databricks Migration
Spark and Azure Databricks overview
A brief introduction to the Spark framework and the history of Big Data technologies, and why Spark has been widely adopted across industries. An overview of the Spark modules, including Spark Core (Map-Reduce), data structures, Streaming, SQL, and GraphX. An introduction to Databricks and the key differentiators of Databricks Spark in terms of performance and its collaborative and interactive features. The benefits of Azure Databricks, including its deep integration with the Azure data platform, security, and BI services.
Azure Databricks Architecture
A detailed discussion of the Spark architecture, followed by the Azure Databricks components.
Databricks Workspace, Developer Tools overview
An overview of the Azure Databricks collaborative workspace and its components, followed by a discussion of the Azure Databricks developer tools:
- Databricks CLI
- Filesystem utilities
- Notebook workflow utilities
- Widget, Secret, Library utilities
Azure Databricks CLI Lab
Azure Databricks - Developer Tools
Azure Databricks lab for DBUtils utilities such as Widgets, Notebooks, and Library
Reading data from Azure Blob Storage in Databricks jobs
Azure Databricks - Azure Blob Storage
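As a rough illustration of what connecting to Blob Storage involves, the sketch below builds the two strings a Spark job usually needs: the configuration key that holds the storage account access key, and the `wasbs://` URL of the data. The account, container, and file names are hypothetical placeholders, and the helper functions are not part of any Databricks API. The Spark calls themselves are shown only as comments because `spark` and `dbutils` exist only inside a Databricks notebook or job.

```python
def blob_account_key_conf(storage_account: str) -> str:
    """Spark configuration key that holds the access key for a Blob Storage account."""
    return f"fs.azure.account.key.{storage_account}.blob.core.windows.net"


def wasbs_url(container: str, storage_account: str, path: str = "") -> str:
    """Build a wasbs:// URL for a path inside a Blob Storage container."""
    return f"wasbs://{container}@{storage_account}.blob.core.windows.net/{path.lstrip('/')}"


# Hypothetical names, for illustration only.
conf_key = blob_account_key_conf("mystorageaccount")
source = wasbs_url("workshop-data", "mystorageaccount", "/raw/events.csv")
print(conf_key)
print(source)

# Inside a Databricks notebook, where `spark` and `dbutils` are predefined,
# these values would typically be used along these lines:
#   spark.conf.set(conf_key, dbutils.secrets.get(scope="<scope>", key="<key>"))
#   df = spark.read.option("header", "true").csv(source)
```

Reading the access key from a secret scope rather than pasting it into the notebook keeps the credential out of source control.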
Reading data from Azure Data Lake Storage Gen2 in Databricks jobs
Azure Databricks - Azure Data Lake Storage Gen2
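One common way to reach Data Lake Storage Gen2 from Spark is the ABFS driver with a service principal (OAuth client credentials). The sketch below assembles the Hadoop configuration entries and the `abfss://` URL for that setup. It is an assumption about how the lab might be configured, not the lab's actual code; the account, container, and credential values are hypothetical placeholders.

```python
from typing import Dict


def abfss_url(container: str, storage_account: str, path: str = "") -> str:
    """Build an abfss:// URL for a path inside an ADLS Gen2 container (file system)."""
    return f"abfss://{container}@{storage_account}.dfs.core.windows.net/{path.lstrip('/')}"


def adls_oauth_conf(storage_account: str, client_id: str,
                    client_secret: str, tenant_id: str) -> Dict[str, str]:
    """Spark configuration entries for service-principal access to one storage account."""
    suffix = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{suffix}": client_id,
        f"fs.azure.account.oauth2.client.secret.{suffix}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


# Hypothetical values, for illustration only.
adls_conf = adls_oauth_conf("mydatalake", "<client-id>", "<client-secret>", "<tenant-id>")
adls_source = abfss_url("workshop", "mydatalake", "curated/sales.parquet")
print(adls_source)

# Inside a Databricks notebook, where `spark` is predefined:
#   for key, value in adls_conf.items():
#       spark.conf.set(key, value)
#   df = spark.read.parquet(adls_source)
```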
Reading data from Azure Cosmos DB in Databricks jobs
Databricks Cluster Types and Best Practices
Azure Databricks offers several cluster types, such as Interactive Clusters, Job Clusters, and High-Concurrency Clusters (formerly known as Serverless pools). This section covers selecting the right cluster type for a given scenario.
Submit Databricks jobs using the CLI and the UI
Azure Databricks - Job Submission Lab 1
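To give a feel for what a CLI submission looks like, the sketch below builds a job definition in the shape used by the Databricks Jobs API 2.0 (a notebook task that runs on a new job cluster) and serializes it to JSON. The job name, notebook path, runtime version, and node type are hypothetical placeholders, not values taken from the lab.

```python
import json


def notebook_job_spec(name: str, notebook_path: str, spark_version: str,
                      node_type_id: str, num_workers: int) -> dict:
    """Job definition that runs one notebook on a new job cluster."""
    return {
        "name": name,
        "new_cluster": {
            "spark_version": spark_version,
            "node_type_id": node_type_id,
            "num_workers": num_workers,
        },
        "notebook_task": {"notebook_path": notebook_path},
    }


# Hypothetical values, for illustration only.
job_spec = notebook_job_spec(
    name="workshop-batch-job",
    notebook_path="/Shared/workshop/batch_job",
    spark_version="<runtime-version>",
    node_type_id="Standard_DS3_v2",
    num_workers=2,
)
job_json = json.dumps(job_spec, indent=2)
print(job_json)

# Assuming the legacy Databricks CLI is installed and configured, the JSON
# could be saved as job.json and submitted with:
#   databricks jobs create --json-file job.json
#   databricks jobs run-now --job-id <job-id>
```

Using `new_cluster` means the job runs on a job cluster that is created for the run and terminated afterwards, which matches the Job Cluster type discussed in the cluster types section.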
Create a workflow pipeline in Azure Data Factory V2 and submit it to Azure Databricks
Azure Databricks - Azure Data Factory V2 Job Pipeline Submission
Databricks Performance
In this section, we will discuss the performance improvements made by Azure Databricks.
Spark-SQL Overview
In this section, we will discuss ways to work with structured data within Azure Databricks. We will learn the nuances of managed and unmanaged tables and how to integrate external metastores such as Hive.
Create a managed table and work with Spark SQL
Azure Databricks - Managed Tables
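The difference between managed and unmanaged tables shows up in the `CREATE TABLE` statement: an unmanaged (external) table names an explicit `LOCATION`, while a managed table leaves storage to the metastore. The helper below is a hypothetical illustration that generates both forms of Spark SQL; the table names, columns, and storage path are made up.

```python
from typing import Dict, Optional


def create_table_sql(name: str, columns: Dict[str, str],
                     location: Optional[str] = None, fmt: str = "PARQUET") -> str:
    """Return a Spark SQL CREATE TABLE statement.

    Without `location` the table is managed: dropping it also deletes the data.
    With `location` the table is unmanaged: dropping it removes only the metadata.
    """
    cols = ", ".join(f"{col} {dtype}" for col, dtype in columns.items())
    stmt = f"CREATE TABLE IF NOT EXISTS {name} ({cols}) USING {fmt}"
    if location is not None:
        stmt += f" LOCATION '{location}'"
    return stmt


schema = {"id": "BIGINT", "score": "DOUBLE"}
managed_sql = create_table_sql("events_managed", schema)
unmanaged_sql = create_table_sql("events_external", schema,
                                 location="/mnt/workshop/events")
print(managed_sql)
print(unmanaged_sql)

# Inside a Databricks notebook, where `spark` is predefined:
#   spark.sql(managed_sql)
#   spark.sql(unmanaged_sql)
```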
Machine Learning with Azure Databricks
An overview of the Spark MLlib package and an introduction to statistical modeling. You will also learn how to run deep learning models using TensorFlow on Azure Databricks.
Spark MLlib for anomaly detection using the Random Forest classification technique
Azure Databricks - Anomaly Detection
Implement batch predictions within Azure Databricks. You will also learn how to persist a model to Blob Storage and load it back within your Spark jobs.
Azure Databricks - Batch Predictions
Documentation
Azure Databricks Documentation
| Link | Description |
| --- | --- |
| Azure Databricks - Microsoft | Azure Databricks Microsoft Documentation |
| Databricks Official Documentation | Azure Databricks official documentation from Databricks |
| Azure Databricks Sample Labs | Sample Labs in GitHub repository from Mahesh Balija |