Azure Databricks Level-400 Workshop - Agenda

Targeted Audience and Scenarios

The Azure Databricks Level-400 Workshop aims to upskill the following audiences:

  • Data Engineers
  • Data Scientists
  • SQL Developers
  • Developers
  • Solution Architects
  • Data Architects

The workshop content is useful in the following scenarios:

  • POCs / AI Hacks - Developers learn how to connect to Blob Storage, submit jobs, and persist and load ML models. This material can speed up development in the early stages of a POC.
  • Self-learning through code samples
  • Best practices for Databricks clusters (Interactive, Job, High-Concurrency)
  • Best practices for migrating from Azure Data Lake Analytics (ADLA) to Databricks

Spark and Azure Databricks Overview

A brief introduction to the Spark framework and the history of Big Data technologies, and why Spark has been widely adopted across industries. An overview of the Spark modules, including Spark Core (Map-Reduce), data structures, Streaming, SQL, and GraphX. An introduction to Databricks and the key differentiators of Databricks Spark in terms of performance and its collaborative and interactive features. The benefits of Azure Databricks: its deep integration with the Azure data platform, security, and BI services.

Azure Databricks Architecture

A detailed discussion of the Spark architecture, followed by the Azure Databricks components.

Databricks Workspace and Developer Tools Overview

An overview of the Azure Databricks collaborative workspace and its components, followed by a discussion of the Azure Databricks developer tools:

  • Databricks CLI
  • Filesystem utilities
  • Notebook workflow utilities
  • Widget, Secret, Library utilities

Azure Databricks CLI Lab

Azure Databricks - Developer Tools
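The Databricks CLI reads its connection settings (the workspace URL and a personal access token) from a profile in `~/.databrickscfg`, an INI-style file. The legacy CLI's `databricks configure --token` command writes this file interactively; the minimal sketch below does the same with Python's standard `configparser`, assuming token authentication with the `host` and `token` keys. The helper names, profile name, workspace URL, and token are hypothetical placeholders.

```python
import configparser
import tempfile
from pathlib import Path


def write_cli_profile(cfg_path: Path, profile: str, host: str, token: str) -> None:
    """Add or update one profile in a .databrickscfg-style INI file (hypothetical helper)."""
    # interpolation=None so tokens containing '%' are stored verbatim
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read(cfg_path)  # a missing file is silently ignored
    if profile != configparser.DEFAULTSECT and not cfg.has_section(profile):
        cfg.add_section(profile)
    cfg[profile]["host"] = host
    cfg[profile]["token"] = token
    with open(cfg_path, "w") as fh:
        cfg.write(fh)


def read_cli_profile(cfg_path: Path, profile: str) -> dict:
    """Return the host and token stored under one profile (hypothetical helper)."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read(cfg_path)
    return {"host": cfg[profile]["host"], "token": cfg[profile]["token"]}


if __name__ == "__main__":
    # Write to a throwaway file instead of the real ~/.databrickscfg
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / ".databrickscfg"
        write_cli_profile(path, "workshop",
                          "https://adb-1234567890123456.7.azuredatabricks.net",
                          "dapi-PLACEHOLDER")
        print(path.read_text())
```

Named profiles let one machine target several workspaces; the CLI selects a non-default one through its `--profile` option.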

Azure Databricks lab for DBUtils, covering utilities such as widgets, notebooks, and libraries

Azure Databricks - DB Utils

Reading data from Azure Blob Storage in Databricks jobs

Azure Databricks - Azure Blob Storage
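Reading from Blob Storage with an account access key comes down to two strings: a Spark configuration key that carries the key for the storage account, and a `wasbs://` URI for the data. A minimal sketch that builds both, assuming access-key authentication; the helper names, the account `workshopsa`, the container `raw`, and the secret scope `workshop` are hypothetical.

```python
def blob_account_key_conf(account: str) -> str:
    """Spark/Hadoop config key holding the access key of a Blob Storage account."""
    return f"fs.azure.account.key.{account}.blob.core.windows.net"


def wasbs_uri(container: str, account: str, path: str = "") -> str:
    """Build a wasbs:// (TLS) URI pointing at a path inside a blob container."""
    return f"wasbs://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


# In a Databricks notebook (spark and dbutils are predefined there), with a
# hypothetical secret scope "workshop" holding the key under "blob-key":
#
#   spark.conf.set(blob_account_key_conf("workshopsa"),
#                  dbutils.secrets.get(scope="workshop", key="blob-key"))
#   df = (spark.read.option("header", "true")
#                   .csv(wasbs_uri("raw", "workshopsa", "sales/")))
```

Pulling the key from a secret scope rather than pasting it into the notebook keeps it out of notebook revisions and job logs.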

Reading data from Azure Data Lake Storage Gen2 in Databricks jobs

Azure Databricks - Azure Data Lake Storage Gen2
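ADLS Gen2 is read through the ABFS driver with `abfss://` URIs, and a common pattern is to authenticate with an Azure AD service principal through OAuth 2.0 client credentials. A minimal sketch of the per-account Spark settings for that pattern, assuming a service principal that already has access to the storage account; the helper names, account, container, secret scope, and `<app-id>`/`<tenant-id>` values are hypothetical placeholders.

```python
def adls_oauth_conf(account: str, client_id: str, client_secret: str,
                    tenant_id: str) -> dict:
    """Spark configs for service-principal (OAuth 2.0 client credentials)
    access to one ADLS Gen2 account through the ABFS driver."""
    suffix = f"{account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{suffix}": client_id,
        f"fs.azure.account.oauth2.client.secret.{suffix}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


def abfss_uri(container: str, account: str, path: str = "") -> str:
    """Build an abfss:// (TLS) URI for a path in an ADLS Gen2 filesystem."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


# In a Databricks notebook (placeholder names; keep the secret in a secret scope):
#
#   for key, value in adls_oauth_conf(
#           "workshopadls", "<app-id>",
#           dbutils.secrets.get(scope="workshop", key="sp-secret"),
#           "<tenant-id>").items():
#       spark.conf.set(key, value)
#   df = spark.read.parquet(abfss_uri("curated", "workshopadls", "sales/"))
```

Scoping every key to the account suffix lets one session talk to several storage accounts with different credentials.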

Reading data from Azure Cosmos DB in Databricks jobs

Azure Databricks - Cosmos DB
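Cosmos DB is read through a Spark connector library installed on the cluster. The sketch below assumes the newer Azure Cosmos DB Spark 3 OLTP connector (format `cosmos.oltp`); the lab itself may use an earlier connector with different option names. It shows the minimal read options with light validation. The helper name, database `retail`, container `orders`, and secret names are hypothetical.

```python
def cosmos_read_options(endpoint: str, key: str, database: str,
                        container: str) -> dict:
    """Minimal read options for the Azure Cosmos DB Spark 3 OLTP connector."""
    if not endpoint.startswith("https://"):
        raise ValueError("Cosmos DB account endpoints use https://")
    for name, value in {"key": key, "database": database,
                        "container": container}.items():
        if not value:
            raise ValueError(f"{name} must not be empty")
    return {
        "spark.cosmos.accountEndpoint": endpoint,
        "spark.cosmos.accountKey": key,
        "spark.cosmos.database": database,
        "spark.cosmos.container": container,
    }


# In a Databricks notebook with the connector library attached to the cluster:
#
#   opts = cosmos_read_options(
#       "https://<account>.documents.azure.com:443/",
#       dbutils.secrets.get(scope="workshop", key="cosmos-key"),
#       "retail", "orders")
#   df = spark.read.format("cosmos.oltp").options(**opts).load()
```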

Databricks Cluster Types and Best Practices

Azure Databricks offers several cluster types: Interactive clusters, Job clusters, and High-Concurrency clusters (formerly known as Serverless Pools). This section discusses how to select the right cluster type for each scenario.
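Interactive and High-Concurrency clusters are both long-running, all-purpose clusters that can be created through the Clusters API (`POST /api/2.0/clusters/create`); Job clusters are instead declared inline as `new_cluster` in a job definition and terminate when the job finishes. A minimal sketch of the create request body, assuming the legacy way of marking a cluster as High-Concurrency (the `serverless` cluster profile and `ResourceClass` tag, hence the old "Serverless Pools" name). The runtime version, VM size, worker counts, and helper name are placeholder assumptions; newer workspaces express this choice through cluster access modes instead.

```python
import json


def cluster_spec(name: str, high_concurrency: bool = False,
                 spark_version: str = "13.3.x-scala2.12",
                 node_type_id: str = "Standard_DS3_v2",
                 min_workers: int = 2, max_workers: int = 8) -> dict:
    """Request body for POST /api/2.0/clusters/create (hypothetical helper)."""
    spec = {
        "cluster_name": name,
        "spark_version": spark_version,   # placeholder runtime version
        "node_type_id": node_type_id,     # placeholder Azure VM size
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        "autotermination_minutes": 60,    # stop idle clusters to save cost
    }
    if high_concurrency:
        # Legacy High-Concurrency mode: "serverless" profile plus matching tag
        spec["spark_conf"] = {
            "spark.databricks.cluster.profile": "serverless",
            "spark.databricks.repl.allowedLanguages": "sql,python,r",
        }
        spec["custom_tags"] = {"ResourceClass": "Serverless"}
    return spec


if __name__ == "__main__":
    print(json.dumps(cluster_spec("bi-shared", high_concurrency=True), indent=2))
```

Roughly: a standard interactive cluster suits one team's ad-hoc development, a High-Concurrency cluster suits many users sharing one cluster (for example BI and SQL workloads), and a Job cluster suits scheduled production jobs that should not compete with interactive work.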

Submitting Databricks jobs using the CLI and the UI

Azure Databricks - Job Submission Lab 1
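Under both the CLI and the UI, a one-off run is a JSON request to the Jobs API. A minimal sketch of the body for `POST /api/2.1/jobs/runs/submit` with a single notebook task, assuming Jobs API 2.1; the helper name, task key, notebook path, and parameters are hypothetical. It also makes the cluster-type choice explicit: reuse an existing interactive cluster, or request a fresh job cluster.

```python
import json
from typing import Optional


def notebook_run_payload(run_name: str, notebook_path: str,
                         params: Optional[dict] = None,
                         existing_cluster_id: Optional[str] = None,
                         new_cluster: Optional[dict] = None) -> dict:
    """Body for POST /api/2.1/jobs/runs/submit running one notebook task.

    Pass existing_cluster_id to reuse an interactive cluster, or new_cluster
    to have Databricks create a job cluster that is torn down after the run.
    """
    if (existing_cluster_id is None) == (new_cluster is None):
        raise ValueError("pass exactly one of existing_cluster_id or new_cluster")
    task = {
        "task_key": "main",  # hypothetical task name
        # base_parameters values are strings; the notebook reads them as widgets
        "notebook_task": {
            "notebook_path": notebook_path,
            "base_parameters": {k: str(v) for k, v in (params or {}).items()},
        },
    }
    if existing_cluster_id is not None:
        task["existing_cluster_id"] = existing_cluster_id
    else:
        task["new_cluster"] = new_cluster
    return {"run_name": run_name, "tasks": [task]}


if __name__ == "__main__":
    body = notebook_run_payload("nightly-etl", "/Shared/etl", {"runDate": "2019-01-01"},
                                new_cluster={"spark_version": "13.3.x-scala2.12",
                                             "node_type_id": "Standard_DS3_v2",
                                             "num_workers": 2})
    print(json.dumps(body, indent=2))
```

The printed JSON can be posted to the workspace URL plus `/api/2.1/jobs/runs/submit` with an `Authorization: Bearer <token>` header.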

Creating a workflow pipeline in Azure Data Factory V2 that submits jobs to Azure Databricks

Azure Databricks - Azure Data Factory V2 Job Pipeline Submission
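In Data Factory V2, a Databricks step is a pipeline activity of type `DatabricksNotebook` that points at an Azure Databricks linked service and a notebook path. A minimal sketch that builds the pipeline JSON, assuming the linked service already exists; the helper names, pipeline, activity, linked-service names, and notebook path are hypothetical.

```python
import json
from typing import Optional


def adf_notebook_activity(name: str, linked_service: str, notebook_path: str,
                          base_parameters: Optional[dict] = None) -> dict:
    """ADF V2 'Databricks Notebook' activity that runs one notebook."""
    return {
        "name": name,
        "type": "DatabricksNotebook",
        "linkedServiceName": {
            "referenceName": linked_service,  # an Azure Databricks linked service
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "notebookPath": notebook_path,
            "baseParameters": dict(base_parameters or {}),
        },
    }


def adf_pipeline(name: str, activities: list) -> dict:
    """Wrap activities in the top-level ADF pipeline JSON shape."""
    return {"name": name, "properties": {"activities": list(activities)}}


if __name__ == "__main__":
    activity = adf_notebook_activity(
        "RunEtlNotebook", "AzureDatabricksLinkedService", "/Shared/etl",
        # ADF expression, resolved by Data Factory at run time
        {"runDate": "@pipeline().parameters.runDate"})
    print(json.dumps(adf_pipeline("DatabricksEtlPipeline", [activity]), indent=2))
```

The `baseParameters` reach the notebook as widget values, so the same notebook can run unchanged from the UI, the Jobs API, or a Data Factory trigger.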

Databricks Performance

In this section, we discuss the performance improvements offered by Azure Databricks.

Spark-SQL Overview

In this section, we discuss ways to work with structured data in Azure Databricks. We cover the nuances of managed and unmanaged tables, and how to integrate external metastores such as Hive.

Create a managed table and work with Spark SQL

Azure Databricks - Managed Tables
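The managed/unmanaged distinction shows up in a single clause of the `CREATE TABLE` statement: without `LOCATION`, Databricks owns the storage and `DROP TABLE` deletes the data as well; with `LOCATION`, the table is unmanaged (external) and `DROP TABLE` removes only the metastore entry. A minimal sketch that generates both forms as Spark SQL strings, assuming Delta as the default format; the helper name, table names, and storage path are hypothetical.

```python
from typing import List, Optional, Tuple


def create_table_ddl(table: str, columns: List[Tuple[str, str]],
                     fmt: str = "DELTA", location: Optional[str] = None) -> str:
    """Build a Spark SQL CREATE TABLE statement (hypothetical helper).

    No location -> managed table: Databricks picks the storage path and
    DROP TABLE deletes metadata and data.
    With location -> unmanaged (external) table: DROP TABLE removes only
    the metadata; the files at the location are left in place.
    """
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    ddl = f"CREATE TABLE IF NOT EXISTS {table} ({cols}) USING {fmt}"
    if location is not None:
        ddl += f" LOCATION '{location}'"
    return ddl


# In a Databricks notebook:
#   spark.sql(create_table_ddl("sales_managed", [("id", "INT"), ("amount", "DOUBLE")]))
#   spark.sql(create_table_ddl("sales_external", [("id", "INT"), ("amount", "DOUBLE")],
#                              fmt="PARQUET",
#                              location="abfss://raw@workshopadls.dfs.core.windows.net/sales"))
```

`DESCRIBE TABLE EXTENDED <table>` reports whether a table is `MANAGED` or `EXTERNAL`, which is a quick way to confirm the result.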

Machine Learning with Azure Databricks

An overview of the Spark MLlib package and an introduction to statistical modeling. You will also learn how to run deep learning models using TensorFlow on Azure Databricks.

Using Spark MLlib for anomaly detection with the Random Forest classification technique

Azure Databricks - Anomaly Detection

Implementing batch predictions in Azure Databricks. You will also learn how to persist a model to Blob Storage and load it back in your Spark jobs.

Azure Databricks - Batch Predictions

Documentation

Azure Databricks Documentation

  • Azure Databricks - Microsoft: Azure Databricks documentation from Microsoft
  • Databricks Official Documentation: Azure Databricks official documentation from Databricks
  • Azure Databricks Sample Labs: sample labs in a GitHub repository from Mahesh Balija
