This workflow demonstrates KNIME's capability to connect to Databricks Unity Volumes, part of the Unity Catalog framework, and to read files from and write files to those volumes.
The use case presented here involves writing monthly Excel files containing daily weather information for different locations as Parquet files into a Databricks Unity Volume. The data is then read back, and a simple linear regression model is applied in Spark.
For more information about Databricks Unity Catalog and Databricks Unity Volumes, please refer to the "External resources" links.
You can download the workflow and run it on your local machine; for best results, use the latest version of KNIME Analytics Platform.
Workflow Requirements
To run the workflow locally, you will need:
A Databricks account
An existing Databricks cluster
Workflow Details
Connecting to Databricks Unity Volume
First, we connect to the Databricks Unity Volume where we want to read and write files, using the Databricks Unity File System Connector node.
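For readers who want to see the equivalent connection expressed in code, here is a minimal sketch using the Databricks SDK for Python instead of the KNIME connector node; the workspace URL, access token, and the /Volumes/main/weather/raw path are illustrative placeholders, not values from the workflow.

```python
# Minimal sketch (not part of the KNIME workflow): reaching a Unity Volume
# with the Databricks SDK for Python. Host, token, and the volume path are
# placeholders.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient(
    host="https://<your-workspace>.cloud.databricks.com",  # placeholder
    token="<personal-access-token>",                        # placeholder
)

# Unity Volume paths follow the pattern /Volumes/<catalog>/<schema>/<volume>/...
volume_dir = "/Volumes/main/weather/raw"

# List the files currently stored in the volume directory.
for entry in w.files.list_directory_contents(volume_dir):
    print(entry.path)
```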
Writing Data to Unity Volume
The use case takes thirty generated Excel files containing synthetic weather information for 1000 locations and writes them into the Databricks Unity Volume as Parquet files.
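The sketch below shows one way this Excel-to-Parquet conversion and upload could look in plain Python, assuming pandas and the Databricks SDK are available; the local folder name, file layout, and volume path are assumptions made for illustration.

```python
# Hedged sketch of the conversion step: read each generated Excel file with
# pandas, convert it to Parquet in memory, and upload it to the Unity Volume.
# The local folder, volume path, and file naming are illustrative assumptions.
import io
import pathlib

import pandas as pd
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # host/token taken from the environment or config file

volume_dir = "/Volumes/main/weather/raw"            # assumed volume path
local_dir = pathlib.Path("generated_excel_files")   # assumed local folder

for xlsx in sorted(local_dir.glob("*.xlsx")):
    df = pd.read_excel(xlsx)                         # daily weather per location
    buf = io.BytesIO()
    df.to_parquet(buf, index=False)                  # needs pyarrow or fastparquet
    buf.seek(0)
    w.files.upload(f"{volume_dir}/{xlsx.stem}.parquet", buf, overwrite=True)
```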
Creating a Spark Context
We create a Spark context using the Create Databricks Environment node and read the previously written Parquet files with the Parquet to Spark node, which creates a DataFrame in Spark.
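In code terms, this step corresponds roughly to reading the Parquet files from the volume into a Spark DataFrame, as in the hedged PySpark sketch below; the volume path is again a placeholder.

```python
# Rough PySpark equivalent of this step; on a Databricks cluster the Spark
# session already exists, and the volume path below is a placeholder.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

weather = spark.read.parquet("/Volumes/main/weather/raw/")  # assumed volume path
weather.printSchema()
```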
Data Manipulation and Modeling
We manipulate the data in the Spark context using the KNIME Extension for Apache Spark nodes. These steps include filtering out missing values, splitting and normalizing the DataFrame, and applying a linear regression model.
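A rough PySpark equivalent of these preparation and modeling steps is sketched below, continuing from the weather DataFrame of the previous sketch; the column names (temperature, humidity, pressure, wind_speed, rainfall) are assumptions, since the workflow's actual schema is not shown here.

```python
# Hedged PySpark sketch of the preparation and modeling steps. The column
# names are assumptions; the actual workflow schema may differ.
from pyspark.ml import Pipeline
from pyspark.ml.feature import StandardScaler, VectorAssembler
from pyspark.ml.regression import LinearRegression

feature_cols = ["temperature", "humidity", "pressure", "wind_speed"]  # assumed

clean = weather.dropna(subset=feature_cols + ["rainfall"])  # filter missing values
train, test = clean.randomSplit([0.8, 0.2], seed=42)        # split the DataFrame

assembler = VectorAssembler(inputCols=feature_cols, outputCol="features_raw")
scaler = StandardScaler(inputCol="features_raw", outputCol="features")  # normalize
lr = LinearRegression(featuresCol="features", labelCol="rainfall")

model = Pipeline(stages=[assembler, scaler, lr]).fit(train)
predictions = model.transform(test)
```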
Model Evaluation
Finally, we use the Spark Numeric Scorer node to assess how well the linear regression predicts rainfall from the selected features, and then shut down the Spark context.
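For comparison, a similar evaluation could be expressed in PySpark with a RegressionEvaluator, followed by an explicit shutdown of the Spark session; this sketch reuses the hypothetical predictions DataFrame from the previous snippet.

```python
# Sketch of an equivalent evaluation in PySpark, continuing from the
# hypothetical `predictions` DataFrame above, then shutting down Spark.
from pyspark.ml.evaluation import RegressionEvaluator

evaluator = RegressionEvaluator(labelCol="rainfall", predictionCol="prediction")
rmse = evaluator.evaluate(predictions, {evaluator.metricName: "rmse"})
r2 = evaluator.evaluate(predictions, {evaluator.metricName: "r2"})
print(f"RMSE: {rmse:.3f}  R^2: {r2:.3f}")

spark.stop()  # shut down the Spark context when the analysis is finished
```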