The Databricks Workspace Connector node allows you to connect to a Databricks workspace.
Collection
KNIME for Databricks Users
Use an intuitive, low-code/no-code interface to integrate and manipulate Databricks data, combine it with other sources, build visual workflows of any complexity level from data manipulation to machine learning models, and share insights with others in the organization through data apps without leaving Databricks. You can accomplish various tasks with a no-code KNIME approach, but if needed, you will have the flexibility to script custom Spark jobs that execute directly on Databricks clusters.
KNIME Databricks Extension Guide
This guide describes working with Databricks within the KNIME Analytics Platform.
Getting Started with Databricks Workflows
Utilizing Databricks within KNIME requires connecting to a Databricks workspace using the Create Databricks Environment node. If you only want to work with the file system, you can also use the Databricks File System Connector. Once connected, you can use all KNIME DB, file handling, and Spark nodes to build queries and analytics visually.
Workflow
Working with Databricks
Connects to a Databricks cluster and uses DB, file handling, and Spark nodes for data processing.
Examples/10_Big_Data/01_Big_Data_Connectors/03_DatabricksExample
Workflow
Connecting to Databricks
Interacts with a Databricks cluster using KNIME nodes.
Examples/10_Big_Data/01_Big_Data_Connectors/06_Connecting_to_Databricks
Data Manipulation with KNIME and Databricks
KNIME makes it easy to get your data into the right shape within Databricks. With the KNIME DB extension, you can visually build database queries whose execution is pushed down to a Databricks SQL warehouse or cluster. With the KNIME Spark extension, you can visually compose Spark operations that execute on the Databricks cluster. Combining these two options with KNIME's local execution lets you decide where each part of your workflow runs.
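To make the pushdown idea concrete, here is a rough PySpark sketch of what a simple Spark node chain (a row filter followed by a grouped aggregation) amounts to on the cluster. In KNIME you configure these steps visually; the table and column names below are hypothetical.

    # Hypothetical PySpark equivalent of a Spark Row Filter + Spark GroupBy chain.
    # In KNIME these steps are configured visually; no code is written by the user.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()  # on Databricks a session already exists

    trips = spark.table("nyc_taxi.trips")  # hypothetical table name

    result = (
        trips
        .filter(F.col("trip_distance") > 0)           # row filter
        .groupBy("pickup_hour")                       # grouping
        .agg(F.count("*").alias("trip_count"),
             F.avg("fare_amount").alias("avg_fare"))  # aggregations
    )
    result.show()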
File Handling with KNIME and Databricks
KNIME allows you to seamlessly work with your files within the Databricks environment. The KNIME File Handling framework lets you read and write a wide range of file formats as well as manage your files.
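In KNIME these operations are carried out with reader, writer, and utility nodes; as a point of reference only, the underlying work corresponds roughly to the following PySpark, with hypothetical DBFS paths.

    # Rough PySpark counterpart of reading and writing Parquet files on DBFS,
    # which KNIME handles visually with reader/writer nodes. Paths are hypothetical.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    df = spark.read.parquet("dbfs:/data/taxi/2023/")            # read Parquet
    df.write.mode("overwrite").parquet("dbfs:/data/taxi_out/")  # write Parquet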
Workflow
Working with Utility Nodes
Downloads, extracts, and reads a remote file, deletes the extracted file, and moves the remote file to another folder.
Examples/01_Data_Access/01_Common_Type_Files/11_Working_with_Utility_Nodes
Workflow
Incremental Data Processing with Parquet
Generates separate Parquet files for incremental data processing of the NYC taxi dataset.
Examples/01_Data_Access/01_Common_Type_Files/12_Incremental_processing_Parquet_file
Machine Learning with KNIME and Databricks
With the Databricks integration, you can train and apply models within your Databricks environment without writing a single line of code.
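The workflows below assemble the model visually with Spark learner and predictor nodes. For orientation only, a rough Spark MLlib equivalent of training and applying a random forest might look like this; the feature columns and the train_df/test_df DataFrames are hypothetical.

    # Rough Spark MLlib equivalent of the Spark random forest learner/predictor
    # nodes used in the taxi demand workflows. Columns and DataFrames are hypothetical.
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.regression import RandomForestRegressor

    assembler = VectorAssembler(
        inputCols=["hour", "weekday", "lag_1", "lag_24"],  # hypothetical features
        outputCol="features")

    rf = RandomForestRegressor(featuresCol="features", labelCol="demand")
    model = rf.fit(assembler.transform(train_df))          # learner step

    predictions = model.transform(assembler.transform(test_df))  # predictor step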
Workflow
Taxi Demand Prediction on Spark Training
Generates a time series prediction model for taxi demand using Spark and Random Forest.
Codeless Time Series Analysis with KNIME/Chapter 12/01 Taxi Demand Prediction on Spark Training
Workflow
Seasonality Removal
Removes seasonality from the NYC taxi dataset for time series prediction.
Examples/10_Big_Data/02_Spark_Executor/11_Taxi_Demand_Prediction/Seasonality_Removal
Scripting with KNIME and Databricks
Explore the integration of KNIME Analytics Platform with Databricks to script custom Spark jobs and execute them on a Databricks cluster.
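As a flavor of what such a script can look like: a minimal sketch of the body of KNIME's PySpark Script (1 to 1) node, which executes on the connected Databricks cluster. The dataFrame1/resultDataFrame1 variable names follow the node's default script template; the transformation itself is a made-up example.

    # Minimal body for KNIME's PySpark Script (1 to 1) node. The node exposes the
    # incoming Spark table as dataFrame1 and expects the result in resultDataFrame1
    # (names as in the node's default template); the logic here is a made-up example.
    from pyspark.sql import functions as F

    resultDataFrame1 = (
        dataFrame1
        .dropna(subset=["fare_amount"])                          # drop incomplete rows
        .withColumn("tip_pct",
                    F.col("tip_amount") / F.col("fare_amount"))  # derived column
    )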
Data Apps powered by KNIME and Databricks
With KNIME Business Hub, users can create and deploy workflows across the business that are powered by Databricks and KNIME. You can schedule a workflow to run at specific times, deploy it as a REST endpoint, or turn it into a visual data app. Below is an example of a workflow that uses Databricks data to create a taxi demand prediction dashboard, followed by a sketch of how a deployed REST endpoint can be called. For more, take a look at the KNIME Data Apps Beginners Guide in the additional material section.
Workflow
Taxi Demand Prediction Deployment
Generates a time series prediction model for taxi demand on the NYC dataset using Random Forest on a Spark cluster.
Examples/10_Big_Data/02_Spark_Executor/11_Taxi_Demand_Prediction/Deployment_workflow
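For illustration, invoking a workflow deployed as a REST endpoint could look like the following from any Python client. The URL, credentials, and payload schema are placeholders; the real values depend entirely on your KNIME Business Hub deployment.

    # Hypothetical call to a workflow deployed as a REST service on KNIME Business Hub.
    # Endpoint, token, and payload are placeholders for your own deployment.
    import requests

    ENDPOINT = "https://hub.example.com/deployments/taxi-demand"  # placeholder URL
    TOKEN = "your-application-password"                           # placeholder credential

    response = requests.post(
        ENDPOINT,
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"pickup_hour": 17, "weekday": "Friday"},  # made-up input parameters
        timeout=60,
    )
    response.raise_for_status()
    print(response.json())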