Collection

KNIME for Databricks Users

Tags: Databricks, Spark, DB, Database, ETL, +1

Use an intuitive, low-code/no-code interface to integrate and manipulate Databricks data, combine it with other sources, build visual workflows of any complexity, from data manipulation to machine learning models, and share insights with others in the organization through data apps, all without leaving Databricks. Most tasks can be accomplished with a no-code KNIME approach, and where needed you retain the flexibility to script custom Spark jobs that execute directly on Databricks clusters.

KNIME Databricks Integration Guide
This guide describes working with Databricks within the KNIME Analytics Platform.

Getting Started with Databricks Workflows

Using Databricks within KNIME starts with connecting to a Databricks workspace via the Databricks Workspace Connector node. Once connected, you can reach the individual Databricks services through dedicated connector nodes such as the Databricks SQL Warehouse Connector node or the Databricks Unity File System Connector, as well as the Create Databricks Environment node. These connector nodes are complemented by the KNIME DB, file handling, and Spark nodes for building queries and analytics visually.
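For orientation, the snippet below sketches the code-level equivalent of what the SQL Warehouse connection makes available, using the open-source databricks-sql-connector package; the hostname, HTTP path, and access token are placeholders for your own workspace values, and this is an illustrative analogue rather than what the KNIME nodes run internally.

```python
# A minimal sketch, not KNIME internals: connecting to a Databricks SQL Warehouse
# with the open-source databricks-sql-connector (pip install databricks-sql-connector).
# server_hostname, http_path, and access_token are placeholders for your workspace.
from databricks import sql

with sql.connect(
    server_hostname="dbc-12345678-90ab.cloud.databricks.com",  # placeholder host
    http_path="/sql/1.0/warehouses/abcdef1234567890",          # placeholder warehouse path
    access_token="dapiXXXXXXXXXXXXXXXX",                       # placeholder token
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT current_catalog(), current_schema()")
        print(cursor.fetchone())
```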
Node / Source
Databricks Workspace Connector

The Databricks Workspace Connector node allows you to connect to a Databricks workspace. Learn more

Node / Source
Databricks SQL Warehouse Connector

Connects to a Databricks SQL Warehouse, which can then be used with the KNIME database nodes. Learn more

Node / Source
Create Databricks Environment

Creates a Databricks Environment connected to an existing Databricks cluster for data processing. Learn more

Node / Source
Databricks Unity File System Connector

Connects to the Unity Volumes of a Databricks workspace in order to read/write files in downstream nodes. Learn more

Node / Source
DB Table Selector

Retrieves data from a specified database table or view. Learn more

Node / Source
List Files/Folders

Displays files and folders within a selected folder across various file systems. Learn more

Node / Source
Parquet to Spark

Converts Parquet files to Spark DataFrames/RDDs. Learn more

Node / Source
Databricks File System Connector

Connects to Databricks File System for file system operations. Learn more

Workflow: Getting Started with Databricks Workflows
Tags: DB, Database, ELT, +3
This workflow is an example of how to get started with using Databricks from within KNIME Analytics Platform.
Workflow: Databricks Integration Overview
Tags: Databricks, Spark, AI, +4
This workflow demonstrates how you can visually interact with the different Databricks services integrated into the KNIME Analyti…

Data Manipulation with KNIME and Databricks

KNIME makes it easy to get your data into the right shape within Databricks. Using the KNIME DB extension, you can visually create database queries whose execution is pushed down to the Databricks SQL Warehouse or cluster. You can also use the KNIME Spark extension to visually create Spark instructions that are executed on the Databricks cluster. Combining these two options with KNIME's local execution power lets you design your workflow around where each piece should execute.
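For readers who want to see the code-level counterpart, here is a rough PySpark sketch of the group-and-join logic that the Spark GroupBy and Spark Joiner nodes express visually; the table and column names are hypothetical.

```python
# A rough PySpark sketch of what chained Spark GroupBy / Spark Joiner nodes express.
# The tables `main.demo.sales` and `main.demo.stores` are hypothetical; on a
# Databricks cluster, `spark` is the preconfigured SparkSession.
from pyspark.sql import functions as F

sales = spark.table("main.demo.sales")
stores = spark.table("main.demo.stores")

# Spark GroupBy equivalent: total and average revenue per store
per_store = sales.groupBy("store_id").agg(
    F.sum("revenue").alias("total_revenue"),
    F.avg("revenue").alias("avg_revenue"),
)

# Spark Joiner equivalent: enrich the aggregates with store attributes
result = per_store.join(stores, on="store_id", how="inner")
result.show()
```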
Node / Source
Databricks SQL Warehouse Connector

Connects to a Databricks SQL Warehouse, which can then be used with the KNIME database nodes. Learn more

Node / Source
Create Databricks Environment

Creates a Databricks Environment connected to an existing Databricks cluster for data processing. Learn more

Node / Sink
DB Loader

Loads large amounts of data into an existing database table using database-specific bulk loading functionality. Learn more

Node / Manipulator
DB GroupBy

Groups rows by selected columns and aggregates data based on specified criteria. Learn more

Node / Manipulator
DB Numeric-Binner

Bins numeric values into defined intervals with unique names and consistent borders, replacing or appending a binned, string-type column. Learn more

Node / Manipulator
DB Joiner

This node joins two DB Data tables. The join is based on the joining columns of both tables. Learn more

Node / Other
DB to Spark

Reads a database query/table into a Spark RDD/DataFrame. Learn more

Node / Manipulator
Spark GroupBy

Groups rows by selected columns for aggregation based on defined criteria. Learn more

Node / Manipulator
Spark Normalizer

This node normalizes the values of all selected (numeric) columns. Learn more

Node / Manipulator
Spark Joiner

Joins two Spark DataFrame/RDDs based on specified columns. Learn more

Node / Manipulator
GroupBy

Groups rows of a table by unique values in selected columns and aggregates remaining columns based on specified settings. Learn more

Node / Manipulator
Auto-Binner

Groups numeric data into intervals, called bins, using predefined naming options and binning methods. Learn more

Node / Manipulator
Joiner

Combines two tables based on selected columns, similar to a database join operation. Learn more

Workflow: Databricks SQL Warehouse
Tags: Databricks, DB, Warehouse, +1
This workflow demonstrates how to connect and work with a (serverless) Databricks Warehouse. Using the powerful KNIME database fr…
Workflow: Spark Preprocessing Example
Tags: Big Data, Education
Preprocesses data for Spark analysis in a Big Data course exercise solution.

KNIME and Databricks for AI

Harness the power of Databricks AI Models within a KNIME workflow.
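For orientation, Databricks chat models are served through model serving endpoints, which is roughly the channel the Databricks Chat Model Connector wires into KNIME's GenAI nodes. Such endpoints can also be queried directly with an OpenAI-compatible client, as in the sketch below; the workspace URL, token, and endpoint name are placeholders.

```python
# A minimal sketch of querying a Databricks model serving endpoint directly via
# its OpenAI-compatible API (pip install openai). URL, token, and endpoint name
# are placeholders for your own workspace and served model.
from openai import OpenAI

client = OpenAI(
    api_key="dapiXXXXXXXXXXXXXXXX",  # placeholder Databricks personal access token
    base_url="https://dbc-12345678-90ab.cloud.databricks.com/serving-endpoints",
)

response = client.chat.completions.create(
    model="databricks-meta-llama-3-3-70b-instruct",  # placeholder endpoint name
    messages=[{"role": "user", "content": "Explain a vector store in one sentence."}],
)
print(response.choices[0].message.content)
```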
Node / Source
Databricks Chat Model Connector

This node connects to a chat model served by the Databricks workspace that is provided as an input. Learn more

Node / Source
Databricks Embedding Connector

This node connects to an embedding model served by the Databricks workspace that is provided as an input. Learn more

Node / Predictor
Chat Model Prompter

This node prompts a chat model using the provided user message, with an existing conversation history as context. Learn more

Node / Source
FAISS Vector Store Creator

Generates a FAISS vector store that uses the given embedding model to map each document to a numerical vector capturing its semantic meaning. Learn more

Node / Source
Vector Store Retriever

This node specializes in retrieving embeddings from a vector store based on their relevance to user queries. Learn more

Node / Other
Giskard LLM Scanner

This node provides an open-source framework for detecting potential vulnerabilities in the GenAI model contained in the provided workflow. It evaluates the workflow by combining heuristics-based and L… Learn more

Workflow: Databricks AI
Tags: Databricks, LLM, Vector Store, +1
This workflow demonstrates how you can visually interact with the model serving endpoints of Databricks. The model serving endpoi…

File Handling with KNIME and Databricks

KNIME allows you to seamlessly work with your files within the Databricks environment. The KNIME file handling framework lets you read and write a wide range of file formats as well as manage your files.
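As a code-level point of comparison, the sketch below reads and writes a file in a Unity Catalog volume with the databricks-sdk Files API, which is the kind of operation the file system connector nodes expose visually; the volume path and credentials are placeholders.

```python
# A minimal sketch of reading/writing a file in a Unity Catalog volume with the
# databricks-sdk Files API (pip install databricks-sdk). The volume path is a
# placeholder; authentication is taken from DATABRICKS_HOST / DATABRICKS_TOKEN.
import io
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
volume_path = "/Volumes/main/default/demo_volume/hello.csv"  # placeholder path

# Upload a small CSV into the volume (what a KNIME writer node would do)
w.files.upload(volume_path, io.BytesIO(b"id,name\n1,KNIME\n"), overwrite=True)

# Download it again (what a KNIME reader node would do)
downloaded = w.files.download(volume_path)
print(downloaded.contents.read().decode("utf-8"))
```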
Node / Source
Databricks Unity File System Connector

Connects to the Unity Volumes of a Databricks workspace in order to read/write files in downstream nodes. Learn more

Node / Source
Databricks File System Connector

Connects to Databricks File System for file system operations. Learn more

Node / Source
Excel Reader

Reads Excel files, converts data to KNIME types, and outputs table with auto-guessed structure and types. Learn more

Node / Sink
Excel Writer

Writes input data table to Excel file in .xls or .xlsx format, creating new files or appending data to existing files. Learn more

Node / Source
CSV Reader

Reads CSV files and can auto-guess the file structure, but offers more options for complex files compared to the File Reader node. Learn more

Node / Sink
CSV Writer

Writes input data table to a file or remote location in CSV format. Learn more

Node / Source
CSV to Spark

Converts CSV files into Spark DataFrames/RDDs. Learn more

Node / Sink
Spark to CSV

Writes Spark data to CSV files. Learn more

Node / Source
Parquet Reader

Reads Parquet files with primitive and repeated groups, but not complex nested structures, from a specified directory or single file. Learn more

Node / Sink
Parquet Writer

Writes KNIME data table into Parquet file, potentially splitting data into multiple files in specified folder. Learn more

Node / Source
Parquet to Spark

Converts Parquet files to Spark DataFrames/RDDs. Learn more

Node / Sink
Spark to Parquet

Converts Spark data to Parquet format for storage. Learn more

Workflow: Databricks File Handling
Tags: Databricks, File handling, Delete, +1
This workflow demonstrates how to work with the different file systems provided by Databricks. Using the powerful KNIME file hand…
Workflow: Working with Utility Nodes
Tags: File handling, Zip, Unzip, +4
Executes a workflow that downloads, extracts, reads, deletes extracted file, and moves remote file to another folder.
Workflow: Incremental Data Processing with Parquet
Tags: Parquet, Incremental loading, NYC taxi dataset, +3
Generates separate Parquet files for incremental data processing of NYC taxi dataset.

Machine Learning with KNIME and Databricks

With the Databricks integration, you can train and apply models within your Databricks environment without writing a single line of code.
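For comparison, a hand-written pyspark.ml sketch of roughly what the Spark Random Forest Learner and Spark Predictor nodes do on the cluster might look like this; the table, feature, and label names are hypothetical.

```python
# A rough pyspark.ml sketch of training and applying a random forest on the
# cluster; table, feature, and label names are hypothetical, and `spark` is the
# cluster's preconfigured SparkSession.
from pyspark.ml import Pipeline
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.feature import VectorAssembler

data = spark.table("main.demo.training_data")  # hypothetical table

# Spark ML expects the features collected into a single vector column
assembler = VectorAssembler(
    inputCols=["age", "income", "visits"],  # hypothetical feature columns
    outputCol="features",
)
rf = RandomForestClassifier(labelCol="label", featuresCol="features", numTrees=50)

train, test = data.randomSplit([0.8, 0.2], seed=42)
model = Pipeline(stages=[assembler, rf]).fit(train)

# Roughly what Spark Predictor (Classification) does downstream
model.transform(test).select("label", "prediction").show(5)
```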
Node / Source
Create Databricks Environment

Creates a Databricks Environment connected to an existing Databricks cluster for data processing. Learn more

Node / Learner
Spark Association Rule Learner

Generates association rules from input data using Spark MLlib. Learn more

Node / Learner
Spark Random Forest Learner

Trains a random forest model for classification using Spark ML. Learn more

Node / Learner
Spark k-Means

Performs K-means clustering using Apache Spark, outputting cluster centers for a fixed number of clusters. Learn more

Node / Learner
Spark Linear Regression Learner

Trains a linear regression model in Spark with different regularization options. Learn more

Node / Learner
Spark Decision Tree Learner

Trains a Decision Tree classification model using Spark's MLlib, performing binary partitioning to maximize information gain for target prediction. Learn more

Node / Learner
Spark Gradient Boosted Trees Learner

Trains a Gradient Boosted Trees model for binary classification using Spark's implementation. Learn more

Node / Other
Spark Scorer

Compares two columns by their attribute value pairs and shows the confusion matrix. Learn more

Node / Other
Spark Entropy Scorer

Calculates entropy values and quality scores for clustering results compared to a reference clustering. Learn more

Node / Predictor
Spark Cluster Assigner

Assigns new data to existing prototypes based on distance. Learn more

Node / Predictor
Spark Predictor (Classification)

Classifies input data using a Spark ML classification model. Learn more

Workflow: Databricks Machine Learning
Tags: Databricks, Classification, Decision tree, +3
This workflow demonstrates how to work with an all-purpose compute cluster on Databricks. The workflow uploads some test data to …
Workflow: Taxi Demand Prediction on Spark Training
Tags: Demand prediction, Random forest, Time series prediction, +4
Generates a time series prediction model for taxi demand using Spark and Random Forest.
Workflow: Seasonality Removal
Tags: Spark, Time Series
Removes seasonality from NYC taxi dataset for time series prediction.

Scripting with KNIME and Databricks

Explore the integration of KNIME Analytics Platform with Databricks for scripting and execution on the Databricks cluster.
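As the node descriptions below note, a PySpark Script node expects its output in resultDataFrame1. A minimal node body might look like the following sketch, where the input variable name dataFrame1 and the column names are assumptions for illustration.

```python
# A minimal sketch of a script body for the PySpark Script (1 to 1) node. Per the
# node descriptions below, the output goes into resultDataFrame1; the input name
# dataFrame1 and the column names are assumptions for illustration.
from pyspark.sql import functions as F

resultDataFrame1 = (
    dataFrame1
    .filter(F.col("trip_distance") > 0)  # drop zero-distance trips
    .withColumn("fare_per_mile", F.col("fare_amount") / F.col("trip_distance"))
)
```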
Node / Source
PySpark Script Source

Executes Python code on Spark to generate output in resultDataFrame1. Learn more

Node / Manipulator
PySpark Script (1 to 1)

Executes Python code on Spark to generate output in resultDataFrame1. Learn more

Node / Other
Spark SQL Query

Executes Spark SQL query statements using Apache Spark. Learn more

Node / Manipulator
DB Query

Modifies the incoming SQL query to create a sub-query that defines the new DB Data table at the out-port. Learn more

Node / Source
Parameterized DB Query Reader

Restricts SQL queries to match input table values. Learn more

Node / Source
DB Query Reader

Executes an entered SQL query and returns the result as a KNIME data table. Learn more

Node / Manipulator
Spark DataFrame Java Snippet

Executes Java code to manipulate or create Spark DataFrames, supporting flow variables and external libraries. Learn more

Data Apps powered by KNIME and Databricks

With KNIME Business Hub, you can create and deploy workflows across the business that are powered by Databricks and KNIME. You can schedule a workflow to run at specific times, deploy it as a REST endpoint, or create a visual data app. Below is an example of a workflow that uses Databricks data to create a taxi demand prediction dashboard; see also the KNIME Data Apps Beginners Guide in the additional material section.
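Once a workflow is deployed as a REST endpoint, other applications can trigger it over HTTP. The sketch below shows such a call with an entirely hypothetical URL, credentials, and input parameter; the actual request schema comes from your deployment.

```python
# A rough sketch of triggering a workflow deployed as a REST endpoint on KNIME
# Business Hub. URL, credentials, and the input parameter are hypothetical
# placeholders; the real request schema comes from your deployment.
import requests

response = requests.post(
    "https://hub.example.com/my-taxi-deployment/endpoint",  # hypothetical URL
    headers={"Authorization": "Bearer <application-password>"},  # placeholder auth
    json={"prediction-date": "2025-02-01"},  # hypothetical input parameter
    timeout=300,
)
response.raise_for_status()
print(response.json())
```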
Workflow: Taxi Demand Prediction Deployment
Tags: Demand prediction, Random forest, Time series prediction, +5
Generates time series prediction model for taxi demand based on NYC dataset using Random Forest on Spark cluster.

Additional Material

Docs: KNIME Databricks Integration User Guide
Blog: Interactive Big Data Exploration and Finding a Cab in NYC
Docs: KNIME Data Apps Beginners Guide
