Create Spark Context (Livy)

Source

Creates a new Spark context via Apache Livy.

This node requires access to a remote file system such as HDFS/webHDFS/httpFS or S3/Blob Store to exchange temporary files between KNIME and the Spark context (running on the cluster).

Note: Executing this node always creates a new Spark context. Resetting the node or closing the KNIME workflow will destroy the Spark context. Spark contexts created by this node cannot be shared between KNIME workflows.
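The lifecycle described in the note above (create on execute, destroy on reset or close) corresponds to session creation and deletion in Livy's REST API. The following is a minimal sketch of those two requests, not a description of what the node sends internally. The server URL, the helper function names, and the choice of `"kind": "spark"` are assumptions made for illustration. The requests are only built, not sent.

```python
import json
import urllib.request

# Hypothetical Livy endpoint (assumption); replace with your own server URL.
LIVY_URL = "http://livy.example.com:8998"


def build_create_session_request(livy_url, proxy_user=None, spark_conf=None):
    """Build (but do not send) a POST /sessions request asking Livy for a new session.

    Helper name and parameters are illustrative, not part of KNIME or Livy.
    """
    payload = {"kind": "spark"}  # assumed session kind
    if proxy_user is not None:
        # User to impersonate; relevant when Spark must run as the same
        # user that accesses the remote file system.
        payload["proxyUser"] = proxy_user
    if spark_conf:
        payload["conf"] = dict(spark_conf)
    return urllib.request.Request(
        url=f"{livy_url.rstrip('/')}/sessions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_delete_session_request(livy_url, session_id):
    """Build (but do not send) a DELETE /sessions/{id} request that ends a session."""
    return urllib.request.Request(
        url=f"{livy_url.rstrip('/')}/sessions/{session_id}",
        method="DELETE",
    )


if __name__ == "__main__":
    create = build_create_session_request(LIVY_URL, proxy_user="alice")
    print(create.get_method(), create.full_url)
    delete = build_delete_session_request(LIVY_URL, 0)
    print(delete.get_method(), delete.full_url)
```

To actually send a request, pass it to `urllib.request.urlopen`. A secured cluster would additionally need authentication (for example Kerberos), which this sketch leaves out.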

Input Ports

  1. Type: Remote Connection
    A connection to a remote file system to exchange temporary files between KNIME and the Spark context (running on the cluster). Supported file systems are:
    • HDFS, webHDFS and httpFS. KNIME must access the remote file system as the same user as Spark; otherwise, Spark context creation fails. When authenticating with Kerberos against both HDFS/webHDFS/httpFS and Livy, the same user is usually used. Otherwise, this must be ensured manually.
    • Amazon S3 and Azure Blob Store (recommended when using Spark on Amazon EMR/Azure HDInsight). For these file systems, a staging area must be specified in the node configuration.

Output Ports

  1. Type: Spark Context
    The newly created Spark context.

Extension

This node is part of the extension KNIME Extension for Apache Spark (v4.0.0).
