H2O AutoML on Spark

Tags: Spark, H2o, Automl, Sparkling water, Airline
Draft. Latest edits on Jun 14, 2021 7:26 AM
This workflow trains classification models for the Airlines Delay dataset using H2O AutoML on Spark. The dataset is expected to be stored on S3 in Parquet format. It is first read into the Spark cluster and preprocessed there (missing value handling, normalization, etc.). Sparkling Water is then used to train both binary and multiclass classification models on the dataset with H2O AutoML. Finally, the models are scored on the test partition that was split off earlier.

The Airlines Delay dataset and its description can be found here: https://www.kaggle.com/giovamata/airlinedelaycauses

You can use the Parquet Writer node to write the dataset to S3. Alternatively, replace the Parquet to Spark node with the CSV Reader and Table to Spark nodes; note that Parquet gives better performance for the whole process. Increasing or removing the runtime limit of the H2O AutoML Learner nodes may yield better models.
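For readers who want to see the same steps outside KNIME, here is a rough sketch in PySpark with PySparkling (the Python API of Sparkling Water). It is not how the workflow is implemented, since the workflow uses KNIME nodes. The bucket, object key, label column name, 80/20 split, and the helpers `s3a_uri` and `train_and_score` are illustrative assumptions, and the preprocessing is reduced to dropping rows without a label. Running the pipeline requires a Spark cluster with Sparkling Water installed and S3 access.

```python
def s3a_uri(bucket: str, key: str) -> str:
    """Build an s3a:// URI that Spark can read (hypothetical helper)."""
    return f"s3a://{bucket}/{key.lstrip('/')}"


def train_and_score(parquet_uri: str, label_col: str,
                    max_runtime_secs: float = 300.0, seed: int = 42):
    """Read Parquet into Spark, train H2O AutoML, score the test split.

    Hypothetical helper; the column names depend on how the dataset
    was written to S3.
    """
    # Imports are deferred so this module loads without Spark installed.
    from pyspark.sql import SparkSession, functions as F
    from pysparkling import H2OContext
    from pysparkling.ml import H2OAutoML

    spark = SparkSession.builder.appName("airline-automl").getOrCreate()
    # Starts H2O inside the Spark cluster; older Sparkling Water
    # releases expect the Spark session as an argument here.
    H2OContext.getOrCreate()

    df = spark.read.parquet(parquet_uri)
    # Minimal preprocessing (assumption): drop rows without a label and
    # cast the label to string so it is treated as a class label.
    df = df.na.drop(subset=[label_col])
    df = df.withColumn(label_col, F.col(label_col).cast("string"))

    # Assumed 80/20 partition; the workflow's actual ratio is not stated.
    train, test = df.randomSplit([0.8, 0.2], seed=seed)

    # maxRuntimeSecs plays the role of the learner's runtime limit.
    automl = H2OAutoML(labelCol=label_col,
                       maxRuntimeSecs=max_runtime_secs,
                       seed=seed)
    model = automl.fit(train)
    # Returns the test rows with prediction columns appended.
    return model.transform(test)
```

A call might look like `train_and_score(s3a_uri("my-bucket", "airlines/delay.parquet"), "IsDelayed")`, where both the location and the column name are placeholders. Raising `max_runtime_secs` corresponds to relaxing the runtime limit mentioned above.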

External resources

  • Airlines Delay Dataset

Used extensions & nodes

Created with KNIME Analytics Platform version 4.4.0
  • KNIME Amazon Cloud Connectors (trusted extension)
    KNIME AG, Zurich, Switzerland
    Version 4.4.0
  • KNIME Extension for Apache Spark (trusted extension)
    KNIME AG, Zurich, Switzerland
    Version 4.4.0
  • KNIME H2O Machine Learning Integration (trusted extension)
    KNIME AG, Zurich, Switzerland
    Version 4.4.0
  • KNIME H2O Sparkling Water Integration (trusted extension)
    KNIME AG, Zurich, Switzerland
    Version 4.4.0

Legal

By using or downloading the workflow, you agree to our terms and conditions.
