Incremental Data Processing with Parquet

Parquet, Incremental loading, NYC taxi dataset, Incremental processing, Bigdata
Draft. Latest edits on Feb 7, 2019 9:23 AM
In this workflow, we use the NYC taxi dataset to showcase continuous preprocessing and publishing of event data. Instead of using the Group Loop Start node, this workflow could be executed once per week to preprocess and publish all data that arrived during that week. Each run writes its result as a separate Parquet file in the same folder. To ensure that the file name is unique for each run, we use the year and week of the run as a file prefix, which is set via a flow variable. Because the folder stays the same and Parquet readers read all files in a folder regardless of their file names, this folder can be exposed as an external table (e.g. in Hive or Impala) to power further analysis processes.
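
The naming scheme described above can be sketched outside of KNIME as follows. This is a minimal illustration, not the workflow's actual implementation: the function names, the `YYYY_WW` prefix layout, and the `taxi_trips` base name are assumptions made for the example, and the ISO week is used so that runs around New Year still get a unique year and week pair.

```python
from datetime import date
from pathlib import Path
from typing import List


def weekly_prefix(run_date: date) -> str:
    """Return a 'YYYY_WW' prefix built from the ISO year and ISO week of the run.

    The exact prefix layout is an assumption; the workflow sets its prefix
    via a flow variable.
    """
    iso_year, iso_week, _ = run_date.isocalendar()
    return f"{iso_year}_{iso_week:02d}"


def weekly_parquet_path(folder: Path, run_date: date,
                        base_name: str = "taxi_trips") -> Path:
    """Build the output path for one run; the folder is the same for every run.

    'taxi_trips' is a hypothetical base name chosen for illustration.
    """
    return folder / f"{weekly_prefix(run_date)}_{base_name}.parquet"


def files_in_table_folder(folder: Path) -> List[str]:
    """List every Parquet file in the folder, regardless of its prefix.

    This mirrors what a folder-level reader or an external table would scan.
    """
    return sorted(p.name for p in folder.glob("*.parquet"))
```

For a run on Feb 7, 2019, `weekly_parquet_path(Path("out"), date(2019, 2, 7))` yields a file named `2019_06_taxi_trips.parquet`. Rerunning within the same week produces the same name, so whether a rerun overwrites or fails depends on how the writer is configured.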

External resources

  • KNIME File Handling Guide

Used extensions & nodes

Created with KNIME Analytics Platform version 4.3.0
  • KNIME Amazon Cloud Connectors (trusted extension), KNIME AG, Zurich, Switzerland, version 4.3.0
  • KNIME Azure Cloud Connectors (trusted extension), KNIME AG, Zurich, Switzerland, version 4.3.0
  • KNIME Base nodes (trusted extension), KNIME AG, Zurich, Switzerland, version 4.3.0
  • KNIME Basic File System Connectors (trusted extension), KNIME AG, Zurich, Switzerland, version 4.3.0
  • KNIME Big Data Connectors (trusted extension), KNIME AG, Zurich, Switzerland, version 4.3.0
  • KNIME Databricks Integration (trusted extension), KNIME AG, Zurich, Switzerland, version 4.3.0
  • KNIME Extension for Big Data File Formats (trusted extension), KNIME AG, Zurich, Switzerland, version 4.3.0
  • KNIME Extension for Local Big Data Environments (trusted extension), KNIME AG, Zurich, Switzerland, version 4.3.0
  • KNIME Google Cloud Storage Connection (trusted extension), KNIME AG, Zurich, Switzerland, version 4.3.0
  • KNIME Google Connectors (trusted extension), KNIME AG, Zurich, Switzerland, version 4.3.0
  • KNIME Javasnippet (trusted extension), KNIME AG, Zurich, Switzerland, version 4.3.0
  • KNIME Office 365 Connectors (trusted extension), KNIME AG, Zurich, Switzerland, version 4.3.0
  • KNIME ServerSpace (trusted extension), KNIME AG, Zurich, Switzerland, version 4.12.0
