03.4_Writing_from_Spark_exercise

Education Data engineering Data engineer Best practices Spark
chemgirl36

The company tracks website usage and stores information about each session.

  • Various data are collected for each session, e.g., session start, duration, and number of clicks, as well as an optional session satisfaction score.
  • The company calculates aggregated statistics for each customer, e.g., total number of visits and average satisfaction, and updates the "statistics" table in the database.
  • The session satisfaction score column has missing values that need to be imputed, e.g., with machine learning predictions.

We access the usage data from Hive, and the personal data (anonymized and updated in sessions 1 and 2) and contracts data from the PostgreSQL database. We perform in-database processing, read the data into Spark, enrich the usage data with the personal and contract data to better predict the missing values, and continue working with the relatively large usage data in Spark. We export the final status of the workflow. If any process fails, we notify the responsible people via an automated email.
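
The data flow described above (enrich, impute, aggregate) can be sketched in plain Python. This is only an illustration and not the workflow's actual implementation, which uses KNIME database and Spark nodes. The sample rows, the column names, and the function names are all hypothetical. Filling a missing score with the mean of the same contract type is a deliberately simple stand-in for the machine learning prediction mentioned in the description.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows; the real tables live in Hive and PostgreSQL.
sessions = [
    {"customer_id": 1, "duration": 120, "clicks": 10, "satisfaction": 4.0},
    {"customer_id": 1, "duration": 60, "clicks": 3, "satisfaction": None},
    {"customer_id": 2, "duration": 300, "clicks": 25, "satisfaction": 2.0},
    {"customer_id": 3, "duration": 45, "clicks": 2, "satisfaction": None},
]
contracts = {1: "premium", 2: "basic", 3: "premium"}  # customer_id -> contract type


def enrich(sessions, contracts):
    """Join each session with the customer's contract type."""
    return [{**s, "contract": contracts.get(s["customer_id"])} for s in sessions]


def impute_satisfaction(rows):
    """Fill missing scores with the mean score of the same contract type.

    A simple stand-in for a machine learning prediction. Falls back to the
    overall mean when no session of that contract type has a known score.
    """
    known = [r for r in rows if r["satisfaction"] is not None]
    overall = mean(r["satisfaction"] for r in known)
    by_contract = defaultdict(list)
    for r in known:
        by_contract[r["contract"]].append(r["satisfaction"])
    filled = []
    for r in rows:
        if r["satisfaction"] is None:
            scores = by_contract.get(r["contract"])
            r = {**r, "satisfaction": mean(scores) if scores else overall}
        filled.append(r)
    return filled


def customer_statistics(rows):
    """Aggregate per-customer statistics for the "statistics" table."""
    grouped = defaultdict(list)
    for r in rows:
        grouped[r["customer_id"]].append(r)
    return {
        cid: {
            "total_visits": len(rs),
            "avg_satisfaction": mean(r["satisfaction"] for r in rs),
        }
        for cid, rs in grouped.items()
    }


statistics_table = customer_statistics(impute_satisfaction(enrich(sessions, contracts)))
```

Imputing before aggregating means that sessions without a score still contribute to a customer's average satisfaction, which is why the enrichment step (adding contract data) comes first in this sketch.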

External resources

  • L4-DE Course Slides

Used extensions & nodes

Created with KNIME Analytics Platform version 4.5.2
  • KNIME Base nodes (trusted extension), KNIME AG, Zurich, Switzerland, version 4.5.2
  • KNIME Database (trusted extension), KNIME AG, Zurich, Switzerland, version 4.5.2
  • KNIME Extension for Apache Spark (trusted extension), KNIME AG, Zurich, Switzerland, version 4.5.2
  • KNIME Extension for Local Big Data Environments (trusted extension), KNIME AG, Zurich, Switzerland, version 4.5.2
  • KNIME Quick Forms (trusted extension), KNIME AG, Zurich, Switzerland, version 4.5.2

Legal

By using or downloading the workflow, you agree to our terms and conditions.

