This workflow demonstrates how to create a Spark context via Apache Livy and execute a simple Spark job on an Amazon EMR cluster. It uses the NYC taxi dataset from the AWS Registry of Open Data to build a simple prediction model with Random Forest. The workflow also shows how to configure Amazon Athena to query a dataset stored in an Amazon S3 bucket.
Created with KNIME Analytics Platform version 4.2.0.
License: CC-BY-4.0