This is the first workflow in the PubChem Big Data story. The top part of the workflow downloads assay data from the PubChem database through its API and uploads it to a specified S3 bucket on AWS, one file per assay/experiment (AID). The bottom part cleans the assay data using the KNIME Extension for Apache Spark and stores the cleaned files on AWS. The AWS Authentication component, the Paths to Livy and S3 component, and the Create Spark Context (Livy) node must be configured.
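For readers who want to see the download-and-upload step outside KNIME, here is a minimal Python sketch of the "one file per AID" pattern. It is not how the workflow's nodes are implemented. It assumes the PubChem PUG REST assay endpoint (`/assay/aid/<AID>/CSV`) as the data source; the key layout (`pubchem/raw/AID_<n>.csv`) and the names `assay_csv_url`, `s3_key`, and `transfer_assays` are hypothetical and chosen only for illustration. The upload is passed in as a callable so the sketch does not depend on any particular AWS client.

```python
from typing import Callable, Iterable, List
from urllib.request import urlopen

# Base URL of the PubChem PUG REST service.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def assay_csv_url(aid: int) -> str:
    """Build the PUG REST URL that returns one assay (AID) as CSV."""
    return f"{PUG_REST}/assay/aid/{aid}/CSV"


def s3_key(aid: int, prefix: str = "pubchem/raw") -> str:
    """Build the object key for one assay file (hypothetical layout)."""
    return f"{prefix.rstrip('/')}/AID_{aid}.csv"


def fetch_bytes(url: str) -> bytes:
    """Download the raw response body for a URL."""
    with urlopen(url, timeout=60) as response:
        return response.read()


def transfer_assays(
    aids: Iterable[int],
    upload: Callable[[str, bytes], None],
    fetch: Callable[[str], bytes] = fetch_bytes,
    prefix: str = "pubchem/raw",
) -> List[str]:
    """Download each assay and hand it to `upload`, one object per AID.

    Returns the list of object keys that were written.
    """
    keys = []
    for aid in aids:
        data = fetch(assay_csv_url(aid))
        key = s3_key(aid, prefix)
        upload(key, data)
        keys.append(key)
    return keys
```

With boto3 installed and credentials configured, `upload` could be something like `lambda key, body: s3.put_object(Bucket="my-bucket", Key=key, Body=body)`, where `s3 = boto3.client("s3")` and the bucket name is a placeholder. Injecting `fetch` and `upload` also makes the loop easy to test without network access.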
Used extensions & nodes
Created with KNIME Analytics Platform version 4.6.0