This is the first workflow in the PubChem Big Data story.
In the top part of the workflow, we download the assay data from the PubChem database through its API and upload it to a specified S3 bucket on AWS, writing one file per assay/experiment (AID).
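For readers who want to see the download-and-upload step outside KNIME, here is a minimal Python sketch of the same idea. It is not what the workflow runs (the workflow uses KNIME nodes), and several things are assumptions: that the assay data table is fetched as CSV from the PUG REST endpoint, that the third-party boto3 library is installed with AWS credentials already configured, and the hypothetical names assay_csv_url, s3_key, fetch_and_upload, and the "bioassays/raw" key prefix.

```python
import urllib.request

# Base URL of PubChem's PUG REST service.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def assay_csv_url(aid: int) -> str:
    """Build the PUG REST URL for the data table of one assay (AID) as CSV."""
    return f"{PUG_REST}/assay/aid/{aid}/CSV"


def s3_key(aid: int, prefix: str = "bioassays/raw") -> str:
    """Build the S3 object key: one file per AID (the prefix is an assumption)."""
    return f"{prefix}/AID_{aid}.csv"


def fetch_and_upload(aid: int, bucket: str, prefix: str = "bioassays/raw") -> str:
    """Download one assay from PubChem and store it in the given S3 bucket."""
    import boto3  # third-party; imported here so the helpers above work without it

    # Download the assay data table into memory.
    with urllib.request.urlopen(assay_csv_url(aid), timeout=60) as response:
        body = response.read()

    # Upload the raw bytes as a single object and return its key.
    key = s3_key(aid, prefix)
    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=body)
    return key
```

Calling fetch_and_upload(1000, "my-bucket") would, under these assumptions, store the assay with AID 1000 as bioassays/raw/AID_1000.csv in a bucket named my-bucket (a placeholder). Large assays may be better streamed to disk than held in memory.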
In the bottom part, we clean up the assay data using the KNIME Extension for Apache Spark and store the cleaned files on AWS.
The AWS Authentication component, the Paths to Livy and S3 component, and the Create Spark Context (Livy) node require configuration.
Workflow
01_Fetch_BioAssays
Used extensions & nodes
Created with KNIME Analytics Platform version 4.6.0