01_Fetch_BioAssays

Workflow


This is the first workflow in the PubChem Big Data story. In the top part of the workflow, we download the assay data from the PubChem database using its API and upload it to a specified S3 bucket on AWS, one file per assay/experiment (AID). In the bottom part, we clean up the assay data using the KNIME Extension for Apache Spark and store the cleaned-up files on AWS. The AWS Authentication component, the Paths to Livy and S3 component, and the Create Spark Context (Livy) node require configuration.
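For readers who want to see what the download-and-upload step amounts to outside KNIME, here is a minimal Python sketch. It is not the workflow's implementation. It assumes the PubChem PUG REST endpoint `assay/aid/<AID>/CSV` for each assay's data table, a hypothetical S3 key scheme `<prefix>/AID_<aid>.csv`, and `boto3` credentials resolved from the standard AWS credential chain (the role the AWS Authentication component plays in the workflow). The bucket name and prefix are placeholders.

```python
"""Sketch: fetch one PubChem BioAssay data table per AID and upload it to S3.

Assumptions (not taken from the workflow): PUG REST CSV endpoint, the
hypothetical key scheme "<prefix>/AID_<aid>.csv", and boto3 default credentials.
"""
import time
import urllib.request

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def assay_csv_url(aid: int) -> str:
    """Build the PUG REST URL for the data table of one assay (AID)."""
    if aid <= 0:
        raise ValueError("AID must be a positive integer")
    return f"{PUG_REST}/assay/aid/{aid}/CSV"


def s3_key(prefix: str, aid: int) -> str:
    """Hypothetical naming scheme: one object per AID under an optional prefix."""
    p = prefix.strip("/")
    return f"{p}/AID_{aid}.csv" if p else f"AID_{aid}.csv"


def fetch_assay_csv(aid: int, timeout: float = 60.0) -> bytes:
    """Download the raw CSV bytes for one AID."""
    with urllib.request.urlopen(assay_csv_url(aid), timeout=timeout) as resp:
        return resp.read()


def upload_assays(aids, bucket: str, prefix: str = "pubchem/raw") -> list:
    """Fetch each AID and write it to S3 as its own object; return the keys."""
    import boto3  # imported lazily so the helpers above work without boto3

    s3 = boto3.client("s3")
    keys = []
    for aid in aids:
        body = fetch_assay_csv(aid)
        key = s3_key(prefix, aid)
        s3.put_object(Bucket=bucket, Key=key, Body=body)
        keys.append(key)
        time.sleep(0.2)  # stay under PubChem's limit of 5 requests per second
    return keys
```

Calling `upload_assays([1000, 1001], bucket="my-bucket")` (placeholder bucket) would write `pubchem/raw/AID_1000.csv` and `pubchem/raw/AID_1001.csv`. Keeping one object per AID mirrors the workflow's one-file-per-assay layout, which lets the Spark step in the bottom part read the files in parallel.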


Legal

By using or downloading the workflow, you agree to our terms and conditions.