In this example workflow, we demonstrate the usage of the One-Hot Encoder (Biological Sequences) component, which is part of the KNIME Verified Components (https://www.knime.com/verified-components). After reading FASTA files with another verified component created for this purpose, we pass the table containing the cDNA sequences to the One-Hot Encoder component, which turns the sequences into one-hot encoded vectors. We use these vectors to train a convolutional neural network (CNN) built with the KNIME Keras Integration.
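The FASTA reading and one-hot encoding steps described above can be sketched in plain Python. This is a minimal illustration, not the KNIME components' actual implementation. The alphabet order (A, C, G, T), the treatment of U as T, the all-zero rows for unknown characters such as N, and the zero-padding to a common length are all assumptions. The helper names `parse_fasta` and `one_hot` are hypothetical.

```python
# Minimal sketch of FASTA parsing plus one-hot encoding of nucleotide sequences.
# Assumptions (not taken from the KNIME components): alphabet order A, C, G, T;
# U is mapped to T; unknown characters (e.g. N) become all-zero rows;
# sequences are zero-padded (or truncated) to a common length.

ALPHABET = "ACGT"
INDEX = {base: i for i, base in enumerate(ALPHABET)}


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def one_hot(sequence, length=None):
    """Encode a sequence as one 4-element row per position (padded/truncated)."""
    sequence = sequence.upper().replace("U", "T")
    length = length if length is not None else len(sequence)
    rows = []
    for pos in range(length):
        row = [0] * len(ALPHABET)
        if pos < len(sequence) and sequence[pos] in INDEX:
            row[INDEX[sequence[pos]]] = 1
        rows.append(row)
    return rows


fasta = """>seq1
ACGU
>seq2
GGN
"""
records = parse_fasta(fasta)
max_len = max(len(seq) for _, seq in records)
encoded = [one_hot(seq, max_len) for _, seq in records]
print(encoded[1])  # [[0, 0, 1, 0], [0, 0, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
```

Each sequence becomes a (length × 4) matrix. This is the kind of fixed-shape input a 1D CNN expects, which is why all sequences are padded to the same length.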
The data consist of cDNA sequences, some of which represent RNAs that are binding preferences of the ELAVL1A protein. The model is trained to predict whether a sequence is a binding preference for this particular protein.
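To show how a CNN turns a one-hot matrix into a yes/no binding prediction, here is a toy forward pass in plain Python. It uses a single convolutional filter slid along the sequence, a ReLU, global max pooling, and a sigmoid output. In the workflow's Keras network, the filters and output weights are learned during training and there are many filters and layers. The kernel (which simply counts T's in a 3-base window), the bias, and the output weights here are hand-set, hypothetical values chosen only for illustration. The function names are also hypothetical.

```python
import math

# Toy CNN forward pass over a one-hot encoded sequence (illustration only).
# The kernel, bias and output weights below are hypothetical, hand-set values;
# a trained Keras model would learn its own filters from the labeled data.

ALPHABET = "ACGT"


def one_hot(sequence):
    """One 4-element row per base (A, C, G, T); other characters become zeros."""
    return [[1 if base == letter else 0 for letter in ALPHABET]
            for base in sequence.upper().replace("U", "T")]


def conv1d_max_score(rows, kernel, bias):
    """Slide a (width x 4) kernel along the rows; return the max ReLU activation."""
    width = len(kernel)
    best = 0.0
    for start in range(len(rows) - width + 1):
        total = bias
        for offset in range(width):
            total += sum(w * x for w, x in zip(kernel[offset], rows[start + offset]))
        best = max(best, max(total, 0.0))  # ReLU, then global max pooling
    return best


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical filter: each of 3 positions rewards a T, so a window scores
# the number of T's it contains; bias -2 keeps only "TTT" windows positive.
KERNEL = [[0, 0, 0, 1]] * 3
BIAS = -2.0
OUT_WEIGHT, OUT_BIAS = 4.0, -2.0  # hypothetical dense output layer


def predict(sequence):
    pooled = conv1d_max_score(one_hot(sequence), KERNEL, BIAS)
    return pooled, sigmoid(OUT_WEIGHT * pooled + OUT_BIAS)


score_pos, prob_pos = predict("ACTTTG")  # contains the TTT motif
score_neg, prob_neg = predict("ACGACG")  # no T at all
print(score_pos, round(prob_pos, 4))  # 1.0 0.8808
print(score_neg, round(prob_neg, 4))  # 0.0 0.1192
```

Max pooling makes the prediction depend on whether the pattern occurs anywhere in the sequence, not on where it occurs. This is one common reason CNNs are used for motif-like binding signals.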
The data used in this workflow are from the following publication:
Xiaoyong Pan, Peter Rijnbeek, Junchi Yan, Hong-Bin Shen. Prediction of RNA-protein sequence and structure binding preferences using deep convolutional and recurrent neural networks. BMC Genomics, 2018, 19:511.
Specifically, the CLIP datasets available at: https://github.com/xypan1232/iDeepS/tree/master/datasets/clip
Example Workflow for One-Hot Encoder (Biological Sequences) Component
Used extensions & nodes
Created with KNIME Analytics Platform version 4.2.3