Finding disease-causal variants among the large number of variants detected in a sample remains a major challenge in the analysis of next-generation sequencing data ("Needles in stacks of needles", Cooper 2011).
One of the most frequently used formats for storing variant information is the Variant Call Format (VCF). Because extracting information from the complex genetic variation data encoded in VCF files is not straightforward, several command-line tools exist for filtering and querying VCF files, with the ultimate goal of detecting disease-causal variants.
This workflow illustrates how to mine your VCF files within KNIME Analytics Platform to find variants associated with a specific disease.
We use three common tools, BCFtools, VCFtools and VEP (via the Ensembl REST API), to filter and annotate the variants. The domain expert can then interactively select variants of interest and filter them by allele frequency in the 1000 Genomes Project and gnomAD, or by the predicted deleteriousness of a variant (SIFT score).
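As a rough illustration of the filtering idea described above (not the workflow's actual implementation), the following Python sketch keeps variants that are rare in both reference populations and predicted deleterious by SIFT. The field names (`af_1000g`, `af_gnomad`, `sift_score`), the example records, and the default thresholds are assumptions made for this example. The cut-off of 0.05 for SIFT is a commonly used convention, and the 1% frequency limit is only a typical starting point.

```python
from typing import Dict, List, Optional

# Hypothetical record layout: one dict per annotated variant.
Variant = Dict[str, Optional[object]]


def prioritize(variants: List[Variant],
               max_af: float = 0.01,
               sift_cutoff: float = 0.05) -> List[Variant]:
    """Keep variants that are rare and predicted deleterious by SIFT."""
    kept = []
    for v in variants:
        freqs = [v.get("af_1000g"), v.get("af_gnomad")]
        # A missing frequency is treated as "not observed", i.e. rare.
        rare = all(af is None or af <= max_af for af in freqs)
        sift = v.get("sift_score")
        # Lower SIFT scores indicate a more damaging prediction;
        # variants without a score are not kept.
        deleterious = sift is not None and sift < sift_cutoff
        if rare and deleterious:
            kept.append(v)
    return kept


# Made-up example records, for illustration only.
example = [
    {"id": "var1", "af_1000g": 0.0004, "af_gnomad": 0.0002, "sift_score": 0.01},
    {"id": "var2", "af_1000g": 0.12, "af_gnomad": 0.10, "sift_score": 0.0},
    {"id": "var3", "af_1000g": None, "af_gnomad": 0.001, "sift_score": 0.40},
    {"id": "var4", "af_1000g": None, "af_gnomad": None, "sift_score": None},
]

print([v["id"] for v in prioritize(example)])  # prints ['var1']
```

Treating a missing allele frequency as rare is a design choice: novel variants are often the most interesting candidates, but a stricter analysis might instead require a frequency to be present.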
Requirements:
- A system that can run Bash scripts
- tabix, VCFtools and BCFtools installed
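Before running the workflow, it may help to confirm that the required command-line tools can be found. The sketch below is one possible check, assuming the tools are installed under their usual executable names (`bash`, `tabix`, `vcftools`, `bcftools`) and are expected on the `PATH`. The helper name `find_missing_tools` is made up for this example.

```python
import shutil
from typing import Iterable, List

# Assumed executable names for the tools listed in the requirements.
REQUIRED_TOOLS = ("bash", "tabix", "vcftools", "bcftools")


def find_missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the tools that cannot be located on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = find_missing_tools()
if missing:
    print("Missing tools:", ", ".join(missing))
else:
    print("All required tools were found on PATH.")
```

If KNIME is started from a desktop launcher, its `PATH` may differ from that of an interactive shell, so a tool found here is not guaranteed to be visible to the workflow.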
Workflow
Reproducible Variant Prioritization
Used extensions & nodes
Created with KNIME Analytics Platform version 4.4.1
KNIME NGS tools
Plate-forme 2 - Transcriptome et Epigenome, Institut Pasteur, Paris.
Version 0.2.300