FASTA Sequence Extractor

Node / Manipulator

This node extracts the sequences for all chains listed in a FASTA file. For multi-chain FASTA files, a new row is added for each chain. Additional columns are added according to the source type selected in the drop-down, as listed below. Extracted properties are shown as {property}:

GenBank >gi|{gi-number}|gb|{accession}|{locus}
EMBL Data Library >gi|{gi-number}|emb|{accession}|{locus}
DDBJ, DNA Database of Japan >gi|{gi-number}|dbj|{accession}|{locus}
NBRF PIR >pir||{entry}
Protein Research Foundation >prf||{name}
SWISS-PROT >sp|{accession}|{name}
PDB >pdb|{entry}|{chain} or >{entry}:{chain}|PDBID|CHAIN|SEQUENCE
Patents >pat|{country}|{number}
GenInfo Backbone Id >bbs|{number}
General database identifier >gnl|{database}|{identifier}
NCBI Reference Sequence >ref|{accession}|{locus}
Local Sequence identifier >lcl|{identifier}
Other (No properties extracted)
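
As an illustration of how the header templates above map to extracted properties, the following is a minimal Python sketch covering three of the source types. It is not the node's implementation: the names HEADER_PATTERNS and extract_properties are hypothetical, and returning an empty result for a non-matching header is an assumption made for this sketch only.

```python
import re

# Hypothetical patterns derived from the header templates listed above.
# Only three source types are shown; the others follow the same idea.
HEADER_PATTERNS = {
    "GenBank": [
        r">gi\|(?P<gi_number>[^|]*)\|gb\|(?P<accession>[^|]*)\|(?P<locus>.*)",
    ],
    "SWISS-PROT": [
        r">sp\|(?P<accession>[^|]*)\|(?P<name>.*)",
    ],
    "PDB": [
        r">pdb\|(?P<entry>[^|]*)\|(?P<chain>.*)",
        r">(?P<entry>[^:|]+):(?P<chain>[^|]+)\|PDBID\|CHAIN\|SEQUENCE",
    ],
    "Other": [],  # no properties extracted
}


def extract_properties(header, source_type):
    """Return a dict of properties for one header line.

    Assumption for this sketch: a header that does not match the
    selected source type yields an empty dict.
    """
    for pattern in HEADER_PATTERNS[source_type]:
        match = re.fullmatch(pattern, header.strip())
        if match:
            return match.groupdict()
    return {}


if __name__ == "__main__":
    print(extract_properties(">sp|P12345|AATM_RABIT", "SWISS-PROT"))
    print(extract_properties(">1ABC:A|PDBID|CHAIN|SEQUENCE", "PDB"))
```

In this sketch the last field of each pattern captures everything up to the end of the line, so any free-text description following the identifier would be included in that property.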

FASTA files for PDB entries can be retrieved using the PDB Downloader nodes. NOTE: The FASTA header format is not checked, so selecting the wrong source type may give unpredictable results, although the node should still execute in these circumstances. No sequence parsing is implemented, and processing is type-agnostic (protein, nucleotide, etc.).
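
The one-row-per-chain, type-agnostic behaviour described above can be sketched in Python as follows. This is an illustrative assumption about the general approach, not the node's code; split_fasta is a hypothetical helper, and the sequence text is simply concatenated without any validation of the residue or base alphabet.

```python
def split_fasta(text):
    """Split FASTA text into (header, sequence) pairs, one per chain.

    The sequence lines are joined as-is: nothing is checked or parsed,
    so protein and nucleotide records are handled identically.
    """
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    example = (
        ">1ABC:A|PDBID|CHAIN|SEQUENCE\n"
        "MKV\n"
        "LAA\n"
        ">1ABC:B|PDBID|CHAIN|SEQUENCE\n"
        "GATTACA\n"
    )
    for row in split_fasta(example):
        print(row)
```

Each returned pair corresponds to one output row; the entry ID, chain labels and sequences in the example are made up for illustration.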

This node was developed by Vernalis (Cambridge, UK). For feedback and more information, please contact knime@vernalis.com

Node details

Input ports

Type: Table
Appended sequences
Input table containing a column of FASTA sequence files downloaded from the RCSB PDB

Output ports

Type: Table
Out-Port 0
Output table with the chains and sequences extracted into separate columns according to the options specified
