FASTA Sequence Extractor

Manipulator

This node extracts the sequences for all chains listed in a FASTA file. For multi-chain FASTA files, a new row is added for each chain. Additional columns are added according to the source type selected in the drop-down, as listed below; the properties extracted from the header are shown as {property}:

  • GenBank >gi|{gi-number}|gb|{accession}|{locus}
  • EMBL Data Library >gi|{gi-number}|emb|{accession}|{locus}
  • DDBJ, DNA Database of Japan >gi|{gi-number}|dbj|{accession}|{locus}
  • NBRF PIR >pir||{entry}
  • Protein Research Foundation >prf||{name}
  • SWISS-PROT >sp|{accession}|{name}
  • PDB >pdb|{entry}|{chain} or >{entry}:{chain}|PDBID|CHAIN|SEQUENCE
  • Patents >pat|{country}|{number}
  • GenInfo Backbone Id >bbs|{number}
  • General database identifier >gnl|{database}|{identifier}
  • NCBI Reference Sequence >ref|{accession}|{locus}
  • Local Sequence identifier >lcl|{identifier}
  • Other (No properties extracted)

FASTA files can be retrieved for PDB entries using the PDB Downloader nodes.

NOTE: No checking of the FASTA header format is implemented, so selecting the wrong source type may give unpredictable results, although the node should still execute in these circumstances. No sequence parsing is implemented, and the processing is type-agnostic (protein, nucleotide, etc.).
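
The header layouts listed above can be illustrated with a short Python sketch. This is not the node's own implementation (which is not shown here); the names `HEADER_PATTERNS` and `extract_properties` are hypothetical, only three of the source types are covered, and returning an empty result for a mismatched header is a choice made for this sketch rather than documented node behaviour.

```python
import re

# Hypothetical patterns mirroring three of the header layouts listed above.
# Group names correspond to the {property} placeholders in the list.
HEADER_PATTERNS = {
    "GenBank": [
        r">gi\|(?P<gi_number>[^|]*)\|gb\|(?P<accession>[^|]*)\|(?P<locus>.*)",
    ],
    "SWISS-PROT": [
        r">sp\|(?P<accession>[^|]*)\|(?P<name>.*)",
    ],
    "PDB": [
        r">pdb\|(?P<entry>[^|]*)\|(?P<chain>.*)",
        r">(?P<entry>[^:|]*):(?P<chain>[^|]*)\|PDBID\|CHAIN\|SEQUENCE",
    ],
    "Other": [],  # no properties extracted
}


def extract_properties(header, source_type):
    """Return a dict of properties found in a FASTA header line.

    An empty dict is returned when no pattern for the chosen source type
    matches (an assumption of this sketch, not documented node behaviour).
    """
    for pattern in HEADER_PATTERNS[source_type]:
        match = re.fullmatch(pattern, header.strip())
        if match:
            return match.groupdict()
    return {}


# Made-up example headers, for illustration only.
pdb_props = extract_properties(">1ABC:A|PDBID|CHAIN|SEQUENCE", "PDB")
sp_props = extract_properties(">sp|P69905|HBA_HUMAN", "SWISS-PROT")
wrong_type = extract_properties(">sp|P69905|HBA_HUMAN", "GenBank")
```

Selecting a source type that does not match the header, as in the last call, simply finds nothing to extract in this sketch, which is one way to see why the choice in the drop-down matters.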

This node was developed by Vernalis (Cambridge, UK). For feedback and more information, please contact knime@vernalis.com

Input Ports

  1. Port Type: Data
    Input table containing a column of FASTA sequence files downloaded from the RCSB PDB

Output Ports

  1. Port Type: Data
    Output table with the chains and sequences extracted into separate columns according to the options specified
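
As a rough illustration of the one-row-per-chain output, the following Python sketch splits the text of a multi-chain FASTA file into (header, sequence) pairs. The function name `split_fasta` and the example entry and sequences are made up for illustration; the node's actual code may work differently.

```python
def split_fasta(text):
    """Yield (header, sequence) pairs, one per '>' record in the FASTA text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            # Sequence lines are joined as-is; no sequence parsing is done.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Made-up two-chain FASTA content standing in for one input cell.
fasta_cell = """>1ABC:A|PDBID|CHAIN|SEQUENCE
MKTAYIAK
QRQISFVK
>1ABC:B|PDBID|CHAIN|SEQUENCE
GSHMLEDP
"""

# One output row per chain.
rows = list(split_fasta(fasta_cell))
```

Each pair in `rows` corresponds to one output row; the header would then be broken into property columns according to the selected source type.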