
PDB Sequence Extractor


The PDB Sequence Extractor node extracts all chain sequences from a PDB cell. A new row is added to the output table for each chain, and the chain ID is always added. The sequences can be enumerated in any of 4 ways:

  • ‘Raw’ 3-letter sequence(s) from the SEQRES records
  • ‘Sanitized’ 1-letter sequence(s) from the SEQRES records (This option should give identical results to those obtained from the PDB FASTA file download and FASTA Sequence Extractor node)
  • ‘Raw’ 3-letter sequence(s) from the co-ordinates block
  • ‘Sanitized’ 1-letter sequence(s) from the co-ordinates block
If co-ordinate sequences are extracted, a Model ID column is also included in the output. Optionally, HETATM records can be included in the co-ordinates-derived sequence(s). If no sequences are selected, only a list of chains is returned. This list consists of all chains found in the SEQRES or co-ordinate blocks (the latter respecting the Include HETATM option setting), regardless of which sequences are extracted.
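
The 'Raw' extraction modes can be illustrated with a minimal Python sketch. This is not the node's implementation: the function names `seqres_raw` and `coordinate_raw` are hypothetical, the column positions follow the fixed-width PDB file format, and treating a file without MODEL records as model 1 is an assumption made here for illustration.

```python
def seqres_raw(pdb_text):
    """Return {chain_id: [3-letter residue names]} from SEQRES records."""
    chains = {}
    for line in pdb_text.splitlines():
        if line.startswith("SEQRES"):
            chain_id = line[11]  # column 12: chain identifier
            # columns 20-70 hold up to 13 residue names per record
            chains.setdefault(chain_id, []).extend(line[19:70].split())
    return chains


def coordinate_raw(pdb_text, include_hetatm=False):
    """Return {(model_id, chain_id): [3-letter residue names]} from the
    co-ordinates block, optionally including HETATM records."""
    records = ("ATOM  ", "HETATM") if include_hetatm else ("ATOM  ",)
    model_id = 1  # assumption: files without MODEL records count as model 1
    chains = {}
    last_seen = {}  # last (residue number, insertion code) seen per chain
    for line in pdb_text.splitlines():
        if line.startswith("MODEL"):
            model_id = int(line[5:].split()[0])
        elif line.startswith(records):
            key = (model_id, line[21])  # column 22: chain identifier
            residue_key = (line[22:26], line[26:27])
            # Many atoms share one residue; record each residue only once.
            if last_seen.get(key) != residue_key:
                chains.setdefault(key, []).append(line[17:20].strip())
                last_seen[key] = residue_key
    return chains
```

Each dictionary entry corresponds to one output row: the key supplies the chain ID (and, for co-ordinate sequences, the Model ID), and the value is the raw 3-letter sequence.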

'Sanitization' proceeds as follows, matching the process implemented by the PDB as closely as possible:

  • Phosphorylated, Sulfated, Acylated and Side-chain Methylated amino acids are converted to their unmodified parents
  • D-Amino acids are converted to their L-Amino acid counterparts
  • DNA residues (e.g. DA) are converted to the corresponding RNA residue (e.g. A)
For SEQRES residues, the mappings are taken from the MODRES records in the PDB file. For co-ordinate sequences, the mappings come from a built-in dictionary, in case the MODRES records are incomplete. 'X' is used for non-deciphered residues, and '?' for sequence gaps in the co-ordinate sequences.
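
The sanitization rules can be sketched in Python as follows. This is an illustration under stated assumptions, not the node's code: `parse_modres`, `sanitize`, `BUILTIN_PARENTS` and `ONE_LETTER` are hypothetical names, and the dictionaries are a deliberately small subset rather than the node's built-in dictionary. The '?' gap marker is not modelled, since the description does not say how gaps are detected.

```python
# Standard residues and their 1-letter codes (amino acids plus RNA bases).
ONE_LETTER = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
    "A": "A", "C": "C", "G": "G", "U": "U",
}

# Illustrative subset only (assumption): modified residue -> parent residue.
BUILTIN_PARENTS = {
    "SEP": "SER", "TPO": "THR", "PTR": "TYR",  # phosphorylated
    "TYS": "TYR",                              # sulfated
    "MLY": "LYS",                              # side-chain methylated
    "DAL": "ALA",                              # D-amino acid
    "DA": "A", "DC": "C", "DG": "G",           # DNA -> RNA residue
}


def parse_modres(pdb_text):
    """Return {modified residue name: standard residue name} from MODRES records."""
    mapping = {}
    for line in pdb_text.splitlines():
        if line.startswith("MODRES"):
            # columns 13-15: modified residue; columns 25-27: standard residue
            mapping[line[12:15].strip()] = line[24:27].strip()
    return mapping


def sanitize(residues, parent_map):
    """Convert 3-letter residue names to a 1-letter string.

    Modified residues are first mapped to their parents via parent_map;
    anything still unrecognised becomes 'X'.
    """
    letters = []
    for res in residues:
        parent = parent_map.get(res, res)
        letters.append(ONE_LETTER.get(parent, "X"))
    return "".join(letters)
```

For a SEQRES sequence, the mapping would come from `parse_modres`; for a co-ordinate sequence, from the built-in dictionary. For example, `sanitize(["MET", "ALA", "SEP"], {"SEP": "SER"})` returns `"MAS"`.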

This node was developed by Vernalis (Cambridge, UK). For feedback and more information, please contact knime@vernalis.com

Node details

Input ports
  1. Type: Table
    In-Port name
    Input table containing a column of PDB Cells
Output ports
  1. Type: Table
    Appended sequence(s)
    Table with one or more sequence columns appended

