PDB Sequence Extractor


The PDB Sequence Extractor node extracts all chain sequences from a PDB cell. A new row is added to the output table for each chain, and the chain ID is always added. The sequences can be enumerated in any of 4 ways:

  • ‘Raw’ 3-letter sequence(s) from the SEQRES records
  • ‘Sanitized’ 1-letter sequence(s) from the SEQRES records (this option should give results identical to those obtained by downloading the PDB FASTA file and using the FASTA Sequence Extractor node)
  • ‘Raw’ 3-letter sequence(s) from the co-ordinates block
  • ‘Sanitized’ 1-letter sequence(s) from the co-ordinates block
If co-ordinate sequences are extracted, a Model ID column is also included in the output. Optionally, HETATM records can be included in the co-ordinate-derived sequence(s). If no sequences are selected, only a list of chains is returned. The list of chains consists of all chains found in the SEQRES or co-ordinate blocks (the latter respecting the Include HETATM option setting), regardless of which sequences are extracted.
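As an illustration of the 'raw' SEQRES option, the following Python sketch groups the 3-letter residue names by chain ID. It is not the node's own implementation: the function name `seqres_chains` is hypothetical, and the code assumes the standard fixed-column SEQRES layout (chain ID in column 12, residue names from column 20 onwards).

```python
def seqres_chains(pdb_text):
    """Return {chain_id: [3-letter residue names]} built from SEQRES records.

    Assumes the fixed-column PDB layout: chain ID in column 12 and up to
    13 residue names per record in columns 20-70.
    """
    chains = {}
    for line in pdb_text.splitlines():
        if not line.startswith("SEQRES"):
            continue
        chain_id = line[11]              # column 12 (0-based index 11)
        residues = line[19:70].split()   # columns 20-70
        # A chain usually spans several SEQRES records, so keep appending.
        chains.setdefault(chain_id, []).extend(residues)
    return chains


sample = "SEQRES   1 A    3  MET LYS SEP"
print(seqres_chains(sample))  # {'A': ['MET', 'LYS', 'SEP']}
```

Each key of the returned dictionary would correspond to one output row, with the residue list forming the 'raw' 3-letter sequence for that chain.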

'Sanitization' follows the process implemented by the PDB as closely as possible, and works as follows:

  • Phosphorylated, Sulfated, Acylated and Side-chain Methylated amino acids are converted to their unmodified parents
  • D-Amino acids are converted to their L-Amino acid counterparts
  • DNA residues (e.g. DA) are converted to the corresponding RNA residue (e.g. A)
For SEQRES residues, the mappings are taken from the MODRES records in the PDB file. For co-ordinate sequences, the mappings are taken from a built-in dictionary, in case the MODRES records are incomplete. 'X' is used for residues that cannot be deciphered, and '?' for sequence gaps in the co-ordinate sequences.
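The sanitization rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the node's actual code: the names `modres_map` and `sanitize` are hypothetical, the lookup tables are deliberately incomplete, and representing a sequence gap as `None` is a convention chosen only for this example. MODRES parsing assumes the standard fixed-column layout (modified residue name in columns 13-15, standard parent in columns 25-27).

```python
# Standard 3-letter amino acid codes plus the four RNA residues.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
    "A": "A", "C": "C", "G": "G", "U": "U",
}

# DNA -> RNA, following the DA -> A example in the text.
# Illustrative and incomplete: DT is left out because the text
# does not say how it is mapped.
DNA_TO_RNA = {"DA": "A", "DC": "C", "DG": "G"}


def modres_map(pdb_text):
    """Return {modified residue name: parent residue name} from MODRES records."""
    mapping = {}
    for line in pdb_text.splitlines():
        if line.startswith("MODRES"):
            modified = line[12:15].strip()   # columns 13-15
            parent = line[24:27].strip()     # columns 25-27
            if modified and parent:
                mapping[modified] = parent
    return mapping


def sanitize(residues, modres=None):
    """Convert 3-letter residue names to a 1-letter sequence string.

    `residues` is a list of residue names; None marks a sequence gap
    (an assumption made for this sketch).
    """
    modres = modres or {}
    out = []
    for res in residues:
        if res is None:
            out.append("?")                          # gap in the co-ordinates
            continue
        parent = modres.get(res, res)                # modified or D- residue -> parent
        parent = DNA_TO_RNA.get(parent, parent)      # DNA residue -> RNA residue
        out.append(THREE_TO_ONE.get(parent, "X"))    # undeciphered -> 'X'
    return "".join(out)
```

For example, `sanitize(["MET", "SEP", None, "GLY"], {"SEP": "SER"})` returns `"MS?G"`, whereas without the MODRES mapping the phosphoserine would come out as `X`.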

This node was developed by Vernalis (Cambridge, UK). For feedback and more information, please contact

Input Ports

  1. Port Type: Data
    Input table containing a column of PDB Cells

Output Ports

  1. Port Type: Data
    Table with one or more sequence columns appended