The PDB Sequence Extractor node extracts all chain sequences from a PDB cell. One row is added to the output table for each chain, and the chain ID is always included. The sequences can be output in any of four forms:
- ‘Raw’ 3-letter sequence(s) from the SEQRES records
- ‘Sanitized’ 1-letter sequence(s) from the SEQRES records (This option should give identical results to those obtained from the PDB FASTA file download and FASTA Sequence Extractor node)
- ‘Raw’ 3-letter sequence(s) from the co-ordinates block
- ‘Sanitized’ 1-letter sequence(s) from the co-ordinates block
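The difference between the two sources can be illustrated with a minimal Python sketch. It is not the node's implementation: the function names are hypothetical, the column positions follow the fixed-width PDB file format, and the sketch assumes that only ATOM records of the first model are read from the co-ordinates block (the node itself may treat HETATM records and multiple models differently).

```python
def seqres_sequences(pdb_text):
    """Return {chain_id: [3-letter residue names]} from the SEQRES records."""
    chains = {}
    for line in pdb_text.splitlines():
        if line.startswith("SEQRES") and len(line) > 19:
            chain_id = line[11]          # column 12 holds the chain ID
            names = line[19:].split()    # residue names start at column 20
            chains.setdefault(chain_id, []).extend(names)
    return chains


def coordinate_sequences(pdb_text):
    """Return {chain_id: [3-letter residue names]} from the ATOM records.

    Assumption: each residue is identified by chain ID, residue number and
    insertion code, and only the first model is read.
    """
    chains = {}
    seen = set()
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break                        # stop after the first model
        if line.startswith("ATOM  ") and len(line) >= 27:
            chain_id = line[21]          # column 22 holds the chain ID
            key = (chain_id, line[22:27])  # residue number + insertion code
            if key not in seen:
                seen.add(key)
                chains.setdefault(chain_id, []).append(line[17:20].strip())
    return chains
```

Each key of the returned dictionary corresponds to one output row. The two functions can give different results for the same entry, because residues that are listed in SEQRES but not resolved in the structure have no co-ordinates.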
‘Sanitization’ proceeds as follows, matching as closely as possible the process implemented by the PDB:
- Phosphorylated, sulfated, acylated and side-chain methylated amino acids are converted to their unmodified parents
- D-amino acids are converted to their L-amino acid counterparts
- DNA residues (e.g. DA) are converted to the corresponding RNA residue (e.g. A)
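The rules above amount to a parent-residue lookup followed by a 3-letter to 1-letter conversion. The Python sketch below shows the idea under stated assumptions: the `PARENT` table is a small illustrative subset of PDB residue codes rather than the node's actual table, the names `PARENT`, `THREE_TO_ONE` and `sanitize` are hypothetical, and writing unrecognised residues as `X` is an assumption, not documented behaviour of the node.

```python
# Standard amino acids, plus RNA residues, which are already one letter long.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
    "A": "A", "C": "C", "G": "G", "U": "U",
}

# Illustrative subset only (assumption): modified residue -> unmodified parent.
PARENT = {
    "SEP": "SER", "TPO": "THR", "PTR": "TYR",   # phosphorylated
    "TYS": "TYR",                               # sulfated
    "ALY": "LYS",                               # acylated (acetyl-lysine)
    "MLY": "LYS", "M3L": "LYS",                 # side-chain methylated
    "DAL": "ALA", "DLE": "LEU", "DSN": "SER",   # D-amino acids
    "DA": "A", "DC": "C", "DG": "G",            # DNA -> RNA
}


def sanitize(residues):
    """Convert 3-letter residue names to a sanitized 1-letter sequence."""
    letters = []
    for name in residues:
        parent = PARENT.get(name, name)               # map to parent if modified
        letters.append(THREE_TO_ONE.get(parent, "X")) # assumption: unknown -> X
    return "".join(letters)
```

For example, `sanitize(["ALA", "SEP", "DAL", "DA"])` maps phosphoserine to serine, D-alanine to alanine and DA to A before the 1-letter conversion.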
This node was developed by Vernalis (Cambridge, UK). For feedback and more information, please contact knime@vernalis.com