Extracts RDKit molecules from PNG images embedded in MS Word or MS PowerPoint files.
The images must have been generated with RDKit version 2020_09_1 or later. Since that version, RDKit adds the molecule information to the image as metadata, so the molecule can be recovered from the image.
The output contains an RDKit Molecule column and the index of the image each molecule was extracted from. (How Office products assign the image index was not tested; it presumably follows the order of insertion rather than the order of pages or slides.)
If the image was generated with
https://github.com/kienerj/molecule-slide-generator
then, in addition to the molecules, all of their properties are extracted into correspondingly named table columns.
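
The molecule recovery described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the actual implementation: it assumes the Office file is a .docx or .pptx package (a ZIP archive that keeps images under word/media or ppt/media), and the function names iter_embedded_pngs and extract_molecules are hypothetical. It covers only the molecules, not the extra properties written by molecule-slide-generator.

```python
import zipfile
from pathlib import PurePosixPath

# Folders where .docx and .pptx packages store embedded images.
MEDIA_DIRS = ("word/media", "ppt/media")


def iter_embedded_pngs(office_file):
    """Yield (name, bytes) for each PNG in a .docx or .pptx package.

    Hypothetical helper; entries come back in archive order, which is
    not guaranteed to match the order of pages or slides.
    """
    with zipfile.ZipFile(office_file) as package:
        for name in package.namelist():
            path = PurePosixPath(name)
            if str(path.parent) in MEDIA_DIRS and path.suffix.lower() == ".png":
                yield name, package.read(name)


def extract_molecules(office_file):
    """Return (image name, RDKit Mol) pairs for PNGs carrying molecule metadata."""
    # Imported lazily so the ZIP handling works without RDKit installed.
    # Requires RDKit 2020_09_1 or later.
    from rdkit import Chem

    molecules = []
    for name, png_bytes in iter_embedded_pngs(office_file):
        try:
            mol = Chem.MolFromPNGString(png_bytes)
        except Exception:
            # PNGs without usable RDKit metadata are skipped.
            mol = None
        if mol is not None:
            molecules.append((name, mol))
    return molecules
```

Calling extract_molecules("slides.pptx") (a placeholder file name) would return one entry per image that RDKit can read a molecule from; ordinary pictures are ignored.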
- Type: Table. Molecules: RDKit Molecules