This node implements Presidio's Anonymizer, which anonymizes English text data. It uses pseudonymization, so the personal information can later be reinserted into the anonymized data with the Presidio Deanonymizer node.
The node anonymizes the data in a specified string column of the input table by replacing all occurrences of the selected PII entity types with abstract placeholders. For entity types that support it, the information can instead be replaced with randomly generated information of the same type. You can choose whether the anonymized data replaces the original data or is appended in a new column.
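To make the idea of reversible placeholder replacement concrete, here is a minimal sketch in plain Python. It is not the node's or Presidio's actual implementation: the function names `pseudonymize` and `deanonymize`, the numbered placeholder format `<PERSON_1>`, and the `(start, end, entity_type)` span layout are all assumptions chosen for illustration. The sketch also assumes that the spans do not overlap.

```python
from collections import defaultdict


def pseudonymize(text, spans):
    """Replace each (start, end, entity_type) span with a numbered placeholder.

    Returns the anonymized text and a placeholder -> original mapping.
    Hypothetical helper; assumes the spans do not overlap.
    """
    counters = defaultdict(int)  # next running number per entity type
    by_value = {}                # (entity_type, original) -> placeholder
    mapping = {}                 # placeholder -> original, kept for reversal
    pieces = []
    cursor = 0
    for start, end, entity_type in sorted(spans):
        original = text[start:end]
        key = (entity_type, original)
        if key not in by_value:
            # First time this value is seen: assign a new placeholder.
            counters[entity_type] += 1
            by_value[key] = f"<{entity_type}_{counters[entity_type]}>"
            mapping[by_value[key]] = original
        pieces.append(text[cursor:start])
        pieces.append(by_value[key])
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces), mapping


def deanonymize(text, mapping):
    """Reinsert the original values using the stored mapping."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text


text = "Alice Smith emailed bob@example.com. Alice Smith got a reply."
name, email = "Alice Smith", "bob@example.com"

# Compute the character offsets instead of hard-coding them.
first = text.index(name)
second = text.index(name, first + 1)
mail_at = text.index(email)
spans = [
    (first, first + len(name), "PERSON"),
    (mail_at, mail_at + len(email), "EMAIL_ADDRESS"),
    (second, second + len(name), "PERSON"),
]

anonymized, mapping = pseudonymize(text, spans)
restored = deanonymize(anonymized, mapping)
```

Because repeated values share one placeholder, the anonymized text keeps the information that both mentions refer to the same person, and the stored mapping is all that is needed to restore the original text.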
By default, this node detects the PII entities itself before anonymizing them. Since Presidio may mistakenly detect words as PII, you can instead connect a table that has the output columns of the Presidio Analyzer node to the dynamic port. The Presidio Anonymizer will then anonymize only the entities stored in that table.
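The difference between the two modes can be sketched as follows. This is an illustrative toy in plain Python, not the node's code: `naive_detect` is a deliberately crude stand-in for automatic detection, and the dictionary keys `entity_type`, `start` and `end` are assumed column names rather than the confirmed layout of the Presidio Analyzer node's output table.

```python
import re


def naive_detect(text):
    """Toy stand-in for automatic detection: flags every capitalized word as PERSON."""
    return [
        {"entity_type": "PERSON", "start": m.start(), "end": m.end()}
        for m in re.finditer(r"\b[A-Z][a-z]+\b", text)
    ]


def anonymize(text, supplied_entities=None):
    """Replace entities with <ENTITY_TYPE>; detect only when no table is supplied."""
    if supplied_entities is None:
        entities = naive_detect(text)
    else:
        entities = supplied_entities
    # Replace from right to left so the earlier offsets stay valid.
    for e in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[: e["start"]] + f"<{e['entity_type']}>" + text[e["end"]:]
    return text


text = "Please call Anna tomorrow."

# Default mode: the toy detector wrongly flags "Please" as a person.
automatic = anonymize(text)

# Reviewed mode: keep only the detections that are real PII, then pass them in.
reviewed = [e for e in naive_detect(text) if text[e["start"]:e["end"]] == "Anna"]
curated = anonymize(text, reviewed)
```

In this sketch the automatic run yields `<PERSON> call <PERSON> tomorrow.`, while the run with the reviewed entities yields `Please call <PERSON> tomorrow.`, which mirrors why a filtered analyzer table can be preferable to detection inside the anonymizer.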
Warning: Presidio can help identify sensitive/PII data in structured and unstructured text. However, because it uses automated detection mechanisms, there is no guarantee that Presidio will find all sensitive information. Therefore, always evaluate the quality of detections and take appropriate measures if necessary.