This node implements Presidio's Analyzer, which detects Personally Identifiable Information (PII) in English text data.
The node analyzes the data of a specified string column of the input table for specified PII entity types. It adds the detected entities to the input table by appending the following columns:
- Entity Type: the entity type of the detected PII entity
- Entity: the piece of text that is recognized as PII
- Start: the index of the first character of the entity in the text
- End: the index of the last character of the entity in the text
- Score: Presidio's confidence in the detection, as a value between 0 and 1
- Row: the original row ID from the input table
Input rows that contain multiple entities are ungrouped, so that each output row contains exactly one entity.
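The output layout described above can be illustrated with a minimal Python sketch. It is not the node's implementation. The detector `detect_emails` is a hypothetical regex stand-in rather than Presidio, and the names `Detection` and `ungroup` are invented for illustration. The sketch also assumes that rows without any detected entity produce no output rows, which the description above does not state. It uses Python's slice convention, where `end` is one past the last character, so its End value may differ by one from the node's End column.

```python
import re
from typing import Callable, NamedTuple


class Detection(NamedTuple):
    entity_type: str
    start: int    # index of the first character of the entity
    end: int      # one past the last character (Python slice convention)
    score: float  # confidence between 0 and 1


# Hypothetical stand-in detector: a simple e-mail regex, NOT Presidio.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def detect_emails(text: str) -> list[Detection]:
    """Return one Detection per e-mail-like match in the text."""
    return [
        Detection("EMAIL_ADDRESS", m.start(), m.end(), 1.0)
        for m in EMAIL.finditer(text)
    ]


def ungroup(rows: dict[str, str],
            detect: Callable[[str], list[Detection]]) -> list[dict]:
    """Emit one output row per detected entity, keeping the original row ID."""
    out = []
    for row_id, text in rows.items():
        for d in detect(text):
            out.append({
                "Entity Type": d.entity_type,
                "Entity": text[d.start:d.end],
                "Start": d.start,
                "End": d.end,
                "Score": d.score,
                "Row": row_id,
            })
    return out


rows = {
    "Row0": "Contact ann@example.com or bob@example.org.",
    "Row1": "No PII here.",
}
table = ungroup(rows, detect_emails)
for r in table:
    print(r)
```

With the `presidio-analyzer` Python package installed, the stand-in could be swapped for a small wrapper around `AnalyzerEngine().analyze(text=text, entities=[...], language="en")`, whose results expose `entity_type`, `start`, `end`, and `score`.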
Further information on the Presidio Analyzer can be found on the Microsoft Presidio website.
Warning: Presidio can help identify sensitive/PII data in structured and unstructured text. However, because it uses automated detection mechanisms, there is no guarantee that Presidio will find all sensitive information. Therefore, always evaluate the quality of the detections and take appropriate measures if necessary.