This workflow uses Presidio nodes within KNIME to anonymize sensitive information while working with third-party AI models in the KNIME Analytics Platform environment.
Using the Presidio nodes, you can easily identify Personally Identifiable Information (PII) and anonymize it before passing it to chat large language models, significantly reducing the risk of sensitive data leaks and ensuring a secure data environment.
You can download and run the workflow directly in your KNIME Analytics Platform. For optimal performance, we recommend using the latest version of the KNIME Analytics Platform.
Workflow Details
Creating Fake Customer Message
A fake bank customer message is created, complaining about some unauthorized transactions on their credit card, along with the system message to use in the Chat Model Prompter node.
Identifying PII with Presidio Analyzer
The Presidio Analyzer node identifies the following PII entities: Credit Card, Email, and Person. After running the node, we check how many entities the node has found.
Anonymizing PII with Presidio Anonymizer
Using the information from the Presidio Analyzer node, the Presidio Anonymizer node anonymizes the entities with random ones (names and emails).
Passing Anonymized Data to Chat Model
Once the data is anonymized, the customer issue is passed to OpenAI using the Chat Model Prompter node. The node is configured as follows:
System Message: Instruct the AI to behave like a bank's virtual assistant.
User Message: Contains the request from the customer.
Past Conversations: Includes a welcome message.
Deanonymizing AI Response
When the AI assistant produces the response, the generated content is passed to the Presidio Deanonymizer node to restore the original entities. In the node output you can compare the anonymized and deanonymized outputs.
Visualizing Chat
Finally, the chat is plotted with the original data from the customer.