This node provides an open-source framework for detecting potential vulnerabilities in the provided GenAI workflow. It evaluates the workflow by combining heuristics-based and LLM-assisted detectors. Giskard uses the provided LLM for the evaluation but applies different model parameters for some of the detectors. The effectiveness of the LLM-assisted detectors can be improved by providing an optional input table with common example prompts for the workflow.
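For orientation, the snippet below is a minimal sketch of how a workflow and the optional example-prompt table map onto the Giskard Python library's scan inputs (the node performs this wrapping internally); the function, model name, description, and prompts are illustrative placeholders.

```python
import pandas as pd
import giskard

def run_workflow(df: pd.DataFrame):
    # Placeholder for the GenAI workflow under test:
    # return one generated response per prompt row.
    return [f"(response to) {prompt}" for prompt in df["prompt"]]

# The model name and description are passed to the evaluation LLM and help
# the LLM-assisted detectors generate relevant test inputs.
model = giskard.Model(
    model=run_workflow,
    model_type="text_generation",
    name="Customer support assistant",
    description="Answers customer questions about the product catalog.",
    feature_names=["prompt"],
)

# Optional table of common example prompts for the workflow.
dataset = giskard.Dataset(pd.DataFrame({
    "prompt": [
        "How do I reset my password?",
        "Which payment methods do you accept?",
    ],
}))
```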
The node uses detectors for the following vulnerabilities:
- Hallucination and Misinformation: Detects if the workflow is prone to generating fabricated or false information.
- Harmful Content: Detects if the workflow is prone to producing content that is unethical, illegal, or otherwise harmful.
- Prompt Injection: Detects if the workflow's behavior can be altered via a variety of prompt injection techniques.
- Robustness: Detects if the workflow is sensitive to small perturbations in the input that result in inconsistent responses.
- Stereotypes: Detects stereotype-based discrimination in the workflow responses.
- Information Disclosure: Attempts to make the workflow disclose sensitive information such as secrets or personally identifiable information. May produce false positives if the workflow is expected to output information that could be considered sensitive, such as contact information for a business.
- Output Formatting: Checks that the workflow output is consistent with the format requirements indicated in the model description, if such instructions are provided.
This node does not utilize Giskard's LLMCharsInjectionDetector. For more details on LLM vulnerabilities, refer to the Giskard documentation.
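Under the same assumptions as the sketch above, running a scan over the detector categories listed here is a single call when using the Giskard Python library directly; the `only` argument restricts the scan to a subset of detectors, which is how a check such as character injection can be left out. The detector tag names below are illustrative assumptions, not necessarily the exact identifiers used by this node.

```python
# Run the scan with a subset of detectors. The tag names passed to `only`
# are illustrative; see the Giskard documentation for the exact identifiers.
scan_results = giskard.scan(
    model,
    dataset,
    only=["hallucination", "harmfulness", "jailbreak"],
)

print(scan_results)                          # summary of detected issues
scan_results.to_html("giskard_report.html")  # exportable HTML report
```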
To run the LLM-assisted detectors, Giskard sends the following information to the language model provider:
- Data provided in your Dataset
- Text generated by your model
- Model name and description
Note that this does not apply if a self-hosted model is used.
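When the Giskard Python library is used directly with a self-hosted model, the evaluation LLM can be pointed at a local endpoint so that prompts and responses stay within your own infrastructure. The sketch below assumes a recent Giskard version that exposes `giskard.llm.set_llm_model`; the model identifier and endpoint are placeholders.

```python
import giskard

# Point the LLM-assisted detectors at a self-hosted model (placeholder values).
giskard.llm.set_llm_model(
    "ollama/llama3",                    # hypothetical local model identifier
    api_base="http://localhost:11434",  # hypothetical local endpoint
)
```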
More information on Giskard can be found in the documentation.