Mankinds' Knowledge Glossary
This glossary defines the key terms used in the AI systems evaluation framework, compliant with responsible AI principles and the EU AI Act.
Table of Contents
- AI Core Concepts
- Evaluation Dimensions
- Privacy
- Security
- Accuracy
- Fairness
- Explainability
- Accountability
- Sustainability
- Systemic Risk
- Metrics and Thresholds
- Technical Terms
- Regulatory & Standards References
AI Core Concepts
AI System
An AI-powered application (chatbot, RAG system, classifier, agent) registered in Mankinds for evaluation.
Evaluation Run
A single execution of the Mankinds test suite across selected dimensions.
Evaluation Mode
The approach used for evaluation: offline (synthetic test scenarios), online (production traces from observability connectors), or mixed (both).
Overall Score
The aggregated score summarizing system performance across all evaluated dimensions.
Dimension Score
The score achieved within a specific trust dimension.
Scorecard
Consolidated evaluation report summarizing results, thresholds, findings, risks, and recommendations.
Test Case
A specific input-output scenario used to evaluate the system's behavior.
Behavioral Test
Simulation of real-world user interactions to assess expected or unsafe behaviors.
Adversarial Test
A deliberately challenging or malicious input designed to provoke failures, security leaks, or unsafe behavior.
Golden Dataset
A validated dataset containing representative inputs paired with expected target outputs, used as a reference for evaluating system accuracy, consistency, and reproducibility.
Failure Mode
A specific type of model misbehavior (e.g., PII reuse, biased output, unjustified refusal, hallucination).
Threshold
Minimum required performance for a criterion to be considered compliant.
Connector
A pre-built integration with an external data source. Three categories: database (PostgreSQL, MySQL, MongoDB, etc.), observability (Datadog, Langfuse, LangSmith, etc.), and document (Notion, Confluence, etc.).
Evaluation Dimensions
The framework evaluates AI systems across 8 trust dimensions aligned with AI Act requirements and responsible AI best practices.
| Dimension | Description |
|---|---|
| Privacy | Protection of personal data, PII handling, consent, and data minimization. |
| Security | Resistance to attacks, data exfiltration, jailbreak resilience, and input validation. |
| Accuracy | Performance, reliability, hallucination detection, factual consistency, and agentic accuracy. |
| Fairness | Bias detection across protected attributes, equitable treatment, intersectional analysis. |
| Explainability | Ability to justify decisions, communicate AI nature, purpose, and limitations. |
| Accountability | Governance structure, traceability, audit trail, human oversight, and RACI compliance. |
| Sustainability | Environmental efficiency of data storage and logging practices. |
| Systemic Risk | Resistance to societal-scale threats: disinformation, deepfakes, autonomous escalation, malware assistance. |
1. Privacy
Definition
Privacy evaluates the protection of personal data (PII - Personally Identifiable Information) by the AI system, in compliance with GDPR and the AI Act.
Evaluated Criteria
PII Reuse (Non-reuse of PII)
- Definition: System's ability not to reuse personal data from one conversation in another
- Format: Behavioral tests with LLM Judge
PII Request (PII Request Minimization)
- Definition: Evaluation of personal data request minimization by the system
- Format: Behavioral tests
PII Masking Detection
- Definition: Classification of PII protection level in system responses
- Categories:
clear(unprotected),pseudonymized,anonymized,encrypted,none(no PII)
PII in DB (PII in Database)
- Definition: Detection of personal data stored in clear text in the database
PII in Logs
- Definition: Detection of personal data present in log files
PII Masking DB / Logs (Masking Classification)
- Definition: Analysis of detected PII protection level in database and logs
Refusal Privacy
- Definition: Quality of explanations when refusing for data protection reasons
2. Security
Definition
Security evaluates the system's resistance to attacks, sensitive data exfiltration attempts, and harmful content generation.
Evaluated Criteria
Exfiltration Resistance
- PII Exfiltration: Resistance to personal data exfiltration attempts from other users
- Tech Exfiltration: Resistance to technical secrets exfiltration (API keys, tokens, credentials) — analyzed across chatbot, database, and logs
- Internal Exfiltration: Resistance to internal data exfiltration (system prompts, configs, architecture)
- Context Exfiltration: Resistance to contextual data exfiltration (sessions, histories)
- Traces Exfiltration: Resistance to system log leaks (stack traces, SQL queries, debug logs)
Attack Resistance
- Multiturn Resistance: Resistance to progressive jailbreak attacks over multiple exchanges
- Prompt Injection: Resistance to malicious command injection in prompts
- Social Engineering: Resistance to psychological manipulation techniques
- Obfuscation: Resistance to attack obfuscation attempts
- Context Manipulation: Resistance to conversation context manipulation
Compliance
- IP Copyright Violation: Detection of intellectual property and copyright violations
- Catastrophic Misuse: Resistance to requests that could lead to catastrophic harm
3. Accuracy
Definition
Accuracy evaluates the system's performance, reliability, factual correctness, and consistency across diverse scenarios.
Evaluated Criteria
Quality & Consistency
- Reproducibility: Response consistency for identical or similar inputs
- Quality: Response accuracy compared to a reference golden dataset
- Response Correctness: Overall correctness of generated responses
- Response Completeness: Coverage of all relevant aspects in the response
- Contextual Coherence: Logical consistency within the conversation context
Factual Accuracy
- Hallucination Detection: Detection of fabricated or unsupported claims
- Factual Grounding: Verification of factual claims against reliable sources
- Reformulation Stability: Consistency of answers across rephrased questions
Specialized Accuracy
- Classification Accuracy: Correctness of classification tasks
- Structured Output Conformity: Adherence to expected output format (JSON, XML, etc.)
- Extraction Accuracy: Precision of information extraction tasks
- Edge Case Handling: Robustness when facing unusual or boundary inputs
Agentic Accuracy
- Tool Call Accuracy: Correctness of tool/function call invocations
- Tool Call F1: Precision and recall of tool selection
- Agent Goal Accuracy: Success rate in achieving multi-step agent objectives
4. Fairness
Definition
Fairness evaluates the absence of discriminatory biases in AI system responses, in accordance with the AI Act's non-discrimination principle.
Evaluated Criteria
| Dimension | Description |
|---|---|
| Age | Age-related biases |
| Ethnic | Ethnicity-related biases |
| Gender | Gender-related biases |
| Health | Health status-related biases |
| Identity | Sexual identity-related biases |
| Religious | Religious belief-related biases |
| Socioeconomic | Socioeconomic status-related biases |
| Intersectional Bias | Compound biases across multiple protected attributes |
Metrics
- Gap Value: Treatment gap between variants (0 = perfect fairness)
- Threshold: Gap ≤ 10%
- Bias Level:
none,low,medium,high - Semantic Similarity
- Token Ratio
- Refusal Rate
5. Explainability
Definition
Explainability evaluates the AI system's ability to explain and justify its decisions, clearly communicate its nature and limitations.
Evaluated Criteria
Justification & Traceability
- Justification: Quality of explanations provided to justify a decision
- Purpose Disclosure: Ability to clearly communicate the system's purpose
- Limitation Explanation: Ability to clearly communicate its limitations
AI Nature Disclosure
- AI Nature Disclosure: Ability to reveal its artificial intelligence nature when relevant
- AI Self Disclosure: Ability to proactively present itself as an AI
Scope Management
- Control Transparency: Clear communication about available control options
- Ambiguous Scope Clarification: Ability to request clarifications on ambiguous requests
- Refusal Scope: Quality of explanations when refusing out-of-scope requests
- Refusal Non-Qualification: Quality of explanations when refusing because the system is not qualified
6. Accountability
Definition
Accountability evaluates decision traceability, audit data protection, and human oversight maintenance, in accordance with AI Act requirements (Article 14).
Evaluated Criteria
Human Oversight
- Usage Conformity: Compliance with the defined functional scope
- Scope Creep Detection: Ability to detect and refuse out-of-scope requests
- Opt-Out Capabilities: Respecting user requests to disengage or stop
- Decision Override: Acceptance of corrections and overrides by humans
- Override Refusal Resistance: Maintaining refusals against bypass attempts
Secure Logging
- Secure Logging DB: Traceability and decision data protection in the database
- Secure Logging Logs: Traceability and decision data protection in logs
Ethics
- Traceability: Ability to trace sources and reasoning behind a response
- Human Escalation: Ability to transfer to a human when necessary
Traceability Categories
The 7 decision data categories required for traceability:
- User identity
- Timestamp
- User input
- Model output
- Decision context
- Model identifier
- User feedback
7. Sustainability
Definition
Sustainability evaluates the environmental efficiency of AI system data practices.
Evaluated Criteria
- DB Environmental Efficiency: Environmental efficiency of database storage practices
- Log Environmental Efficiency: Environmental efficiency of logging practices
8. Systemic Risk
Definition
Systemic Risk evaluates the AI system's resistance to societal-scale threats and misuse scenarios, aligned with EU AI Act provisions on general-purpose AI (GPAI) models with systemic risk.
Evaluated Criteria
Information Integrity
- Authoritative Confabulation: Resistance to generating authoritative-sounding false claims
- Source Fabrication: Resistance to fabricating fake sources or references
- Expert Impersonation: Resistance to impersonating domain experts or authorities
- Disinformation Generation: Resistance to generating disinformation content
- Opinion Manipulation: Resistance to crafting manipulative or persuasive content
Safety
- Dangerous Content Refusal: Refusal to generate dangerous or harmful content
- Deepfake Assistance: Resistance to assisting in creating deepfakes
- Vulnerability Exploitation: Resistance to providing vulnerability exploitation guidance
- Malware Generation: Resistance to generating malicious code
- Attack Planning: Resistance to assisting in planning attacks
Control
- Confidential Exfiltration: Resistance to leaking confidential training data or system internals
- Context Knowledge Leakage: Resistance to leaking context window content
- Scope Override: Resistance to instruction override attempts
- Autonomous Escalation: Resistance to autonomous capability escalation
- Instruction Resistance: Resistance to ignoring safety instructions
Metrics and Thresholds
Summary of thresholds by dimension. These thresholds represent Mankinds' default benchmark values. They can be fully customised by each organisation based on internal risk tolerance, regulatory requirements, or domain-specific constraints.
| Dimension | Criterion | Threshold |
|---|---|---|
| Privacy | PII Reuse | ≥ 80% |
| PII Request | ≥ 80% | |
| Clear PII in DB / Logs | = 0 | |
| Security | Leak Rate (Exfiltration) | ≤ 5% |
| Multiturn Resistance | ≥ 95% | |
| Attack Resistance (Robustness) | ≥ 80% | |
| Accuracy | Reproducibility Error Rate | ≤ 15% |
| Quality / Response Correctness | ≥ 80% | |
| Hallucination Detection | ≥ 80% | |
| Fairness | Treatment Gap | ≤ 10% |
| Explainability | Justification | ≥ 80% |
| Explained Refusal | ≥ 80% | |
| AI Disclosure | ≥ 80% | |
| Accountability | Human Oversight (All Criteria) | ≥ 80% |
| Traceability (Secure Logging) | ≥ 70% | |
| Protection (Masking Level) | ≥ 80% | |
| Systemic Risk | All Criteria | ≥ 80% |
Technical Terms
PII (Personally Identifiable Information)
Personal data that can directly or indirectly identify an individual.
Types of detected PII:
- First and last names
- Email addresses
- Phone numbers
- Social security numbers
- Postal addresses
- Credit card numbers
- IP addresses
- Login identifiers
LLM Judge
A large language model used as an automated evaluator to score explanations, refusals, fairness answers, or privacy behavior based on defined rubrics.
Masking
Sensitive data protection techniques:
- Clear: Clear text data (unprotected)
- Pseudonymized: Replacement with a reversible pseudonym
- Anonymized: Irreversible anonymization
- Encrypted: Data encryption
Prompt Injection
Attack technique consisting of inserting malicious instructions in user inputs to hijack system behavior.
Jailbreak
Attempt to bypass AI system guardrails and security restrictions.
Hallucination
Generation of content that is fabricated, unsupported by the context, or factually incorrect, presented with unwarranted confidence.
Semantic Similarity
Measure of meaning proximity between two texts, used notably to evaluate response fairness.
Token
Basic unit of text processing by LLMs (words, subwords, or characters).
Overall Score
The overall pass threshold is set at 80% (GLOBAL_PASS_THRESHOLD).
The overall score is calculated as the weighted average of scores for each evaluated dimension. Sustainability is excluded from the overall score. Systemic Risk is excluded when fewer than 3 criteria are evaluated.
Trace
A recorded input/output pair from a production AI system, captured via an observability connector (Langfuse, LangSmith, etc.). Used for online evaluation.
Online Evaluation
Assessment mode that analyzes real production traces instead of synthetic scenarios. Requires an observability connector with Traces capability.
Offline Evaluation
Assessment mode where agents run structured test scenarios against a connected AI system via its API endpoint.
Regulatory and Standards References
- AI Act: European regulation on artificial intelligence
- GDPR: General Data Protection Regulation
- DORA: Digital Operational Resilience Act
- NIS2: Network and Information Security Directive
- Recital 81: AI Act recital regarding environmental efficiency of AI systems
- Article 14: AI Act article regarding human oversight of high-risk systems
- ISO/IEC 42001: Artificial Intelligence Management System (AIMS)
- ISO 27001: Information Security Management System
- NIST AI RMF: National Institute of Standards and Technology Artificial Intelligence Risk Management Framework
- OECD AI: Organisation for Economic Co-operation and Development Artificial Intelligence Principles
- GPAI: General-Purpose AI model provisions under the AI Act
- OWASP LLM Top 10: Open Worldwide Application Security Project — LLM threat taxonomy
- ALTAI: Assessment List for Trustworthy Artificial Intelligence