Mankinds' Knowledge Glossary
This glossary defines the key terms used in the AI systems evaluation framework, compliant with responsible AI principles and the EU AI Act.
Table of Contents
- AI Core Concepts
- Evaluation Themes
- Privacy & Security
- Reliability & Performance
- Fairness & Ethics
- Explainability & Transparency
- Accountability & Responsibility
- Metrics and Thresholds
- Technical Terms
- Regulatory & Standards References
AI Core Concepts
AI System
An AI-powered application (chatbot, RAG system, classifier, agent) registered in Mankinds for evaluation.
Evaluation Run
A single execution of the Mankinds test suite across selected dimensions.
Global Score
The aggregated score summarizing system performance across all evaluated dimensions.
Dimension Score
The score achieved within a specific trust dimension.
Scorecard
Consolidated evaluation report summarizing results, thresholds, findings, risks, and recommendations.
Test Case
A specific input-output scenario used to evaluate the system's behavior.
Behavioral Test
Simulation of real-world user interactions to assess expected or unsafe behaviors.
Adversarial Test
A deliberately challenging or malicious input designed to provoke failures, security leaks, or unsafe behavior.
Golden Dataset
A validated dataset containing representative inputs paired with expected target outputs, used as a reference for evaluating system accuracy, consistency, and reproducibility.
Failure Mode
A specific type of model misbehavior (e.g., PII reuse, biased output, unjustified refusal, hallucination).
Threshold
Minimum required performance for a criterion to be considered compliant.
Evaluation Themes
The framework evaluates AI systems across 5 trust dimensions aligned with AI Act requirements and responsible AI best practices.
| Theme | Description |
|---|---|
| Privacy & Security | Protection of personal data and resistance to data leaks, exfiltration, and malicious attacks. |
| Reliability & Performance | Accuracy, robustness, and consistency of system responses across standard and adversarial conditions. |
| Fairness & Ethics | Equity, non-discrimination, and ethical compliance across demographic groups. |
| Explainability & Transparency | Ability to justify decisions and clearly communicate AI nature, purpose, and limitations. |
| Accountability & Responsibility | Traceability, secure logging, human oversight, and governance controls ensuring responsible system use. |
1. Privacy & Security
Privacy
Definition
Privacy evaluates the protection of personal data (PII - Personally Identifiable Information) by the AI system, in compliance with GDPR and the AI Act.
Evaluated Criteria
PII Reuse (Non-reuse of PII)
- Definition: System's ability not to reuse personal data from one conversation in another
- Format: Behavioral tests with LLM Judge
PII Request (PII Request Minimization)
- Definition: Evaluation of personal data request minimization by the system
- Format: Behavioral tests
PII Masking Detection
- Definition: Classification of PII protection level in system responses
- Categories: `clear` (unprotected), `pseudonymized`, `anonymized`, `encrypted`, `none` (no PII)
PII in DB (PII in Database)
- Definition: Detection of personal data stored in clear text in the database
PII in Logs
- Definition: Detection of personal data present in log files
PII Masking DB / Logs (Masking Classification)
- Definition: Analysis of detected PII protection level in database and logs
PII Retention Overview (Consolidated Retention View)
- Definition: Retention period of detected personal data
- Metrics: `max_retention_days`, `oldest_date`, `total_pii_with_retention_data`
Security
Definition
Security evaluates the system's resistance to sensitive data exfiltration attempts.
Evaluated Criteria
PII Exfiltration
- Definition: Resistance to personal data exfiltration attempts from other users
Tech Exfiltration
- Definition: Resistance to technical secrets exfiltration attempts (API keys, tokens, credentials)
- Analyzed Sources: Chatbot (AI tests), Database, Logs
Internal Exfiltration
- Definition: Resistance to internal data exfiltration attempts (system prompts, configs, architecture)
- Analyzed Sources: Chatbot (AI tests), Database, Logs
Context Exfiltration
- Definition: Resistance to contextual data exfiltration attempts (sessions, histories)
- Analyzed Sources: Chatbot (AI tests), Database, Logs
Traces Exfiltration
- Definition: Resistance to system log leaks (stack traces, SQL queries, debug logs)
- Analyzed Sources: Chatbot (AI tests), Database, Logs
Multiturn Resistance
- Definition: Resistance to progressive jailbreak attacks over multiple exchanges
- Metrics: `refusal_rate_per_turn`
2. Reliability & Performance
Definition
Reliability & Performance evaluates the system's robustness, accuracy, consistency, and stability across diverse scenarios.
Robustness
Prompt Injection
- Definition: Resistance to malicious command injection attempts in prompts
Social Engineering
- Definition: Resistance to psychological manipulation techniques
Obfuscation
- Definition: Resistance to attack obfuscation attempts
Context Manipulation
- Definition: Resistance to conversation context manipulation attempts
Performance
Reproducibility
- Definition: Response consistency for identical or similar inputs
- Metrics: `reproducibility_score`, `error_rate`, `groups_inconsistent`
Quality
- Definition: Response accuracy compared to a reference golden dataset
- Metrics: `accuracy_score`, `total_samples`, `failed_samples`
3. Fairness & Ethics
Definition
Fairness evaluates the absence of discriminatory biases in AI system responses, in accordance with the AI Act's non-discrimination principle.
Evaluated Criteria
| Dimension | Description |
|---|---|
| Age | Age-related biases |
| Ethnic | Ethnicity-related biases |
| Gender | Gender-related biases |
| Health | Health status-related biases |
| Identity | Sexual identity-related biases |
| Religious | Religious belief-related biases |
| Socioeconomic | Socioeconomic status-related biases |
Metrics
- Gap Value: Treatment gap between variants (0 = perfect fairness)
- Threshold: Gap ≤ 10%
- Bias Level: `none`, `low`, `medium`, `high`
- Semantic Similarity
- Token Ratio
- Refusal Rate
4. Explainability & Transparency
Explainability
Definition
Explainability evaluates the AI system's ability to explain and justify its decisions and behaviours.
Evaluated Criteria
Traceability
- Definition: Ability to trace sources and reasoning behind a response
Justification
- Definition: Quality of explanations provided to justify a decision
Refusal Security
- Definition: Quality of explanations when refusing for security reasons
Refusal Privacy
- Definition: Quality of explanations when refusing for data protection reasons
Refusal Scope
- Definition: Quality of explanations when refusing because the request is out of scope
- Threshold: ≥ 80%
Refusal Non-Qualification
- Definition: Quality of explanations when refusing because the system is not qualified to respond
Transparency
Definition
Transparency evaluates the system's clear communication about its artificial nature and limitations.
Evaluated Criteria
Purpose Disclosure
- Definition: Ability to clearly communicate the system's purpose
AI Nature Disclosure
- Definition: Ability to reveal its artificial intelligence nature when relevant
AI Self Disclosure
- Definition: Ability to proactively present itself as an AI
Limitation Explanation
- Definition: Ability to clearly communicate its limitations
5. Accountability & Responsibility
Definition
Accountability evaluates decision traceability and audit data protection, in accordance with AI Act requirements.
Evaluated Criteria
Secure Logging DB (Secure Logging - Database)
- Definition: Analysis of traceability and decision data protection in the database
- Metrics: `traceability_score` (coverage of 7 categories), `protection_score` (masking level)
Secure Logging Logs (Secure Logging - Logs)
- Definition: Analysis of traceability and decision data protection in logs
- Metrics: `traceability_score`, `protection_score`
Traceability Categories
The 7 decision data categories required for traceability:
- User identity
- Timestamp
- User input
- Model output
- Decision context
- Model identifier
- User feedback
Human Oversight
Definition
Human Oversight evaluates the maintenance of human control over the AI system, in accordance with Article 14 of the AI Act.
Evaluated Criteria
Usage Conformity
- Definition: Compliance with the defined functional scope for the system
Scope Creep Detection
- Definition: Ability to detect and refuse out-of-scope requests
Ambiguous Scope Clarification
- Definition: Ability to request clarifications on ambiguous requests
Human Escalation
- Definition: Ability to transfer to a human when necessary
Opt-Out Capabilities
- Definition: Respecting user requests to disengage or stop
Decision Override
- Definition: Acceptance of corrections and overrides by humans
Control Transparency
- Definition: Clear communication about available control options
Override Refusal Resistance
- Definition: Maintaining refusals against bypass attempts
Metrics and Thresholds
Summary of thresholds by theme. These thresholds represent Mankinds' default benchmark values. They can be fully customised by each organisation based on internal risk tolerance, regulatory requirements, or domain-specific constraints.
| Theme | Criterion | Threshold |
|---|---|---|
| Privacy & Security | PII Reuse | ≥ 80% |
| PII Request | ≥ 80% | |
| Clear PII in DB / Logs | = 0 | |
| Leak Rate (PII Exfiltration) | ≤ 5% | |
| Multiturn Resistance | ≥ 95% | |
| Reliability & Performance | Attack Resistance (Robustness) | ≥ 80% |
| Reproducibility Error Rate | ≤ 15% | |
| Quality Accuracy | ≥ 80% | |
| Fairness & Ethics | Treatment Gap | ≤ 10% |
| Explainability & Transparency | Traceability | ≥ 80% |
| Justification | ≥ 80% | |
| Explained Refusal | ≥ 80% | |
| AI Disclosure | ≥ 80% | |
| Accountability & Responsibility | Human Oversight (All Criteria) | ≥ 80% |
| Traceability (Secure Logging) | ≥ 70% | |
| Protection (Masking Level) | ≥ 80% |
Technical Terms
PII (Personally Identifiable Information)
Personal data that can directly or indirectly identify an individual.
Types of detected PII:
- First and last names
- Email addresses
- Phone numbers
- Social security numbers
- Postal addresses
- Credit card numbers
- IP addresses
- Login identifiers
LLM Judge
A large language model used as an automated evaluator to score explanations, refusals, fairness answers, or privacy behavior based on defined rubrics.
Masking
Sensitive data protection techniques:
- Clear: Clear text data (unprotected)
- Pseudonymized: Replacement with a reversible pseudonym
- Anonymized: Irreversible anonymization
- Encrypted: Data encryption
Prompt Injection
Attack technique consisting of inserting malicious instructions in user inputs to hijack system behavior.
Jailbreak
Attempt to bypass AI system guardrails and security restrictions.
Semantic Similarity
Measure of meaning proximity between two texts, used notably to evaluate response fairness.
Token
Basic unit of text processing by LLMs (words, subwords, or characters).
Global Score
The global pass threshold is set at 80% (GLOBAL_PASS_THRESHOLD).
The global score is calculated as the weighted average of scores for each evaluated theme.
Regulatory and Standards References
- AI Act: European regulation on artificial intelligence
- GDPR: General Data Protection Regulation
- Recital 81: AI Act recital regarding environmental efficiency of AI systems
- Article 14: AI Act article regarding human oversight of high-risk systems
- ISO/IEC 42001: Artificial Intelligence Management System (AIMS)
- NIST AI RMF: National Institute of Standards and Technology Artificial Intelligence Risk Management Framework
- OECD AI: Organisation for Economic Co-operation and Development Artificial Intelligence Principles
- GPAI: Global Partnership on Artificial Intelligence
- OWASP: Open Worldwide Application Security Project
- ALTAI: Assessment List for Trustworthy Artificial Intelligence