Skip to main content

Mankinds' Knowledge Glossary

This glossary defines the key terms used in the AI systems evaluation framework, compliant with responsible AI principles and the EU AI Act.


Table of Contents

  1. AI Core Concepts
  2. Evaluation Themes
  3. Privacy & Security
  4. Reliability & Performance
  5. Fairness & Ethics
  6. Explainability & Transparency
  7. Accountability & Responsibility
  8. Metrics and Thresholds
  9. Technical Terms
  10. Regulatory & Standards References

AI Core Concepts

AI System

An AI-powered application (chatbot, RAG system, classifier, agent) registered in Mankinds for evaluation.

Evaluation Run

A single execution of the Mankinds test suite across selected dimensions.

Global Score

The aggregated score summarizing system performance across all evaluated dimensions.

Dimension Score

The score achieved within a specific trust dimension.

Scorecard

Consolidated evaluation report summarizing results, thresholds, findings, risks, and recommendations.

Test Case

A specific input-output scenario used to evaluate the system's behavior.

Behavioral Test

Simulation of real-world user interactions to assess expected or unsafe behaviors.

Adversarial Test

A deliberately challenging or malicious input designed to provoke failures, security leaks, or unsafe behavior.

Golden Dataset

A validated dataset containing representative inputs paired with expected target outputs, used as a reference for evaluating system accuracy, consistency, and reproducibility.

Failure Mode

A specific type of model misbehavior (e.g., PII reuse, biased output, unjustified refusal, hallucination).

Threshold

Minimum required performance for a criterion to be considered compliant.


Evaluation Themes

The framework evaluates AI systems across 5 trust dimensions aligned with AI Act requirements and responsible AI best practices.

ThemeDescription
Privacy & SecurityProtection of personal data and resistance to data leaks, exfiltration, and malicious attacks.
Reliability & PerformanceAccuracy, robustness, and consistency of system responses across standard and adversarial conditions.
Fairness & EthicsEquity, non-discrimination, and ethical compliance across demographic groups.
Explainability & TransparencyAbility to justify decisions and clearly communicate AI nature, purpose, and limitations.
Accountability & ResponsibilityTraceability, secure logging, human oversight, and governance controls ensuring responsible system use.

1. Privacy & Security

Privacy

Definition

Privacy evaluates the protection of personal data (PII - Personally Identifiable Information) by the AI system, in compliance with GDPR and the AI Act.

Evaluated Criteria

PII Reuse (Non-reuse of PII)
  • Definition: System's ability not to reuse personal data from one conversation in another
  • Format: Behavioral tests with LLM Judge
PII Request (PII Request Minimization)
  • Definition: Evaluation of personal data request minimization by the system
  • Format: Behavioral tests
PII Masking Detection
  • Definition: Classification of PII protection level in system responses
  • Categories: `clear` (unprotected), `pseudonymized`, `anonymized`, `encrypted`, `none` (no PII)
PII in DB (PII in Database)
  • Definition: Detection of personal data stored in clear text in the database
PII in Logs
  • Definition: Detection of personal data present in log files
PII Masking DB / Logs (Masking Classification)
  • Definition: Analysis of detected PII protection level in database and logs
PII Retention Overview (Consolidated Retention View)
  • Definition: Retention period of detected personal data
  • Metrics: `max_retention_days`, `oldest_date`, `total_pii_with_retention_data`

Security

Definition

Security evaluates the system's resistance to sensitive data exfiltration attempts.

Evaluated Criteria

PII Exfiltration
  • Definition: Resistance to personal data exfiltration attempts from other users
Tech Exfiltration
  • Definition: Resistance to technical secrets exfiltration attempts (API keys, tokens, credentials)
  • Analyzed Sources: Chatbot (AI tests), Database, Logs
Internal Exfiltration
  • Definition: Resistance to internal data exfiltration attempts (system prompts, configs, architecture)
  • Analyzed Sources: Chatbot (AI tests), Database, Logs
Context Exfiltration
  • Definition: Resistance to contextual data exfiltration attempts (sessions, histories)
  • Analyzed Sources: Chatbot (AI tests), Database, Logs
Traces Exfiltration
  • Definition: Resistance to system log leaks (stack traces, SQL queries, debug logs)
  • Analyzed Sources: Chatbot (AI tests), Database, Logs
Multiturn Resistance
  • Definition: Resistance to progressive jailbreak attacks over multiple exchanges
  • Metrics: `refusal_rate_per_turn`

2. Reliability & Performance

Definition

Reliability & Performance evaluates the system's robustness, accuracy, consistency, and stability across diverse scenarios.

Robustness

Prompt Injection

  • Definition: Resistance to malicious command injection attempts in prompts

Social Engineering

  • Definition: Resistance to psychological manipulation techniques

Obfuscation

  • Definition: Resistance to attack obfuscation attempts

Context Manipulation

  • Definition: Resistance to conversation context manipulation attempts

Performance

Reproducibility

  • Definition: Response consistency for identical or similar inputs
  • Metrics: `reproducibility_score`, `error_rate`, `groups_inconsistent`

Quality

  • Definition: Response accuracy compared to a reference golden dataset
  • Metrics: `accuracy_score`, `total_samples`, `failed_samples`

3. Fairness & Ethics

Definition

Fairness evaluates the absence of discriminatory biases in AI system responses, in accordance with the AI Act's non-discrimination principle.

Evaluated Criteria

DimensionDescription
AgeAge-related biases
EthnicEthnicity-related biases
GenderGender-related biases
HealthHealth status-related biases
IdentitySexual identity-related biases
ReligiousReligious belief-related biases
SocioeconomicSocioeconomic status-related biases

Metrics

  • Gap Value: Treatment gap between variants (0 = perfect fairness)
  • Threshold: Gap ≤ 10%
  • Bias Level: `none`, `low`, `medium`, `high`
  • Semantic Similarity
  • Token Ratio
  • Refusal Rate

4. Explainability & Transparency

Explainability

Definition

Explainability evaluates the AI system's ability to explain and justify its decisions and behaviours.

Evaluated Criteria

Traceability
  • Definition: Ability to trace sources and reasoning behind a response
Justification
  • Definition: Quality of explanations provided to justify a decision
Refusal Security
  • Definition: Quality of explanations when refusing for security reasons
Refusal Privacy
  • Definition: Quality of explanations when refusing for data protection reasons
Refusal Scope
  • Definition: Quality of explanations when refusing because the request is out of scope
  • Threshold: ≥ 80%
Refusal Non-Qualification
  • Definition: Quality of explanations when refusing because the system is not qualified to respond

Transparency

Definition

Transparency evaluates the system's clear communication about its artificial nature and limitations.

Evaluated Criteria

Purpose Disclosure
  • Definition: Ability to clearly communicate the system's purpose
AI Nature Disclosure
  • Definition: Ability to reveal its artificial intelligence nature when relevant
AI Self Disclosure
  • Definition: Ability to proactively present itself as an AI
Limitation Explanation
  • Definition: Ability to clearly communicate its limitations

5. Accountability & Responsibility

Definition

Accountability evaluates decision traceability and audit data protection, in accordance with AI Act requirements.

Evaluated Criteria

Secure Logging DB (Secure Logging - Database)

  • Definition: Analysis of traceability and decision data protection in the database
  • Metrics: `traceability_score` (coverage of 7 categories), `protection_score` (masking level)

Secure Logging Logs (Secure Logging - Logs)

  • Definition: Analysis of traceability and decision data protection in logs
  • Metrics: `traceability_score`, `protection_score`

Traceability Categories

The 7 decision data categories required for traceability:

  1. User identity
  2. Timestamp
  3. User input
  4. Model output
  5. Decision context
  6. Model identifier
  7. User feedback

Human Oversight

Definition

Human Oversight evaluates the maintenance of human control over the AI system, in accordance with Article 14 of the AI Act.

Evaluated Criteria

Usage Conformity
  • Definition: Compliance with the defined functional scope for the system
Scope Creep Detection
  • Definition: Ability to detect and refuse out-of-scope requests
Ambiguous Scope Clarification
  • Definition: Ability to request clarifications on ambiguous requests
Human Escalation
  • Definition: Ability to transfer to a human when necessary
Opt-Out Capabilities
  • Definition: Respecting user requests to disengage or stop
Decision Override
  • Definition: Acceptance of corrections and overrides by humans
Control Transparency
  • Definition: Clear communication about available control options
Override Refusal Resistance
  • Definition: Maintaining refusals against bypass attempts

Metrics and Thresholds

Summary of thresholds by theme. These thresholds represent Mankinds' default benchmark values. They can be fully customised by each organisation based on internal risk tolerance, regulatory requirements, or domain-specific constraints.

ThemeCriterionThreshold
Privacy & SecurityPII Reuse≥ 80%
PII Request≥ 80%
Clear PII in DB / Logs= 0
Leak Rate (PII Exfiltration)≤ 5%
Multiturn Resistance≥ 95%
Reliability & PerformanceAttack Resistance (Robustness)≥ 80%
Reproducibility Error Rate≤ 15%
Quality Accuracy≥ 80%
Fairness & EthicsTreatment Gap≤ 10%
Explainability & TransparencyTraceability≥ 80%
Justification≥ 80%
Explained Refusal≥ 80%
AI Disclosure≥ 80%
Accountability & ResponsibilityHuman Oversight (All Criteria)≥ 80%
Traceability (Secure Logging)≥ 70%
Protection (Masking Level)≥ 80%

Technical Terms

PII (Personally Identifiable Information)

Personal data that can directly or indirectly identify an individual.

Types of detected PII:

  • First and last names
  • Email addresses
  • Phone numbers
  • Social security numbers
  • Postal addresses
  • Credit card numbers
  • IP addresses
  • Login identifiers

LLM Judge

A large language model used as an automated evaluator to score explanations, refusals, fairness answers, or privacy behavior based on defined rubrics.

Masking

Sensitive data protection techniques:

  • Clear: Clear text data (unprotected)
  • Pseudonymized: Replacement with a reversible pseudonym
  • Anonymized: Irreversible anonymization
  • Encrypted: Data encryption

Prompt Injection

Attack technique consisting of inserting malicious instructions in user inputs to hijack system behavior.

Jailbreak

Attempt to bypass AI system guardrails and security restrictions.

Semantic Similarity

Measure of meaning proximity between two texts, used notably to evaluate response fairness.

Token

Basic unit of text processing by LLMs (words, subwords, or characters).

Global Score

The global pass threshold is set at 80% (GLOBAL_PASS_THRESHOLD).

The global score is calculated as the weighted average of scores for each evaluated theme.


Regulatory and Standards References

  • AI Act: European regulation on artificial intelligence
  • GDPR: General Data Protection Regulation
  • Recital 81: AI Act recital regarding environmental efficiency of AI systems
  • Article 14: AI Act article regarding human oversight of high-risk systems
  • ISO/IEC 42001: Artificial Intelligence Management System (AIMS)
  • NIST AI RMF: National Institute of Standards and Technology Artificial Intelligence Risk Management Framework
  • OECD AI: Organisation for Economic Co-operation and Development Artificial Intelligence Principles
  • GPAI: Global Partnership on Artificial Intelligence
  • OWASP: Open Worldwide Application Security Project
  • ALTAI: Assessment List for Trustworthy Artificial Intelligence