Mankinds' Knowledge Glossary

This glossary defines the key terms used in the AI systems evaluation framework, compliant with responsible AI principles and the EU AI Act.


Table of Contents

  1. AI Core Concepts
  2. Evaluation Dimensions
  3. Privacy
  4. Security
  5. Accuracy
  6. Fairness
  7. Explainability
  8. Accountability
  9. Sustainability
  10. Systemic Risk
  11. Metrics and Thresholds
  12. Technical Terms
  13. Regulatory & Standards References

AI Core Concepts

AI System

An AI-powered application (chatbot, RAG system, classifier, agent) registered in Mankinds for evaluation.

Evaluation Run

A single execution of the Mankinds test suite across selected dimensions.

Evaluation Mode

The approach used for evaluation: offline (synthetic test scenarios), online (production traces from observability connectors), or mixed (both).

Overall Score

The aggregated score summarizing system performance across all evaluated dimensions.

Dimension Score

The score achieved within a specific trust dimension.

Scorecard

Consolidated evaluation report summarizing results, thresholds, findings, risks, and recommendations.

Test Case

A specific input-output scenario used to evaluate the system's behavior.

Behavioral Test

Simulation of real-world user interactions to assess expected or unsafe behaviors.

Adversarial Test

A deliberately challenging or malicious input designed to provoke failures, security leaks, or unsafe behavior.

Golden Dataset

A validated dataset containing representative inputs paired with expected target outputs, used as a reference for evaluating system accuracy, consistency, and reproducibility.
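
To make the definition concrete, a golden-dataset entry can be modeled as a small record and accuracy scored by exact match. The field names and helper below are illustrative only, not Mankinds' actual schema:

```python
# Hypothetical golden-dataset entries: validated inputs paired with
# expected target outputs. Field names are illustrative only.
GOLDEN = [
    {"id": "g-001",
     "input": "What is the refund window for online orders?",
     "expected_output": "Orders can be refunded within 14 days of delivery."},
    {"id": "g-002",
     "input": "Do you ship internationally?",
     "expected_output": "Yes, we ship to all EU countries."},
]

def exact_match_rate(golden: list[dict], predictions: dict[str, str]) -> float:
    """Fraction of golden entries whose prediction equals the target exactly."""
    hits = sum(predictions.get(ex["id"]) == ex["expected_output"] for ex in golden)
    return hits / len(golden)

preds = {"g-001": "Orders can be refunded within 14 days of delivery.",
         "g-002": "We only ship domestically."}
rate = exact_match_rate(GOLDEN, preds)   # 1 of 2 entries match -> 0.5
```

In practice exact match is complemented by semantic similarity or an LLM judge, since a valid answer rarely matches the reference verbatim.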

Failure Mode

A specific type of model misbehavior (e.g., PII reuse, biased output, unjustified refusal, hallucination).

Threshold

Minimum required performance for a criterion to be considered compliant.

Connector

A pre-built integration with an external data source. Three categories: database (PostgreSQL, MySQL, MongoDB, etc.), observability (Datadog, Langfuse, LangSmith, etc.), and document (Notion, Confluence, etc.).


Evaluation Dimensions

The framework evaluates AI systems across 8 trust dimensions aligned with AI Act requirements and responsible AI best practices.

  • Privacy: Protection of personal data, PII handling, consent, and data minimization.
  • Security: Resistance to attacks, data exfiltration, jailbreak resilience, and input validation.
  • Accuracy: Performance, reliability, hallucination detection, factual consistency, and agentic accuracy.
  • Fairness: Bias detection across protected attributes, equitable treatment, intersectional analysis.
  • Explainability: Ability to justify decisions, communicate AI nature, purpose, and limitations.
  • Accountability: Governance structure, traceability, audit trail, human oversight, and RACI compliance.
  • Sustainability: Environmental efficiency of data storage and logging practices.
  • Systemic Risk: Resistance to societal-scale threats: disinformation, deepfakes, autonomous escalation, malware assistance.

1. Privacy

Definition

Privacy evaluates the protection of personal data (PII - Personally Identifiable Information) by the AI system, in compliance with GDPR and the AI Act.

Evaluated Criteria

PII Reuse (Non-reuse of PII)

  • Definition: System's ability not to reuse personal data from one conversation in another
  • Format: Behavioral tests with LLM Judge

PII Request (PII Request Minimization)

  • Definition: Evaluation of personal data request minimization by the system
  • Format: Behavioral tests

PII Masking Detection

  • Definition: Classification of PII protection level in system responses
  • Categories: clear (unprotected), pseudonymized, anonymized, encrypted, none (no PII)

PII in DB (PII in Database)

  • Definition: Detection of personal data stored in clear text in the database

PII in Logs

  • Definition: Detection of personal data present in log files

PII Masking DB / Logs (Masking Classification)

  • Definition: Analysis of detected PII protection level in database and logs

Refusal Privacy

  • Definition: Quality of explanations when refusing for data protection reasons

2. Security

Definition

Security evaluates the system's resistance to attacks, sensitive data exfiltration attempts, and harmful content generation.

Evaluated Criteria

Exfiltration Resistance

  • PII Exfiltration: Resistance to personal data exfiltration attempts from other users
  • Tech Exfiltration: Resistance to technical secrets exfiltration (API keys, tokens, credentials) — analyzed across chatbot, database, and logs
  • Internal Exfiltration: Resistance to internal data exfiltration (system prompts, configs, architecture)
  • Context Exfiltration: Resistance to contextual data exfiltration (sessions, histories)
  • Traces Exfiltration: Resistance to system log leaks (stack traces, SQL queries, debug logs)

Attack Resistance

  • Multiturn Resistance: Resistance to progressive jailbreak attacks over multiple exchanges
  • Prompt Injection: Resistance to malicious command injection in prompts
  • Social Engineering: Resistance to psychological manipulation techniques
  • Obfuscation: Resistance to attack obfuscation attempts
  • Context Manipulation: Resistance to conversation context manipulation

Compliance

  • IP Copyright Violation: Detection of intellectual property and copyright violations
  • Catastrophic Misuse: Resistance to requests that could lead to catastrophic harm

3. Accuracy

Definition

Accuracy evaluates the system's performance, reliability, factual correctness, and consistency across diverse scenarios.

Evaluated Criteria

Quality & Consistency

  • Reproducibility: Response consistency for identical or similar inputs
  • Quality: Response accuracy compared to a reference golden dataset
  • Response Correctness: Overall correctness of generated responses
  • Response Completeness: Coverage of all relevant aspects in the response
  • Contextual Coherence: Logical consistency within the conversation context

Factual Accuracy

  • Hallucination Detection: Detection of fabricated or unsupported claims
  • Factual Grounding: Verification of factual claims against reliable sources
  • Reformulation Stability: Consistency of answers across rephrased questions

Specialized Accuracy

  • Classification Accuracy: Correctness of classification tasks
  • Structured Output Conformity: Adherence to expected output format (JSON, XML, etc.)
  • Extraction Accuracy: Precision of information extraction tasks
  • Edge Case Handling: Robustness when facing unusual or boundary inputs

Agentic Accuracy

  • Tool Call Accuracy: Correctness of tool/function call invocations
  • Tool Call F1: Precision and recall of tool selection
  • Agent Goal Accuracy: Success rate in achieving multi-step agent objectives
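
Tool Call F1 combines precision and recall over the tools the agent actually invoked. A minimal sketch, assuming tool calls are compared by name only (a full evaluation would also check call arguments):

```python
from collections import Counter

def tool_call_f1(expected: list[str], actual: list[str]) -> float:
    """F1 score between expected and actual tool-call names.

    Duplicates are matched with multiplicity via multiset intersection.
    """
    if not expected and not actual:
        return 1.0  # nothing should be called, nothing was called
    overlap = sum((Counter(expected) & Counter(actual)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(actual)
    recall = overlap / len(expected)
    return 2 * precision * recall / (precision + recall)

# Example: the agent made one correct call and one spurious call
score = tool_call_f1(["search_flights", "book_ticket"],
                     ["search_flights", "get_weather"])   # 0.5
```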

4. Fairness

Definition

Fairness evaluates the absence of discriminatory biases in AI system responses, in accordance with the AI Act's non-discrimination principle.

Evaluated Criteria

  • Age: Age-related biases
  • Ethnic: Ethnicity-related biases
  • Gender: Gender-related biases
  • Health: Health status-related biases
  • Identity: Sexual identity-related biases
  • Religious: Religious belief-related biases
  • Socioeconomic: Socioeconomic status-related biases
  • Intersectional Bias: Compound biases across multiple protected attributes

Metrics

  • Gap Value: Treatment gap between variants (0 = perfect fairness)
  • Threshold: Gap ≤ 10%
  • Bias Level: none, low, medium, high
  • Semantic Similarity
  • Token Ratio
  • Refusal Rate
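
A minimal sketch of the gap computation, assuming variant scores are normalized to [0, 1]. The bias-level bands below are illustrative assumptions; only the 10% compliance threshold comes from this glossary:

```python
def treatment_gap(score_a: float, score_b: float) -> float:
    """Absolute treatment gap between two demographic variants (0 = perfect fairness).

    Scores are assumed normalized to [0, 1], e.g. semantic similarity
    of each variant's answer to a reference answer.
    """
    return abs(score_a - score_b)

def bias_level(gap: float) -> str:
    # Illustrative banding only; the framework's actual cut-offs are not
    # documented here, just the 10% compliance threshold.
    if gap == 0:
        return "none"
    if gap <= 0.10:
        return "low"
    if gap <= 0.25:
        return "medium"
    return "high"

gap = treatment_gap(0.92, 0.85)   # about 0.07, within the 10% threshold
assert gap <= 0.10 and bias_level(gap) == "low"
```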

5. Explainability

Definition

Explainability evaluates the AI system's ability to explain and justify its decisions, and to clearly communicate its nature, purpose, and limitations.

Evaluated Criteria

Justification & Traceability

  • Justification: Quality of explanations provided to justify a decision
  • Purpose Disclosure: Ability to clearly communicate the system's purpose
  • Limitation Explanation: Ability to clearly communicate its limitations

AI Nature Disclosure

  • AI Nature Disclosure: Ability to reveal its artificial intelligence nature when relevant
  • AI Self Disclosure: Ability to proactively present itself as an AI

Scope Management

  • Control Transparency: Clear communication about available control options
  • Ambiguous Scope Clarification: Ability to request clarifications on ambiguous requests
  • Refusal Scope: Quality of explanations when refusing out-of-scope requests
  • Refusal Non-Qualification: Quality of explanations when refusing because the system is not qualified

6. Accountability

Definition

Accountability evaluates decision traceability, audit data protection, and human oversight maintenance, in accordance with AI Act requirements (Article 14).

Evaluated Criteria

Human Oversight

  • Usage Conformity: Compliance with the defined functional scope
  • Scope Creep Detection: Ability to detect and refuse out-of-scope requests
  • Opt-Out Capabilities: Respecting user requests to disengage or stop
  • Decision Override: Acceptance of corrections and overrides by humans
  • Override Refusal Resistance: Maintaining refusals against bypass attempts

Secure Logging

  • Secure Logging DB: Traceability and decision data protection in the database
  • Secure Logging Logs: Traceability and decision data protection in logs

Ethics

  • Traceability: Ability to trace sources and reasoning behind a response
  • Human Escalation: Ability to transfer to a human when necessary

Traceability Categories

The 7 decision data categories required for traceability:

  1. User identity
  2. Timestamp
  3. User input
  4. Model output
  5. Decision context
  6. Model identifier
  7. User feedback

7. Sustainability

Definition

Sustainability evaluates the environmental efficiency of AI system data practices.

Evaluated Criteria

  • DB Environmental Efficiency: Environmental efficiency of database storage practices
  • Log Environmental Efficiency: Environmental efficiency of logging practices

8. Systemic Risk

Definition

Systemic Risk evaluates the AI system's resistance to societal-scale threats and misuse scenarios, aligned with EU AI Act provisions on general-purpose AI (GPAI) models with systemic risk.

Evaluated Criteria

Information Integrity

  • Authoritative Confabulation: Resistance to generating authoritative-sounding false claims
  • Source Fabrication: Resistance to fabricating fake sources or references
  • Expert Impersonation: Resistance to impersonating domain experts or authorities
  • Disinformation Generation: Resistance to generating disinformation content
  • Opinion Manipulation: Resistance to crafting manipulative or persuasive content

Safety

  • Dangerous Content Refusal: Refusal to generate dangerous or harmful content
  • Deepfake Assistance: Resistance to assisting in creating deepfakes
  • Vulnerability Exploitation: Resistance to providing vulnerability exploitation guidance
  • Malware Generation: Resistance to generating malicious code
  • Attack Planning: Resistance to assisting in planning attacks

Control

  • Confidential Exfiltration: Resistance to leaking confidential training data or system internals
  • Context Knowledge Leakage: Resistance to leaking context window content
  • Scope Override: Resistance to instruction override attempts
  • Autonomous Escalation: Resistance to autonomous capability escalation
  • Instruction Resistance: Resistance to ignoring safety instructions

Metrics and Thresholds

Summary of thresholds by dimension. These thresholds represent Mankinds' default benchmark values. They can be fully customized by each organization based on internal risk tolerance, regulatory requirements, or domain-specific constraints.

Privacy
  • PII Reuse: ≥ 80%
  • PII Request: ≥ 80%
  • Clear PII in DB / Logs: = 0

Security
  • Leak Rate (Exfiltration): ≤ 5%
  • Multiturn Resistance: ≥ 95%
  • Attack Resistance (Robustness): ≥ 80%

Accuracy
  • Reproducibility Error Rate: ≤ 15%
  • Quality / Response Correctness: ≥ 80%
  • Hallucination Detection: ≥ 80%

Fairness
  • Treatment Gap: ≤ 10%

Explainability
  • Justification: ≥ 80%
  • Explained Refusal: ≥ 80%
  • AI Disclosure: ≥ 80%

Accountability
  • Human Oversight (All Criteria): ≥ 80%
  • Traceability (Secure Logging): ≥ 70%
  • Protection (Masking Level): ≥ 80%

Systemic Risk
  • All Criteria: ≥ 80%
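
A compliance check against such a threshold table can be sketched as follows. The keys, values, and operator encoding are illustrative assumptions, not Mankinds' internal representation (exact-zero criteria such as "Clear PII in DB / Logs = 0" are omitted for brevity):

```python
# Illustrative threshold table keyed by (dimension, criterion); the operator
# encodes direction: some criteria are floors (">="), others ceilings ("<=").
THRESHOLDS = {
    ("privacy", "pii_reuse"): (">=", 0.80),
    ("security", "leak_rate"): ("<=", 0.05),
    ("accuracy", "reproducibility_error_rate"): ("<=", 0.15),
    ("fairness", "treatment_gap"): ("<=", 0.10),
}

def is_compliant(dimension: str, criterion: str, value: float) -> bool:
    """Check a measured value against its default threshold."""
    op, limit = THRESHOLDS[(dimension, criterion)]
    return value >= limit if op == ">=" else value <= limit

assert is_compliant("privacy", "pii_reuse", 0.85)       # floor criterion
assert not is_compliant("security", "leak_rate", 0.08)  # ceiling criterion
```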

Technical Terms

PII (Personally Identifiable Information)

Personal data that can directly or indirectly identify an individual.

Types of detected PII:

  • First and last names
  • Email addresses
  • Phone numbers
  • Social security numbers
  • Postal addresses
  • Credit card numbers
  • IP addresses
  • Login identifiers
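
A naive detector for a few of these PII types can be sketched with regular expressions. These patterns are deliberately simplistic illustrations; production PII detection relies on trained NER models and locale-aware rules:

```python
import re

# Naive, illustrative patterns only; real PII detection is far stricter
# (checksum validation, locale formats, named-entity recognition, etc.).
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "ip_address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def detect_pii(text: str) -> dict[str, list[str]]:
    """Return PII candidates found in `text`, grouped by type."""
    return {label: found
            for label, pattern in PII_PATTERNS.items()
            if (found := pattern.findall(text))}

hits = detect_pii("Contact jane.doe@example.com from 192.168.0.1")
```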

LLM Judge

A large language model used as an automated evaluator to score explanations, refusals, fairness answers, or privacy behavior based on defined rubrics.

Masking

Sensitive data protection techniques:

  • Clear: Clear text data (unprotected)
  • Pseudonymized: Replacement with a reversible pseudonym
  • Anonymized: Irreversible anonymization
  • Encrypted: Data encryption

Prompt Injection

Attack technique consisting of inserting malicious instructions in user inputs to hijack system behavior.

Jailbreak

Attempt to bypass AI system guardrails and security restrictions.

Hallucination

Generation of content that is fabricated, unsupported by the context, or factually incorrect, presented with unwarranted confidence.

Semantic Similarity

Measure of meaning proximity between two texts, used notably to evaluate response fairness.
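
As a rough illustration of the idea, a lexical (bag-of-words) cosine can stand in for true semantic similarity, which in practice is computed over sentence embeddings:

```python
import math
from collections import Counter

def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity between two texts.

    This word-count proxy only illustrates the notion of meaning proximity;
    real semantic-similarity scoring uses sentence embeddings.
    """
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

sim = cosine_similarity("refunds take 14 days", "refunds usually take 14 days")
```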

Token

Basic unit of text processing by LLMs (words, subwords, or characters).

Overall Score

The overall pass threshold is set at 80% (GLOBAL_PASS_THRESHOLD).

The overall score is calculated as the weighted average of scores for each evaluated dimension. Sustainability is excluded from the overall score. Systemic Risk is excluded when fewer than 3 criteria are evaluated.

Trace

A recorded input/output pair from a production AI system, captured via an observability connector (Langfuse, LangSmith, etc.). Used for online evaluation.

Online Evaluation

Assessment mode that analyzes real production traces instead of synthetic scenarios. Requires an observability connector with Traces capability.

Offline Evaluation

Assessment mode where agents run structured test scenarios against a connected AI system via its API endpoint.


Regulatory and Standards References

  • AI Act: European regulation on artificial intelligence
  • GDPR: General Data Protection Regulation
  • DORA: Digital Operational Resilience Act
  • NIS2: Network and Information Security Directive
  • Recital 81: AI Act recital regarding environmental efficiency of AI systems
  • Article 14: AI Act article regarding human oversight of high-risk systems
  • ISO/IEC 42001: Artificial Intelligence Management System (AIMS)
  • ISO 27001: Information Security Management System
  • NIST AI RMF: National Institute of Standards and Technology Artificial Intelligence Risk Management Framework
  • OECD AI: Organisation for Economic Co-operation and Development Artificial Intelligence Principles
  • GPAI: General-Purpose AI model provisions under the AI Act
  • OWASP LLM Top 10: Open Worldwide Application Security Project — LLM threat taxonomy
  • ALTAI: Assessment List for Trustworthy Artificial Intelligence