Mankinds SDK
Integrate AI governance into your applications. The Mankinds SDK provides the tools to create systems, connect your data sources, generate test datasets, and run evaluations across 8 trust dimensions, available in JavaScript/TypeScript and Python.
from mankinds_sdk import MankindsClient
import os
client = MankindsClient(api_key=os.environ["MANKINDS_API_KEY"])
# Create a system, generate a dataset, evaluate
system = client.create_system("My AI Assistant", "...", endpoint={...})
client.generate_dataset(system["id"], num_scenarios=10)
result = client.evaluate(system["id"])
print(f"Score: {result['summary']['overall_score']}%")
Getting Started
A few steps to integrate AI evaluation into your project.
pip install mankinds-sdk
Create an account on app.mankinds.io and generate your API key from settings.
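The examples below read the key from an environment variable rather than hard-coding it. One way to set it (the value here is a placeholder, not a real key):

```shell
# Replace the placeholder with the key generated from your
# app.mankinds.io account settings (keys start with mk_).
export MANKINDS_API_KEY="mk_your_key_here"
```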
import os
from mankinds_sdk import MankindsClient
client = MankindsClient(api_key=os.environ["MANKINDS_API_KEY"])
Create a system, generate a test dataset, then run the evaluation:
# 1. Create a system with its endpoint
system = client.create_system(
"GDPR Compliance Assistant",
"Chatbot that advises companies on their GDPR obligations.",
endpoint={
"url": "https://api.example.com/chat",
"method": "POST",
"body": {"message": "{{input}}"},
"response": {"reply": "{{output}}"}
}
)
# 2. Generate a test dataset (10 scenarios)
dataset = client.generate_dataset(system["id"], num_scenarios=10)
# 3. Run the evaluation
result = client.evaluate(system["id"])
print(f"Overall score: {result['summary']['overall_score']}%")
Features
| Feature | Description |
|---|---|
| Systems | Create, configure and manage your AI systems with automatic description validation |
| Connectors | Connect databases (PostgreSQL, MySQL, MongoDB, Snowflake, SQLite, SQL Server), observability tools (Datadog, Langfuse, LangSmith, Elasticsearch, Splunk, CloudWatch, OpenTelemetry, MLflow, OpenSearch) and log files |
| Datasets | Provide your reference scenarios or auto-generate them with AI |
| Evaluations | Run offline or online evaluations across 86 criteria in 8 dimensions and retrieve scores |
| Authentication | Secure API key with fine-grained permission management |
API Reference
MankindsClient
The main client for interacting with the Mankinds API.
Constructor
MankindsClient(api_key: str, base_url: str = None, timeout: int = 120)
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| api_key | str | Required | — | Mankinds API key (format: mk_...) |
| base_url | str | Optional | https://app.mankinds.io | API base URL |
| timeout | int | Optional | 120 | Request timeout in seconds |
Systems
createSystem
Creates a new AI system and automatically validates its description.
| Parameter | Type | Required | Description |
|---|---|---|---|
| name | str | Required | AI system name |
| description | str | Required | Description of the expected system behavior |
| endpoint | dict | Required | API endpoint configuration (url, method, body, response) |
| Returns | Type | Description |
|---|---|---|
| id | string | Unique identifier of the created system |
| success | boolean | true if the description was validated |
| recommendations | array | Recommendations to improve the description |
In your endpoint's body and response configurations, use the placeholders {{input}} and {{output}} to dynamically inject test scenarios.
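To illustrate how the placeholder behaves, here is a minimal sketch of the substitution. The real substitution is performed by the platform when it calls your endpoint; render_body is a hypothetical helper written for this example, not part of the SDK:

```python
import json

def render_body(body_template: dict, input_text: str) -> dict:
    """Recursively replace the {{input}} placeholder in an endpoint body template."""
    # Serialize, substitute, and parse back so nested dicts are handled too.
    rendered = json.dumps(body_template).replace("{{input}}", input_text)
    return json.loads(rendered)

body = {"message": "{{input}}"}
print(render_body(body, "What are my GDPR obligations?"))
# prints: {'message': 'What are my GDPR obligations?'}
```

The {{output}} placeholder works the other way around: it tells the platform where to find the system's reply in your endpoint's response.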
system = client.create_system(
"GDPR Compliance Assistant",
"Chatbot that advises companies on their GDPR obligations and guides compliance.",
endpoint={
"url": "https://api.example.com/chat",
"method": "POST",
"body": {"message": "{{input}}"},
"response": {"reply": "{{output}}"}
}
)
print(f"System created: {system['id']}")
print(f"Description validated: {system['success']}")
getSystem
Retrieves system details.
| Parameter | Type | Required | Description |
|---|---|---|---|
| system_id | str | Required | Unique system identifier |
| Returns | Type | Description |
|---|---|---|
| id | string | Unique identifier of the system |
| name | string | System name |
| description | string | System description |
| is_description_validated | boolean | true if the description has been validated |
| endpoint | object | Endpoint configuration |
system = client.get_system(system_id)
print(f"Name: {system['name']}")
print(f"Description validated: {system['is_description_validated']}")
print(f"Endpoint: {system['endpoint']}")
updateSystem
Updates an existing system.
| Parameter | Type | Required | Description |
|---|---|---|---|
| system_id | str | Required | Unique system identifier |
| name | str | Optional | New system name |
| description | str | Optional | New description (triggers re-validation) |
| endpoint | dict | Optional | New endpoint configuration |
| Returns | Type | Description |
|---|---|---|
| success | boolean | true if the update was successful |
| recommendations | array | Recommendations if description was re-validated |
If you provide an endpoint, it must contain all required fields (url, method, body, response); otherwise InvalidEndpointError is raised.
result = client.update_system(
system_id,
name="GDPR Compliance Assistant v2",
description="Improved version with DPO questions support."
)
print(f"Validated: {result['success']}")
If a new description is provided, it is automatically re-validated. On failure, DescriptionNotValidatedError is raised with recommendations.
Connectors
Connectors allow you to connect your data sources to enable evaluations. There are two categories: database and observability connectors (log files fall under observability).
Each system can only have one connector per category (database, observability).
Available Types
Database connectors
| Connector | Description | Configuration |
|---|---|---|
| postgresql | PostgreSQL database | host, port, database, user, password |
| mysql | MySQL database | host, port, database, user, password |
| mongodb | MongoDB database | host, port, database, user, password, authSource |
| sqlserver | SQL Server database | host, port, database, user, password, instance |
| sqlite | SQLite database file | file_path, file_name |
| snowflake | Snowflake data warehouse | account, user, password, warehouse, database, schema |
Observability connectors
| Connector | Description | Capabilities |
|---|---|---|
| file | Log files (.log, .txt, .json) | Logs |
| datadog | Datadog logs and traces | Logs, Traces |
| elasticsearch | Elasticsearch logs and traces | Logs, Traces (with trace mapping) |
| cloudwatch | AWS CloudWatch logs | Logs |
| opensearch | OpenSearch logs | Logs |
| splunk | Splunk logs | Logs |
| opentelemetry | OpenTelemetry traces | Logs, Traces |
| mlflow | MLflow experiment tracking | Logs |
| langfuse | Langfuse LLM traces | Traces |
| langsmith | LangSmith LLM traces | Traces |
Connectors with Traces capability (Langfuse, LangSmith, Datadog, OpenTelemetry) can provide system input/output pairs, which enables online evaluation. Logs connectors provide raw application logs for artifact-scanning criteria (PII detection, secure logging, etc.).
addConnector
Adds a connector to the system.
| Parameter | Type | Required | Description |
|---|---|---|---|
| system_id | str | Required | Unique system identifier |
| connector | dict | Required | Connector type and configuration |
If a connector of the same category already exists, the exception ConnectorAlreadyExistsError will be raised. Delete the existing one first with deleteConnector().
# Observability — log file
client.add_connector(system_id, {
"type": "file",
"config": {"file_path": "./logs/app.log"},
"name": "Application Logs",
})
# Database — PostgreSQL
client.add_connector(system_id, {
"type": "postgresql",
"config": {
"host": "db.example.com",
"port": 5432,
"database": "myapp",
"user": "readonly",
"password": os.environ["DB_PASSWORD"],
},
"name": "Production Database",
})
# Observability — Langfuse (enables online evaluation)
client.add_connector(system_id, {
"type": "langfuse",
"config": {
"public_key": os.environ["LANGFUSE_PUBLIC_KEY"],
"secret_key": os.environ["LANGFUSE_SECRET_KEY"],
"base_url": "https://cloud.langfuse.com",
},
"name": "Langfuse Traces",
})
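Since each category allows a single connector, a delete-then-add helper is a common pattern. The sketch below is not an SDK method; it defines a stand-in exception class so it runs without the SDK installed (in real code, import ConnectorAlreadyExistsError from mankinds_sdk.exceptions):

```python
class ConnectorAlreadyExistsError(Exception):
    """Stand-in for mankinds_sdk.exceptions.ConnectorAlreadyExistsError."""

def replace_connector(client, system_id: str, category: str, connector: dict):
    """Add a connector, replacing any existing one in the same category."""
    try:
        return client.add_connector(system_id, connector)
    except ConnectorAlreadyExistsError:
        # Only one connector per category: delete the old one, then retry.
        client.delete_connector(system_id, category)
        return client.add_connector(system_id, connector)
```

Usage would look like `replace_connector(client, system_id, "observability", {...})` with a configured client.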
getConnectors
Lists all connectors for a system.
| Parameter | Type | Required | Description |
|---|---|---|---|
| system_id | str | Required | Unique system identifier |
| Returns | Type | Description |
|---|---|---|
| name | string | Connector name |
| category | string | Connector category (database, observability) |
| type | string | Connector type (postgresql, langfuse, file, etc.) |
connectors = client.get_connectors(system_id)
for c in connectors:
print(f"{c['name']} ({c['category']}): {c['type']}")
updateConnector
Updates an existing connector's configuration.
| Parameter | Type | Required | Description |
|---|---|---|---|
| system_id | str | Required | Unique system identifier |
| connector | dict | Required | Connector type and updated configuration |
result = client.update_connector(system_id, {
"type": "file",
"config": {"file_path": "./logs/new-app.log"},
"name": "Updated Logs",
})
print(f"Connector updated: {result}")
deleteConnector
Deletes a connector from the system.
| Parameter | Type | Required | Description |
|---|---|---|---|
| system_id | str | Required | Unique system identifier |
| category | str | Required | Connector category to delete (database, observability) |
result = client.delete_connector(system_id, "observability")
print(f"Connector deleted: {result}")
Datasets
generateDataset
Creates and validates a reference scenario dataset. You can provide custom scenarios or request automatic generation.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| system_id | str | Required | — | Unique system identifier |
| num_scenarios | int | Optional | 10 | Number of scenarios to auto-generate (ignored if scenarios provided) |
| scenarios | list[dict] | Optional | — | Custom scenarios with input (str) and outputs (list) |
| Returns | Type | Description |
|---|---|---|
| scenarios | array | List of validated scenarios |
| scenarios[].id | string | Unique scenario identifier |
| scenarios[].input | object | Input sent to the system |
| scenarios[].expected_outputs | array | List of expected responses |
| scenarios[].source | string | Scenario origin (user or generated) |
The system description must be validated before generating a dataset. Otherwise, the exception DescriptionNotValidatedError will be raised.
# With custom scenarios
dataset = client.generate_dataset(system_id, scenarios=[
{"input": "Hello, how does this work?", "outputs": ["Welcome! I'm here to help."]},
{"input": "I want a refund", "outputs": ["I'll redirect you to our customer service."]},
])
print(f"{len(dataset['scenarios'])} scenarios validated")
# Automatic generation
dataset = client.generate_dataset(system_id, num_scenarios=20)
print(f"{len(dataset['scenarios'])} scenarios generated")
updateDataset
Updates the dataset with instructions or new scenarios.
| Parameter | Type | Required | Description |
|---|---|---|---|
| system_id | str | Required | Unique system identifier |
| orientation | str | Optional | Instructions to refine the dataset |
| scenarios | list[dict] | Optional | New scenarios to replace existing ones |
# Refine with instructions
dataset = client.update_dataset(
system_id,
orientation="Add more refund request cases"
)
print(f"{len(dataset['scenarios'])} scenarios after update")
Returns the updated and re-validated dataset, with the same structure as generateDataset (see above).
Evaluations
evaluate
Runs a system evaluation. Supports three modes: offline (scenario-based), online (production traces), or mixed (both).
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| system_id | str | Required | — | Unique system identifier |
| mode | str | Optional | offline | Evaluation mode: offline, online, or mixed |
| profile | str | Optional | required | Evaluation profile (see table below) |
| thematics_config | dict | Optional | — | Custom offline criteria configuration (replaces profile) |
| online_thematics_config | dict | Optional | — | Custom online criteria configuration (required for online/mixed mode) |
| sampling | dict | Optional | { strategy: "random", size: 20 } | Trace sampling strategy for online evaluation |
| time_range | dict | Optional | { preset: "24h" } | Time range for fetching production traces |
| wait | bool | Optional | true | Wait for evaluation to complete before returning |
| poll_interval | int | Optional | 5 | Seconds between each status check |
| on_poll | Callable[[str, int], None] | Optional | — | Callback invoked on each status check (status, elapsed seconds) |
| Returns | Type | Description |
|---|---|---|
| run_id | string | Evaluation run identifier |
| status | string | completed, failed, running, etc. |
| summary.overall_score | number | Overall score as a percentage |
| summary.dimensions | object | Detailed scores by dimension (score, passed) |
result = client.evaluate(
system_id,
profile="required",
wait=True,
poll_interval=5,
)
print(f"Status: {result['status']}")
print(f"Overall score: {result['summary']['overall_score']}%")
With wait=False, only run_id is returned immediately. Use getEvaluation(run_id) to retrieve the results later.
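A non-blocking pattern built on that behavior might look like the following sketch. It assumes evaluate(..., wait=False) returns a mapping containing run_id, as documented above; wait_for_result is a hypothetical helper, not an SDK method:

```python
import time

def wait_for_result(client, run_id, poll_interval=5, timeout=600):
    """Poll get_evaluation until the run leaves the 'running' state."""
    elapsed = 0
    while elapsed < timeout:
        result = client.get_evaluation(run_id)
        if result["status"] in ("completed", "failed"):
            return result
        time.sleep(poll_interval)
        elapsed += poll_interval
    raise TimeoutError(f"Evaluation {run_id} still running after {timeout}s")

# Usage (assumes a configured client and system_id):
# started = client.evaluate(system_id, wait=False)
# result = wait_for_result(client, started["run_id"])
```

This frees your process between checks, at the cost of managing the run_id yourself; passing wait=True with an on_poll callback achieves much the same effect inside the SDK.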
Evaluation Profiles
| Profile | Type | Description |
|---|---|---|
| required | Scope-based | Required criteria based on the system's regulatory risk analysis |
| extended | Scope-based | Extended coverage to anticipate regulatory risks |
| minimum | Fixed | Essential evaluation covering core AI safety criteria (8 tests/criterion) |
| standard | Fixed | Comprehensive evaluation with extended coverage (15 tests/criterion) |
| maximum | Fixed | Full-depth evaluation with maximum coverage (30 tests/criterion) |
Custom Configuration (thematics_config)
For a custom evaluation, use thematics_config instead of profile:
Note: the top-level keys of thematics_config must be exact dimension names from the list below (e.g., fairness, privacy, security, accuracy).
result = client.evaluate(
system_id,
thematics_config={
"fairness": {
"gender": {"nb_tests": 5},
"age": {"nb_tests": 5}
},
"accuracy": {
"hallucination_detection": {"nb_tests": 10}
}
},
wait=True,
)
Online Evaluation
Online evaluation analyzes real production traces instead of synthetic scenarios. It requires an observability connector with Traces capability (Langfuse, LangSmith, Datadog, or OpenTelemetry).
# Online-only evaluation on production traces
result = client.evaluate(
system_id,
mode="online",
online_thematics_config={
"security": {
"prompt_injection": {},
"pii_exfiltration": {},
},
"accuracy": {
"hallucination_detection": {},
"factual_grounding": {},
}
},
time_range={"preset": "7d"},
sampling={"strategy": "random", "size": 50},
wait=True,
)
print(f"Online score: {result['summary']['overall_score']}%")
# Mixed evaluation: offline scenarios + online traces
result = client.evaluate(
system_id,
mode="mixed",
thematics_config={
"fairness": {
"gender": {"nb_tests": 10},
"age": {"nb_tests": 10}
}
},
online_thematics_config={
"accuracy": {
"hallucination_detection": {},
"response_completeness": {},
}
},
time_range={"preset": "24h"},
wait=True,
)
Available Dimensions and Criteria
privacy — Data protection and PII handling (8 criteria)
pii_reuse, pii_request, pii_masking_detection, pii_in_logs, pii_in_db, pii_masking_db, pii_masking_logs, refusal_privacy
security — Resistance to attacks and data exfiltration (20 criteria)
Exfiltration: pii_exfiltration, tech_exfiltration, tech_exfiltration_logs, tech_exfiltration_db, internal_exfiltration, internal_exfiltration_logs, internal_exfiltration_db, context_exfiltration, context_exfiltration_db, context_exfiltration_logs, traces_exfiltration, traces_exfiltration_logs, traces_exfiltration_db, refusal_security
Resistance: multiturn_resistance, prompt_injection, social_engineering, obfuscation, context_manipulation
Compliance: ip_copyright_violation, catastrophic_misuse
accuracy — Performance, reliability and factual correctness (15 criteria)
Quality: reproductibility, quality, response_correctness, response_completeness, contextual_coherence
Factual: hallucination_detection, factual_grounding, reformulation_stability
Specialized: classification_accuracy, structured_output_conformity, extraction_accuracy, edge_case_handling
Agentic: tool_call_accuracy, tool_call_f1, agent_goal_accuracy
fairness — Bias detection and equitable treatment (8 criteria)
Bias: age, ethnic, gender, health, identity, religious, socioeconomic
Intersectional: intersectional_bias
explainability — Transparency and decision justification (9 criteria)
justification, purpose_disclosure, ai_nature_disclosure, ai_self_disclosure, control_transparency, ambiguous_scope_clarification, refusal_scope, refusal_nonqualification, limitation_explanation
accountability — Governance, traceability and human oversight (9 criteria)
Oversight: usage_conformity, scope_creep_detection, opt_out_capabilities, decision_override, override_refusal_resistance
Logging: secure_logging_db, secure_logging_logs
Ethics: traceability, human_escalation
sustainability — Environmental efficiency (2 criteria)
db_environmental_efficiency, log_environmental_efficiency
systemic_risk — Societal and systemic threats (15 criteria)
Information integrity: authoritative_confabulation, source_fabrication, expert_impersonation, disinformation_generation, opinion_manipulation
Safety: dangerous_content_refusal, deepfake_assistance, vulnerability_exploitation, malware_generation, attack_planning
Control: confidential_exfiltration, context_knowledge_leakage, scope_override, autonomous_escalation, instruction_resistance
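As an illustration, a custom thematics_config mixing criteria from several of the dimensions above could be assembled like this. The dimension and criterion names come from the lists above; the nb_tests values are arbitrary choices for the example:

```python
# Hypothetical selection drawn from the dimension/criteria lists above.
thematics_config = {
    "privacy": {
        "pii_reuse": {"nb_tests": 5},
        "pii_request": {"nb_tests": 5},
    },
    "systemic_risk": {
        "disinformation_generation": {"nb_tests": 8},
        "source_fabrication": {"nb_tests": 8},
    },
}

# Quick sanity check on the total test budget before launching a run.
total_tests = sum(
    crit["nb_tests"]
    for dimension in thematics_config.values()
    for crit in dimension.values()
)
print(f"{total_tests} tests configured")
# prints: 26 tests configured
```

This dict would then be passed as client.evaluate(system_id, thematics_config=thematics_config).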
getEvaluation
Retrieves evaluation status or result.
| Parameter | Type | Required | Description |
|---|---|---|---|
| run_id | str | Required | Evaluation run identifier |
result = client.get_evaluation(run_id)
print(f"Status: {result['status']}")
if result["status"] == "completed":
print(f"Score: {result['summary']['overall_score']}%")
Same structure as evaluate (see above).
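Given the documented return shape, per-dimension scores can be read out like this. The result dict below is illustrative sample data shaped like the documented response, not real output:

```python
# Illustrative result matching the documented structure.
result = {
    "status": "completed",
    "summary": {
        "overall_score": 87.5,
        "dimensions": {
            "privacy": {"score": 92.0, "passed": True},
            "security": {"score": 81.0, "passed": True},
            "fairness": {"score": 68.0, "passed": False},
        },
    },
}

# Print one line per dimension with its score and pass/fail flag.
for name, dim in result["summary"]["dimensions"].items():
    flag = "PASS" if dim["passed"] else "FAIL"
    print(f"{name}: {dim['score']}% [{flag}]")
```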
Errors
The SDK throws typed exceptions for easier error handling.
| Exception | Description |
|---|---|
| CredentialsError | Missing or invalid API key |
| AuthenticationError | Expired or rejected API key (401) |
| NotFoundError | Resource not found (404) |
| ValidationError | Request validation failed (422) |
| RateLimitError | Too many requests (429) |
| InvalidEndpointError | Misconfigured endpoint |
| EndpointNotConfiguredError | Evaluation without configured endpoint |
| DescriptionNotValidatedError | Description not validated |
| ConnectorAlreadyExistsError | Connector already exists (same category) |
| ServerError | Server error (500) |
Each exception contains contextual information for easier debugging:
from mankinds_sdk.exceptions import ConnectorAlreadyExistsError
try:
client.add_connector(system_id, connector)
except ConnectorAlreadyExistsError as e:
print(f"Connector {e.existing_type} already exists")
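RateLimitError in particular lends itself to a retry with exponential backoff. A sketch of that pattern follows; it defines a stand-in exception class so it runs without the SDK installed (in real code, import RateLimitError from mankinds_sdk.exceptions):

```python
import time

class RateLimitError(Exception):
    """Stand-in for mankinds_sdk.exceptions.RateLimitError."""

def with_retries(fn, max_attempts=3, base_delay=1.0):
    """Call fn, retrying on RateLimitError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error to the caller.
            time.sleep(base_delay * (2 ** attempt))

# Usage (assumes a configured client and system_id):
# result = with_retries(lambda: client.evaluate(system_id))
```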