
Mankinds SDK


Integrate AI governance into your applications. The Mankinds SDK provides tools to create systems, connect your data sources, generate test datasets, and run evaluations across 8 trust dimensions, with clients for JavaScript/TypeScript and Python.

from mankinds_sdk import MankindsClient
import os

client = MankindsClient(api_key=os.environ["MANKINDS_API_KEY"])

# Create a system, generate a dataset, evaluate
system = client.create_system("My AI Assistant", "...", endpoint={...})
client.generate_dataset(system["id"], num_scenarios=10)
result = client.evaluate(system["id"])
print(f"Score: {result['summary']['overall_score']}%")

Getting Started

A few steps to integrate AI evaluation into your project.

Install the SDK
pip install mankinds-sdk
Get your API key

Create an account on app.mankinds.io and generate your API key from settings.
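The examples in this guide read the key from the environment rather than hardcoding it. One way to set it in your shell (the `mk_...` value below is a placeholder for your own key):

```shell
# Keep the API key out of source code by storing it in the environment
export MANKINDS_API_KEY="mk_..."
```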

Initialize the client
import os
from mankinds_sdk import MankindsClient

client = MankindsClient(api_key=os.environ["MANKINDS_API_KEY"])
Run your first evaluation

Create a system, generate a test dataset, then run the evaluation:

# 1. Create a system with its endpoint
system = client.create_system(
    "GDPR Compliance Assistant",
    "Chatbot that advises companies on their GDPR obligations.",
    endpoint={
        "url": "https://api.example.com/chat",
        "method": "POST",
        "body": {"message": "{{input}}"},
        "response": {"reply": "{{output}}"},
    },
)

# 2. Generate a test dataset (10 scenarios)
dataset = client.generate_dataset(system["id"], num_scenarios=10)

# 3. Run the evaluation
result = client.evaluate(system["id"])
print(f"Overall score: {result['summary']['overall_score']}%")

Features

| Feature | Description |
| --- | --- |
| Systems | Create, configure, and manage your AI systems with automatic description validation |
| Connectors | Connect databases (PostgreSQL, MySQL, MongoDB, Snowflake, SQLite, SQL Server), observability tools (Datadog, Langfuse, LangSmith, Elasticsearch, Splunk, CloudWatch, OpenTelemetry, MLflow, OpenSearch), and log files |
| Datasets | Provide your own reference scenarios or auto-generate them with AI |
| Evaluations | Run offline or online evaluations across 87 criteria in 8 dimensions and retrieve scores |
| Authentication | Secure API keys with fine-grained permission management |

API Reference

MankindsClient

The main client for interacting with the Mankinds API.

Constructor

MankindsClient(api_key: str, base_url: str = None, timeout: int = 30)

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| api_key | str | Required |  | Mankinds API key (format: mk_...) |
| base_url | str | Optional | https://app.mankinds.io | API base URL |
| timeout | int | Optional | 30 | Request timeout in seconds |

Systems

createSystem

Creates a new AI system and automatically validates its description.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| name | str | Required | AI system name |
| description | str | Required | Description of the expected system behavior |
| endpoint | dict | Required | API endpoint configuration (url, method, body, response) |

| Returns | Type | Description |
| --- | --- | --- |
| id | string | Unique identifier of the created system |
| success | boolean | true if the description was validated |
| recommendations | array | Recommendations to improve the description |
Endpoint configuration

In your endpoint's body and response configurations, use the placeholders {{input}} and {{output}} to dynamically inject test scenarios.

system = client.create_system(
    "GDPR Compliance Assistant",
    "Chatbot that advises companies on their GDPR obligations and guides compliance.",
    endpoint={
        "url": "https://api.example.com/chat",
        "method": "POST",
        "body": {"message": "{{input}}"},
        "response": {"reply": "{{output}}"},
    },
)

print(f"System created: {system['id']}")
print(f"Description validated: {system['success']}")
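To make the placeholder mechanics concrete, here is an illustrative sketch (not the SDK's internal code) of how a scenario input could be injected into the configured body template:

```python
import json

def render_body(template: dict, input_text: str) -> dict:
    # Serialize the template, swap the placeholder, and parse it back.
    # Simplified sketch: a real implementation would escape quotes in input_text.
    return json.loads(json.dumps(template).replace("{{input}}", input_text))

body = render_body({"message": "{{input}}"}, "Hello, how does this work?")
# body == {"message": "Hello, how does this work?"}
```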

getSystem

Retrieves system details.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| system_id | str | Required | Unique system identifier |

| Returns | Type | Description |
| --- | --- | --- |
| id | string | Unique identifier of the system |
| name | string | System name |
| description | string | System description |
| is_description_validated | boolean | true if the description has been validated |
| endpoint | object | Endpoint configuration |
system = client.get_system(system_id)

print(f"Name: {system['name']}")
print(f"Description validated: {system['is_description_validated']}")
print(f"Endpoint: {system['endpoint']}")

updateSystem

Updates an existing system.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| system_id | str | Required | Unique system identifier |
| name | str | Optional | New system name |
| description | str | Optional | New description (triggers re-validation) |
| endpoint | dict | Optional | New endpoint configuration |

| Returns | Type | Description |
| --- | --- | --- |
| success | boolean | true if the update was successful |
| recommendations | array | Recommendations if the description was re-validated |
Endpoint validation

If you provide an endpoint, it must contain the required fields: url, method, body, response. Otherwise, the exception InvalidEndpointError will be raised.

result = client.update_system(
    system_id,
    name="GDPR Compliance Assistant v2",
    description="Improved version with DPO questions support.",
)

print(f"Validated: {result['success']}")

If a new description is provided, it is automatically re-validated. On failure, DescriptionNotValidatedError is raised with recommendations.


Connectors

Connectors allow you to connect your data sources to enable evaluations. There are three categories: database, observability, and document connectors.

One connector per category

Each system can only have one connector per category (database, observability).

Available Types

Database connectors

| Connector | Description | Configuration |
| --- | --- | --- |
| postgresql | PostgreSQL database | host, port, database, user, password |
| mysql | MySQL database | host, port, database, user, password |
| mongodb | MongoDB database | host, port, database, user, password, authSource |
| sqlserver | SQL Server database | host, port, database, user, password, instance |
| sqlite | SQLite database file | file_path, file_name |
| snowflake | Snowflake data warehouse | account, user, password, warehouse, database, schema |

Observability connectors

| Connector | Description | Capabilities |
| --- | --- | --- |
| file | Log files (.log, .txt, .json) | Logs |
| datadog | Datadog logs and traces | Logs, Traces |
| elasticsearch | Elasticsearch logs and traces | Logs, Traces (with trace mapping) |
| cloudwatch | AWS CloudWatch logs | Logs |
| opensearch | OpenSearch logs | Logs |
| splunk | Splunk logs | Logs |
| opentelemetry | OpenTelemetry traces | Logs, Traces |
| mlflow | MLflow experiment tracking | Logs |
| langfuse | Langfuse LLM traces | Traces |
| langsmith | LangSmith LLM traces | Traces |
Traces vs Logs

Connectors with Traces capability (Langfuse, LangSmith, Datadog, OpenTelemetry) can provide system input/output pairs, which enables online evaluation. Logs connectors provide raw application logs for artifact-scanning criteria (PII detection, secure logging, etc.).
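The distinction can be sketched as follows; the record shapes are hypothetical, chosen only to illustrate why trace records support input/output evaluation while plain log lines are limited to artifact scanning:

```python
def extract_pairs(records):
    # Keep only records that carry a full exchange (trace-style);
    # log-style records lack input/output and are left for artifact scanning.
    return [
        (r["input"], r["output"])
        for r in records
        if "input" in r and "output" in r
    ]

pairs = extract_pairs([
    {"input": "What is GDPR?", "output": "An EU data-protection regulation."},
    {"level": "INFO", "message": "request served in 120ms"},  # log line: skipped
])
# pairs == [("What is GDPR?", "An EU data-protection regulation.")]
```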

addConnector

Adds a connector to the system.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| system_id | str | Required | Unique system identifier |
| connector | dict | Required | Connector type and configuration |
One connector per category

If a connector of the same category already exists, the exception ConnectorAlreadyExistsError will be raised. Delete the existing one first with deleteConnector().

import os

# Note: a system keeps one observability connector at a time, so the
# file and Langfuse examples below are alternatives, not a sequence.

# Observability — log file
client.add_connector(system_id, {
    "type": "file",
    "config": {"file_path": "./logs/app.log"},
    "name": "Application Logs",
})

# Database — PostgreSQL
client.add_connector(system_id, {
    "type": "postgresql",
    "config": {
        "host": "db.example.com",
        "port": 5432,
        "database": "myapp",
        "user": "readonly",
        "password": os.environ["DB_PASSWORD"],
    },
    "name": "Production Database",
})

# Observability — Langfuse (enables online evaluation)
client.add_connector(system_id, {
    "type": "langfuse",
    "config": {
        "public_key": os.environ["LANGFUSE_PUBLIC_KEY"],
        "secret_key": os.environ["LANGFUSE_SECRET_KEY"],
        "base_url": "https://cloud.langfuse.com",
    },
    "name": "Langfuse Traces",
})

getConnectors

Lists all connectors for a system.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| system_id | str | Required | Unique system identifier |

| Returns | Type | Description |
| --- | --- | --- |
| name | string | Connector name |
| category | string | Connector category (database, observability) |
| type | string | Connector type (postgresql, langfuse, file, etc.) |
connectors = client.get_connectors(system_id)

for c in connectors:
    print(f"{c['name']} ({c['category']}): {c['type']}")

updateConnector

Updates an existing connector's configuration.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| system_id | str | Required | Unique system identifier |
| connector | dict | Required | Connector type and updated configuration |
result = client.update_connector(system_id, {
    "type": "file",
    "config": {"file_path": "./logs/new-app.log"},
    "name": "Updated Logs",
})
print(f"Connector updated: {result}")

deleteConnector

Deletes a connector from the system.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| system_id | str | Required | Unique system identifier |
| category | str | Required | Connector category to delete (database, observability) |
result = client.delete_connector(system_id, "observability")
print(f"Connector deleted: {result}")

Datasets

generateDataset

Creates and validates a reference scenario dataset. You can provide custom scenarios or request automatic generation.

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| system_id | str | Required |  | Unique system identifier |
| num_scenarios | int | Optional | 10 | Number of scenarios to auto-generate (ignored if scenarios provided) |
| scenarios | list[dict] | Optional |  | Custom scenarios with input (str) and outputs (list) |

| Returns | Type | Description |
| --- | --- | --- |
| scenarios | array | List of validated scenarios |
| scenarios[].id | string | Unique scenario identifier |
| scenarios[].input | object | Input sent to the system |
| scenarios[].expected_outputs | array | List of expected responses |
| scenarios[].source | string | Scenario origin (user or generated) |
Validated description required

The system description must be validated before generating a dataset. Otherwise, the exception DescriptionNotValidatedError will be raised.

# With custom scenarios
dataset = client.generate_dataset(system_id, scenarios=[
    {"input": "Hello, how does this work?", "outputs": ["Welcome! I'm here to help."]},
    {"input": "I want a refund", "outputs": ["I'll redirect you to our customer service."]},
])

print(f"{len(dataset['scenarios'])} scenarios validated")
# Automatic generation
dataset = client.generate_dataset(system_id, num_scenarios=20)
print(f"{len(dataset['scenarios'])} scenarios generated")

updateDataset

Updates the dataset with instructions or new scenarios.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| system_id | str | Required | Unique system identifier |
| orientation | str | Optional | Instructions to refine the dataset |
| scenarios | list[dict] | Optional | New scenarios to replace existing ones |
# Refine with instructions
dataset = client.update_dataset(
    system_id,
    orientation="Add more refund request cases",
)
print(f"{len(dataset['scenarios'])} scenarios after update")
Returns

The updated and re-validated dataset — same structure as generateDataset (see above).


Evaluations

evaluate

Runs a system evaluation. Supports three modes: offline (scenario-based), online (production traces), or mixed (both).

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| system_id | str | Required |  | Unique system identifier |
| mode | str | Optional | offline | Evaluation mode: offline, online, or mixed |
| profile | str | Optional | required | Evaluation profile (see table below) |
| thematics_config | dict | Optional |  | Custom offline criteria configuration (replaces profile) |
| online_thematics_config | dict | Optional |  | Custom online criteria configuration (required for online/mixed mode) |
| sampling | dict | Optional | { strategy: "random", size: 20 } | Trace sampling strategy for online evaluation |
| time_range | dict | Optional | { preset: "24h" } | Time range for fetching production traces |
| wait | bool | Optional | true | Wait for evaluation to complete before returning |
| poll_interval | int | Optional | 5 | Seconds between each status check |
| on_poll | Callable[[str, int], None] | Optional |  | Callback invoked on each status check (status, elapsed seconds) |

| Returns | Type | Description |
| --- | --- | --- |
| run_id | string | Evaluation run identifier |
| status | string | completed, failed, running, etc. |
| summary.overall_score | number | Overall score as a percentage |
| summary.dimensions | object | Detailed scores by dimension (score, passed) |
result = client.evaluate(
    system_id,
    profile="required",
    wait=True,
    poll_interval=5,
)

print(f"Status: {result['status']}")
print(f"Overall score: {result['summary']['overall_score']}%")

With wait=False, the call returns immediately with only run_id. Use get_evaluation(run_id) to retrieve the result later.
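What wait=True does for you can be sketched as a generic polling loop; poll_status below is a stand-in for a call like client.get_evaluation(run_id):

```python
import time

def wait_for_completion(poll_status, poll_interval=0.01, timeout=5.0, on_poll=None):
    # Poll until a terminal status is reported, mirroring wait=True.
    elapsed = 0.0
    while elapsed < timeout:
        result = poll_status()
        if on_poll:
            on_poll(result["status"], elapsed)
        if result["status"] in ("completed", "failed"):
            return result
        time.sleep(poll_interval)
        elapsed += poll_interval
    raise TimeoutError("evaluation did not finish in time")

# Simulated run that completes on the third status check
states = iter(["running", "running", "completed"])
result = wait_for_completion(lambda: {"status": next(states)})
# result["status"] == "completed"
```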

Evaluation Profiles

| Profile | Type | Description |
| --- | --- | --- |
| required | Scope-based | Required criteria based on the system's regulatory risk analysis |
| extended | Scope-based | Extended coverage to anticipate regulatory risks |
| minimum | Fixed | Essential evaluation covering core AI safety criteria (8 tests/criterion) |
| standard | Fixed | Comprehensive evaluation with extended coverage (15 tests/criterion) |
| maximum | Fixed | Full-depth evaluation with maximum coverage (30 tests/criterion) |
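For the fixed profiles, expected test volume is simple arithmetic: tests per criterion times the number of criteria in scope. A rough, illustrative estimate:

```python
# Tests per criterion for the fixed profiles (from the table above)
TESTS_PER_CRITERION = {"minimum": 8, "standard": 15, "maximum": 30}

def total_tests(profile: str, num_criteria: int) -> int:
    # Rough volume estimate; actual scope depends on your system's configuration.
    return TESTS_PER_CRITERION[profile] * num_criteria

# e.g., the 8 privacy criteria under the standard profile
n = total_tests("standard", 8)
# n == 120
```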

Custom Configuration (thematics_config)

For a custom evaluation, use thematics_config instead of profile:

Note: The main key of thematics_config must be the exact name of a dimension among those listed below (e.g., fairness, privacy, security, accuracy, etc.).
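A quick local sanity check on the top-level keys can catch typos before calling evaluate(); the dimension set below is taken from the list later on this page, and the server performs its own validation regardless:

```python
DIMENSIONS = {
    "privacy", "security", "accuracy", "fairness",
    "explainability", "accountability", "sustainability", "systemic_risk",
}

def invalid_dimension_keys(config: dict) -> list:
    # Return top-level keys that are not valid dimension names.
    return [key for key in config if key not in DIMENSIONS]

bad = invalid_dimension_keys({"fairness": {"gender": {"nb_tests": 5}}, "bias": {}})
# bad == ["bias"]
```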

result = client.evaluate(
    system_id,
    thematics_config={
        "fairness": {
            "gender": {"nb_tests": 5},
            "age": {"nb_tests": 5},
        },
        "accuracy": {
            "hallucination_detection": {"nb_tests": 10},
        },
    },
    wait=True,
)

Online Evaluation

Online evaluation analyzes real production traces instead of synthetic scenarios. It requires an observability connector with Traces capability (Langfuse, LangSmith, Datadog, or OpenTelemetry).

# Online-only evaluation on production traces
result = client.evaluate(
    system_id,
    mode="online",
    online_thematics_config={
        "security": {
            "prompt_injection": {},
            "pii_exfiltration": {},
        },
        "accuracy": {
            "hallucination_detection": {},
            "factual_grounding": {},
        },
    },
    time_range={"preset": "7d"},
    sampling={"strategy": "random", "size": 50},
    wait=True,
)

print(f"Online score: {result['summary']['overall_score']}%")
# Mixed evaluation: offline scenarios + online traces
result = client.evaluate(
    system_id,
    mode="mixed",
    thematics_config={
        "fairness": {
            "gender": {"nb_tests": 10},
            "age": {"nb_tests": 10},
        },
    },
    online_thematics_config={
        "accuracy": {
            "hallucination_detection": {},
            "response_completeness": {},
        },
    },
    time_range={"preset": "24h"},
    wait=True,
)

Available Dimensions and Criteria

privacy — Data protection and PII handling (8 criteria)

pii_reuse, pii_request, pii_masking_detection, pii_in_logs, pii_in_db, pii_masking_db, pii_masking_logs, refusal_privacy

security — Resistance to attacks and data exfiltration (21 criteria)

Exfiltration: pii_exfiltration, tech_exfiltration, tech_exfiltration_logs, tech_exfiltration_db, internal_exfiltration, internal_exfiltration_logs, internal_exfiltration_db, context_exfiltration, context_exfiltration_db, context_exfiltration_logs, traces_exfiltration, traces_exfiltration_logs, traces_exfiltration_db, refusal_security

Resistance: multiturn_resistance, prompt_injection, social_engineering, obfuscation, context_manipulation

Compliance: ip_copyright_violation, catastrophic_misuse

accuracy — Performance, reliability and factual correctness (15 criteria)

Quality: reproductibility, quality, response_correctness, response_completeness, contextual_coherence

Factual: hallucination_detection, factual_grounding, reformulation_stability

Specialized: classification_accuracy, structured_output_conformity, extraction_accuracy, edge_case_handling

Agentic: tool_call_accuracy, tool_call_f1, agent_goal_accuracy

fairness — Bias detection and equitable treatment (8 criteria)

Bias: age, ethnic, gender, health, identity, religious, socioeconomic

Intersectional: intersectional_bias

explainability — Transparency and decision justification (9 criteria)

justification, purpose_disclosure, ai_nature_disclosure, ai_self_disclosure, control_transparency, ambiguous_scope_clarification, refusal_scope, refusal_nonqualification, limitation_explanation

accountability — Governance, traceability and human oversight (9 criteria)

Oversight: usage_conformity, scope_creep_detection, opt_out_capabilities, decision_override, override_refusal_resistance

Logging: secure_logging_db, secure_logging_logs

Ethics: traceability, human_escalation

sustainability — Environmental efficiency (2 criteria)

db_environmental_efficiency, log_environmental_efficiency

systemic_risk — Societal and systemic threats (15 criteria)

Information integrity: authoritative_confabulation, source_fabrication, expert_impersonation, disinformation_generation, opinion_manipulation

Safety: dangerous_content_refusal, deepfake_assistance, vulnerability_exploitation, malware_generation, attack_planning

Control: confidential_exfiltration, context_knowledge_leakage, scope_override, autonomous_escalation, instruction_resistance

getEvaluation

Retrieves evaluation status or result.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| run_id | str | Required | Evaluation run identifier |
result = client.get_evaluation(run_id)

print(f"Status: {result['status']}")
if result["status"] == "completed":
    print(f"Score: {result['summary']['overall_score']}%")
Returns

Same structure as evaluate (see above).


Errors

The SDK raises typed exceptions for easier error handling.

| Exception | Description |
| --- | --- |
| CredentialsError | Missing or invalid API key |
| AuthenticationError | Expired or rejected API key (401) |
| NotFoundError | Resource not found (404) |
| ValidationError | Request validation failed (422) |
| RateLimitError | Too many requests (429) |
| InvalidEndpointError | Misconfigured endpoint |
| EndpointNotConfiguredError | Evaluation without a configured endpoint |
| DescriptionNotValidatedError | Description not validated |
| ConnectorAlreadyExistsError | Connector already exists (same category) |
| ServerError | Server error (500) |

Each exception contains contextual information for easier debugging:

from mankinds_sdk.exceptions import ConnectorAlreadyExistsError

try:
    client.add_connector(system_id, connector)
except ConnectorAlreadyExistsError as e:
    print(f"Connector {e.existing_type} already exists")
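Transient 429s are worth retrying with backoff. A generic sketch, using a locally defined RateLimitError as a stand-in for mankinds_sdk.exceptions.RateLimitError:

```python
import time

class RateLimitError(Exception):
    """Local stand-in for mankinds_sdk.exceptions.RateLimitError."""

def with_retries(call, retries=3, backoff=0.01):
    # Retry the callable on rate limiting, doubling the delay each attempt.
    for attempt in range(retries):
        try:
            return call()
        except RateLimitError:
            if attempt == retries - 1:
                raise
            time.sleep(backoff * (2 ** attempt))

# Simulated call that is throttled twice, then succeeds
attempts = {"n": 0}
def flaky_call():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError("429")
    return "ok"

result = with_retries(flaky_call)
# result == "ok" after two retried attempts
```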