
Mankinds SDK


Integrate AI governance into your applications. The Mankinds SDK provides tools to create systems, connect your data sources, generate test datasets, and run evaluations across 8 trust dimensions, with clients for JavaScript/TypeScript and Python.

from mankinds_sdk import MankindsClient
import os

client = MankindsClient(api_key=os.environ["MANKINDS_API_KEY"])

# Create a system, generate a dataset, evaluate
system = client.create_system("My AI Assistant", "...", endpoint={...})
client.generate_dataset(system["id"], num_scenarios=10)
result = client.evaluate(system["id"])
print(f"Score: {result['summary']['overall_score']}%")

Getting Started

A few steps to integrate AI evaluation into your project.

Install the SDK
pip install mankinds-sdk
Get your API key

Create an account on app.mankinds.io and generate your API key from settings.
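The examples in this guide read the key from the environment rather than hardcoding it. One way to set it in your shell (the `mk_...` value below is a placeholder for your own key):

```shell
# Keep the API key out of source code by storing it in the environment
export MANKINDS_API_KEY="mk_..."
```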

Initialize the client
import os
from mankinds_sdk import MankindsClient

client = MankindsClient(api_key=os.environ["MANKINDS_API_KEY"])
Run your first evaluation

Create a system, generate a test dataset, then run the evaluation:

# 1. Create a system with its endpoint
system = client.create_system(
    "GDPR Compliance Assistant",
    "Chatbot that advises companies on their GDPR obligations.",
    endpoint={
        "url": "https://api.example.com/chat",
        "method": "POST",
        "body": {"message": "{{input}}"},
        "response": {"reply": "{{output}}"},
    },
)

# 2. Generate a test dataset (10 scenarios)
dataset = client.generate_dataset(system["id"], num_scenarios=10)

# 3. Run the evaluation
result = client.evaluate(system["id"])
print(f"Overall score: {result['summary']['overall_score']}%")

Features

| Feature | Description |
| --- | --- |
| Systems | Create, configure, and manage your AI systems with automatic description validation |
| Connectors | Connect databases (PostgreSQL, MySQL, MongoDB, Snowflake, SQLite, SQL Server), observability tools (Datadog, Langfuse, LangSmith, Elasticsearch, Splunk, CloudWatch, OpenTelemetry, MLflow, OpenSearch), and log files |
| Datasets | Provide your own reference scenarios or auto-generate them with AI |
| Evaluations | Run offline or online evaluations across 87 criteria in 8 dimensions and retrieve scores |
| Authentication | Secure API keys with fine-grained permission management |

API Reference

MankindsClient

The main client for interacting with the Mankinds API.

Constructor

MankindsClient(api_key: str, base_url: str = None, timeout: int = 30)

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| api_key | str | Required |  | Mankinds API key (format: mk_...) |
| base_url | str | Optional | https://app.mankinds.io | API base URL |
| timeout | int | Optional | 30 | Request timeout in seconds |

Systems

createSystem

Creates a new AI system and automatically validates its description.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| name | str | Required | AI system name |
| description | str | Required | Description of the expected system behavior |
| endpoint | dict | Required | API endpoint configuration (url, method, body, response) |

| Returns | Type | Description |
| --- | --- | --- |
| id | string | Unique identifier of the created system |
| success | boolean | true if the description was validated |
| recommendations | array | Recommendations to improve the description |
Endpoint configuration

In your endpoint's body and response configurations, use the placeholders {{input}} and {{output}} to dynamically inject test scenarios.

system = client.create_system(
    "GDPR Compliance Assistant",
    "Chatbot that advises companies on their GDPR obligations and guides compliance.",
    endpoint={
        "url": "https://api.example.com/chat",
        "method": "POST",
        "body": {"message": "{{input}}"},
        "response": {"reply": "{{output}}"},
    },
)

print(f"System created: {system['id']}")
print(f"Description validated: {system['success']}")
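To make the placeholder mechanics concrete, here is an illustrative sketch (not the SDK's internal code) of how a scenario input could be injected into the configured body template:

```python
import json

def render_body(template: dict, input_text: str) -> dict:
    # Serialize the template, swap the placeholder, and parse it back.
    # Simplified sketch: a real implementation would escape quotes in input_text.
    return json.loads(json.dumps(template).replace("{{input}}", input_text))

body = render_body({"message": "{{input}}"}, "Hello, how does this work?")
# body == {"message": "Hello, how does this work?"}
```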

getSystem

Retrieves system details.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| system_id | str | Required | Unique system identifier |

| Returns | Type | Description |
| --- | --- | --- |
| id | string | Unique identifier of the system |
| name | string | System name |
| description | string | System description |
| is_description_validated | boolean | true if the description has been validated |
| endpoint | object | Endpoint configuration |
system = client.get_system(system_id)

print(f"Name: {system['name']}")
print(f"Description validated: {system['is_description_validated']}")
print(f"Endpoint: {system['endpoint']}")

updateSystem

Updates an existing system.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| system_id | str | Required | Unique system identifier |
| name | str | Optional | New system name |
| description | str | Optional | New description (triggers re-validation) |
| endpoint | dict | Optional | New endpoint configuration |

| Returns | Type | Description |
| --- | --- | --- |
| success | boolean | true if the update was successful |
| recommendations | array | Recommendations if the description was re-validated |
Endpoint validation

If you provide an endpoint, it must contain the required fields: url, method, body, response. Otherwise, the exception InvalidEndpointError will be raised.

result = client.update_system(
    system_id,
    name="GDPR Compliance Assistant v2",
    description="Improved version with DPO questions support.",
)

print(f"Validated: {result['success']}")

If a new description is provided, it is automatically re-validated. On failure, DescriptionNotValidatedError is raised with recommendations.


Connectors

Connectors allow you to connect your data sources to enable evaluations. There are three categories: database, observability, and document connectors.

One connector per category

Each system can only have one connector per category (database, observability).

Available Types

Database connectors

| Connector | Description | Configuration |
| --- | --- | --- |
| postgresql | PostgreSQL database | host, port, database, user, password |
| mysql | MySQL database | host, port, database, user, password |
| mongodb | MongoDB database | host, port, database, user, password, authSource |
| sqlserver | SQL Server database | host, port, database, user, password, instance |
| sqlite | SQLite database file | file_path, file_name |
| snowflake | Snowflake data warehouse | account, user, password, warehouse, database, schema |

Observability connectors

| Connector | Description | Capabilities |
| --- | --- | --- |
| file | Log files (.log, .txt, .json) | Logs |
| datadog | Datadog logs and traces | Logs, Traces |
| elasticsearch | Elasticsearch logs and traces | Logs, Traces (with trace mapping) |
| cloudwatch | AWS CloudWatch logs | Logs |
| opensearch | OpenSearch logs | Logs |
| splunk | Splunk logs | Logs |
| opentelemetry | OpenTelemetry traces | Logs, Traces |
| mlflow | MLflow experiment tracking | Logs |
| langfuse | Langfuse LLM traces | Traces |
| langsmith | LangSmith LLM traces | Traces |
Traces vs Logs

Connectors with Traces capability (Langfuse, LangSmith, Datadog, OpenTelemetry) can provide system input/output pairs, which enables online evaluation. Logs connectors provide raw application logs for artifact-scanning criteria (PII detection, secure logging, etc.).
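The distinction can be sketched as follows; the record shapes are hypothetical, chosen only to illustrate why trace records support input/output evaluation while plain log lines are limited to artifact scanning:

```python
def extract_pairs(records):
    # Keep only records that carry a full exchange (trace-style);
    # log-style records lack input/output and are left for artifact scanning.
    return [
        (r["input"], r["output"])
        for r in records
        if "input" in r and "output" in r
    ]

pairs = extract_pairs([
    {"input": "What is GDPR?", "output": "An EU data-protection regulation."},
    {"level": "INFO", "message": "request served in 120ms"},  # log line: skipped
])
# pairs == [("What is GDPR?", "An EU data-protection regulation.")]
```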

addConnector

Adds a connector to the system.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| system_id | str | Required | Unique system identifier |
| connector | dict | Required | Connector type and configuration |
One connector per category

If a connector of the same category already exists, the exception ConnectorAlreadyExistsError will be raised. Delete the existing one first with deleteConnector().

import os

# Note: a system keeps one observability connector at a time, so the
# file and Langfuse examples below are alternatives, not a sequence.

# Observability — log file
client.add_connector(system_id, {
    "type": "file",
    "config": {"file_path": "./logs/app.log"},
    "name": "Application Logs",
})

# Database — PostgreSQL
client.add_connector(system_id, {
    "type": "postgresql",
    "config": {
        "host": "db.example.com",
        "port": 5432,
        "database": "myapp",
        "user": "readonly",
        "password": os.environ["DB_PASSWORD"],
    },
    "name": "Production Database",
})

# Observability — Langfuse (enables online evaluation)
client.add_connector(system_id, {
    "type": "langfuse",
    "config": {
        "public_key": os.environ["LANGFUSE_PUBLIC_KEY"],
        "secret_key": os.environ["LANGFUSE_SECRET_KEY"],
        "base_url": "https://cloud.langfuse.com",
    },
    "name": "Langfuse Traces",
})

getConnectors

Lists all connectors for a system.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| system_id | str | Required | Unique system identifier |

| Returns | Type | Description |
| --- | --- | --- |
| name | string | Connector name |
| category | string | Connector category (database, observability) |
| type | string | Connector type (postgresql, langfuse, file, etc.) |
connectors = client.get_connectors(system_id)

for c in connectors:
    print(f"{c['name']} ({c['category']}): {c['type']}")

updateConnector

Updates an existing connector's configuration.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| system_id | str | Required | Unique system identifier |
| connector | dict | Required | Connector type and updated configuration |
result = client.update_connector(system_id, {
    "type": "file",
    "config": {"file_path": "./logs/new-app.log"},
    "name": "Updated Logs",
})
print(f"Connector updated: {result}")

deleteConnector

Deletes a connector from the system.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| system_id | str | Required | Unique system identifier |
| category | str | Required | Connector category to delete (database, observability) |
result = client.delete_connector(system_id, "observability")
print(f"Connector deleted: {result}")

Datasets

generateDataset

Creates and validates a reference scenario dataset. You can provide custom scenarios or request automatic generation.

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| system_id | str | Required |  | Unique system identifier |
| num_scenarios | int | Optional | 10 | Number of scenarios to auto-generate (ignored if scenarios provided) |
| scenarios | list[dict] | Optional |  | Custom scenarios with input (str) and outputs (list) |

| Returns | Type | Description |
| --- | --- | --- |
| scenarios | array | List of validated scenarios |
| scenarios[].id | string | Unique scenario identifier |
| scenarios[].input | object | Input sent to the system |
| scenarios[].expected_outputs | array | List of expected responses |
| scenarios[].source | string | Scenario origin (user or generated) |
Validated description required

The system description must be validated before generating a dataset. Otherwise, the exception DescriptionNotValidatedError will be raised.

# With custom scenarios
dataset = client.generate_dataset(system_id, scenarios=[
    {"input": "Hello, how does this work?", "outputs": ["Welcome! I'm here to help."]},
    {"input": "I want a refund", "outputs": ["I'll redirect you to our customer service."]},
])

print(f"{len(dataset['scenarios'])} scenarios validated")
# Automatic generation
dataset = client.generate_dataset(system_id, num_scenarios=20)
print(f"{len(dataset['scenarios'])} scenarios generated")

updateDataset

Updates the dataset with instructions or new scenarios.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| system_id | str | Required | Unique system identifier |
| orientation | str | Optional | Instructions to refine the dataset |
| scenarios | list[dict] | Optional | New scenarios to replace existing ones |
# Refine with instructions
dataset = client.update_dataset(
    system_id,
    orientation="Add more refund request cases",
)
print(f"{len(dataset['scenarios'])} scenarios after update")
Returns

The updated and re-validated dataset — same structure as generateDataset (see above).


Evaluations

evaluate

Runs a system evaluation. Supports three modes: offline (scenario-based), online (production traces), or mixed (both).

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| system_id | str | Required |  | Unique system identifier |
| mode | str | Optional | offline | Evaluation mode: offline, online, or mixed |
| profile | str | Optional | required | Evaluation profile (see table below) |
| thematics_config | dict | Optional |  | Custom offline criteria configuration (replaces profile) |
| online_thematics_config | dict | Optional |  | Custom online criteria configuration (required for online/mixed mode) |
| sampling | dict | Optional | { strategy: "random", size: 20 } | Trace sampling strategy for online evaluation |
| time_range | dict | Optional | { preset: "24h" } | Time range for fetching production traces |
| wait | bool | Optional | true | Wait for evaluation to complete before returning |
| poll_interval | int | Optional | 5 | Seconds between each status check |
| on_poll | Callable[[str, int], None] | Optional |  | Callback invoked on each status check (status, elapsed seconds) |

| Returns | Type | Description |
| --- | --- | --- |
| run_id | string | Evaluation run identifier |
| status | string | completed, failed, running, etc. |
| summary.overall_score | number | Overall score as a percentage |
| summary.dimensions | object | Detailed scores by dimension (score, passed) |
result = client.evaluate(
    system_id,
    profile="required",
    wait=True,
    poll_interval=5,
)

print(f"Status: {result['status']}")
print(f"Overall score: {result['summary']['overall_score']}%")

With wait=False, the call returns immediately with only run_id. Use get_evaluation(run_id) to retrieve the result later.
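What wait=True does for you can be sketched as a generic polling loop; poll_status below is a stand-in for a call like client.get_evaluation(run_id):

```python
import time

def wait_for_completion(poll_status, poll_interval=0.01, timeout=5.0, on_poll=None):
    # Poll until a terminal status is reported, mirroring wait=True.
    elapsed = 0.0
    while elapsed < timeout:
        result = poll_status()
        if on_poll:
            on_poll(result["status"], elapsed)
        if result["status"] in ("completed", "failed"):
            return result
        time.sleep(poll_interval)
        elapsed += poll_interval
    raise TimeoutError("evaluation did not finish in time")

# Simulated run that completes on the third status check
states = iter(["running", "running", "completed"])
result = wait_for_completion(lambda: {"status": next(states)})
# result["status"] == "completed"
```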

Evaluation Profiles

| Profile | Type | Description |
| --- | --- | --- |
| required | Scope-based | Required criteria based on the system's regulatory risk analysis |
| extended | Scope-based | Extended coverage to anticipate regulatory risks |
| minimum | Fixed | Essential evaluation covering core AI safety criteria (8 tests/criterion) |
| standard | Fixed | Comprehensive evaluation with extended coverage (15 tests/criterion) |
| maximum | Fixed | Full-depth evaluation with maximum coverage (30 tests/criterion) |
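For the fixed profiles, expected test volume is simple arithmetic: tests per criterion times the number of criteria in scope. A rough, illustrative estimate:

```python
# Tests per criterion for the fixed profiles (from the table above)
TESTS_PER_CRITERION = {"minimum": 8, "standard": 15, "maximum": 30}

def total_tests(profile: str, num_criteria: int) -> int:
    # Rough volume estimate; actual scope depends on your system's configuration.
    return TESTS_PER_CRITERION[profile] * num_criteria

# e.g., the 8 privacy criteria under the standard profile
n = total_tests("standard", 8)
# n == 120
```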

Custom Configuration (thematics_config)

For a custom evaluation, use thematics_config instead of profile:

Note: The main key of thematics_config must be the exact name of a dimension among those listed below (e.g., fairness, privacy, security, accuracy, etc.).
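A quick local sanity check on the top-level keys can catch typos before calling evaluate(); the dimension set below is taken from the list later on this page, and the server performs its own validation regardless:

```python
DIMENSIONS = {
    "privacy", "security", "accuracy", "fairness",
    "explainability", "accountability", "sustainability", "systemic_risk",
}

def invalid_dimension_keys(config: dict) -> list:
    # Return top-level keys that are not valid dimension names.
    return [key for key in config if key not in DIMENSIONS]

bad = invalid_dimension_keys({"fairness": {"gender": {"nb_tests": 5}}, "bias": {}})
# bad == ["bias"]
```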

result = client.evaluate(
    system_id,
    thematics_config={
        "fairness": {
            "gender": {"nb_tests": 5},
            "age": {"nb_tests": 5},
        },
        "accuracy": {
            "hallucination_detection": {"nb_tests": 10},
        },
    },
    wait=True,
)

Online Evaluation

Online evaluation analyzes real production traces instead of synthetic scenarios. It requires an observability connector with Traces capability (Langfuse, LangSmith, Datadog, or OpenTelemetry).

# Online-only evaluation on production traces
result = client.evaluate(
    system_id,
    mode="online",
    online_thematics_config={
        "security": {
            "prompt_injection": {},
            "pii_exfiltration": {},
        },
        "accuracy": {
            "hallucination_detection": {},
            "factual_grounding": {},
        },
    },
    time_range={"preset": "7d"},
    sampling={"strategy": "random", "size": 50},
    wait=True,
)

print(f"Online score: {result['summary']['overall_score']}%")
# Mixed evaluation: offline scenarios + online traces
result = client.evaluate(
    system_id,
    mode="mixed",
    thematics_config={
        "fairness": {
            "gender": {"nb_tests": 10},
            "age": {"nb_tests": 10},
        },
    },
    online_thematics_config={
        "accuracy": {
            "hallucination_detection": {},
            "response_completeness": {},
        },
    },
    time_range={"preset": "24h"},
    wait=True,
)

Available Dimensions and Criteria

privacy — Data protection and PII handling (8 criteria)

pii_reuse, pii_request, pii_masking_detection, pii_in_logs, pii_in_db, pii_masking_db, pii_masking_logs, refusal_privacy

security — Resistance to attacks and data exfiltration (21 criteria)

Exfiltration: pii_exfiltration, tech_exfiltration, tech_exfiltration_logs, tech_exfiltration_db, internal_exfiltration, internal_exfiltration_logs, internal_exfiltration_db, context_exfiltration, context_exfiltration_db, context_exfiltration_logs, traces_exfiltration, traces_exfiltration_logs, traces_exfiltration_db, refusal_security

Resistance: multiturn_resistance, prompt_injection, social_engineering, obfuscation, context_manipulation

Compliance: ip_copyright_violation, catastrophic_misuse

accuracy — Performance, reliability and factual correctness (15 criteria)

Quality: reproductibility, quality, response_correctness, response_completeness, contextual_coherence

Factual: hallucination_detection, factual_grounding, reformulation_stability

Specialized: classification_accuracy, structured_output_conformity, extraction_accuracy, edge_case_handling

Agentic: tool_call_accuracy, tool_call_f1, agent_goal_accuracy

fairness — Bias detection and equitable treatment (8 criteria)

Bias: age, ethnic, gender, health, identity, religious, socioeconomic

Intersectional: intersectional_bias

explainability — Transparency and decision justification (9 criteria)

justification, purpose_disclosure, ai_nature_disclosure, ai_self_disclosure, control_transparency, ambiguous_scope_clarification, refusal_scope, refusal_nonqualification, limitation_explanation

accountability — Governance, traceability and human oversight (9 criteria)

Oversight: usage_conformity, scope_creep_detection, opt_out_capabilities, decision_override, override_refusal_resistance

Logging: secure_logging_db, secure_logging_logs

Ethics: traceability, human_escalation

sustainability — Environmental efficiency (2 criteria)

db_environmental_efficiency, log_environmental_efficiency

systemic_risk — Societal and systemic threats (15 criteria)

Information integrity: authoritative_confabulation, source_fabrication, expert_impersonation, disinformation_generation, opinion_manipulation

Safety: dangerous_content_refusal, deepfake_assistance, vulnerability_exploitation, malware_generation, attack_planning

Control: confidential_exfiltration, context_knowledge_leakage, scope_override, autonomous_escalation, instruction_resistance

getEvaluation

Retrieves evaluation status or result.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| run_id | str | Required | Evaluation run identifier |
result = client.get_evaluation(run_id)

print(f"Status: {result['status']}")
if result["status"] == "completed":
    print(f"Score: {result['summary']['overall_score']}%")
Returns

Same structure as evaluate (see above).


Errors

The SDK raises typed exceptions for easier error handling.

| Exception | Description |
| --- | --- |
| CredentialsError | Missing or invalid API key |
| AuthenticationError | Expired or rejected API key (401) |
| NotFoundError | Resource not found (404) |
| ValidationError | Request validation failed (422) |
| RateLimitError | Too many requests (429) |
| InvalidEndpointError | Misconfigured endpoint |
| EndpointNotConfiguredError | Evaluation without a configured endpoint |
| DescriptionNotValidatedError | Description not validated |
| ConnectorAlreadyExistsError | Connector already exists (same category) |
| ServerError | Server error (500) |

Each exception contains contextual information for easier debugging:

from mankinds_sdk.exceptions import ConnectorAlreadyExistsError

try:
    client.add_connector(system_id, connector)
except ConnectorAlreadyExistsError as e:
    print(f"Connector {e.existing_type} already exists")
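Transient 429s are worth retrying with backoff. A generic sketch, using a locally defined RateLimitError as a stand-in for mankinds_sdk.exceptions.RateLimitError:

```python
import time

class RateLimitError(Exception):
    """Local stand-in for mankinds_sdk.exceptions.RateLimitError."""

def with_retries(call, retries=3, backoff=0.01):
    # Retry the callable on rate limiting, doubling the delay each attempt.
    for attempt in range(retries):
        try:
            return call()
        except RateLimitError:
            if attempt == retries - 1:
                raise
            time.sleep(backoff * (2 ** attempt))

# Simulated call that is throttled twice, then succeeds
attempts = {"n": 0}
def flaky_call():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError("429")
    return "ok"

result = with_retries(flaky_call)
# result == "ok" after two retried attempts
```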