# Experiments

An experiment on Mabyduck defines a subjective evaluation study — the type of test, the user interface, and the data to be evaluated. Each experiment is associated with one or more [datasets](/api/tutorials/datasets/) and can have multiple **jobs** that define who the raters are and how much data each rater evaluates. For a detailed guide on experiment types and configuration, see the [Experiments documentation](/experiments/).

## Key concepts

```mermaid
graph RL
  Job --> Experiment
  Session --> Job
  Slate --> Session
  Rating --> Slate
  Rating --> Stimulus
```

- **Experiment** — defines the type of study (e.g., MUSHRA, pairwise comparison) and the datasets to evaluate.
- **Job** — defines the rater pool, session count, and sampling strategy. One experiment can have multiple jobs.
- **Session** — created when a rater participates in a study.
- **Slate** — a group of stimuli evaluated together (e.g., one MUSHRA screen).
- **Stimulus** — a single media item from a dataset, presented for evaluation.
- **Rating** — a score assigned to a stimulus within a slate.

## Workflow

The typical workflow for running an experiment via the API is:

1. **Create a dataset** — upload your media files (see the Datasets section).
2. **Create an experiment** — define the experiment type and attach datasets.
3. **Create a job** — set up the rater pool, number of sessions, and strategy.
4. **Get costs** — retrieve a cost estimate and a confirmation token.
5. **Launch** — start the study using the cost token.
6. **Retrieve results** — get slates, ratings, and computed metrics.

### Authentication

All API requests require an API key, which you can generate from your [project settings](https://app.mabyduck.com). Include it in the `Authorization` header of every request:

```python
import requests

API_KEY = "YOUR_API_KEY"
PROJECT_ID = "YOUR_PROJECT_ID"

headers = {"Authorization": f"Api-Key {API_KEY}"}
```

### Creating an experiment

> **Prerequisite:** You need at least one dataset in the `ready` state before creating an experiment.
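When scripting the whole flow, you may want to wait until a newly created dataset finishes processing before continuing. The sketch below is a non-authoritative example: it assumes a `GET /projects/{PROJECT_ID}/datasets/{dataset_id}/` endpoint that returns the dataset's `state` field, which may differ from the actual API (see the Datasets tutorial for the real endpoints).

```python
import time

import requests

API_KEY = "YOUR_API_KEY"
PROJECT_ID = "YOUR_PROJECT_ID"
headers = {"Authorization": f"Api-Key {API_KEY}"}


def wait_until_ready(dataset_id, timeout=300, interval=5):
    """Poll a dataset until its state is `ready` (endpoint shape is an assumption)."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        response = requests.get(
            f"https://api.mabyduck.com/projects/{PROJECT_ID}/datasets/{dataset_id}/",
            headers=headers,
        )
        response.raise_for_status()
        if response.json().get("state") == "ready":
            return
        time.sleep(interval)  # still processing; check again shortly
    raise TimeoutError(f"Dataset {dataset_id} was not ready within {timeout}s")
```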
See the Datasets section for instructions on creating a dataset via the API.

To create an experiment, send a `POST` request with the experiment configuration. The easiest way to build the configuration JSON is to first set up an experiment through the [Mabyduck UI](https://app.mabyduck.com) and copy its configuration. See also the [experiments API reference](/api/openapi/experiments/experiments_create) for details on the available fields.

```python
DATASET_ID = "YOUR_DATASET_ID"

response = requests.post(
    f"https://api.mabyduck.com/projects/{PROJECT_ID}/experiments/",
    headers=headers,
    json={
        "name": "MUSHRA pilot study",
        "type": "mushra",
        "language": "en",
        "datasets": [DATASET_ID],
        "training_datasets": [],
        "config": {
            "mushra": {
                "labels": ["Excellent", "Good", "Fair", "Poor", "Bad"],
                "showReference": True,
                "hiddenReference": True,
                "showWaveform": "reference",
            }
        },
        "title": "",
        "question": "",
        "description": "",
        "introduction": "",
    },
)
experiment = response.json()
print(experiment["id"])
```

The `type` field determines the experiment type. Available types include `mushra`, `acr_audio`, `acr_image`, `acr_video`, `pairwise_audio`, `pairwise_image`, `pairwise_video`, `binary_audio`, `binary_image`, `binary_video`, and [more](/experiments/).

The `config` field is specific to each experiment type. It controls the user interface — labels, reference display, interaction settings, and so on. The `title`, `question`, and `description` fields control the instructions displayed at the top of the experiment screen.

### Creating a job

A single experiment can have multiple jobs, each targeting a different rater pool.
Before creating a job, retrieve the available rater pools:

```python
response = requests.get(
    f"https://api.mabyduck.com/projects/{PROJECT_ID}/experiments/{experiment['id']}/rater_pools/",
    headers=headers,
)
rater_pools = response.json()
```

The available rater pools can be seen on the [rater pools API reference page](/api/openapi/experiments/experiments_rater_pools_list). Each rater pool has the following structure:

```json
{
  "id": "string",
  "kind": 0,
  "name": "string",
  "label": "string"
}
```

Then create a job with the desired rater pool and parameters:

```python
response = requests.post(
    f"https://api.mabyduck.com/projects/{PROJECT_ID}/experiments/{experiment['id']}/jobs/",
    headers=headers,
    json={
        "rater_pool_id": rater_pools[0]["id"],
        "num_sessions": 10,
        "num_comparisons": 5,
        "num_training": 2,
        "max_repetitions": 1,
        "min_rest_time": 1,
        "strategy": "randomized",
        "note": "",
    },
)
job = response.json()
print(job["id"])
```

Key parameters:

- `num_sessions` — how many times the task is performed (by different raters).
- `num_comparisons` — how many slates each rater evaluates per session.
- `num_training` — training examples shown before the actual evaluation.
- `strategy` — how stimuli are sampled: `randomized`, `lexicographic`, `active`, or `neighbor`. See [Strategies](/experiments/strategies/) for details.

### Getting job costs

Before launching, retrieve a cost estimate for the job. This also returns a time-limited token needed to confirm the launch:

```python
response = requests.get(
    f"https://api.mabyduck.com/projects/{PROJECT_ID}/experiments/{experiment['id']}/jobs/{job['id']}/costs/",
    headers=headers,
)
costs = response.json()
print(f"Cost: {costs['cost']} {costs['currency']}")
print(f"Per additional session: {costs['cost_per_additional_session']} {costs['currency']}")
print(f"Token expires in: {costs['token_expires_in']}s")
```

The response includes:

- `cost` — the total cost for the job.
- `cost_per_additional_session` — cost for each additional session.
- `currency` — the currency code.
- `token` — a time-limited token to confirm the cost when launching.

### Launching the job

Launch the job by sending the cost token. This ensures you are charged the quoted price:

```python
response = requests.post(
    f"https://api.mabyduck.com/projects/{PROJECT_ID}/experiments/{experiment['id']}/jobs/{job['id']}/launch/",
    headers=headers,
    json={"token": costs["token"]},
)
print(response.status_code)  # 200 on success
```

Once launched, the job is live and raters can begin participating.

## Retrieving results

After raters complete sessions, you can retrieve the collected data.

**Listing slates**

Slates represent groups of stimuli that were evaluated together (e.g., one MUSHRA screen). Each slate contains ratings for the stimuli it presented:

```python
response = requests.get(
    f"https://api.mabyduck.com/projects/{PROJECT_ID}/experiments/{experiment['id']}/slates/",
    headers=headers,
)
slates = response.json()
print(f"Total slates: {len(slates)}")
for slate in slates[:3]:
    for rating in slate["ratings"]:
        print(f"  {rating['stimulus']}: {rating['score']}")
```

**Listing results and metrics**

The results endpoint returns computed metrics for your experiment, such as mean opinion scores (MOS) or Elo ratings:

```python
response = requests.get(
    f"https://api.mabyduck.com/projects/{PROJECT_ID}/experiments/{experiment['id']}/results/",
    headers=headers,
)
results = response.json()
for result in results:
    print(f"Metric: {result['kind']}")
    for score in result["scores"]:
        print(f"  {score}")
```

You can filter results by metric type using the `metric` query parameter (e.g., `?metric=elo` or `?metric=mean`). You can also retrieve project-wide metrics across multiple experiments.
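As a local sanity check against the computed metrics, you can also aggregate the raw ratings yourself. The sketch below averages scores per stimulus from the slate payload shape shown above (the toy data is illustrative, not a real API response):

```python
from collections import defaultdict


def mean_scores(slates):
    """Average raw rating scores per stimulus across all slates."""
    totals = defaultdict(lambda: [0.0, 0])  # stimulus -> [sum of scores, count]
    for slate in slates:
        for rating in slate["ratings"]:
            entry = totals[rating["stimulus"]]
            entry[0] += rating["score"]
            entry[1] += 1
    return {stimulus: total / count for stimulus, (total, count) in totals.items()}


# Toy payload mirroring the slate structure returned by the slates endpoint
slates = [
    {"ratings": [{"stimulus": "a.wav", "score": 80}, {"stimulus": "b.wav", "score": 60}]},
    {"ratings": [{"stimulus": "a.wav", "score": 90}]},
]
print(mean_scores(slates))  # {'a.wav': 85.0, 'b.wav': 60.0}
```

For a simple `mean` metric this should track the scores returned by the results endpoint; Elo and other model-based metrics cannot be reproduced this way.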
## Related resources

- [Datasets documentation](/datasets/) — structuring your media files
- [Creating datasets via the API](/api/tutorials/datasets/) — uploading media files via the API
- [Experiment types](/experiments/) — choosing and configuring experiment types
- [Sampling strategies](/experiments/strategies/) — controlling how stimuli are presented
- [Elo metrics](/experiments/metrics/elo) — understanding Elo-based scoring
- [Experiment API reference](/api/openapi/experiments/) — detailed API endpoints and schemas