# Experiments

An experiment on Mabyduck defines a subjective evaluation study — the type of test, the user interface, and the data to be evaluated. Each experiment is associated with one or more [datasets](/api/tutorials/datasets/) and can have multiple **jobs** that define who the raters are and how much data each rater evaluates. For a detailed guide on experiment types and configuration, see the [Experiments documentation](/experiments/).

## Key concepts

```mermaid
graph RL
  Job --> Experiment
  Session --> Job
  Slate --> Session
  Rating --> Slate
  Rating --> Stimulus
```

- **Experiment** — defines the type of study (e.g., MUSHRA, pairwise comparison) and the datasets to evaluate.
- **Job** — defines the rater pool, session count, and sampling strategy. One experiment can have multiple jobs.
- **Session** — created when a rater participates in a study.
- **Slate** — a group of stimuli evaluated together (e.g., one MUSHRA screen).
- **Stimulus** — a single media item from a dataset, presented for evaluation.
- **Rating** — a score assigned to a stimulus within a slate.

## Workflow

The typical workflow for running an experiment via the API is:

1. **Create a dataset** — upload your media files (see the Datasets section).
2. **Create an experiment** — define the experiment type and attach datasets.
3. **Create a job** — set up the rater pool, number of sessions, and strategy.
4. **Get costs** — retrieve a cost estimate and a confirmation token.
5. **Launch** — start the study using the cost token.
6. **Retrieve results** — get slates, ratings, and computed metrics.

### Authentication

All API requests require an API key, which you can generate from your [project settings](https://app.mabyduck.com). Include it in the `Authorization` header of every request:

```python
import requests

API_KEY = "YOUR_API_KEY"
PROJECT_ID = "YOUR_PROJECT_ID"

headers = {"Authorization": f"Api-Key {API_KEY}"}
```

### Creating an experiment

> **Prerequisite:** You need at least one dataset in the `ready` state before creating an experiment.
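When scripting the whole flow, you may want to wait until a newly created dataset finishes processing before continuing. The sketch below is a non-authoritative example: it assumes a `GET /projects/{PROJECT_ID}/datasets/{dataset_id}/` endpoint that returns the dataset's `state` field, which may differ from the actual API (see the Datasets tutorial for the real endpoints).

```python
import time

import requests

API_KEY = "YOUR_API_KEY"
PROJECT_ID = "YOUR_PROJECT_ID"
headers = {"Authorization": f"Api-Key {API_KEY}"}


def wait_until_ready(dataset_id, timeout=300, interval=5):
    """Poll a dataset until its state is `ready` (endpoint shape is an assumption)."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        response = requests.get(
            f"https://api.mabyduck.com/projects/{PROJECT_ID}/datasets/{dataset_id}/",
            headers=headers,
        )
        response.raise_for_status()
        if response.json().get("state") == "ready":
            return
        time.sleep(interval)  # still processing; check again shortly
    raise TimeoutError(f"Dataset {dataset_id} was not ready within {timeout}s")
```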
See the Datasets section for instructions on creating a dataset via the API.

To create an experiment, send a `POST` request with the experiment configuration. The easiest way to build the configuration JSON is to first set up an experiment through the [Mabyduck UI](https://app.mabyduck.com) and copy its configuration. See also the [experiments API reference](/api/openapi/experiments/experiments_create) for details on the available fields.

```python
DATASET_ID = "YOUR_DATASET_ID"

response = requests.post(
    f"https://api.mabyduck.com/projects/{PROJECT_ID}/experiments/",
    headers=headers,
    json={
        "name": "MUSHRA pilot study",
        "type": "mushra",
        "language": "en",
        "datasets": [DATASET_ID],
        "training_datasets": [],
        "config": {
            "mushra": {
                "labels": ["Excellent", "Good", "Fair", "Poor", "Bad"],
                "showReference": True,
                "hiddenReference": True,
                "showWaveform": "reference",
            }
        },
        "title": "",
        "question": "",
        "description": "",
        "introduction": "",
    },
)
experiment = response.json()
print(experiment["id"])
```

The `type` field determines the experiment type. Available types include `mushra`, `acr_audio`, `acr_image`, `acr_video`, `pairwise_audio`, `pairwise_image`, `pairwise_video`, `binary_audio`, `binary_image`, `binary_video`, and [more](/experiments/).

The `config` field is specific to each experiment type. It controls the user interface — labels, reference display, interaction settings, and so on. The `title`, `question`, and `description` fields control the instructions displayed at the top of the experiment screen.

### Creating a job

A single experiment can have multiple jobs, each targeting a different rater pool.
Before creating a job, retrieve the available rater pools:

```python
response = requests.get(
    f"https://api.mabyduck.com/projects/{PROJECT_ID}/experiments/{experiment['id']}/rater_pools/",
    headers=headers,
)
rater_pools = response.json()
```

The available rater pools can be seen on the [rater pools API reference page](/api/openapi/experiments/experiments_rater_pools_list). Each rater pool has the following structure:

```json
{
  "id": "string",
  "kind": 0,
  "name": "string",
  "label": "string"
}
```

Then create a job with the desired rater pool and parameters:

```python
response = requests.post(
    f"https://api.mabyduck.com/projects/{PROJECT_ID}/experiments/{experiment['id']}/jobs/",
    headers=headers,
    json={
        "rater_pool_id": rater_pools[0]["id"],
        "num_sessions": 10,
        "num_comparisons": 5,
        "num_training": 2,
        "max_repetitions": 1,
        "min_rest_time": 1,
        "strategy": "randomized",
        "note": "",
    },
)
job = response.json()
print(job["id"])
```

Key parameters:

- `num_sessions` — how many times the task is performed (by different raters).
- `num_comparisons` — how many slates each rater evaluates per session.
- `num_training` — training examples shown before the actual evaluation.
- `strategy` — how stimuli are sampled: `randomized`, `lexicographic`, `active`, or `neighbor`. See [Strategies](/experiments/strategies/) for details.

### Getting job costs

Before launching, retrieve a cost estimate for the job. This also returns a time-limited token needed to confirm the launch:

```python
response = requests.get(
    f"https://api.mabyduck.com/projects/{PROJECT_ID}/experiments/{experiment['id']}/jobs/{job['id']}/costs/",
    headers=headers,
)
costs = response.json()
print(f"Cost: {costs['cost']} {costs['currency']}")
print(f"Per additional session: {costs['cost_per_additional_session']} {costs['currency']}")
print(f"Token expires in: {costs['token_expires_in']}s")
```

The response includes:

- `cost` — the total cost for the job.
- `cost_per_additional_session` — cost for each additional session.
- `currency` — the currency code.
- `token` — a time-limited token to confirm the cost when launching.

### Launching the job

Launch the job by sending the cost token. This ensures you are charged the quoted price:

```python
response = requests.post(
    f"https://api.mabyduck.com/projects/{PROJECT_ID}/experiments/{experiment['id']}/jobs/{job['id']}/launch/",
    headers=headers,
    json={"token": costs["token"]},
)
print(response.status_code)  # 200 on success
```

Once launched, the job is live and raters can begin participating.

## Retrieving results

After raters complete sessions, you can retrieve the collected data.

**Listing slates**

Slates represent groups of stimuli that were evaluated together (e.g., one MUSHRA screen). Each slate contains ratings for the stimuli it presented:

```python
response = requests.get(
    f"https://api.mabyduck.com/projects/{PROJECT_ID}/experiments/{experiment['id']}/slates/",
    headers=headers,
)
slates = response.json()
print(f"Total slates: {len(slates)}")
for slate in slates[:3]:
    for rating in slate["ratings"]:
        print(f"  {rating['stimulus']}: {rating['score']}")
```

**Listing results and metrics**

The results endpoint returns computed metrics for your experiment, such as mean opinion scores (MOS) or Elo ratings:

```python
response = requests.get(
    f"https://api.mabyduck.com/projects/{PROJECT_ID}/experiments/{experiment['id']}/results/",
    headers=headers,
)
results = response.json()
for result in results:
    print(f"Metric: {result['kind']}")
    for score in result["scores"]:
        print(f"  {score}")
```

You can filter results by metric type using the `metric` query parameter (e.g., `?metric=elo` or `?metric=mean`). You can also retrieve project-wide metrics across multiple experiments.
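As a local sanity check against the computed metrics, you can also aggregate the raw ratings yourself. The sketch below averages scores per stimulus from the slate payload shape shown above (the toy data is illustrative, not a real API response):

```python
from collections import defaultdict


def mean_scores(slates):
    """Average raw rating scores per stimulus across all slates."""
    totals = defaultdict(lambda: [0.0, 0])  # stimulus -> [sum of scores, count]
    for slate in slates:
        for rating in slate["ratings"]:
            entry = totals[rating["stimulus"]]
            entry[0] += rating["score"]
            entry[1] += 1
    return {stimulus: total / count for stimulus, (total, count) in totals.items()}


# Toy payload mirroring the slate structure returned by the slates endpoint
slates = [
    {"ratings": [{"stimulus": "a.wav", "score": 80}, {"stimulus": "b.wav", "score": 60}]},
    {"ratings": [{"stimulus": "a.wav", "score": 90}]},
]
print(mean_scores(slates))  # {'a.wav': 85.0, 'b.wav': 60.0}
```

For a simple `mean` metric this should track the scores returned by the results endpoint; Elo and other model-based metrics cannot be reproduced this way.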
## Related resources

- [Datasets documentation](/datasets/) — structuring your media files
- [Creating datasets via the API](/api/tutorials/datasets/) — uploading media files via the API
- [Experiment types](/experiments/) — choosing and configuring experiment types
- [Sampling strategies](/experiments/strategies/) — controlling how stimuli are presented
- [Elo metrics](/experiments/metrics/elo) — understanding Elo-based scoring
- [Experiment API reference](/api/openapi/experiments/) — detailed API endpoints and schemas