# Datasets

The basic structure of datasets should follow this example:


```treeview
├── source_1/
│   ├── method_a.png
│   ├── method_b.png
│   └── method_c.png
├── source_2/
│   ├── method_c.png
│   └── method_d.png
├── source_3/
│   ├── ...
```

Media files inside each folder correspond to different conditions or methods applied to the same source or context.
For example, different text-to-image models applied to the same prompt, or different compression methods applied to the
same audio file.
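The following minimal Python sketch makes the layout concrete. It indexes a dataset laid out as above and maps each source folder to the media files it contains. The function name `index_dataset` is hypothetical. Treating every non-`.json` file as a media file is also an assumption (config files are described below).

```python
from pathlib import Path


def index_dataset(root):
    """Map each source folder name to the sorted media file names it contains.

    Hypothetical helper: only the folder/file layout comes from the docs.
    Assumption: every file that is not a .json config counts as media.
    """
    index = {}
    for source in sorted(Path(root).iterdir()):
        if not source.is_dir():
            continue  # stray top-level files are not sources
        media = sorted(
            f.name
            for f in source.iterdir()
            if f.is_file() and f.suffix != ".json"  # skip config files
        )
        index[source.name] = media
    return index
```

For the first tree above, `index_dataset("my_dataset")` would map `source_1` to its three `method_*.png` files and `source_2` to its two.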

## References and anchors

Some experiments support *references* and *anchors*: files of known high quality (references) or known low quality (anchors).


```treeview
├── source_1/
│   ├── method_a.mp3
│   ├── method_b.mp3
│   ├── reference.mp3  (optional)
│   └── low.mp3        (optional)
├── source_2/
│   ├── ...
```

The names of these files are configurable and can be different for each dataset, but `reference.*`
is the default pattern for reference files.
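The following sketch is one illustration of how these files could be separated from ordinary conditions. It splits one source folder's files into references, anchors and ordinary conditions using shell-style patterns. Only `reference.*` is a documented default. The anchor pattern `low.*` simply matches the example above, and like the helper name `classify_files` it is hypothetical.

```python
from fnmatch import fnmatchcase

# Documented default pattern for reference files.
REFERENCE_PATTERN = "reference.*"
# Hypothetical anchor pattern matching `low.mp3` above; anchor names are configurable.
ANCHOR_PATTERN = "low.*"


def classify_files(filenames, reference_pattern=REFERENCE_PATTERN, anchor_pattern=ANCHOR_PATTERN):
    """Split one source folder's file names into references, anchors and test conditions."""
    roles = {"reference": [], "anchor": [], "condition": []}
    for name in sorted(filenames):
        if fnmatchcase(name, reference_pattern):
            roles["reference"].append(name)
        elif fnmatchcase(name, anchor_pattern):
            roles["anchor"].append(name)
        else:
            roles["condition"].append(name)
    return roles
```

`fnmatchcase` keeps the matching case-sensitive on every platform, so `Reference.mp3` would not be picked up by the default pattern.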

## Configuration files

**JSON Schema:** click [here](https://www.jsonschemavalidator.net/s/4evQ5y4k) to view the full config specification and validate your config.

Each folder may contain one or more configuration files:


```treeview
├── source_1/
│   ├── config.json   (optional)
│   ├── method_a.mp4
│   └── method_b.mp4
├── source_2/
│   ├── ...
```

These control the user interface of the experiment. For example:


```json
{
    "title": "Audio listening test",
    "question": "Left or right?",
    "description": [
        "Is the sound coming from the left or right side of your headphones?"
    ]
}
```

This would be rendered as:

*(Screenshot: experiment title and question.)*

A more complex example, in which one `description` entry is rendered as tags:


```json
{
    "title": "Alignment test",
    "question": "Which of the audio files better matches the following description?",
    "description": [
        {
            "genre": "jazz",
            "subgenre": "cool jazz",
            "is_instrumental": true,
            "is_live": true,
            "mood": "relaxed",
            "primary_instrument": "saxophone",
            "tempo": "slow"
        },
        "Live jazz recording with a mellow, late-night vibe – featuring smooth saxophone and brushed drums. Something similar to Stan Getz or Chet Baker that feels intimate and relaxed."
    ]
}
```

This would be rendered as:

*(Screenshot: experiment title and question with tags.)*

Each entry in the `description` array is rendered on a new line.
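The sketch below shows one plausible way to flatten a `description` array into display lines. Strings are kept as they are, and objects become `key: value` tags. The function `description_lines` and the tag format are assumptions for illustration, and the actual interface may format tags differently.

```python
import json


def description_lines(config):
    """Turn a config's `description` array into one display line per entry.

    Hypothetical sketch: strings pass through unchanged; objects become
    `key: value` tags joined by " | ". The real UI may render tags differently.
    """
    lines = []
    for entry in config.get("description", []):
        if isinstance(entry, str):
            lines.append(entry)
        elif isinstance(entry, dict):
            # Non-string values (e.g. booleans) are shown in their JSON form: true/false.
            tags = [
                f"{key}: {value if isinstance(value, str) else json.dumps(value)}"
                for key, value in entry.items()
            ]
            lines.append(" | ".join(tags))
        else:
            raise TypeError(f"unsupported description entry: {entry!r}")
    return lines
```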

The name of the config file is configurable within each experiment, but `config.json` is the
default.
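The following minimal sketch reads a source folder's config, assuming it is a plain JSON file stored next to the media files. The `filename` parameter stands in for the experiment-level setting mentioned above. How that setting is actually specified is not shown here, and `load_config` is a hypothetical name.

```python
import json
from pathlib import Path


def load_config(source_dir, filename="config.json"):
    """Return the parsed config for one source folder, or {} if there is none.

    `filename` defaults to the documented `config.json`; the parameter itself
    is a hypothetical stand-in for the experiment's config-name setting.
    """
    path = Path(source_dir) / filename
    if not path.is_file():
        return {}  # config files are optional
    with path.open(encoding="utf-8") as f:
        return json.load(f)
```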

## Groups

Files may be grouped using the following syntax:


```treeview
├── source_1/
│   ├── method_a@upsampling=2.png
│   ├── method_a@upsampling=4.png
│   ├── method_a@upsampling=8.png
│   ├── method_b@upsampling=2.png
│   ├── method_b@upsampling=4.png
│   └── method_b@upsampling=8.png
├── source_2/
│   ├── ...
```

These groups are used only for visualization, for example to plot bit rate against perceptual
quality as in the figure below:

*(Figure: line graph of results for CLIC2024.)*

The parameters can be arbitrary numbers. Different methods may use different
parameters, and some conditions may have no parameters at all (e.g., `reference.png`).
Note, however, that each filename is treated as a separate condition. To ensure that each
condition receives a sufficient number of ratings, we recommend repeating the same
parameter values across sources.
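The following sketch parses grouped filenames and counts how many sources contain each condition, which helps check that parameters are repeated across sources. It assumes that parameters are written as `@key=value` after the method name and may be repeated, although only one parameter per file appears above. Both helper names are hypothetical.

```python
from collections import Counter
from pathlib import PurePath


def parse_condition(filename):
    """Split e.g. 'method_a@upsampling=2.png' into ('method_a', {'upsampling': 2.0}).

    Assumption: zero or more `@key=value` pairs follow the method name;
    only the single-parameter form is shown in the docs.
    """
    stem = PurePath(filename).stem  # drop the final extension only
    method, *pairs = stem.split("@")
    params = {}
    for pair in pairs:
        key, _, value = pair.partition("=")
        params[key] = float(value)  # parameters are documented as numbers
    return method, params


def sources_per_condition(index):
    """Count how many sources contain each condition (i.e. each filename)."""
    return Counter(name for files in index.values() for name in files)
```

Conditions with a count of 1 appear in only one source and will therefore tend to collect few ratings.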