Datasets

The basic structure of datasets should follow this example:

├──source_1/
│   ├──method_a.png
│   ├──method_b.png
│   └──method_c.png
├──source_2/
│   ├──method_c.png
│   └──method_d.png
├──source_3/
│   ├──...

Media files inside each folder correspond to different conditions or methods applied to the same source or context. For example, different text-to-image models applied to the same prompt, or different compression methods applied to the same audio file.
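To make the layout concrete, here is a minimal sketch that indexes a dataset directory laid out as above. The function name `index_dataset` is hypothetical and not part of any tool described on this page. It assumes that a condition is identified by its filename without the extension, and it skips `.json` files on the assumption that they are configuration files (described later on this page) rather than media.

```python
from collections import defaultdict
from pathlib import Path


def index_dataset(root):
    """Map each source folder name to a sorted list of condition names."""
    index = defaultdict(list)
    # Every file exactly one level below the root belongs to a source folder.
    for path in sorted(Path(root).glob("*/*")):
        # Assumption: .json files are configuration, not media.
        if path.is_file() and path.suffix != ".json":
            index[path.parent.name].append(path.stem)
    return dict(index)
```

For the first listing on this page, the result would map `source_1` to `method_a`, `method_b`, and `method_c`, and `source_2` to `method_c` and `method_d`.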

References and anchors

Some experiments support references and anchors. These are files of known high quality (references) or known low quality (anchors).

├──source_1/
│   ├──method_a.mp3
│   ├──method_b.mp3
│   ├──reference.mp3  (optional)
│   └──low.mp3        (optional)
├──source_2/
│   ├──...

The names of these files are configurable and can be different for each dataset, but reference.* is the default pattern for reference files.
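As an illustration of pattern-based naming, the sketch below separates references and anchors from ordinary conditions in one source folder. The default `reference.*` comes from the text above. The anchor pattern `low.*` is only an assumption taken from the example listing, and the function `split_conditions` is hypothetical.

```python
from fnmatch import fnmatchcase


def split_conditions(filenames, reference_pattern="reference.*", anchor_pattern="low.*"):
    """Sort filenames into references, anchors, and ordinary methods."""
    groups = {"references": [], "anchors": [], "methods": []}
    for name in filenames:
        if fnmatchcase(name, reference_pattern):
            groups["references"].append(name)
        elif fnmatchcase(name, anchor_pattern):  # assumed pattern, see lead-in
            groups["anchors"].append(name)
        else:
            groups["methods"].append(name)
    return groups
```

Because both files are optional, either list may come back empty for a given source folder.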

Configuration files

The full config specification is available as a JSON Schema, which can also be used to validate your config.

Each folder may contain one or more configuration files:

├──source_1/
│   ├──config.json   (optional)
│   ├──method_a.mp4
│   └──method_b.mp4
├──source_2/
│   ├──...

These control the user interface of the experiment. For example:

{
    "title": "Audio listening test",
    "question": "Left or right?",
    "description": [
        "Is the sound coming from the left or right side of your headphones?"
    ]
}

This would be rendered as:

A more complex example, in which an object in the description array is rendered as tags:

{
    "title": "Alignment test",
    "question": "Which of the audio files better matches the following description?",
    "description": [
        {
            "genre": "jazz",
            "subgenre": "cool jazz",
            "is_instrumental": true,
            "is_live": true,
            "mood": "relaxed",
            "primary_instrument": "saxophone",
            "tempo": "slow"
        },
        "Live jazz recording with a mellow, late-night vibe – featuring smooth saxophone and brushed drums. Something similar to Stan Getz or Chet Baker that feels intimate and relaxed."
    ]
}

This would be rendered as:

Each entry in the description array is rendered as a new line.

The name of the config file is configurable within each experiment, but config.json is the default.
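When a dataset has many source folders, the config files can be generated rather than written by hand. The following is a minimal sketch using only the three fields shown in the examples above (`title`, `question`, `description`). The helper `write_config` is hypothetical, and the full set of supported fields is defined by the config specification, not by this snippet.

```python
import json
from pathlib import Path


def write_config(source_dir, title, question, description, filename="config.json"):
    """Write a config file into one source folder and return its path."""
    config = {
        "title": title,
        "question": question,
        # Each entry is a string or an object, as in the examples above.
        "description": list(description),
    }
    path = Path(source_dir) / filename
    path.write_text(json.dumps(config, indent=4, ensure_ascii=False), encoding="utf-8")
    return path
```

The `filename` argument defaults to `config.json` and should be changed if the experiment is set up to look for a different name.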

Groups

Files may be grouped using the following syntax:

├──source_1/
│   ├──method_a@upsampling=2.png
│   ├──method_a@upsampling=4.png
│   ├──method_a@upsampling=8.png
│   ├──method_b@upsampling=2.png
│   ├──method_b@upsampling=4.png
│   └──method_b@upsampling=8.png
├──source_2/
│   ├──...

These groups are only used for visualization purposes. For example, to plot bit-rates against perceptual quality as in the figure below:

The parameters can be arbitrary numbers. Different methods may use different parameters, and some conditions may have no parameters at all (e.g., reference.png). Note, however, that each filename is treated as a separate condition. To ensure that each condition receives enough ratings, we recommend repeating the same parameter values across sources.
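The grouping syntax can be read back from a filename with a few string operations. The sketch below assumes exactly the form shown in the listing, a single `@name=value` suffix before the extension with a numeric value. Whether other forms are accepted is not stated here, and `parse_condition` is a hypothetical helper.

```python
from pathlib import Path


def parse_condition(filename):
    """Return (method, parameter_name, parameter_value) for one filename.

    Files without an '@' part, such as reference.png, yield (stem, None, None).
    """
    stem = Path(filename).stem  # drops only the final extension
    method, separator, parameter = stem.partition("@")
    if not separator:
        return method, None, None
    name, _, value = parameter.partition("=")
    return method, name, float(value)
```

Collecting the parsed values per method gives the x-axis points for a plot, while each full filename still counts as its own condition when ratings are gathered.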
