# Self-hosted datasets

Self-hosted datasets let you use stimuli that are already hosted outside Mabyduck. Instead of uploading the media files themselves, you upload a small manifest that points Mabyduck to each remote stimulus URL.

Use a self-hosted dataset when your stimuli are already stored in your own bucket, CDN, file server, or hosted frontend.

## Dataset types

Mabyduck supports two self-hosted manifest formats:

- [TXT URL lists](/datasets/self_hosted/txt) are the simplest option. Each line is one remote stimulus URL, and Mabyduck infers the stimulus group from the URL path.
- [CSV manifests](/datasets/self_hosted/csv) are more explicit. Each row defines the stimulus group, condition name, URL, and optionally the file extension.

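As a rough illustration of the simpler format, the sketch below assembles the text of a TXT URL list in Python, with one stimulus URL per line. The `build_txt_manifest` helper and the `cdn.example.com` URLs are hypothetical, and the path layout is arbitrary; see the TXT page for how Mabyduck actually infers groups from paths.

```python
def build_txt_manifest(urls):
    """Return TXT manifest text with one stimulus URL per line."""
    lines = []
    for url in urls:
        cleaned = url.strip()
        # Skip blank entries and exact duplicates.
        if cleaned and cleaned not in lines:
            lines.append(cleaned)
    return "\n".join(lines) + "\n"


# Hypothetical URLs; replace these with your own hosted stimuli.
urls = [
    "https://cdn.example.com/stimuli/scene_01/method_a.wav",
    "https://cdn.example.com/stimuli/scene_01/method_b.wav",
    "https://cdn.example.com/stimuli/scene_02/method_a.wav",
]
manifest_text = build_txt_manifest(urls)
```

Writing `manifest_text` to a `.txt` file (for example with `pathlib.Path.write_text`) gives you a file to upload in place of the media itself.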

## How groups and stimuli are created

Self-hosted datasets follow the same dataset model as uploaded-file datasets:

- A **stimulus group** is usually the shared source, prompt, scene, item, or context.
- A **stimulus** is one condition or method inside that group.

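One way to picture this model is as a nested mapping from group to condition to URL. The sketch below is only an illustration, not Mabyduck's internal representation; the `group_stimuli` helper, the group and condition names, and the URLs are all hypothetical.

```python
from collections import defaultdict


def group_stimuli(rows):
    """Map each group name to its stimuli as {condition name: URL}."""
    groups = defaultdict(dict)
    for group, condition, url in rows:
        groups[group][condition] = url
    return dict(groups)


# Hypothetical rows: (stimulus group, condition name, URL).
rows = [
    ("scene_01", "method_a", "https://cdn.example.com/scene_01/method_a.wav"),
    ("scene_01", "method_b", "https://cdn.example.com/scene_01/method_b.wav"),
    ("scene_02", "method_a", "https://cdn.example.com/scene_02/method_a.wav"),
]
model = group_stimuli(rows)
```

Here `model["scene_01"]` holds two stimuli (`method_a` and `method_b`) that share the same source, while `model["scene_02"]` holds one.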

The manifest controls how those groups and stimuli are created. TXT datasets infer them from URL paths, while CSV datasets define them directly in columns.

## URL requirements

Remote stimulus URLs must be public URLs that Mabyduck can fetch when processing the dataset. In normal use, URLs should start with `https://`.

Localhost and private network URLs are blocked outside development environments.
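
Before uploading a manifest, it can help to screen URLs locally for the obvious problems described above. The following is a minimal sketch using only the Python standard library; `check_stimulus_url` is a hypothetical helper, not part of Mabyduck, and Mabyduck's own validation may apply different or stricter rules. It does not resolve DNS names, so a hostname that points to a private address would not be caught.

```python
import ipaddress
from urllib.parse import urlsplit


def check_stimulus_url(url: str) -> list[str]:
    """Return a list of problems found; an empty list means none were detected."""
    try:
        parts = urlsplit(url.strip())
    except ValueError:
        return ["URL could not be parsed"]

    problems = []
    if parts.scheme != "https":
        problems.append("scheme is not https")

    host = parts.hostname
    if not host:
        problems.append("no hostname")
        return problems

    if host == "localhost" or host.endswith(".localhost"):
        problems.append("localhost is not publicly reachable")
        return problems

    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal, so it is a DNS name; it is not resolved here.
        return problems

    if not address.is_global:
        problems.append("IP address is not publicly routable")
    return problems
```

For example, `check_stimulus_url("https://192.168.1.10/a.wav")` reports a non-routable address, while a public `https://` URL on a regular hostname returns an empty list.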