IDMT Audio Provenance Analysis Dataset

Dataset for audio provenance evaluation

Dataset

Link to the dataset

Fraunhofer IDMT

Developed by

Fraunhofer-Gesellschaft

License

Creative Commons Attribution 4.0 International License

Main Characteristic

This dataset contains two collections for evaluating audio provenance analysis solutions in two scenarios: Singular Composition (e.g., fragments of interviews or statements reused in various contexts) and Multi-Source Composition (segments from two sources combined to create new content).

Technical Categories

Audio processing, Machine learning

Last updated

05.11.2024 - 11:43

Detailed Description

Verifying the reliability and origin (i.e., the provenance) of audio files is crucial for combating disinformation and ensuring the integrity of media content. This dataset contains two collections for evaluating audio provenance analysis solutions in two scenarios: Singular Composition and Multi-Source Composition.

Singular Composition (SC) describes a scenario in which a single source is segmented and its segments are integrated with other content. This happens, for instance, when fragments of interviews or statements are reused in various contexts. Creating the SC test dataset starts from three inputs: a source of interest (SoI), a music source (MS), and a non-reference source (NS). We created 40 sets under the SC scenario, each containing 80 audio files.
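To make the SC setup more concrete, the following Python sketch builds one SC-style composite from the three inputs named above and records where each run of output samples came from, which is the kind of ground truth a provenance analysis would try to recover. The layout (an SoI segment placed in the middle of the NS, with the MS mixed underneath at a fixed gain), the names `compose_singular` and `SegmentAnnotation`, and the annotation format are all hypothetical illustrations. They are not taken from the dataset's actual creation process, which this page does not describe in detail. Signals are assumed to be mono float samples at a common sample rate.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SegmentAnnotation:
    # Hypothetical ground-truth record: where a run of output samples came from.
    source: str     # "SoI" or "NS" (the music bed is mixed underneath, not spliced)
    src_start: int  # first sample index in the source signal
    out_start: int  # first sample index in the composed output
    length: int     # number of samples


def compose_singular(
    soi: List[float],
    ms: List[float],
    ns: List[float],
    seg_start: int,
    seg_len: int,
    music_gain: float = 0.3,
) -> Tuple[List[float], List[SegmentAnnotation]]:
    """Hypothetical SC-style composite: NS head | SoI segment | NS tail, over an MS bed.

    All signals are assumed to be mono float samples at the same sample rate.
    """
    if seg_start < 0 or seg_len <= 0 or seg_start + seg_len > len(soi):
        raise ValueError("requested SoI segment lies outside the source")
    if not ms:
        raise ValueError("music source must not be empty")

    # Cut the reused fragment out of the source of interest.
    segment = soi[seg_start:seg_start + seg_len]

    # Embed the fragment in the middle of the non-reference source.
    half = len(ns) // 2
    speech = ns[:half] + segment + ns[half:]

    # Mix the music bed under the speech track, looping it if it is shorter.
    out = [s + music_gain * ms[i % len(ms)] for i, s in enumerate(speech)]

    # Ground truth: which source each run of output samples was taken from.
    annotations = [
        SegmentAnnotation("NS", 0, 0, half),
        SegmentAnnotation("SoI", seg_start, half, seg_len),
        SegmentAnnotation("NS", half, half + seg_len, len(ns) - half),
    ]
    return out, annotations
```

For example, `compose_singular([1.0] * 10, [0.5, -0.5], [0.0] * 4, seg_start=2, seg_len=3, music_gain=0.5)` places samples 2 to 4 of the SoI at output positions 2 to 4 and returns an `SoI` annotation with `src_start=2` and `out_start=2`. Keeping the annotations alongside the audio is one simple way to score whether an analysis tool traces the reused fragment back to its origin.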

Multi-Source Composition (MSC) introduces a second scenario in which segments from two sources are combined to create new content. This scenario is inspired by malicious content creation, such as a manipulated statement from a politician. Creating the MSC test dataset involves two SoIs and an NS. We created 40 such sets, each containing 60 audio files.
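The MSC scenario can be illustrated with a small splicing sketch in Python: an ordered plan of cuts, each taken from one of two SoIs or from the NS, is concatenated into a single output, and the origin of every cut is recorded as ground truth. The function name `compose_multi_source`, the `SegmentAnnotation` record, the source keys (`"SoI_A"`, `"SoI_B"`, `"NS"`), and the idea of bridging SoI cuts with NS material are hypothetical assumptions for illustration only, not the dataset's documented procedure. Signals are again assumed to be mono float samples at a common sample rate.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SegmentAnnotation:
    # Hypothetical ground-truth record for one spliced run of samples.
    source: str     # key of the input signal, e.g. "SoI_A", "SoI_B", "NS"
    src_start: int  # first sample index in that input
    out_start: int  # first sample index in the composed output
    length: int     # number of samples


def compose_multi_source(
    sources: Dict[str, List[float]],
    plan: List[Tuple[str, int, int]],
) -> Tuple[List[float], List[SegmentAnnotation]]:
    """Concatenate (source, start, length) cuts in order and record their origin.

    Hypothetical MSC-style composition: the plan typically alternates cuts from
    two SoIs, optionally bridged by NS material. Signals are assumed to be mono
    float samples at a common sample rate.
    """
    out: List[float] = []
    annotations: List[SegmentAnnotation] = []
    for name, start, length in plan:
        src = sources[name]  # raises KeyError if the plan names an unknown source
        if start < 0 or length <= 0 or start + length > len(src):
            raise ValueError(f"cut {name}[{start}:{start + length}] is out of range")
        # Record where this cut lands before appending its samples.
        annotations.append(SegmentAnnotation(name, start, len(out), length))
        out.extend(src[start:start + length])
    return out, annotations
```

With `sources = {"SoI_A": [1.0] * 5, "SoI_B": [2.0] * 5, "NS": [0.0] * 5}` and the plan `[("SoI_A", 0, 2), ("NS", 0, 1), ("SoI_B", 3, 2)]`, the output is `[1.0, 1.0, 0.0, 2.0, 2.0]`, and the third annotation records that output samples 3 and 4 come from `SoI_B` starting at sample 3. A provenance analysis solution would be expected to recover this mapping from the composed audio alone.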

Trustworthy AI

N/A

GDPR Requirements

GDPR compliant