IDMT Audio Provenance Analysis Dataset
Dataset for audio provenance evaluation
This dataset contains two distinct collections for evaluating audio provenance analysis solutions in two scenarios: Singular Composition (i.e., fragments of interviews or statements are reused in various contexts) and Multi-Source Composition (segments from two sources are combined to create new content).
The ability to verify the reliability and origin, i.e., the provenance, of audio files is crucial in combating disinformation and ensuring the integrity of media content. The two collections, Singular Composition and Multi-Source Composition, are described below.
Singular Composition (SC) describes a scenario in which a single source is segmented and combined with other content. This happens, for instance, when fragments of interviews or statements are reused in various contexts. Each set in the SC test dataset is built from a source of interest (SoI), a music source (MS), and a non-reference source (NS). We created 40 SC sets with 80 audio files each.
Multi-Source Composition (MSC) describes a second scenario in which segments from two sources are combined to create new content. This scenario is inspired by malicious content creation, such as a manipulated statement from a politician. Each set in the MSC test dataset is built from two SoIs and an NS. We created 40 MSC sets with 60 audio files each.
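As a quick sanity check after downloading, the expected number of audio files per collection can be derived from the set counts given above. The following is a minimal sketch, assuming every set contains exactly the stated number of files; the dictionary layout and the function name `expected_file_counts` are illustrative and not part of the dataset.

```python
# Set and file counts as stated in the description above.
# Assumption: every set contains exactly the stated number of audio files.
COLLECTIONS = {
    "SC": {"sets": 40, "files_per_set": 80},
    "MSC": {"sets": 40, "files_per_set": 60},
}


def expected_file_counts(collections):
    """Return the expected number of audio files for each collection."""
    return {
        name: spec["sets"] * spec["files_per_set"]
        for name, spec in collections.items()
    }


counts = expected_file_counts(COLLECTIONS)
total = sum(counts.values())
print(counts)  # {'SC': 3200, 'MSC': 2400}
print(total)   # 5600
```

Comparing these numbers with the file counts found on disk is one simple way to detect an incomplete download.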