RINE: Leveraging Representations from Intermediate Encoder-Blocks for Synthetic Image Detection
A high-performing synthetic image detection method that utilises intermediate layers of the CLIP image encoder.

RINE is a synthetic image detection framework that leverages the intermediate representations from CLIP’s Vision Transformer blocks to build a forgery-aware feature space, enabling high accuracy in detecting synthetic images with minimal computational resources.
RINE tackles Synthetic Image Detection (SID) by exploiting information that most approaches discard: whereas traditional methods rely primarily on final-layer features, RINE extracts representations from multiple intermediate Transformer blocks of CLIP's image encoder. These intermediate blocks encapsulate low-level image details, which are crucial for identifying synthetic traces.
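
As a rough illustration of this extraction step, the snippet below harvests per-block CLS tokens from the Hugging Face `transformers` implementation of CLIP's vision encoder. This is a minimal sketch, not the authors' code; the checkpoint name, dummy image batch, and frozen-backbone treatment are assumptions.

```python
import torch
from transformers import CLIPVisionModel

# Assumed checkpoint; the method itself is not tied to this particular CLIP variant.
encoder = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32")
encoder.eval()  # the backbone is treated as frozen in this sketch

images = torch.randn(4, 3, 224, 224)  # a dummy batch of preprocessed images

with torch.no_grad():
    out = encoder(pixel_values=images, output_hidden_states=True)

# hidden_states holds the embedding output plus one tensor per Transformer
# block, each shaped [batch, num_tokens, dim]; token 0 is the CLS token.
cls_per_block = torch.stack(
    [h[:, 0, :] for h in out.hidden_states[1:]], dim=1
)  # [batch, num_blocks, dim]
print(cls_per_block.shape)  # e.g. torch.Size([4, 12, 768])
```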
The RINE architecture first passes an input image through CLIP's image encoder and extracts the CLS token from each Transformer block. These tokens are concatenated and projected into a forgery-aware vector space by a lightweight trainable network, allowing RINE to capture the fine-grained cues that betray synthetic artifacts. A distinctive component is the Trainable Importance Estimator (TIE), which assigns each block's representation a weight reflecting its relevance to the SID task, enabling more accurate aggregation of features.
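
A minimal sketch of how the TIE and projection head could be wired together is shown below. The module name, layer sizes, and projection dimension are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ForgeryAwareHead(nn.Module):
    """Hypothetical head: TIE weights per block, then concatenation and
    projection into a forgery-aware space, plus a classification layer."""

    def __init__(self, num_blocks: int, dim: int, proj_dim: int = 128):
        super().__init__()
        # Trainable Importance Estimator: one learnable score per block
        self.block_scores = nn.Parameter(torch.zeros(num_blocks))
        # lightweight projection network (sizes are assumptions)
        self.project = nn.Sequential(
            nn.Linear(num_blocks * dim, proj_dim),
            nn.ReLU(),
            nn.Linear(proj_dim, proj_dim),
        )
        self.classify = nn.Linear(proj_dim, 1)  # real-vs-synthetic logit

    def forward(self, cls_per_block: torch.Tensor):
        # cls_per_block: [batch, num_blocks, dim], e.g. from the sketch above
        weights = torch.softmax(self.block_scores, dim=0)  # [num_blocks]
        weighted = weights[None, :, None] * cls_per_block  # weight each block
        features = self.project(weighted.flatten(1))       # concat + project
        return features, self.classify(features).squeeze(-1)

head = ForgeryAwareHead(num_blocks=12, dim=768)
features, logits = head(torch.randn(4, 12, 768))
```

Because only this small head is trained while the CLIP backbone stays fixed, the approach remains cheap to train, which is consistent with the minimal-compute claim above.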
To further enhance learning, RINE is trained with a combination of binary cross-entropy loss, for classification accuracy, and supervised contrastive learning, which organises feature vectors of the same class into dense clusters. This combination not only improves the model's classification but also strengthens its ability to generalise across different synthetic image datasets.
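
The following sketch combines the two objectives, assuming the standard supervised contrastive formulation (Khosla et al., 2020) with one view per sample; the temperature `tau` and the balancing weight `lam` are hypothetical values, not the method's actual hyperparameters.

```python
import torch
import torch.nn.functional as F

def supcon_loss(features: torch.Tensor, labels: torch.Tensor, tau: float = 0.1):
    """Supervised contrastive loss over a batch; same-class pairs are pulled
    together, all other pairs act as negatives."""
    z = F.normalize(features, dim=1)
    sim = z @ z.t() / tau                                # pairwise similarities
    eye = torch.eye(z.size(0), dtype=torch.bool, device=z.device)
    pos = labels[:, None].eq(labels[None, :]) & ~eye     # same-class pairs
    masked = sim.masked_fill(eye, float('-inf'))         # exclude self-pairs
    log_prob = masked - torch.logsumexp(masked, dim=1, keepdim=True)
    # mean log-probability of positives per anchor; anchors with no positive
    # in the batch are skipped
    per_anchor = torch.where(pos, log_prob, torch.zeros_like(log_prob)).sum(1)
    has_pos = pos.sum(1) > 0
    return -(per_anchor[has_pos] / pos.sum(1)[has_pos]).mean()

def total_loss(cls_logits, features, labels, lam: float = 1.0):
    # binary cross-entropy for real-vs-synthetic classification, plus the
    # contrastive term; `lam` is a hypothetical balancing weight
    bce = F.binary_cross_entropy_with_logits(cls_logits, labels.float())
    return bce + lam * supcon_loss(features, labels)

features = torch.randn(8, 128)        # projected forgery-aware embeddings
cls_logits = torch.randn(8)           # classifier outputs
labels = torch.randint(0, 2, (8,))    # 0 = real, 1 = synthetic
print(total_loss(cls_logits, features, labels).item())
```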