New AI Framework Tackles Visual 'Hallucinations' in Image Search, Boosting Accuracy by up to 7.37%

Jeffrey Morgan

Researchers have developed a new framework to tackle misleading visual 'hallucinations' in text-to-image retrieval systems. The method, called Diffusion-aware Multi-view Contrastive Learning (DMCL), improves how well text queries match the correct images. It also boosts accuracy in multi-round retrieval tasks by up to 7.37% compared to existing approaches.

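The work focuses on Diffusion-Augmented Interactive Text-to-Image Retrieval (DAI-TIR), in which images generated by a diffusion model from the user's text queries are used to help retrieval over multiple rounds. These generated images sometimes contain false details that reduce performance.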

The team behind DMCL identified a key issue in DAI-TIR: diffusion models occasionally create spurious visual elements that mislead retrieval systems. These inaccuracies, known as 'hallucinations,' degrade alignment between text and images, lowering retrieval accuracy.
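
To make the failure mode concrete, here is a toy sketch of diffusion-augmented retrieval scoring. It assumes, purely for illustration, that the system embeds the text query and a diffusion-generated image into a shared space, averages the two views into one query vector, and ranks gallery images by cosine similarity. The embeddings, the averaging rule, and the helper names `cosine`, `fuse` and `rank` are hypothetical and are not DMCL's actual architecture.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def fuse(text_emb, gen_img_emb):
    """Hypothetical fusion: average the text view and the generated-image view."""
    return [(t + g) / 2 for t, g in zip(text_emb, gen_img_emb)]

def rank(query_emb, gallery):
    """Return gallery ids sorted from most to least similar to the query."""
    return sorted(gallery, key=lambda k: cosine(query_emb, gallery[k]), reverse=True)

# Made-up 3-d embeddings: dimension 0 ~ "dog", 1 ~ "beach", 2 ~ "hat".
gallery = {
    "dog_on_beach": [1.0, 1.0, 0.0],
    "dog_with_hat": [1.0, 0.0, 1.0],
}
text_query = [1.0, 1.0, 0.0]         # "a dog on the beach"
faithful_gen = [1.0, 0.9, 0.0]       # generated image matches the text
hallucinated_gen = [1.0, 0.0, 1.5]   # generator invented a hat and dropped the beach

print(rank(fuse(text_query, faithful_gen), gallery))      # → ['dog_on_beach', 'dog_with_hat']
print(rank(fuse(text_query, hallucinated_gen), gallery))  # → ['dog_with_hat', 'dog_on_beach']
```

With a faithful generated image the correct target ranks first; a single hallucinated detail is enough to flip the ranking, even though the text query itself is unchanged.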

To address this, DMCL introduces two training objectives. The first, Multi-View Query-Target Alignment, refines how both text queries and target images are represented. The second, Text-Diffusion Consistency, encourages generated images to stay faithful to the input text. Together, these objectives filter out misleading visual cues and improve cross-modal matching.
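
The article does not give DMCL's loss formulas. As a rough sketch of what multi-view contrastive training of this kind can look like, the code below uses a standard InfoNCE loss: each query view (the text embedding and the generated-image embedding) is pulled toward its own target image and away from the other targets in the batch, and a separate consistency term pulls the generated-image view toward its own text query. The function names, the temperature `tau`, the weight `lam`, and the simple sum of terms are illustrative assumptions, not the authors' method.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def info_nce(anchor, pos_idx, candidates, tau=0.1):
    """Standard InfoNCE: -log of the softmax probability assigned to the positive candidate."""
    logits = [cosine(anchor, c) / tau for c in candidates]
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[pos_idx]

def multi_view_loss(text_embs, gen_embs, target_embs, tau=0.1, lam=1.0):
    """Hypothetical batch loss; query i has a text view, a generated-image view and target i."""
    total = 0.0
    for i in range(len(text_embs)):
        # Query-target alignment: each view should single out target i among the batch targets.
        align = (info_nce(text_embs[i], i, target_embs, tau)
                 + info_nce(gen_embs[i], i, target_embs, tau))
        # Text-diffusion consistency: the generated view should match its own text query.
        consist = info_nce(gen_embs[i], i, text_embs, tau)
        total += align + lam * consist
    return total / len(text_embs)

# Toy batch (made-up 3-d embeddings): two queries and their target images.
texts = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
targets = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
faithful_gens = [[1.0, 0.9, 0.0], [1.0, 0.0, 0.9]]
hallucinated_gens = [[1.0, 0.0, 1.5], [1.0, 0.0, 0.9]]  # first image drifts toward the wrong target

loss_faithful = multi_view_loss(texts, faithful_gens, targets)
loss_halluc = multi_view_loss(texts, hallucinated_gens, targets)
print(loss_halluc > loss_faithful)  # → True
```

A hallucinated generated image is penalised by both the alignment and the consistency terms, which is the kind of signal that could teach encoders to discount misleading visual cues. Practical implementations usually compute such losses on normalised embedding matrices over large batches; the loops here are only for readability.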

Testing across five benchmarks showed consistent gains, with DMCL outperforming current methods by as much as 7.37%. The researchers confirmed its effectiveness using attention visualisation and geometric embedding analyses. These tools revealed clearer separation between relevant and irrelevant image-text pairs.

Beyond performance gains, the team has released a large-scale DAI-TIR training dataset. This resource aims to support further research in the field. Future work will explore advanced fusion techniques to enhance retrieval accuracy even more.

DMCL provides a stronger training framework for DAI-TIR, reducing the impact of misleading visual details. The method's improved alignment and accuracy could make text-to-image retrieval systems more reliable, and the newly released dataset should help support further progress in this area.