Valhalla hallucination challenge
5/11/2023

Machine Translation (MT) is a core task in natural language processing and has undergone several paradigm shifts over the past few decades, from early rule-based systems to pipelined statistical MT approaches to recent end-to-end neural network-based models. Designing better machine translation systems by considering auxiliary inputs such as images has attracted much attention in recent years. While existing methods show promising performance over conventional text-only translation systems, they typically require paired text and image as input during inference, which limits their applicability to real-world scenarios. In this paper, we introduce a visual hallucination framework, called VALHALLA, which requires only source sentences at inference time and instead uses hallucinated visual representations for multimodal machine translation. In particular, given a source sentence, an autoregressive hallucination transformer is used to predict a discrete visual representation from the input text, and the combined text and hallucinated representations are utilized to obtain the target translation. We train the hallucination transformer jointly with the translation transformer using standard backpropagation with cross-entropy losses, guided by an additional loss that encourages consistency between predictions made using either ground-truth or hallucinated visual representations. Extensive experiments on three standard translation datasets with a diverse set of language pairs demonstrate the effectiveness of our approach over both text-only baselines and state-of-the-art methods.
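The joint objective described above can be sketched numerically on toy next-token distributions. This is a minimal illustration only: the function names, the choice of a KL term for the consistency loss, and the weight `lam` are assumptions for the sketch, not the paper's exact formulation.

```python
import math

def cross_entropy(probs, target_idx):
    # Negative log-likelihood of the target token under a predicted distribution.
    return -math.log(probs[target_idx])

def kl_divergence(p, q):
    # KL(p || q), standing in here for the consistency term between the two predictions.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def joint_loss(p_gt, p_hall, target_idx, lam=1.0):
    """Toy version of the joint objective: translation cross-entropy with
    ground-truth visual context, the same loss with hallucinated visual
    context, and a consistency term tying the two predictions together.
    (Names and the KL choice are illustrative, not from the paper.)"""
    l_gt = cross_entropy(p_gt, target_idx)      # supervised by real image features
    l_hall = cross_entropy(p_hall, target_idx)  # supervised by hallucinated features
    l_cons = kl_divergence(p_gt, p_hall)        # keep the two predictions consistent
    return l_gt + l_hall + lam * l_cons

# Toy next-token distributions over a 4-word vocabulary.
p_gt = [0.7, 0.1, 0.1, 0.1]
p_hall = [0.6, 0.2, 0.1, 0.1]
print(round(joint_loss(p_gt, p_hall, target_idx=0), 4))
```

Because the hallucination branch is trained to agree with the ground-truth branch, at inference time the image can be dropped and only the hallucinated representation is used.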