Inicio  /  Information  /  Vol: 14 Par: 7 (2023)  /  Artículo
ARTÍCULO
TITULO

On Isotropy of Multimodal Embeddings

Kirill Tyshchuk    
Polina Karpikova    
Andrew Spiridonov    
Anastasiia Prutianova    
Anton Razzhigaev and Alexander Panchenko    

Resumen

Embeddings, i.e., vector representations of objects, such as texts, images, or graphs, play a key role in deep learning methodologies nowadays. Prior research has shown the importance of analyzing the isotropy of textual embeddings for transformer-based text encoders, such as the BERT model. Anisotropic word embeddings do not use the entire space, instead concentrating on a narrow cone in such a pretrained vector space, negatively affecting the performance of applications, such as textual semantic similarity. Transforming a vector space to optimize isotropy has been shown to be beneficial for improving performance in text processing tasks. This paper is the first comprehensive investigation of the distribution of multimodal embeddings using the example of OpenAI?s CLIP pretrained model. We aimed to deepen the understanding of the embedding space of multimodal embeddings, which has previously been unexplored in this respect, and study the impact on various end tasks. Our initial efforts were focused on measuring the alignment of image and text embedding distributions, with an emphasis on their isotropic properties. In addition, we evaluated several gradient-free approaches to enhance these properties, establishing their efficiency in improving the isotropy/alignment of the embeddings and, in certain cases, the zero-shot classification accuracy. Significantly, our analysis revealed that both CLIP and BERT models yielded embeddings situated within a cone immediately after initialization and preceding training. However, they were mostly isotropic in the local sense. We further extended our investigation to the structure of multilingual CLIP text embeddings, confirming that the observed characteristics were language-independent. By computing the few-shot classification accuracy and point-cloud metrics, we provide evidence of a strong correlation among multilingual embeddings. Embeddings transformation using the methods described in this article makes it easier to visualize embeddings. At the same time, multiple experiments that we conducted showed that, in regard to the transformed embeddings, the downstream tasks performance does not drop substantially (and sometimes is even improved). This means that one could obtain an easily visualizable embedding space, without substantially losing the quality of downstream tasks.

Palabras claves

 Artículos similares

       
 
Yihao Fang, Mu Niu, Pokman Cheung and Lizhen Lin    
We propose an extrinsic Bayesian optimization (eBO) framework for general optimization problems on manifolds. Bayesian optimization algorithms build a surrogate of the objective function by employing Gaussian processes and utilizing the uncertainty in th... ver más
Revista: Algorithms

 
Yejin Lee, Suho Lee and Sangheum Hwang    
Fine-grained image recognition aims to classify fine subcategories belonging to the same parent category, such as vehicle model or bird species classification. This is an inherently challenging task because a classifier must capture subtle interclass dif... ver más
Revista: Applied Sciences

 
Youngki Park and Youhyun Shin    
In this paper, we introduce an efficient approach to multi-label image classification that is particularly suited for scenarios requiring rapid adaptation to new classes with minimal training data. Unlike conventional methods that rely solely on neural n... ver más
Revista: Applied Sciences

 
Lukas Busch, Ruben van Heusden and Maarten Marx    
Page stream segmentation (PSS) is the task of retrieving the boundaries that separate source documents given a consecutive stream of documents (for example, sequentially scanned PDF files). The task has recently gained more interest as a result of the di... ver más
Revista: Algorithms

 
Nasrin Elhassan, Giuseppe Varone, Rami Ahmed, Mandar Gogate, Kia Dashtipour, Hani Almoamari, Mohammed A. El-Affendi, Bassam Naji Al-Tamimi, Faisal Albalwy and Amir Hussain    
Social media networks have grown exponentially over the last two decades, providing the opportunity for users of the internet to communicate and exchange ideas on a variety of topics. The outcome is that opinion mining plays a crucial role in analyzing u... ver más
Revista: Computers