REVISTA
Information

TODAS

Inicio / Information / Vol: 14 Par: 7 (2023) / Artículo

ARTÍCULO

TITULO

On Isotropy of Multimodal Embeddings

Kirill Tyshchuk

Polina Karpikova

Andrew Spiridonov

Anastasiia Prutianova

Anton Razzhigaev and Alexander Panchenko

Resumen

Embeddings, i.e., vector representations of objects, such as texts, images, or graphs, play a key role in deep learning methodologies nowadays. Prior research has shown the importance of analyzing the isotropy of textual embeddings for transformer-based text encoders, such as the BERT model. Anisotropic word embeddings do not use the entire space, instead concentrating on a narrow cone in such a pretrained vector space, negatively affecting the performance of applications, such as textual semantic similarity. Transforming a vector space to optimize isotropy has been shown to be beneficial for improving performance in text processing tasks. This paper is the first comprehensive investigation of the distribution of multimodal embeddings using the example of OpenAI?s CLIP pretrained model. We aimed to deepen the understanding of the embedding space of multimodal embeddings, which has previously been unexplored in this respect, and study the impact on various end tasks. Our initial efforts were focused on measuring the alignment of image and text embedding distributions, with an emphasis on their isotropic properties. In addition, we evaluated several gradient-free approaches to enhance these properties, establishing their efficiency in improving the isotropy/alignment of the embeddings and, in certain cases, the zero-shot classification accuracy. Significantly, our analysis revealed that both CLIP and BERT models yielded embeddings situated within a cone immediately after initialization and preceding training. However, they were mostly isotropic in the local sense. We further extended our investigation to the structure of multilingual CLIP text embeddings, confirming that the observed characteristics were language-independent. By computing the few-shot classification accuracy and point-cloud metrics, we provide evidence of a strong correlation among multilingual embeddings. Embeddings transformation using the methods described in this article makes it easier to visualize embeddings. At the same time, multiple experiments that we conducted showed that, in regard to the transformed embeddings, the downstream tasks performance does not drop substantially (and sometimes is even improved). This means that one could obtain an easily visualizable embedding space, without substantially losing the quality of downstream tasks.

Palabras claves

NLP - CLIP - isotropy - visualization - multilingualism - multimodality

Acceso

PÁGINAS

pp. 0 - 0

NÚMERO

Volumen: 14 Parte: 7 (2023)

MATERIAS

INGENIERÍA Y CONSTRUCCIÓN CIVIL
TECNOLOGÍA

REVISTAS SIMILARES

Applied Sciences
Algorithms
Computers

DOI

https://doi.org/10.3390/info14070392

Artículos similares

Extrinsic Bayesian Optimization on Manifolds

Acceso

Yihao Fang, Mu Niu, Pokman Cheung and Lizhen Lin

We propose an extrinsic Bayesian optimization (eBO) framework for general optimization problems on manifolds. Bayesian optimization algorithms build a surrogate of the objective function by employing Gaussian processes and utilizing the uncertainty in th... ver más

Revista: Algorithms

Patch-Level Consistency Regularization in Self-Supervised Transfer Learning for Fine-Grained Image Recognition

Acceso

Yejin Lee, Suho Lee and Sangheum Hwang

Fine-grained image recognition aims to classify fine subcategories belonging to the same parent category, such as vehicle model or bird species classification. This is an inherently challenging task because a classifier must capture subtle interclass dif... ver más

Revista: Applied Sciences

Applying Object Detection and Embedding Techniques to One-Shot Class-Incremental Multi-Label Image Classification

Acceso

Youngki Park and Youhyun Shin

In this paper, we introduce an efficient approach to multi-label image classification that is particularly suited for scenarios requiring rapid adaptation to new classes with minimal training data. Unlike conventional methods that rely solely on neural n... ver más

Revista: Applied Sciences

Using Deep-Learned Vector Representations for Page Stream Segmentation by Agglomerative Clustering

Acceso

Lukas Busch, Ruben van Heusden and Maarten Marx

Page stream segmentation (PSS) is the task of retrieving the boundaries that separate source documents given a consecutive stream of documents (for example, sequentially scanned PDF files). The task has recently gained more interest as a result of the di... ver más

Revista: Algorithms

Arabic Sentiment Analysis Based on Word Embeddings and Deep Learning

Acceso

Nasrin Elhassan, Giuseppe Varone, Rami Ahmed, Mandar Gogate, Kia Dashtipour, Hani Almoamari, Mohammed A. El-Affendi, Bassam Naji Al-Tamimi, Faisal Albalwy and Amir Hussain

Social media networks have grown exponentially over the last two decades, providing the opportunity for users of the internet to communicate and exchange ideas on a variety of topics. The outcome is that opinion mining plays a crucial role in analyzing u... ver más

Revista: Computers

Revistas destacadas

Acceso directo a los números publicados en la revista Infrastructures

Infrastructures

Acceso directo a los números publicados en la revista Informed Infraestructure

Informed Infraestructure

Acceso directo a los números publicados en la revista BiT

Acceso directo a los números publicados en la revista Revista de la Construcción

Revista de la Construcción

Ver todas las revistas