Kirill Tyshchuk, Polina Karpikova, Andrew Spiridonov, Anastasiia Prutianova, Anton Razzhigaev and Alexander Panchenko
Embeddings, i.e., vector representations of objects, such as texts, images, or graphs, play a key role in deep learning methodologies nowadays. Prior research has shown the importance of analyzing the isotropy of textual embeddings for transformer-based ...
Amr Mohamed El Koshiry, Entesar Hamed I. Eliwa, Tarek Abd El-Hafeez and Ahmed Omar
Social media platforms have become the primary means of communication and information sharing, facilitating interactive exchanges among users. Unfortunately, these platforms also witness the dissemination of inappropriate and toxic content, including hat...
Dauren Ayazbayev, Andrey Bogdanchikov, Kamila Orynbekova and Iraklis Varlamis
This work focuses on determining semantically close words and using semantic similarity in general in order to improve performance in information retrieval tasks. The semantic similarity of words is an important task with many applications from informati...
Roberta Rodrigues de Lima, Anita M. R. Fernandes, James Roberto Bombasar, Bruno Alves da Silva, Paul Crocker and Valderi Reis Quietinho Leithardt
Classification problems are common activities in many different domains and supervised learning algorithms have shown great promise in these areas. The classification of goods in international trade in Brazil represents a real challenge due to the comple...
Marco Siino, Elisa Di Nuovo, Ilenia Tinnirello and Marco La Cascia
Guided by a corpus linguistics approach, in this article we present a comparative evaluation of State-of-the-Art (SotA) models, with a special focus on Transformers, to address the task of Fake News Spreaders (i.e., users that share Fake News) detection....
Aliya Rexit, Mahpirat Muhammat, Xuebin Xu, Wenxiong Kang, Alimjan Aysa and Kurban Ubul
Handwritten signatures have traditionally been used as a common form of recognition and authentication in tasks such as financial transactions and document authentication. However, there are few studies on minority languages such as Uyghur and Kazakh use...
Guizhe Song, Degen Huang and Zhifeng Xiao
Multilingual characteristics, lack of annotated data, and imbalanced sample distribution are the three main challenges for toxic comment analysis in a multilingual setting. This paper proposes a multilingual toxic text classifier which adopts a novel fus...
Seid Muhie Yimam, Abinew Ali Ayele, Gopalakrishnan Venkatesh, Ibrahim Gashaw and Chris Biemann
The availability of different pre-trained semantic models has enabled the quick development of machine learning components for downstream applications. However, even if texts are abundant for low-resource languages, there are very few semantic models pub...
Rigas Kotsakis, Maria Matsiola, George Kalliris and Charalampos Dimoulas
The current paper focuses on the investigation of spoken-language classification in audio broadcasting content. The approach reflects a real-world scenario, encountered in modern media/monitoring organizations, where semi-automated indexing/documentation ...
Taufik Fuadi Abidin, Amir Mahazir, Muhammad Subianto, Khairul Munadi and Ridha Ferdhiana
During the previous decades, intelligent identification of acronym and expansion pairs from a large corpus has garnered considerable research attention, particularly in the fields of text mining, entity extraction, and information retrieval. Herein, we p...