Nguyen Trung Tuan, Philip Moore, Dat Ha Vu Thanh and Hai Van Pham
ChatGPT plays a significant role in the third decade of the 21st century. Smart city applications can be integrated with ChatGPT in various fields. This research proposes an approach for developing large language models using generative artificial intel...
Fenfang Li, Zhengzhang Zhao, Li Wang and Han Deng
Sentence Boundary Disambiguation (SBD) is crucial for building datasets for tasks such as machine translation, syntactic analysis, and semantic analysis. Currently, most automatic sentence segmentation in Tibetan adopts the methods of rule-based and stat...
Samuel R. Schrader and Eren Gultepe
The evaluation of similarities between natural languages often relies on prior knowledge of the languages being studied. We describe three methods for building phylogenetic trees and clustering languages without the use of language-specific information. ...
Khalil Al-Hussaeni, Mohamed Sameer and Ioannis Karamitsos
Due to the increasing reliance on social network platforms in recent years, hate speech has risen significantly among online users. Governments and social media platforms face the challenging responsibility of controlling, detecting, and removing massivel...
Dezhi Cao, Yue Zhao and Licheng Wu
The construction of pronunciation dictionaries in a data-driven way relies on high-quality and extensive training data. However, the manual annotation of corpora for this purpose is both costly and time-consuming, especially for low-resource languages that ...
Mikel Penagarikano, Amparo Varona, Germán Bordel and Luis Javier Rodriguez-Fuentes
In this paper, a semisupervised speech data extraction method is presented and applied to create a new dataset designed for the development of fully bilingual Automatic Speech Recognition (ASR) systems for Basque and Spanish. The dataset is drawn from an...
Kyungho Yu, Hyoungju Kim, Jeongin Kim, Chanjun Chun and Pankoo Kim
Text-to-image technology enables computers to create images from text by simulating the human process of forming mental images. GAN-based text-to-image technology involves extracting features from input text; subsequently, they are combined with noise an...
Saida Mussakhojayeva, Kaisar Dauletbek, Rustem Yeshpanov and Huseyin Atakan Varol
The primary aim of this study was to contribute to the development of multilingual automatic speech recognition for lower-resourced Turkic languages. Ten languages (Azerbaijani, Bashkir, Chuvash, Kazakh, Kyrgyz, Sakha, Tatar, Turkish, Uyghur, and Uzbek) we...
Youngki Park and Youhyun Shin
This paper presents a novel approach for finding the most semantically similar conversational sentences in Korean and English. Our method involves training separate embedding models for each language and using a hybrid algorithm that selects the appropri...
Kirill Tyshchuk, Polina Karpikova, Andrew Spiridonov, Anastasiia Prutianova, Anton Razzhigaev and Alexander Panchenko
Embeddings, i.e., vector representations of objects, such as texts, images, or graphs, play a key role in deep learning methodologies nowadays. Prior research has shown the importance of analyzing the isotropy of textual embeddings for transformer-based ...