Sunghae Jun
In big data analysis, various zero-inflated problems arise. In particular, inflated zeros strongly affect text big data analysis. In general, the data preprocessed from text documents form a matrix consisting of the documen...
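To make the zero-inflation point concrete, the following is a small sketch that assumes the matrix in question is a document-term count matrix (rows are documents, columns are terms). The three toy documents and the helper names `document_term_matrix` and `zero_proportion` are hypothetical illustrations, not taken from the paper.

```python
from collections import Counter

# Hypothetical helper: builds a document-term count matrix
# (rows = documents, columns = vocabulary terms in sorted order).
def document_term_matrix(docs):
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)
        matrix.append([counts.get(term, 0) for term in vocab])
    return vocab, matrix

# Hypothetical helper: share of matrix cells that are exactly zero.
def zero_proportion(matrix):
    cells = [v for row in matrix for v in row]
    return sum(1 for v in cells if v == 0) / len(cells)

# Toy corpus (illustrative only, not data from the paper).
docs = [
    "big data analysis finds hidden patterns",
    "text mining of patent documents",
    "zero inflated count data in text analysis",
]
vocab, dtm = document_term_matrix(docs)
print(len(docs), "x", len(vocab))                       # prints: 3 x 15
print(f"share of zero cells: {zero_proportion(dtm):.2f}")  # prints: share of zero cells: 0.60
```

On this toy corpus, 27 of the 45 cells are zero; with realistic vocabularies of thousands of terms, the share of zeros is typically far higher, which is what makes count models for such matrices zero-inflated.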
|
Ayiguli Halike, Aishan Wumaier and Tuergen Yibulayin
Although low-resource relation extraction is vital in knowledge construction and characterization, more research is needed on the generalization of unknown relation types. To fill the gap in the study of low-resource (Uyghur) relation extraction methods,...
|
Michael R. Lindstrom, Xiaofu Ding, Feng Liu, Anand Somayajula and Deanna Needell
Nonnegative matrix factorization can be used to automatically detect topics within a corpus in an unsupervised fashion. The technique amounts to an approximation of a nonnegative matrix as the product of two nonnegative matrices of lower rank. In certain...
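The factorization described above can be sketched with one standard algorithm, the multiplicative updates of Lee and Seung for the squared Frobenius error. This is an assumption for illustration: the abstract does not state which NMF algorithm the authors use, and the function name `nmf` and the small matrix `V` below are hypothetical.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-10, seed=0):
    """Approximate a nonnegative matrix V (m x n) as W @ H, with
    W (m x rank) >= 0 and H (rank x n) >= 0, using Lee-Seung
    multiplicative updates that do not increase ||V - W H||_F^2."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank))  # random nonnegative initialization
    H = rng.random((rank, n))
    for _ in range(n_iter):
        # Multiplying by nonnegative ratios keeps every entry nonnegative;
        # eps guards against division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical documents-by-terms count matrix with two clear "topics".
V = np.array([
    [2, 1, 0, 0],
    [4, 2, 0, 0],
    [0, 0, 1, 3],
    [0, 0, 2, 6],
], dtype=float)

W, H = nmf(V, rank=2)
# Rows of H can be read as topics (weights over terms);
# rows of W give each document's weight on each topic.
print(np.round(W @ H, 2))
print("relative error:", np.linalg.norm(V - W @ H) / np.linalg.norm(V))
```

Multiplicative updates are simple and keep the factors nonnegative without explicit projection, at the cost of slow convergence on some inputs; other solvers (for example alternating nonnegative least squares) make different trade-offs.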
|
Panagiotis Skondras, Nikos Zotos, Dimitris Lagios, Panagiotis Zervas, Konstantinos C. Giotopoulos and Giannis Tzimas
This article presents a study on the multi-class classification of job postings using machine learning algorithms. With the growth of online job platforms, there has been an influx of labor market data. Machine learning, particularly NLP, is increasingly...
|
Hossein Hassani and Emmanuel Sirmal Silva
ChatGPT, a conversational AI interface that utilizes natural language processing and machine learning algorithms, is taking the world by storm and is the buzzword across many sectors today. Given the likely impact of this model on data science, through t...
|
Daiho Uhm and Sunghae Jun
Due to the expansion of the internet, we encounter various types of big data such as web documents or sensing data. Compared to traditional small data such as experimental samples, big data provide more chances to find hidden and novel patterns with big ...
|
Mayire Ibrayim, Ahmatjan Mattohti and Askar Hamdulla
Uyghur text detection and recognition in images with simple backgrounds is still a challenging task for Uyghur image content analysis. In this paper, we propose a new effective Uyghur text detection method based on channel-enhanced MSERs and the CNN clas...
|
Claudia Alessandra Libbi, Jan Trienes, Dolf Trieschnigg and Christin Seifert
A major hurdle in the development of natural language processing (NLP) methods for Electronic Health Records (EHRs) is the lack of large, annotated datasets. Privacy concerns prevent the distribution of EHRs, and the annotation of data is known to be cos...
|
Girma Neshir, Andreas Rauber and Solomon Atnafu
Topic modeling is a statistical process that derives the latent themes from extensive collections of text. Three approaches to topic modeling exist, namely, unsupervised, semi-supervised and supervised. In this work, we develop a supervised topic model...
|
Shurong Sheng, Katrien Laenen, Luc Van Gool and Marie-Francine Moens
In this paper, we target the tasks of fine-grained image-text alignment and cross-modal retrieval in the cultural heritage domain as follows: (1) given an image fragment of an artwork, we retrieve the noun phrases that describe it; (2) given a noun phras...