27 Articles

Zero-Inflated Text Data Analysis using Generative Adversarial Networks and Statistical Modeling

Sunghae Jun

In big data analysis, various zero-inflated problems are occurring. In particular, the problem of inflated zeros has a great influence on text big data analysis. In general, the preprocessed data from text documents are a matrix consisting of the documen...

Journal: Computers Format: Electronic

Table of contents: Vol: 12 Num: 0 Part: 12 Year: 2023

Zero-Shot Relation Triple Extraction with Prompts for Low-Resource Languages

Ayiguli Halike, Aishan Wumaier and Tuergen Yibulayin

Although low-resource relation extraction is vital in knowledge construction and characterization, more research is needed on the generalization of unknown relation types. To fill the gap in the study of low-resource (Uyghur) relation extraction methods,...

Journal: Applied Sciences Format: Electronic

Table of contents: Vol: 13 Num: 0 Part: 7 Year: 2023

Continuous Semi-Supervised Nonnegative Matrix Factorization

Michael R. Lindstrom, Xiaofu Ding, Feng Liu, Anand Somayajula and Deanna Needell

Nonnegative matrix factorization can be used to automatically detect topics within a corpus in an unsupervised fashion. The technique amounts to an approximation of a nonnegative matrix as the product of two nonnegative matrices of lower rank. In certain...

Journal: Algorithms Format: Electronic

Table of contents: Vol: 16 Num: 0 Part: 4 Year: 2023

Deep Learning Approaches for Big Data-Driven Metadata Extraction in Online Job Postings

Panagiotis Skondras, Nikos Zotos, Dimitris Lagios, Panagiotis Zervas, Konstantinos C. Giotopoulos and Giannis Tzimas

This article presents a study on the multi-class classification of job postings using machine learning algorithms. With the growth of online job platforms, there has been an influx of labor market data. Machine learning, particularly NLP, is increasingly...

Journal: Information Format: Electronic

Table of contents: Vol: 14 Num: 0 Part: 11 Year: 2023

The Role of ChatGPT in Data Science: How AI-Assisted Conversational Interfaces Are Revolutionizing the Field

Hossein Hassani and Emmanuel Sirmal Silva

ChatGPT, a conversational AI interface that utilizes natural language processing and machine learning algorithms, is taking the world by storm and is the buzzword across many sectors today. Given the likely impact of this model on data science, through t...

Journal: Big Data and Cognitive Computing Format: Electronic

Table of contents: Vol: 7 Num: 0 Part: 2 Year: 2023

Zero-Inflated Patent Data Analysis Using Generating Synthetic Samples

Daiho Uhm and Sunghae Jun

Due to the expansion of the internet, we encounter various types of big data such as web documents or sensing data. Compared to traditional small data such as experimental samples, big data provide more chances to find hidden and novel patterns with big ...

Journal: Future Internet Format: Electronic

Table of contents: Vol: 14 Num: 0 Part: 7 Year: 2022

An Effective Method for Detection and Recognition of Uyghur Texts in Images with Backgrounds

Mayire Ibrayim, Ahmatjan Mattohti and Askar Hamdulla

Uyghur text detection and recognition in images with simple backgrounds is still a challenging task for Uyghur image content analysis. In this paper, we propose a new effective Uyghur text detection method based on channel-enhanced MSERs and the CNN clas...

Journal: Information Format: Electronic

Table of contents: Vol: 13 Num: 0 Part: 7 Year: 2022

Generating Synthetic Training Data for Supervised De-Identification of Electronic Health Records

Claudia Alessandra Libbi, Jan Trienes, Dolf Trieschnigg and Christin Seifert

A major hurdle in the development of natural language processing (NLP) methods for Electronic Health Records (EHRs) is the lack of large, annotated datasets. Privacy concerns prevent the distribution of EHRs, and the annotation of data is known to be cos...

Journal: Future Internet Format: Electronic

Table of contents: Vol: 13 Num: 0 Part: 5 Year: 2021

Topic Modeling for Amharic User Generated Texts

Girma Neshir, Andreas Rauber and Solomon Atnafu

Topic Modeling is a statistical process, which derives the latent themes from extensive collections of text. Three approaches to topic modeling exist, namely, unsupervised, semi-supervised and supervised. In this work, we develop a supervised topic model...

Journal: Information Format: Electronic

Table of contents: Vol: 12 Num: 0 Part: 10 Year: 2021

Fine-Grained Cross-Modal Retrieval for Cultural Items with Focal Attention and Hierarchical Encodings

Shurong Sheng, Katrien Laenen, Luc Van Gool and Marie-Francine Moens

In this paper, we target the tasks of fine-grained image-text alignment and cross-modal retrieval in the cultural heritage domain as follows: (1) given an image fragment of an artwork, we retrieve the noun phrases that describe it; (2) given a noun phras...

Journal: Computers Format: Electronic

Table of contents: Vol: 10 Num: 0 Part: 9 Year: 2021
