REVISTA
Information

TODAS

Redirigiendo al acceso original de articulo en 24 segundos...

Inicio / Information / Vol: 14 Par: 11 (2023) / Artículo

ARTÍCULO

TITULO

Deep Learning Approaches for Big Data-Driven Metadata Extraction in Online Job Postings

Panagiotis Skondras

Nikos Zotos

Dimitris Lagios

Panagiotis Zervas

Konstantinos C. Giotopoulos and Giannis Tzimas

Resumen

This article presents a study on the multi-class classification of job postings using machine learning algorithms. With the growth of online job platforms, there has been an influx of labor market data. Machine learning, particularly NLP, is increasingly used to analyze and classify job postings. However, the effectiveness of these algorithms largely hinges on the quality and volume of the training data. In our study, we propose a multi-class classification methodology for job postings, drawing on AI models such as text-davinci-003 and the quantized versions of Falcon 7b (Falcon), Wizardlm 7B (Wizardlm), and Vicuna 7B (Vicuna) to generate synthetic datasets. These synthetic data are employed in two use-case scenarios: (a) exclusively as training datasets composed of synthetic job postings (situations where no real data is available) and (b) as an augmentation method to bolster underrepresented job title categories. To evaluate our proposed method, we relied on two well-established approaches: the feedforward neural network (FFNN) and the BERT model. Both the use cases and training methods were assessed against a genuine job posting dataset to gauge classification accuracy. Our experiments substantiated the benefits of using synthetic data to enhance job posting classification. In the first scenario, the models? performance matched, and occasionally exceeded, that of the real data. In the second scenario, the augmented classes consistently outperformed in most instances. This research confirms that AI-generated datasets can enhance the efficacy of NLP algorithms, especially in the domain of multi-class classification job postings. While data augmentation can boost model generalization, its impact varies. It is especially beneficial for simpler models like FNN. BERT, due to its context-aware architecture, also benefits from augmentation but sees limited improvement. Selecting the right type and amount of augmentation is essential.

Palabras claves

metadata extraction - online job postings - big data - web crawling - data preprocessing - ChatGPT - deep learning - embeddings - labor market analysis

Acceso

PÁGINAS

pp. 0 - 0

NÚMERO

Volumen: 14 Parte: 11 (2023)

MATERIAS

INGENIERÍA Y CONSTRUCCIÓN CIVIL
TECNOLOGÍA

REVISTAS SIMILARES

Water
Applied Sciences
Aerospace

DOI

https://doi.org/10.3390/info14110585

Artículos similares

Artificial Intelligence Techniques and Pedigree Charts in Oncogenetics: Towards an Experimental Multioutput Software System for Digitization and Risk Prediction

Acceso

Luana Conte, Emanuele Rizzo, Tiziana Grassi, Francesco Bagordo, Elisabetta De Matteis and Giorgio De Nunzio

Pedigree charts remain essential in oncological genetic counseling for identifying individuals with an increased risk of developing hereditary tumors. However, this valuable data source often remains confined to paper files, going unused. We propose a co... ver más

Revista: Computation

Whisper40: A Multi-Person Chinese Whisper Speaker Recognition Dataset Containing Same-Text Neutral Speech

Acceso

Jingwen Yang and Ruohua Zhou

Whisper speaker recognition (WSR) has received extensive attention from researchers in recent years, and it plays an important role in medical, judicial, and other fields. Among them, the establishment of a whisper dataset is very important for the study... ver más

Revista: Information

Bridging the Gap: Exploring Interpretability in Deep Learning Models for Brain Tumor Detection and Diagnosis from MRI Images

Acceso

Wandile Nhlapho, Marcellin Atemkeng, Yusuf Brima and Jean-Claude Ndogmo

The advent of deep learning (DL) has revolutionized medical imaging, offering unprecedented avenues for accurate disease classification and diagnosis. DL models have shown remarkable promise for classifying brain tumors from Magnetic Resonance Imaging (M... ver más

Revista: Information

Comparative Analysis of NLP-Based Models for Company Classification

Acceso

Maryan Rizinski, Andrej Jankov, Vignesh Sankaradas, Eugene Pinsky, Igor Mishkovski and Dimitar Trajanov

The task of company classification is traditionally performed using established standards, such as the Global Industry Classification Standard (GICS). However, these approaches heavily rely on laborious manual efforts by domain experts, resulting in slow... ver más

Revista: Information

Dementia Detection from Speech: What If Language Models Are Not the Answer?

Acceso

Mondher Bouazizi, Chuheng Zheng, Siyuan Yang and Tomoaki Ohtsuki

A growing focus among scientists has been on researching the techniques of automatic detection of dementia that can be applied to the speech samples of individuals with dementia. Leveraging the rapid advancements in Deep Learning (DL) and Natural Languag... ver más

Revista: Information

Revistas destacadas

Acceso directo a los números publicados en la revista Infrastructures

Infrastructures

Acceso directo a los números publicados en la revista Informed Infraestructure

Informed Infraestructure

Acceso directo a los números publicados en la revista BiT

Acceso directo a los números publicados en la revista Revista de la Construcción

Revista de la Construcción

Ver todas las revistas