Information · Vol. 14, No. 11 (2023)

ARTICLE

Deep Learning Approaches for Big Data-Driven Metadata Extraction in Online Job Postings

Panagiotis Skondras, Nikos Zotos, Dimitris Lagios, Panagiotis Zervas, Konstantinos C. Giotopoulos and Giannis Tzimas

Abstract

This article presents a study on the multi-class classification of job postings using machine learning algorithms. With the growth of online job platforms, there has been an influx of labor market data. Machine learning, particularly NLP, is increasingly used to analyze and classify job postings. However, the effectiveness of these algorithms largely hinges on the quality and volume of the training data. In our study, we propose a multi-class classification methodology for job postings, drawing on AI models such as text-davinci-003 and the quantized versions of Falcon 7B (Falcon), WizardLM 7B (WizardLM), and Vicuna 7B (Vicuna) to generate synthetic datasets. These synthetic data are employed in two use-case scenarios: (a) exclusively as training datasets composed of synthetic job postings (for situations where no real data are available) and (b) as an augmentation method to bolster underrepresented job-title categories. To evaluate the proposed method, we relied on two well-established approaches: a feedforward neural network (FFNN) and the BERT model. Both use cases and training methods were assessed against a genuine job-posting dataset to gauge classification accuracy. Our experiments substantiated the benefits of using synthetic data for job-posting classification. In the first scenario, the models' performance matched, and occasionally exceeded, that obtained with real data. In the second scenario, the augmented classes outperformed their non-augmented counterparts in most instances. This research confirms that AI-generated datasets can enhance the efficacy of NLP algorithms, especially for the multi-class classification of job postings. While data augmentation can boost model generalization, its impact varies: it is especially beneficial for simpler models such as the FFNN, whereas BERT, owing to its context-aware architecture, also benefits but sees more limited improvement. Selecting the right type and amount of augmentation is therefore essential.
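The abstract describes the two evaluation scenarios only at a high level. As a minimal sketch of what such a pipeline might look like, the snippet below trains a simple feedforward-style classifier (scikit-learn's MLPClassifier over TF-IDF features, standing in for the FFNN; the BERT variant is omitted) on real-only, synthetic-only, and augmented training sets, and scores each against a real test set. All file names, column names, and the choice of five under-represented categories are hypothetical assumptions; this is not the authors' code.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical CSVs, each with a "text" column (posting body) and a "label"
# column (job-title category). synthetic_postings.csv stands in for the
# LLM-generated data described in the abstract.
real_train = pd.read_csv("real_train.csv")
synthetic = pd.read_csv("synthetic_postings.csv")
test = pd.read_csv("real_test.csv")

def train_and_score(train_df: pd.DataFrame) -> float:
    """Fit a TF-IDF + small MLP classifier and return accuracy on the real test set."""
    model = make_pipeline(
        TfidfVectorizer(max_features=20_000, ngram_range=(1, 2)),
        MLPClassifier(hidden_layer_sizes=(256,), max_iter=50, random_state=0),
    )
    model.fit(train_df["text"], train_df["label"])
    return accuracy_score(test["label"], model.predict(test["text"]))

# Scenario (b): augment only the under-represented categories with synthetic postings.
minority = real_train["label"].value_counts().nsmallest(5).index
augmented = pd.concat(
    [real_train, synthetic[synthetic["label"].isin(minority)]],
    ignore_index=True,
)

print("real only:      ", train_and_score(real_train))   # baseline
print("synthetic only: ", train_and_score(synthetic))    # scenario (a)
print("augmented:      ", train_and_score(augmented))    # scenario (b)
```

The generation step itself (prompting text-davinci-003, Falcon, WizardLM, or Vicuna for synthetic postings) is assumed to have already produced synthetic_postings.csv; the same comparison could be repeated with a fine-tuned BERT classifier in place of the MLP.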

Similar articles

       
 
Luis M. de Campos, Juan M. Fernández-Luna, Juan F. Huete, Francisco J. Ribadas-Pena and Néstor Bolaños    
In the context of academic expert finding, this paper investigates and compares the performance of information retrieval (IR) and machine learning (ML) methods, including deep learning, to approach the problem of identifying academic figures who are expe...
Journal: Algorithms

 
Seokjoon Kwon, Jae-Hyeon Park, Hee-Deok Jang, Hyunwoo Nam and Dong Eui Chang    
Deep learning algorithms are widely used for pattern recognition in electronic noses, which are sensor arrays for gas mixtures. One of the challenges of using electronic noses is sensor drift, which can degrade the accuracy of the system over time, even ...
Journal: Applied Sciences

 
Alberto Alvarellos, Andrés Figuero, Santiago Rodríguez-Yáñez, José Sande, Enrique Peña, Paulo Rosa-Santos and Juan Rabuñal    
Port managers can use predictions from the wave overtopping predictors created in this work to take preventative measures and optimize operations, ultimately improving safety and helping to minimize the economic impact that overtopping events have on the p...
Journal: Applied Sciences

 
Shihao Ma, Jiao Wu, Zhijun Zhang and Yala Tong    
Addressing the limitations of current mudslide disaster detection techniques in remote sensing imagery, including low automation, slow recognition speed, and limited universality, this study employs deep learning methods for enhanced mudslide disaster d...
Journal: Applied Sciences

 
Ryota Higashimoto, Soh Yoshida and Mitsuji Muneyasu    
This paper addresses the performance degradation of deep neural networks caused by learning with noisy labels. Recent research on this topic has exploited the memorization effect: networks fit data with clean labels during the early stages of learning an...
Journal: Applied Sciences