Frequent Errors in Modeling by Machine Learning: A Prototype Case of Predicting the Timely Evolution of COVID-19 Pandemic

Károly Héberger

Abstract

Background: The development and application of machine learning (ML) methods have become so fast that almost nobody can follow their developments in every detail. It is no wonder that numerous errors and inconsistencies in their usage have spread with similar speed, regardless of the task, whether regression or classification. This work summarizes frequent errors committed by certain authors with the aim of helping scientists to avoid them. Methods: The principle of parsimony governs the train of thought. Fair method comparison can be completed with multicriteria decision-making techniques, preferably by the sum of ranking differences (SRD). Its coupling with analysis of variance (ANOVA) decomposes the effects of several factors. Earlier findings are summarized in a review-like manner: the abuse of the correlation coefficient and proper practices for model discrimination are also outlined. Results: Using an illustrative example, the correct practice and the methodology are summarized as guidelines for model discrimination and for minimizing the prediction errors. The following factors are all prerequisites for successful modeling: proper data preprocessing, statistical tests, suitable performance parameters, appropriate degrees of freedom, fair comparison of models, and outlier detection, just to name a few. A checklist is provided in a tutorial manner on how to present ML modeling properly. The advocated practices are reviewed briefly in the discussion. Conclusions: Many of the errors can easily be filtered out with careful reviewing. It is every author's responsibility to adhere to the rules of modeling and validation. A representative sampling of recent literature outlines correct practices and emphasizes that no error-free publication exists.
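The sum of ranking differences mentioned in the Methods part can be illustrated with a minimal sketch. It is not the author's implementation: the function names `average_ranks` and `srd`, the toy numbers, and the choice of the observed values as the reference are all assumptions made for illustration. When no reference is supplied, the sketch falls back to the row-wise mean of all methods, which is one common choice. Validation steps such as a randomization test are left out.

```python
from statistics import mean


def average_ranks(values):
    """Rank values in ascending order (1 = smallest); ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions in the tie block
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def srd(table, reference=None):
    """Sum of absolute rank differences between each method and the reference.

    table: dict mapping a method name to its values over the same objects.
    reference: gold-standard values; defaults to the row-wise mean (assumption).
    """
    names = list(table)
    n_objects = len(table[names[0]])
    if reference is None:
        reference = [mean(table[m][i] for m in names) for i in range(n_objects)]
    ref_ranks = average_ranks(reference)
    return {
        m: sum(abs(r - q) for r, q in zip(average_ranks(table[m]), ref_ranks))
        for m in names
    }


# Hypothetical toy data: three models predicting five objects.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predictions = {
    "model_a": [1.1, 2.2, 2.9, 4.1, 5.3],  # same ordering as the reference
    "model_b": [2.0, 1.0, 3.0, 5.0, 4.0],  # two neighboring pairs swapped
    "model_c": [5.0, 4.0, 3.0, 2.0, 1.0],  # fully reversed ordering
}
scores = srd(predictions, reference=observed)
print(scores)
```

A smaller SRD value means the method orders the objects more like the reference does, so in this toy case `model_a` would be preferred and `model_c` ranked last.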

Keywords

machine learning - artificial neural networks - performance parameters - degree of freedom - fair method comparison - QSAR - nonlinear - standards for modeling

PAGES

pp. 0 - 0

ISSUE

Volume: 17 Part: 1 (2024)

SUBJECTS

CIVIL ENGINEERING AND CONSTRUCTION
TECHNOLOGY

DOI

https://doi.org/10.3390/a17010043
