JOURNAL
Informatics

ARTICLE

TITLE

Benchmarking Machine Learning Models to Assist in the Prognosis of Tuberculosis

Maicon Herverton Lino Ferreira da Silva Barros

Geovanne Oliveira Alves

Lubnnia Morais Florêncio Souza

Elisson da Silva Rocha

João Fausto Lorenzato de Oliveira

Theo Lynn

Vanderson Sampaio

Patricia Takako Endo

Abstract

Tuberculosis (TB) is an airborne infectious disease caused by organisms in the Mycobacterium tuberculosis (Mtb) complex. In many low- and middle-income countries, TB remains a major cause of morbidity and mortality. Once a patient has been diagnosed with TB, it is critical that healthcare workers make the most appropriate treatment decision given the individual conditions of the patient and the likely course of the disease based on medical experience. Depending on the prognosis, delayed or inappropriate treatment can lead to unsatisfactory outcomes, including the exacerbation of clinical symptoms, poor quality of life, and increased risk of death. This work benchmarks machine learning models to aid TB prognosis using a Brazilian health database of confirmed cases and deaths related to TB in the State of Amazonas. The goal is to predict the probability of death by TB, thus aiding the prognosis of TB and the associated treatment decision-making process. In its original form, the data set comprised 36,228 records and 130 fields but suffered from missing, incomplete, or incorrect data. Following data cleaning and preprocessing, a revised data set was generated comprising 24,015 records and 38 fields, including 22,876 reported cured TB patients and 1,139 deaths by TB. To explore how the data imbalance impacts model performance, two controlled experiments were designed using (1) imbalanced and (2) balanced data sets. The best result for predicting TB mortality is achieved by the Gradient Boosting (GB) model using the balanced data set, and the ensemble model composed of the Random Forest (RF), GB, and Multi-Layer Perceptron (MLP) models is the best model for predicting the cure class.
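
The class counts reported above (22,876 cured, 1,139 deaths) help show why the imbalanced and balanced experiments are worth separating. The following is a minimal sketch, not the authors' pipeline: it computes what a trivial "always predict cured" classifier would score on these counts, then builds a balanced subset by random undersampling. The abstract does not say how the balanced data set was constructed, so random undersampling is an assumption here, and the function names are hypothetical.

```python
import random

# Class counts taken from the abstract.
N_CURED = 22_876
N_DEATHS = 1_139


def majority_baseline_metrics(n_majority, n_minority):
    """Score a trivial classifier that always predicts the majority class."""
    total = n_majority + n_minority
    accuracy = n_majority / total        # every majority record is "correct"
    minority_recall = 0 / n_minority     # no minority record is ever found
    return accuracy, minority_recall


def random_undersample(labels, seed=0):
    """Return sorted indices of a class-balanced subset.

    ASSUMPTION: random undersampling is used only for illustration;
    the source does not state which balancing technique was applied.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    n_smallest = min(len(idxs) for idxs in by_class.values())
    kept = []
    for idxs in by_class.values():
        kept.extend(rng.sample(idxs, n_smallest))
    return sorted(kept)


# 0 = cured, 1 = death by TB (label encoding is an assumption).
labels = [0] * N_CURED + [1] * N_DEATHS

accuracy, recall_deaths = majority_baseline_metrics(N_CURED, N_DEATHS)
balanced_idx = random_undersample(labels)
balanced_labels = [labels[i] for i in balanced_idx]

print(f"majority-class baseline accuracy: {accuracy:.4f}")
print(f"majority-class baseline recall on deaths: {recall_deaths:.1f}")
print(f"balanced subset size: {len(balanced_labels)}")
```

A classifier that never predicts death still scores high accuracy on the imbalanced data, which is why per-class metrics such as recall on the death class are a more informative basis for comparing models in a setting like this one.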

Keywords

tuberculosis - neglected tropical disease - prognosis - machine learning - ensemble model - imbalanced data sets - feature selection - random search - benchmark

ISSUE

Volume: 8 Issue: 2 (2021)

SUBJECTS

CIVIL ENGINEERING AND CONSTRUCTION
TECHNOLOGY

DOI

https://doi.org/10.3390/informatics8020027
