Information, Vol. 11, No. 9 (2020)
Article
Title

Identification of Malignancies from Free-Text Histopathology Reports Using a Multi-Model Supervised Machine Learning Approach

Victor Olago    
Mazvita Muchengeti    
Elvira Singh and Wenlong C. Chen    

Abstract

We explored various Machine Learning (ML) models to evaluate how each performs in the task of classifying histopathology reports. We trained, optimized, and performed classification with Stochastic Gradient Descent (SGD), Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbor (KNN), Adaptive Boosting (AB), Decision Trees (DT), Gaussian Naïve Bayes (GNB), Logistic Regression (LR), and a Dummy classifier. We started with 60,083 histopathology reports, which were reduced to 60,069 after pre-processing. The F1-scores for SVM, SGD, KNN, RF, DT, LR, AB, and GNB were 97%, 96%, 96%, 96%, 92%, 96%, 84%, and 88%, respectively, while the misclassification rates were 3.31%, 5.25%, 4.39%, 1.75%, 3.5%, 4.26%, 23.9%, and 19.94%, respectively. The approximate run times were 2 h, 20 min, 40 min, 8 h, 40 min, 10 min, 50 min, and 4 min, respectively. RF had the longest run time but the lowest misclassification rate on the labeled data. Our study demonstrated that ML techniques can be applied to the processing of free-text pathology reports by cancer registries for cancer incidence reporting in a Sub-Saharan Africa setting. This is an important consideration for resource-constrained environments, which can leverage ML techniques to reduce workloads and improve the timeliness of cancer statistics reporting.
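
The abstract reports both an F1-score and a misclassification rate for each model. As a minimal sketch of how these two metrics relate, the Python snippet below computes both from a binary confusion matrix. The function name and the counts are hypothetical and chosen only for illustration; the abstract does not state how the F1-scores were averaged across classes, so this is not necessarily the authors' exact calculation.

```python
def f1_and_misclassification(tp, fp, fn, tn):
    """Return (F1-score, misclassification rate) for a binary confusion matrix.

    tp, fp, fn, tn are the counts of true positives, false positives,
    false negatives, and true negatives.
    """
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    # Precision: share of predicted positives that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual positives that were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0

    # F1 is the harmonic mean of precision and recall; it ignores tn.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0

    # Misclassification rate: share of all reports labeled incorrectly.
    misclassification = (fp + fn) / total
    return f1, misclassification


# Hypothetical counts, not taken from the study.
f1, err = f1_and_misclassification(tp=90, fp=10, fn=10, tn=890)
print(f"F1 = {f1:.2%}, misclassification = {err:.2%}")
```

Because F1 does not count true negatives while the misclassification rate does, the two metrics need not rank models identically, which is consistent with RF having the lowest misclassification rate without having the highest F1-score.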
