Abstract
Biomedical text classification bears on a number of critical questions that physicians are expected to answer, many of which are extremely difficult. It can support tasks such as diagnosis and treatment, the efficient representation of concepts such as medications, procedure codes, and patient visits, and the rapid retrieval of documents or classification of diseases. Pathologies are identified from clinical notes, among other sources. The goal of this systematic review is to analyze the literature on the classification of patients' medical texts according to the following criteria: the quality of the evaluation metrics used, the machine learning methods applied, and the data sets employed. We also aim to highlight the best-performing methods for this type of problem and to identify the associated challenges. The study covers the period from 1 January 2016 to 10 July 2022. We searched multiple databases and archives of research articles, including Web of Science, Scopus, MDPI, arXiv, IEEE, and ACM, and found 894 articles on text classification, which we filtered using inclusion and exclusion criteria. After a thorough review, we selected 33 articles dealing with biomedical text classification. Our investigation revealed two major issues linked to the methodology and data used for biomedical text classification: first, the data-centric challenge, and second, the data quality challenge.