Applied Sciences, Vol. 12, Issue 24 (2022)

Dichotomization of Multilevel Variables to Detect Hidden Associations

Asdrúbal López-Chau    
Lisbeth Rodriguez-Mazahua    
Farid García-Lamont    
Maricela Quintana-López
Carlos A. Rojas-Hernández

Abstract

Tests of independence are commonly used to detect differences (or associations) between samples measured at the nominal level. Fisher's exact test and the Chi-square test are two of the most widely applied, with uses in data analysis across areas such as information technologies, biostatistics, psychology and health sciences. Contingency tables with null entries (also called random zeros) sometimes arise, particularly when the number of samples is small and the variables analyzed are multilevel. This is a problem because tests of independence produce unreliable results when one or more entries in a contingency table are zero or small. In this paper, we propose a method to address that issue. The method merges one or more levels of the variables analyzed to create contingency tables with only one degree of freedom, which avoids applying a test of independence to contingency tables with random zeros. The Python source code of the method is publicly available. The results obtained with our method give a complete picture of the associations between the variables of a data set. To show the effectiveness of our approach in finding dependencies between variables, we use four data sets publicly available on the Internet.
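
The idea of merging levels to obtain tables with one degree of freedom can be illustrated with a minimal sketch. This is not the authors' published code: it assumes the simplest possible merge, in which each level of a variable is kept and all remaining levels are collapsed into a single "rest" level, and it applies a two-sided Fisher's exact test written with the standard library only. The function names `fisher_exact_2x2` and `dichotomized_tests` are hypothetical, and the toy data are made up for illustration.

```python
from collections import Counter
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell
        # when the row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    # (a small tolerance guards against floating-point ties).
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def dichotomized_tests(xs, ys):
    """Test each one-vs-rest split of xs against each one-vs-rest split of ys.

    Assumption: one-vs-rest is only one way of merging levels; other
    groupings of levels are possible and are not covered here.
    """
    results = {}
    for lx in sorted(set(xs)):
        for ly in sorted(set(ys)):
            # Collapse both variables to two levels and count the 2x2 cells.
            counts = Counter((x == lx, y == ly) for x, y in zip(xs, ys))
            a, b = counts[(True, True)], counts[(True, False)]
            c, d = counts[(False, True)], counts[(False, False)]
            results[(lx, ly)] = fisher_exact_2x2(a, b, c, d)
    return results


# Made-up toy data: the full 3x3 table of color vs. size contains zeros.
color = ["red"] * 6 + ["green"] * 5 + ["blue"] * 5
size = ["S"] * 6 + ["M"] * 3 + ["L"] * 2 + ["M"] * 2 + ["L"] * 3

pvals = dichotomized_tests(color, size)
for (lx, ly), p in sorted(pvals.items(), key=lambda kv: kv[1]):
    print(f"{lx} vs rest  x  {ly} vs rest: p = {p:.4f}")
```

In this toy example every 2x2 table is tested separately, so a real analysis running many such tests would likely need a correction for multiple comparisons; the sketch leaves that out.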