Abstract
Single-cell RNA sequencing (scRNA-seq) has become a powerful technique for investigating cellular heterogeneity and complexity across many fields by revealing the gene expression status of individual cells. Despite these benefits, scRNA-seq data have inherent limitations, such as sparsity and noise, that can hinder downstream analysis. In this paper, we introduce scCGImpute, a model-based imputation approach that addresses sparsity in scRNA-seq data. After identifying likely dropouts using mixed models, scCGImpute first imputes them by exploiting the similarity of cells within the same subpopulation and then applies random forest regression to obtain the final imputed values. scCGImpute imputes only the likely dropouts, leaving non-dropout values unchanged, and uses information from both cell similarity and the correlation between genes simultaneously. We evaluated scCGImpute on simulated and real data in terms of gene expression recovery and clustering analysis. The results demonstrate that scCGImpute can effectively restore gene expression and improve the identification of cell types.
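As an illustration of the principle that only likely dropouts are modified, the following minimal sketch fills flagged entries with the mean of the observed values of the same gene among cells in the same subpopulation. It is not the scCGImpute implementation: the function name `impute_within_clusters`, the toy matrix, and the use of "value equals zero" as the dropout flag are assumptions made for illustration. The mixed-model dropout identification and the random forest regression step are both omitted.

```python
import numpy as np

def impute_within_clusters(X, labels, dropout_mask):
    """Fill flagged entries with the per-gene mean of observed values
    from cells in the same cluster (hypothetical helper, illustration only).

    X            : cells x genes expression matrix
    labels       : cluster label for each cell
    dropout_mask : boolean matrix, True where an entry is a likely dropout
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    dropout_mask = np.asarray(dropout_mask, dtype=bool)
    imputed = X.copy()

    for k in np.unique(labels):
        rows = np.where(labels == k)[0]
        sub = X[rows]
        observed = ~dropout_mask[rows]

        # Per-gene mean over the non-dropout entries of this cluster.
        counts = observed.sum(axis=0)
        sums = np.where(observed, sub, 0.0).sum(axis=0)
        means = np.divide(sums, counts,
                          out=np.full(X.shape[1], np.nan),
                          where=counts > 0)

        # Overwrite only flagged entries; genes with no observed value
        # in the cluster are left as they were.
        fill = dropout_mask[rows] & ~np.isnan(means)[None, :]
        block = imputed[rows]
        block[fill] = np.broadcast_to(means, sub.shape)[fill]
        imputed[rows] = block

    return imputed

# Toy data: 4 cells x 3 genes, two subpopulations.
X = np.array([[5.0, 0.0, 3.0],
              [4.0, 2.0, 0.0],
              [0.0, 0.0, 7.0],
              [0.0, 1.0, 8.0]])
labels = np.array([0, 0, 1, 1])
mask = X == 0  # assumption: every zero is treated as a likely dropout
result = impute_within_clusters(X, labels, mask)
```

In this toy example the first gene is zero in both cells of the second subpopulation, so no observed value is available and those entries remain zero. A full pipeline would pass such entries to a later step, for example a regression on correlated genes.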