Applied Sciences, Vol. 13, Issue 13 (2023)
Article

scCGImpute: An Imputation Method for Single-Cell RNA Sequencing Data Based on Similarities between Cells and Relationships among Genes

Tiantian Liu and Yuanyuan Li    

Abstract

Single-cell RNA sequencing (scRNA-seq) has become a powerful technique for investigating cellular heterogeneity and complexity in various fields by revealing the gene expression status of individual cells. Despite its benefits, scRNA-seq has inherent limitations, such as sparsity and noise, which hinder downstream analysis. In this paper, we introduce scCGImpute, a model-based imputation approach that addresses sparsity in scRNA-seq data. After identifying likely dropouts using mixed models, scCGImpute first imputes them using the similarity of cells within the same subpopulation and then applies random forest regression to obtain the final imputation. scCGImpute imputes only the likely dropouts, leaving non-dropout data unchanged, and uses cell similarity and correlations among genes simultaneously. Experiments on simulated and real data were conducted to evaluate the performance of scCGImpute in terms of gene expression recovery and clustering analysis. The results demonstrate that scCGImpute can effectively restore gene expression and improve the identification of cell types.
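The idea of imputing only likely dropouts from similar cells can be illustrated with a minimal sketch. It is not the scCGImpute implementation: the function names `flag_likely_dropouts` and `impute_from_similar_cells` are hypothetical, the cell subpopulation labels are assumed to be given, a simple threshold rule stands in for the mixed-model dropout identification, and the random forest refinement step is omitted.

```python
from statistics import mean


def flag_likely_dropouts(expr, labels, min_expressed_frac=0.5):
    """Flag zeros as likely dropouts (assumed rule, not the paper's model).

    A zero is flagged when at least `min_expressed_frac` of the other cells
    in the same subpopulation express the gene.
    """
    n_cells, n_genes = len(expr), len(expr[0])
    mask = [[False] * n_genes for _ in range(n_cells)]
    for c in range(n_cells):
        for g in range(n_genes):
            if expr[c][g] != 0:
                continue  # observed expression is never treated as a dropout
            peers = [expr[k][g] for k in range(n_cells)
                     if k != c and labels[k] == labels[c]]
            if peers and sum(v > 0 for v in peers) / len(peers) >= min_expressed_frac:
                mask[c][g] = True
    return mask


def impute_from_similar_cells(expr, labels, mask):
    """Replace flagged entries with the mean of unflagged values of the
    same gene in cells of the same subpopulation; leave the rest untouched."""
    imputed = [row[:] for row in expr]
    for c in range(len(expr)):
        for g in range(len(expr[c])):
            if not mask[c][g]:
                continue
            donors = [expr[k][g] for k in range(len(expr))
                      if k != c and labels[k] == labels[c] and not mask[k][g]]
            if donors:
                imputed[c][g] = mean(donors)
    return imputed


# Toy data: rows are cells, columns are genes (hypothetical values).
expr = [
    [5.0, 0.0, 2.0],
    [4.0, 0.0, 0.0],  # the zero in the last gene looks like a dropout
    [6.0, 0.0, 3.0],
    [0.0, 7.0, 0.0],
    [0.0, 8.0, 0.0],
]
labels = [0, 0, 0, 1, 1]  # assumed subpopulation assignment

mask = flag_likely_dropouts(expr, labels)
imputed = impute_from_similar_cells(expr, labels, mask)
```

In this toy example only one entry is flagged and filled in. Zeros shared by a whole subpopulation are treated as genuine non-expression and kept, which mirrors the property stated above that non-dropout data are left unchanged.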
