ARTICLE
TITLE

Development of Computational Pipeline Software for Genome/Exome Analysis on the K Computer

Kento Aoyama    
Masanori Kakuta    
Yuri Matsuzaki    
Takashi Ishida    
Masahito Ohue    
Yutaka Akiyama    

Abstract

Pipeline software, which chains tools and applications for a specific data-processing task, is widely used in bioinformatics research to analyze several data types, such as genome data. Recent trends in genome analysis require pipeline software that makes optimal use of computational resources so that the large-scale biological data accumulated daily can be handled efficiently. However, using pipeline software in bioinformatics tends to be problematic owing to its large memory and storage requirements, the increasing number of job submissions, and its wide range of software dependencies. This paper presents massively parallel genome/exome analysis pipeline software that addresses these difficulties and can be executed on a large number of K computer nodes. The proposed pipeline incorporates workflow management functionality that takes the task-dependency graph of internal executions into account, implemented by extending the dynamic task distribution framework. Evaluation experiments on the core pipeline functionality, performed using an actual exome dataset, demonstrate good scalability when using over a thousand nodes. This study also proposes several approaches to resolving pipeline performance bottlenecks, a major challenge in pipeline parallelization, by drawing on domain knowledge of the pipeline's internal executions.
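
As a rough illustration of dependency-aware dynamic task distribution, the Python sketch below dispatches each task to a worker pool as soon as all of its prerequisites have finished. It is not the authors' implementation. The `run_pipeline` helper, the task names (`align_s1`, `align_s2`, `merge`, `call_variants`), and the use of a local thread pool in place of K computer nodes are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait


def run_pipeline(tasks, deps, max_workers=4):
    """Run callables in `tasks` (name -> callable) while respecting
    `deps` (name -> set of prerequisite names). Returns completion order.
    Hypothetical helper, not part of any published pipeline."""
    # Unmet prerequisites for every task that has not been submitted yet.
    remaining = {name: set(deps.get(name, ())) for name in tasks}
    running = {}   # future -> task name
    done = []      # task names in the order they completed

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while remaining or running:
            # Submit every task whose prerequisites are all satisfied.
            ready = [name for name, unmet in remaining.items() if not unmet]
            for name in ready:
                del remaining[name]
                running[pool.submit(tasks[name])] = name

            if not running:
                # Nothing is runnable and nothing is in flight.
                raise ValueError("dependency cycle or unknown prerequisite")

            # Block until at least one in-flight task finishes.
            finished, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in finished:
                name = running.pop(future)
                future.result()  # re-raise any exception from the task
                done.append(name)
                # Mark this task as satisfied for all waiting tasks.
                for unmet in remaining.values():
                    unmet.discard(name)
    return done


# Illustrative task graph: two per-sample alignments feed a merge step,
# which in turn feeds variant calling. The task bodies are placeholders.
tasks = {
    "align_s1": lambda: None,
    "align_s2": lambda: None,
    "merge": lambda: None,
    "call_variants": lambda: None,
}
deps = {
    "merge": {"align_s1", "align_s2"},
    "call_variants": {"merge"},
}
order = run_pipeline(tasks, deps)
```

In this sketch the two alignment tasks may finish in either order, but `merge` always follows both of them and `call_variants` always runs last. Submitting work as soon as it becomes runnable, rather than in fixed stages, is what keeps workers busy when task durations vary.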
