ARTICLE
TITLE

Neo-heterogeneous Programming and Parallelized Optimization of a Human Genome Re-sequencing Analysis Software Pipeline on TH-2 Supercomputer

Xiangke Liao    
Shaoliang Peng    
Yutong Lu    
Yingbo Cui    
Chengkun Wu    
Heng Wang    
Jiajun Wen    

Abstract

The velocity of biological big data is growing far beyond Moore's Law of compute power growth. The amount of genomic data has been accumulating explosively, which calls for an enormous amount of computing power, while current computation methods cannot scale out with the data explosion. In this paper, we utilize massive computing resources to address the big data problems of genome processing on the TH-2 supercomputer. TH-2 adopts a neo-heterogeneous architecture and owns 16,000 compute nodes: 32,000 Intel Xeon CPUs + 48,000 Xeon Phi MICs. The heterogeneity, scalability, and parallel efficiency pose great challenges for the deployment of the genome analysis software pipeline on TH-2. Runtime profiling shows that SOAP3-dp and SOAPsnp are the most time-consuming parts of the whole pipeline (up to 70% of total runtime), and thus require deep parallel optimization and large-scale deployment. To address this issue, we first design a series of new parallel algorithms for SOAP3-dp and SOAPsnp, respectively, to eliminate spatial-temporal redundancy. We then propose a CPU/MIC collaborative parallel computing method within one node to fully fill the CPU and MIC time slots. We also propose a series of scalable parallel algorithms and large-scale programming methods to reduce the amount of communication between different nodes. Moreover, we deploy and evaluate our work on the TH-2 supercomputer at different scales. At the largest scale, the whole process takes 8.37 hours using 8,192 nodes to finish the analysis of a 300 TB dataset of whole-genome sequences from 2,000 humans, an analysis that can take as long as 8 months on a commodity server. The speedup is about 700x.
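The reported ~700x speedup is consistent with a back-of-the-envelope check (an illustrative calculation only, assuming "8 months" means roughly 8 × 30 days of continuous runtime on the commodity server):

```python
# Sanity check on the reported ~700x speedup claim.
# Assumption (not stated in the abstract): "8 months" ~ 8 * 30 days of
# continuous runtime on a single commodity server.
commodity_hours = 8 * 30 * 24      # ~5760 hours on a commodity server
th2_hours = 8.37                   # reported runtime on 8192 TH-2 nodes
speedup = commodity_hours / th2_hours
print(round(speedup))              # ~688, consistent with the stated ~700x
```

The exact figure depends on how "8 months" is counted, so the abstract's rounded "about 700x" is a fair summary.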

 Similar articles

       
 
Eike Blomeier, Sebastian Schmidt and Bernd Resch    
In the early stages of a disaster caused by a natural hazard (e.g., a flood), the amount of available and useful information is low. To fill this informational gap, emergency responders are increasingly using data from geo-social media to gain insights fro... see more
Journal: Information

 
Krzysztof Tomczuk, Piotr Tomczuk and Marcin Chrzanowicz    
A properly designed and manufactured autonomous lighting system has an impact on reducing the number of conflicts between pedestrians and drivers. For pedestrian crossings located outside of urban areas, one of the utilized solutions is PV installations ... see more
Journal: Applied Sciences

 
Yang Zhang, Yuan Feng, Shiqi Wang, Zhicheng Tang, Zhenduo Zhai, Reid Viegut, Lisa Webb, Andrew Raedeke and Yi Shang    
Waterfowl population monitoring is essential for wetland conservation. Lately, deep learning techniques have shown promising advancements in detecting waterfowl in aerial images. In this paper, we present a performance evaluation of several popular superv... see more
Journal: Information

 
Chen Li, Yinxu Lu, Yong Bian, Jie Tian and Mu Yuan    
The quality and safety of agricultural products involve a variety of risk factors, a large amount of risk information data, and multiple circulation and disposal processes, making it difficult to accurately trace the source of risks. To achieve precise t... see more
Journal: Applied Sciences

 
Jiarui Xia and Yongshou Dai    
Ground roll noise suppression is a crucial step in processing deep pre-stack seismic data. Recently, supervised deep learning methods have gained popularity in this field due to their ability to adaptively learn and extract powerful features. However, th... see more
Journal: Applied Sciences