ARTICLE
TITLE

Record-and-Replay Techniques for HPC Systems: A Survey

Dylan Chapp    
Kento Sato    
Dong H Ahn    
Michela Taufer    

Abstract

Record-and-replay techniques provide the ability to record executions of nondeterministic applications and re-execute them identically. These techniques find use in debugging, reproducibility, and fault tolerance, especially in the presence of nondeterministic factors such as message races. Record-and-replay techniques are highly diverse in terms of the fidelity of replay they provide, the assumptions they make about the recorded application, the programming models they target, and the runtime overheads they impose.

In the high performance computing (HPC) environment, all of the above factors must be considered in concert, which presents additional implementation challenges. In this manuscript, we survey record-and-replay techniques in terms of the programming models they target and the workloads on which they were evaluated, providing a categorization of these techniques that benefits application developers and researchers targeting exascale challenges. Through this survey, the manuscript answers three questions: What are the gaps in the existing space of record-and-replay techniques? What is the roadmap to widespread use of record-and-replay on production-scale HPC workloads? And what are the critical open problems that must be addressed to make record-and-replay viable at exascale?

Keywords: Reproducibility, nondeterminism, fault-tolerance, exascale, message-passing, shared memory, proxy application, HPC benchmarks
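The core idea of recording a nondeterministic execution and replaying it identically can be illustrated with a toy message race. The Python sketch below is hypothetical and is not taken from any surveyed tool: it assumes messages are `(source, payload)` tuples that arrive in a random order, mimicking a receive posted with `MPI_ANY_SOURCE`. In record mode the receiver logs the source of each matched message; in replay mode it consumes that log and matches messages in the logged order, holding back any that arrive early. The names `RecordReplayReceiver` and `run_once` are illustrative only.

```python
import random
from collections import deque


class RecordReplayReceiver:
    """Hypothetical wildcard receive that records or replays its matching order.

    Assumption: each message is a (source, payload) tuple and each source
    sends at most one message, so the source alone identifies a match.
    """

    def __init__(self, mode, log=None):
        if mode not in ("record", "replay"):
            raise ValueError("mode must be 'record' or 'replay'")
        self.mode = mode
        # Record mode builds the log; replay mode consumes it front to back.
        self.log = [] if mode == "record" else deque(log or [])
        self.pending = []  # arrived messages that have not been matched yet

    def deliver(self, message):
        """Called by the simulated network when a message arrives."""
        self.pending.append(message)

    def receive(self):
        """Match one pending message, honoring the recorded order in replay mode."""
        if self.mode == "record":
            # Record mode: accept whichever message arrived first, log its source.
            message = self.pending.pop(0)
            self.log.append(message[0])
            return message
        # Replay mode: accept only the message whose source the log dictates next.
        expected = self.log.popleft()
        for index, message in enumerate(self.pending):
            if message[0] == expected:
                return self.pending.pop(index)
        raise RuntimeError(f"message from source {expected} has not arrived yet")


def run_once(messages, mode, log=None):
    """Simulate one execution: shuffle arrivals, then match every message."""
    arrival = list(messages)
    random.shuffle(arrival)  # unseeded shuffle stands in for the message race
    receiver = RecordReplayReceiver(mode, log)
    for message in arrival:
        receiver.deliver(message)
    received = [receiver.receive() for _ in arrival]
    return received, list(receiver.log)


if __name__ == "__main__":
    messages = [(source, f"payload from rank {source}") for source in range(4)]
    recorded, log = run_once(messages, "record")
    replayed, _ = run_once(messages, "replay", log)
    print("recorded matching order:", log)
    # A fresh shuffle changes arrival order, but replay reproduces the matching.
    assert replayed == recorded
```

This sketch logs only the matching order (which source satisfied each wildcard receive) rather than message contents, in the style of order-based replay; data-based approaches instead log the received data itself. A real message-passing tool would also have to handle blocking receives, multiple messages per source, and other sources of nondeterminism.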

Similar articles

       
 
Kultigin Demirlioglu and Emrah Erduran    
Bridges serve as vital engineering structures crafted to facilitate secure and effective transportation networks. Throughout their life-cycle, they withstand various factors, including diverse environmental conditions, natural hazards, and substantial lo...
Journal: Applied Sciences

 
Eugenia I. Toki, Jenny Pange, Giorgos Tatsis, Konstantinos Plachouras and Ioannis G. Tsoulos    
Autism Spectrum Disorder is known to cause difficulties in social interaction and communication, as well as repetitive patterns of behavior, interests, or hobbies. These challenges can significantly affect the individual's daily life. Therefore, it is cr...
Journal: Applied Sciences

 
Mohamed Shenify, Fokrul Alom Mazarbhuiya and A. S. Wungreiphi    
There are many applications of anomaly detection in the Internet of Things domain. IoT technology consists of a large number of interconnecting digital devices not only generating huge data continuously but also making real-time computations. Since IoT d...
Journal: Applied Sciences

 
Ahmed Sewify, Maria Antico, Marian Steffens, Jacqueline Roots, Ashish Gupta, Kenneth Cutbush, Peter Pivonka and Davide Fontanarosa    
A protocol is proposed to acquire a tomographic ultrasound (US) scan of the musculoskeletal (MSK) anatomy in the rotator cuff region. Current clinical US imaging techniques are hindered by occlusions and a narrow field of view and require expert acquisit...
Journal: Applied Sciences

 
Jian Wang, Ze Chen, Linghao Li, Chuan Wang, Kangle Teng, Qiang He, Jiren Zhou, Shanshan Li, Weidong Cao, Xiuli Wang and Hongliang Wang    
Submersible tubular pumps are an ideal choice for pump stations that require high flow rates and low lift. These pumps combine the unique features of submersible motors with axial flow pump technology, making them highly efficient and cost-effective. The...
Journal: Water