ARTICLE
TITLE

Record-and-Replay Techniques for HPC Systems: A Survey

Dylan Chapp    
Kento Sato    
Dong H Ahn    
Michela Taufer    

Abstract

Record-and-replay techniques provide the ability to record executions of nondeterministic applications and re-execute them identically. These techniques find use in the contexts of debugging, reproducibility, and fault-tolerance, especially in the presence of nondeterministic factors such as message races. Record-and-replay techniques are highly diverse in terms of the fidelity of replay they provide, the assumptions they make about the recorded application, the programming models they target, and the runtime overheads they impose.

In the high performance computing (HPC) environment, all the above factors must be considered in concert, thus presenting additional implementation challenges. In this manuscript, we survey record-and-replay techniques in terms of the programming models they target and the workloads on which they were evaluated, providing a categorization of these techniques benefiting application developers and researchers targeting exascale challenges. This manuscript answers three questions through this survey: What are the gaps in the existing space of record-and-replay techniques? What is the roadmap to widespread use of record-and-replay on production-scale HPC workloads? And, what are the critical open problems that must be addressed to make record-and-replay viable at exascale?

Keywords: Reproducibility, nondeterminism, fault-tolerance, exascale, message-passing, shared memory, proxy application, HPC benchmarks
