ARTICLE
TITLE

An Autonomic Performance Environment for Exascale

Kevin A. Huck    
Allan Porterfield    
Nick Chaimov    
Hartmut Kaiser    
Allen D. Malony    
Thomas Sterling    
Rob Fowler    

Abstract

Exascale systems will require new approaches to performance observation, analysis, and runtime decision-making to optimize for performance and efficiency. The standard "first-person" model, in which multiple operating system processes and threads observe themselves and record first-person performance profiles or traces for offline analysis, is not adequate to observe and capture interactions at shared resources in highly concurrent, dynamic systems. Further, it does not support mechanisms for runtime adaptation. Our approach, called APEX (Autonomic Performance Environment for eXascale), provides mechanisms for sharing information among the layers of the software stack, including hardware, operating and runtime systems, and application code, both new and legacy. The performance measurement components share information across layers, merging first-person data sets with information collected by third-person tools that observe shared hardware and software state at the node and global levels. Critically, APEX provides a policy engine designed to guide runtime adaptation mechanisms to make algorithmic changes, reallocate resources, or change scheduling rules when appropriate conditions occur.
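
To make the policy-engine idea concrete, the following is a minimal sketch of how first-person and third-person measurements might be merged into shared state and checked against a rule that triggers an adaptation. It is an illustration only and not the APEX interface: the names `Policy`, `PolicyEngine`, `report`, `evaluate`, the metric keys, the thresholds, and the `worker_threads` knob are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# All names below are hypothetical and chosen for illustration;
# none of them is taken from the APEX API.


@dataclass
class Policy:
    name: str
    condition: Callable[[Dict[str, float]], bool]  # inspects the shared state
    action: Callable[[Dict[str, float]], None]     # adapts the runtime


@dataclass
class PolicyEngine:
    state: Dict[str, float] = field(default_factory=dict)
    policies: List[Policy] = field(default_factory=list)

    def report(self, source: str, metric: str, value: float) -> None:
        """Merge one measurement into the shared state, keyed by its source."""
        self.state[f"{source}.{metric}"] = value

    def evaluate(self) -> List[str]:
        """Run every policy whose condition holds; return the names that fired."""
        fired = []
        for policy in self.policies:
            if policy.condition(self.state):
                policy.action(self.state)
                fired.append(policy.name)
        return fired


# Hypothetical runtime knob that a policy is allowed to change.
runtime_config = {"worker_threads": 8}


def memory_contention(state: Dict[str, float]) -> bool:
    # Combines a node-level (third-person) observation with a
    # task-level (first-person) observation. Thresholds are arbitrary.
    return (state.get("node.mem_bandwidth_util", 0.0) > 0.9
            and state.get("task.throughput", 1.0) < 0.5)


def throttle_threads(state: Dict[str, float]) -> None:
    # Re-allocate resources: halve the worker count, never below one.
    runtime_config["worker_threads"] = max(1, runtime_config["worker_threads"] // 2)


engine = PolicyEngine()
engine.policies.append(
    Policy("throttle-on-memory-contention", memory_contention, throttle_threads)
)

engine.report("task", "throughput", 0.4)           # first-person measurement
engine.report("node", "mem_bandwidth_util", 0.95)  # third-person measurement
fired = engine.evaluate()
```

In this sketch neither measurement alone would trigger the rule; the policy fires only because the engine sees both views in one place, which is the motivation the abstract gives for sharing information across layers.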