ARTICLE
TITLE

Model-Driven One-Sided Factorizations on Multicore Accelerated Systems

Jack Dongarra    
Azzam Haidar    
Jakub Kurzak    
Piotr Luszczek    
Stanimire Tomov    
Asim YarKhan    

Abstract

Hardware heterogeneity of HPC platforms is no longer considered unusual; instead, it has become the most viable way forward towards Exascale.  In fact, the many heterogeneous resources available to modern computers are designed for different workloads, and their efficient use is closely aligned with the specialized role envisaged by their design.  Commonly, in order to use GPU resources efficiently, the workload in question must have a much greater degree of parallelism than workloads typically associated with multicore processors (CPUs).  Available GPU variants differ in their internal architecture and, as a result, are capable of handling workloads of varying degrees of complexity and a range of computational patterns.  This vast array of applicable workloads will likely lead to an ever-accelerating mixing of multicore CPUs and GPUs in multi-user environments, with the ultimate goal of offering adequate computing facilities for a wide range of scientific and technical workloads.  In this paper, we present a research prototype that uses a lightweight runtime environment to manage the resource-specific workloads and to control the dataflow and parallel execution in hybrid systems.  Our lightweight runtime environment uses task-superscalar concepts to enable the developer to write serial code while providing parallel execution.  This concept is reminiscent of dataflow and systolic architectures in its conceptualization of a workload as a set of side-effect-free tasks that pass data items whenever the associated work assignment has been completed.  Additionally, our task abstractions and their parametrization enable uniformity in algorithmic development across all the heterogeneous resources without sacrificing precious compute cycles.
We include performance results for dense linear algebra functions, which demonstrate the practicality and effectiveness of our approach and its ability to fully utilize a wide range of accelerator hardware.
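
The task-superscalar idea mentioned in the abstract (serial-looking code whose tasks run in parallel once their data dependencies are satisfied) can be illustrated with a minimal sketch. This is not the authors' runtime: `ToyRuntime`, `insert_task`, `barrier`, and `demo` are hypothetical names invented here, and the sketch assumes that each task declares by name which data items it reads and writes.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyRuntime:
    """Hypothetical toy runtime (not the paper's implementation).

    Tasks are inserted in serial program order. Dependencies are inferred
    from the data names each task declares as read or written.
    """

    def __init__(self, workers=4):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._last_writer = {}  # data name -> future of the last writing task
        self._readers = {}      # data name -> futures reading since that write
        self._tasks = []

    def insert_task(self, func, reads=(), writes=()):
        deps = []
        for name in reads:
            if name in self._last_writer:       # read-after-write
                deps.append(self._last_writer[name])
        for name in writes:
            if name in self._last_writer:       # write-after-write
                deps.append(self._last_writer[name])
            deps.extend(self._readers.get(name, []))  # write-after-read

        def run():
            # Dependencies were submitted earlier, and the pool starts work in
            # FIFO order, so waiting here cannot deadlock.
            for dep in deps:
                dep.result()
            func()

        future = self._pool.submit(run)
        for name in reads:
            self._readers.setdefault(name, []).append(future)
        for name in writes:
            self._last_writer[name] = future
            self._readers[name] = []
        self._tasks.append(future)
        return future

    def barrier(self):
        """Wait for every task inserted so far."""
        for task in self._tasks:
            task.result()
        self._tasks.clear()

    def close(self):
        self._pool.shutdown(wait=True)


def demo():
    data = {}
    rt = ToyRuntime()
    # The code below reads like a serial program; the runtime orders the tasks.
    rt.insert_task(lambda: data.update(a=2), writes=["a"])
    rt.insert_task(lambda: data.update(b=3), writes=["b"])  # independent of "a"
    rt.insert_task(lambda: data.update(c=data["a"] + data["b"]),
                   reads=["a", "b"], writes=["c"])
    rt.insert_task(lambda: data.update(a=data["c"] * 10),
                   reads=["c"], writes=["a"])
    rt.barrier()
    rt.close()
    return data
```

In `demo`, the first two tasks share no data and may run concurrently, while the last two are ordered by the dependencies inferred from their declared reads and writes. A production runtime would also need to map tasks to specific resources such as CPUs and GPUs, which this sketch does not attempt.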

 Similar articles

       
 
Davy Preuveneers and Wouter Joosen    
This paper presents the architecture, implementation and evaluation of a middleware support layer for NoSQL storage systems. Our middleware automatically selects performance and scalability tactics in terms of application-specific workloads. Enterprises ...
Journal: Informatics

 
Jakob Lüttgau, Michael Kuhn, Kira Duwe, Yevhen Alforov, Eugen Betke, Julian Kunkel, Thomas Ludwig     pp. 31-58
In current supercomputers, storage is typically provided by parallel distributed file systems for hot data and tape archives for cold data. These file systems are often compatible with local file systems due to their use of the POSIX interface and semant...

 
Julia Siderska, Katarzyna Perkowska     pp. 89-96
The aim of this work is to present and discuss the possibility of using computer simulation to improve the production flow of sheet metal screws in the carpentry plant. The paper includes descriptive and schematic characterization of the present producti...

 
Abadi Dwi Saputra     pp. 37-43
This type of activity or work has a high stress level and requires more concentration and attention; in this case, it is aircraft operation. The mental workload is therefore more dominant than the physical workload. And this is what should have been a conce...

 
Julian Martin Kunkel     pp. 34-39
Understanding the characteristics of data stored in data centers helps computer scientists in identifying the most suitable storage infrastructure to deal with these workloads. For example, knowing the relevance of file formats allows optimizing the rele...