Applied Sciences, Vol. 11, Issue 13 (2021)
Title

Improvements to Supercomputing Service Availability Based on Data Analysis

Jae-Kook Lee    
Min-Woo Kwon    
Do-Sik An    
Junweon Yoon    
Taeyoung Hong    
Joon Woo    
Sung-Jun Kim and Guohua Li    

Abstract

As the demand for high-performance computing (HPC) resources has increased in the field of computational science, service availability has become an unavoidable consideration in large cluster systems such as supercomputers. In particular, the factor that most affects availability in supercomputing services is the job scheduler used to allocate resources. Our analysis of job data submitted by users through the job scheduler showed that 25.6% of jobs failed because of program errors, scheduler errors, or I/O errors. Based on this analysis, we propose a K-hook method for scheduling to increase the success rate of job submissions and improve the availability of supercomputing services. Applying this method improved the job-submission success rate by 15% without negatively affecting users' waiting time. We also achieved a mean time between interrupts (MTBI) of 24.3 days and maintained average system availability at 97%. Because this research was verified on the Nurion supercomputer in a real service environment, it is expected to yield significant service improvements.
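
The metrics named in the abstract (job failure rate, MTBI, and availability) can be illustrated with a minimal sketch. The formulas below follow one common convention (uptime divided by the number of interrupts for MTBI, uptime divided by total time for availability) and are an assumption, not necessarily the definitions used by the authors. The class name, field names, job-state labels, and all sample values are hypothetical and are not data from the paper.

```python
from dataclasses import dataclass


@dataclass
class ServiceWindow:
    """Hypothetical summary of one observation window (illustrative only)."""
    total_hours: float     # length of the observation window
    downtime_hours: float  # hours during which the service was unavailable
    interrupts: int        # number of service interruptions in the window


def availability(w: ServiceWindow) -> float:
    """Fraction of the window in which the service was up (assumed definition)."""
    return (w.total_hours - w.downtime_hours) / w.total_hours


def mtbi_days(w: ServiceWindow) -> float:
    """Mean time between interrupts in days: uptime / interrupts (assumed definition)."""
    if w.interrupts == 0:
        return float("inf")  # no interrupts observed in this window
    return (w.total_hours - w.downtime_hours) / w.interrupts / 24.0


def failure_rate(job_states: list[str]) -> float:
    """Share of jobs whose final state is anything other than 'ok'."""
    failed = sum(1 for state in job_states if state != "ok")
    return failed / len(job_states)


# Made-up sample values, chosen only to exercise the functions.
window = ServiceWindow(total_hours=720.0, downtime_hours=36.0, interrupts=3)
jobs = ["ok", "ok", "program_error", "ok"]

print(f"availability: {availability(window):.2%}")
print(f"MTBI: {mtbi_days(window):.1f} days")
print(f"job failure rate: {failure_rate(jobs):.2%}")
```

Under these assumed definitions, a higher job-submission success rate lowers `failure_rate`, while fewer interruptions over the same uptime raise `mtbi_days`.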
