ARTICLE
TITLE

Analyzing Data Properties using Statistical Sampling - Illustrated on Scientific File Formats

Julian Martin Kunkel    

Abstract

Understanding the characteristics of data stored in data centers helps computer scientists identify the most suitable storage infrastructure for these workloads. For example, knowing which file formats are relevant allows those formats to be optimized, and it also helps during procurement to define benchmarks that cover them. Existing studies that investigate performance improvements and data reduction techniques such as deduplication and compression operate on a subset of the data. Some of those studies claim that the selected data is representative and extrapolate their results to the scale of the data center. One hurdle to running novel schemes on the complete data is the vast amount of data stored and, thus, the resources required to analyze the complete data set. Even if this were feasible, the cost of running many such experiments would have to be justified. This paper investigates stochastic sampling methods to compute and analyze quantities of interest with respect to both the number of files and the occupied storage space. It demonstrates that, on our production system, scanning 1% of the files and of the data volume is sufficient to draw conclusions. This speeds up the analysis process and significantly reduces the cost of such studies.
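
The abstract does not spell out the sampling procedure, so the following is only a minimal Python sketch of one plausible reading, not the paper's method. It assumes simple random sampling to estimate each format's share of the file count, and sampling with probability proportional to file size (with replacement) to estimate each format's share of the occupied storage space. The function names, the format labels, and the synthetic file population are all hypothetical and chosen purely for illustration.

```python
import random
from collections import Counter


def estimate_by_count(files, fraction, rng):
    """Estimate each format's share of the file COUNT from a simple random sample."""
    k = max(1, int(len(files) * fraction))
    sample = rng.sample(files, k)  # uniform, without replacement
    counts = Counter(fmt for fmt, _ in sample)
    return {fmt: n / k for fmt, n in counts.items()}


def estimate_by_volume(files, fraction, rng):
    """Estimate each format's share of the occupied SPACE.

    Files are drawn with probability proportional to their size (with
    replacement), so the fraction of draws that hit a format is an
    unbiased estimate of that format's share of the total volume.
    """
    k = max(1, int(len(files) * fraction))
    sizes = [size for _, size in files]
    sample = rng.choices(files, weights=sizes, k=k)
    counts = Counter(fmt for fmt, _ in sample)
    return {fmt: n / k for fmt, n in counts.items()}


def true_shares(files):
    """Exact shares by count and by volume, computed by a full scan."""
    total_size = sum(size for _, size in files)
    by_count, by_volume = Counter(), Counter()
    for fmt, size in files:
        by_count[fmt] += 1
        by_volume[fmt] += size
    return (
        {fmt: n / len(files) for fmt, n in by_count.items()},
        {fmt: s / total_size for fmt, s in by_volume.items()},
    )


# Hypothetical population of (format, size) pairs; the numbers are made up.
files = (
    [("NetCDF", 1000)] * 20_000
    + [("GRIB", 100)] * 30_000
    + [("text", 1)] * 50_000
)

rng = random.Random(42)  # fixed seed so the sketch is reproducible
by_count = estimate_by_count(files, 0.01, rng)    # scan 1% of the files
by_volume = estimate_by_volume(files, 0.01, rng)  # 1% as many size-weighted draws
true_count, true_volume = true_shares(files)

for fmt in sorted(true_count):
    print(
        f"{fmt:>6}: count {by_count.get(fmt, 0.0):.3f} (true {true_count[fmt]:.3f}), "
        f"volume {by_volume.get(fmt, 0.0):.3f} (true {true_volume[fmt]:.3f})"
    )
```

The two estimators are kept separate because a format can dominate the file count while contributing little to the occupied space, and the reverse; in this made-up population the small text files are half of all files but a tiny fraction of the volume. The size-weighted draw still needs a list of file sizes, which in practice would come from a metadata scan rather than from reading file contents.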
