Informatics, Vol. 5, Issue 1 (2018)
Title

Using Introspection to Collect Provenance in R

Barbara Lerner, Emery Boose and Luis Perez

Abstract

Data provenance is the history of an item of data from the point of its creation to its present state. It can support science by improving understanding of and confidence in data. RDataTracker is an R package that collects data provenance from R scripts (https://github.com/End-to-end-provenance/RDataTracker). In addition to the details on inputs, outputs, and the computing environment that most provenance tools collect, RDataTracker records a detailed execution trace and intermediate data values. It does this by using R's powerful introspection functions and by parsing R statements before sending them to the interpreter, so that it knows what provenance to collect. The provenance is stored in a specialized graph structure called a Data Derivation Graph, which makes it possible to determine exactly how an output value is computed or how an input value is used. In this paper, we provide details about the provenance RDataTracker collects and the mechanisms used to collect it. We also speculate about how this rich source of information could be used by other tools to help an R programmer gain a deeper understanding of the software used and to support reproducibility.
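The idea of parsing each statement before it reaches the interpreter, and then recording what it read and wrote, can be illustrated with a small sketch. This is not RDataTracker and it is written in Python, not R. The function names `run_with_provenance` and `lineage`, the `name.step` labels for data nodes, and the tiny example script are all assumptions made for illustration. The graph it builds is only a simplified stand-in for a Data Derivation Graph.

```python
import ast


def run_with_provenance(script):
    """Execute `script` one top-level statement at a time, recording
    which variables each statement read and wrote (hypothetical helper)."""
    env = {}       # namespace the script runs in
    version = {}   # variable name -> step number that last assigned it
    graph = []     # edges (source, target) between step nodes and data nodes
    values = {}    # data node -> repr of the intermediate value

    for number, stmt in enumerate(ast.parse(script).body, start=1):
        step = f"step{number}"
        # Inspect the parsed statement BEFORE running it.
        names = [n for n in ast.walk(stmt) if isinstance(n, ast.Name)]
        reads = sorted({n.id for n in names
                        if isinstance(n.ctx, ast.Load) and n.id in version})
        writes = sorted({n.id for n in names if isinstance(n.ctx, ast.Store)})

        # Data node -> step edges: the statement uses these values.
        for var in reads:
            graph.append((f"{var}.{version[var]}", step))

        # Hand the single statement to the interpreter.
        code = compile(ast.Module(body=[stmt], type_ignores=[]), "<script>", "exec")
        exec(code, env)

        # Step -> data node edges: the statement produced these values.
        for var in writes:
            if var in env:  # skip names that are local to nested scopes
                version[var] = number
                node = f"{var}.{number}"
                graph.append((step, node))
                values[node] = repr(env[var])
    return graph, values


def lineage(graph, node):
    """Return every node that `node` was derived from (backward traversal)."""
    parents = {}
    for src, dst in graph:
        parents.setdefault(dst, []).append(src)
    seen, stack = set(), [node]
    while stack:
        for src in parents.get(stack.pop(), []):
            if src not in seen:
                seen.add(src)
                stack.append(src)
    return seen


SCRIPT = """
raw = [3, 1, 2]
ordered = sorted(raw)
top = ordered[-1]
unused = len(raw)
"""

graph, values = run_with_provenance(SCRIPT)
print(sorted(lineage(graph, "top.3")))
print(values["ordered.2"])
```

Walking the edges backwards from `top.3` answers "how was this output computed", and walking them forwards from `raw.1` would answer "how was this input used". The sketch ignores many cases a real collector must handle, such as augmented assignments, function bodies, and file or plot outputs.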
