Future Internet, Vol. 14, Issue 8 (2022)
ARTICLE
TITLE

Automatic Detection of Sensitive Data Using Transformer-Based Classifiers

Michael Petrolini, Stefano Cagnoni and Monica Mordonini

Abstract

The General Data Protection Regulation (GDPR) has given EU citizens and residents more control over their personal data, simplifying the regulatory environment for international business and unifying and harmonising privacy legislation within the EU. The regulation applies to every company that processes data of European residents, regardless of where the data are processed or where the company has its registered office, and it imposes strict data-protection rules. These companies must comply with the GDPR and be aware of the content of the data they manage. This is especially important if they hold sensitive data, that is, any information regarding racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, a person's sexual life or sexual orientation, or physical and mental health. These classes of data are largely unstructured; most frequently they appear within a document such as an email message, a review or a post. As a result, it is extremely difficult for a company to know whether it holds sensitive data, and it therefore risks not protecting them properly. The goal of the study described in this paper is to use Machine Learning, in particular the Transformer deep-learning model, to develop classifiers capable of detecting documents that are likely to include sensitive data. Additionally, we want the classifiers to recognize the particular sensitive topic a document deals with, so that a company gains better knowledge of the data it owns. We expect to make the model described in this paper available as a web service, either customized to the private data of prospective customers or as a free-to-use version based on the freely available data set we built to train the classifiers.
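The abstract names two tasks: flagging documents that are likely to contain sensitive data, and then identifying which sensitive category they concern. A minimal sketch of how those two decisions could be chained is given here, assuming a binary classifier that returns a probability and a topic classifier that returns a score per category. The names `classify_document`, `toy_is_sensitive`, `toy_topic_scores`, `TOY_KEYWORDS` and the 0.5 threshold are hypothetical; the keyword scorers only stand in for the fine-tuned Transformer models, which are not reproduced here, and the paper may organise the two tasks differently.

```python
from enum import Enum
from typing import Callable, Dict, List, Optional


class SensitiveCategory(Enum):
    """Sensitive-data categories listed in the abstract."""
    RACIAL_OR_ETHNIC_ORIGIN = "racial or ethnic origin"
    POLITICAL_OPINIONS = "political opinions"
    RELIGIOUS_OR_PHILOSOPHICAL_BELIEFS = "religious or philosophical beliefs"
    TRADE_UNION_MEMBERSHIP = "trade union membership"
    SEXUAL_LIFE_OR_ORIENTATION = "sexual life or sexual orientation"
    HEALTH = "physical or mental health"


# Stage 1 (assumed interface): document -> probability that it is sensitive.
BinaryClassifier = Callable[[str], float]
# Stage 2 (assumed interface): document -> score for each sensitive category.
TopicClassifier = Callable[[str], Dict[SensitiveCategory, float]]


def classify_document(
    text: str,
    is_sensitive: BinaryClassifier,
    topic_of: TopicClassifier,
    threshold: float = 0.5,  # hypothetical decision threshold
) -> Optional[SensitiveCategory]:
    """Return the most likely sensitive category, or None if the
    document is not flagged as sensitive (hypothetical two-stage scheme)."""
    if is_sensitive(text) < threshold:
        return None
    scores = topic_of(text)
    # Pick the category with the highest score.
    return max(scores, key=scores.get)


# --- Hypothetical stand-ins for trained Transformer classifiers ---
# Keyword lists are illustrative only, not taken from the paper.
TOY_KEYWORDS: Dict[SensitiveCategory, frozenset] = {
    SensitiveCategory.HEALTH: frozenset({"diagnosis", "therapy", "depression", "medication"}),
    SensitiveCategory.POLITICAL_OPINIONS: frozenset({"election", "vote", "parliament"}),
    SensitiveCategory.RELIGIOUS_OR_PHILOSOPHICAL_BELIEFS: frozenset({"church", "mosque", "prayer"}),
    SensitiveCategory.TRADE_UNION_MEMBERSHIP: frozenset({"union", "strike"}),
}


def _tokens(text: str) -> List[str]:
    # Lowercase whitespace-separated words and strip surrounding punctuation.
    return [w.strip(".,;:!?\"'()").lower() for w in text.split()]


def toy_topic_scores(text: str) -> Dict[SensitiveCategory, float]:
    # Score each category by how many of its keywords occur in the text.
    tokens = _tokens(text)
    return {cat: float(sum(t in kws for t in tokens)) for cat, kws in TOY_KEYWORDS.items()}


def toy_is_sensitive(text: str) -> float:
    # 1.0 if any category keyword occurs, else 0.0.
    return 1.0 if any(s > 0 for s in toy_topic_scores(text).values()) else 0.0


if __name__ == "__main__":
    doc = "My therapy for depression starts next week."
    print(classify_document(doc, toy_is_sensitive, toy_topic_scores))
    # prints SensitiveCategory.HEALTH
```

Keeping the stages separate lets the detection threshold be tuned independently of the topic classifier; a single multi-class model with an extra "not sensitive" label would be an equally plausible design.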

Similar articles

Giorgio Dell'Immagine, Jacopo Soldani and Antonio Brogi
As microservice-based architectures are increasingly adopted, microservices security has become a crucial aspect to consider for IT businesses. Starting from a set of "security smells" for microservice applications that were recently proposed in the lite...
Journal: Future Internet

Pradeep Kumar Jena, Bonomali Khuntia, Charulata Palai, Manjushree Nayak, Tapas Kumar Mishra and Sachi Nandan Mohanty
Automatic screening of diabetic retinopathy (DR) is a well-identified area of research in the domain of computer vision. It is challenging due to structural complexity and a marginal contrast difference between the retinal vessels and the background of t...

Manar M. F. Donia, Wessam H. El-Behaidy and Aliaa A. A. Youssif
The study of human behaviors aims to gain a deeper perception of stimuli that control decision making. To describe, explain, predict, and control behavior, human behavior can be classified as either non-aggressive or anomalous behavior. Anomalous behavio...

Renata Duraciová
The mutual identification of spatial objects is a fundamental issue when updating geographic data with other data sets. Representations of spatial objects in different sources may not have the same identifiers, which would unambiguously assign them to ea...

Alessandro Nalin, Andrea Simone, Claudio Lantieri, Umberto Rosatella, Giulio Dondi and Valeria Vignali
The need for clear and updated information is pivotal when authorities plan and perform routinary, periodic and emergency maintenance of both road network and their roadside assets, e.g., curbs, signals, and barriers. With particular regard to road barri...
Journal: Infrastructures