JOURNAL
Future Internet

ARTICLE

TITLE

Automatic Detection of Sensitive Data Using Transformer-Based Classifiers

Michael Petrolini, Stefano Cagnoni and Monica Mordonini

Abstract

The General Data Protection Regulation (GDPR) has given EU citizens and residents more control over their personal data, simplifying the regulatory environment for international business and unifying and homogenising privacy legislation within the EU. The regulation applies to all companies that process the data of European residents, regardless of where the data are processed or where the company has its registered office, and it imposes strict data-protection rules. These companies must comply with the GDPR and be aware of the content of the data they manage; this is especially important if they hold sensitive data, that is, any information regarding racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, sex life or sexual orientation, or physical and mental health. Such data are rarely structured and most often appear inside documents such as email messages, reviews or posts. It is therefore extremely difficult for a company to know whether it holds sensitive data, and it risks failing to protect them properly. The goal of the study described in this paper is to use Machine Learning, in particular the Transformer deep-learning model, to develop classifiers capable of detecting documents that are likely to include sensitive data. Additionally, we want the classifiers to recognize the particular type of sensitive topic each document deals with, so that a company has better knowledge of the data it owns. We expect to make the model described in this paper available as a web service, customized to the private data of prospective customers, or even in a free-to-use version based on the freely available data set we have built to train the classifiers.

Keywords

GDPR - sensitive data - personal data - natural language processing - BERT - transformers

ISSUE

Volume 14, Issue 8 (2022)

SUBJECTS

INFRASTRUCTURE

DOI

https://doi.org/10.3390/fi14080228
