Abstract
Recent advances in Generative Adversarial Networks (GANs) pose emerging yet serious privacy risks that threaten the integrity and trustworthiness of digital media, particularly digital video, by synthesizing hyper-realistic images and videos, i.e., DeepFakes. The need to ascertain the trustworthiness of digital media calls for automatic yet accurate DeepFake detection algorithms. This paper presents an attention-based DeepFake detection (ADD) method that exploits the fine-grained and spatially local attributes of artificially synthesized videos for enhanced detection. The ADD framework is composed of two main components, face close-up and face shut-off data augmentation methods, and is applicable to any classifier based on a convolutional neural network architecture. ADD first locates potentially manipulated regions of the input image to extract representative features. Second, the detection model is forced to pay more attention to these forgery regions during decision-making by focusing on the interpretation of the sample in the learning phase. ADD's performance is evaluated on two challenging DeepFake forensics datasets, i.e., Celeb-DF (V2) and WildDeepFake. We demonstrate the generality of ADD by evaluating it with four popular classifiers, namely VGGNet, ResNet, Xception, and MobileNet. The results show that ADD significantly boosts the detection performance of all four baseline classifiers on both benchmark datasets. In particular, ADD with a ResNet backbone achieves over 98.3% detection on Celeb-DF (V2), outperforming state-of-the-art DeepFake detection methods.