Benford's law applied to digital forensic analysis

Fernandes, Pedro; Antunes, Mário

Date

2023-01-30

Abstract

Tampered digital multimedia content is increasingly used in a wide range of cyberattacks, challenging criminal investigations and law enforcement authorities. The motivations are many, ranging from attempts to manipulate public opinion by disseminating fake news to digital kidnapping and ransomware, to mention a few cybercrimes that use this medium as a means of propagation. Digital forensics has recently incorporated a set of computational learning-based tools to automatically detect manipulations in digital multimedia content. Despite the promising results attained by machine learning and deep learning methods, these techniques demand substantial computational resources and make digital forensic analysis and investigation expensive. Applied statistics techniques have also been used to automatically detect anomalies and manipulations in digital multimedia content by statistically analysing its patterns and features. These techniques are computationally faster and have been applied in isolation or as members of a classifier committee to boost overall artefact classification. This paper describes a statistical model based on Benford's Law and the results obtained with a dataset of 18000 photos, of which 9000 are authentic and the remaining 9000 manipulated. Benford's Law dates from the 19th century and has been successfully adopted in digital forensics, namely in fraud detection. In the present investigation, Benford's Law was applied to a set of features (colours, textures) extracted from digital images. The first digit of each extracted value was taken, and the frequency of each digit across the set of values was calculated. The investigation then focused on how the observed frequency of each digit compares with the frequency expected under Benford's Law.
The method proposed in this paper for applying Benford's Law uses Pearson's and Spearman's correlations and the Cramér-von Mises (CVM) fitting model, applied to the first digit of each multi-digit value obtained by extracting features from digital photos with the Fast Fourier Transform (FFT). The overall results, although not exceeding those attained by machine learning approaches, namely Support Vector Machines (SVM) and Convolutional Neural Networks (CNN), are promising, reaching an average F1-score of 90.47% with Pearson correlation. The non-parametric approaches, Spearman correlation and the CVM fitting model, obtained F1-scores of 56.55% and 76.61%, respectively. Furthermore, Pearson's model showed the highest homogeneity compared with the Spearman and CVM models, detecting 8526 manipulated images and 7662 authentic ones, owing to the strong correlation between the observed frequency of each digit and the frequency expected under Benford's Law. The results were obtained with feature sets of different lengths, ranging from 3000 features to all the features available in the digital image. However, the investigation focused on extracting 1000 features, since it was concluded that increasing the number of features did not improve the results. The results obtained with the model based on Benford's Law are competitive with those of the CNN- and SVM-based models, supporting its use as a decision-support tool in criminal investigations for identifying manipulated images.
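
The pipeline described in the abstract (FFT features, first-digit extraction, comparison with Benford's expected frequencies via Pearson correlation) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of FFT magnitudes as features, the choice of the first 1000 coefficients, and the synthetic input image are all assumptions made for illustration. The expected frequencies follow the standard Benford formula P(d) = log10(1 + 1/d) for d = 1..9.

```python
import numpy as np


def benford_expected():
    """Expected first-digit probabilities, P(d) = log10(1 + 1/d) for d = 1..9."""
    d = np.arange(1, 10)
    return np.log10(1.0 + 1.0 / d)


def first_digits(values):
    """Leading decimal digit of every finite, nonzero magnitude in `values`."""
    v = np.abs(np.asarray(values, dtype=float)).ravel()
    v = v[np.isfinite(v) & (v > 0)]
    mantissa = v / 10.0 ** np.floor(np.log10(v))
    # Guard against floating-point rounding at exact powers of ten.
    mantissa = np.where(mantissa >= 10.0, mantissa / 10.0, mantissa)
    mantissa = np.where(mantissa < 1.0, mantissa * 10.0, mantissa)
    return mantissa.astype(int)


def digit_frequencies(digits):
    """Relative frequency of each digit 1..9."""
    counts = np.bincount(digits, minlength=10)[1:10]
    return counts / counts.sum()


def fft_features(image, n_features=1000):
    """Assumed feature extraction: magnitudes of the first 2-D FFT coefficients."""
    spectrum = np.abs(np.fft.fft2(np.asarray(image, dtype=float)))
    return spectrum.ravel()[:n_features]


def benford_pearson(image, n_features=1000):
    """Pearson correlation between observed and Benford first-digit frequencies."""
    observed = digit_frequencies(first_digits(fft_features(image, n_features)))
    return float(np.corrcoef(observed, benford_expected())[0, 1])


# Illustrative run on a synthetic greyscale "image" (not data from the paper).
rng = np.random.default_rng(0)
synthetic_image = rng.integers(0, 256, size=(128, 128))
score = benford_pearson(synthetic_image)
print(f"Pearson correlation with Benford's law: {score:.4f}")
```

In a classifier built on this idea, the correlation would be compared against a decision threshold to label an image as authentic or manipulated; the abstract does not state the threshold used, so none is assumed here.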

URI

https://research.thea.ie/handle/20.500.12065/4802

The following license files are associated with this item:

Creative Commons

Except where otherwise noted, this item's license is described as Attribution 3.0 United States