Benford's law applied to digital forensic analysis
Abstract
Tampered digital multimedia content is increasingly used in a wide range of cyberattacks, challenging criminal investigations and law enforcement authorities. The motivations are varied and
range from attempts to manipulate public opinion by disseminating fake news to digital kidnapping
and ransomware, to mention a few cybercrimes that use this medium as a means of propagation.
Digital forensics has recently incorporated a set of computational learning-based tools to automatically
detect manipulations in digital multimedia content. Despite the promising results attained by machine
learning and deep learning methods, these techniques demand substantial computational resources and
make digital forensic analysis and investigation expensive. Statistical techniques have also been
used to automatically detect anomalies and manipulations in digital multimedia content by analysing its patterns and features. These techniques are computationally faster and have been
applied either in isolation or as members of a classifier committee to improve the overall classification of artefacts.
This paper describes a statistical model based on Benford's Law and the results obtained with a dataset
of 18000 photos, of which 9000 are authentic and the remaining 9000 are manipulated.
Benford's Law dates from the 19th century and has been successfully adopted in digital forensics,
namely in fraud detection. In the present investigation, Benford's law was applied to a set of features
(colours, textures) extracted from digital images. The first digit of each extracted value was taken, and the
relative frequency of each digit across the set of extracted values was calculated. This made it possible
to focus the investigation on how the observed frequency of each digit compares
with the frequency expected by Benford's law.
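The first-digit comparison described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`benford_expected`, `first_digit`, `first_digit_frequencies`) are hypothetical, and the input is assumed to be any list of numeric feature values. The expected frequency of a leading digit d under Benford's law is log10(1 + 1/d).

```python
import math
from collections import Counter


def benford_expected():
    """Expected frequencies of leading digits 1..9 under Benford's law."""
    return [math.log10(1 + 1 / d) for d in range(1, 10)]


def first_digit(x):
    """Leading non-zero digit of x, or None for zero and non-finite values."""
    x = abs(x)
    if x == 0 or not math.isfinite(x):
        return None
    # Scientific notation with 17 significant digits round-trips a float,
    # so the first character is the leading digit.
    return int(f"{x:.16e}"[0])


def first_digit_frequencies(values):
    """Observed relative frequency of each leading digit 1..9 in values."""
    digits = [d for d in map(first_digit, values) if d is not None]
    if not digits:
        raise ValueError("no usable values (all zero or non-finite)")
    counts = Counter(digits)
    return [counts.get(d, 0) / len(digits) for d in range(1, 10)]


# Illustrative usage with made-up values (not data from the paper):
# observed = first_digit_frequencies([123.4, 0.0071, 19.0, 2.5, 1e6])
# expected = benford_expected()
```

Zero values are skipped in this sketch because they have no leading non-zero digit; whether the original study treats them the same way is not stated in the abstract.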
The method proposed in this paper for applying Benford's Law uses Pearson's and Spearman's correlations and the Cramér-von Mises (CVM) fitting model, applied to the first digit of each multi-digit value
obtained by extracting features from digital photos with the Fast Fourier Transform (FFT).
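The three measures named above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the paper's method: the inputs are two lists of nine digit frequencies (observed and Benford-expected), all function names are hypothetical, and `cvm_distance` is one simple discrete Cramér-von Mises-type distance (squared differences between cumulative distributions, weighted by the expected probabilities), which may differ from the formulation used by the authors. FFT feature extraction is assumed to have happened beforehand.

```python
import math


def benford_expected():
    """Expected frequencies of leading digits 1..9 under Benford's law."""
    return [math.log10(1 + 1 / d) for d in range(1, 10)]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def average_ranks(xs):
    """1-based ranks of xs, with tied values sharing their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def cvm_distance(observed, expected):
    """Assumed discrete CvM-type distance between two digit distributions."""
    total, cum_obs, cum_exp = 0.0, 0.0, 0.0
    for o, e in zip(observed, expected):
        cum_obs += o
        cum_exp += e
        total += (cum_obs - cum_exp) ** 2 * e
    return total
```

A high Pearson or Spearman value, or a small CvM-type distance, indicates that the observed digit frequencies follow Benford's law closely. Any decision threshold separating authentic from manipulated images would have to be chosen empirically; none is given in the abstract.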
Although the overall results do not exceed those attained by machine learning approaches,
namely Support Vector Machines (SVM) and Convolutional Neural Networks (CNN), they are promising, reaching
an average F1-score of 90.47% with Pearson's correlation. The non-parametric approaches, namely
Spearman's correlation and the CVM fitting model, obtained F1-scores of 56.55% and 76.61%, respectively. Furthermore, the Pearson model was more homogeneous than the Spearman
and CVM models in detecting manipulated images (8526) and authentic ones (7662), owing to the strong
correlation between the observed frequency of each digit and the frequency expected by Benford's law.
The results were obtained with feature sets of different sizes, ranging from 3000 features to the totality
of the features available in the digital image. However, the investigation focused on extracting 1000
features, since it was concluded that increasing the number of features did not improve the results.
The results obtained with the model based on Benford's Law are competitive with those obtained from the
models based on CNN and SVM, which supports its use as a decision-support tool in
criminal investigations for identifying manipulated images.