Document forgery detection is becoming increasingly important
because forgery tools are now accessible even to inexperienced
users. Source printer identification determines which printer
produced a questioned document by classifying the document into
one of a set of known printer classes. To our knowledge, most
earlier studies segmented documents into characters, words, or
patches, or cropped them, to obtain large datasets. In contrast,
this paper works with whole documents and small datasets.
This paper proposes three CNN-based techniques that identify a
document's source printer without segmenting it into characters,
words, or patches, and that work with small datasets. Three
separate datasets of 1185, 1200, and 2385 documents are used to
evaluate the proposed techniques. In the first technique, 13
pre-trained CNNs are tested as feature extractors only, with an
SVM performing the classification. In the second technique, a
pre-trained neural network is retrained using transfer learning
and then performs both feature extraction and classification. In
the third technique, a CNN is trained from scratch and used for
feature extraction, with an SVM for classification. Extensive
experiments with all three techniques show that the third
technique gives the best results.
This technique achieves accuracies of 99.16%, 99.58%, and 98.3%
on datasets 1, 2, and 3, respectively. The three techniques are
also compared with several previously published methods, and the
third technique outperforms them.
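The pipeline shared by the first and third techniques (a CNN used only as a feature extractor, followed by an SVM classifier) can be sketched in Python with NumPy. This is a minimal illustration under several assumptions, none of which come from this paper: three fixed, hand-set 3x3 filters stand in for the convolutional layers of a pre-trained or from-scratch CNN; global average pooling is assumed as the way a whole page of arbitrary size is reduced to a fixed-length feature vector; the linear one-vs-rest SVM is trained by plain subgradient descent on the hinge loss rather than with any particular SVM library; and the "printers" are synthetic pages with invented banding and noise artifacts. All function names (`extract_features`, `train_linear_svm`, `synthetic_page`, and so on) are hypothetical.

```python
import numpy as np

# Sketch only: hand-set filters stand in for CNN layers (hypothetical;
# a real system would use learned or pre-trained weights).
FILTERS = np.array([
    [[-1, -1, -1], [0, 0, 0], [1, 1, 1]],  # horizontal edges / banding
    [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]],  # vertical edges / banding
    [[0, 1, 0], [1, -4, 1], [0, 1, 0]],    # Laplacian: fine grain / toner noise
], dtype=float)


def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation of a grayscale image with one kernel."""
    windows = np.lib.stride_tricks.sliding_window_view(image, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)


def extract_features(page):
    """Conv -> ReLU -> global average pooling, one value per filter.

    Global pooling (an assumption) gives a fixed-length vector for a page
    of any size, so the whole document is used without cutting it up.
    """
    return np.array([np.maximum(conv2d_valid(page, k), 0.0).mean() for k in FILTERS])


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Binary linear SVM (y in {-1, +1}): full-batch subgradient descent on
    lam/2 * ||w||^2 + mean(max(0, 1 - y * (X @ w + b)))."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        active = y * (X @ w + b) < 1  # samples inside the margin or misclassified
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def fit_one_vs_rest(X, labels):
    """Train one binary SVM per printer class."""
    return {c: train_linear_svm(X, np.where(labels == c, 1.0, -1.0))
            for c in np.unique(labels)}


def predict_one_vs_rest(models, X):
    """Pick the class whose SVM gives the highest decision score."""
    classes = np.array(list(models))
    scores = np.stack([X @ w + b for w, b in models.values()], axis=1)
    return classes[scores.argmax(axis=1)]


def synthetic_page(printer, rng):
    """Toy 'scanned page' with a made-up artifact per printer (illustration only)."""
    height, width = rng.integers(30, 50, size=2)
    page = rng.normal(0.0, 0.05, size=(height, width))
    if printer == 0:
        page[::4, :] += 1.0  # horizontal banding
    elif printer == 1:
        page[:, ::4] += 1.0  # vertical banding
    else:
        page += rng.normal(0.0, 0.5, size=(height, width))  # grainy output
    return page


def make_dataset(n_per_printer, rng):
    labels = np.repeat(np.arange(3), n_per_printer)
    X = np.array([extract_features(synthetic_page(p, rng)) for p in labels])
    return X, labels


rng = np.random.default_rng(0)
X_train, y_train = make_dataset(20, rng)
X_test, y_test = make_dataset(10, rng)

# Standardize with training statistics, as is usual before fitting an SVM.
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
models = fit_one_vs_rest((X_train - mu) / sigma, y_train)
predictions = predict_one_vs_rest(models, (X_test - mu) / sigma)
accuracy = (predictions == y_test).mean()
print(f"test accuracy: {accuracy:.2f}")
```

Because the features are pooled over the entire page, pages of different sizes map to vectors of the same length, which is one way to avoid segmenting a document into characters, words, or patches. In a real system the hand-set filters would be replaced by the activations of a trained network, and the hand-rolled SVM by a standard implementation.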