Comparison of CNN, ResNet50, and Xception for Deepfake Image Detection
Keywords:
Deepfake, Convolutional Neural Network (CNN), ResNet50, Xception, Transfer Learning

Abstract
This study compares the performance of three deep learning architectures, a Convolutional Neural Network (CNN), ResNet50, and Xception, for frame-based deepfake image detection and identifies the most effective model in terms of accuracy, precision, recall, F1-score, and generalization. The study followed the Knowledge Discovery in Databases (KDD) framework using the Deepfake Detection Dataset (DFD Entire Original) from Kaggle, which consists of 3,432 videos: 3,068 fake and 364 real. Videos were converted into frames with OpenCV, followed by face detection and cropping with MTCNN. The resulting face images were resized to 224×224 pixels, normalized, augmented, and labeled. To reduce classification bias caused by class imbalance, the training data were balanced with random undersampling, yielding equal numbers of real and fake frames. The dataset was then split into training, validation, and testing sets using a stratified 60:20:20 ratio. The results show that Xception achieved the best performance of the three models, with an accuracy of 95.21%, a precision of 0.95, a recall of 0.95, and an F1-score of 0.95, followed by ResNet50 with an accuracy of 93.42% and the CNN with an accuracy of 87.65%. These findings indicate that transfer learning-based architectures, particularly Xception, are more effective than conventional CNNs for deepfake image detection under a consistent experimental setting. Because this study is limited to a single dataset and frame-based evaluation, future work will explore hybrid models, such as a Vision Transformer (ViT) combined with Capsule Networks, to improve detection performance and to address challenges such as temporal analysis and cross-dataset validation.
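For readers who wish to reproduce the preprocessing step, the following is a minimal sketch of the frame extraction and face cropping described in the abstract, assuming the opencv-python and mtcnn packages; the function name extract_face_frames and the sampling interval every_n are illustrative choices, not details taken from the paper.

# Hypothetical sketch: sample frames with OpenCV, crop faces with MTCNN,
# resize to 224x224, and scale pixel values to [0, 1].
import cv2
from mtcnn import MTCNN

detector = MTCNN()

def extract_face_frames(video_path, every_n=30, size=(224, 224)):
    """Sample every every_n-th frame, detect a face, crop, resize, normalize."""
    cap = cv2.VideoCapture(video_path)
    faces, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # MTCNN expects RGB
            detections = detector.detect_faces(rgb)
            if detections:
                x, y, w, h = detections[0]["box"]  # take the first detected face
                x, y = max(x, 0), max(y, 0)
                crop = cv2.resize(rgb[y:y + h, x:x + w], size)
                faces.append(crop / 255.0)  # normalize to [0, 1]
        idx += 1
    cap.release()
    return faces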
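The class balancing and stratified 60:20:20 split could be implemented as below; this is a sketch under the assumption that scikit-learn is used, with X as the array of face crops and y as 0/1 labels, and the helper names undersample and split_60_20_20 are hypothetical.

# Hedged sketch: random undersampling of the majority class, then a
# stratified 60:20:20 train/validation/test split.
import numpy as np
from sklearn.model_selection import train_test_split

def undersample(X, y, seed=42):
    """Randomly drop majority-class samples until both classes are equal."""
    rng = np.random.default_rng(seed)
    idx0, idx1 = np.where(y == 0)[0], np.where(y == 1)[0]
    n = min(len(idx0), len(idx1))
    keep = np.concatenate([rng.choice(idx0, n, replace=False),
                           rng.choice(idx1, n, replace=False)])
    rng.shuffle(keep)
    return X[keep], y[keep]

def split_60_20_20(X, y, seed=42):
    """Stratified split: 60% train, then the remaining 40% halved."""
    X_train, X_tmp, y_train, y_tmp = train_test_split(
        X, y, test_size=0.4, stratify=y, random_state=seed)
    X_val, X_test, y_val, y_test = train_test_split(
        X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=seed)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)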
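Finally, a minimal sketch of the Xception transfer-learning setup the abstract describes, assuming TensorFlow/Keras with ImageNet weights; the classification head, dropout rate, and optimizer settings here are illustrative, not the authors' exact configuration.

# Hedged sketch: frozen ImageNet-pretrained Xception backbone with a small
# binary classification head for real-vs-fake face crops.
import tensorflow as tf
from tensorflow.keras import layers, models

base = tf.keras.applications.Xception(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze pretrained features; optionally fine-tune later

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.3),                     # illustrative regularization
    layers.Dense(1, activation="sigmoid"),   # real vs. fake
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy",
                       tf.keras.metrics.Precision(),
                       tf.keras.metrics.Recall()])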
License
Copyright (c) 2026 Rachmat, Mohammad Zainuddin, Handini Arga Damar Rani

This work is licensed under a Creative Commons Attribution 4.0 International License.