Multimodal Deepfake Detection Using Transformer-Based Large Language Models: A Path Toward Secure Media and Clinical Integrity

Sanjida Akter Tisha; Jannatul Ferdous Shorna; Sadia Afrin; Eftekhar Hossain Ayon

doi:10.37547/tajet/Volume07Issue05-16

Engineering and Technology | Open Access | DOI: https://doi.org/10.37547/tajet/Volume07Issue05-16

Sanjida Akter Tisha, Master of Science in Information Technology, Washington University of Science and Technology, USA
Jannatul Ferdous Shorna, College of Engineering and Computer Science, Florida Atlantic University, Boca Raton, Florida
Sadia Afrin, Department of Computer & Information Science, Gannon University, USA
Eftekhar Hossain Ayon, Department of Computer & Info Science, Gannon University, Erie, Pennsylvania, USA

Published Date 2025-05-24

Pages 169-177

Abstract

Deepfakes pose a significant threat across various domains by generating highly realistic manipulated audio-visual content, with critical implications for security and clinical environments. This paper presents a robust multimodal deepfake detection framework powered by transformer-based large language models (LLMs) that effectively analyze and integrate visual, auditory, and textual modalities. Utilizing the FakeAVCeleb dataset, we compare our proposed model with traditional machine learning and deep learning methods, including Logistic Regression, Support Vector Machine (SVM), Random Forest, and Long Short-Term Memory (LSTM) networks. Experimental results demonstrate that the transformer-based model significantly outperforms others, achieving an accuracy of 96.55%, precision of 96.47%, recall of 96.50%, F1-score of 96.48%, and an AUC of 0.97. This enhanced performance is attributed to the model’s ability to capture complex semantic and temporal dependencies across modalities. The findings suggest the proposed model’s strong potential for real-world applications such as telemedicine, clinical video authentication, and digital identity verification, establishing a promising direction for deploying deepfake detection technologies in sensitive and high-stakes environments.
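
As a quick arithmetic check on the figures reported above, the F1-score can be recomputed as the harmonic mean of precision and recall. This is only a sketch: the function name `f1_score` is hypothetical, and it assumes the reported F1 was derived from the same aggregate precision and recall values; the paper may use a different averaging scheme (for example, per-class macro averaging), in which case small differences would be expected.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (result is in the inputs' units)."""
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when both inputs are 0
    return 2 * precision * recall / (precision + recall)


# Values taken from the abstract, in percent.
reported_precision = 96.47
reported_recall = 96.50

f1 = f1_score(reported_precision, reported_recall)
print(f"Recomputed F1: {f1:.2f}%")
```

Under that assumption, the recomputed value agrees with the reported F1-score of 96.48% to two decimal places.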

Keywords

Deepfake detection, multimodal fusion, transformer models, large language models, FakeAVCeleb dataset, telemedicine security, artificial intelligence, audio-visual manipulation, clinical data integrity, digital forensics

Copyright License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Authors retain the copyright of their manuscripts, and all Open Access articles are disseminated under the terms of the Creative Commons Attribution License 4.0 (CC-BY), which licenses unrestricted use, distribution, and reproduction in any medium, provided that the original work is appropriately cited. The use of general descriptive names, trade names, trademarks, and so forth in this publication, even if not specifically identified, does not imply that these names are not protected by the relevant laws and regulations.

How to Cite

Sanjida Akter Tisha, Jannatul Ferdous Shorna, Sadia Afrin, & Eftekhar Hossain Ayon. (2025). Multimodal Deepfake Detection Using Transformer-Based Large Language Models: A Path Toward Secure Media and Clinical Integrity. The American Journal of Engineering and Technology, 7(05), 169–177. https://doi.org/10.37547/tajet/Volume07Issue05-16
