Multimodal Deepfake Detection Using Transformer-Based Large Language Models: A Path Toward Secure Media and Clinical Integrity
Kutub Thakur, Department of Professional Security Studies, New Jersey City University, Jersey City, New Jersey, USA
Md Abu Sayed, Department of Professional Security Studies, New Jersey City University, Jersey City, New Jersey, USA
Sanjida Akter Tisha, Master of Science in Information Technology, Washington University of Science and Technology, USA
Md Khorshed Alam, Department of Professional Security Studies, New Jersey City University, Jersey City, New Jersey, USA
Md Tarek Hasan, Department of Professional Security Studies, New Jersey City University, Jersey City, New Jersey, USA
Jannatul Ferdous Shorna, College of Engineering and Computer Science, Florida Atlantic University, Boca Raton, Florida
Sadia Afrin, Department of Computer & Information Science, Gannon University, USA
Md Zahin Hossain George, Department of Professional Security Studies, New Jersey City University, Jersey City, New Jersey, USA
Eftekhar Hossain Ayon, Department of Computer & Info Science, Gannon University, Erie, Pennsylvania, USA
Abstract
Deepfakes, highly realistic manipulated audio-visual content, pose a significant threat across many domains, with critical implications for security and clinical environments. This paper presents a multimodal deepfake detection framework built on transformer-based large language models (LLMs) that analyzes and integrates visual, auditory, and textual modalities. Using the FakeAVCeleb dataset, we compare the proposed model with traditional machine learning and deep learning methods, including Logistic Regression, Support Vector Machine (SVM), Random Forest, and Long Short-Term Memory (LSTM) networks. In our experiments the transformer-based model outperforms the other models, achieving an accuracy of 96.55%, precision of 96.47%, recall of 96.50%, F1-score of 96.48%, and an AUC of 0.97. We attribute this performance to the model's ability to capture complex semantic and temporal dependencies across modalities. The findings suggest that the proposed model has strong potential for real-world applications such as telemedicine, clinical video authentication, and digital identity verification, and they point to a promising direction for deploying deepfake detection in sensitive, high-stakes environments.
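As a reading aid for the metrics quoted above, the following is a minimal Python sketch of how accuracy, precision, recall, and F1-score are conventionally derived from binary confusion-matrix counts. The function names `f1_score` and `binary_metrics` are illustrative and not taken from the paper, and the abstract does not say whether the reported figures are per-class or averaged, so this is a sketch of the standard definitions rather than the authors' evaluation code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives for the "fake" class (an assumed convention).
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


# Consistency check on the precision and recall quoted in the abstract.
reported_f1 = f1_score(0.9647, 0.9650)
print(f"{reported_f1:.4f}")  # prints 0.9648
```

Under these definitions, the harmonic mean of the reported precision (96.47%) and recall (96.50%) rounds to 96.48%, which agrees with the reported F1-score.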
Keywords
Deepfake detection, multimodal fusion, transformer models, large language models, FakeAVCeleb dataset, telemedicine security, artificial intelligence, audio-visual manipulation, clinical data integrity, digital forensics
Copyright License
Copyright (c) 2025 Kutub Thakur, Md Abu Sayed, Sanjida Akter Tisha, Md Khorshed Alam, Md Tarek Hasan, Jannatul Ferdous Shorna, Sadia Afrin, Md Zahin Hossain George, Eftekhar Hossain Ayon

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors retain the copyright of their manuscripts, and all Open Access articles are disseminated under the terms of the Creative Commons Attribution License 4.0 (CC-BY), which licenses unrestricted use, distribution, and reproduction in any medium, provided that the original work is appropriately cited. The use of general descriptive names, trade names, trademarks, and so forth in this publication, even if not specifically identified, does not imply that these names are not protected by the relevant laws and regulations.