Privacy-Preserving Processing of User Messages for LLM Services: Anonymization Methods, PII Leakage Assessment, and the “Confidentiality–Answer Quality” Trade-off
Andrei Shcherbinin, Social Discovery Group, Team Lead ML Engineer, Tbilisi, Georgia
Abstract
This study provides a comprehensive analysis of architectural and algorithmic approaches to maintaining data confidentiality in the operation of large language models. Particular attention is paid to identifying and protecting personally identifiable information (PII) during continuous user interaction with intelligent systems. Evidence on security breaches from recent years is systematized, showing a sharp increase in incidents involving information leakage through generative models. The work examines in detail the hybrid RECAP methodology, which combines deterministic algorithms with context-dependent prompts, as well as approaches based on differential privacy and machine unlearning. The analysis also addresses the trade-off between the level of protection and the quality of generated answers, including the impact of anonymization on factual accuracy and on model capabilities often described as cognitive. Based on the findings, recommendations are formulated for introducing adaptive routing strategies and multi-stage data cleansing into contemporary MLOps cycles.
Keywords
large language models, PII anonymization, differential privacy, machine learning, information security, ROC AUC, message anonymization, confidentiality–quality trade-off, RECAP, personal data protection.
Copyright License
Copyright (c) 2026 Andrei Shcherbinin

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors retain the copyright of their manuscripts, and all Open Access articles are disseminated under the terms of the Creative Commons Attribution License 4.0 (CC-BY), which licenses unrestricted use, distribution, and reproduction in any medium, provided that the original work is appropriately cited. The use of general descriptive names, trade names, trademarks, and so forth in this publication, even if not specifically identified, does not imply that these names are not protected by the relevant laws and regulations.
