Use of User Feedback for Adaptive Model Tuning
Nisarg B Shah, Product Manager | AI/ML Product Development, Seattle, USA
Abstract
This paper outlines a path toward adaptive fine-tuning of large-scale language models through continual learning from user signals. The study organizes an approach for collecting feedback over both explicit and implicit channels, and for filtering, interpreting, and integrating the heterogeneous feedback within each channel into a single recurring tuning cycle that keeps the model current and of high quality in real-world use. The motivation is twofold: static model parameterizations become outdated quickly, and the observational limits of the classic offline retraining process lead to drops in answer accuracy and user trust. The novelty lies in unifying three classes of feedback into a multi-objective loss function with dynamically adjusted weights, implemented through a microservice architecture that logs, stream-filters, anonymizes, and annotates data, and then trains the model in several stages: supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF), a contextual bandit, and rolling A/B tests with confidence bounds. After several iterations of SFT and RLHF, the live model consistently outperforms all static baselines by a clear margin on human preference; at the same time, the contextual bandit reduces average regret in online operation, and the system scales to billions of queries without loss of metadata integrity or update flexibility. Key challenges are identified: catastrophic forgetting of rare skills, preference bias toward narrow user groups, privacy risks in processing live data, and high manual annotation costs, for which regularization, stratified sampling, differential privacy, and active learning with self-evaluation are proposed as mitigations. The article is intended for researchers and architects of natural language, machine learning, and recommendation systems.
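To make the multi-objective formulation concrete, the snippet below is a minimal sketch, not the paper's implementation, of combining losses from three feedback classes with dynamic weights. The channel names, the softmax-based weighting rule, and the function combined_feedback_loss are illustrative assumptions introduced here.

```python
# Minimal sketch (assumed, not the paper's code) of a multi-objective loss
# that mixes three feedback channels with dynamic, learnable weights.
import torch

def combined_feedback_loss(losses: dict, channel_logits: torch.Tensor) -> torch.Tensor:
    """Weight per-channel losses by dynamic weights derived from learnable logits."""
    weights = torch.softmax(channel_logits, dim=0)        # dynamic weights, sum to 1
    stacked = torch.stack([losses[k] for k in sorted(losses)])
    return (weights * stacked).sum()

# Example usage with dummy per-channel loss values.
losses = {
    "annotation": torch.tensor(0.5),  # e.g. SFT loss on curated labels
    "explicit": torch.tensor(0.8),    # e.g. thumbs-up/down preference loss
    "implicit": torch.tensor(1.2),    # e.g. dwell-time / regeneration signal loss
}
channel_logits = torch.nn.Parameter(torch.zeros(3))       # updated during training
total = combined_feedback_loss(losses, channel_logits)
```

In this sketch the weights are learned jointly with the model; in practice they could equally be scheduled or set from validation metrics per tuning cycle.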
Keywords
Adaptive fine-tuning, user feedback, continual learning, large language models, multi-stage retraining
Copyright License
Copyright (c) 2025 Nisarg B Shah

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors retain the copyright of their manuscripts, and all Open Access articles are disseminated under the terms of the Creative Commons Attribution License 4.0 (CC-BY), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The use of general descriptive names, trade names, trademarks, and so forth in this publication, even if not specifically identified, does not imply that these names are not protected by the relevant laws and regulations.