Articles | Open Access | DOI: https://doi.org/10.37547/tajiir/Volume07Issue09-11

Use Of User Feedback for Adaptive Model Tuning

Nisarg B Shah, Product Manager | AI/ML Product Development, Seattle, USA

Abstract

This paper examines a path toward adaptive fine-tuning of large language models through continual learning from user signals. The study organizes an approach that spans explicit and implicit feedback channels: heterogeneous feedback within each channel is filtered, interpreted, and integrated into a single recurring tuning cycle that keeps the model current and maintains answer quality under real-time usage. The motivation is twofold: static model parameterizations become outdated quickly, while classical offline retraining, limited to delayed observation of user behavior, leads to measurable drops in answer accuracy and user trust. The novelty of the approach lies in unifying three classes of feedback into a multi-objective loss function with dynamically adjusted weights, implemented through a microservice architecture that logs, stream-filters, anonymizes, and annotates data and then trains the model in several stages: supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF), a contextual bandit, and rolling A/B tests with confidence bounds. After several iterations of SFT and RLHF, the live model consistently outperforms all static baselines by a substantial margin in terms of human preference; at the same time, the contextual bandit reduces average regret in the online setting, and the pipeline scales to billions of queries without loss of metadata integrity or update flexibility. Key challenges are identified: catastrophic forgetting of rare skills, preference bias toward narrow user groups, privacy risks when processing live data, and high manual annotation costs. Regularization, stratified sampling, differential privacy, and active learning with model self-evaluation are proposed as mitigations. The article is intended for researchers and architects of natural language processing, machine learning, and recommendation systems.
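As a purely illustrative sketch of the idea behind the multi-objective loss, the snippet below combines three per-channel objectives (explicit ratings, implicit signals, annotator labels) into one weighted loss whose weights are recomputed at every step. The weighting scheme (an inverse-magnitude softmax), the function names, and the dummy losses are assumptions made for this example, not the implementation reported in the paper.

```python
# Minimal sketch of a multi-objective loss with dynamic weights over three
# feedback channels. Names and the weighting rule are hypothetical, chosen
# only to illustrate the concept described in the abstract.
import torch


def update_weights(losses, temperature=1.0):
    """Re-weight the objectives inversely to their current magnitude
    (softmax-normalized) so no single feedback channel dominates a step."""
    with torch.no_grad():
        inv = torch.tensor([1.0 / (l.item() + 1e-8) for l in losses])
        return torch.softmax(inv / temperature, dim=0)


def combined_loss(explicit_loss, implicit_loss, annotation_loss):
    """Weighted sum of the three per-channel objectives; the weights are
    recomputed from the current loss values before each update."""
    losses = [explicit_loss, implicit_loss, annotation_loss]
    weights = update_weights(losses)
    return sum(w * l for w, l in zip(weights, losses))


# Dummy scalar losses standing in for the real per-channel objectives.
l_explicit = torch.tensor(0.9, requires_grad=True)
l_implicit = torch.tensor(2.3, requires_grad=True)
l_annotation = torch.tensor(0.4, requires_grad=True)

total = combined_loss(l_explicit, l_implicit, l_annotation)
total.backward()  # gradients flow back through all three channel losses
print(float(total))
```

In a real tuning cycle the three inputs would be the batch losses computed from rated responses, logged interaction signals, and annotated examples respectively; the dynamic re-weighting is one simple way to keep the channels balanced as their scales drift between iterations.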

Keywords

Adaptive fine-tuning, user feedback, continual learning, large language models, multi-stage retraining



How to Cite

Nisarg B Shah. (2025). Use Of User Feedback for Adaptive Model Tuning. The American Journal of Interdisciplinary Innovations and Research, 7(09), 108–115. https://doi.org/10.37547/tajiir/Volume07Issue09-11