Leveraging LLMs in recommendation systems
Nisarg B Shah, Product Manager | AI/ML Product Development, Seattle, USA

Abstract
This paper examines how large language models (LLMs) are being integrated into recommender systems. It carries out a broad evaluation of LLM embeddings and generative mechanisms along three axes: ranking accuracy, inference latency, and robustness to the cold-start problem. The economic stakes of such applications are substantial: recommendation algorithms initiate over 80% of viewing hours on Netflix and roughly 35% of purchases on Amazon. The study's novelty lies in a unified vector-space setup that converts heterogeneous signals (textual descriptions, identifiers, and multilingual metadata) into a common representation, enabling a comparative analysis of classical and LLM-oriented schemes on Recall@k, nDCG, and inference-latency metrics. The study also systematizes hybrid architectures, ranging from feature-enrichment pipelines to fully agent-based solutions. Compressed Vocabulary Expansion techniques, combined with zero-shot ranking using GPT-4 in the loop, yield a dramatic gain in recommendation accuracy at orders of magnitude lower latency. Key takeaways: LLMs reduce manual feature engineering, improve ranking accuracy by up to 62%, remain stable in multilingual and low-data scenarios, and their generative and agent components enable conversational interfaces and multi-step service orchestration. Hybrid solutions offer an optimal trade-off between recommendation quality and computational cost in industrial deployment. This article will be useful to machine-learning researchers and practitioners, recommender-system developers, and personalization-service architects.
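As a concrete illustration of the evaluation protocol named in the abstract, the following minimal Python sketch computes Recall@k and binary-relevance nDCG@k for a single user; the item IDs are toy values, not data from the study.

import math

def recall_at_k(ranked_items, relevant_items, k):
    # Fraction of all relevant items that appear in the top-k of the ranking.
    hits = sum(1 for item in ranked_items[:k] if item in relevant_items)
    return hits / len(relevant_items) if relevant_items else 0.0

def ndcg_at_k(ranked_items, relevant_items, k):
    # DCG of the produced ranking divided by the DCG of an ideal ranking
    # that places all relevant items first (binary relevance).
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(ranked_items[:k])
              if item in relevant_items)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant_items), k)))
    return dcg / idcg if idcg > 0 else 0.0

# Toy example: a ranker returns five item IDs; two items are actually relevant.
ranking = ["i3", "i7", "i1", "i9", "i4"]
relevant = {"i1", "i2"}
print(recall_at_k(ranking, relevant, k=5))  # 0.5  -- only i1 was retrieved
print(ndcg_at_k(ranking, relevant, k=5))    # ~0.31 -- i1 sits at rank 3, so its gain is discounted

In a full evaluation these per-user scores are averaged over the test set, while inference latency is measured separately as wall-clock time per ranked request.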
Keywords
large language models, recommender systems, cold start, semantic vectors, generative recommendations
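The unified vector-space setup described in the abstract can be sketched briefly: heterogeneous item signals (identifiers, textual descriptions, multilingual metadata) are serialized into one string, embedded with a multilingual encoder, and matched to a textual user profile by cosine similarity, which also covers the cold-start case where no interaction history exists. The encoder choice, field layout, and toy catalog below are illustrative assumptions, not the exact pipeline evaluated in the paper.

import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

# Any multilingual sentence encoder works here; this model name is an assumption.
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

def item_to_text(item):
    # Serialize heterogeneous fields into a single string the encoder can read.
    return f"id: {item['id']} | title: {item['title']} | tags: {', '.join(item['tags'])}"

items = [
    {"id": "m101", "title": "Deep-space documentary", "tags": ["science", "Weltall"]},
    {"id": "m102", "title": "Romantic comedy", "tags": ["humour", "amour"]},
]
item_vecs = model.encode([item_to_text(i) for i in items], normalize_embeddings=True)

# A cold-start user can be represented by a plain-text interest description.
user_vec = model.encode(["enjoys astronomy and popular-science shows"],
                        normalize_embeddings=True)[0]

scores = item_vecs @ user_vec   # dot product of unit vectors = cosine similarity
ranking = np.argsort(-scores)   # highest similarity first
print([items[i]["id"] for i in ranking])  # expected: ["m101", "m102"]

In a hybrid deployment this retrieval stage would feed its top candidates to a heavier generative re-ranker, keeping latency bounded while preserving most of the quality gain.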
References
Choudhary, V. (2025). AI dominance in e-commerce has a new focus: agentic checkout technology. Retail Brew. https://www.retailbrew.com/stories/2025/07/30/ai-dominance-in-e-commerce-has-a-new-focus-agentic-checkout-technology
Cui, Y., Liu, F., Wang, P., Wang, B., Tang, H., Wan, Y., Wang, J., & Chen, J. (2024). Distillation Matters: Empowering Sequential Recommenders to Match the Performance of Large Language Model. arXiv. https://doi.org/10.1145/3640457.3688118
Hou, Y., Zhang, J., Lin, Z., Lu, H., Xie, R., & McAuley, J. (2023). Large Language Models are Zero-Shot Rankers for Recommender Systems. arXiv. https://doi.org/10.48550/arxiv.2305.08845
Iana, A., Glavaš, G., & Paulheim, H. (2024). MIND Your Language: A Multilingual Dataset for Cross-lingual News Recommendation. arXiv. https://doi.org/10.48550/arxiv.2403.17876
Krysik, A. (2024, June 14). Netflix Algorithm: How Netflix Uses AI to Improve Personalization. Stratoflow. https://stratoflow.com/how-netflix-recommendation-algorithm-work/
Liang, Y., Yang, L., Wang, C., Xu, X., Yu, P. S., & Shu, K. (2024). Taxonomy-Guided Zero-Shot Recommendations with LLMs. arXiv. https://arxiv.org/abs/2406.14043
Liu, Q., Zhao, X., Wang, Y., Wang, Y., Zhang, Z., Sun, Y., Li, X., Wang, M., Jia, P., Chen, C., Huang, W., & Tian, F. (2024). Large Language Model Enhanced Recommender Systems: Taxonomy, Trend, Application and Future. arXiv. https://doi.org/10.48550/arxiv.2412.13432
Liu, W., Du, Z., Zhao, H., Zhang, W., Zhao, X., Wang, G., Dong, Z., & Xu, J. (2025). Inference Computation Scaling for Feature Augmentation in Recommendation Systems. arXiv. https://arxiv.org/abs/2502.16040
McLymore, A., & Bensinger, G. (2024, February 5). When Amazon’s new AI tool answers shoppers’ queries, who benefits? Reuters. https://www.reuters.com/technology/when-amazons-new-ai-tool-answers-shoppers-queries-who-benefits-2024-02-05/
New America. (2025). Why Am I Seeing This? New America. https://www.newamerica.org/oti/reports/why-am-i-seeing-this/case-study-amazon/
Shehmir, S., & Kashef, R. (2025). LLM4Rec: A Comprehensive Survey on the Integration of Large Language Models in Recommender Systems—Approaches, Applications and Challenges. Future Internet, 17(6), 252. https://doi.org/10.3390/fi17060252
Wang, Y., Jiang, Z., Chen, Z., Yang, F., Zhou, Y., Cho, E., Fan, X., Huang, X., Lu, Y., & Yang, Y. (2023, August 28). RecMind: Large Language Model Powered Agent For Recommendation. arXiv. https://doi.org/10.48550/arXiv.2308.14296
Yang, W., Zhang, W., Liu, Y., Han, Y., Wang, Y., Lee, J., & Yu, P. S. (2025). Cold-Start Recommendation with Knowledge-Guided Retrieval-Augmented Generation. arXiv. https://arxiv.org/abs/2505.20773
Yun, S., & Lim, Y. (2025). User Experience with LLM-powered Conversational Recommendation Systems: A Case of Music Recommendation. Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, 1–15. https://doi.org/10.1145/3706598.3713347
Zhang, H., Zhang, T., Yin, J., Gal, O., Shrivastava, A., & Braverman, V. (2025). CoVE: Compressed Vocabulary Expansion Makes Better LLM-based Recommender Systems. arXiv. https://arxiv.org/abs/2506.19993