Engineering and Technology | Open Access | DOI: https://doi.org/10.37547/tajet/Volume08Issue03-09

Bounded Determinism: A Framework for Analyzing the Operational, Functional, And Architectural Limits of LLM Inference

Serhii Melnyk, Senior Lead Software Engineer, NC, USA

Abstract

A recent paper from Thinking Machines Lab (TML), "Defeating Nondeterminism in LLM Inference," offers a new perspective on the prevalence of nondeterministic outputs in Large Language Models (LLMs) configured for deterministic behavior.[1] Such nondeterminism undermines reliability, complicates testing, and hinders scientific reproducibility, with studies reporting accuracy variations of up to 15% across identical runs.[2] The primary contribution of this paper is an analysis of the TML findings through a novel three-part framework that categorizes the boundaries of any determinism solution as: (1) an operational boundary (reproducibility is local to a specific hardware/software stack); (2) a functional boundary (it applies only to greedy decoding, not generative sampling); and (3) an architectural boundary (it does not solve nondeterminism in distributed, multi-GPU systems). The analysis argues that the TML work offers a critical engineering trade-off for reproducibility rather than a complete solution to nondeterminism. Situating the TML work within the proposed framework clarifies what is practically achievable versus what is fundamentally impossible in the pursuit of deterministic AI.
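As a toy illustration of why "deterministic" settings can still diverge, the sketch below sums the same three terms in two different orders and feeds the result to a greedy (argmax) decoder. It is not taken from the TML paper: the function names, the two-token vocabulary, and the near-tied logit values are all assumptions chosen to make the effect visible in plain Python. It also shows that seeded sampling repeats only when the seed and the weights are identical.

```python
import random


def greedy_token(logits):
    """Greedy decoding: index of the largest logit (ties go to the lowest index)."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sum_left_to_right(terms):
    """Accumulate terms in the order given."""
    total = 0.0
    for t in terms:
        total += t
    return total


def sum_right_to_left(terms):
    """Accumulate the same terms in reverse order."""
    total = 0.0
    for t in reversed(terms):
        total += t
    return total


def sample_token(weights, seed):
    """Sampling-based decoding with an explicit seed (hypothetical helper)."""
    rng = random.Random(seed)
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


# Hypothetical partial products that a kernel might reduce in either order.
terms = [0.1, 0.2, 0.3]

# Token 0 has a fixed logit; token 1's logit depends on the reduction order.
logits_a = [0.6, sum_left_to_right(terms)]
logits_b = [0.6, sum_right_to_left(terms)]

print(sum_left_to_right(terms) == sum_right_to_left(terms))  # False
print(greedy_token(logits_a), greedy_token(logits_b))        # 1 0

# Sampling is repeatable only for an identical seed and identical weights.
print(sample_token([0.5, 0.5], seed=7) == sample_token([0.5, 0.5], seed=7))  # True
```

The two sums differ only in the last bit, yet the greedy choice flips because the logits are nearly tied. Real models rarely sit this close to a tie, but over many tokens a single flip is enough to change the rest of the output.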

Keywords

Nondeterminism, Large Language Models (LLMs), Batch Invariance, Reproducibility

References

[1] H. He and Thinking Machines Lab, "Defeating Nondeterminism in LLM Inference," Thinking Machines Lab: Connectionism, Sep. 2025. Accessed: Nov. 11, 2025. [Online]. Available: https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/

[2] B. Atil, "Non-Determinism of 'Deterministic' LLM Settings," arXiv:2408.04667 [cs.CL], Aug. 2024 (updated Apr. 2025). [Online]. Available: https://arxiv.org/abs/2408.04667

[3] A. Sedova, G. Sivaraman, M. Coletti, W. Elwasif, M. Smith, and O. Hernandez, "Impacts of floating-point non-associativity on reproducibility for HPC and deep learning applications," 2024. [Online]. Available: https://www.researchgate.net/publication/383037277_Impacts_of_floating-point_non-associativity_on_reproducibility_for_HPC_and_deep_learning_applications

[4] J. Yuan, H. Li, X. Ding, et al., "Understanding and Mitigating Numerical Sources of Nondeterminism in LLM Inference," arXiv:2506.09501 [cs.CL], Jun. 2025 (updated Oct. 2025). [Online]. Available: https://arxiv.org/abs/2506.09501

[5] S. Shanmugavelu, M. Taillefumier, C. Culver, et al., "Impacts of Floating-Point Non-Associativity on Reproducibility for HPC and Deep Learning Applications," in Proceedings of SC24 Workshops (SCW24), 2024. doi: 10.1109/SCW63240.2024.00028.

[6] "Towards Deterministic Inference in SGLang and Reproducible RL Training," LMSYS Blog, Sep. 2025. Accessed: Nov. 11, 2025. [Online]. Available: https://lmsys.org/blog/2025-09-22-sglang-deterministic/

[7] Kubiya, "What is Deterministic AI: Concepts, Benefits, and Its Role in Building Reliable AI Agents (2025 Guide)," 2025. Accessed: Nov. 11, 2025. [Online]. Available: https://www.kubiya.ai/blog/what-is-deterministic-ai

[8] NVIDIA, "cuBLAS Library," in CUDA Toolkit Documentation, version 13.0, 2025. Accessed: Nov. 11, 2025. [Online]. Available: https://docs.nvidia.com/cuda/cublas/index.html

[9] S. Troshin, "Control the Temperature: Selective Sampling for Diverse and High-Quality LLM Outputs," arXiv:2510.01218 [cs.CL], Sep. 2025. [Online]. Available: https://arxiv.org/abs/2510.01218

[10] B. Siklósi, G. R. Mudalige, and I. Z. Reguly, "Enabling Bitwise Reproducibility for the Unstructured Computational Motif," Applied Sciences, vol. 14, no. 2, p. 639, 2024. doi: 10.3390/app14020639.

[11] TensorFlow Team, "Reproducible Training," 2023. Accessed: Nov. 11, 2025. [Online]. Available: https://www.tensorflow.org/guide/random_numbers#determinism

[12] PyTorch Core Team, "Reproducibility," 2024. Accessed: Nov. 11, 2025. [Online]. Available: https://pytorch.org/docs/stable/notes/randomness.html

How to Cite

Melnyk, S. (2026). Bounded Determinism: A Framework for Analyzing the Operational, Functional, And Architectural Limits of LLM Inference. The American Journal of Engineering and Technology, 8(03), 136–143. https://doi.org/10.37547/tajet/Volume08Issue03-09