Bounded Determinism: A Framework for Analyzing the Operational, Functional, And Architectural Limits of LLM Inference

Serhii Melnyk

doi:10.37547/tajet/Volume08Issue03-09

Engineering and Technology | Open Access | DOI: https://doi.org/10.37547/tajet/Volume08Issue03-09

Serhii Melnyk , Senior Lead Software Engineer, NC, USA

Published Date 2026-03-24

Pages 136-143

Abstract

A recent paper from Thinking Machines Lab (TML), "Defeating Nondeterminism in LLM Inference," offers a new perspective on the prevalence of nondeterministic outputs in Large Language Models (LLMs) configured for deterministic behavior.[1] Such nondeterminism undermines reliability, complicates testing, and hinders scientific reproducibility; studies report accuracy variations of up to 15% across identical runs.[2] The primary contribution of this paper is an analysis of the TML findings through a novel three-part framework that categorizes the boundaries of any determinism solution as: (1) an operational boundary (reproducibility is local to a specific hardware/software stack); (2) a functional boundary (the solution applies only to greedy decoding, not to generative sampling); and (3) an architectural boundary (the solution does not solve nondeterminism in distributed, multi-GPU systems). The analysis argues that the TML work offers a critical engineering trade-off for reproducibility rather than a complete solution to nondeterminism. Situating the TML work within the proposed framework clarifies what is practically achievable and what is fundamentally impossible in the pursuit of deterministic AI.
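As a rough illustration of why reproducibility can depend on the exact execution path, the sketch below shows floating-point non-associativity, a cause named in the titles of several cited references. Adding the same terms in a different order changes a logit enough to flip the token chosen by greedy decoding. The values, the helper names (`accumulate`, `greedy_token`), and the two-token vocabulary are hypothetical and chosen only to make the effect visible; this is not the TML method or any library's kernel.

```python
def accumulate(terms, order):
    """Sum `terms` in the given index order using ordinary float addition."""
    total = 0.0
    for i in order:
        total += terms[i]
    return total


def greedy_token(logits):
    """Greedy decoding: return the index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Hypothetical partial sums that contribute to the logit of token 0.
terms = [1e17, 1.0, -1e17, 1.0]

# Order A: the first 1.0 is absorbed by 1e17 and lost before cancellation.
logit_a = accumulate(terms, [0, 1, 2, 3])
# Order B: the large terms cancel first, so both 1.0 terms survive.
logit_b = accumulate(terms, [0, 2, 1, 3])

# Hypothetical logit of a competing token 1, identical in both runs.
competitor = 1.5

token_a = greedy_token([logit_a, competitor])
token_b = greedy_token([logit_b, competitor])

print(logit_a, logit_b)  # the same terms give two different sums
print(token_a, token_b)  # greedy decoding picks different tokens
```

The two runs differ only in the order of the additions. In a real system that order would be set by factors such as the kernel, batch size, or hardware, which is one way to read the operational boundary described above.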

Keywords

Nondeterminism, Large Language Models (LLMs), Batch Invariance, Reproducibility

References

[1] H. He and Thinking Machines Lab, "Defeating Nondeterminism in LLM Inference," Thinking Machines Lab: Connectionism, Sep. 2025. Accessed: Nov. 11, 2025. [Online]. Available: https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/

[2] B. Atil, "Non-Determinism of 'Deterministic' LLM Settings," arXiv:2408.04667 [cs.CL], Aug. 2024 (updated Apr. 2025). [Online]. Available: https://arxiv.org/abs/2408.04667

[3] A. Sedova, G. Sivaraman, M. Coletti, W. Elwasif, M. Smith, and O. Hernandez, "Impacts of floating-point non-associativity on reproducibility for HPC and deep learning applications," 2024. [Online]. Available: https://www.researchgate.net/publication/383037277_Impacts_of_floating-point_non-associativity_on_reproducibility_for_HPC_and_deep_learning_applications

[4] J. Yuan, H. Li, X. Ding, et al., "Understanding and Mitigating Numerical Sources of Nondeterminism in LLM Inference," arXiv:2506.09501 [cs.CL], Jun. 2025 (updated Oct. 2025). [Online]. Available: https://arxiv.org/abs/2506.09501

[5] S. Shanmugavelu, M. Taillefumier, C. Culver, et al., "Impacts of Floating-Point Non-Associativity on Reproducibility for HPC and Deep Learning Applications," in Proceedings of SC24 Workshops (SCW24), 2024. doi: 10.1109/SCW63240.2024.00028.

[6] "Towards Deterministic Inference in SGLang and Reproducible RL Training," LMSYS Blog, Sep. 2025. Accessed: Nov. 11, 2025. [Online]. Available: https://lmsys.org/blog/2025-09-22-sglang-deterministic/

[7] Kubiya, "What is Deterministic AI: Concepts, Benefits, and Its Role in Building Reliable AI Agents (2025 Guide)," 2025. Accessed: Nov. 11, 2025. [Online]. Available: https://www.kubiya.ai/blog/what-is-deterministic-ai

[8] NVIDIA, "cuBLAS Library," in CUDA Toolkit Documentation, version 13.0, 2025. Accessed: Nov. 11, 2025. [Online]. Available: https://docs.nvidia.com/cuda/cublas/index.html

[9] S. Troshin, "Control the Temperature: Selective Sampling for Diverse and High-Quality LLM Outputs," arXiv:2510.01218 [cs.CL], Sep. 2025. [Online]. Available: https://arxiv.org/abs/2510.01218

[10] B. Siklósi, G. R. Mudalige, and I. Z. Reguly, "Enabling Bitwise Reproducibility for the Unstructured Computational Motif," Applied Sciences, vol. 14, no. 2, p. 639, 2024. doi: 10.3390/app14020639.

[11] TensorFlow Team, "Reproducible Training," 2023. Accessed: Nov. 11, 2025. [Online]. Available: https://www.tensorflow.org/guide/random_numbers#determinism

[12] PyTorch Core Team, "Reproducibility," 2024. Accessed: Nov. 11, 2025. [Online]. Available: https://pytorch.org/docs/stable/notes/randomness.html

Copyright License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Authors retain the copyright of their manuscripts, and all Open Access articles are disseminated under the terms of the Creative Commons Attribution License 4.0 (CC-BY), which licenses unrestricted use, distribution, and reproduction in any medium, provided that the original work is appropriately cited. The use of general descriptive names, trade names, trademarks, and so forth in this publication, even if not specifically identified, does not imply that these names are not protected by the relevant laws and regulations.

How to Cite

Melnyk, S. (2026). Bounded Determinism: A Framework for Analyzing the Operational, Functional, And Architectural Limits of LLM Inference. The American Journal of Engineering and Technology, 8(03), 136–143. https://doi.org/10.37547/tajet/Volume08Issue03-09
