Overcoming the Real-World Pitfalls of Google Document AI
Oleksandr Tserkovnyi, TrialBase Inc., Principal Engineer, Dominican Republic, Punta Cana
Abstract
This paper discusses the practical limitations and feature gaps encountered with Google Document AI while building the AI product on the TrialBase platform (ai.trialbase.com), which automates legal document analysis. The results matter because the volume of electronic legal documents is growing rapidly, and fast, reliable parsing of those documents is essential for systems built on LLMs and retrieval-augmented generation (RAG). Standard Document AI workflows seldom work well in practice: they hold up only when no PDFs are damaged, the dataset is small enough that the API quota is never hit, and processing cost does not matter. The architecture proposed in this paper is robust and efficient at transforming varied documents into structured data. An event-driven microservice architecture with message queues and a PDF sanitization pipeline addresses these real-world problems and enables ProcessorPool, in which multiple processors call the synchronous Document AI API concurrently to go beyond quota limits, drastically reducing processing time. Pre-sanitization, coupled with asynchronous batch processing and a custom load balancer, yielded a tenfold speed increase and improved reliability on real-world legal documents. The article is intended for LegalTech researchers and practitioners, workflow developers, and engineers working on high-performance, reliable Google Cloud-based projects.
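The ProcessorPool idea summarized in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the class layout, the round-robin assignment, the `per_processor_limit` parameter, and the `process_fn` callable (a stand-in for a synchronous Document AI request) are hypothetical and are not taken from the paper's implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle
from threading import BoundedSemaphore
from typing import Callable, List, Sequence, Tuple


class ProcessorPool:
    """Illustrative round-robin dispatcher over several processor IDs."""

    def __init__(
        self,
        processor_ids: Sequence[str],
        process_fn: Callable[[str, bytes], str],
        per_processor_limit: int = 2,  # assumed cap, not a real quota value
    ) -> None:
        if not processor_ids:
            raise ValueError("at least one processor ID is required")
        self._process_fn = process_fn
        self._next_id = cycle(processor_ids)
        # One semaphore per processor caps its in-flight requests.
        self._slots = {
            pid: BoundedSemaphore(per_processor_limit) for pid in processor_ids
        }
        self._max_workers = len(self._slots) * per_processor_limit

    def _process_one(self, job: Tuple[str, bytes]) -> str:
        processor_id, pdf_bytes = job
        with self._slots[processor_id]:
            # process_fn is a placeholder for one synchronous API call.
            return self._process_fn(processor_id, pdf_bytes)

    def process_all(self, documents: Sequence[bytes]) -> List[str]:
        # Assign processors up front so the mapping is deterministic.
        jobs = [(next(self._next_id), doc) for doc in documents]
        with ThreadPoolExecutor(max_workers=self._max_workers) as executor:
            # executor.map returns results in input order.
            return list(executor.map(self._process_one, jobs))


def fake_process(processor_id: str, pdf_bytes: bytes) -> str:
    """Stub used instead of a real Document AI request."""
    return f"{processor_id}:{len(pdf_bytes)}"


if __name__ == "__main__":
    pool = ProcessorPool(["proc-a", "proc-b"], fake_process)
    print(pool.process_all([b"a", b"bb", b"ccc"]))
```

In this sketch each document is bound to a processor before submission, so total throughput scales with the number of processors while each one stays under its own concurrency cap. A production version would replace `fake_process` with a real client call and add retries and error handling.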
Keywords
Google Document AI, LegalTech, document analysis automation, PDF processing, RAG, LLM, microservice architecture, asynchronous processing, ProcessorPool
Copyright License
Copyright (c) 2025 Oleksandr Tserkovnyi

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors retain the copyright of their manuscripts, and all Open Access articles are disseminated under the terms of the Creative Commons Attribution License 4.0 (CC-BY), which licenses unrestricted use, distribution, and reproduction in any medium, provided that the original work is appropriately cited. The use of general descriptive names, trade names, trademarks, and so forth in this publication, even if not specifically identified, does not imply that these names are not protected by the relevant laws and regulations.


Engineering and Technology