Completeness and Reliability of SOFA/SOAL: NLP Assessment and Impact on Reconciliation Requirements
Bakhovuddin Sadriddin ogli Muratov, AlixPartners LLP, Chicago, USA
Abstract
As data volumes grow exponentially and trust in them erodes, the completeness and reliability of legal and financial reporting, including the Statement of Financial Affairs (SOFA) and the Schedule of Assets and Liabilities (SOAL), become critically important. Traditional data reconciliation procedures are methodologically and technologically insufficient for analyzing unstructured text fields, where the most significant instances of hidden, hard-to-detect fraud are concentrated. This article examines how applying natural language processing (NLP) methods to unstructured SOFA/SOAL fields modifies, and effectively redefines, the classical requirements for financial reconciliation. It uses a mixed methodological approach: a systematic review of academic literature, content analysis of industry and legal documents, and modeling by analogy based on case studies. The results indicate that traditional manual reconciliation has low effectiveness in identifying so-called semantic fraud, which appears not as formal arithmetic inconsistencies but as substantive and contextual distortions of information. In response, the paper proposes a conceptual NLP model that stratifies risk using metrics of the completeness and reliability of disclosed data. An analysis of practical analogues in related domains indicates that NLP tools can significantly increase data completeness (for example, from 60% to 73%) and ensure high reliability of results (accuracy of 93%) compared with traditional manual analysis.
Introducing NLP technologies qualitatively transforms the reconciliation requirement itself: it evolves from comprehensive manual control into an automated, proactive, and risk-oriented mode of working with data. The findings are of practical value to insolvency practitioners, forensic accounting experts, and compliance professionals, as well as to regulators who develop and update standards for auditing and controlling financial reporting.
Keywords
SOFA, SOAL, natural language processing (NLP)
Copyright License
Copyright (c) 2026 Bakhovuddin Sadriddin ogli Muratov

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors retain the copyright of their manuscripts, and all Open Access articles are disseminated under the terms of the Creative Commons Attribution License 4.0 (CC-BY), which licenses unrestricted use, distribution, and reproduction in any medium, provided that the original work is appropriately cited. The use of general descriptive names, trade names, trademarks, and so forth in this publication, even if not specifically identified, does not imply that these names are not protected by the relevant laws and regulations.

