Engineering and Technology | Open Access | DOI: https://doi.org/10.37547/tajet/Volume07Issue10-18

An Aggregation-Based Architecture for Unifying Enterprise IT Operations: A Case Study of a Centralized Monitoring System in a Global Industrial Holding

Dmytro Rumiantsev, Principal IT Consultant, MEng, USA

Abstract

Global industrial corporations often contend with fragmented IT monitoring environments, where disparate tools manage distinct infrastructure domains. This decentralization fosters operational inefficiencies, such as protracted incident diagnostics, redundant alert escalations, and increased resource consumption for root-cause analysis.

This paper documents a case study on the design, implementation, and operational impact of a centralized monitoring system at Metinvest Holding, a multinational industrial enterprise. The core objective was to consolidate heterogeneous monitoring platforms into a cohesive, single-pane-of-glass interface to improve operational visibility, streamline diagnostics, and provide high-level strategic oversight for executive management.

The research employed a single-case study methodology, coupled with an iterative development process informed by Agile principles. An aggregation and visualization layer was architected using Grafana to integrate data from incumbent systems—including Microsoft SCOM, PRTG Network Monitor, Azure Monitoring, and APC Enterprise Manager—without necessitating their replacement. The two-year development cycle involved continuous collaboration with service engineers.
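The aggregation idea described above can be illustrated with a minimal sketch. It assumes each incumbent tool reports a tool-specific status string that must be mapped onto one shared health scale before a unified dashboard can display it. The status vocabularies, the `Reading` type, and the `normalize` and `rollup` helpers are hypothetical illustrations, not the implementation used in the case study.

```python
from dataclasses import dataclass
from enum import IntEnum


class Health(IntEnum):
    """Shared health scale; higher values are worse."""
    OK = 0
    WARNING = 1
    CRITICAL = 2


@dataclass(frozen=True)
class Reading:
    source: str   # monitoring tool that produced the reading
    entity: str   # monitored object (host, switch, UPS, ...)
    health: Health


# Hypothetical status vocabularies, assumed for illustration only.
STATUS_MAP = {
    "scom": {"Success": Health.OK, "Warning": Health.WARNING, "Error": Health.CRITICAL},
    "prtg": {"Up": Health.OK, "Warning": Health.WARNING, "Down": Health.CRITICAL},
}


def normalize(source, entity, raw_status):
    """Translate a tool-specific status into the shared scale."""
    try:
        health = STATUS_MAP[source][raw_status]
    except KeyError:
        # Unknown tool or status: surface it as a warning instead of hiding it.
        health = Health.WARNING
    return Reading(source, entity, health)


def rollup(readings):
    """Return the worst health observed per source tool."""
    worst = {}
    for reading in readings:
        current = worst.get(reading.source, Health.OK)
        worst[reading.source] = max(current, reading.health)
    return worst


readings = [
    normalize("scom", "db01", "Success"),
    normalize("prtg", "sw-01", "Down"),
    normalize("prtg", "sw-02", "Up"),
    normalize("apc", "ups-1", "unmapped"),
]
summary = rollup(readings)
```

The design choice in this sketch is that the source tools stay untouched: only a thin translation layer sits between them and the visualization, which matches the non-disruptive approach the paper describes.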

The centralized system yielded substantial improvements in operational efficiency. The number of engineers required for initial, cross-domain incident triage was reduced from an average of three to one. Root-cause identification was significantly accelerated through a holistic, correlated view of infrastructure health, which concurrently minimized alert fatigue and redundant inter-departmental escalations. A novel visualization layer was also developed to furnish executive leadership with an intuitive overview of global IT operational status.
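One way a correlated view can cut redundant escalations is to group alerts from different domains that occur at the same site within a short time window, so that a single engineer sees one incident instead of several unrelated alerts. The sketch below is an assumption-laden illustration of that idea: the `Alert` fields, the `correlate` function, and the 300-second window are hypothetical and are not taken from the case study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    timestamp: float  # seconds since an arbitrary epoch
    site: str         # physical or logical location
    domain: str       # e.g. "network", "server", "power"
    message: str


def correlate(alerts, window_seconds=300.0):
    """Group alerts per site when they arrive within window_seconds
    of the first alert in that site's current group."""
    incidents = []
    open_by_site = {}
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        group = open_by_site.get(alert.site)
        if group is not None and alert.timestamp - group[0].timestamp <= window_seconds:
            group.append(alert)
        else:
            # Start a new incident for this site.
            group = [alert]
            open_by_site[alert.site] = group
            incidents.append(group)
    return incidents


alerts = [
    Alert(0, "plant-a", "power", "UPS on battery"),
    Alert(40, "plant-a", "network", "switch unreachable"),
    Alert(90, "plant-a", "server", "host down"),
    Alert(100, "plant-b", "network", "link flapping"),
    Alert(1000, "plant-a", "server", "disk full"),
]
incidents = correlate(alerts)
```

In this example the three early alerts at `plant-a` collapse into one incident spanning the power, network, and server domains, which is the kind of cross-domain picture that would otherwise require engineers from each domain to assemble.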

This case demonstrates that a non-disruptive, aggregation-based strategy can effectively address the challenges of monitoring fragmentation in large-scale enterprise settings. The findings highlight the value of integrating technical consolidation with user-centric visualization to achieve both operational efficiency and strategic alignment. The architectural principles delineated are broadly applicable to other multinational organizations facing similar IT infrastructure management complexities.

Keywords

IT Operations Management (ITOM), System Centralization, Grafana, Case Study, Root-Cause Analysis, Operational Efficiency, Enterprise IT Architecture, Data Visualization

How to Cite

Dmytro Rumiantsev. (2025). An Aggregation-Based Architecture for Unifying Enterprise IT Operations: A Case Study of a Centralized Monitoring System in a Global Industrial Holding. The American Journal of Engineering and Technology, 7(10), 146–153. https://doi.org/10.37547/tajet/Volume07Issue10-18