Automation of Product Decision-Making Based on A/B Testing
Alexander Blinov, Chief Product Officer at Zendrop, West Palm Beach, FL, USA

Abstract
This article addresses the automation of product decision-making based on A/B tests, combining what have traditionally been disparate and often ad hoc stages of experimentation into a single reproducible, scalable pipeline covering hypothesis planning, traffic control, streaming analytics, statistical evaluation, and safe rollout. The inquiry is motivated by the rapid growth in the number of digital experiments, the correspondingly strong demand for A/B testing tools, and the weaknesses of traditional manual processes: more than 90% of spreadsheets contain errors, and a single typo in Excel can cost billions, undermining product teams' confidence in experimental results. The novelty of the work lies in a comprehensive analysis of modern experiment-factory architectures that integrate feature flags, Apache Kafka–based streaming analytics, frequentist and Bayesian evaluation methods, multi-armed bandit algorithms, reinforcement learning, and elements of causal ML. A six-layer pipeline concept is proposed in which each stage, from the hypothesis catalog to automatic rollback and result archiving, is implemented by automated means without analyst involvement. Results show that automated A/B processes shrink the experiment cycle from weeks to hours, allow hundreds of tests to run in parallel, reduce the risk of error, and speed the delivery of winning variants to production. Sequential analysis keeps the false-positive rate below 5% while also controlling the false discovery rate; Bayesian methods support sound decisions on small samples; and multi-armed bandits combined with reinforcement learning virtually eliminate traffic loss during simultaneous exploration and exploitation. The automated system increases release frequency, improves conversions, and strengthens a data-driven culture within the organization.
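To illustrate how a bandit-based allocator reduces traffic loss during simultaneous exploration and exploitation, the following is a minimal sketch of Thompson sampling over two variants. It is not taken from the paper's implementation; the conversion rates, user count, and function names are hypothetical, chosen only to show the mechanism.

```python
import random

def thompson_assign(successes, failures):
    """Pick the arm whose sampled Beta posterior draw is highest."""
    samples = [random.betavariate(s + 1, f + 1)  # Beta(1,1) prior per arm
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=lambda i: samples[i])

def simulate(true_rates, n_users, seed=42):
    """Simulate n_users arriving one by one; hypothetical true_rates."""
    random.seed(seed)
    k = len(true_rates)
    successes, failures = [0] * k, [0] * k
    for _ in range(n_users):
        arm = thompson_assign(successes, failures)
        if random.random() < true_rates[arm]:  # simulated conversion
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures

# Two hypothetical variants: 4% vs. 8% conversion.
succ, fail = simulate([0.04, 0.08], n_users=10_000)
traffic = [s + f for s, f in zip(succ, fail)]
# As evidence accumulates, most traffic drifts to the stronger variant,
# so far less traffic is "spent" on the losing arm than a fixed 50/50 split.
```

A fixed-split A/B test would send 5,000 users to each arm regardless of outcome; the bandit instead shifts allocation toward the winner as its posterior sharpens, which is the sense in which traffic loss is minimized.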
The paper will be helpful to product managers, data analysts, DevOps engineers, and CTOs who are responsible for building and scaling an experimentation platform and establishing a seamless cycle of product decision-making.
Keywords
A/B testing, experiment automation, streaming analytics, feature flags
Copyright License
Copyright (c) 2025 Alexander Blinov

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors retain the copyright of their manuscripts, and all Open Access articles are disseminated under the terms of the Creative Commons Attribution License 4.0 (CC-BY), which licenses unrestricted use, distribution, and reproduction in any medium, provided that the original work is appropriately cited. The use of general descriptive names, trade names, trademarks, and so forth in this publication, even if not specifically identified, does not imply that these names are not protected by the relevant laws and regulations.