GreenOps at Scale: The 50TB Memory Diet for a Petabyte Platform

At petabyte scale, software inefficiency isn't just costly; it carries a real ecological impact. At Workday, our logging clusters hit a hardware wall, consuming 111TB of RAM while relying on slower, energy-hungry cloud infrastructure. This talk explores how we achieved a 46% reduction in memory footprint (saving ~50TB), a 38% cost reduction, and 2-9x better performance by migrating to local NVMe storage and power-efficient ARM processors. Crucially, this is a story of engineering experience meeting AI speed: we drew on human knowledge of distributed systems to design the strategy, while using AI to rapidly build the benchmarking tools that didn't yet exist to execute it. This synergy allowed us to simulate massive production loads on a sub-cluster at 1/1000th scale, proving that AI doesn't replace engineers; it accelerates practical experimentation, letting us validate high-risk architectural shifts with precision and velocity. Join us to learn how combining GreenOps principles, AI-driven engineering, and open-source tools produced a leaner, greener, and faster observability stack.
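The 1/1000th-scale validation rests on extrapolating sub-cluster measurements up to production scale. A minimal sketch of that idea, assuming roughly linear scaling (the function name, metric names, and numbers below are illustrative, not the actual Workday tooling; real validation would also account for coordination overhead and hot shards):

```python
# Hypothetical sketch: extrapolating sub-cluster benchmark results to a
# full-cluster estimate, under a simplifying linear-scaling assumption.

SCALE_FACTOR = 1000  # sub-cluster is ~1/1000th of production (from the talk)

def extrapolate(sub_cluster_metrics: dict, scale: int = SCALE_FACTOR) -> dict:
    """Scale metrics measured on the sub-cluster to an estimate for the
    full production cluster by multiplying each metric by the scale factor."""
    return {name: value * scale for name, value in sub_cluster_metrics.items()}

# Illustrative measurements from a 1/1000th-scale sub-cluster:
measured = {"ingest_gb_per_day": 1.2, "ram_gb": 111.0}
estimate = extrapolate(measured)
print(estimate)  # {'ingest_gb_per_day': 1200.0, 'ram_gb': 111000.0}
```

Note that 111 GB of RAM on the sub-cluster scales to the 111TB production footprint the abstract cites; the same arithmetic lets a before/after benchmark on the small cluster predict full-scale savings.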