Building Better Cloud Cost Testing: The Fox Tool Story
The past few days have been quiet on here, and here's why.
While most founders obsess over feature releases and user metrics, I've been deep in the trenches solving a different problem: how do you test an AI cost optimization engine when you need realistic but controlled data scenarios?
The answer turned out to be building Fox - a tool that generates AWS-compatible mock data so realistic that our AI can't tell it apart from the real thing.
The Problem That Started It All
Building a FinOps AI that can identify cloud waste sounds straightforward until you try to validate it. Real AWS environments are messy and unpredictable, and you can't control the variables. Synthetic data from most tools looks obviously fake. And manually creating test scenarios? That's a nightmare.
Our AI optimization engine needed to prove it could find waste across different scenarios - over-provisioned EC2 instances, unused RDS databases, misconfigured Auto Scaling Groups. But we needed controlled environments where we knew exactly where the waste was hidden.
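To make "controlled" concrete, here's a heavily simplified sketch of what a scenario with planted waste looks like conceptually. The names and figures below (ScenarioSpec, WasteInjection, the savings numbers) are illustrative stand-ins, not Fox's actual API:

```python
# Illustrative sketch only: these names and figures are simplified stand-ins,
# not Fox's actual API.
from dataclasses import dataclass, field


@dataclass
class WasteInjection:
    """One piece of waste deliberately planted in the mock environment."""
    resource_id: str
    waste_type: str                 # e.g. "over_provisioned", "idle", "misconfigured_asg"
    expected_monthly_saving: float  # what a correct recommendation should recover


@dataclass
class ScenarioSpec:
    """A generated environment plus the ground truth of where the waste is hidden."""
    name: str
    days: int = 30
    injections: list[WasteInjection] = field(default_factory=list)


scenario = ScenarioSpec(
    name="over-provisioned-web-tier",
    injections=[
        WasteInjection("i-0abc123", "over_provisioned", expected_monthly_saving=410.00),
        WasteInjection("db-prod-replica-2", "idle", expected_monthly_saving=975.00),
    ],
)
```

Because the ground truth ships with the scenario, there's no arguing about whether a recommendation was right.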
Four Days of Obsessive Building
What started as "let me quickly mock some data" turned into rebuilding how we think about cloud cost simulation entirely.
Day 1: Discovered our existing data generation was creating $5 test scenarios. Real AWS bills don't look like that. Back to the drawing board.
Day 2: Deep dive into actual AWS Cost and Usage Reports. 131 columns. Complex pricing models. Service-specific consumption patterns. This wasn't going to be quick.
Day 3: Performance breakthrough. Got data generation from 60+ seconds down to 24 seconds. Then added caching that made subsequent runs roughly 4,000x faster (45 seconds down to 0.011 seconds).
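The caching trick is nothing exotic. Stripped down to the core idea, and leaving out everything the real code has to handle, it's just keying the expensive generation step on a hash of the scenario parameters:

```python
# Illustrative caching sketch, not the actual implementation: key the expensive
# generation step on a hash of the scenario parameters, so identical requests
# are served from disk instead of being regenerated.
import hashlib
import json
import pickle
from pathlib import Path

CACHE_DIR = Path(".fox_cache")   # hypothetical cache location
CACHE_DIR.mkdir(exist_ok=True)


def cached_generate(params: dict, generate_fn):
    """Return cached output for identical params; otherwise generate and store it."""
    key = hashlib.sha256(json.dumps(params, sort_keys=True).encode()).hexdigest()
    cache_file = CACHE_DIR / f"{key}.pkl"
    if cache_file.exists():
        return pickle.loads(cache_file.read_bytes())   # the ~millisecond path
    data = generate_fn(params)                         # the slow, tens-of-seconds path
    cache_file.write_bytes(pickle.dumps(data))
    return data
```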
Day 4: Added interactive dashboards and comprehensive analysis. Now Fox doesn't just generate data - it validates whether our AI found the right waste.
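That validation step is conceptually simple: score what the engine flagged against the waste we planted on purpose. A stripped-down sketch of the idea (names and structure simplified):

```python
# Illustrative sketch of the validation step: score what the AI flagged against
# the waste that was planted on purpose.
def score_detections(planted_ids: set[str], detected_ids: set[str]) -> dict:
    """Compare detected waste against the known, planted waste."""
    true_positives = planted_ids & detected_ids
    precision = len(true_positives) / len(detected_ids) if detected_ids else 0.0
    recall = len(true_positives) / len(planted_ids) if planted_ids else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "false_positives": sorted(detected_ids - planted_ids),
        "missed": sorted(planted_ids - detected_ids),
    }


# Example: one planted resource found, one clean resource wrongly flagged.
print(score_detections({"i-0abc123", "db-prod-replica-2"}, {"i-0abc123", "i-0clean99"}))
```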
What Makes Fox Different
Most cloud cost tools generate obviously synthetic data. Fox creates scenarios so realistic they include:
Business hour usage patterns that mirror real workload fluctuations (sketched in code just after this list)
Spot pricing volatility that matches AWS market dynamics
Service-specific consumption where S3 costs behave differently than EC2
Proper resource correlation so CloudWatch metrics align perfectly with billing data
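To give a flavor of the business-hour shaping, here's a deliberately simplified sketch. The real generator layers seasonality and service-specific behavior on top of something like this:

```python
# Deliberately simplified: shape hourly utilization so weekday business hours
# run hot and nights/weekends stay quiet, with a little noise on top.
import math
import random


def hourly_utilization(hour_of_day: int, weekday: bool) -> float:
    """Return a 0-1 utilization figure with a business-hours peak."""
    base = 0.15                                      # overnight / weekend floor
    if weekday and 8 <= hour_of_day <= 18:
        # Smooth ramp that peaks in the early afternoon instead of a flat block.
        base += 0.55 * math.sin(math.pi * (hour_of_day - 8) / 10)
    return min(1.0, max(0.0, base + random.gauss(0, 0.05)))


print(hourly_utilization(14, weekday=True))   # busy Tuesday afternoon
print(hourly_utilization(3, weekday=False))   # quiet Sunday night
```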
The generated data passes the "human expert" test. FinOps practitioners looking at Fox output can't tell it from real AWS bills.
The Technical Breakthrough
Building Fox taught me something about AI validation that applies beyond FinOps. When you're testing AI that needs to understand complex real-world patterns, your test data needs to be indistinguishable from reality.
We achieved this through:
Complete AWS Format Compatibility: Fox generates proper 131-column Cost and Usage Reports with accurate SKUs, product families, and service classifications (a small sample of the format follows this list).
Realistic Cost Patterns: Thousands of dollars for 30-day samples instead of the $5 toy scenarios most tools create. Costs fluctuate based on business hours, seasonality, and resource utilization.
Complex Infrastructure Support: Multi-module Terraform configurations with proper dependency handling and resource correlation across 51+ AWS service types.
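For a taste of what that format compatibility means in practice, here's a hand-written sample covering just a handful of the standard Cost and Usage Report column names. Every value is invented, and Fox's real output carries the full column set:

```python
# Hand-written sample of a few standard Cost and Usage Report columns; every
# value here is invented, and the real report carries far more columns.
import csv

CUR_COLUMNS = [
    "identity/LineItemId", "bill/PayerAccountId", "lineItem/UsageAccountId",
    "lineItem/LineItemType", "lineItem/UsageStartDate", "lineItem/UsageEndDate",
    "lineItem/ProductCode", "lineItem/UsageType", "lineItem/ResourceId",
    "lineItem/UsageAmount", "lineItem/UnblendedRate", "lineItem/UnblendedCost",
    "product/ProductName", "product/sku",
]

sample_row = {
    "identity/LineItemId": "mock-line-item-0001",
    "bill/PayerAccountId": "111122223333",
    "lineItem/UsageAccountId": "111122223333",
    "lineItem/LineItemType": "Usage",
    "lineItem/UsageStartDate": "2024-06-01T00:00:00Z",
    "lineItem/UsageEndDate": "2024-06-01T01:00:00Z",
    "lineItem/ProductCode": "AmazonEC2",
    "lineItem/UsageType": "BoxUsage:m5.xlarge",
    "lineItem/ResourceId": "i-0abc123",
    "lineItem/UsageAmount": "1.0",
    "lineItem/UnblendedRate": "0.192",
    "lineItem/UnblendedCost": "0.192",
    "product/ProductName": "Amazon Elastic Compute Cloud",
    "product/sku": "MOCKSKU123",
}

with open("mock_cur_sample.csv", "w", newline="") as fh:
    writer = csv.DictWriter(fh, fieldnames=CUR_COLUMNS)
    writer.writeheader()
    writer.writerow(sample_row)
```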
The result? Our AI optimization engine now gets tested against scenarios that mirror the complexity it'll face in production.
Building in Public Reality Check
Here's what they don't tell you about building in public as a solo founder: sometimes the most important work happens in complete silence.
Those features that users will eventually love? They often start with infrastructure work that looks boring from the outside. Fox isn't a user-facing feature, but it's the foundation that makes our AI recommendations trustworthy.
The entrepreneurship content space loves to talk about customer validation and product-market fit. But there's another kind of validation that's equally critical - technical validation. Does your core technology actually work as promised?
What's Next
Fox started as a testing tool, but it's becoming something bigger. Teams building FinOps solutions need better ways to validate their tools. Companies migrating to cloud need realistic cost modeling before they commit resources.
We're seeing demand for Fox as a standalone product. The same realistic data generation that helps us test our AI could help other teams model cloud scenarios, train their own optimization algorithms, or validate cost management tools.
But for now, Fox serves its primary purpose: giving us confidence that when our AI says "this resource is wasted," it's actually right.
The Lesson
Building great AI isn't just about algorithms and training data. It's about creating testing environments sophisticated enough to validate those algorithms work in the real world.
Sometimes the best way to build your product is to build the tools that prove your product works.
Fox proved our AI can identify waste accurately across different scenarios. That confidence is what lets us recommend cost optimizations that actually save money instead of just moving it around.
And honestly? Building Fox was some of the most satisfying development work I've done in months. There's something deeply rewarding about creating tools that make your other tools better.
Next post: How we're using Fox to validate optimization recommendations that could save companies 60-80% on their AWS bills.