Executive Summary
A wave of custom AI inference chips from AWS (Trainium3), Google (TPU v6), and Microsoft (Maia 2) has driven inference costs down 60% year-over-year. NVIDIA responded by cutting H200 pricing and announcing aggressive B300 availability timelines. The cost to run a frontier AI model has fallen below $0.001 per 1K tokens on optimized infrastructure.
Key Takeaways
- AI inference costs down 60% YoY driven by custom silicon competition
- AWS Trainium3 delivers 3x price-performance improvement over NVIDIA H100 for transformer models
- Google TPU v6 pods now available in 12 regions with automatic right-sizing
- NVIDIA cuts H200 cloud pricing 25% and accelerates B300 availability to Q2 2026
- FinOps teams should audit AI workload placement; meaningful savings are available by moving suitable workloads to provider-native silicon
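The placement audit recommended above reduces to normalizing each accelerator option to a common unit: dollars per 1,000 generated tokens. A minimal sketch of that arithmetic follows; all hourly rates and throughput figures are illustrative placeholders, not numbers from the report.

```python
# Normalize accelerator options to a common unit: USD per 1K output tokens.
# All hourly rates and tokens/sec figures below are illustrative placeholders,
# NOT figures from the report -- benchmark your own model on each instance type.

def cost_per_1k_tokens(hourly_rate_usd: float, tokens_per_second: float) -> float:
    """Convert an instance's hourly price and sustained throughput
    into cost per 1,000 generated tokens."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_rate_usd / tokens_per_hour * 1000

# Hypothetical catalog: instance label -> (hourly rate USD, sustained tokens/sec)
options = {
    "nvidia-h200":   (4.50, 2500),
    "aws-trainium3": (2.80, 2200),
    "gcp-tpu-v6":    (3.20, 2600),
}

costs = {name: cost_per_1k_tokens(rate, tps) for name, (rate, tps) in options.items()}
cheapest = min(costs, key=costs.get)

for name, c in sorted(costs.items(), key=lambda kv: kv[1]):
    print(f"{name:14s} ${c:.6f} / 1K tokens")
print(f"best placement: {cheapest}")
```

Two caveats for a real audit: sustained throughput depends heavily on the model, batch size, and serving stack, so the tokens/sec input should come from your own benchmarks; and on-demand list prices overstate effective cost once committed-use or reserved discounts apply.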