Executive Summary
A wave of custom AI inference chips from AWS (Trainium3), Google (TPU v6), and Microsoft (Maia 2) has driven inference costs down 60% year-over-year. NVIDIA responded by cutting H200 pricing and announcing aggressive B300 availability timelines. The cost to run a frontier AI model has fallen below $0.001 per 1K tokens on optimized infrastructure.
Key Takeaways
- AI inference costs down 60% YoY driven by custom silicon competition
- AWS Trainium3 delivers 3x price-performance improvement over NVIDIA H100 for transformer models
- Google TPU v6 pods now available in 12 regions with automatic right-sizing
- NVIDIA cuts H200 cloud pricing 25% and accelerates B300 availability to Q2 2026
- FinOps teams should audit AI workload placement; meaningful savings are available by moving suitable workloads to provider-native silicon
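The placement audit recommended above reduces to normalizing each accelerator option to a common unit: dollars per 1,000 generated tokens. A minimal sketch of that arithmetic follows; all hourly rates and throughput figures are illustrative placeholders, not numbers from the report.

```python
# Normalize accelerator options to a common unit: USD per 1K output tokens.
# All hourly rates and tokens/sec figures below are illustrative placeholders,
# NOT figures from the report -- benchmark your own model on each instance type.

def cost_per_1k_tokens(hourly_rate_usd: float, tokens_per_second: float) -> float:
    """Convert an instance's hourly price and sustained throughput
    into cost per 1,000 generated tokens."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_rate_usd / tokens_per_hour * 1000

# Hypothetical catalog: instance label -> (hourly rate USD, sustained tokens/sec)
options = {
    "nvidia-h200":   (4.50, 2500),
    "aws-trainium3": (2.80, 2200),
    "gcp-tpu-v6":    (3.20, 2600),
}

costs = {name: cost_per_1k_tokens(rate, tps) for name, (rate, tps) in options.items()}
cheapest = min(costs, key=costs.get)

for name, c in sorted(costs.items(), key=lambda kv: kv[1]):
    print(f"{name:14s} ${c:.6f} / 1K tokens")
print(f"best placement: {cheapest}")
```

Two caveats for a real audit: sustained throughput depends heavily on the model, batch size, and serving stack, so the tokens/sec input should come from your own benchmarks; and on-demand list prices overstate effective cost once committed-use or reserved discounts apply.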