Capacity & Cost Estimators: Tool 5 of 8
GPU Fleet Sizing Calculator
Size a self-hosted GPU fleet: enter model size, quantization, and target QPS to get a GPU count and a monthly cost estimate.
Model Weights              3.5 GB
Memory / Instance          4.5 GB        (weights + KV-cache)
GPUs / Instance            1             (A10G, 24 GB)
QPS / Instance             200
Instances Needed           5             (before redundancy)
Total Instances            10            (with 2x buffer)
Total GPUs                 10            (A10G, 24 GB)
Max Concurrent / Instance  78 requests
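The concurrency figure can be back-calculated from the VRAM numbers above. This is a minimal sketch, and the ~0.25 GB KV-cache footprint per concurrent request is an assumption inferred from the displayed values ((24 GB - 4.5 GB) / 0.25 GB = 78), not a number the tool states:

```python
GPU_VRAM_GB = 24.0        # A10G total VRAM
RESIDENT_GB = 4.5         # weights + baseline KV-cache (from the table above)
KV_PER_REQUEST_GB = 0.25  # assumed per-request KV-cache footprint (inferred)

def max_concurrent(vram_gb=GPU_VRAM_GB, resident_gb=RESIDENT_GB,
                   kv_per_request_gb=KV_PER_REQUEST_GB):
    # Remaining VRAM divided by the per-request KV footprint, rounded down.
    return int((vram_gb - resident_gb) / kv_per_request_gb)

print(max_concurrent())  # 78
```

The real ceiling depends on context length and the serving engine's batching policy, so treat this as an upper bound, not a guarantee.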
On-Demand          $10/hour    $7.3K/month
Reserved (1-year)  $6/hour     $4.4K/month
Savings            $2.9K/month
Estimates as of March 2026. QPS numbers are approximate and vary with request complexity, context length, and batching efficiency. KV-cache overhead estimated at 30% of model weights. Actual GPU utilization depends on continuous batching configuration (vLLM, TensorRT-LLM). GPU pricing reflects typical cloud on-demand and 1-year reserved rates.
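The arithmetic behind these cards can be sketched in a few lines. Assumptions not shown by the tool: a 1,000 QPS target (implied by 5 instances at 200 QPS each), per-instance hourly rates of $1.00 on-demand and $0.60 reserved (back-calculated from the fleet totals), and a 730-hour billing month:

```python
import math

HOURS_PER_MONTH = 730  # common cloud billing convention (assumed)

def size_fleet(weights_gb, qps_target, qps_per_instance,
               kv_cache_overhead=0.30, redundancy=2.0):
    # Memory per instance: weights plus KV-cache overhead (30% of weights).
    mem_per_instance = weights_gb * (1 + kv_cache_overhead)
    # Instances to meet the QPS target, then a 2x redundancy buffer.
    instances_needed = math.ceil(qps_target / qps_per_instance)
    total_instances = math.ceil(instances_needed * redundancy)
    return mem_per_instance, instances_needed, total_instances

def monthly_cost(total_instances, hourly_rate_per_instance):
    return total_instances * hourly_rate_per_instance * HOURS_PER_MONTH

mem, needed, total = size_fleet(weights_gb=3.5, qps_target=1000,
                                qps_per_instance=200)
on_demand = monthly_cost(total, hourly_rate_per_instance=1.00)   # assumed rate
reserved = monthly_cost(total, hourly_rate_per_instance=0.60)    # assumed rate
print(f"Memory/instance: {mem:.2f} GB, instances: {needed} -> {total} with buffer")
print(f"On-demand ${on_demand:,.0f}/mo, reserved ${reserved:,.0f}/mo, "
      f"savings ${on_demand - reserved:,.0f}/mo")
```

With these inputs the sketch reproduces the cards above: 4.55 GB per instance (shown rounded to 4.5), 5 instances before redundancy, 10 after, and roughly $7.3K vs. $4.4K per month.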