Inference Capacity Planner
How many GPUs do you actually need? Enter your model, peak traffic, and serving engine to get a replica count, an annual cost, and an API vs. self-host comparison.
Throughput is estimated from empirical baselines, scaled by GPU compute, model efficiency, and a serving-engine factor. Multi-GPU scaling assumes conservative tensor-parallel efficiency: 1×, 1.75×, 3.2×, and 5.5× single-GPU throughput for 1, 2, 4, and 8 GPUs respectively.
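The arithmetic behind a replica count and annual cost can be sketched as follows. This is a minimal sketch, not the planner's actual model: only the tensor-parallel factors (1×, 1.75×, 3.2×, 5.5×) come from this page. The function names, the idea of multiplying the baseline by three plain multipliers, and the always-on hourly pricing are all hypothetical assumptions made for illustration.

```python
import math

# Tensor-parallel speedup over one GPU, as quoted on this page (1/2/4/8 GPUs).
TP_SCALING = {1: 1.0, 2: 1.75, 4: 3.2, 8: 5.5}

HOURS_PER_YEAR = 24 * 365


def single_gpu_tps(baseline_tps: float, compute_ratio: float,
                   model_efficiency: float, engine_factor: float) -> float:
    """Hypothetical: scale an empirical tokens/s baseline by three multipliers."""
    return baseline_tps * compute_ratio * model_efficiency * engine_factor


def replica_tps(gpu_tps: float, gpus_per_replica: int) -> float:
    """Throughput of one replica under the conservative TP efficiency table."""
    if gpus_per_replica not in TP_SCALING:
        raise ValueError("gpus_per_replica must be 1, 2, 4, or 8")
    return gpu_tps * TP_SCALING[gpus_per_replica]


def replicas_needed(peak_tps: float, gpu_tps: float, gpus_per_replica: int) -> int:
    """Smallest replica count whose combined throughput covers peak traffic."""
    return math.ceil(peak_tps / replica_tps(gpu_tps, gpus_per_replica))


def annual_cost_usd(replicas: int, gpus_per_replica: int, gpu_hourly_usd: float) -> float:
    """Hypothetical always-on pricing: every GPU billed for every hour of the year."""
    return replicas * gpus_per_replica * gpu_hourly_usd * HOURS_PER_YEAR


if __name__ == "__main__":
    gpu = single_gpu_tps(2000, 1.0, 1.0, 1.0)  # illustrative inputs only
    n = replicas_needed(peak_tps=10_000, gpu_tps=gpu, gpus_per_replica=2)
    print(n, annual_cost_usd(n, 2, 2.0))  # 3 105120.0
```

With these illustrative numbers, a 2-GPU replica delivers 2,000 × 1.75 = 3,500 tokens/s, so 10,000 tokens/s at peak needs 3 replicas. The table also shows why the factors count as conservative: at 8 GPUs, 5.5/8 works out to roughly 69% of linear scaling per GPU.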
Already running inference? See how close you are to the plan.
Most teams provision 2–3× what they actually need. piqc shows you the gap between your planned capacity and what your cluster is actually using.
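One simple way to express that gap is to compare provisioned GPUs with the GPUs needed to carry observed peak load at a chosen target utilization. This is a hypothetical metric written for illustration. It is not how piqc computes anything, and the 0.8 default target utilization is an assumption.

```python
import math


def provisioning_gap(provisioned_gpus: int, peak_utilization: float,
                     target_utilization: float = 0.8) -> tuple[int, float]:
    """Hypothetical gap metric: GPUs actually needed vs. GPUs provisioned.

    peak_utilization: observed busy fraction of the fleet at peak, in (0, 1].
    target_utilization: assumed safe operating point for a rightsized fleet.
    Returns (gpus_needed, provisioned / needed).
    """
    if not (0 < peak_utilization <= 1 and 0 < target_utilization <= 1):
        raise ValueError("utilizations must be in (0, 1]")
    busy_gpus = provisioned_gpus * peak_utilization
    # Round before ceil so float noise (e.g. 10.0000000001) doesn't add a GPU;
    # keep at least one GPU so the ratio is always defined.
    needed = max(1, math.ceil(round(busy_gpus / target_utilization, 9)))
    return needed, provisioned_gpus / needed


needed, ratio = provisioning_gap(32, peak_utilization=0.25)
print(needed, ratio)  # 10 3.2
```

In this example, a 32-GPU fleet that peaks at 25% busy needs about 10 GPUs at 80% target utilization. That is roughly 3.2× over-provisioned.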
More Calculators
$/Token vs. GPU Utilization
See how utilization rate drives cost per token, and how much recovering wasted capacity saves.
Procurement Deferral Calculator
How many months does fleet optimization delay your next hardware order?
Capacity Risk Calculator
Find your GPU ordering deadline before traffic growth outpaces your cluster.
GPU Waste Calculator
Estimate how much your inference fleet could recover through rightsizing.
GPU Inference TCO Calculator
Compare total cost of ownership across cloud providers.
Build vs. Buy: GPU Control Plane
Model engineering time, maintenance cost, and 3-year total cost.
GPU Sizing Calculator
Get a GPU type, node count, and scaling strategy recommendation.
GPU Fleet Cost Optimizer
Find the lowest-cost configuration for your throughput requirements.
KV Cache & Context Window Cost
See how KV cache memory scales with context length and batch size.
CPU:GPU Ratio Calculator
Find your CPU:GPU capacity gap as AI workloads shift from batch inference to multi-agent orchestration.