Products · Introspect
Your cluster's X-ray
Understand the relationship between every model in your fleet and the hardware it runs on. Starts as a read-only one-time scan — graduates to a lightweight agent for continuous monitoring with no changes to your serving stack.
- Auto-discovers vLLM, Triton, KServe, SGLang, Ollama, TGI
- Understands model-hardware fit — not just GPU utilization
- Continuous safety signals: KV cache pressure, OOM risk, queue depth — every 15 seconds
- Performance and structural signals collected at longer intervals — zero added load on workloads
- Reads from your existing Prometheus — no duplicate scraping
introspect.yaml
discover: runtimes: [vllm, triton, kserve, sglang] depth: deep emit: - workload.memory_shape - workload.kv_cache_profile - workload.batch_dynamics mode: read-only