ParallelIQ
Products · Introspect

Your cluster's X-ray

Understand the relationship between every model in your fleet and the hardware it runs on. Introspect starts as a read-only, one-time scan and can graduate to a lightweight agent for continuous monitoring, with no changes to your serving stack.

  • Auto-discovers vLLM, Triton, KServe, SGLang, Ollama, TGI
  • Understands model-hardware fit — not just GPU utilization
  • Continuous safety signals: KV cache pressure, OOM risk, queue depth — every 15 seconds
  • Performance and structural signals collected at longer intervals — zero added load on workloads
  • Reads from your existing Prometheus — no duplicate scraping
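
To make the safety signals above concrete, here is a minimal Python sketch of how a KV cache pressure check could be derived from a Prometheus instant-query response. It is an illustration, not Introspect's implementation. The thresholds, the `pod` label, and the sample data are assumptions. The metric name `vllm:gpu_cache_usage_perc` is one that vLLM has exposed, but metric names differ across vLLM versions and across the other runtimes.

```python
from typing import Dict

# Hypothetical thresholds for illustration; not taken from Introspect.
WARN_AT = 0.80
CRITICAL_AT = 0.95


def classify_kv_pressure(response: dict) -> Dict[str, str]:
    """Map each series in a Prometheus instant-query response to a pressure level."""
    if response.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")

    levels: Dict[str, str] = {}
    for series in response["data"]["result"]:
        # An instant vector sample is [unix_timestamp, "value-as-string"].
        _, raw_value = series["value"]
        usage = float(raw_value)  # assumed to be a 0..1 fraction of KV cache in use
        name = series["metric"].get("pod", "unknown")  # "pod" label is an assumption
        if usage >= CRITICAL_AT:
            levels[name] = "critical"
        elif usage >= WARN_AT:
            levels[name] = "warn"
        else:
            levels[name] = "ok"
    return levels


# Made-up sample shaped like the body returned by GET /api/v1/query.
SAMPLE_RESPONSE = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {"__name__": "vllm:gpu_cache_usage_perc", "pod": "llama-0"},
                "value": [1700000000.0, "0.97"],
            },
            {
                "metric": {"__name__": "vllm:gpu_cache_usage_perc", "pod": "mistral-0"},
                "value": [1700000000.0, "0.42"],
            },
        ],
    },
}

if __name__ == "__main__":
    print(classify_kv_pressure(SAMPLE_RESPONSE))
```

Because the check only reads a query result that Prometheus already holds, it adds no scrape traffic to the serving pods, which is the point of the last bullet.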
introspect.yaml
discover:
  runtimes: [vllm, triton, kserve, sglang]
  depth: deep
emit:
  - workload.memory_shape
  - workload.kv_cache_profile
  - workload.batch_dynamics
mode: read-only
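
This page does not define what `workload.memory_shape` or `workload.kv_cache_profile` contain. As a rough illustration of the arithmetic behind model-hardware fit, the Python sketch below estimates how many tokens of KV cache fit on one GPU once the weights are loaded. It uses the standard per-token estimate for a decoder-only transformer (keys plus values, per layer, per KV head). The class, function names, reserve fraction, and example model dimensions are all hypothetical, and the estimate ignores activations, quantization, and multi-GPU sharding.

```python
from dataclasses import dataclass


@dataclass
class ModelShape:
    """Hypothetical description of a decoder-only transformer."""
    params_billion: float
    num_layers: int
    num_kv_heads: int
    head_dim: int
    bytes_per_value: int = 2  # fp16 or bf16


def weight_bytes(model: ModelShape) -> int:
    """Approximate memory taken by the weights alone."""
    return int(model.params_billion * 1e9 * model.bytes_per_value)


def kv_bytes_per_token(model: ModelShape) -> int:
    """Keys and values (factor of 2) stored for every layer and KV head."""
    return (2 * model.num_layers * model.num_kv_heads
            * model.head_dim * model.bytes_per_value)


def max_cached_tokens(model: ModelShape, gpu_bytes: int,
                      reserve_fraction: float = 0.10) -> int:
    """Tokens of KV cache that fit after weights and a safety reserve.

    reserve_fraction is an assumed allowance for activations and
    runtime overhead, not a measured value.
    """
    usable = gpu_bytes * (1 - reserve_fraction) - weight_bytes(model)
    if usable <= 0:
        return 0  # the weights alone do not fit
    return int(usable // kv_bytes_per_token(model))


# Illustrative numbers: an 8B-parameter model with grouped-query attention.
EXAMPLE_MODEL = ModelShape(params_billion=8.0, num_layers=32,
                           num_kv_heads=8, head_dim=128)
GPU_80_GIB = 80 * 1024**3

if __name__ == "__main__":
    print("KV bytes per token:", kv_bytes_per_token(EXAMPLE_MODEL))
    print("Tokens that fit:", max_cached_tokens(EXAMPLE_MODEL, GPU_80_GIB))
```

Two deployments can report the same GPU utilization while one has ample KV cache headroom and the other is close to running out of memory, which is why fit is worth tracking separately from utilization.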

The rest of the platform

Get more from the cluster you already have.
