Models & inference
Registry, deployment topology, and inference performance for every deployed model.
Models in production: 5 (+2 in staging · 1 training)
Requests · 24h: 1.8M (+8.4% DoD)
Avg p50 latency: 418ms (−22ms vs 7d avg)
Avg p99 latency: 1.84s (+86ms vs 7d avg · within SLO)
Model registry
All deployed and staged models
| Model | Kind | Status | Throughput | p50 / p99 | Ctx |
|---|---|---|---|---|---|
| Philotic-1 v2.4.1 · self-hosted · 72B | Orchestrator | production | 86.4M tok/hr | 412ms / 1840ms | 128K |
| Claude Sonnet v4.6 · Anthropic · API | Reasoning | production | 42.0M tok/hr | 612ms / 2480ms | 200K |
| Cali-OEM v3.0 · Phil-1 ⊃ Caliber OEM corpus | Fine-tune | production | 31.2M tok/hr | 380ms / 1420ms | 64K |
| Cali-Insurance v2.1 · Phil-1 ⊃ insurance workflow | Fine-tune | production | 18.4M tok/hr | 396ms / 1560ms | 64K |
| Embed-Sovereign v1.4 · self-hosted · 768d | Embedding | production | 6.2M tok/hr | 38ms / 142ms | 8K |
| Llama 3.3 · 70B · open · self-hosted | Reasoning | staging | 2.4M tok/hr | 540ms / 1980ms | 128K |
| Banking-Arabic v1.0-rc · Phil-1 ⊃ MENA banking | Training | training | — | — | 64K |
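The registry rows above can be modeled as a small typed record. This is an illustrative sketch only — the field names, the `ModelEntry` type, and the production filter are assumptions, not the dashboard's actual schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ModelEntry:
    """One row of the model registry (field names are illustrative)."""
    name: str                           # e.g. "Philotic-1"
    version: str                        # e.g. "v2.4.1"
    kind: str                           # Orchestrator | Reasoning | Fine-tune | Embedding | Training
    status: str                         # production | staging | training
    throughput_tok_hr: Optional[float]  # None while still training
    p50_ms: Optional[int]
    p99_ms: Optional[int]
    ctx_tokens: int

# Two rows copied from the table above for illustration
REGISTRY = [
    ModelEntry("Philotic-1", "v2.4.1", "Orchestrator", "production",
               86.4e6, 412, 1840, 128_000),
    ModelEntry("Banking-Arabic", "v1.0-rc", "Training", "training",
               None, None, None, 64_000),
]

# Filter to production models, as the registry view does
production = [m for m in REGISTRY if m.status == "production"]
```

A frozen dataclass keeps registry rows immutable, so a deployment change means appending a new entry rather than mutating one in place.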
Latency distribution · last 24h
Request latency histogram across all production models
p50 418ms · p95 1240ms · p99 1840ms
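The summary line above (p50 / p95 / p99) can be reproduced from raw request latencies with a nearest-rank percentile. This is a generic sketch of the computation, not the dashboard's actual aggregation pipeline, and the sample latencies are made up:

```python
import math

def percentile(samples, q):
    """Nearest-rank percentile: smallest value with at least q% of samples at or below it."""
    xs = sorted(samples)
    k = math.ceil(q / 100 * len(xs)) - 1
    return xs[min(max(k, 0), len(xs) - 1)]

# Hypothetical request latencies in ms
latencies_ms = [120, 380, 410, 418, 520, 900, 1240, 1500, 1840, 2100]
summary = {q: percentile(latencies_ms, q) for q in (50, 95, 99)}
```

At production volumes (1.8M requests/day) exact percentiles are usually replaced by a streaming estimator such as t-digest or HDR histograms, but the nearest-rank definition is the reference point those approximate.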
Routing
Active traffic split
Philotic-1 v2.4.1 · 68%
Cali-OEM v3.0 · 18%
Cali-Insurance v2.1 · 9%
Claude Sonnet 4.6 · 5%
Failover ready: Llama 3.3 70B
Last canary: 2026-04-26 · 0 regressions
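A minimal sketch of how a split like the one above could drive per-request routing: a weighted draw over healthy targets, falling back to the standby model when none are available. The `route` helper and the health-set interface are assumptions for illustration, not the actual router:

```python
import random

# Active split from the panel above; weights sum to 100
SPLIT = {
    "Philotic-1 v2.4.1": 68,
    "Cali-OEM v3.0": 18,
    "Cali-Insurance v2.1": 9,
    "Claude Sonnet 4.6": 5,
}
FAILOVER = "Llama 3.3 70B"

def route(healthy: set, rng=random) -> str:
    """Pick a model by weighted draw over healthy targets; fall back if none are up."""
    candidates = {m: w for m, w in SPLIT.items() if m in healthy}
    if not candidates:
        return FAILOVER
    models, weights = zip(*candidates.items())
    return rng.choices(models, weights=weights, k=1)[0]
```

Restricting the draw to healthy targets means an outage reweights traffic across the survivors automatically; the staged Llama 3.3 70B only takes traffic when every production target is down.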
Inference volume · 30d
Latency vs cost · per model
p50 latency against $/MTok — shape encodes model kind
Rate limits & quotas
| Model | Usage (rps / limit) | Error rate |
|---|---|---|
| Philotic-1 v2.4.1 | 184 / 250 | 0.04% |
| Claude Sonnet v4.6 | 96 / 250 | 0.08% |
| Cali-OEM v3.0 | 142 / 250 | 0.10% |
| Cali-Insurance v2.1 | 78 / 250 | 0.11% |
| Embed-Sovereign v1.4 | 1240 / 250 | 0.01% |
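The quota figures above can be checked programmatically. A sketch that flags models consuming most of their rps limit — the numbers are copied from the panel, while the `over_quota` helper and the 80% headroom threshold are assumed policy, not an actual alerting rule:

```python
# Usage and limits from the quota panel above
QUOTAS = {
    "Philotic-1 v2.4.1":    {"rps": 184,  "limit": 250, "err_pct": 0.04},
    "Claude Sonnet v4.6":   {"rps": 96,   "limit": 250, "err_pct": 0.08},
    "Cali-OEM v3.0":        {"rps": 142,  "limit": 250, "err_pct": 0.10},
    "Cali-Insurance v2.1":  {"rps": 78,   "limit": 250, "err_pct": 0.11},
    "Embed-Sovereign v1.4": {"rps": 1240, "limit": 250, "err_pct": 0.01},
}

def over_quota(quotas, headroom=0.8):
    """Models consuming more than `headroom` of their rps limit."""
    return [m for m, q in quotas.items() if q["rps"] > headroom * q["limit"]]
```

Note that Embed-Sovereign v1.4 reports 1240 rps against a 250 rps limit — either the limit shown is stale or embedding traffic is metered under a separate quota; a check like this would surface that discrepancy immediately.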