Benchmarks
How Inserloft tracks and reports model performance.
Benchmarks give every model on Pixel a comparable, reproducible performance record.
Each model version tracked in the Model Registry can report benchmark results alongside its Model Card, covering areas such as:
- General reasoning and knowledge
- Coding ability
- Long-context comprehension
- Instruction following
- Latency and throughput (particularly relevant for Baby LLMs)
Benchmark results are versioned together with the model, so a Kyro 3 result and a later Kyro 3.5 result can be compared directly without ambiguity about which weights produced which score.
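To show why versioning results with the model removes ambiguity, here is a minimal sketch. It assumes each result is keyed by model name, model version, and benchmark identifier. The names `BenchmarkResult`, `Category`, and `compare_versions` are hypothetical illustrations and are not part of any documented Model Registry API.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    # Hypothetical labels mirroring the benchmark areas listed above.
    REASONING = "general_reasoning_and_knowledge"
    CODING = "coding"
    LONG_CONTEXT = "long_context_comprehension"
    INSTRUCTION_FOLLOWING = "instruction_following"
    LATENCY_THROUGHPUT = "latency_and_throughput"


@dataclass(frozen=True)
class BenchmarkResult:
    model: str          # model family, e.g. "Kyro"
    version: str        # model version the score belongs to, e.g. "3" or "3.5"
    benchmark: str      # hypothetical benchmark identifier
    category: Category
    score: float


def compare_versions(results, model, old_version, new_version):
    """Return {benchmark: new_score - old_score} for benchmarks both versions report.

    Assumes at most one result per (model, version, benchmark); a later
    duplicate would overwrite an earlier one.
    """
    def scores_for(version):
        # Collect only the scores produced by this exact model version.
        return {
            r.benchmark: r.score
            for r in results
            if r.model == model and r.version == version
        }

    old = scores_for(old_version)
    new = scores_for(new_version)
    # Only benchmarks reported for both versions are directly comparable.
    return {name: new[name] - old[name] for name in sorted(old.keys() & new.keys())}
```

Because every score carries its version, `compare_versions(results, "Kyro", "3", "3.5")` can only pair a Kyro 3 score with a Kyro 3.5 score on the same benchmark. Note that the sign of a delta needs per-category interpretation: a higher score is better for most areas, but for latency a lower number is better.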