Benchmarks

How Inserloft tracks and reports model performance.

Benchmarks give every model on Pixel a comparable, reproducible performance record.

Each model version tracked in the Model Registry can report benchmark results alongside its Model Card, covering areas such as:

General reasoning and knowledge
Coding ability
Long-context comprehension
Instruction following
Latency and throughput, particularly relevant for Baby LLMs
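As a rough illustration, a benchmark result could be modeled as a record bound to one model version and tagged with one of the areas above. This is a minimal sketch under stated assumptions: `BenchmarkArea`, `BenchmarkResult`, `results_for_area`, and every field name are hypothetical and do not describe the Model Registry's actual schema. Quality areas and performance areas share one record shape here, with `unit` distinguishing a score from a latency or throughput measurement.

```python
from dataclasses import dataclass
from enum import Enum


class BenchmarkArea(Enum):
    # Hypothetical categories mirroring the areas listed above.
    REASONING = "general_reasoning_and_knowledge"
    CODING = "coding"
    LONG_CONTEXT = "long_context_comprehension"
    INSTRUCTION_FOLLOWING = "instruction_following"
    LATENCY = "latency"
    THROUGHPUT = "throughput"


@dataclass(frozen=True)
class BenchmarkResult:
    # Illustrative fields only; not the registry's real schema.
    model: str           # model family, e.g. "Kyro"
    version: str         # the exact version whose weights produced the value
    area: BenchmarkArea
    benchmark: str       # name of the specific benchmark suite (hypothetical)
    value: float
    unit: str            # e.g. "accuracy", "ms", "tokens/s"


def results_for_area(results, area):
    """Return only the results that belong to the given benchmark area."""
    return [r for r in results if r.area is area]
```

Keeping latency and throughput in the same record type means a Baby LLM's speed figures can be filtered and reported with the same code path as its quality scores.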

Benchmark results are versioned together with the model, so a Kyro 3 result and a later Kyro 3.5 result can be compared directly without ambiguity about which weights produced which score.
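To make the versioning point concrete, here is a minimal sketch of a version-keyed comparison, assuming a hypothetical `VersionedResult` record and `compare_versions` helper (neither is part of any documented Inserloft API). Each score is indexed by model, version, and benchmark, so comparing Kyro 3 with Kyro 3.5 can only read the scores those exact versions produced; a missing or duplicated entry raises an error instead of yielding an ambiguous comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionedResult:
    # Hypothetical record: the score is bound to the exact model version
    # whose weights produced it.
    model: str
    version: str
    benchmark: str
    score: float


def compare_versions(results, model, benchmark, old_version, new_version):
    """Return (old_score, new_score, delta) for one benchmark.

    Raises ValueError on duplicate results for the same (model, version,
    benchmark) key, and KeyError if either version has no result.
    """
    index = {}
    for r in results:
        key = (r.model, r.version, r.benchmark)
        if key in index:
            raise ValueError(f"duplicate result for {key}")
        index[key] = r.score
    old = index[(model, old_version, benchmark)]
    new = index[(model, new_version, benchmark)]
    return old, new, new - old
```

Failing loudly on duplicates is a deliberate choice in this sketch: silently keeping the last entry would reintroduce exactly the ambiguity about which weights produced which score.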
