Benchmarks

How Inserloft tracks and reports model performance.

Benchmarks give every model on Pixel a comparable, reproducible performance record.

Each model version tracked in the Model Registry can report benchmark results alongside its Model Card, covering areas such as:

  • General reasoning and knowledge
  • Coding ability
  • Long-context comprehension
  • Instruction following
  • Latency and throughput, particularly relevant for Baby LLMs

Benchmark results are versioned together with the model, so a Kyro 3 result and a later Kyro 3.5 result can be compared directly without ambiguity about which weights produced which score.
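
One way to picture version-keyed results is a small in-memory store, sketched below. This is an illustration only, not Inserloft's actual API: `BenchmarkResult`, `ResultStore`, the benchmark name `"coding"`, and the scores are all hypothetical placeholders. The idea is that each result is keyed by model, version, and benchmark together, so a score can never be detached from the weights that produced it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkResult:
    """Hypothetical record: one score for one model version on one benchmark."""
    model: str
    version: str
    benchmark: str
    score: float


class ResultStore:
    """Hypothetical store that keys every result by (model, version, benchmark)."""

    def __init__(self) -> None:
        self._results: dict[tuple[str, str, str], BenchmarkResult] = {}

    def record(self, result: BenchmarkResult) -> None:
        key = (result.model, result.version, result.benchmark)
        # Refuse silent overwrites so a stored score always maps to one set of weights.
        if key in self._results:
            raise ValueError(f"result already recorded for {key}")
        self._results[key] = result

    def delta(self, model: str, benchmark: str, old: str, new: str) -> float:
        """Return the score change from version `old` to version `new`."""
        before = self._results[(model, old, benchmark)]
        after = self._results[(model, new, benchmark)]
        return after.score - before.score


# Placeholder scores for illustration only; these are not real benchmark results.
store = ResultStore()
store.record(BenchmarkResult("Kyro", "3", "coding", 0.50))
store.record(BenchmarkResult("Kyro", "3.5", "coding", 0.75))
change = store.delta("Kyro", "coding", old="3", new="3.5")
```

Because the version is part of the key, recording a second result for the same model version and benchmark raises an error instead of replacing the earlier score.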