SmartBreeds.io

Reliability ledger

Artifact-backed Tsinghua100 dense results.

Public metrics for the frozen DINOv2-small SmartBreeds research harness: accuracy, calibration, conformal coverage, per-breed coverage, and the weak-class false-inclusion diagnostic.

Headline metrics

Current artifact summary

Weak-class diagnostic

Combined-confuser false-inclusion

These values come from the local target-vs-confuser probe. They are diagnostic rows, not replacements for the 100-way classifier.

Per-class breakdown

Coverage and calibration by breed

Each breed has 20 held-out test examples. Per-class ECE (expected calibration error) is a diagnostic value: the 10-bin ECE of the top-class confidence, computed within that breed's subset.
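As a rough illustration of the per-breed ECE described above, the sketch below computes a 10-bin expected calibration error from top-class confidences and correctness flags. The function name and the equal-width binning are assumptions made for illustration; the harness may bin differently.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Top-confidence ECE with equal-width bins (binning scheme is an assumption).

    confidences: top-class probabilities in [0, 1], one per example.
    correct:     truthy where the top-class prediction was right.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one example is required")

    # Bin i holds confidences in [i/n_bins, (i+1)/n_bins); 1.0 goes to the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, bool(hit)))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, h in members if h) / len(members)
        # Weight each bin's |accuracy - confidence| gap by its share of examples.
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


# Hypothetical toy input, not harness data.
toy_ece = expected_calibration_error([0.95, 0.95, 0.55, 0.55], [1, 1, 1, 0])
```

With only 20 examples spread over 10 bins, most bins hold very few items, so the value is noisy. That is consistent with treating it as a diagnostic rather than a headline number.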

100 breeds

Breed Coverage ECE Set size Misses

Methodology

What this endpoint proves

Protocol

The protocol uses frozen DINOv2-small embeddings, nearest-prototype classification, temperature scaling fitted on the calibration split, and the selected global RAPS (Regularized Adaptive Prediction Sets) conformal procedure at target coverage 0.90.
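The last two protocol steps can be sketched as follows, assuming per-class logits are already available from the prototype classifier. This is a minimal, non-randomized RAPS variant with one global threshold. All function names, the regularization values `lam` and `k_reg`, and the toy inputs are hypothetical and not taken from the harness.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; the temperature would be fit on the calibration split."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def raps_scores(probs, lam=0.01, k_reg=5):
    """Non-randomized RAPS score per class.

    score(c) = cumulative probability down to c's rank
               + lam * max(0, rank - k_reg)
    lam and k_reg are illustrative defaults, not the harness's selected values.
    """
    order = sorted(range(len(probs)), key=lambda c: -probs[c])
    scores = [0.0] * len(probs)
    cumulative = 0.0
    for rank, c in enumerate(order, start=1):
        cumulative += probs[c]
        scores[c] = cumulative + lam * max(0, rank - k_reg)
    return scores


def calibrate_threshold(cal_probs, cal_labels, alpha=0.10, **raps_kwargs):
    """Global threshold: the ceil((n + 1) * (1 - alpha))-th smallest true-label score."""
    true_scores = sorted(
        raps_scores(p, **raps_kwargs)[y] for p, y in zip(cal_probs, cal_labels)
    )
    n = len(true_scores)
    # Simplification: clamp to n. Strictly, the threshold is infinite when k > n.
    k = min(n, math.ceil((n + 1) * (1 - alpha)))
    return true_scores[k - 1]


def prediction_set(probs, threshold, **raps_kwargs):
    """All classes whose score is within the threshold (may be empty in this variant)."""
    scores = raps_scores(probs, **raps_kwargs)
    return [c for c, s in enumerate(scores) if s <= threshold]
```

Here `alpha=0.10` corresponds to the stated target coverage of 0.90. Empirical coverage on the test split would then be the fraction of examples whose true label falls inside `prediction_set(...)`.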

Refresh

Boundary

This is a research-harness result on a Tsinghua Dogs subset. It is not a full benchmark claim, a production classifier guarantee, or permission to publish dataset-derived dog images.