Reliability ledger
Artifact-backed Tsinghua100 dense results.
Public metrics for the frozen DINOv2-small SmartBreeds research harness: accuracy, calibration, conformal coverage, per-breed coverage, and the weak-class false-inclusion diagnostic.
Headline metrics
Current artifact summary
Weak-class diagnostic
Combined-confuser false-inclusion
These values come from the local target-vs-confuser probe. They are diagnostic rows, not replacements for the 100-way classifier.
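The page does not define how a false inclusion is counted. The sketch below is a minimal example that assumes one definition: a confuser-breed test image whose conformal prediction set contains the target breed counts as a false inclusion, and "combined" means the rate is pooled over all confuser breeds. The function name is hypothetical.

```python
from typing import Iterable, Sequence, Set


def false_inclusion_rate(
    pred_sets: Sequence[Set[int]],
    labels: Sequence[int],
    target: int,
    confusers: Iterable[int],
) -> float:
    """Hypothetical metric: share of confuser-labelled examples whose
    prediction set includes the target class, pooled over all confusers."""
    confuser_ids = set(confusers)
    hits = 0
    total = 0
    for pred_set, y in zip(pred_sets, labels):
        if y in confuser_ids:  # only examples whose true breed is a confuser
            total += 1
            if target in pred_set:  # the target breed leaked into the set
                hits += 1
    return hits / total if total else float("nan")
```

Examples whose true label is the target itself are ignored, so this rate measures only how often the sets mistake a confuser for the target.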
Per-class breakdown
Coverage and calibration by breed
Each breed has 20 held-out test examples. Per-class ECE is a diagnostic only: the 10-bin expected calibration error over top-class confidence, computed within that breed's subset.
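The per-breed columns can be sketched as follows. This assumes equal-width confidence bins and defines coverage as the fraction of a breed's test images whose prediction set contains the true breed. Both choices are assumptions, and `top_confidence_ece` and `per_breed_report` are hypothetical names, not the harness's code.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple


def top_confidence_ece(
    probs: Sequence[Sequence[float]], labels: Sequence[int], n_bins: int = 10
) -> float:
    """ECE over equal-width bins of top-class confidence (assumed binning)."""
    n = len(labels)
    if n == 0:
        return float("nan")
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        pred = max(range(len(p)), key=lambda j: p[j])
        conf = p[pred]
        # Bins are [k/n_bins, (k+1)/n_bins); a confidence of exactly 1.0 goes in the last bin.
        k = min(int(conf * n_bins), n_bins - 1)
        bins[k].append((conf, pred == y))
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            acc = sum(1 for _, ok in b if ok) / len(b)
            ece += len(b) / n * abs(acc - avg_conf)  # bin weight * |accuracy - confidence|
    return ece


def per_breed_report(
    probs: Sequence[Sequence[float]],
    pred_sets: Sequence[Set[int]],
    labels: Sequence[int],
    n_bins: int = 10,
) -> Dict[int, Dict[str, float]]:
    """Per-class test count, conformal coverage, and within-class top-confidence ECE."""
    by_class: Dict[int, List[int]] = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    report: Dict[int, Dict[str, float]] = {}
    for c, idx in sorted(by_class.items()):
        covered = sum(1 for i in idx if labels[i] in pred_sets[i])
        report[c] = {
            "n": len(idx),
            "coverage": covered / len(idx),
            "ece": top_confidence_ece(
                [probs[i] for i in idx], [labels[i] for i in idx], n_bins
            ),
        }
    return report
```

With 20 examples spread over 10 bins, most bins in a breed hold only a few images. That sparsity is one reason to read per-breed ECE as a diagnostic rather than a precise estimate.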
100 breeds
Methodology
What this endpoint proves
Protocol
Frozen DINOv2-small embeddings, nearest-prototype classification, temperature scaling fit on the calibration split, and global RAPS (the selected conformal configuration) at a target coverage of 0.90.
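The protocol steps after embedding extraction can be sketched in plain Python. This is a minimal illustration, not the harness implementation, and it rests on several assumptions:

- Logits are cosine similarities to class-mean prototypes.
- The temperature is picked by grid search on calibration NLL, standing in for a gradient-based fit.
- "Global" means a single `(lam, k_reg, qhat)` shared by all breeds.
- RAPS is used in its non-randomized form (u = 1), which is slightly conservative, and every set keeps at least the top-1 class.

The selected `lam` and `k_reg` values are not stated on this page, so they are left as inputs. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Set

Vector = Sequence[float]


def _unit(v: Vector) -> List[float]:
    """L2-normalize a vector (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def build_prototypes(embs: Sequence[Vector], labels: Sequence[int], num_classes: int) -> List[List[float]]:
    """Hypothetical prototype: normalized sum of normalized embeddings per class."""
    sums = [[0.0] * len(embs[0]) for _ in range(num_classes)]
    for e, y in zip(embs, labels):
        sums[y] = [a + b for a, b in zip(sums[y], _unit(e))]
    return [_unit(s) for s in sums]


def cosine_logits(emb: Vector, protos: List[List[float]]) -> List[float]:
    """Assumed logits: cosine similarity to each class prototype."""
    e = _unit(emb)
    return [sum(a * b for a, b in zip(e, p)) for p in protos]


def softmax(z: Sequence[float], temperature: float) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    ex = [math.exp((v - m) / temperature) for v in z]
    s = sum(ex)
    return [v / s for v in ex]


def fit_temperature(cal_logits: Sequence[Sequence[float]], cal_labels: Sequence[int], grid: Sequence[float]) -> float:
    """Grid-search stand-in: the temperature with the lowest calibration-split NLL."""
    def nll(t: float) -> float:
        return -sum(math.log(softmax(z, t)[y]) for z, y in zip(cal_logits, cal_labels)) / len(cal_labels)
    return min(grid, key=nll)


def _ranked(probs: Sequence[float]) -> List[int]:
    # Class indices by descending probability; ties keep index order (stable sort).
    return sorted(range(len(probs)), key=lambda j: -probs[j])


def raps_score(probs: Sequence[float], label: int, lam: float, k_reg: int) -> float:
    """Non-randomized RAPS score: sorted mass through the true label + lam * max(0, rank - k_reg)."""
    cum = 0.0
    for rank, j in enumerate(_ranked(probs), start=1):
        cum += probs[j]
        if j == label:
            return cum + lam * max(0, rank - k_reg)
    raise ValueError("label out of range")


def conformal_quantile(scores: Sequence[float], alpha: float) -> float:
    """Split-conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest score."""
    n = len(scores)
    k = max(1, math.ceil((n + 1) * (1 - alpha)))
    return math.inf if k > n else sorted(scores)[k - 1]


def raps_set(probs: Sequence[float], qhat: float, lam: float, k_reg: int) -> Set[int]:
    """Add classes in rank order while their score stays <= qhat (top-1 always kept)."""
    out: Set[int] = set()
    cum = 0.0
    for rank, j in enumerate(_ranked(probs), start=1):
        cum += probs[j]
        if out and cum + lam * max(0, rank - k_reg) > qhat:
            break
        out.add(j)
    return out
```

A target coverage of 0.90 corresponds to `alpha = 0.10`. Under these assumptions the workflow runs in four steps:

1. Fit `T` on the calibration split.
2. Score each calibration example with `raps_score(softmax(cosine_logits(e, protos), T), y, lam, k_reg)`.
3. Set `qhat = conformal_quantile(scores, 0.10)`.
4. Build each test set with `raps_set`, using the same `T`, `lam`, and `k_reg`.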
Refresh
Boundary
This is a research-harness result on a Tsinghua Dogs subset. It is not a full benchmark claim, a production classifier guarantee, or permission to publish dataset-derived dog images.