Public app plus private research track
SmartBreeds.io
Dog-breed photo search backed by a calibrated fine-grained vision study on a Tsinghua Dogs subset.
Current research result
Aggregate calibration is strong. Per-class coverage is still the hard part.
The Tsinghua100 dense run uses 8,000 images across 100 breeds. Class scores from DINOv2-small prototypes are temperature-scaled, then used to build RAPS (Regularized Adaptive Prediction Sets) conformal prediction sets.
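The page does not spell out how prototype scores are computed. A minimal sketch, assuming each breed's prototype is the L2-normalized mean of its training-image embeddings (for example DINOv2-small CLS features, extracted elsewhere) and each class score is a cosine similarity. `build_prototypes` and `prototype_logits` are hypothetical names, not the project's code.

```python
import numpy as np

def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit L2 norm (eps guards against division by zero)."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def build_prototypes(embeddings: np.ndarray, labels: np.ndarray, num_classes: int) -> np.ndarray:
    """Hypothetical helper: one prototype per class, the normalized mean of its
    normalized embeddings. Assumes every class has at least one image."""
    z = l2_normalize(embeddings)
    protos = np.stack([z[labels == c].mean(axis=0) for c in range(num_classes)])
    return l2_normalize(protos)

def prototype_logits(embeddings: np.ndarray, prototypes: np.ndarray) -> np.ndarray:
    """Hypothetical helper: cosine similarity of each image to each class prototype."""
    return l2_normalize(embeddings) @ prototypes.T
```

Cosine scores live in [-1, 1], which is one reason a learned temperature matters before the scores are read as probabilities.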
- Top-1 accuracy is 0.846, and ECE after temperature scaling is 0.051.
- The selected RAPS configuration reaches 0.968 aggregate coverage with a mean set size of 2.59.
- Per-class disaggregation exposes weak breeds instead of hiding them behind aggregate metrics.
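Temperature scaling divides every logit by one scalar T fitted on held-out data. Because any T > 0 leaves the argmax unchanged, it can move ECE but not top-1 accuracy. A minimal sketch, assuming a plain grid search over T that minimizes negative log-likelihood on a calibration split (gradient-based optimizers such as L-BFGS are the more common choice); the function names and grid range are illustrative assumptions.

```python
import numpy as np

def softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Row-wise softmax of logits / temperature."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits: np.ndarray, labels: np.ndarray, temperature: float) -> float:
    """Mean negative log-likelihood of the true labels at a given temperature."""
    probs = softmax(logits, temperature)
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12)))

def fit_temperature(logits: np.ndarray, labels: np.ndarray, grid=None) -> float:
    """Assumed approach: pick the grid temperature with the lowest calibration NLL."""
    if grid is None:
        # Cosine-similarity logits sit in [-1, 1], so temperatures well below 1 are plausible.
        grid = np.logspace(-3, 1, 400)
    losses = [nll(logits, labels, t) for t in grid]
    return float(grid[int(np.argmin(losses))])
```

The fitted T should come from a calibration split disjoint from the 2,000-image test split, otherwise the reported ECE is optimistic.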
Measured, not inflated
The useful story is narrower than a product claim.
SmartBreeds is not presented as a final benchmark or as a guarantee about deployed behavior. The result is a reproducible calibration study with its failure modes left visible.
Reliability visuals
Charts are part of the claim boundary.
These figures are public-safe research visuals. Dataset-derived dog photos stay private until the license review is complete.
Reliability readout
| Quantity | Value | Scope |
|---|---|---|
| ECE | 0.0508 | 2,000-image test split |
| Top-1 | 0.8455 | DINOv2-small prototypes |
| Top-confidence bin (0.9-1.0) | 0.968 mean confidence / 0.976 accuracy | 918 predictions |
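ECE is the count-weighted average gap between accuracy and mean confidence across confidence bins. A minimal sketch, assuming ten equal-width bins over top-1 confidence; that is consistent with the 0.9-1.0 row above but is not confirmed by the page. `reliability_bins` and `expected_calibration_error` are hypothetical names.

```python
import numpy as np

def reliability_bins(confidences, correct, n_bins=10):
    """Equal-width bins over [0, 1]; returns (lo, hi, count, mean_conf, accuracy)
    for every non-empty bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Bin b covers (edges[b], edges[b + 1]]; a confidence of exactly 0 lands in bin 0.
    idx = np.clip(np.searchsorted(edges, confidences, side="left") - 1, 0, n_bins - 1)
    rows = []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            rows.append((edges[b], edges[b + 1], int(mask.sum()),
                         float(confidences[mask].mean()), float(correct[mask].mean())))
    return rows

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE = sum over bins of (bin count / N) * |accuracy - mean confidence|."""
    n = len(confidences)
    return sum(count / n * abs(acc - conf)
               for _, _, count, conf, acc in reliability_bins(confidences, correct, n_bins))
```

Plugging the top row of the readout into the same formula, the 0.9-1.0 bin alone contributes (918 / 2000) × |0.976 - 0.968| ≈ 0.0037 of the reported 0.0508.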
Lowest-coverage classes under global RAPS
| Breed | Coverage | Mean set size |
|---|---|---|
| great_dane | 0.80 | 3.45 |
| lhasa | 0.85 | 2.35 |
| tibetan_mastiff | 0.85 | 2.50 |
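RAPS adds classes in descending probability order, charges a penalty `lam` for every class past rank `k_reg`, and calibrates one global threshold for all breeds. A minimal sketch of the non-randomized variant (the randomization term from the RAPS paper is omitted, which makes sets slightly more conservative), followed by the per-class coverage and set-size breakdown used in the table above. `lam`, `k_reg`, `alpha`, and all function names are placeholders, not the study's selected configuration.

```python
import numpy as np

def _sorted_totals(probs, lam, k_reg):
    """Sort classes by descending probability; return the order and the
    penalized cumulative mass at each rank."""
    order = np.argsort(-probs, axis=1)
    sorted_p = np.take_along_axis(probs, order, axis=1)
    ranks = np.arange(1, probs.shape[1] + 1)  # 1-based rank of each sorted position
    totals = np.cumsum(sorted_p, axis=1) + lam * np.maximum(0, ranks - k_reg)
    return order, totals

def raps_scores(probs, labels, lam=0.01, k_reg=5):
    """Non-randomized RAPS score of the true label for each calibration image."""
    order, totals = _sorted_totals(probs, lam, k_reg)
    true_pos = np.argmax(order == labels[:, None], axis=1)  # sorted position of true label
    return totals[np.arange(len(labels)), true_pos]

def conformal_threshold(scores, alpha):
    """Finite-sample conformal quantile: the ceil((n + 1)(1 - alpha))-th smallest score."""
    s = np.sort(np.asarray(scores))
    k = int(np.ceil((len(s) + 1) * (1 - alpha)))
    return float(s[k - 1]) if k <= len(s) else float("inf")

def raps_sets(probs, qhat, lam=0.01, k_reg=5):
    """Boolean (n, K) mask of set membership; the top-1 class is always kept."""
    order, totals = _sorted_totals(probs, lam, k_reg)
    keep_sorted = totals <= qhat
    keep_sorted[:, 0] = True  # design choice: never return an empty set
    mask = np.zeros_like(keep_sorted)
    np.put_along_axis(mask, order, keep_sorted, axis=1)  # map back to class order
    return mask

def per_class_report(mask, labels, num_classes):
    """Per-class (coverage, mean set size) for classes present in `labels`."""
    covered = mask[np.arange(len(labels)), labels]
    sizes = mask.sum(axis=1)
    return {c: (float(covered[labels == c].mean()), float(sizes[labels == c].mean()))
            for c in range(num_classes) if np.any(labels == c)}
```

A single global threshold only controls coverage on average over images, which is why a breed such as great_dane can sit at 0.80 while the aggregate reads 0.968.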
Next research gate
Weak-class coverage gets priority over bigger claims.
The next work targets Lhasa, Tibetan Mastiff, Great Dane, and the worst class under Mondrian (class-conditional) conformal prediction. The research question is whether structured pooling can tighten sets without burying class-specific failures.
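Mondrian conformal prediction calibrates a separate threshold for each class using only that class's calibration images, so each breed gets its own coverage target instead of sharing one global threshold. A minimal sketch, assuming a generic per-label score matrix where lower means more conforming (for example 1 - p, a simpler score than RAPS); the function names are hypothetical, and the sketch does not implement the structured pooling the research question is about.

```python
import numpy as np

def mondrian_thresholds(cal_scores, cal_labels, num_classes, alpha):
    """One conformal threshold per class, from that class's calibration scores only."""
    qhat = np.full(num_classes, np.inf)  # too few examples -> include the class unconditionally
    for c in range(num_classes):
        s = np.sort(cal_scores[cal_labels == c])
        n = len(s)
        k = int(np.ceil((n + 1) * (1 - alpha)))  # 1-based rank of the conformal quantile
        if 1 <= k <= n:
            qhat[c] = s[k - 1]
    return qhat

def mondrian_sets(score_matrix, qhat):
    """score_matrix[i, k] is the score image i would get if k were its label;
    class k enters the set when that score is within class k's own threshold."""
    return score_matrix <= qhat[None, :]
```

With n_c calibration images in class c, the rank ceil((n_c + 1)(1 - alpha)) exceeds n_c whenever n_c < (1 - alpha) / alpha, which means fewer than 9 images at alpha = 0.1, and the threshold becomes infinite. Class-conditional guarantees are paid for with larger sets on small classes.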
Product routes