Full results: XSA vs Baseline on CC12M
All runs were trained for 32 epochs on CC12M at 224px resolution and evaluated zero-shot on ImageNet.
Model | Tokens | Type | Val Loss | ZS Top-1 | ZS Top-5 | Δ Top-1
Δ Top-1 = XSA minus matched Baseline (percentage points).
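The Δ Top-1 definition above can be sketched as a small helper. This is only an illustration under stated assumptions: "matched" is taken to mean the same Model and Tokens values, the function name and dictionary keys are hypothetical, and the numbers are placeholders, not results from this table.

```python
def delta_top1(rows):
    """Return {(model, tokens): XSA top-1 minus Baseline top-1}, in percentage points.

    Assumption: a Baseline row matches an XSA row when both share the
    same model and token count. Pairs missing either side are skipped.
    """
    by_key = {}
    for r in rows:
        by_key.setdefault((r["model"], r["tokens"]), {})[r["type"]] = r["zs_top1"]
    return {
        key: round(pair["XSA"] - pair["Baseline"], 2)
        for key, pair in by_key.items()
        if "XSA" in pair and "Baseline" in pair
    }


# Placeholder rows for illustration only (not taken from the results table).
rows = [
    {"model": "model-a", "tokens": 100, "type": "Baseline", "zs_top1": 30.0},
    {"model": "model-a", "tokens": 100, "type": "XSA", "zs_top1": 31.5},
    {"model": "model-b", "tokens": 100, "type": "XSA", "zs_top1": 40.0},  # no matched Baseline
]
deltas = delta_top1(rows)
print(deltas)
```

A positive value means XSA scored higher than its matched Baseline. The difference is in percentage points, not a relative percentage change.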