Full results: XSA vs Baseline on CC12M

All runs were trained for 32 epochs on CC12M at 224 px resolution. Zero-shot evaluation is on ImageNet.
Model | Tokens | Type | Val Loss | ZS Top-1 | ZS Top-5 | Δ Top-1
Δ Top-1 = XSA minus matched Baseline (percentage points).
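
The Δ Top-1 column can be reproduced from the other columns. The following is a minimal sketch, assuming that "matched" means the Baseline run with the same Model and Tokens values. The function name `delta_top1`, the dictionary keys, and every number in `rows` are hypothetical placeholders for illustration, not values from the table.

```python
from typing import Dict, List, Tuple


def delta_top1(rows: List[dict]) -> Dict[Tuple[str, int], float]:
    """Return XSA minus Baseline zero-shot top-1, in percentage points.

    Assumption: a Baseline run is "matched" to an XSA run when both share
    the same (model, tokens) pair.
    """
    by_key: Dict[Tuple[str, int], Dict[str, float]] = {}
    for r in rows:
        key = (r["model"], r["tokens"])
        by_key.setdefault(key, {})[r["type"]] = r["zs_top1"]

    # Only report pairs where both run types are present.
    return {
        key: round(pair["XSA"] - pair["Baseline"], 2)
        for key, pair in by_key.items()
        if "XSA" in pair and "Baseline" in pair
    }


# Placeholder rows: model names, token counts, and accuracies are made up.
rows = [
    {"model": "model-a", "tokens": 196, "type": "Baseline", "zs_top1": 10.0},
    {"model": "model-a", "tokens": 196, "type": "XSA", "zs_top1": 12.5},
    {"model": "model-b", "tokens": 49, "type": "Baseline", "zs_top1": 20.0},
]

deltas = delta_top1(rows)
print(deltas)
```

A positive value means the XSA run scored higher than its matched Baseline. Runs without a counterpart, such as the placeholder `model-b` above, are skipped rather than reported with a partial value.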