CIFAR-10 Image Classification
Custom CNN · MobileNetV2 · ResNet-18: a controlled comparison of from-scratch vs transfer learning
Tip: These models were trained on 32×32 CIFAR-10 thumbnails. They work best on simple, centred images of a single object; use the example images below for reliable results.
On real-world high-resolution photos the Custom CNN (trained on 32×32) often looks more plausible because its 32×32 resize pipeline matches its training distribution, while MobileNetV2 sees a 224×224 upscale it was never trained on. This is the classic domain gap, not evidence that the smaller model is better.
Example images (click to load)
Compare All Deployed Models
Load an image above, then click the button to classify it with the Custom CNN, MobileNetV2, and ResNet-18 simultaneously.
Architecture Benchmark
All models were evaluated under a consistent training budget on the full CIFAR-10 dataset, using Adam, lr=0.001, 15 epochs, and batch size 128, with architecture-specific adaptations where required.
Deployed in this demo: Custom CNN, MobileNetV2, ResNet-18.
EfficientNet-B0 and ViT-Small are included in the table for study comparison only.
| Model | Accuracy | Trainable Params | Total Params | Size | CPU Latency | Strategy | Status |
|---|---|---|---|---|---|---|---|
| Custom CNN | 48.40% | 2,462,282 | 2,462,282 | 9.42 MB | 1.38 ms | Trained from scratch | live |
| MobileNetV2 | 86.91% | 12,810 | 2,236,682 | 8.76 MB | 17.22 ms | Transfer learning (frozen ImageNet backbone) | live |
| ResNet-18 | 87.48% | 5,130 | 11,181,642 | 42.73 MB | 9.80 ms | Transfer learning (frozen ImageNet backbone, linear-probe head) | live |
| EfficientNet-B0 | 83.75% | 12,810 | 4,020,358 | 15.62 MB | 11.20 ms | Transfer learning (frozen backbone) | study only |
| ViT-Small | 62.30% | 4,756,746 | 4,756,746 | 12.21 MB | 3.45 ms | Minimal ViT trained from scratch | study only |
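The shared training budget above can be sketched as follows. The model here is a stand-in placeholder; the optimiser and scheduler settings match the stated budget (Adam, lr 0.001, weight decay 0.0001, cosine annealing, 15 epochs, batch 128).

```python
import torch
from torch import nn, optim

# Stand-in model: the real study used the architectures in the table above.
model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 10))

# Shared budget: Adam, lr 0.001, weight decay 0.0001,
# cosine annealing over 15 epochs, batch size 128.
EPOCHS, BATCH_SIZE = 15, 128
optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=EPOCHS)
```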
Key Findings
- ResNet-18: 87.48% accuracy with only 5,130 trainable parameters (fewest of any deployed model).
- MobileNetV2: 86.91% accuracy with 12,810 trainable parameters (frozen backbone + linear head).
- Custom CNN: 48.40% accuracy with 2.46M trainable parameters (trained from scratch; demonstrates the transfer-learning gap).
- ResNet-18 achieves +39.1 percentage points over the Custom CNN with ~480× fewer trainable parameters.
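The tiny trainable-parameter counts fall straight out of the linear-probe arithmetic: a single linear layer over the backbone's feature vector (512-d for ResNet-18, 1280-d for MobileNetV2), with one weight row plus one bias per class.

```python
def linear_head_params(in_features: int, num_classes: int = 10) -> int:
    """Parameters of a single linear layer: weight matrix plus bias vector."""
    return in_features * num_classes + num_classes

print(linear_head_params(512))   # ResNet-18 feature dim   -> 5130
print(linear_head_params(1280))  # MobileNetV2 feature dim -> 12810
```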
Top-5 Hardest Confusion Pairs
Bidirectional misclassification counts on the full 10,000-image test set, comparing the Custom CNN (before transfer learning) to MobileNetV2 (after).
| Pair | Custom CNN | MobileNetV2 | Reduction |
|---|---|---|---|
| Truck ↔ Automobile | 432 | 97 | 78% |
| Ship ↔ Airplane | 375 | 83 | 78% |
| Cat ↔ Dog | 333 | 243 | 27% |
| Horse ↔ Dog | 293 | 68 | 77% |
| Bird ↔ Deer | 180 | 78 | 57% |
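A bidirectional count for a class pair is simply `cm[i][j] + cm[j][i]` over the confusion matrix. A minimal sketch of the ranking, shown on a toy 3-class matrix rather than the real CIFAR-10 one:

```python
from itertools import combinations

def hardest_pairs(cm, labels, top_k=5):
    """Sum cm[i][j] + cm[j][i] for every unordered class pair and
    return the top_k pairs with the most bidirectional confusions."""
    pairs = []
    for i, j in combinations(range(len(labels)), 2):
        pairs.append((cm[i][j] + cm[j][i], labels[i], labels[j]))
    pairs.sort(reverse=True)
    return pairs[:top_k]

# Toy 3-class confusion matrix (rows = true class, columns = predicted):
cm = [[90, 6, 4],
      [8, 85, 7],
      [2, 9, 89]]
print(hardest_pairs(cm, ["cat", "dog", "frog"], top_k=2))
# -> [(16, 'dog', 'frog'), (14, 'cat', 'dog')]
```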
About This Project
How much does a pretrained ImageNet backbone actually help compared to training from scratch, when architectures share the same training budget?
Dataset
CIFAR-10: 60,000 32×32 RGB images across 10 classes (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck). 50,000 training images · 10,000 test images.
⚠️ Domain Gap: Real-World Images
These models were trained exclusively on 32×32 thumbnail-style images. High-resolution or complex real-world photos may produce unexpected predictions; this is the classic domain gap. For best results use the example images below, or simple close-up shots of a single object against a plain background.
Training Setup
- Optimiser: Adam (lr 0.001, wd 0.0001), CosineAnnealingLR, 15 epochs, batch 128
- Custom CNN: trained from scratch with RandomCrop, HFlip, CutOut, MixUp, and CutMix augmentation
- Transfer models (MobileNetV2, ResNet-18): frozen ImageNet backbone; only the classification head is trained; no augmentation
Deployed Models
| Model | Approach | Trainable Params | Accuracy |
|---|---|---|---|
| Custom CNN | Trained from scratch | 2,462,282 | 48.40% |
| MobileNetV2 | Transfer learning | 12,810 | 86.91% |
| ResNet-18 | Transfer learning | 5,130 | 87.48% |
Links
Pouya Alavi Naeini, BIT student, Macquarie University (AI & Web/App Development).
PyTorch · Gradio · Hugging Face Spaces