CIFAR-10 Image Classification
Custom CNN · MobileNetV2 · ResNet-18 – a controlled comparison of from-scratch vs transfer learning
Tip: These models were trained on 32×32 CIFAR-10 thumbnails. They work best on simple, centred images of a single object – use the example images below for reliable results.
On real-world high-resolution photos the Custom CNN (trained on 32×32) often looks more plausible because its 32×32 resize pipeline matches its training distribution, while MobileNetV2 sees a 224×224 upscale it was never trained on. This is the classic domain gap – not evidence that the smaller model is better.
Example images – click to load
Compare All Deployed Models
Load an image above, then click the button to classify it with the Custom CNN, MobileNetV2, and ResNet-18 simultaneously.
Architecture Benchmark
All models were evaluated under a consistent training budget on the full CIFAR-10 dataset, using Adam, lr=0.001, 15 epochs, and batch size 128, with architecture-specific adaptations where required.
Deployed in this demo: Custom CNN, MobileNetV2, ResNet-18.
EfficientNet-B0 and ViT-Small are included in the table for study comparison only.
| Model | Accuracy | Trainable Params | Total Params | Size | CPU Latency | Strategy | Status |
|---|---|---|---|---|---|---|---|
| Custom CNN | 48.40% | 2,462,282 | 2,462,282 | 9.42 MB | 1.38 ms | Trained from scratch | live |
| MobileNetV2 | 86.91% | 12,810 | 2,236,682 | 8.76 MB | 17.22 ms | Transfer learning (frozen ImageNet backbone) | live |
| ResNet-18 | 87.48% | 5,130 | 11,181,642 | 42.73 MB | 9.80 ms | Transfer learning (frozen ImageNet backbone, linear-probe head) | live |
| EfficientNet-B0 | 83.75% | 12,810 | 4,020,358 | 15.62 MB | 11.20 ms | Transfer learning (frozen backbone) | study only |
| ViT-Small | 62.30% | 4,756,746 | 4,756,746 | 12.21 MB | 3.45 ms | Minimal ViT trained from scratch | study only |
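The CPU-latency column can be reproduced with a simple timing harness. The sketch below is a generic version, not the study's exact protocol – the warm-up and repeat counts are illustrative assumptions:

```python
import statistics
import time

def measure_latency(model_fn, batch, warmup=10, runs=100):
    """Median single-batch inference latency in milliseconds."""
    for _ in range(warmup):              # warm up caches / lazy initialisation
        model_fn(batch)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        model_fn(batch)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)    # median is robust to scheduler noise

# Works with any callable, e.g. a compiled model's forward pass;
# a trivial stand-in function is used here for illustration.
latency_ms = measure_latency(lambda x: [v * 2 for v in x], list(range(1000)))
```

Using the median rather than the mean avoids a handful of slow outlier runs dominating the reported figure.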
Key Findings
- ResNet-18 – 87.48% accuracy with only 5,130 trainable parameters (fewest of any deployed model).
- MobileNetV2 – 86.91% accuracy with 12,810 trainable parameters (frozen backbone + linear head).
- Custom CNN – 48.40% accuracy with 2.46M trainable parameters (trained from scratch – demonstrates the transfer-learning gap).
- ResNet-18 achieves +39.1 percentage points over the Custom CNN with ~480× fewer trainable parameters.
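The trainable-parameter counts follow directly from the size of a single linear head on each frozen backbone. A quick sanity check (512 and 1280 are the standard torchvision penultimate feature widths for ResNet-18 and MobileNetV2):

```python
def head_params(in_features, num_classes=10):
    """Parameters of one linear head: weights plus biases."""
    return in_features * num_classes + num_classes

resnet18_head = head_params(512)     # ResNet-18 feature width
mobilenet_head = head_params(1280)   # MobileNetV2 feature width

print(resnet18_head)    # 5130 – matches the table
print(mobilenet_head)   # 12810 – matches the table

# Parameter-efficiency ratio vs the 2,462,282-param from-scratch Custom CNN
print(round(2_462_282 / resnet18_head))   # ~480
```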
Top-5 Hardest Confusion Pairs
Bidirectional misclassification counts on the full 10,000-image test set, comparing the Custom CNN (before transfer learning) to MobileNetV2 (after).
| Pair | Custom CNN | MobileNetV2 | Reduction |
|---|---|---|---|
| Truck ↔ Automobile | 432 | 97 | 78% |
| Ship ↔ Airplane | 375 | 83 | 78% |
| Cat ↔ Dog | 333 | 243 | 27% |
| Horse ↔ Dog | 293 | 68 | 77% |
| Bird ↔ Deer | 180 | 78 | 57% |
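The Reduction column is the relative drop in bidirectional confusion counts, rounded to whole percent:

```python
def reduction_pct(before, after):
    """Relative drop in confusion count, as a rounded percentage."""
    return round((before - after) / before * 100)

# (Custom CNN count, MobileNetV2 count) for each pair in the table
pairs = {
    "truck/automobile": (432, 97),
    "ship/airplane": (375, 83),
    "cat/dog": (333, 243),
    "horse/dog": (293, 68),
    "bird/deer": (180, 78),
}
for name, (before, after) in pairs.items():
    print(f"{name}: {reduction_pct(before, after)}%")
```

Note that the semantically hardest pair, cat/dog, improves the least – texture-level ImageNet features separate vehicles far more cleanly than fine-grained animal categories.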
CIFAR-10 Image Classification
How much does a pretrained backbone actually help compared to training from scratch – when every model shares the same dataset, optimiser, and epoch budget?
This project answers that question through a controlled, end-to-end deep learning experiment. Five architectures are trained and evaluated on the CIFAR-10 benchmark. Three are deployed here as a live demo. The goal is not simply to report accuracy – it is to surface the real trade-offs between model complexity, training strategy, parameter efficiency, latency, and generalisation.
Why I Built It
Transfer learning is often presented as an obvious win, but the degree of that advantage is rarely quantified under fair conditions. I wanted to run a structured experiment – identical training budget, same dataset, same optimiser family – and let the numbers speak. The result is a reproducible comparison that is useful both as a portfolio piece and as a practical reference for model selection decisions.
What This Project Demonstrates
- End-to-end ML pipeline – data loading, augmentation (RandomCrop, CutOut, MixUp, CutMix), training with cosine annealing, evaluation, and deployment
- Transfer learning vs from-scratch training – with measurable, empirical results
- Parameter efficiency – ResNet-18 achieves top accuracy with only 5,130 trainable parameters
- Deployment engineering – lazy model loading, device-aware inference, Gradio UI on HF Spaces
- Interpretability – Grad-CAM visualisations to understand model attention
- Benchmarking discipline – latency, throughput, model size, and confusion analysis alongside accuracy
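The Grad-CAM mechanism mentioned above can be sketched with PyTorch hooks. This is a minimal generic version, not the project's implementation, and the tiny stand-in network exists only to make the snippet self-contained:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def grad_cam(model, target_layer, x, class_idx):
    """Minimal Grad-CAM: weight target-layer activations by pooled gradients."""
    acts, grads = {}, {}
    h1 = target_layer.register_forward_hook(
        lambda mod, inp, out: acts.update(a=out))
    h2 = target_layer.register_full_backward_hook(
        lambda mod, gin, gout: grads.update(g=gout[0]))
    try:
        logits = model(x)
        model.zero_grad()
        logits[0, class_idx].backward()      # gradient of the chosen class score
    finally:
        h1.remove()
        h2.remove()
    weights = grads["g"].mean(dim=(2, 3), keepdim=True)  # GAP over gradients
    cam = F.relu((weights * acts["a"]).sum(dim=1))       # weighted sum + ReLU
    return cam / (cam.max() + 1e-8)                      # normalise to [0, 1]

# Tiny stand-in CNN (NOT the project's model) to demonstrate the mechanics
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10),
)
cam = grad_cam(model, model[2], torch.randn(1, 3, 32, 32), class_idx=3)
```

The returned map has the spatial resolution of the target layer and is typically upsampled onto the input image for visualisation.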
Key Findings
| Model | Approach | Trainable Params | Accuracy |
|---|---|---|---|
| Custom CNN | Trained from scratch | 2,462,282 | 48.40% |
| MobileNetV2 | Transfer learning | 12,810 | 86.91% |
| ResNet-18 | Transfer learning | 5,130 | 87.48% |
- ResNet-18 reaches 87.48% accuracy with just 5,130 trainable parameters – a frozen ImageNet backbone with a single linear head.
- MobileNetV2 matches it closely at 86.91%, with a slightly larger head and a smaller total model footprint.
- Custom CNN, trained entirely from scratch, tops out at 48.40% over 15 epochs – a +39 percentage-point gap that makes the transfer learning advantage concrete and measurable.
- The transfer models converge in 1–3 epochs. The Custom CNN is still climbing at epoch 15.
⚠️ Limitations – The Domain Gap
All models were trained on 32×32 CIFAR-10 thumbnails. This has a direct impact on real-world performance:
- Uploading a high-resolution photo forces a resize that the models were never trained on.
- For the Custom CNN the input is resized down to 32×32, which matches its training pipeline – so it may look more confident on real images, but this is misleading.
- MobileNetV2 and ResNet-18 upscale to 224×224, which is outside their fine-tuning distribution.
For reliable results, use the provided example images or simple, centred, single-object photos against a plain background.
Tech Stack
| Layer | Technology |
|---|---|
| Framework | PyTorch 2.0+ |
| Pretrained Backbones | torchvision (MobileNetV2, ResNet-18, EfficientNet-B0) |
| Dataset | CIFAR-10 via torchvision |
| Evaluation | scikit-learn |
| Visualisation | Matplotlib, Seaborn |
| Interpretability | Grad-CAM (PyTorch hooks) |
| Notebook | Jupyter – 14-section structured pipeline |
| Demo App | Gradio 5.29 on Hugging Face Spaces |
| Model Weights | Hugging Face Hub |
About the Developer
Pouya Alavi Naeini is a final-year Information Technology student at Macquarie University, Sydney, specialising in Artificial Intelligence and Web/App Development. He builds practical software and machine learning projects with a focus on clean engineering, reproducibility, and deployment.
GitHub · LinkedIn · Portfolio · Source Code