CIFAR-10 Image Classification
Custom CNN · MobileNetV2 · ResNet-18: a controlled comparison of from-scratch vs transfer learning
Tip: These models were trained on 32×32 CIFAR-10 thumbnails. They work best on simple, centred images of a single object; use the example images below for reliable results.
On real-world high-resolution photos the Custom CNN (trained on 32×32) often looks more plausible because its 32×32 resize pipeline matches its training distribution, while MobileNetV2 sees a 224×224 upscale it was never trained on. This is the classic domain gap, not evidence that the smaller model is better.
Example images (click to load)
Compare All Deployed Models
Load an image above, then click the button to classify it with the Custom CNN, MobileNetV2, and ResNet-18 simultaneously.
Architecture Benchmark
All models were evaluated under a consistent training budget on the full CIFAR-10 dataset, using Adam, lr=0.001, 15 epochs, and batch size 128, with architecture-specific adaptations where required.
Deployed in this demo: Custom CNN, MobileNetV2, ResNet-18.
EfficientNet-B0 and ViT-Small are included in the table for study comparison only.
| Model | Accuracy | Trainable Params | Total Params | Size | CPU Latency | Strategy | Status |
|---|---|---|---|---|---|---|---|
| Custom CNN | 48.40% | 2,462,282 | 2,462,282 | 9.42 MB | 1.38 ms | Trained from scratch | live |
| MobileNetV2 | 86.91% | 12,810 | 2,236,682 | 8.76 MB | 17.22 ms | Transfer learning (frozen ImageNet backbone) | live |
| ResNet-18 | 87.48% | 5,130 | 11,181,642 | 42.73 MB | 9.80 ms | Transfer learning (frozen ImageNet backbone, linear-probe head) | live |
| EfficientNet-B0 | 83.75% | 12,810 | 4,020,358 | 15.62 MB | 11.20 ms | Transfer learning (frozen backbone) | study only |
| ViT-Small | 62.30% | 4,756,746 | 4,756,746 | 12.21 MB | 3.45 ms | Minimal ViT trained from scratch | study only |
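The shared training budget above can be sketched as follows. The model here is a stand-in placeholder; the optimiser and scheduler settings match the stated budget (Adam, lr 0.001, weight decay 0.0001, cosine annealing, 15 epochs, batch 128).

```python
import torch
from torch import nn, optim

# Stand-in model: the real study used the architectures in the table above.
model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 10))

# Shared budget: Adam, lr 0.001, weight decay 0.0001,
# cosine annealing over 15 epochs, batch size 128.
EPOCHS, BATCH_SIZE = 15, 128
optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=EPOCHS)
```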
Key Findings
- ResNet-18: 87.48% accuracy with only 5,130 trainable parameters (fewest of any deployed model).
- MobileNetV2: 86.91% accuracy with 12,810 trainable parameters (frozen backbone + linear head).
- Custom CNN: 48.40% accuracy with 2.46M trainable parameters (trained from scratch; demonstrates the transfer-learning gap).
- ResNet-18 achieves +39.1 percentage points over the Custom CNN with ~480× fewer trainable parameters.
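The tiny trainable-parameter counts fall straight out of the linear-probe arithmetic: a single linear layer over the backbone's feature vector (512-d for ResNet-18, 1280-d for MobileNetV2), with one weight row plus one bias per class.

```python
def linear_head_params(in_features: int, num_classes: int = 10) -> int:
    """Parameters of a single linear layer: weight matrix plus bias vector."""
    return in_features * num_classes + num_classes

print(linear_head_params(512))   # ResNet-18 feature dim   -> 5130
print(linear_head_params(1280))  # MobileNetV2 feature dim -> 12810
```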
Top-5 Hardest Confusion Pairs
Bidirectional misclassification counts on the full 10,000-image test set, comparing the Custom CNN (before transfer learning) to MobileNetV2 (after).
| Pair | Custom CNN | MobileNetV2 | Reduction |
|---|---|---|---|
| Truck ↔ Automobile | 432 | 97 | 78% |
| Ship ↔ Airplane | 375 | 83 | 78% |
| Cat ↔ Dog | 333 | 243 | 27% |
| Horse ↔ Dog | 293 | 68 | 77% |
| Bird ↔ Deer | 180 | 78 | 57% |
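A bidirectional count for a class pair is simply `cm[i][j] + cm[j][i]` over the confusion matrix. A minimal sketch of the ranking, shown on a toy 3-class matrix rather than the real CIFAR-10 one:

```python
from itertools import combinations

def hardest_pairs(cm, labels, top_k=5):
    """Sum cm[i][j] + cm[j][i] for every unordered class pair and
    return the top_k pairs with the most bidirectional confusions."""
    pairs = []
    for i, j in combinations(range(len(labels)), 2):
        pairs.append((cm[i][j] + cm[j][i], labels[i], labels[j]))
    pairs.sort(reverse=True)
    return pairs[:top_k]

# Toy 3-class confusion matrix (rows = true class, columns = predicted):
cm = [[90, 6, 4],
      [8, 85, 7],
      [2, 9, 89]]
print(hardest_pairs(cm, ["cat", "dog", "frog"], top_k=2))
# -> [(16, 'dog', 'frog'), (14, 'cat', 'dog')]
```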
About This Project
How much does a pretrained ImageNet backbone actually help compared to training from scratch, when architectures share the same training budget?
Dataset
CIFAR-10: 60,000 32×32 RGB images across 10 classes (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck). 50,000 training images · 10,000 test images.
⚠️ Domain Gap: Real-World Images
These models were trained exclusively on 32×32 thumbnail-style images. High-resolution or complex real-world photos may produce unexpected predictions; this is the classic domain gap. For best results use the example images below, or simple close-up shots of a single object against a plain background.
Training Setup
- Optimiser: Adam (lr 0.001, wd 0.0001), CosineAnnealingLR, 15 epochs, batch 128
- Custom CNN: trained from scratch with RandomCrop, HFlip, CutOut, MixUp, and CutMix augmentation
- Transfer models (MobileNetV2, ResNet-18): frozen ImageNet backbone; only the classification head is trained; no augmentation
Deployed Models
| Model | Approach | Trainable Params | Accuracy |
|---|---|---|---|
| Custom CNN | Trained from scratch | 2,462,282 | 48.40% |
| MobileNetV2 | Transfer learning | 12,810 | 86.91% |
| ResNet-18 | Transfer learning | 5,130 | 87.48% |
Links
Pouya Alavi Naeini, BIT student, Macquarie University (AI & Web/App Development).
PyTorch · Gradio · Hugging Face Spaces