Every cloud API call your app makes has three problems: it takes time (round-trip latency), it costs money (per-request pricing), and it fails without connectivity. On-device AI removes all three: there is no network round trip, no per-request bill, and no dependency on a connection. The question in 2024 was "is it good enough?" In 2026, on flagship and mid-range devices, the answer is yes for a wide class of tasks.
What On-Device Models Can Do in 2026
On-device models now cover classification, entity extraction, sentiment analysis, summarisation of short texts, image labelling, object detection, text embedding for semantic search, code completion suggestions, and lightweight language understanding. Gemini Nano handles conversational summarisation and extraction well. MediaPipe handles vision tasks with a strong accuracy/speed trade-off.
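To make "text embedding for semantic search" concrete, here is a minimal sketch of the ranking step, assuming some on-device model has already turned the query and each document into a fixed-length vector. The function names `cosine` and `rank` are illustrative and do not come from any particular SDK.

```kotlin
import kotlin.math.sqrt

// Cosine similarity between two embedding vectors of equal length.
fun cosine(a: FloatArray, b: FloatArray): Double {
    require(a.size == b.size) { "Embeddings must have the same dimension" }
    var dot = 0.0
    var normA = 0.0
    var normB = 0.0
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    // A zero vector has no direction; treat it as unrelated to everything.
    if (normA == 0.0 || normB == 0.0) return 0.0
    return dot / (sqrt(normA) * sqrt(normB))
}

// Returns the indices of `docs`, ordered from most to least similar to `query`.
fun rank(query: FloatArray, docs: List<FloatArray>): List<Int> =
    docs.indices.sortedByDescending { cosine(query, docs[it]) }
```

A brute-force scan like this is usually adequate for a few thousand locally stored items. Larger collections would likely need an approximate index.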
Gemini Nano via AICore
Android AICore is the system-level infrastructure for on-device AI. On supported devices (currently select Pixel and Samsung models, with the list expanding), your app can call Gemini Nano through the AICore API without bundling model weights in your APK. The model is shared system-wide, so it adds 0 MB to your app size, and Google handles model updates. Use AICore for text tasks where Gemini Nano fits, and check at runtime that the device supports it.
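Because AICore exists only on supported devices, an app would typically check availability at runtime and route to a fallback otherwise. The sketch below shows that selection logic only. `TextBackend`, `chooseBackend`, and `FakeBackend` are hypothetical names for illustration, not part of the AICore SDK, and a real backend would wrap whatever availability check the SDK actually exposes.

```kotlin
// Hypothetical abstraction: none of these names come from the AICore SDK.
interface TextBackend {
    val name: String
    fun isAvailable(): Boolean
    fun summarise(text: String): String
}

// Picks the first backend that reports itself available, or null if none do.
// Order the list by preference, e.g. system model first, fallback second.
fun chooseBackend(candidates: List<TextBackend>): TextBackend? =
    candidates.firstOrNull { it.isAvailable() }

// Stand-in backend for illustration only; it truncates instead of summarising.
class FakeBackend(
    override val name: String,
    private val available: Boolean,
) : TextBackend {
    override fun isAvailable(): Boolean = available
    override fun summarise(text: String): String = "[$name] " + text.take(40)
}
```

Keeping the choice behind one interface means the feature code never needs to know which model answered, and a `null` result is the natural point at which to hide or disable the feature.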
MediaPipe for Vision Tasks
MediaPipe Solutions offers production-ready on-device models for face detection, pose estimation, hand tracking, image classification, object detection, and text classification. Basic integration is roughly 3–5 lines of Kotlin with the Tasks API. Inference runs on GPU/NNAPI on modern Android, typically at 10–50 ms per frame, which makes it a good fit for real-time camera apps with no network dependency.
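At 10–50 ms per frame, a detector processing frames one at a time tops out somewhere between 20 and 100 frames per second, so a camera that delivers frames faster than that needs a way to shed load. One common approach is to drop frames while an inference is still in flight. The sketch below assumes that approach. `FrameGate` and `maxFramesPerSecond` are hypothetical helpers and not part of MediaPipe.

```kotlin
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical helper, not part of MediaPipe: drops frames while a previous
// inference is still running so the camera pipeline never queues up work.
class FrameGate {
    private val busy = AtomicBoolean(false)

    // Runs `infer` only if no other inference is in flight.
    // Returns true if the frame was processed, false if it was dropped.
    fun tryProcess(infer: () -> Unit): Boolean {
        if (!busy.compareAndSet(false, true)) return false
        try {
            infer()
        } finally {
            busy.set(false)
        }
        return true
    }
}

// Upper bound on throughput when frames are processed strictly one at a time.
fun maxFramesPerSecond(latencyMs: Double): Double = 1000.0 / latencyMs
```

In a camera callback, the call would look like `gate.tryProcess { detector(frame) }`, where `detector` stands for whatever inference call the app uses.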
Bundled Models for Guaranteed Availability
For tasks where you need guaranteed availability on all devices (not just AICore-supported ones), bundle a quantised (INT8/INT4) ONNX or TensorFlow Lite model with your app. You can ship it inside the APK, or use Play Asset Delivery to ship it as an on-demand asset pack: users download the model only when they need the feature, which keeps your base APK small.
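Quantisation matters for bundling because weight storage scales linearly with bits per weight. The back-of-the-envelope estimate below assumes an illustrative 5-million-parameter model (a made-up figure, not one from this article) and ignores file metadata and per-tensor scale and zero-point overhead.

```kotlin
// Approximate weight storage in megabytes (1 MB = 1,000,000 bytes).
// Ignores container metadata and quantisation parameters, so real files
// will be somewhat larger.
fun weightSizeMb(parameterCount: Long, bitsPerWeight: Int): Double =
    parameterCount * bitsPerWeight / 8.0 / 1_000_000.0

// Illustrative parameter count; substitute your own model's figure.
val exampleParams = 5_000_000L
val fp32Mb = weightSizeMb(exampleParams, 32) // 20.0
val int8Mb = weightSizeMb(exampleParams, 8)  // 5.0
val int4Mb = weightSizeMb(exampleParams, 4)  // 2.5
```

Under these assumptions, INT8 cuts the download to a quarter of the FP32 size and INT4 to an eighth, which is often the difference between shipping in the base APK and needing an on-demand download.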
Measuring the Impact
In a calorie tracking app, we replaced a server-side food classification endpoint (avg 340 ms round-trip) with an on-device MobileNetV3 classifier (avg 28 ms). Result: a roughly 12x latency improvement (340 / 28 ≈ 12.1), zero per-request cost, and a feature that works offline. For high-frequency operations like real-time classification, on-device isn't just better for performance; it's the only viable architecture, because a 340 ms round trip cannot keep up with a live camera feed.
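The same comparison can be reproduced for your own feature with two small calculations. The latency figures below are the ones reported above. The request volume and price used with `monthlyRequestCost` are placeholders to replace with your own numbers, not data from this project.

```kotlin
// Ratio of old latency to new latency; 340 ms -> 28 ms gives about 12.1.
fun speedup(beforeMs: Double, afterMs: Double): Double = beforeMs / afterMs

// Server spend avoided over `days` days by moving inference on-device.
// `pricePerThousandUsd` is a placeholder input, not a figure from the article.
fun monthlyRequestCost(
    requestsPerDay: Long,
    pricePerThousandUsd: Double,
    days: Int = 30,
): Double = requestsPerDay * days / 1000.0 * pricePerThousandUsd
```

For example, `speedup(340.0, 28.0)` evaluates to about 12.14, and a hypothetical 100,000 requests per day at a hypothetical 1 USD per thousand requests would come to 3,000 USD over 30 days.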