💡Machine Learning on iOS
Spring 2026 | Jay Zheng
What is Machine Learning?
Normally when you write a program, you spell out every rule explicitly — "if the pixel is red, do this; if it's blue, do that." That works fine for simple logic, but it breaks down fast for complex tasks like recognizing a cat in a photo. There are too many rules to write by hand.
Machine learning flips this around. Instead of writing the rules yourself, you feed the computer a pile of data (say, thousands of labeled photos) and let an algorithm figure out the rules on its own. That process produces a model.
Core idea: Data + Algorithm -> Model
Think of the model as a black box — you don't need to know what's going on inside it. All you need to know is:
It takes some input (an image, a sentence, a number).
It produces some output (a label, a score, a prediction).
For example, in the demo app below, the input is a photo and the output is a label like "golden retriever, 94% confident."
The algorithm you choose depends on what you're trying to do. Classifying images calls for a different algorithm than predicting house prices or generating text. That's most of what machine learning research is about — figuring out which algorithms work best for which problems (you can learn more in CS 3780). For this demo, we're using a pre-trained neural network called MobileNetV2 that's already been tuned for image classification.
Machine Learning on iOS
When you hear "AI" or "ML," you probably think of something that lives in the cloud: you send a request to ChatGPT or Claude, OpenAI's or Anthropic's servers process it, and you get a response back.
iOS does it differently. Apple wanted ML to run on the device itself, for a few reasons:
Privacy: your data never leaves the phone.
Speed: the model is lightweight and results come back almost instantly.
Offline: inference does not depend on a Wi-Fi or cellular connection.
No server costs: you don't pay per token.
To make this practical, Apple built a stack of frameworks:
Core ML
Runs ML models on the iPhone's own chips, including the Neural Engine introduced in 2017 with the A11 Bionic chip.
Vision
High-level wrapper for image tasks (classification, face/text detection).
Natural Language
Same idea, but for text.
Create ML
An app that lets you train your own models without writing ML code.
The model file format Apple uses is .mlmodel. You drag one into your Xcode project and Xcode automatically generates a Swift class for it. That's why you can write MobileNetV2() in the demo code without ever defining that class yourself.
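As a rough sketch of what the generated class gives you, the snippet below loads the model and hands back the underlying Core ML object. It assumes MobileNetV2.mlmodel has already been added to the app target (it will not compile without it), and the function name loadMobileNet is made up for illustration.

```swift
import CoreML

// Assumes MobileNetV2.mlmodel is part of the Xcode target, so the
// `MobileNetV2` class used below is the one Xcode generated from it.
// `loadMobileNet` is a hypothetical helper name, not part of the demo.
func loadMobileNet() throws -> MLModel {
    // The default configuration lets Core ML choose which hardware to run on.
    let configuration = MLModelConfiguration()

    // The generated initializer throws if the model cannot be loaded.
    let wrapper = try MobileNetV2(configuration: configuration)

    // Generated classes expose the underlying MLModel through `model`.
    return wrapper.model
}
```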
To get started, open each expandable section below:
Demo has a working image classification app that you can try to see how it works.
Limitation on MobileNetV2 covers what the model used in the demo cannot do.
Training your own model with Create ML walks through the steps for training your own ML model to use in your own project.
Demo
How does the demo app work?
You tap a button and the photo picker opens.
You pick an image, and it is shown on the screen.
The app runs the image through MobileNetV2, the pre-trained model we are using.
It outputs the best label and its confidence score.
None of this requires the internet.
Try the app
Once you have the demo running, experiment:
Pick photos of objects you'd expect it to know (dog, banana, laptop). If you are running on a simulator, you can find an image online and drag it into the simulator.
Pick photos of things it won't know (your face, a screenshot, a meme). Watch it guess wrong — and notice the confidence score is often still high. This is an important note: confidence is not correctness.
Try a blurry or dark photo. See how the predictions degrade.
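One way to see why a wrong guess can still look confident: image classifiers like this one typically end in a softmax, which turns raw scores into values that always sum to 1 across the known classes. The sketch below is plain Swift with made-up scores for three classes. It assumes the confidence MobileNetV2 reports comes from a softmax, which the demo itself does not show.

```swift
import Foundation

/// Softmax: converts raw scores into probabilities that sum to 1.
func softmax(_ scores: [Double]) -> [Double] {
    // Subtract the maximum first for numerical stability.
    let maxScore = scores.max() ?? 0
    let exps = scores.map { exp($0 - maxScore) }
    let total = exps.reduce(0, +)
    return exps.map { $0 / total }
}

// Made-up raw scores: no class is a true match for the photo,
// but one class scores higher than the rest.
let labels = ["dog", "banana", "laptop"]
let probabilities = softmax([1.0, 0.5, 4.0])

for (label, probability) in zip(labels, probabilities) {
    print(label, String(format: "%.2f", probability))
}
```

With these scores the last class takes roughly 93% of the probability. There is no "none of the above" option, so the highest-scoring class absorbs most of the total even when nothing fits.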
File walkthrough
imageClassifierDemoApp.swift - App entry point. It launches ContentView.
MobileNetV2.mlmodel - A pre-trained model from Apple's model gallery that recognizes 1,000 categories of objects (dogs, cars, fruit, instruments, etc.), the standard ImageNet classes.
When you click on it in Xcode, you can see its inputs (a 224×224 image) and outputs (a label + confidence). You don't write any code for this file; Xcode auto-generates a class called MobileNetV2 so Swift can use it.
Classifier.swift — the ML logic
This is the file that actually runs the model. It is only about 20 lines of code. Here is the skeleton structure:
Load the model
Wraps the auto-generated MobileNetV2 class in a VNCoreMLModel so the Vision framework can use it.
Build a request
A VNCoreMLRequest says "I want to run this model."
A VNImageRequestHandler says "…on this specific image."
They are separate so that you can run several requests on the same image if you need to.
Run it
This line of code runs the inference. It is synchronous but fast (usually < 100 ms on a newer iPhone).
Read the result
The model returns a list of guesses sorted by confidence. We grab the best one: its label (identifier) and how sure the model is (confidence, a float from 0 to 1).
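Put together, the four steps might look like the sketch below. It is not the demo's actual Classifier.swift: the function name topGuess and its parameters are assumptions made for illustration. It accepts any MLModel; in the demo that would be the one wrapped by the generated MobileNetV2 class.

```swift
import CoreGraphics
import CoreML
import Vision

/// Hypothetical helper: returns the best label and its confidence,
/// or nil if the model produced no classification results.
func topGuess(for image: CGImage, using mlModel: MLModel) throws -> (label: String, confidence: Float)? {
    // 1. Load the model: wrap it so the Vision framework can use it.
    let visionModel = try VNCoreMLModel(for: mlModel)

    // 2. Build a request: what to run, and which image to run it on.
    let request = VNCoreMLRequest(model: visionModel)
    let handler = VNImageRequestHandler(cgImage: image, options: [:])

    // 3. Run it: perform(_:) blocks until inference has finished.
    try handler.perform([request])

    // 4. Read the result: the guesses come back sorted, so the first is the best.
    guard let best = (request.results as? [VNClassificationObservation])?.first else {
        return nil
    }
    return (label: best.identifier, confidence: best.confidence)
}
```

Keeping the request and the handler as separate objects is what allows the array passed to perform(_:) to hold more than one request for the same image.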
ContentView.swift — the UI
State
Tracks what the user picked and holds an instance of our Classifier.
The image / placeholder
In the file, we have a ZStack that shows either the picked image or a "Select image…" prompt if nothing is picked yet.
The result overlay
When classifier.result isn't nil, a panel appears at the top showing the label and confidence. Because Classifier is @Observable, the panel updates automatically as soon as the model finishes.
The picker button (in the toolbar)
This is the link between the UI and the ML logic.
When the user picks a photo, the app loads the image data → stores it for display → hands it to the classifier. The Task { … await … } block is asynchronous because loading photo data from the library may take a moment.
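The four pieces above could fit together roughly as follows. This is a sketch under assumptions, not the demo's ContentView.swift: it guesses that the picker is SwiftUI's PhotosPicker, it needs iOS 17 or later for @Observable, and the Classifier here is a stub with a placeholder result so the view compiles on its own.

```swift
import SwiftUI
import PhotosUI
import Observation
import UIKit

// Stub standing in for the demo's Classifier. The real one would run
// MobileNetV2; this one only sets a placeholder string (assumption).
@Observable
final class Classifier {
    var result: String?

    func classify(_ image: UIImage) {
        result = "placeholder label and confidence"
    }
}

struct ContentView: View {
    // State: what the user picked, plus our Classifier instance.
    @State private var pickedItem: PhotosPickerItem?
    @State private var image: UIImage?
    @State private var classifier = Classifier()

    var body: some View {
        NavigationStack {
            ZStack(alignment: .top) {
                // The image, or a prompt if nothing is picked yet.
                if let image {
                    Image(uiImage: image)
                        .resizable()
                        .scaledToFit()
                } else {
                    Text("Select image…")
                }
                // The result overlay, shown only once a result exists.
                if let result = classifier.result {
                    Text(result)
                        .padding()
                        .background(.thinMaterial)
                }
            }
            .toolbar {
                // The picker button in the toolbar.
                PhotosPicker(selection: $pickedItem, matching: .images) {
                    Label("Pick photo", systemImage: "photo")
                }
            }
            .onChange(of: pickedItem) { _, newItem in
                guard let newItem else { return }
                Task {
                    // Load the data, store it for display, hand it to the classifier.
                    if let data = try? await newItem.loadTransferable(type: Data.self),
                       let picked = UIImage(data: data) {
                        image = picked
                        classifier.classify(picked)
                    }
                }
            }
        }
    }
}
```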
Limitation on MobileNetV2
It's worth mentioning what this model can't do:
The model has a fixed set of 1,000 classes. It has no class for things like "phone screen", "whiteboard", or anything else outside ImageNet.
It cannot give a good guess for something outside ImageNet; it will pick the closest of its 1,000 classes.
It assigns one label to the whole image. If your image shows "a person and a dog", you get a single best guess, not one label per subject. For multiple labels, you may check out this project.
It only does image classification on still images; it does not classify video or a live camera feed.
If any of those limits matter for your app, you may train your own model with Create ML, or pick a different model from Apple's model gallery.
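A partial workaround for the single-label limit is to show the next few entries from the ranked list the model already returns. This does not detect multiple objects; it only surfaces the runners-up. The sketch below is plain Swift: Guess is a hypothetical stand-in for a Vision classification result, the scores are made up, and the 0.1 cutoff is an arbitrary value chosen for illustration.

```swift
/// Hypothetical stand-in for one classification result.
struct Guess: Equatable {
    let label: String
    let confidence: Float
}

/// Returns up to `limit` guesses with confidence of at least `minimum`, best first.
func topGuesses(from guesses: [Guess], limit: Int = 3, minimum: Float = 0.1) -> [Guess] {
    let sorted = guesses.sorted { $0.confidence > $1.confidence }
    return Array(sorted.filter { $0.confidence >= minimum }.prefix(limit))
}

// Made-up scores for a photo of a dog chasing a tennis ball.
let guesses = [
    Guess(label: "golden retriever", confidence: 0.55),
    Guess(label: "Labrador retriever", confidence: 0.30),
    Guess(label: "tennis ball", confidence: 0.12),
    Guess(label: "park bench", confidence: 0.03),
]

for guess in topGuesses(from: guesses) {
    print(guess.label, guess.confidence)
}
```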
Training your own model with Create ML
MobileNetV2 only knows the 1,000 ImageNet classes. If you want to recognize something more specific, for example the difference between a golden retriever and your specific dog, or good vs. burnt cookies, you need your own model.
That's where Create ML comes in. It is bundled with Xcode.
To open it, use Spotlight Search (Cmd + Space) and type "Create ML".
Steps:
Open Create ML → choose Image Classification.
Drop in folders of training images, one folder per category.
Click Train. (It may take a couple of minutes, but it should not take hours on a Mac.)
Export the resulting .mlmodel file.
Drag it into Xcode and use it in code exactly like we use MobileNetV2 in the demo above.