💡Machine Learning on iOS

Spring 2026 | Jay Zheng

What is Machine Learning?

Normally when you write a program, you spell out every rule explicitly — "if the pixel is red, do this; if it's blue, do that." That works fine for simple logic, but it breaks down fast for complex tasks like recognizing a cat in a photo. There are too many rules to write by hand.

Machine learning flips this around. Instead of writing the rules yourself, you feed the computer a pile of data (say, thousands of labeled photos) and let an algorithm figure out the rules on its own. That process produces a model.

Core idea: Data + Algorithm -> Model

Think of the model as a black box — you don't need to know what's going on inside it. All you need to know is:

  • It takes some input (an image, a sentence, a number).

  • It produces some output (a label, a score, a prediction).

For example, in the demo app below, the input is a photo and the output is a label with a confidence score, such as "golden retriever, 94% confident."

The algorithm you choose depends on what you're trying to do. Classifying images calls for a different algorithm than predicting house prices or generating text. That's most of what machine learning research is about — figuring out which algorithms work best for which problems (you can learn more in CS 3780). For this demo, we're using a pre-trained neural network called MobileNetV2 that's already been tuned for image classification.

Machine Learning on iOS

When you hear "AI" or "ML," you probably think of something that lives in the cloud: you send a request to ChatGPT or Claude, OpenAI's or Anthropic's servers process it, and you get a response back.

iOS does it differently. Apple wanted ML to run on the device itself, for a few reasons:

  • Privacy: your data never leaves the phone.

  • Speed: the models are lightweight and results come back almost instantly, with no network round trip.

  • Offline: it works without Wi-Fi or a cellular connection.

  • No server costs: you don't pay per token.

To make this practical, Apple built a stack of frameworks:

Framework | What it does

Core ML | Runs ML models on the iPhone's own chips, including the Neural Engine (introduced in 2017 with the A11 Bionic chip).

Vision | High-level wrapper for image tasks (classification, face/text detection).

Natural Language | Same idea, but for text.

Create ML | An app that lets you train your own models without writing ML code.

The model file format Apple uses is .mlmodel. You drag one into your Xcode project and Xcode automatically generates a Swift class for it — that's why you can write MobileNetV2() in code below without ever defining that class yourself.

To get started, open each expandable section below:

  • Demo has a working image classification app that you can run to see how it works.

  • Limitation on MobileNetV2 covers what the model used in the demo cannot do.

  • Training your own model with Create ML walks through the steps to train a model you can use in your own project.

Demo

How does the demo app work?

  1. You tap a button and the photo picker opens.

  2. You pick an image, and it is shown on the screen.

  3. The app runs the image through MobileNetV2, the pre-trained model we are using.

  4. It outputs the best-guess label and its confidence score.

None of this requires the internet.

Try the app

Once you have the demo running, experiment:

  1. Pick photos of objects you'd expect it to know (dog, banana, laptop). If you are running on a simulator, find an image online and drag it onto the simulator window.

  2. Pick photos of things it won't know (your face, a screenshot, a meme). Watch it guess wrong — and notice the confidence score is often still high. This is an important note: confidence is not correctness.

  3. Try a blurry or dark photo. See how the predictions degrade.

File walkthrough

imageClassifierDemoApp.swift - App entry point. It launches ContentView.

MobileNetV2.mlmodel - A pre-trained model from Apple's model gallery that recognizes 1,000 categories of objects (dogs, cars, fruit, instruments, etc.). These are the standard ImageNet classes.

  • When you click on it in Xcode, you can see its inputs (a 224×224 image) and outputs (a label + confidence). You don't write any code for this file — Xcode auto-generates a class called MobileNetV2 so Swift can use it.

Classifier.swift — the ML logic

This is the file that actually runs the model. It is only about 20 lines of code. Here is the skeleton structure:

  1. Load the model

Wraps the model from the auto-generated MobileNetV2 class in a VNCoreMLModel so the Vision framework can use it.

  2. Build a request

  • A VNCoreMLRequest says "I want to run this model."

  • A VNImageRequestHandler says "…on this specific image."

  • The two are separate so that you can run several requests on the same image if you need to.

  3. Run it

  • Calling perform on the handler runs the inference. It is synchronous but fast (usually < 100 ms on a newer iPhone).

  4. Read the result

The model returns a list of guesses sorted by confidence. We grab the best guess: its label (identifier) and how sure the model is (confidence, a Float from 0 to 1).
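The four steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the demo's actual file: the property name result, the method name classify(_:), and passing the MLModel in through init are all illustrative choices, made so the example does not depend on the generated class. In the demo project you would pass try MobileNetV2(configuration: MLModelConfiguration()).model.

```swift
import CoreML
import Vision
import Observation

@Observable
final class Classifier {
    // Best guess from the most recent run, or nil if nothing has been classified.
    var result: (label: String, confidence: Float)? = nil

    private let visionModel: VNCoreMLModel

    // 1. Load the model: wrap a Core ML model so Vision can use it.
    //    (Taking an MLModel parameter is an assumption of this sketch.)
    init(model: MLModel) throws {
        visionModel = try VNCoreMLModel(for: model)
    }

    func classify(_ imageData: Data) {
        // 2. Build a request: "run this model" ... "on this specific image".
        let request = VNCoreMLRequest(model: visionModel)
        let handler = VNImageRequestHandler(data: imageData)

        // 3. Run it. perform is synchronous and throws on failure.
        do {
            try handler.perform([request])
        } catch {
            result = nil
            return
        }

        // 4. Read the result: observations arrive sorted by confidence,
        //    so the first one is the best guess.
        guard let best = (request.results as? [VNClassificationObservation])?.first else {
            result = nil
            return
        }
        result = (label: best.identifier, confidence: best.confidence)
    }
}
```

Because perform is synchronous, a real app may prefer to call classify(_:) off the main thread for large images; the sketch leaves that out to stay close to the four steps.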

ContentView.swift — the UI

State

Tracks what the user picked and holds an instance of our Classifier.

The image / placeholder

In the file, we have a ZStack that shows either the picked image or a "Select image…" prompt if nothing is picked yet.

The result overlay

When classifier.result isn't nil, a panel appears at the top showing the labels and confidence. Because Classifier is @Observable, this panel appears automatically as soon as the model finishes.
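One way to turn a label and a 0 to 1 confidence into the panel text is a small formatting helper. The function name resultText and the exact wording of the string are assumptions for illustration; the demo may format its text differently.

```swift
// Turn a label and a 0...1 confidence into display text,
// for example "golden retriever: 94% confident".
func resultText(label: String, confidence: Float) -> String {
    // Convert the fraction to a whole-number percentage.
    let percent = Int((confidence * 100).rounded())
    return "\(label): \(percent)% confident"
}

print(resultText(label: "golden retriever", confidence: 0.94))
// prints "golden retriever: 94% confident"
```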

The picker button (in the toolbar)

This is the link between the UI and ML logic code.

When the user picks a photo: we tell our phone to load the image data → store it for display → hand it to the classifier. The Task { … await … } is async because loading photo data from the library may take a moment.
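A minimal sketch of that flow, assuming iOS 17 or later. The view name PickerSketch, the imageData property, and the onImageLoaded closure (a stand-in for the call into the classifier) are hypothetical. The demo places its picker in the toolbar, which is omitted here to keep the example short.

```swift
import SwiftUI
import PhotosUI

struct PickerSketch: View {
    // What the user picked, and the loaded bytes kept for display.
    @State private var selectedItem: PhotosPickerItem?
    @State private var imageData: Data?

    // Hypothetical hook standing in for the demo's classifier call.
    var onImageLoaded: (Data) -> Void = { _ in }

    var body: some View {
        PhotosPicker("Select image", selection: $selectedItem, matching: .images)
            .onChange(of: selectedItem) { _, newItem in
                guard let newItem else { return }
                Task {
                    // Loading from the photo library is asynchronous and may fail.
                    if let data = try? await newItem.loadTransferable(type: Data.self) {
                        imageData = data      // store it for display
                        onImageLoaded(data)   // hand it to the classifier
                    }
                }
            }
    }
}
```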

Limitation on MobileNetV2

It's worth mentioning what this model can't do:

  • Fixed classes: the model knows only 1,000 classes. It has no class for things like "phone screen" or "whiteboard", or anything else outside ImageNet.

  • No "I don't know" answer: shown something outside ImageNet, it cannot give a good guess. It simply picks the closest of its 1,000 classes.

  • One label per image: it gives you only its single best guess. If your image shows "a person and a dog", you get either a dog or a person, not both. For multiple labels, you may check out this project.

  • Still images only: this model does not classify video or a live camera feed.

If any of those limits matter for your app, you may train your own model with Create ML, or pick a different model from Apple's model gallery.
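Short of switching models, one partial workaround is to look past the first guess, since the model returns a sorted list of guesses. The sketch below keeps the top few guesses above a minimum confidence. The Guess type, the topGuesses function, its default values, and the sample scores are all made up for illustration. Because the scores come from a single-label classifier, this is not true multi-label detection.

```swift
// Hypothetical value type for one guess from the classifier.
struct Guess: Equatable {
    let label: String
    let confidence: Float
}

// Keep up to `limit` guesses whose confidence is at least `minimum`.
// Assumes `sorted` is already ordered from most to least confident.
func topGuesses(from sorted: [Guess], limit: Int = 3, minimum: Float = 0.1) -> [Guess] {
    Array(sorted.prefix(limit).filter { $0.confidence >= minimum })
}

// Illustrative scores only, not real model output.
let guesses = [
    Guess(label: "golden retriever", confidence: 0.62),
    Guess(label: "tennis ball", confidence: 0.21),
    Guess(label: "Labrador retriever", confidence: 0.09),
    Guess(label: "kite", confidence: 0.01),
]
let kept = topGuesses(from: guesses)
print(kept.map(\.label))  // ["golden retriever", "tennis ball"]
```

A minimum confidence also gives the UI a way to say "not sure" when nothing clears the bar, though a high score on an unfamiliar image can still be wrong.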

Training your own model with Create ML

MobileNetV2 only knows the 1,000 ImageNet classes. You need your own model if you want to recognize, for example:

  • the difference between a golden retriever and your specific dog, or

  • good vs. burnt cookies.

That's where Create ML comes in. It is bundled with Xcode.

  • To open it, use Spotlight Search (Cmd + Space) and type "Create ML".

Steps:

  1. Open Create ML → choose Image Classification.

  2. Drop in folders of training images, one folder per category.

  3. Click Train. (It may take a couple of minutes, but it should not take hours on a Mac.)

  4. Export the resulting .mlmodel file.

  5. Drag it into Xcode and use it in code exactly like we use MobileNetV2 in the above demo.
