# Machine Learning on iOS

## What is Machine Learning?

Normally when you write a program, you spell out every rule explicitly — *"if the pixel is red, do this; if it's blue, do that."* That works fine for simple logic, but it breaks down fast for complex tasks like recognizing a cat in a photo. There are too many rules to write by hand.

Machine learning flips this around. Instead of writing the rules yourself, you feed the computer a pile of **data** (say, thousands of labeled photos) and let an **algorithm** figure out the rules on its own. That process produces a **model**.

> Core idea: **Data + Algorithm -> Model**

Think of the model as a **black box** — you don't need to know what's going on inside it. All you need to know is:

* It takes some **input** (an image, a sentence, a number).
* It produces some **output** (a label, a score, a prediction).

For example, in the demo app below, the input is a photo and the output is a label like *"golden retriever — 94% confident."*

The algorithm you choose depends on what you're trying to do. Classifying images calls for a different algorithm than predicting house prices or generating text. That's most of what machine learning research is about — figuring out which algorithms work best for which problems (you can learn more in CS 3780). For this demo, we're using a pre-trained neural network called **MobileNetV2** that's already been tuned for image classification.

## Machine Learning on iOS

When you hear "AI" or "ML," you probably think of something that lives in the cloud — you send a request to ChatGPT or Claude, OpenAI's or Anthropic's servers process it, and you get a response back.

iOS does it differently. Apple wanted ML to run **on the device itself**, for a few reasons:

* **Privacy** — your data never leaves the phone.
* **Speed** — the model is lightweight and results are nearly instant; there's no network round trip.
* **Offline** — it works without Wi-Fi or a cellular connection.
* **No server costs** — you don't pay per token.

To make this practical, Apple built a stack of frameworks:

| Framework            | What it does                                                                                    |
| -------------------- | ----------------------------------------------------------------------------------------------- |
| **Core ML**          | Runs ML models on the iPhone's chips, including the Neural Engine introduced in 2017 with the A11 Bionic chip. |
| **Vision**           | High-level wrapper for image tasks (classification, face/text detection).                       |
| **Natural Language** | Same idea, but for text.                                                                        |
| **Create ML**        | An app that lets you *train* your own models without writing ML code.                           |

The model file format Apple uses is `.mlmodel`. You drag one into your Xcode project and Xcode automatically generates a Swift class for it — that's why you can write `MobileNetV2()` in code below without ever defining that class yourself.
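
For instance, here is a minimal sketch (assuming the default generated initializer, which takes an `MLModelConfiguration`) of using that auto-generated class directly:

```swift
import CoreML

// Sketch: MobileNetV2 is the class Xcode generates from MobileNetV2.mlmodel;
// you never write it yourself.
if let mobileNet = try? MobileNetV2(configuration: MLModelConfiguration()) {
    // The underlying MLModel describes the 224×224 image input and the
    // label/confidence outputs.
    print(mobileNet.model.modelDescription)
}
```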

To get started, open each expandable section below:

* **Demo** contains a working image classification app; try it and see how it works.
* **Limitations of MobileNetV2** covers what the model used in the demo cannot do.
* **Training your own model with Create ML** walks through the steps for training your own ML model to use in your own project.

<details>

<summary><strong>Demo</strong></summary>

{% file src="/files/mJP3Pgi50xlI5FD9Ez9j" %}

How does the demo app work?

1. You tap a button and the photo picker opens.
2. You pick an image, and it is shown on the screen.
3. The app runs the image through MobileNetV2, the pre-trained model we are using.
4. It displays the predicted label and its confidence score.

{% hint style="info" %}
None of this requires the internet
{% endhint %}

### Try the app <a href="#try" id="try"></a>

Once you have the demo running, experiment:

1. Pick photos of objects you'd expect it to know (dog, banana, laptop). If you are running on a simulator, you can find an image online and drag it into the phone simulator.
2. Pick photos of things it *won't* know (your face, a screenshot, a meme). Watch it guess wrong — and notice the confidence score is often still high. **This is an important note: confidence is not correctness.**
3. Try a blurry or dark photo. See how the predictions degrade.

## File walkthrough

`imageClassifierDemoApp.swift` - App entry point. It launches `ContentView`.

`MobileNetV2.mlmodel` - A pre-trained model from [Apple's model gallery](https://developer.apple.com/machine-learning/models/) that recognizes **1,000 categories** of objects (dogs, cars, fruit, instruments, etc.) — the standard ImageNet classes.

* When you click on it in Xcode, you can see its inputs (a 224×224 image) and outputs (a label + confidence). You **don't write any code for this file** — Xcode auto-generates a class called `MobileNetV2` so Swift can use it.
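
One detail you don't have to handle yourself: Vision resizes and crops whatever photo you give it down to that 224×224 input when it runs the model. As a hedged aside (this isn't in the demo code), you could control how it does so via the request's crop-and-scale option, assuming `model` is the `VNCoreMLModel` built in `Classifier.swift` below:

```swift
import Vision

// Aside (not in the demo): choose how Vision fits your photo into the
// model's 224×224 input.
let request = VNCoreMLRequest(model: model)
request.imageCropAndScaleOption = .centerCrop   // or .scaleFit / .scaleFill
```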

### `Classifier.swift` — the ML logic

This is the file that actually runs the model. It's only about 20 lines of code; here is the skeleton structure:

```swift
@Observable
class Classifier {
    var result: String?
    var confidence: Float?

    public func detect(ciImage: CIImage) {
        // 1. Load the model
        // 2. Build a Vision request
        // 3. Run the request on the image
        // 4. Read the top result
    }
}
```

1. **Load the model**

```swift
let model = try? VNCoreMLModel(for: MobileNetV2(...).model)
```

Wraps the auto-generated `MobileNetV2` class in a `VNCoreMLModel` so the Vision framework can use it.

2. **Build a request**

```swift
let request = VNCoreMLRequest(model: model)
let handler = VNImageRequestHandler(ciImage: ciImage)
```

* A `VNCoreMLRequest` says *"I want to run this model."*
* A `VNImageRequestHandler` says *"…on this specific image."*
* They are separate so that you can run more than one request on the same image if you need to (see the sketch below).
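
For example, a small sketch of running two requests on one image with a single handler (the text request below is hypothetical and not part of the demo):

```swift
// Sketch: one handler can run several Vision requests on the same image.
let classifyRequest = VNCoreMLRequest(model: model)   // the demo's classification
let textRequest = VNRecognizeTextRequest()            // hypothetical extra request
let handler = VNImageRequestHandler(ciImage: ciImage)
try? handler.perform([classifyRequest, textRequest])
```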

3. **Run it**

```swift
try? handler.perform([request])
```

* This line of code runs the inference. It is synchronous but fast (usually < 100 ms on a newer iPhone).
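
If that ever feels too slow on older hardware, one option (an assumption about how you might restructure it, not something the demo does) is to move the call off the main thread:

```swift
// Sketch: run the synchronous perform(_:) on a background queue so the UI
// stays responsive; hop back to the main thread before updating UI-facing state.
DispatchQueue.global(qos: .userInitiated).async {
    try? handler.perform([request])
}
```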

4. **Read the result**

```swift
let results = request.results as? [VNClassificationObservation]
if let firstResult = results.first {
    confidence = firstResult.confidence
    result = firstResult.identifier
}
```

The model returns a list of guesses sorted by confidence; we grab the top one — its label (`identifier`) and how sure it is (`confidence`, a float between 0 and 1).
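
Because the list is sorted by confidence, you could also look at more than one guess; a quick sketch:

```swift
// Sketch: print the top three guesses instead of keeping only the first.
if let results = request.results as? [VNClassificationObservation] {
    for observation in results.prefix(3) {
        print("\(observation.identifier): \(Int(observation.confidence * 100))%")
    }
}
```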

### `ContentView.swift` — the UI

**State**

```swift
@State private var selectedImage: PhotosPickerItem? = nil
@State private var selectedImageData: Data? = nil
var classifier = Classifier()
```

Tracks what the user picked and holds an instance of our `Classifier`.

**The image / placeholder**

In the file, we have a `ZStack` that shows either the picked image or a "Select image…" prompt if nothing is picked yet.

**The result overlay**

When `classifier.result` isn't `nil`, a panel appears at the top showing the labels and confidence. Because `Classifier` is `@Observable`, this panel appears automatically as soon as the model finishes.
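
A rough sketch of what that overlay could look like (an approximation; the exact layout in the file may differ):

```swift
// Sketch: only rendered once the classifier has produced a result.
if let result = classifier.result, let confidence = classifier.confidence {
    Text("\(result): \(Int(confidence * 100))% confident")
        .padding()
        .background(.thinMaterial)
        .clipShape(RoundedRectangle(cornerRadius: 12))
}
```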

**The picker button (in the toolbar)**

```swift
PhotosPicker(...)
    .onChange(of: selectedImage) { oldItem, newItem in
        Task {
            if let data = try? await newItem?.loadTransferable(type: Data.self) {
                selectedImageData = data
                classifier.detect(ciImage: CIImage(data: data)!)
            }
        }
    }
```

This is the link between the UI and the ML logic.

When the user picks a photo, we load the image data → store it for display → hand it to the classifier. The `Task { … await … }` is async because loading photo data from the library may take a moment.
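
One small, optional hardening you could apply (an assumption, not how the demo ships): guard the `CIImage` creation instead of force-unwrapping it:

```swift
// Sketch: skip classification gracefully if the picked data can't become a CIImage.
if let data = try? await newItem?.loadTransferable(type: Data.self),
   let ciImage = CIImage(data: data) {
    selectedImageData = data
    classifier.detect(ciImage: ciImage)
}
```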

</details>

<details>

<summary><strong>Limitations of MobileNetV2</strong></summary>

It's worth mentioning what this model can't do:

* The model has only 1,000 fixed classes. It has no class for things like "phone screen", "whiteboard", or anything else outside [ImageNet](https://www.image-net.org/).
* It cannot give a good guess for something outside ImageNet; it will simply pick the closest of its 1,000 classes.
* It returns only one label per image, so if your photo contains a person and a dog, it will report either a dog or a person, not both. For multiple labels, you may check out [this project](https://developer.apple.com/documentation/vision/recognizing-objects-in-live-capture).
* It does image classification only; it does not classify video or a live camera feed.

If any of those limits matter for your app, you may train your own model with Create ML, or pick a different model from [Apple's model gallery](https://developer.apple.com/machine-learning/models/).

</details>

<details>

<summary><strong>Training your own model with Create ML</strong></summary>

MobileNetV2 only knows the 1,000 ImageNet classes. If you want to recognize, for example,

* the **difference between a golden retriever and your specific dog**,
* or **good vs. burnt cookies**, you need your own model.

That's where **Create ML** comes in. It is bundled with Xcode.

* To open it, use **Spotlight Search** (Cmd + Space) and type "Create ML".

**Steps:**

1. Open Create ML → choose **Image Classification**.
2. Drop in folders of training images, one folder per category.
3. Click **Train**. (It may take a couple of minutes, but it should not take hours on a Mac.)
4. Export the resulting `.mlmodel` file.
5. Drag it into Xcode and use it in code exactly like we use `MobileNetV2` in the demo above (see the sketch below).
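
For instance (a sketch with a made-up model name), if your exported file were `CookieClassifier.mlmodel`, the only line that changes in `Classifier.swift` is which generated class you wrap:

```swift
// Hypothetical: CookieClassifier is whatever class Xcode generates from your
// exported .mlmodel; everything else in Classifier.swift stays the same.
let model = try? VNCoreMLModel(for: CookieClassifier(configuration: MLModelConfiguration()).model)
```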

</details>

