Counting Squats on Apple Watch

2026/06/17

This is a case study in building a wrist-worn rep counter on Apple Watch SE 2 — and, more importantly, in the workflow that made the algorithm tractable: decoupling sensor-data recording from algorithm design so the detector could be tuned offline against real, labeled data instead of by flailing on-wrist.

The app counts bodyweight squats in real time, with a per-rep haptic, structured sets/rest, and a workout saved to Apple Health. But the interesting part is underneath: how do you reliably detect a whole-body squat from a sensor strapped to your wrist, where the arm can be still, braced on the knees, or swinging freely?


1. Problem Statement

Two goals drove the project: (1) learn how the Apple Watch SE 2 motion sensors actually behave, and (2) build a frictionless squat counter to support knee rehabilitation, where consistent, controlled reps matter.

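The core challenge is deceptively simple (count squats), but the sensor placement makes it hard: the watch sits on the wrist, and during a squat the arm can be still, braced on the knees, or swinging freely.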

The result is a detector that hits 15/15 on a clean session and stays within ±1 across varied arm positions, plus a recording-and-replay workflow that turns algorithm tuning from an on-wrist guessing game into an offline unit test.


2. The Workflow Insight: Decouple Recording from Detection

The single most important decision was architectural, not algorithmic: you cannot tune a detector you have no data for, and you should not tune it live.

So the build order was inverted. Phase 0 was not a counter — it was a recorder. The watch app’s first job was to capture raw CMDeviceMotion into a labeled SessionTrace (you do exactly 20 squats, tag it “20”), and ship that trace off the wrist for offline analysis.

The detector itself was written as a pure function over a sample stream:

Sensor source  →  [MotionSample]  →  SquatDetector  →  [RepEvent]

Because the detector has zero dependency on live hardware, the exact same code runs in two places:

Context              Input                   Used for
---------------------------------------------------------------------
On-watch (live)      CMDeviceMotion stream   real-time counting + haptic
On-Mac (swift test)  recorded .json traces   tuning + regression testing

This is the whole game. A threshold change that used to require re-strapping the watch and doing 20 squats now runs against a folder of recorded sessions in seconds:

$ swift run squat-replay traces/
trace                                truth   found     err%
------------------------------------------------------------
20260617_reps10_F524B17B.json           10      10     0.0%
20260617_reps15_7412380C.json           15      15     0.0%
------------------------------------------------------------
scored 2 trace(s)  mean accuracy 100.0%
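
As a rough illustration of what such a replay scorer does, here is a minimal Python sketch (the real tool is the Swift squat-replay target). The JSON field names labeledReps and samples are assumptions, not the actual SessionTrace schema, and "mean accuracy" is read here as the mean of 100 minus err% across traces.

```python
import json
from pathlib import Path
from typing import Callable, List, Tuple

# A detector here is any pure function from recorded samples to a rep count.
Detector = Callable[[list], int]

def score_traces(folder: Path, detect: Detector) -> List[Tuple[str, int, int, float]]:
    """Run `detect` over every *.json trace and compare with its labeled count."""
    rows = []
    for path in sorted(Path(folder).glob("*.json")):
        trace = json.loads(path.read_text())
        truth = int(trace["labeledReps"])   # hypothetical field name
        found = detect(trace["samples"])    # hypothetical field name
        err_pct = abs(found - truth) / truth * 100.0 if truth else 0.0
        rows.append((path.name, truth, found, err_pct))
    return rows

def mean_accuracy(rows) -> float:
    """Mean of (100 - err%) across traces; one plausible reading of the report line."""
    return sum(100.0 - r[3] for r in rows) / len(rows) if rows else 0.0
```

Because the detector is passed in as a plain function, the same scorer can compare two threshold settings against the same folder of traces without touching the watch.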

The data pipeline

Getting traces off the watch was kept deliberately low-friction: the watch POSTs the trace as JSON to a tiny Python listener on the Mac over Wi-Fi.

Watch (record) ──▶ HTTP POST JSON ──▶ python trace_receiver.py ──▶ *.json
                                                                     │
                          swift test  /  squat-replay  ◀─────────────┘

No companion app, no cloud, no cable. The same recorder mode lives behind a #if DEBUG gate so it never ships in the customer-facing build (which boots straight into the counter), while debug builds get a mode picker to switch between Count Squats and Record Data.
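
A listener of this kind can be very small. The following is a sketch of what a trace_receiver.py might look like, using only the Python standard library; the output folder, the port, the labeledReps field, and the file-naming pattern (modeled loosely on the trace names shown earlier) are all assumptions.

```python
import json
import time
import uuid
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path

OUT_DIR = Path("traces")  # assumed output folder

def save_trace(body: bytes, out_dir: Path) -> Path:
    """Validate the POSTed bytes as a JSON object and write them to a new file."""
    trace = json.loads(body)                      # raises ValueError on bad JSON
    if not isinstance(trace, dict):
        raise ValueError("trace must be a JSON object")
    reps = trace.get("labeledReps", "unknown")    # hypothetical field name
    out_dir.mkdir(parents=True, exist_ok=True)
    tag = uuid.uuid4().hex[:8].upper()            # avoids same-day name collisions
    path = out_dir / f"{time.strftime('%Y%m%d')}_reps{reps}_{tag}.json"
    path.write_text(json.dumps(trace))
    return path

class TraceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            path = save_trace(self.rfile.read(length), OUT_DIR)
        except ValueError:
            self.send_error(400, "body is not a valid JSON trace")
            return
        self.send_response(200)
        self.end_headers()
        self.wfile.write(path.name.encode())

def serve(port: int = 8000) -> None:
    """Listen on all interfaces so the watch can reach the Mac over Wi-Fi."""
    HTTPServer(("0.0.0.0", port), TraceHandler).serve_forever()
```

Calling serve() starts the listener; the watch then only needs the Mac's address and port to POST a session.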


3. The Sensors: What the Watch Actually Gives You

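The SE 2’s relevant hardware is its IMU: an accelerometer and a gyroscope. But the detector never touches the raw readings. It uses CoreMotion’s CMDeviceMotion, the sensor-fused output, where Apple’s filter has already combined accelerometer + gyro (+ magnetometer) to estimate orientation and hand back two pre-processed signals: userAcceleration (the acceleration the wearer imparts to the device, with gravity removed, in g) and gravity (the gravity vector expressed in the device frame, also in g).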

The gyroscope is the quiet hero: it tracks how the watch is tilted moment-to-moment, which is what lets the fusion separate the constant 1 g of gravity from real motion even as the wrist rotates. Without it, you could not reliably remove gravity.

Sampling is configured at 50 Hz — far more than a 2–5 s/rep squat needs, but cheap and clean. Measured against recorded sessions:

Metric                         Value
----------------------------------------
Effective rate                 49.8 Hz
Sample-interval jitter (std)   0.0 ms
Dropped samples                none

The stream is essentially perfect, which means any error is the algorithm’s fault, not the sensor’s — a reassuring thing to establish early.
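
Figures like these can be derived from the sample timestamps of a recorded trace. The sketch below shows one way to do it in Python; the 1.5x gap rule for inferring dropped samples is an assumption, and the article does not say how its own numbers were computed.

```python
import statistics
from typing import Dict, List

def sampling_stats(timestamps: List[float], nominal_hz: float = 50.0) -> Dict[str, float]:
    """Effective rate, interval jitter, and inferred drops from sample timestamps (s)."""
    if len(timestamps) < 2:
        raise ValueError("need at least two timestamps")
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    nominal_dt = 1.0 / nominal_hz
    # Assumption: a gap longer than 1.5x nominal means sample(s) went missing.
    dropped = sum(round(dt / nominal_dt) - 1 for dt in intervals if dt > 1.5 * nominal_dt)
    return {
        "effective_hz": 1.0 / statistics.fmean(intervals),
        "jitter_ms": statistics.pstdev(intervals) * 1000.0,
        "dropped": dropped,
    }
```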


4. The Algorithm: From Acceleration to a Rep

The detector is a five-step pipeline per sample. The guiding idea: don’t track the arm — recover the body’s vertical motion.

4.1 Project onto the gravity axis

A squat moves your whole body down and up. That shows up in the accelerometer as a rhythmic change along the gravity axis, regardless of arm orientation. So the first step projects user acceleration onto vertical:

down_hat = normalize(gravity)              # gravity points toward the ground, in device frame
aUp      = −(userAcceleration · down_hat)  # negate the downward component to get "up" (g)

This single dot product is what makes the detector robust to wrist position: gravity always points to true vertical, so aUp measures real-world up/down acceleration whether your arm is crossed or hanging.
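
The projection can be sketched in a few lines of Python. This is an illustration of the dot product described above, not the detector's actual Swift code; it relies on the gravity vector pointing toward the ground in the device frame, so the projection is negated to make upward acceleration positive.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]

def vertical_acceleration(user_acc: Vec3, gravity: Vec3) -> float:
    """Upward component of user acceleration (in g), whatever the wrist orientation."""
    gx, gy, gz = gravity
    norm = math.sqrt(gx * gx + gy * gy + gz * gz)
    if norm == 0.0:
        return 0.0  # no usable gravity estimate yet
    ax, ay, az = user_acc
    # Component along the normalised gravity vector, i.e. toward the ground.
    down_component = (ax * gx + ay * gy + az * gz) / norm
    return -down_component
```

The same upward push reads identically whether gravity lies along the device's z axis (watch face up) or its x axis (arm hanging), which is the orientation independence the paragraph describes.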

4.2 Low-pass smooth

α        = 1 − exp(−dt / τ)            # τ = 0.12 s, framerate-independent
smoothed += α · (aUp − smoothed)
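
Deriving α from dt and τ, rather than hard-coding it, means the filter responds the same way at any sample rate. A small Python sketch of the update above, with a helper that checks this property on a unit step:

```python
import math

def low_pass_step(smoothed: float, sample: float, dt: float, tau: float = 0.12) -> float:
    """One exponential-smoothing update with a time-constant-based alpha."""
    alpha = 1.0 - math.exp(-dt / tau)
    return smoothed + alpha * (sample - smoothed)

def step_response(dt: float, seconds: float, tau: float = 0.12) -> float:
    """Filter output after feeding a unit step for `seconds` at sample interval `dt`."""
    smoothed = 0.0
    for _ in range(round(seconds / dt)):
        smoothed = low_pass_step(smoothed, 1.0, dt, tau)
    return smoothed
```

After exactly one time constant, the output reaches 1 − e⁻¹ (about 63%) of the step at both 50 Hz and 100 Hz.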

4.3 Integrate to velocity (the key move)

There is no velocity sensor; velocity is derived by integrating acceleration. But raw integration drifts: any tiny bias accumulates and the zero line wanders. So the detector uses a leaky integrator, which continuously bleeds off drift (a high-pass filter in disguise):

leak     = leakPerSecond ^ dt          # leakPerSecond ≈ 0.6: fraction of velocity kept after 1 s
velocity = velocity · leak + smoothed · g · dt   # g ≈ 9.81 m/s² converts g-units to m/s²

Why bother converting to velocity at all? Because integration is itself a low-pass filter (it attenuates high frequencies by ~1/f). The acceleration trace is spiky and broadband; its integral is a smooth oscillation, one clean negative-then-positive hump per squat. The velocity curve acts as a denoised view of the same motion, and it’s where reps become visually obvious:

Top: vertical acceleration (raw grey, smoothed blue). Bottom: integrated velocity, with descending (blue) and ascending (orange) states shaded and detected reps marked in green. Each squat is one clean down-up cycle — 15 squats, 15 green markers.
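
To see why the leak matters, compare plain and leaky integration of a small constant bias. The sketch below is a Python illustration of the leaky-integrator update from this section; the 0.01 g bias is an arbitrary test value, not a measured sensor bias, and G is taken as 9.81 m/s².

```python
G = 9.81  # m/s^2 per g

def integrate_step(velocity: float, smoothed_g: float, dt: float,
                   leak_per_second: float = 0.6) -> float:
    """One leaky-integrator update: decay the old velocity, then add a*dt."""
    leak = leak_per_second ** dt
    return velocity * leak + smoothed_g * G * dt

def drift_after(seconds: float, bias_g: float, dt: float = 0.02):
    """Return (plain, leaky) velocity after integrating a constant bias."""
    plain = leaky = 0.0
    for _ in range(round(seconds / dt)):
        plain += bias_g * G * dt
        leaky = integrate_step(leaky, bias_g, dt)
    return plain, leaky
```

With a constant bias the plain integral grows linearly without bound, while the leaky one settles at bias · G · dt / (1 − leak), so the zero line stays put.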

4.4 State machine over velocity

A squat is, in velocity terms: descend (negative) → bottom → ascend (positive) → back to standing. The detector walks a three-state machine with hysteresis and guards:

standing  ──(velocity < −vth)──▶  descending
descending ──(velocity > +vth)──▶ ascending
ascending  ──(velocity < +vth)──▶ standing   ⇒ emit rep

with rejection guards so noise doesn’t count:

Guard             Value      Rejects
----------------------------------------------------
minDescentSpeed   0.20 m/s   shallow arm wiggles
minRepDuration    0.8 s      jitter / fast spikes
maxRepDuration    6.0 s      abandoned / paused reps
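
The transitions and guards can be sketched together in Python. The threshold vth = 0.05 m/s is an assumption (the text does not give its value), and where exactly each guard is checked is a guess at one reasonable arrangement, not the detector's actual Swift logic.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SquatStateMachine:
    vth: float = 0.05                # assumed threshold, m/s (not given in the text)
    min_descent_speed: float = 0.20  # m/s
    min_rep_duration: float = 0.8    # s
    max_rep_duration: float = 6.0    # s
    state: str = "standing"
    rep_start: float = 0.0
    peak_down: float = 0.0
    reps: List[float] = field(default_factory=list)  # completion times

    def update(self, t: float, velocity: float) -> bool:
        """Feed one velocity sample; return True when a rep is emitted."""
        if self.state == "standing":
            if velocity < -self.vth:
                self.state, self.rep_start, self.peak_down = "descending", t, -velocity
            return False
        if t - self.rep_start > self.max_rep_duration:
            self.state = "standing"          # abandoned or paused rep
            return False
        if self.state == "descending":
            self.peak_down = max(self.peak_down, -velocity)
            if velocity > self.vth:
                deep_enough = self.peak_down >= self.min_descent_speed
                self.state = "ascending" if deep_enough else "standing"
            return False
        # ascending: the rep completes when the upward push decays below threshold
        if velocity < self.vth:
            self.state = "standing"
            if t - self.rep_start >= self.min_rep_duration:
                self.reps.append(t)
                return True
        return False
```

Feeding it a clean 2.5 s sinusoidal velocity cycle yields one rep per cycle, while low-amplitude wiggles and sub-second spikes pass through without counting.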

4.5 Zero-velocity update (ZUPT)

A classic inertial-navigation trick: when the body is confidently still (low smoothed acceleration while in the standing state), the integrated velocity should be zero, so it’s clamped to zero. Guarded to the standing state, it can never disturb an in-progress descent or ascent — it only cleans the baseline before the next rep begins.
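
A minimal Python sketch of such a guarded zero-velocity update follows. The stillness threshold and the required still time are assumptions chosen for illustration; the text only says the body must be confidently still while in the standing state.

```python
from dataclasses import dataclass

@dataclass
class ZeroVelocityUpdate:
    still_threshold_g: float = 0.02  # assumed: |smoothed accel| below this counts as still
    min_still_time: float = 0.3      # assumed: seconds of stillness before clamping
    still_time: float = 0.0

    def apply(self, velocity: float, smoothed_g: float, state: str, dt: float) -> float:
        """Return the velocity to carry forward, clamped to zero when confidently still."""
        if state == "standing" and abs(smoothed_g) < self.still_threshold_g:
            self.still_time += dt
        else:
            self.still_time = 0.0    # any motion or an active rep resets the timer
        return 0.0 if self.still_time >= self.min_still_time else velocity
```

Because the timer resets whenever the state is not standing, the clamp cannot fire during a descent or ascent.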


5. The Bug That Looked Like “It Misses the First Squat”

Early on, the count was reliably one short (9/10, 14/15). The user’s intuition was “it misses the first squat.” The recorded data said otherwise.

Instrumenting each detected rep’s duration revealed the smoking gun:

descent attempts (start s, peakDown m/s, dur s, accepted):
   t=  3.4  peakDown=0.54  dur=5.47  ACCEPT
   t=  8.9  peakDown=0.58  dur=5.47  ACCEPT
   ...

Each “rep” lasted ~5.5 s — a full squat cadence, not a single ~2.5 s squat. The original completion condition was velocity ≤ 0, but the leaky integrator keeps velocity slightly positive after you stand up; it only crosses zero again when the next descent pulls it down. Consequences:

  1. Every rep completed one cycle late — the displayed count ran a full rep behind, which felt like the first squat never registered.
  2. The last rep of a set had no “next descent” to complete it — so it was silently dropped, producing the off-by-one.

The fix was a one-line change with a clear physical meaning: complete the rep at the top of the stand, when the upward push decays back below threshold, rather than waiting for a zero crossing:

- if velocity <= 0 {            // waits for the next descent
+ if velocity < velocityThreshold {   // fires at the top of the stand

Validated against the recorded sessions, this took the long session from 14/15 to a clean 15/15, removed the perceived lag (the count now ticks up as you stand), and — because it no longer depends on a following rep — reliably catches the final rep of every set.

This is the payoff of the recording-first workflow: a vague human report (“misses the first one”) became a measured root cause (“completion lags one cadence, dropping the last rep”) in one offline replay, with no watch on the wrist.
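
The off-by-one can be reproduced on synthetic data. The Python sketch below builds a velocity trace with one down-up hump per rep followed by a small positive tail (standing in for the integrator's residue), then counts reps under both completion conditions. The trace shape, the tail size, and vth = 0.05 m/s are assumptions made for the demonstration.

```python
import math
from typing import Callable, List

def synthetic_velocity(reps: int, per_cadence: int = 275, per_squat: int = 125,
                       amp: float = 0.5, tail: float = 0.02) -> List[float]:
    """Per rep: a negative-then-positive hump, then a small positive tail (samples at 50 Hz)."""
    trace = []
    for i in range(reps * per_cadence):
        k = i % per_cadence
        trace.append(-amp * math.sin(2 * math.pi * k / per_squat) if k < per_squat else tail)
    return trace

def count_reps(velocity: List[float], vth: float,
               is_complete: Callable[[float], bool]) -> int:
    """Three-state machine with a pluggable completion condition (no guards)."""
    state, reps = "standing", 0
    for v in velocity:
        if state == "standing" and v < -vth:
            state = "descending"
        elif state == "descending" and v > vth:
            state = "ascending"
        elif state == "ascending" and is_complete(v):
            state, reps = "standing", reps + 1
    return reps

trace = synthetic_velocity(10)
old = count_reps(trace, 0.05, lambda v: v <= 0)    # waits for the next descent
new = count_reps(trace, 0.05, lambda v: v < 0.05)  # fires at the top of the stand
print(old, new)  # prints "9 10"
```

Under the old condition each rep only completes when the next hump begins, so the final rep of the set never completes; under the new condition every rep completes on its own.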


6. Results

Session                        Ground truth   Detected   Notes
---------------------------------------------------------------------------------------------
Slow set, varied tempo         15             15         clean
Mixed arm positions            10             9 → 9      one merge in a noisy opening; within ±1
Synthetic suite (unit tests)   exact          9/9 pass   counts, jitter rejection, depth/duration guards

The 10-rep session shows the harder case: note the positive velocity bump at t ≈ 0–3 s — that’s settling into position, before the first squat — and some baseline drift in the opening seconds where one rep merges. Everything after is cleanly separable.

Sampling is clean (49.8 Hz, zero jitter), cadence runs ~5 s/rep (12–13 reps/min, a controlled rehab pace), and the velocity signal is cleanly separable on every session recorded once the opening seconds have settled. The remaining ±1 cases trace to genuinely noisy openings, not a structural flaw, and sit comfortably inside the project’s stated tolerance.


7. Engineering Notes

A few details that mattered beyond the algorithm:


8. Takeaways

  1. Decoupling data capture from algorithm design is the highest-leverage move in sensor work. Recording labeled traces and replaying them offline turned a slow, irreproducible, on-wrist tuning loop into a fast unit test. Everything downstream got easier.
  2. Work in the right signal domain. Acceleration is what the sensor measures, but velocity — its integral — is where squats are obvious, because integration suppresses the broadband noise. Choosing the domain mattered more than choosing thresholds.
  3. Use the fused output and the gyro. Projecting userAcceleration onto the gravity axis is what makes a wrist sensor work for a whole-body motion, independent of how the arm is held.