Ring Sizer: Sub-Millimeter Finger Measurement via Computer Vision

This is a case study in building a local, single-image measurement pipeline with sub-millimeter calibration.

🔗 GitHub: github.com/fengfengxie/ring-sizer
🚀 Live Demo: huggingface.co/spaces/feng-x/ring-sizer

1. Problem Statement

Buying a ring online requires knowing your ring size, but most people don’t. Professional sizing tools (ring sizing kits, sizing strips) are not always accessible. The question: can a single phone photo that includes a common reference object give a reliable ring size recommendation?

The core challenge is deceptively simple: measure the width of a finger in an image. In practice, it involves:

Scale calibration — converting pixels to real-world units
Anatomical segmentation — isolating a specific finger from the hand
Precise edge detection — sub-pixel localization of finger boundaries
Systematic bias correction — CV measurements consistently overestimate
Size mapping — translating a physical diameter to a discrete ring size

This project tackles all five. The result is a fully local pipeline that, under controlled conditions, measures finger diameter to roughly ±0.5 mm (0.60 mm MAE after calibration) and recommends a ring size to within ±1 size.

2. System Overview

How It Works

User places their hand next to a standard credit card on a flat surface
A single top-down photo is taken (phone camera, with flash on)
The pipeline detects the card for scale, segments the finger, measures its width, applies calibration, and recommends a ring size

Architecture

The system runs as a 9-phase sequential pipeline:

Image → Quality Check → Card Detection → Hand Segmentation → Contour Extraction
     → Axis Estimation → Zone Localization → Width Measurement → Confidence Score
     → [Calibration] → [Ring Size] → JSON + Overlay PNG

All processing is local (no cloud). The stack is Python 3 with OpenCV, NumPy, MediaPipe, and SciPy.

Key Design Decisions

Decision	Choice	Rationale
Scale reference	Credit card (ISO 7810)	Universally available, known dimensions (85.60 × 53.98 mm)
Hand detection	MediaPipe Hands	Pretrained, 21 landmarks, no custom training needed
Measurement zone	15–25% of finger length from palm	Standard ring-wearing position, avoids knuckle
Width aggregation	Median of 20 cross-sections	Robust to outlier intersections
Edge detection	Sobel gradients with sub-pixel refinement	<0.5px precision vs ~2px for contour-based
Bias correction	Linear regression on ground truth	Systematic error, not random — ideal for regression
Size mapping	Nearest-match lookup with 2-size range	More robust than regression for n=29
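
The measurement-zone and aggregation rows in the table can be illustrated with a small sketch. It assumes the 20 cross-sections are evenly spaced along the straight line from the finger base to the fingertip, which the table does not state; `zone_centers` and `aggregate_width` are hypothetical names, not functions from the repository.

```python
import statistics


def zone_centers(base, tip, start=0.15, end=0.25, n=20):
    """Return n points between 15% and 25% of the way from finger base to tip.

    Assumption: even spacing along a straight base-to-tip axis.
    """
    fracs = [start + (end - start) * i / (n - 1) for i in range(n)]
    return [
        (base[0] + f * (tip[0] - base[0]), base[1] + f * (tip[1] - base[1]))
        for f in fracs
    ]


def aggregate_width(widths):
    """Median of the per-cross-section widths, as in the design table."""
    return statistics.median(widths)
```

The median keeps a few bad intersections (for example, a cross-section that clips a neighboring finger) from shifting the final width.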

3. Technical Deep Dive

3.1 Scale Calibration

The credit card serves as a known-size reference. Detection uses:

Adaptive thresholding + contour detection to find rectangular shapes
Aspect ratio validation (must be ≈1.586, the ISO standard)
Perspective correction to handle angled views
Scale factor computation: px_per_cm = detected_long_edge_px / 8.56

Stability result: Across 20 images from a fixed tripod, the detected scale was 128.48 ± 0.57 px/cm, a coefficient of variation (CV) of 0.44%.
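
As a rough sketch of the aspect-ratio check and scale computation, the function below takes four ordered corner points of a detected quadrilateral. The 5% tolerance and the averaging of opposite sides are assumptions for illustration; the actual pipeline also applies perspective correction, which is omitted here.

```python
import math

CARD_LONG_CM = 8.56    # ISO/IEC 7810 ID-1 long edge (85.60 mm)
CARD_SHORT_CM = 5.398  # ISO/IEC 7810 ID-1 short edge (53.98 mm)
CARD_ASPECT = CARD_LONG_CM / CARD_SHORT_CM  # about 1.586


def scale_from_card(corners, aspect_tol=0.05):
    """Return px_per_cm from four ordered corners, or None if not card-like.

    aspect_tol is a hypothetical relative tolerance on the aspect ratio.
    """
    # Side lengths in order around the quadrilateral.
    sides = [math.dist(corners[i], corners[(i + 1) % 4]) for i in range(4)]
    # Average opposite sides to dampen mild perspective distortion.
    a = (sides[0] + sides[2]) / 2
    b = (sides[1] + sides[3]) / 2
    long_px, short_px = max(a, b), min(a, b)
    if short_px == 0:
        return None
    aspect = long_px / short_px
    if abs(aspect - CARD_ASPECT) / CARD_ASPECT > aspect_tol:
        return None
    return long_px / CARD_LONG_CM
```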

3.2 Finger Segmentation

MediaPipe provides 21 hand landmarks per hand, including 4 per finger (MCP, PIP, DIP, TIP). The pipeline:

Detects the hand and all landmarks
Uses wrist → fingertip vector to determine orientation
Rotates the image to canonical orientation (wrist at bottom)
Generates a binary mask of the target finger using landmark-guided region growing
Cleans the mask (morphological operations, area filtering)

The user can specify which finger to measure (--finger-index index|middle|ring), defaulting to the index finger.
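
The orientation step can be sketched with plain trigonometry. This is a minimal illustration assuming image coordinates (x to the right, y downward) and rotation about the wrist; the helper names are hypothetical, and the real pipeline rotates the whole image rather than individual points.

```python
import math


def upright_rotation(wrist, fingertip):
    """Return (cos, sin) of the rotation that points wrist->fingertip straight up.

    In image coordinates "up" is the direction (0, -1).
    """
    dx, dy = fingertip[0] - wrist[0], fingertip[1] - wrist[1]
    theta = math.atan2(dy, dx)    # current direction of the hand
    phi = -math.pi / 2 - theta    # rotation needed to reach (0, -1)
    return math.cos(phi), math.sin(phi)


def rotate_point(p, center, cos_sin):
    """Rotate point p about center using the given (cos, sin) pair."""
    c, s = cos_sin
    x, y = p[0] - center[0], p[1] - center[1]
    return (center[0] + c * x - s * y, center[1] + s * x + c * y)
```

After this rotation the fingertip sits directly above the wrist, so the later steps can treat the finger axis as roughly vertical.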

3.3 Edge Detection: From Contour to Sobel

v0 (Contour-based): Extract the outer contour of the finger mask, then find intersections with perpendicular cross-section lines. Simple but limited to pixel-level precision (~2px resolution).

v1 (Sobel refinement): A significant accuracy improvement:

ROI extraction around the ring-wearing zone with padding
Bidirectional Sobel filtering — detects both dark→bright and bright→dark gradients, handling any background
Mask-constrained edge search: starts from the finger mask boundary (an anatomical prior), then searches ±10 px around it for the strongest gradient
Parabola fitting for sub-pixel localization: samples the gradient at {x-1, x, x+1}, fits f(x) = ax² + bx + c, and takes the peak at x = -b/(2a). Achieves <0.5 px precision (under 0.04 mm at the ~128 px/cm scale measured above)
Outlier filtering using Median Absolute Deviation (MAD) — removes measurements >3 MAD from median
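
The last two steps are compact enough to sketch directly. The code below is an illustration rather than the repository's implementation: `subpixel_peak` and `mad_filter` are hypothetical names, and the handling of edge cases (flat profiles, zero MAD) is an assumption.

```python
import statistics


def subpixel_peak(grad, i):
    """Refine integer peak index i of a 1-D gradient profile to sub-pixel position.

    Fits a parabola through (i-1, i, i+1) and returns the vertex location.
    """
    if i <= 0 or i >= len(grad) - 1:
        return float(i)              # no neighbors on both sides
    g_l, g_c, g_r = grad[i - 1], grad[i], grad[i + 1]
    denom = g_l - 2.0 * g_c + g_r    # equals 2a in f(x) = ax^2 + bx + c
    if denom >= 0:
        return float(i)              # not a strict local maximum
    return i + 0.5 * (g_l - g_r) / denom


def mad_filter(widths, k=3.0):
    """Drop widths more than k median absolute deviations from the median."""
    med = statistics.median(widths)
    mad = statistics.median([abs(w - med) for w in widths])
    if mad == 0:
        return list(widths)          # assumption: keep everything if MAD is zero
    return [w for w in widths if abs(w - med) <= k * mad]
```

For a profile sampled from a parabola whose true peak is at 5.3, `subpixel_peak` recovers 5.3 from the integer peak at index 5, because the three-point fit is exact for quadratic data.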

The system uses auto mode by default: attempts Sobel, validates quality (gradient strength, consistency >50%, width reasonableness), and falls back to contour if quality is insufficient.
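
The fallback logic might look roughly like the sketch below. Only the 50% consistency gate comes from the description above; the gradient threshold, the plausible width range, and the reading of "consistency" as the fraction of cross-sections with a valid edge pair are all assumptions made for illustration.

```python
def choose_edge_method(n_valid, n_sections, mean_gradient, median_width_cm,
                       min_gradient=15.0, width_range_cm=(1.0, 3.0)):
    """Return "sobel" if the Sobel result passes every quality gate, else "contour".

    min_gradient and width_range_cm are hypothetical values, not project settings.
    """
    consistency = n_valid / n_sections if n_sections else 0.0
    checks = (
        mean_gradient >= min_gradient,   # gradient strength (assumed threshold)
        consistency > 0.5,               # more than 50% of sections valid
        width_range_cm[0] <= median_width_cm <= width_range_cm[1],  # assumed range
    )
    return "sobel" if all(checks) else "contour"
```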

3.4 Calibration: Correcting Systematic Bias

Even with sub-pixel edge detection, the pipeline systematically overestimates finger width. This is expected: computer vision measures the outermost visible edge (including some skin reflection or glow), while a caliper measures the physical boundary.

Data collection: 10 subjects × 3 fingers × 2 photos = 60 measurements, with caliper ground truth.

Key finding: The over-measurement is systematic (+8.8% mean), not random. Shot-to-shot noise (0.028 cm) is much smaller than the bias (0.158 cm), making linear regression ideal.

Calibration model:

actual_diameter = 0.7921 × measured_diameter + 0.2503

Derived via ordinary least squares on 60 paired measurements. Leave-one-person-out cross-validation confirms generalization:

Metric	Before Calibration	After Calibration	Improvement
MAE	0.158 cm (1.58 mm)	0.060 cm (0.60 mm)	62%
RMSE	0.176 cm	0.075 cm	57%
Max error	0.347 cm	0.174 cm	50%

The calibration coefficients are stored in src/calibration.json and applied automatically as a post-processing step.
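
For readers who want to reproduce the idea, here is a minimal sketch of fitting and applying such a correction. The closed-form least-squares fit is standard; the function names are hypothetical, the default coefficients are the ones quoted above, and the layout of src/calibration.json is not assumed.

```python
def fit_linear(measured, actual):
    """Ordinary least squares for actual = slope * measured + intercept."""
    n = len(measured)
    mean_x = sum(measured) / n
    mean_y = sum(actual) / n
    sxx = sum((x - mean_x) ** 2 for x in measured)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(measured, actual))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def apply_calibration(measured_cm, slope=0.7921, intercept=0.2503):
    """Map a raw pipeline diameter (cm) to a calibrated diameter (cm)."""
    return slope * measured_cm + intercept
```

With the quoted coefficients, a raw reading of 1.92 cm maps to about 1.77 cm. For leave-one-person-out validation, the same fit would be repeated with each subject's measurements held out in turn.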

3.5 Ring Size Mapping

The final step maps calibrated diameter to China standard ring sizes (sizes 6–13, inner diameters 16.9–22.7 mm).

The fundamental challenge: finger diameter at the wearing zone does not perfectly predict ring size. Ring sizing depends on the knuckle (which the ring slides over), soft tissue compressibility, and personal preference. Even ground-truth caliper measurements deviate from the size chart by ±0.73 mm.

Method: Nearest-match lookup with 2-size range.

calibrated_width = 18.81 mm
→ Nearest size: 8 (18.6 mm, Δ=0.21)
→ Second nearest: 9 (19.4 mm, Δ=0.59)
→ Output: "Best match 8, recommended 8-9"
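
The lookup itself is a few lines. The sketch below uses only the four chart entries quoted in this article (sizes 6, 8, 9, and 13); the full table is not reproduced here, so a real chart would need to be passed in. `recommend_size` is a hypothetical name.

```python
# Partial chart: only the inner diameters (mm) quoted in this article.
PARTIAL_CHART_MM = {6: 16.9, 8: 18.6, 9: 19.4, 13: 22.7}


def recommend_size(diameter_mm, chart=PARTIAL_CHART_MM):
    """Return the nearest size plus a range spanning the two nearest sizes."""
    ranked = sorted(chart, key=lambda size: abs(chart[size] - diameter_mm))
    best, second = ranked[0], ranked[1]
    return {
        "best_match": best,
        "range_min": min(best, second),
        "range_max": max(best, second),
    }
```

Calling `recommend_size(18.81)` reproduces the worked example: best match 8, range 8 to 9. With a complete, monotonic chart the two nearest sizes are always adjacent; with the partial chart above they may not be.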

Validation (29 person-finger pairs with known ring sizes):

Finger	N	Pearson r	Exact Match	2-Size Range Hit
Index	10	0.919	60%	90%
Middle	10	0.955	50%	70%
Ring	9	0.787	44%	78%
All	29	0.863	52%	79%

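The index finger is the most reliable (90% range hit rate). The ring finger shows the weakest correlation (r = 0.787) and the lowest exact-match rate (44%), likely due to higher cross-section variability, while the middle finger has the lowest range hit rate (70%).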

4. Confidence Scoring

Every measurement includes a confidence score (0–1) computed from four weighted components:

Component	Weight	What It Measures
Card detection	25%	Aspect ratio match, corner quality, scale reasonableness
Finger segmentation	25%	Mask area, contour quality, landmark presence
Edge quality	20%	Gradient strength, consistency, smoothness, symmetry
Measurement	30%	Width variance, sample count, outlier ratio

Confidence levels: HIGH (>0.85), MEDIUM (0.6–0.85), LOW (<0.6). The system warns when confidence falls below a configurable threshold (default 0.7).
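
The weighting and level thresholds translate directly into code. This sketch assumes each component score is already normalized to the 0–1 range; how the individual component scores are computed is not shown, and the dictionary keys are hypothetical.

```python
# Component weights from the table above (they sum to 1.0).
WEIGHTS = {"card": 0.25, "finger": 0.25, "edge": 0.20, "measurement": 0.30}


def overall_confidence(scores):
    """Weighted sum of per-component scores, each assumed to lie in [0, 1]."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def confidence_level(confidence):
    """Map a 0-1 confidence to the HIGH / MEDIUM / LOW bands."""
    if confidence > 0.85:
        return "HIGH"
    if confidence >= 0.6:
        return "MEDIUM"
    return "LOW"
```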

5. Input & Output Details

JSON Result

{
  "finger_outer_diameter_cm": 1.78,
  "confidence": 0.91,
  "scale_px_per_cm": 128.03,
  "ring_size": {
    "best_match": 8,
    "range_min": 8,
    "range_max": 9,
    "diameter_mm": 17.80
  },
  "raw_diameter_cm": 1.92,
  "calibration_applied": true,
  "edge_method_used": "sobel",
  "fail_reason": null
}
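
A consumer of this output might handle it roughly as follows. The field names come from the example above; the `summarize` helper and its message wording are hypothetical, and the 0.7 default mirrors the warning threshold described in the confidence section.

```python
import json


def summarize(result_json, min_confidence=0.7):
    """Turn a result JSON string into a one-line, human-readable summary."""
    result = json.loads(result_json)
    if result.get("fail_reason"):
        return f"failed: {result['fail_reason']}"
    size = result["ring_size"]
    message = (
        f"best match {size['best_match']}, "
        f"recommended {size['range_min']}-{size['range_max']}"
    )
    if result["confidence"] < min_confidence:
        message += " (low confidence, consider retaking the photo)"
    return message
```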

Visual Overlay

Every measurement produces an annotated PNG showing:

Credit card contour and detected corners
Finger contour, axis, and ring-wearing zone
Individual edge detection points (left/right)
Width measurement lines
Calibrated diameter, ring size recommendation, and confidence

Input Requirements

Resolution: 1080p or higher
View angle: near top-down
One hand with fingers extended
Credit card visible on the same plane
Good lighting (flash recommended to eliminate shadows)

6. Lessons Learned

1. Systematic bias dominates random noise. The uncalibrated pipeline’s ~1.6 mm mean error was about 85% bias and 15% noise (0.158 cm of bias against 0.028 cm of shot-to-shot noise). A simple linear regression removed most of it. The takeaway: before adding complexity (better models, more features), check whether the error is systematic.

2. Ground truth data is the highest-ROI investment. Collecting 10 people’s caliper measurements (a few hours of work) enabled both calibration and validation. Without it, we’d be guessing at accuracy.

3. Sub-pixel edge detection matters less than calibration. v1’s Sobel refinement improved theoretical precision from ~2 px to <0.5 px. But the linear calibration added afterward (v2) reduced actual error about 3× more, cutting MAE from 1.58 mm to 0.60 mm. Precision without accuracy is misleading.

4. Ring size ≠ finger diameter. The wearing-zone diameter is a proxy, not a direct predictor. Knuckle size, tissue compressibility, and personal preference all influence the right size. Outputting a range rather than a single number is both more honest and more useful.

5. Controlled conditions matter enormously. Our ±0.5 mm accuracy is achieved with a tripod, flash, white background, and co-planar card. Real-world phone photos will perform worse. Being explicit about conditions prevents overpromising.

7. Limitations

Sample size: n=30 (10 subjects × 3 fingers). Sufficient for detecting the systematic bias pattern but not for fine-grained per-finger models.
Demographics: All subjects are from one geographic region. Finger morphology varies across populations.
Controlled setup: Accuracy claims assume tripod, flash, white background, co-planar card. Hand-held photos at arbitrary angles will degrade performance.
Single-view measurement: A top-down photo captures width but not depth. The finger cross-section is elliptical, and we only measure the major axis.
Size chart: Currently China standard only (sizes 6–13). Other standards (US, EU, UK) would need additional lookup tables.

8. Accuracy Summary

Metric	Value	Condition
Diameter MAE	0.60 mm	Calibrated, controlled setup, n=30
Diameter RMSE	0.75 mm	Same
Diameter max error	1.74 mm	Same
Ring size exact match	52%	All fingers
Ring size 2-size range hit	79%	All fingers
Ring size 2-size range hit (index only)	90%	Best-performing finger
Card detection CV	0.44%	Scale stability across 20 images
Shot-to-shot repeatability	0.28 mm	Same subject, two photos

Bottom line: Under controlled conditions, the system measures finger diameter to roughly ±0.5 mm (0.60 mm MAE) and places the true ring size inside its two-size recommendation 79% of the time, with the index finger being the most reliable target (90%).