Ring Sizer Calibration Report: Systematic Bias Correction via Linear Regression

2026/03/06

1. Objective

The CV pipeline consistently over-estimates finger diameter compared to caliper ground truth. This study:

  1. Quantifies the systematic bias using a controlled dataset
  2. Validates pipeline stability (card detection, repeatability)
  3. Builds a linear regression calibration model
  4. Cross-validates to ensure generalization

2. Dataset

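Collection Protocol

Ten subjects (S01 to S10) were each photographed twice (shots A and B), with three fingers measured per shot, giving 6 measurements per subject and 60 in total. All images were taken from a fixed tripod with flash, using the same reference card and a single iPhone model.

Ground Truth

Caliper diameter (D) per finger, with tape circumference (C) recorded as a cross-check (Section 5).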

Diameter Range


3. Card Detection Stability

All images were taken from a fixed tripod with the same card. The detected scale factor (px/cm) should be nearly identical.

Metric                    Value
Mean px/cm                128.48
Std dev                   0.57 px/cm
Coefficient of variation  0.44%
Range                     127.28 – 129.51 px/cm
Max spread                2.23 px/cm (1.74%)

Verdict: Card detection is highly stable. A 0.44% coefficient of variation in the scale factor is negligible: it contributes less than 0.01 cm to measurement uncertainty.
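The stability figures and the under-0.01 cm claim can be reproduced with a short sketch, assuming the per-image px/cm values are available as a list. Because the diameter in cm is the pixel diameter divided by px/cm, a relative error in the scale factor carries through, to first order, as the same relative error in diameter. The function names are hypothetical, and the 1.8 cm finger diameter is an illustrative assumption rather than a reported value.

```python
import numpy as np


def scale_stability(px_per_cm):
    """Summarize card-detection scale factors (px/cm) across images."""
    s = np.asarray(px_per_cm, dtype=float)
    mean = s.mean()
    std = s.std(ddof=1)  # sample standard deviation (assumed; the report does not say)
    spread = s.max() - s.min()
    return {
        "mean": mean,
        "std": std,
        "cv_pct": 100.0 * std / mean,  # coefficient of variation
        "spread": spread,
        "spread_pct": 100.0 * spread / mean,
    }


def diameter_uncertainty_cm(cv_pct, diameter_cm):
    """diameter_cm = diameter_px / px_per_cm, so a relative scale error
    maps (to first order) onto the same relative diameter error."""
    return diameter_cm * cv_pct / 100.0


# Reported CV of 0.44% applied to an assumed 1.8 cm finger
uncertainty = diameter_uncertainty_cm(0.44, 1.8)
```

With these inputs the propagated uncertainty is about 0.008 cm, in line with the verdict.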


4. A vs B Repeatability

Each person was photographed twice (shots A and B). Comparing the same person×finger across the two shots quantifies pipeline noise independent of the person.

Metric           Value
Mean |A−B|       0.028 cm
Std |A−B|        0.028 cm
Max |A−B|        0.127 cm
95th percentile  0.074 cm

Verdict: The pipeline is highly reproducible. Mean shot-to-shot variation of 0.028 cm is well below the systematic bias of 0.158 cm, confirming the over-measurement is a consistent bias, not random noise.
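A minimal sketch of the repeatability metrics, assuming shot A and shot B readings are stored as two arrays aligned by person×finger. The function name and the example readings are hypothetical, and whether the report's standard deviation uses the sample (ddof=1) or population form is not stated, so the choice below is an assumption.

```python
import numpy as np


def repeatability(shot_a_cm, shot_b_cm):
    """Shot-to-shot statistics for paired readings of the same person x finger."""
    diff = np.abs(np.asarray(shot_a_cm, float) - np.asarray(shot_b_cm, float))
    return {
        "mean_abs_diff": diff.mean(),
        "std_abs_diff": diff.std(ddof=1),  # sample std (assumed)
        "max_abs_diff": diff.max(),
        "p95_abs_diff": np.percentile(diff, 95),
    }


# Hypothetical paired readings in cm, one entry per person x finger
shot_a = [1.92, 1.85, 2.04, 1.78]
shot_b = [1.90, 1.88, 2.03, 1.78]
stats = repeatability(shot_a, shot_b)
```

Comparing `stats["mean_abs_diff"]` with the mean raw error is what separates random noise from a systematic offset.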


5. Ground Truth Consistency

Cross-checking caliper diameter (D) against tape circumference (C): for a perfect circle, C = πD. Deviations indicate finger cross-section ovality.

Metric         Value
Mean (C − πD)  +0.228 cm
Range          −0.19 to +0.92 cm

Fingers are not perfect circles: their cross-sections are slightly oval, and when the caliper spans the narrower axis, the circumference exceeds πD. This is expected and does not indicate measurement error. A few large positive deviations (e.g. S02: +0.91 cm) suggest either imprecise tape measurement or a particularly oval cross-section.
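Whether an oval cross-section makes C larger or smaller than πD depends on which axis the caliper spans. The sketch below illustrates this with Ramanujan's first approximation for the perimeter of an ellipse, a standard formula swapped in here for illustration rather than the study's method. The 1.9 cm × 1.7 cm cross-section is an assumed example, not study data, and the helper names are hypothetical.

```python
import math


def ovality_residual_cm(circumference_cm, caliper_diameter_cm):
    """C - pi*D: zero for a perfect circle."""
    return circumference_cm - math.pi * caliper_diameter_cm


def ellipse_perimeter_cm(major_cm, minor_cm):
    """Ramanujan's approximation, taking full axis lengths as input."""
    a, b = major_cm / 2.0, minor_cm / 2.0  # semi-axes
    return math.pi * (3 * (a + b) - math.sqrt((3 * a + b) * (a + 3 * b)))


# Assumed oval cross-section: 1.9 cm wide, 1.7 cm deep
perimeter = ellipse_perimeter_cm(1.9, 1.7)
narrow_axis = ovality_residual_cm(perimeter, 1.7)  # caliper on the narrow axis: positive
wide_axis = ovality_residual_cm(perimeter, 1.9)    # caliper on the wide axis: negative
```

The sign flip between the two cases is one possible reason the observed range includes small negative values.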


6. Raw Accuracy (Before Calibration)

Metric                Value
N                     60
Mean error (CV − GT)  +0.158 cm
Median error          +0.138 cm
Std of error          0.078 cm
Mean % error          +8.8%
MAE                   0.158 cm
Max absolute error    0.347 cm
RMSE                  0.176 cm
Pearson r             0.883
R²                    0.779

Key observation: 59 of 60 measurements over-estimate, and the single exception is essentially zero (−0.008 cm). This consistent, one-directional bias is well suited to linear correction.
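The raw-accuracy metrics can be computed from paired arrays with a sketch like the following, assuming measured and caliper diameters are aligned and in cm. R² is taken here as the square of Pearson r, which is consistent with the reported pair (0.883² ≈ 0.78); that interpretation, the percent-error definition, the function name, and the toy values are assumptions.

```python
import numpy as np


def accuracy_metrics(measured_cm, actual_cm):
    """Error statistics for CV measurements against caliper ground truth."""
    m = np.asarray(measured_cm, float)
    y = np.asarray(actual_cm, float)
    err = m - y  # positive means the pipeline over-estimates
    r = np.corrcoef(m, y)[0, 1]
    return {
        "n": err.size,
        "mean_error": err.mean(),
        "median_error": np.median(err),
        "std_error": err.std(ddof=1),
        "mean_pct_error": 100.0 * np.mean(err / y),  # assumed definition
        "mae": np.abs(err).mean(),
        "max_abs_error": np.abs(err).max(),
        "rmse": np.sqrt(np.mean(err ** 2)),
        "pearson_r": r,
        "r_squared": r ** 2,
        "n_over": int(np.sum(err > 0)),
    }


# Hypothetical values, for illustration only
metrics = accuracy_metrics([2.0, 1.9, 2.1], [1.8, 1.75, 1.95])
```

When every error is positive, MAE equals the mean error, which is why both read 0.158 cm in the table.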


7. Linear Regression Calibration

Model

actual_diameter = 0.7921 × measured_diameter + 0.2503

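The slope below 1 means the over-measurement grows with finger size: each additional 1 cm of measured diameter corresponds to only about 0.79 cm of actual diameter (1 − 0.7921 ≈ 0.21 cm of excess per cm), while the positive intercept cancels part of that excess. Averaged over the dataset, the net effect equals the +0.158 cm mean bias from Section 6, so the regression corrects both a scale error (slope) and an offset (intercept).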
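A sketch of fitting and applying the model with NumPy's `np.polyfit` (ordinary least squares, degree 1), using the report's full-data coefficients as defaults. The helper names are hypothetical. `implied_bias` rearranges the model as measured − actual = (1 − slope) × measured − intercept, which makes the size dependence explicit.

```python
import numpy as np

SLOPE, INTERCEPT = 0.7921, 0.2503  # full-data coefficients from this report


def fit_calibration(measured_cm, actual_cm):
    """Least-squares fit of actual = slope * measured + intercept."""
    slope, intercept = np.polyfit(measured_cm, actual_cm, deg=1)  # highest power first
    return slope, intercept


def calibrate(measured_cm, slope=SLOPE, intercept=INTERCEPT):
    """Map a raw CV diameter to a corrected diameter."""
    return slope * np.asarray(measured_cm, float) + intercept


def implied_bias(measured_cm, slope=SLOPE, intercept=INTERCEPT):
    """Over-measurement the model attributes to a reading: measured - actual."""
    m = np.asarray(measured_cm, float)
    return (1.0 - slope) * m - intercept
```

Under the full-data coefficients the implied bias is zero at a measured diameter of 0.2503 / 0.2079 ≈ 1.20 cm and grows by about 0.21 cm for each additional centimetre, which a constant offset alone cannot capture.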

After Calibration (In-Sample)

Metric      Before     After      Improvement
MAE         0.158 cm   0.057 cm   64% ↓
RMSE        0.176 cm   0.070 cm   60% ↓
Max error   0.347 cm   0.174 cm   50% ↓
Mean error  +0.158 cm  ~0.000 cm  n/a
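The Improvement column is the relative reduction (before − after) / before. A quick check against the reported, rounded values (the helper name is hypothetical):

```python
def improvement_pct(before, after):
    """Relative reduction from 'before' to 'after', in percent."""
    return 100.0 * (before - after) / before


rows = {"MAE": (0.158, 0.057), "RMSE": (0.176, 0.070), "Max error": (0.347, 0.174)}
for name, (before, after) in rows.items():
    # prints 64, 60 and 50 percent, matching the table
    print(f"{name}: {improvement_pct(before, after):.0f}% reduction")
```

The mean-error row has no percentage because an ordinary least squares fit with an intercept drives the in-sample mean residual to zero by construction.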

Calibration Plots

[Figure: Calibration Analysis] Left: CV-measured vs actual diameter with the regression line. Center: error distribution before (blue) and after (orange) calibration. Right: residuals after calibration, which show no strong pattern, consistent with a linear model being adequate.


8. Cross-Validation (Leave-One-Person-Out)

To estimate real-world performance on unseen subjects, we perform 10-fold cross-validation where each fold holds out all 6 measurements from one person.

Holdout person  N   Slope   Intercept  Raw MAE (cm)  Cal MAE (cm)  Cal max (cm)
S01             6   0.8046  +0.2268    0.223         0.035         0.103
S02             6   0.8189  +0.2087    0.271         0.111         0.146
S03             6   0.7824  +0.2711    0.131         0.037         0.056
S04             6   0.7726  +0.2805    0.103         0.080         0.094
S05             6   0.7916  +0.2505    0.171         0.078         0.156
S06             6   0.8149  +0.2029    0.112         0.050         0.133
S07             6   0.8120  +0.2077    0.106         0.036         0.094
S08             6   0.7591  +0.3175    0.124         0.042         0.080
S09             6   0.7793  +0.2781    0.198         0.087         0.176
S10             6   0.7845  +0.2620    0.138         0.045         0.119

Cross-Validated Summary

Metric  Raw       Calibrated  Improvement
MAE     0.158 cm  0.060 cm    62% ↓
RMSE    0.176 cm  0.075 cm    57% ↓

Verdict: The calibration generalizes well. The worst-case holdout (S02) has Cal MAE = 0.111 cm, still a large improvement over its Raw MAE of 0.271 cm. Regression coefficients are stable across folds (slope 0.759–0.819), indicating the model is robust.
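The leave-one-person-out procedure can be sketched as follows, assuming one row per measurement with a subject ID, the CV diameter, and the caliper diameter. The function and key names are hypothetical. Out-of-fold residuals are pooled for the summary; with equal fold sizes (6 per subject), the pooled MAE equals the mean of the per-fold MAEs, which is consistent with the table (mean Cal MAE ≈ 0.060 cm).

```python
import numpy as np


def leave_one_person_out(person_ids, measured_cm, actual_cm):
    """Fit on all other subjects, then evaluate on the held-out subject."""
    pid = np.asarray(person_ids)
    m = np.asarray(measured_cm, float)
    y = np.asarray(actual_cm, float)
    folds = []
    cal_errors = np.empty_like(y)  # out-of-fold residuals, one per measurement
    for person in np.unique(pid):
        test = pid == person
        slope, intercept = np.polyfit(m[~test], y[~test], deg=1)
        pred = slope * m[test] + intercept
        cal_errors[test] = pred - y[test]
        folds.append({
            "person": person,
            "n": int(test.sum()),
            "slope": slope,
            "intercept": intercept,
            "raw_mae": np.abs(m[test] - y[test]).mean(),
            "cal_mae": np.abs(pred - y[test]).mean(),
            "cal_max": np.abs(pred - y[test]).max(),
        })
    raw_errors = m - y
    summary = {
        "raw_mae": np.abs(raw_errors).mean(),
        "cal_mae": np.abs(cal_errors).mean(),
        "raw_rmse": np.sqrt(np.mean(raw_errors ** 2)),
        "cal_rmse": np.sqrt(np.mean(cal_errors ** 2)),
    }
    return folds, summary
```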


9. Regression Coefficient Stability

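Across the 10 CV folds:

Coefficient  Min      Max      Mean     Full-data fit
Slope        0.7591   0.8189   0.7920   0.7921
Intercept    +0.2029  +0.3175  +0.2506  +0.2503

Slope and intercept tend to move in opposite directions across folds (S08: 0.7591 / +0.3175; S02: 0.8189 / +0.2087), so the corrected diameters differ less between folds than either coefficient range alone suggests. The narrow ranges confirm that no single person dominates the fit: the calibration is stable across the dataset.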
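The coefficient summary can be recomputed directly from the per-fold cross-validation table, together with the slope/intercept correlation and the spread of corrected diameters that the ten fold models produce for a single reading. The 2.0 cm reading is an illustrative assumption.

```python
import numpy as np

# Per-fold coefficients from the cross-validation table (holdout S01..S10)
slopes = np.array([0.8046, 0.8189, 0.7824, 0.7726, 0.7916,
                   0.8149, 0.8120, 0.7591, 0.7793, 0.7845])
intercepts = np.array([0.2268, 0.2087, 0.2711, 0.2805, 0.2505,
                       0.2029, 0.2077, 0.3175, 0.2781, 0.2620])

print(f"slope:     {slopes.min():.4f} to {slopes.max():.4f}, mean {slopes.mean():.4f}")
print(f"intercept: {intercepts.min():.4f} to {intercepts.max():.4f}, mean {intercepts.mean():.4f}")
print(f"slope/intercept correlation: {np.corrcoef(slopes, intercepts)[0, 1]:.2f}")

# Corrected diameter predicted by each fold model for one reading
measured = 2.0  # illustrative reading in cm (assumption)
predictions = slopes * measured + intercepts
print(f"prediction spread at {measured} cm: {predictions.max() - predictions.min():.3f} cm")
```

The strongly negative correlation is typical of least-squares lines fitted to data far from x = 0: the fold models pivot around the centre of the data rather than shifting as a whole.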


10. Limitations & Future Work

  1. Sample size: 10 subjects (60 measurements) is adequate for a linear model but small for detecting non-linear patterns. More data could improve robustness.
  2. Demographic coverage: All subjects are Chinese adults. Calibration may need re-fitting for significantly different hand morphologies.
  3. Setup dependency: Calibration was derived from tripod + flash images. Handheld or variable-lighting conditions may introduce additional bias.
  4. Single camera: All images from one iPhone model. Different cameras/lenses may shift the bias.
  5. Outliers: S09 and S05 ring fingers show ~20% raw error, possibly due to knuckle prominence or finger positioning. These are the hardest cases even after calibration.

11. Conclusion

The regression calibration reduces measurement error by 62% (MAE: 0.158 → 0.060 cm) and generalizes well across subjects in leave-one-person-out cross-validation.

Final calibration model:

actual_diameter = 0.7921 × measured_diameter + 0.2503

The model is stored in src/calibration.json and applied automatically (bypass with --no-calibration).
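A sketch of how a stored model might be loaded and applied with a `--no-calibration` switch. The JSON key names (`slope`, `intercept`), the function names, and the argparse wiring are assumptions for illustration; the project's actual schema and CLI code are not shown in this report.

```python
import argparse
import json
from pathlib import Path


def load_calibration(path="src/calibration.json"):
    """Read slope and intercept (key names are assumed, not a documented schema)."""
    data = json.loads(Path(path).read_text())
    return float(data["slope"]), float(data["intercept"])


def final_diameter(measured_cm, use_calibration, path="src/calibration.json"):
    """Return the corrected diameter, or the raw CV value when calibration is off."""
    if not use_calibration:
        return measured_cm  # raw pipeline output
    slope, intercept = load_calibration(path)
    return slope * measured_cm + intercept


def parse_args(argv=None):
    """Hypothetical CLI front end exposing only the bypass flag."""
    parser = argparse.ArgumentParser(description="Ring sizer (illustrative CLI)")
    parser.add_argument("--no-calibration", action="store_true",
                        help="report the raw CV diameter without regression correction")
    return parser.parse_args(argv)
```

For example, with a file holding the reported coefficients, `final_diameter(2.0, use_calibration=not args.no_calibration)` gives 1.8345 cm by default and the raw 2.0 cm when the flag is passed.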