This case study documents our journey from a rule-based insight system to an LLM-powered Health OS Engine. We preserved deterministic physiological models as the foundation while introducing LLMs as a higher-level reasoning and narrative layer—transforming fragmented insights into unified, scalable intelligence.
Background
Femometer’s intelligence system started as a rule-based analysis framework built on years of domain knowledge in women’s reproductive health.
The system successfully translated physiological data—BBT, LH tests, symptoms, and cycle events—into actionable insights for users.
However, as the product evolved and new sensing capabilities were introduced (smart ring, third-party devices, voice-based emotion context, richer behavioral data), several structural limitations became increasingly clear:
Insights were feature-centric, not state-centric
Multiple modules generated interpretations independently
Conflicting or fragmented messages occasionally surfaced across the app
Adding new devices or data sources increased complexity exponentially
At this stage, the problem was no longer the accuracy of individual rules, but system-level coherence and extensibility.
This realization led to a fundamental redesign of Femometer’s intelligence architecture.
The Original Architecture: Rule-Based Insight Retrieval
The original intelligence pipeline followed a classic deterministic pattern:
→ Data Acquisition
→ State & Tag Calculation (rule-based)
→ Insight Generation (template retrieval)
→ UI Presentation
Key characteristics:
Core physiological understanding was deterministic and validated
Health states (e.g., ovulation, fertile window, luteal phase) were already computed
Tags (e.g., BBT curve types, anomaly flags) were derived for interpretation
Insights were generated by fixed mappings between <state, tag> pairs and prewritten text templates (a minimal sketch of this pattern follows this list)
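For illustration, here is a minimal sketch of that retrieval pattern; the state names, tags, and template text are hypothetical examples, not Femometer's actual rule set:

```python
# Minimal sketch of the original rule-based retrieval pattern.
# State names, tags, and template text are hypothetical examples.

INSIGHT_TEMPLATES = {
    ("fertile_window", "bbt_rising"):
        "Your temperature is rising and you are in your fertile window.",
    ("luteal_phase", "bbt_plateau"):
        "Your temperature has plateaued, which is typical after ovulation.",
}

def generate_insight(state: str, tag: str) -> str:
    """Fixed mapping: every <state, tag> pair needs a prewritten template."""
    return INSIGHT_TEMPLATES.get(
        (state, tag),
        "No insight available for this combination.",  # unmapped pairs fall through
    )

print(generate_insight("fertile_window", "bbt_rising"))
```

Every new device, state, or tag meant enumerating new pairs and new copy, which is exactly why the approach became hard to extend.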
This approach worked well at early scale, but it introduced systemic issues:
Interpretation logic was distributed across modules
No single “source of truth” existed for the user’s overall health state
Narrative consistency was hard to maintain
The system was difficult to extend beyond predefined scenarios
Reframing the Core Problem
The key architectural insight was this:
The problem was not the calculation layer — it was the interpretation layer.
Femometer already had a strong deterministic understanding of reproductive physiology.
What was missing was a unified interpretation engine capable of:
Synthesizing multiple states and signals
Resolving conflicts across modules
Expressing uncertainty and confidence clearly
Scaling gracefully with new devices and data sources
This led to the concept of a Health OS Engine.
Health OS Architecture Design
Redesigned System Architecture of the Femometer Health OS:
→ Data Acquisition
→ Deterministic Health State Engine (unchanged, but strengthened)
→ LLM-Based Interpretation & Orchestration (newly added)
→ Context Modeling Layer
→ Interpretation Policy & Guardrails Layer
→ Structured Output Contract Layer
→ UI Presentation
1. Deterministic Health State Engine
This layer remains rule-based and testable:
Computes canonical health states (cycle phase, ovulation status, fertility window)
Aggregates evidence from multiple sources (ring, LH tests, user input)
Tracks confidence, provenance, and historical deltas
Serves as the single source of physiological truth
This ensures:
Medical safety
Reproducibility
Auditability
Clear debugging paths
LLMs are not used here.
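As a concrete illustration, a minimal sketch of what the engine's canonical output could look like; the field names and example values are assumptions for illustration, not the production schema:

```python
# Illustrative data model for the deterministic engine's output.
# Field names and example values are assumptions, not the production schema.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Evidence:
    source: str        # e.g. "smart_ring", "lh_test", "manual_entry"
    signal: str        # e.g. "bbt_rise", "lh_surge"
    confidence: float  # 0.0-1.0, computed by rules, not by an LLM

@dataclass
class HealthState:
    cycle_day: int
    cycle_phase: str                # e.g. "follicular", "ovulation", "luteal"
    fertile_window: bool
    confidence: float               # aggregate confidence for the phase estimate
    evidence: list[Evidence] = field(default_factory=list)
    delta_from_yesterday: str = ""  # e.g. "entered fertile window"
    computed_on: date = field(default_factory=date.today)
```

Because this layer is deterministic, the same inputs always yield the same HealthState, which is what keeps it reproducible, auditable, and easy to debug.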
2. LLM-Based Interpretation & Orchestration
Instead of directly mapping states to static templates, Femometer introduces a policy-guided LLM orchestration layer responsible for interpretation.
This layer is not “prompt engineering” in the casual sense, but a structured interface composed of three parts:
a. Context Modeling Layer
Converts internal health states, evidence, confidence, and deltas into structured semantic context
Integrates data from in-house devices, third-party devices, and manual input
Shields the model from raw, noisy data
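A minimal, self-contained sketch of that conversion, assuming an engine output shaped roughly like the HealthState sketch above (all keys and wording are illustrative):

```python
# Illustrative context builder: turns validated engine output into
# model-facing semantic context. Keys and phrasing are assumptions.

def build_semantic_context(state: dict) -> dict:
    """Expose only structured, validated facts to the LLM, never raw sensor streams."""
    return {
        "cycle_phase": state["cycle_phase"],
        "cycle_day": state["cycle_day"],
        "fertile_window": state["fertile_window"],
        "confidence": round(state["confidence"], 2),
        "evidence_summary": [
            f'{e["source"]}: {e["signal"]} (confidence {e["confidence"]:.2f})'
            for e in state["evidence"]
        ],
        "change_since_yesterday": state.get("delta_from_yesterday", "none"),
    }

context = build_semantic_context({
    "cycle_phase": "ovulation",
    "cycle_day": 14,
    "fertile_window": True,
    "confidence": 0.82,
    "evidence": [{"source": "lh_test", "signal": "lh_surge", "confidence": 0.9}],
    "delta_from_yesterday": "LH surge detected",
})
print(context)
```

The model never sees raw time series or device noise; it sees only what the deterministic engine has already validated.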
b. Interpretation Policy & Guardrails Layer
Defines how health information may be interpreted and expressed
Enforces safety boundaries and uncertainty handling
Encodes product tone, user sensitivity, and medical constraints
Ensures consistency across Home, Analysis, and Notifications
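One way such a policy could be represented and enforced in code; the rule names, thresholds, and the crude wording check are hypothetical:

```python
# Illustrative policy object and guardrail check.
# Rule names, thresholds, and the wording check are hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class InterpretationPolicy:
    tone: str = "supportive, non-alarming"
    forbidden_claims: tuple = ("diagnosis", "guaranteed ovulation")
    hedge_below_confidence: float = 0.6  # low-confidence text must express uncertainty

def check_guardrails(text: str, confidence: float,
                     policy: InterpretationPolicy) -> list[str]:
    """Return a list of violations; an empty list means the text may be shown."""
    violations = []
    for claim in policy.forbidden_claims:
        if claim in text.lower():
            violations.append(f"forbidden claim: {claim}")
    # Crude proxy for "expresses uncertainty"; a real check would be richer.
    if confidence < policy.hedge_below_confidence and "may" not in text.lower():
        violations.append("low-confidence output lacks uncertainty wording")
    return violations
```

Applying the same policy object on Home, Analysis, and Notifications is what keeps the narrative consistent across surfaces.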
c. Structured Output Contract Layer
Enforces deterministic, UI-safe outputs (e.g., headline, summary bullets, confidence phrasing, next actions)
Enables testing, regression detection, and model swapping
Prevents free-form narrative drift
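A minimal sketch of such a contract; the field names are assumptions that mirror the examples above, not the shipped schema:

```python
# Illustrative output contract: the model must fill exactly these fields.
# Field names and limits are assumptions, not the shipped schema.
from dataclasses import dataclass

@dataclass
class InsightOutput:
    headline: str               # short, UI-safe title
    summary_bullets: list[str]  # a few bullets rendered verbatim in the app
    confidence_phrase: str      # e.g. "high confidence", "early signal"
    next_actions: list[str]     # concrete, non-diagnostic suggestions

def validate_output(raw: dict) -> InsightOutput:
    """Reject free-form drift: unexpected or missing fields fail loudly."""
    expected = {"headline", "summary_bullets", "confidence_phrase", "next_actions"}
    if set(raw) != expected:
        raise ValueError(f"contract violation: got fields {sorted(raw)}")
    if not 1 <= len(raw["summary_bullets"]) <= 4:
        raise ValueError("contract violation: summary_bullets out of range")
    return InsightOutput(**raw)
```

Because the contract is fixed, the same regression suite can be run against any model behind the layer, which is what makes model swapping practical.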
Together, these components form a production-grade AI interpretation engine, not a chat feature.
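Put together, the orchestration step is essentially a thin pipeline around a replaceable model call. A compressed, self-contained sketch follows; the model call is abstracted as a plain callable, no specific LLM vendor or SDK is implied, and all names are illustrative:

```python
# Compressed end-to-end sketch of the interpretation engine.
# `call_model` is any function that takes a prompt and returns JSON text;
# no specific LLM vendor or SDK is implied. All names are illustrative.
import json
from typing import Callable

REQUIRED_FIELDS = {"headline", "summary_bullets", "confidence_phrase", "next_actions"}

def interpret(state: dict, policy_rules: str,
              call_model: Callable[[str], str]) -> dict:
    # 1. Context modeling: structured facts only, no raw sensor data.
    context = {k: state[k] for k in ("cycle_phase", "confidence", "fertile_window")}

    # 2. Policy-guided prompt: the interpretation rules travel with every request.
    prompt = (
        "Interpret this health context as JSON with keys "
        "headline, summary_bullets, confidence_phrase, next_actions.\n"
        f"Rules: {policy_rules}\nContext: {json.dumps(context)}"
    )

    # 3. Structured output contract: parse and validate, or fall back deterministically.
    try:
        raw = json.loads(call_model(prompt))
    except (ValueError, TypeError):
        raw = {}
    if not isinstance(raw, dict) or set(raw) != REQUIRED_FIELDS:
        return {"headline": "Data recorded", "summary_bullets": [],
                "confidence_phrase": "", "next_actions": []}
    return raw
```

Swapping models means swapping the callable; the context builder, policy, and contract stay fixed.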
Why This Design Scales with Future LLMs
A critical design goal was to ensure that improvements in LLM capability translate into real product improvements, not just better wording.
This architecture enables that by:
Preserving a stable, deterministic health state core
Expanding the semantic surface area exposed to the LLM (evidence, confidence, deltas)
Allowing increasingly sophisticated cross-state reasoning as models improve
As LLMs evolve, Femometer can:
Generate deeper, more personalized understanding and interpretations
Surface higher-order insights without changing core logic
Maintain safety and correctness while improving expressiveness
The system's intelligence is no longer capped by static templates.
Key Takeaway
Femometer’s Health OS Engine does not replace medical logic with AI.
It replaces a rigid, template-based understanding and interpretation layer with a policy-guided AI orchestration layer that can grow more intelligent over time—while remaining grounded in physiological truth.
This shift transforms Femometer from a feature-driven health app into a state-aware, extensible health intelligence system.