Every AI tool measures the model's output. Syntonic measures the coupling: the human-and-AI as one unit.

The instrument for human-AI coupling.

Measure and optimize the quality of the working relationship between your people and your AI: the signal no output-eval can see.

The synergy isn't automatic.

The spread between −19% and +55.8% isn't the model. It's the pairing, and nothing in your stack measures the pairing. Most human-AI pairings underperform, and no tool on the market tells you whether yours is one of them.

LOCI is the engineering fix for that documented failure: one score for whether the working relationship is actually working. The bar it's built against is the published one, Complementary Team Performance: the pairing has to beat the best of either member alone, not the average. We measure that. We don't assume it.
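To make the bar concrete, here is a minimal sketch of the Complementary Team Performance check as described above. It is not the LOCI score itself, whose composition this page does not specify. The function name, field names, and accuracy figures are hypothetical, and it assumes all three numbers were measured on the same task set with the same metric, where higher is better.

```python
def complementary_team_performance(human_alone: float, ai_alone: float, paired: float) -> dict:
    """Compare the pairing against the best solo member, not the solo average.

    Assumes all three scores use the same metric on the same tasks,
    and that higher is better.
    """
    best_solo = max(human_alone, ai_alone)
    average_solo = (human_alone + ai_alone) / 2
    return {
        "best_solo": best_solo,
        "margin_over_best": paired - best_solo,
        "beats_average": paired > average_solo,  # the weaker bar
        "meets_ctp": paired > best_solo,         # the bar described above
    }


# Hypothetical accuracies, for illustration only.
result = complementary_team_performance(human_alone=0.70, ai_alone=0.80, paired=0.78)
print(result["beats_average"], result["meets_ctp"])
```

In this made-up case the pairing beats the average of its members yet still loses to the AI working alone, which is exactly the situation the stricter bar is meant to expose.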

Catch the errors neither your engineer nor your AI catches alone.

Code review is where AI errors get expensive: the change passed the tests, passed review, and still blew up in prod. CEED (coupling-emergent error detection) instruments the code-review process itself. When a result warrants a second look, it gets one, and every flag is tracked against confirmed errors so you can see how well it's working.

"We instrument your code-review loop with a coupling layer and measure, against a pre-registered criterion, whether it catches what your stack misses. You keep the readout either way."

The criterion is registered before we start (flagged outputs wrong at ≥2x your base rate), and it's stated up front because the experiment design is the offer. No catch-rate claims here: your pilot produces your number, on your codebase, not a public leaderboard.
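The criterion above comes down to one ratio, which can be sketched as follows. This is an illustration under assumptions, not the pilot's actual analysis: it takes "base rate" to mean the confirmed-error rate across all reviewed outputs, and the function names and counts are hypothetical.

```python
def flag_lift(flagged_wrong: int, flagged_total: int, all_wrong: int, all_total: int) -> float:
    """Ratio of the error rate among flagged outputs to the overall base rate.

    Assumption: the base rate is the confirmed-error rate over all reviewed
    outputs, flagged or not.
    """
    if flagged_total <= 0 or all_total <= 0:
        raise ValueError("totals must be positive")
    base_rate = all_wrong / all_total
    if base_rate == 0:
        raise ValueError("lift is undefined when the base rate is zero")
    flagged_rate = flagged_wrong / flagged_total
    return flagged_rate / base_rate


def meets_criterion(lift: float, threshold: float = 2.0) -> bool:
    """True when flagged outputs are wrong at >= threshold times the base rate."""
    return lift >= threshold


# Hypothetical pilot counts, for illustration only.
lift = flag_lift(flagged_wrong=15, flagged_total=40, all_wrong=125, all_total=1000)
print(lift, meets_criterion(lift))
```

Fixing the threshold before the counts are collected is what makes the criterion pre-registered: the ratio is computed once, against a bar that was not adjusted after seeing the data.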

Run the pilot on your codebase →

LOCI

A score for the relationship, not the output.

The only composite we know of that takes the exchange itself, the human-and-AI as one unit, as its unit of analysis. One normalized number for coupling quality over time: on course, in sync, transferring meaning without loss.

Status: validation in progress.

ASE

The evaluator the model can't game.

The score is produced from outside the evaluated system's prompt and state, on a different model family. We built the architecture the field's own findings imply is needed. For researchers: scalable oversight, scored from outside the system prompt.

Status: robustness experiment in progress.

CEED

Errors the pairing surfaces that neither party catches alone.

Coupling measurement in practice: our flagship demonstration, deployed first on code review.

Status: design-partner pilots open.

How we're different

                        | Output evals (the existing category) | Syntonic
What it measures        | The AI's output                      | The coupling: the human-and-AI as one unit
The question it answers | "Is the model right?"                | "Is the collaboration working?"
Who uses it             | The AI engineering team              | The team doing the work, and its leadership

To our knowledge, no commercial product treats the human-AI pair as the unit of analysis.

Research roadmap.

Results ship when experiments land, not before.

  1. Patent-pending instrumentation

    DONE (June 2026)

    Patent pending (nonprovisional submitted for filing, June 2026).

  2. Token-savings RCT

    IN PROGRESS (updated 2026-06-10)

    A pre-registered randomized comparison: equal task-success at fewer tokens-to-resolution, against a strong published baseline class. Numbers ship when the study lands.

  3. LOCI convergent-validity study

    IN PROGRESS (updated 2026-06-10)

    Ship bar: agreement with human raters at Cohen's kappa ≥ 0.6. Strong public claims reserved for ≥ 0.8. The protocol publishes first.

  4. arXiv preprint

    COMING (updated 2026-06-10)

    Preprint: coming.
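For readers unfamiliar with the statistic named in item 3, Cohen's kappa is observed agreement corrected for the agreement two raters would reach by chance. The sketch below is a plain-Python illustration, not the study's protocol. It assumes scores are binned into categorical bands before comparison. The band labels, function names, and example ratings are hypothetical; only the 0.6 and 0.8 thresholds come from the roadmap.

```python
from collections import Counter


def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Fraction of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def claim_tier(kappa: float) -> str:
    """Map kappa onto the thresholds stated in the roadmap."""
    if kappa >= 0.8:
        return "strong public claims"
    if kappa >= 0.6:
        return "ship"
    return "below ship bar"


# Hypothetical banded ratings for ten sessions, for illustration only.
human_rater = ["high", "high", "low", "low", "high", "low", "high", "low", "high", "low"]
score_bands = ["low", "high", "low", "low", "high", "low", "high", "low", "high", "low"]
kappa = cohens_kappa(human_rater, score_bands)
print(round(kappa, 3), claim_tier(0.65))
```

The correction for chance matters here: two raters who agree on nine of ten items look nearly identical on raw agreement, but kappa discounts the matches that label frequencies alone would produce.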

On the roadmap.

  • ASE API + LOCI scoring API: in validation, not yet available.
  • Per-seat code-review product: coming.
  • For individuals: coming.

Run the pilot on your codebase.

Book a design-partner call

Have us build yours, and watch it prove its payoff.

Work with us