Behavioral infrastructure for the supervised-agent era

Your AI ignores you.
Calx fixes that.

Your corrections don’t stick. Your agents make the same mistakes. Your team is building the same internal harness three times over.

Calx is the behavioral control layer for supervised AI agents. We find the recurring corrections in your agent workflows, compile them into runtime enforcement, and give you the evidence back.

0 / 9 vs 9 / 9 · text rules vs compiled rules.
Fisher’s exact, p = 0.002 across 6 architectural classes.

Corrections become structural rules.

Runtime enforcement, not another prompt file. A real rule, a real violation, a real intercept.

Prompts and memory improve what the agent knows.
Not what it does.

Everyone is building on the information plane. There is no compiler between the two planes. Calx fixes that.

Information plane
What the agent knows
Memory (Mem0, Letta, Zep). Retrieval (RAG, vector stores). Context (CLAUDE.md, prompts, system cards).
Calx integrates here. We do not build here.
The gap · no compiler
Behavioral plane
What the agent does
Governance (correction lifecycle). Harness (Calx Tether). Compiled rules enforced at runtime, outside the context window.
Calx builds here.
Better prompts, more memory, bigger context windows. None of these compile into what the agent does. Calx runs the correction-to-enforcement loop in the harness the agent actually executes in.
p = 0.002
Fisher’s exact · 6 architectural classes
The Compiler Gap (Hardwick 2026). 100+ sources · 70+ practitioner reports · 9 academic papers.

Give us one agent workflow. We’ll find what humans keep correcting.

A Correction Audit is the low-friction entry. We inspect your agent’s corrections, identify recurrence clusters, and deliver a behavioral control report that shows what compiles into runtime enforcement and what stays process.

  • The top recurring correction classes your team keeps paying for
  • Which corrections are architectural and can compile. Which are process and cannot.
  • A concrete enforcement plan: which rules become Tether hooks, which become review gates
  • A baseline recurrence metric so pilot impact is measurable before we start
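The triage described in the list above can be sketched roughly as follows. This is a minimal illustration, not the Calx pipeline: the `Correction` record, the `triage` and `recurrence_rate` functions, the default threshold of 3, and the definition of the recurrence metric (repeated corrections as a share of all corrections) are all hypothetical names and choices made for this example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Correction:
    """One human correction (hypothetical record shape)."""
    correction_class: str
    architectural: bool  # True: can compile into a hook. False: stays process.


def triage(log: List[Correction], threshold: int = 3) -> Tuple[List[str], List[str]]:
    """Split recurring correction classes into hook and review-gate candidates."""
    counts = Counter(c.correction_class for c in log)
    is_architectural = {c.correction_class: c.architectural for c in log}
    hooks, gates = [], []
    for cls, n in counts.most_common():
        if n < threshold:
            continue  # not recurring enough to act on (threshold is illustrative)
        (hooks if is_architectural[cls] else gates).append(cls)
    return hooks, gates


def recurrence_rate(log: List[Correction]) -> float:
    """Share of corrections that repeat an already-seen class (one possible baseline)."""
    if not log:
        return 0.0
    distinct = len({c.correction_class for c in log})
    return (len(log) - distinct) / len(log)


# Made-up example log: 3 architectural repeats, 4 process repeats, 1 one-off.
log = (
    [Correction("wrong-branch", True)] * 3
    + [Correction("tone", False)] * 4
    + [Correction("typo-path", True)]
)
hooks, gates = triage(log)
print(hooks, gates, recurrence_rate(log))  # ['wrong-branch'] ['tone'] 0.625
```

Measuring the rate before any rule is compiled gives a number to compare against once hooks are in place.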
Currently running with an engineering firm (9-agent fleet), a named enterprise account (recruiting, $4K early access), and two additional design partners. Inbound only. No outbound yet.
Format: Async + 30-min read-out
Duration: 2 weeks · typical
Access: One workflow · read-only
Deliverable: Behavioral control report · PDF + walkthrough
Next step: Scoped pilot or design partnership
Who runs it: Spencer, founder

One compiler. Three product layers.

Tether enforces. Bench captures. Serve compiles. Model-agnostic via LiteLLM.

Tether
The enforcement primitive
Runtime middleware between the agent and its tools.
  • Tool veto · Block tool calls at the harness level
  • Response review · Check output against rules before delivery
  • Injection defense · Enforced outside the context window
See Tether →
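To make "tool veto at the harness level" concrete, here is a minimal sketch of the pattern. It is not the Tether API: `Rule`, `Harness`, `ToolVetoed`, and the `no-force-push` rule are hypothetical names invented for illustration. The point it shows is that the check runs in ordinary code between the agent and the tool, so nothing the agent reads or writes in its context window can switch it off.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Rule:
    """A compiled rule: a name plus a predicate over a proposed tool call."""
    name: str
    violates: Callable[[str, Dict[str, Any]], bool]


class ToolVetoed(Exception):
    """Raised when a rule blocks a tool call before it executes."""


class Harness:
    """Hypothetical middleware sitting between an agent and its tools."""

    def __init__(self, tools: Dict[str, Callable[..., Any]], rules: List[Rule]):
        self.tools = tools
        self.rules = rules
        self.intercepts: List[str] = []  # evidence trail of blocked calls

    def call(self, tool: str, **args: Any) -> Any:
        # Every rule is checked before the tool runs; the agent cannot skip this.
        for rule in self.rules:
            if rule.violates(tool, args):
                self.intercepts.append(f"{rule.name}: blocked {tool}")
                raise ToolVetoed(rule.name)
        return self.tools[tool](**args)


def run_shell(command: str) -> str:
    """Stand-in tool that only echoes the command it was given."""
    return f"ran: {command}"


no_force_push = Rule(
    name="no-force-push",
    violates=lambda tool, args: tool == "run_shell" and "--force" in args.get("command", ""),
)
harness = Harness({"run_shell": run_shell}, [no_force_push])
```

In this sketch, `harness.call("run_shell", command="git status")` goes through, while a command containing `--force` raises `ToolVetoed` and leaves a record in `harness.intercepts`.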
Bench
Control surface
Where supervisors work with agents and corrections get captured.

Not a chat app. Not a workspace. The surface where correction becomes visible, fast, and enforceable.

See Bench →
Serve
Compilation engine
Correction lifecycle, recurrence detection, rule promotion.

The brain of the system. Proprietary pipeline. The moat compounds per operator, per session.

Read the paper →

Three papers. 100+ sources. Built with itself.

Calx is an applied science company that ships software. The product is the research made durable.

Paper 1 · The Behavioral Plane
151
Corrections captured across 43 days, 8 operators. 237 rules transferred, 44 new. The foundational dataset.
Paper 2 · Stickiness Without Resistance
3–4
Compilation threshold. Architectural corrections converge at zero recurrence. Extends Szulanski (1996) for machine learners.
Paper 3 · The Compiler Gap
p=0.002
Fisher’s exact across 6 architectural classes. 9 academic papers, 70+ practitioner reports. Independent convergence from Meta Superintelligence Labs.
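For readers unfamiliar with the test, here is a short sketch of how a Fisher's exact p-value of this size can arise. The table used is an illustrative assumption, not the paper's data: six classes per arm, with the outcome observed in 0 of 6 under one condition and 6 of 6 under the other. The function name is also just for this example.

```python
from fractions import Fraction
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> Fraction:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(k: int) -> Fraction:
        # Hypergeometric probability of seeing k in the top-left cell.
        return Fraction(comb(row1, k) * comb(row2, col1 - k), total)

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table at least as unlikely as the observed one.
    return sum((prob(k) for k in range(lo, hi + 1) if prob(k) <= observed), Fraction(0))


# Assumed table: 0 of 6 in one arm, 6 of 6 in the other.
p = fisher_exact_two_sided(0, 6, 6, 0)
print(p, round(float(p), 3))  # 1/462 0.002
```

With complete separation across six paired classes, only the two most extreme tables count, giving 2 / C(12, 6) = 2 / 924.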

Behavioral infrastructure
for the supervised-agent era.

Runtime governance is forming as a category. Memory remembers. Observability records. Guardrails block. Calx is the adaptive correction loop inside the harness that turns human judgment into runtime behavior. The specialists get there first.