Agents are dynamical systems

We study how AI agents behave under uncertainty and build the systems that make that behavior calibrated, verifiable, and trainable.

Read our research

Research

OpenSec: Measuring Incident Response Agent Calibration Under Adversarial Evidence

Four frontier models achieve 94% alert detection but contain false-positive alerts at rates of 45-82%. The calibration gap is not in detection; it is in restraint.

arXiv:2601.21083 · Jan 2026
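To make the two headline metrics concrete, here is a minimal sketch of how detection rate and false-positive containment rate could be computed from an evaluation log. The `AlertOutcome` fields (`is_true_positive`, `flagged`, `contained`) are illustrative assumptions, not OpenSec's actual schema.

```python
from dataclasses import dataclass

@dataclass
class AlertOutcome:
    is_true_positive: bool  # ground truth: real incident vs. benign alert
    flagged: bool           # agent classified the alert as a real incident
    contained: bool         # agent took a containment action

def calibration_metrics(log: list[AlertOutcome]) -> tuple[float, float]:
    """Return (detection_rate, fp_containment_rate).

    detection_rate: share of real incidents the agent flagged.
    fp_containment_rate: share of benign alerts the agent contained
    anyway -- the restraint failure the abstract refers to.
    """
    real = [a for a in log if a.is_true_positive]
    benign = [a for a in log if not a.is_true_positive]
    detection_rate = sum(a.flagged for a in real) / len(real)
    fp_containment_rate = sum(a.contained for a in benign) / len(benign)
    return detection_rate, fp_containment_rate
```

Under this reading, a well-calibrated agent maximizes the first number while keeping the second low; the abstract's claim is that current models do only the first.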

Surprisal-Guided Selection: Compute-Optimal Test-Time Strategies for Execution-Grounded Code Generation

Test-time training degrades below random sampling (equivalent K < 1). Selecting by surprisal, the model's least confident correct solutions, matches oracle performance using only existing logprobs.

Preprint · Feb 2026
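A minimal sketch of surprisal-guided selection under one reading of the abstract: each sampled solution carries the token log-probabilities saved at generation time plus an execution-verified pass flag, and we keep the passing solutions the model was least confident about (highest total surprisal). The `Candidate` structure and `top_k` parameter are assumptions for illustration, not the paper's interface.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    code: str
    token_logprobs: list[float]  # per-token logprobs saved at generation time
    passes_tests: bool           # execution-grounded correctness signal

def surprisal(c: Candidate) -> float:
    """Total surprisal in nats: the negative sum of token logprobs.
    Higher surprisal means the model assigned the solution lower
    probability, i.e. it was less confident in it."""
    return -sum(c.token_logprobs)

def select_by_surprisal(cands: list[Candidate], top_k: int) -> list[Candidate]:
    """Keep the least-confident *correct* solutions: filter to candidates
    that pass execution, then take the top_k by surprisal."""
    correct = [c for c in cands if c.passes_tests]
    return sorted(correct, key=surprisal, reverse=True)[:top_k]
```

Whether to rank by total or length-normalized surprisal is a design choice left open here; total surprisal favors longer solutions, and the paper's actual criterion may differ.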