Make science programmable.
Dynamical Systems is a research lab building the experience layer for physical science. We turn experiments, simulations, instrument records, failures, and expert decisions into reusable evidence.
We are building toward a world where the labs that already exist compound like science factories, and the next physical decision is better because the system remembers what happened. That evidence compresses the path from candidate to deployable capability, and it is what the next generation of scientific agents will be trained on.
Open roles
In person in New York City.
Founding Research Scientist, Materials and AI
You turn the way experts read experimental data (what they trust, what they reject, and when they stop) into the rubrics and evidence standards our agents are held to. You help set the scientific direction of the company alongside the founder.
Science
What you'll do
- Translate how materials scientists reason over experimental data into the rubrics and evaluations agents are measured against, across synthesis, characterization, and qualification workflows.
- Decide what is worth measuring, what counts as proof, and where today's models fail when they touch physical evidence.
- Audit how agents reason over evidence, find where they cut corners, and turn that into the standard of proof the whole system is held to.
- Trace delayed physical outcomes, including failures and nonformations, back to the decisions that caused them, and define what good judgment looks like across a campaign.
- Set the scientific agenda and the research bar.
The hard problems you'll work on
- What makes a scientific decision valid? When is a passing score real, and when is it only rubric-shaped?
- How do you encode the judgment of an expert materials scientist, across workflows, into something an agent can be trained and measured against?
- How do you assign credit for a delayed physical outcome back across corpus, candidate, protocol, simulator, process parameters, instrument state, and interpretation?
What we look for
- A PhD or equivalent depth in materials, chemistry, physics, scientific ML, or a related field, with hands-on lab or simulation experience.
- Worked at the boundary of a science domain and machine learning, and can tell a convenient result from a trustworthy one.
- Fluent in how working scientists make calls across a campaign, and able to turn that judgment into something a system can use.
- Want to own a scientific direction from the start.
Founding Platform Engineer
You make the full loop run as a system. You build the platform that turns historical experimental data into evidence, environments, and trained agents, then carries the result back into the next decision. You own how scientists and agents work over the same physical evidence, and you help define what that interface should be.
Platform
What you'll do
- Carry the full loop from ingesting historical experimental data, through compiling it into machine-readable evidence and running evals and environments, to feeding results back into training and the next decision.
- Build the product where scientists and agents plan, run, inspect, and decide together, with traces, evidence packets, eval runs, review loops, and direct-manipulation interfaces over the evidence.
- Own the data models and provenance, including sample lineage, verifier outcomes, and artifacts, so every experiment leaves a record the system can learn from.
- Sit with the people running campaigns, find where their attention is lost, and build for that gap.
- Own product and platform engineering end to end, from data model and APIs to interface and deployment, and shape it with the founder.
The hard problems you'll work on
- What is the right structured object for an encounter with reality, one that holds intent beside execution, sample beside process, and measurement beside the uncertainty it changed?
- How do you make that loop reliable across heterogeneous instruments, labs, and data?
- How do you make expert judgment durable, so a correction becomes versioned signal the system can reuse?
What we look for
- Built serious software in ambiguous domains, such as research platforms, data systems, ML infrastructure, developer tools, or scientific software, where correctness and usability both mattered.
- Owned a system end to end, from data model and APIs to interface, deployment, and observability.
- Want to build a product and platform from scratch and own it.
- A bias toward simple systems other people can build on.
Founding Research Engineer, Evals and Environments
You own how we measure scientific agents. The evals and the environments are yours end to end. You turn physical workflow history into the environments we train and test agents against, and the evaluations that tell us whether their judgment holds.
Evals
What you'll do
- Own the evals end to end: the harnesses, baselines, metrics, and the offline loop that qualifies an agent's judgment before it touches a live instrument.
- Build long-horizon environments where agents inspect physical evidence, call tools and models, make decisions, and get scored against what reality showed.
- Design the state, actions, rewards, verifiers, and visibility rules that turn a messy scientific workflow into something an agent can be trained and evaluated on.
- Turn dead ends, simulator misses, and expert corrections into harder tasks, hard negatives, and training data, and decide what the environment should teach next.
- Work directly with the founder, who has lived this problem, and own the eval and environment surface as the team grows.
The hard problems you'll work on
- How do workflow traces become reusable experience, replay episodes, and evaluation environments?
- How does experiment-as-code become a stable RL environment with typed state, actions, rewards, uncertainty, and hard negatives?
- Can an agent reuse prior physical evidence without leaking the hidden answer that often lives in the same record?
What we look for
- Built RL environments, agent eval harnesses, tool-use systems, simulators, or benchmark suites that had to survive messy data.
- Trained or post-trained LLMs with RL, or designed reward and verifier contracts with human or model trainers.
- Comfortable taking an ambiguous surface and turning it into clean abstractions.
- Want to own evaluation and environment design as a whole.
Reach out.
Email us with the role you are closest to and one or two things you have built.
A paper, repo, dataset, product, lab system, or a failure you learned from tells us more than a resume.
These are among our first research and engineering hires. They carry founding equity, and we discuss cash compensation on the first call.