Disarray builds AI research agents for long-horizon autonomy over heterogeneous data
Disarray tackles two key challenges in AI research agents:
Generating stronger hypotheses for efficient experimentation, and
Avoiding the failure modes that derail long-running agent loops.
A key insight in Disarray’s approach is that success comes from better context, not more context. Disarray’s core differentiator is a context graph that captures rich lineage across data, code, execution history, and documentation, enabling precise retrieval of the most relevant information and multi-hop analysis that reveals novel insights from prior work.
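As a rough illustration of what multi-hop retrieval over a lineage graph could look like, here is a minimal sketch. It is not Disarray's implementation: the edge list, the node naming scheme, and the functions `build_graph` and `neighbors_within` are all hypothetical, and a plain breadth-first search stands in for whatever retrieval strategy the real context graph uses.

```python
from collections import deque

# Hypothetical lineage edges: (source, relation, target).
# Every node and relation name here is an illustrative assumption.
EDGES = [
    ("dataset:train.csv", "read_by", "code:features.py"),
    ("code:features.py", "produced", "run:exp-01"),
    ("run:exp-01", "described_in", "doc:notes.md"),
    ("dataset:train.csv", "described_in", "doc:schema.md"),
]


def build_graph(edges):
    """Build an adjacency map that can be walked in both directions."""
    graph = {}
    for src, rel, dst in edges:
        graph.setdefault(src, []).append((rel, dst))
        graph.setdefault(dst, []).append(("inverse:" + rel, src))
    return graph


def neighbors_within(graph, start, max_hops):
    """Breadth-first search: map each node within max_hops of start to its hop distance."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the hop budget
        for _rel, nxt in graph.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    del dist[start]  # the start node is not its own neighbor
    return dist
```

Starting from a documentation node and allowing two hops reaches the run it describes and the code that produced that run. Capping the hop count is one simple way to keep retrieved context targeted rather than large.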
Disarray agents manage the entire end-to-end research loop: translating high-level goals, finding relevant data, forming hypotheses, writing and running code, interpreting partial or failed results, and deciding what to try next. By giving agents more targeted context, higher-order understanding, and intelligent harnesses, Disarray agents form better hypotheses, stay focused over time, and ultimately deliver breakthrough results rapidly in long-horizon research.
Validated in autonomous ML experimentation
Disarray agents earned 28 Kaggle competition medals in fully autonomous runs limited to 24 hours and a single GPU per competition. The medals span vision, NLP, tabular, and object-detection tasks, with nine top-10 finishes and one result that outperformed all human teams.
ML model development is Disarray’s first product use case, but the underlying capabilities extend to a much broader class of open-ended research problems.
Toward recursive self-improvement
Dynamic self-improvement of the harness and planning modules is an active area of research at Disarray. Today, agent outcomes and user interactions are used to self-heal the context graph. The instrumentation that runs research today forms the foundation for recursive self-improvement tomorrow.
Every time a Disarray agent runs an experiment, it generates a detailed trace: what context it retrieved, what hypothesis it formed, what code it executed, what errors it hit, how it recovered, and the final result. These high-fidelity execution traces serve as training data for improving every component of Disarray.
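To make the shape of such a trace concrete, here is a minimal sketch of one possible record, with one field per item listed above. The class name `ExperimentTrace`, its field names, and the JSON Lines serialization are assumptions for illustration only, not Disarray's actual schema or storage format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ExperimentTrace:
    """Hypothetical trace record; the field names are illustrative assumptions."""
    retrieved_context: List[str]   # what context the agent retrieved
    hypothesis: str                # what hypothesis it formed
    code: str                      # what code it executed
    errors: List[str] = field(default_factory=list)          # what errors it hit
    recovery_steps: List[str] = field(default_factory=list)  # how it recovered
    result: Optional[str] = None   # the final result, if any


def to_jsonl_line(trace: ExperimentTrace) -> str:
    """Serialize one trace as a single JSON line, e.g. for an append-only log."""
    return json.dumps(asdict(trace), sort_keys=True)
```

Keeping errors and recovery steps as separate fields, rather than discarding failed attempts, preserves the partial and failed results that the research loop is meant to learn from.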
Follow our work
Follow our work on context graphs, long-running agent harnesses, autonomous experimentation, and the journey towards recursive self-improvement for AI systems.
Join us
We’ve built a strong foundation and proven what’s possible, but the most interesting problems are still ahead. If you’re naturally curious, energized by tough challenges, and want to push the boundaries of automation in AI development, come join us! Reach us at careers@disarray.ai.
