AI-Native Intelligence · Essay · 16 min read
The Principal and the Swarm
AI agents make it free to build intelligence systems no one would have funded, staffed, or subscribed to before.
One system. Eight experiments. A map of what is possible now, and why the scarce resource is shifting upward from execution toward judgment.
In this essay
- 01 The shift: why cheap execution creates new intelligence, not just faster intelligence.
- 02 The operating system: one principal, many specialists, and a reusable harness that keeps getting cheaper to run.
- 03 Eight probes into AI-native intelligence: from consumer simulation and local market intelligence to trading systems and research experiments.
- 04 Where this goes next: why the real pattern is reusability, and why some of these experiments deserve deeper follow-on essays.
Who this is for: people in finance and operations who want to see what is actually possible now, and builders who want to see where the leverage is.
The Shift
Everyone is focused on faster reports, cheaper analysis, and broader screening. That is real, and it is already commoditizing. The bigger shift is not speed alone: it is new intelligence, questions that used to be uneconomic and are suddenly buildable.
The shift
Not just faster diligence. New diligence.
The important change is not speed alone. It is the ability to ask questions that would never have justified a team, a budget, or a software product before.
What matters now
Judgment. Agency. The ability to drive execution.
As execution gets cheap, the scarce resource shifts upward. The edge is framing the problem, picking the right lane, and getting the work all the way through.
One system. Eight experiments. A map of what is possible now.
Part 1 shows the operating system. Part 2 shows the experiments it produced, from Chicago permit data to consumer market simulation to research work.
I have spent the last several months building these systems, and one finding keeps recurring: as the cost of building drops toward zero, the value of knowing what to build goes up.
First, the operating system. Then, the experiments it made possible.
Several of the experiments below will likely get their own posts over time. This piece is the high-level map first: how the system is set up, and what it can already produce.
The Operating System
One operator. Broader research capacity.
This is the working shape of the system. One principal directs the work. Stable associates handle writing, analysis, engineering, and audit. A wider specialist bench comes in when needed.
The tools matter. What matters more is judgment, agency, and the ability to drive execution.
Project Workspace
At the center sits the principal: a frontier reasoning model that routes tasks, manages context, and coordinates the team. It drives several separate harnesses, each suited to different jobs, all working in the same project directory: OpenClaw, Hermes, Claude Code, and OpenAI Codex.
Associates
- Creative: writing, design, strategy
- Analysis: research, analysis, fast iteration
- Engineering: code, debugging, architecture
- Audit: QA, review, verification
Specialist Pool · On-Demand
Open-source and frontier models on US servers, called when specialized capabilities are needed.
The infrastructure is OpenClaw and Hermes agents, open-source orchestration layers. The system remembers context across sessions, can be reached from anywhere, and spawns specialized sub-agents whose work gets reviewed before it counts. The principal reviews every associate's work and respawns with corrections, like marking up a teammate's draft before it ships.
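The review loop described above can be sketched in a few lines of Python. Everything here is hypothetical scaffolding: `run_associate`, `principal_review`, and the correction strings stand in for real agent spawns and reviews, not for any actual OpenClaw or Hermes API.

```python
def run_associate(brief, corrections=()):
    """Stand-in for spawning a sub-agent; returns a draft of the work."""
    return {"brief": brief, "revisions": len(corrections)}

def principal_review(draft):
    """Stand-in for the principal's markup pass; accepts after one revision."""
    return draft["revisions"] >= 1

def delegate(brief, max_rounds=3):
    """Spawn, review, and respawn with corrections until the work counts."""
    corrections = []
    for _ in range(max_rounds):
        draft = run_associate(brief, corrections)
        if principal_review(draft):
            return draft                      # only reviewed work counts
        corrections.append("tighten the analysis")  # markup before respawn
    raise RuntimeError("escalate: associate could not satisfy the principal")

result = delegate("rank Chicago permit markets")
```

The design choice worth noting is that acceptance lives in the loop, not in the associate: no output ships without passing the principal's review.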
The entire system runs on a laptop. What remains expensive is judgment, agency, and the ability to drive execution: framing the problem, asking the right questions, and directing specialized intelligence where it matters most.
Eight Probes into AI-Native Intelligence
These are high-level overviews of the experiments; several will get deeper write-ups in future posts. Everything below started the same way: a question relevant to investing or business intelligence, described in plain English. Most went from idea to working MVP in a single session. None required hiring a developer or engaging a consultant.
The point isn't finished products. These are experiments: quick, cheap probes into whether AI agents can produce genuine intelligence in domains that previously required expensive infrastructure.
Experiment 1
Business Intelligence · 100 AI Analysts Debate Peloton's Future
Structured adversarial simulation
The Question
Can a swarm of differently motivated AI analysts stress-test a company thesis better than a one-shot investment memo?
The Setup
100 agents are assigned distinct archetypes: dedicated subscriber, growth investor, short seller, skeptic, brand loyalist, cost-conscious consumer, and more. They debate Peloton over five rounds. The visual shows whether disagreement stays real or collapses into consensus.
Round 1: the debate starts dispersed, with only a slight bearish edge (55% bearish).
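Convergence in a debate like this can be measured with a few lines. A minimal sketch, assuming each agent emits a one-word stance per round; the stance lists below mirror the reported 55% and 91% bearish shares rather than live model output.

```python
from collections import Counter

def bearish_share(stances):
    """Fraction of agents currently bearish."""
    return Counter(stances)["bearish"] / len(stances)

def converged(stances, threshold=0.85):
    """A round counts as consensus once either side passes the threshold."""
    share = bearish_share(stances)
    return share >= threshold or share <= 1 - threshold

# Illustrative stance lists matching the reported shares.
round1 = ["bearish"] * 55 + ["bullish"] * 45   # dispersed opening
round5 = ["bearish"] * 91 + ["bullish"] * 9    # converged close
```

Tracking `bearish_share` round by round is what separates a fragile thesis (collapse to consensus) from a genuinely contested one (persistent dispersion).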
The Finding
Peloton converged to a strong bearish consensus: 91% bearish by round five. The Lululemon control did not converge. The point is not that swarms always agree. It is that they can separate fragile theses from genuinely contested ones.
Gemini ran the agent swarm. Opus synthesized the output. Total cost: a few dollars.
The pattern
Agent swarms can pressure-test a thesis against structured diversity in an afternoon instead of a month.
Experiment 2
Business Intelligence · 300 AI Personas Simulate a Consumer Market
Real reviews, real pricing, real distances, simulated decisions at $1.80
The Question
Can AI personas grounded in real market inputs reveal hidden consumer demand faster than a traditional research project?
The Setup
Each persona gets a home location, household archetype, and price sensitivity. They are shown nearby facilities using real Google reviews, published pricing, and GPS-derived distance so the simulation reflects actual tradeoffs rather than abstract survey answers.
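One way to sketch the first-choice versus would-consider split is a simple utility score per persona and facility. The weights, prices, and facilities below are made up for illustration; the real simulation grounds these inputs in Google reviews, published pricing, and GPS distance.

```python
def score(persona, facility):
    """Hypothetical utility: rating minus price and distance penalties."""
    return (facility["rating"]
            - persona["price_sensitivity"] * facility["price"] / 100
            - 0.2 * facility["miles"])

facilities = [
    {"name": "A", "rating": 4.6, "price": 180, "miles": 2.0},
    {"name": "C", "rating": 4.2, "price": 120, "miles": 5.5},
]
persona = {"price_sensitivity": 1.5}

# First choice is the top-scoring facility.
first_choice = max(facilities, key=lambda f: score(persona, f))
# "Would consider" is anything within a tolerance of the top score.
considered = [f["name"] for f in facilities
              if score(persona, f) >= score(persona, first_choice) - 1.0]
```

Run over hundreds of personas, the gap between the `first_choice` counts and the `considered` counts is exactly the latent-demand signal the experiment surfaces.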
The Finding
The useful signal lives in the gap between first choice and would consider. Location C is not weak; it is under-converted. 71% of personas would consider it, but only 8% pick it first. That is latent demand a real operator can act on.
Traditional version: $5K+ budget and roughly three months. This version: $1.80, same day, rerunnable whenever the market changes.
The pattern
AI persona simulations grounded in real market data can produce consumer intelligence that used to require months and thousands of dollars, then rerun whenever inputs change.
Experiment 3
Business Intelligence · Chicago Permit Velocity
Where public data meets investment intelligence
The Question
Where are renovation permits moving fastest and slowest in Chicago, and is that changing over time?
The Setup
10,047 permits across all 77 community areas are normalized into a neighborhood ranking surface. The view can switch between current speed and six-month momentum so the reader can distinguish a slow market from one that is improving.
Neighborhood intelligence
Spread between the fastest and slowest permit markets in the same city. Median: 42 days · 75 neighborhoods ranked, fastest → slowest.
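The ranking surface reduces to two numbers per community area: median days-to-issue and its recent change. A minimal sketch, assuming each permit record carries a days-to-issue figure and a recent/prior window flag; the field names and sample values are illustrative, not the real pipeline's schema.

```python
from statistics import median

permits = [
    {"area": "Logan Square", "days": 30, "recent": True},
    {"area": "Logan Square", "days": 40, "recent": False},
    {"area": "Austin", "days": 90, "recent": True},
    {"area": "Austin", "days": 70, "recent": False},
]

def velocity(permits, area):
    """Median days-to-issue for one community area."""
    return median(p["days"] for p in permits if p["area"] == area)

def momentum(permits, area):
    """Prior-window median minus recent median: positive means speeding up."""
    recent = [p["days"] for p in permits if p["area"] == area and p["recent"]]
    prior = [p["days"] for p in permits if p["area"] == area and not p["recent"]]
    return median(prior) - median(recent)
```

The two views together distinguish a slow market from one that is improving: Austin here is slow and slowing, Logan Square fast and accelerating.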
The Finding
Permit timing is not one citywide market. It is a collection of neighborhood markets with materially different operating conditions. That makes permit velocity useful for underwriting, expansion planning, and local business timing.
Codex wrote the pipeline. Opus audited the methodology. Total software cost: zero.
The pattern
Any public data source that is technically accessible but practically ignored can become a free intelligence layer.
Experiment 4
Business Intelligence · Local Market Intelligence on Autopilot
Daily composite scoring across 6 markets, zero API cost
The Question
Can a fully automated public-data stack produce a daily market brief that feels closer to internal strategy intelligence than generic macro commentary?
The Setup
20+ collectors pull from TSA, FRED, gas, weather, Google Trends, news, RSS, and local signals. Those inputs are scored into a composite market index and synthesized into a daily brief without manual work.
Composite Market Index · Market comparison across 6 regions
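The composite step is the simplest part of the stack: z-score each collector's latest reading against its own history, then blend with weights. The signals and weights below are invented for illustration; the real system draws on 20+ collectors.

```python
from statistics import mean, pstdev

def zscore(history, value):
    """How unusual today's reading is relative to this signal's history."""
    sd = pstdev(history) or 1.0   # guard against a flat history
    return (value - mean(history)) / sd

def composite(signals, weights):
    """Weighted average of per-signal z-scores, normalized by total weight."""
    total = sum(weights.values())
    return sum(weights[k] * zscore(h, v) for k, (h, v) in signals.items()) / total

# Illustrative inputs: (history, latest reading) per signal.
signals = {
    "tsa_throughput": ([100, 110, 120], 130),   # travel demand up
    "gas_price": ([3.5, 3.6, 3.7], 3.4),        # fuel cost down
}
weights = {"tsa_throughput": 0.6, "gas_price": 0.4}
index = composite(signals, weights)
```

Because every signal is normalized against its own history, a new collector can be added without recalibrating the rest of the index.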
The Finding
This is the shape of CEO-grade local intelligence built entirely from free public data. Once the collection, scoring, and synthesis loop is automated, the brief becomes cheap to rerun and easy to expand to new markets.
20+ collectors. 400+ days of history. Zero manual input.
The pattern
Repeatable local intelligence layers, composite scoring, daily collection, and AI synthesis can be built from public data for almost any market at near-zero cost.
Experiment 5
Personal Experiment · The System That Taught Itself to Sell Volatility
SPX options, paper-traded AI-directed trading with feedback loops
The Question
Can an AI-directed system learn which options structures consistently survive across regimes instead of relying on static trading heuristics?
The Setup
Three layers work together: AutoResearcher for backtests, a regime playbook for market context, and an LLM override for real-time macro and news. The system reviews paper-trading outcomes nightly and updates its preferences over time.
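The nightly preference update is the heart of the loop. A minimal sketch, assuming each paper trade is logged per structure and a preference only forms once there is enough evidence; the structures and outcomes below are illustrative, not the system's actual trade log.

```python
from collections import defaultdict

outcomes = defaultdict(lambda: {"wins": 0, "total": 0})

def record(structure, won):
    """Log one paper-traded outcome for a given options structure."""
    outcomes[structure]["total"] += 1
    outcomes[structure]["wins"] += int(won)

def preferred(min_trades=5):
    """Highest win-rate structure, but only after enough trades to trust it."""
    eligible = {s: o for s, o in outcomes.items() if o["total"] >= min_trades}
    if not eligible:
        return None   # no preference until the evidence threshold is met
    return max(eligible, key=lambda s: eligible[s]["wins"] / eligible[s]["total"])

# Illustrative outcomes: credit spreads survive, debit spreads mostly don't.
for won in [1, 1, 1, 0, 1]:
    record("credit_spread", won)
for won in [0, 1, 0, 0, 0]:
    record("debit_spread", won)
```

The `min_trades` gate is the important design choice: the system starts with no preference and earns one from outcomes, rather than inheriting a static heuristic.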
The Finding
The system started with no preference and taught itself that credit spreads were the durable answer. That held up in a second, uncorrelated market. The important point is not a single win-rate number. It is that the loop can learn, veto, and adapt from paper-traded outcomes.
BTC cross-check: credit spreads at 91.8% win rate versus 40.2% for debit spreads.
The pattern
Any domain with measurable outcomes and repeatable decisions is a candidate for learning systems that compound over time.
Experiment 6
Business Intelligence · Pricing Analysis of the Pet Services Market
Published rates, review context, and radius-to-location mapping turned into a market benchmark
The Question
Can scattered public pricing pages, review context, and geography be turned into a usable competitive benchmark for a fragmented local market?
The Setup
Public boarding and daycare rates are scraped, normalized, and mapped to the nearest operating location. Reviews and package structure add context, so the output becomes a market ladder instead of a pile of disconnected websites.
Benchmark snapshot
One pass turns scattered public pricing into a market ladder you can actually benchmark against: Budget · Core · Premium tiers.
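The ladder itself is just a tercile split over normalized rates. A minimal sketch with invented nightly boarding rates; the real benchmark adds review context and geo mapping on top of this step.

```python
def tiers(rates):
    """Split normalized rates into budget / core / premium terciles."""
    ordered = sorted(rates)
    n = len(ordered)
    lo, hi = ordered[n // 3], ordered[2 * n // 3]   # tercile cut points
    return {
        "budget": [r for r in rates if r < lo],
        "core": [r for r in rates if lo <= r < hi],
        "premium": [r for r in rates if r >= hi],
    }

# Illustrative nightly boarding rates after normalization.
ladder = tiers([38, 42, 45, 49, 55, 58, 65, 72, 80])
```

Once rates land in tiers, an operator can see immediately where their own pricing sits on the ladder and how much headroom exists above it.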
The Finding
Once public pricing is normalized, the market stops looking like isolated websites and starts looking like structure: a floor, a midpoint, a premium tier, and a real spatial competitive set.
No private data and no hidden geography on this page. Just the broader lesson: public rates, review context, and basic geo logic become a useful benchmark surprisingly fast.
The pattern
Public websites already contain a large share of local market structure. With scraping, normalization, and simple geo logic, competitor pricing stops being scattered pages and becomes a usable benchmark.
Experiment 7
Research · A Physics Paper Without a Physics Degree
Get Physics Done, from curiosity to research artifact in days
The Question
Can an AI-native research workflow turn curiosity in a domain I do not know into something that actually looks and feels like a structured research artifact?
The Setup
Multiple model passes were used to explore competing physics frameworks, draft arguments, and clean the output into a coherent artifact. The point is not peer review yet; it is whether the workflow can get from zero to serious-looking research in days.
QFT · Quantum Field Theory
FVD · False Vacuum Decay
CCC · Conformal Cyclic Cosmology (Penrose)
BIO · Bohm Implicate Order
5 models · ~40 pages · structured research artifact
Multiple frameworks, multiple model passes, quality control at every stage.
The Finding
The surprise was not the specific physics claim. It was how quickly the workflow moved from loose curiosity to a real research-shaped artifact. Whether the conclusion survives expert scrutiny is a separate question from whether the pipeline works.
The pattern
Domain expertise barriers are falling. AI-native workflows collapse the distance between curiosity and a structured first-pass research artifact from semesters to days.
Experiment 8
Research · Can Relational Structure Predict the Future?
Intelligence Density, measuring predictive signal in how things connect
The Question
If you know how the parts of a system relate to each other, not just what the parts are, can that relational structure improve prediction?
The Setup
The framework compares a relation-aware predictor against a matched baseline that only sees local state. First it is tested in a synthetic coupled system, then in real language-model traces, to see whether structure itself carries signal.
Synthetic · Coupled Markov: +13.8 ± 3.0 pp over a matched baseline
At scale · GPT-2 prompt traces: n = 500 prompts; relation-aware prediction still beats the baseline
Accuracy shift: .301 baseline → .317 relation-aware
The signal survives contact with a real language model, even if it is much smaller than in the synthetic system.
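The synthetic test can be reproduced in miniature. In the toy coupled chain below, series B copies series A's previous state 90% of the time, so a predictor that sees A (the relation) should beat one that only sees B's own past. The chain, the copy probability, and the persistence baseline are all illustrative stand-ins for the paper's framework.

```python
import random

random.seed(0)

# Coupled binary chain: A is random noise; B_t equals A_{t-1} 90% of the time.
A = [random.randint(0, 1) for _ in range(2000)]
B = [0] + [a if random.random() < 0.9 else 1 - a for a in A[:-1]]

def accuracy(preds, truth):
    """Fraction of predictions matching the ground truth."""
    return sum(p == t for p, t in zip(preds, truth)) / len(truth)

# Baseline sees only local state: predict B_t from B_{t-1} (persistence).
baseline = accuracy(B[:-1], B[1:])
# Relation-aware predictor sees the coupling: predict B_t from A_{t-1}.
relational = accuracy(A[:-1], B[1:])
```

Because A is independent noise, persistence on B hovers near chance while the relation-aware predictor approaches the 90% coupling strength: structure itself carries the signal.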
The Finding
Relational structure appears to carry predictive information. It is strong in a synthetic coupled system and still positive in real language-model traces. The contribution is not just the result; it is a reusable way to ask whether structure itself contains signal.
In plain English: how things connect may help predict what happens next, beyond knowing the things alone.
The pattern
Relational structure, meaning how components connect and not just what they are, may be a measurable predictive signal. The framework to test that is reusable across domains.
What's Next
The experiments continue
None of these are finished products. They're probes: fast, cheap tests of whether AI agents with human judgment can produce real intelligence.
The consistent finding: they can. And the marginal cost of each new experiment is approaching zero because the operating system is reusable. Every new question just needs a brief and a session.
The firms that build this capability will compound an information advantage with every deal, every quarter, every new question they think to ask. The ones waiting for someone to package it into a SaaS product will pay a subscription for yesterday's intelligence.
The tools are open source. The compute is cheap. The scarce resource is the same one it's always been: knowing what to ask.
This post is the map
More detail can branch out from here
Several of these experiments will likely earn their own essays. For now, this piece is the high-level frame first: the operating system, the probes, and the pattern that ties them together.
Part of a series
Three essays, three scales
One argument, expressed at three altitudes: The Principal and the Swarm at 20 feet, Reach at 2,000 feet, and The New Stable Orbits at 20,000 feet.
2,000 FT · PATTERNS
Reach
How breadth of mind decides which work survives the AI era, from dormant cognitive reserve to the overlap where systems-thinking and execution stay alive in the same person.
20,000 FT · SHIFTS
The New Stable Orbits
How AI lowers coordination overhead inside firms, widens the set of viable business shapes, and creates entirely new classes of operators at both ends of the size distribution.
Read them together if you want the full arc: what gets built, the patterns underneath it, and the structural shifts that follow.