Dvora

AI agent swarms that simulate dynamics no single model can produce.

Portfolio project · May 2026

I wanted to see how AI agents could be used to simulate supply chains, and what kind of value that could produce.

I'd been seeing tools come out that use AI agent swarms as one-shot prediction tools: many agents running in parallel, with their outputs weighted against each other to produce a single forecast. I was already curious about supply chains, geopolitics, and risk, and I hadn't seen anyone apply this technique to a domain where the interaction dynamics between actors are the whole point, not just an aggregation of opinions.

So I started building. Dvora is what came out of that.

How Dvora reads a scenario.

Inputs in. Coordination dynamics out. Hours, not weeks.

INPUTS

Scenario

Export control, sanctions, permit denial, acquisition

Supply chain

Mapped graph of every actor that matters

Specific question

What positions move? Where do constraints emerge?

OUTPUTS

Coordination chains

Multi-step entity interactions

Emergent dynamics

Coalitions, constraints, vulnerabilities

Evidence trails

Auditable transcripts and reasoning

Overview · scenario to findings.

How it works.

Dvora maps a real supply chain into a graph of every entity that matters: companies, governments, banks, regulators, end customers. Each entity has its own data attached: who they are, how they act, their filings, their relationships. A brain of memories for each agent.

Then every entity becomes its own AI agent reasoning in that entity's best interest with that entity's actual context. You give the swarm a scenario, like an export control or a sanctions package or a permit denial, and the simulation runs round by round. Every actor responds based on what they know, what they want, and what the others just did.

Who would use this.

This is for people who think about what would actually happen when a supply chain gets hit, not as a forecast but as a play-by-play. Defense procurement analysts. Manufacturers trying to harden or expand their supply. Government policy offices working on industrial strategy. Specialty risk underwriters pricing political and supply-chain disruption coverage. Hedge funds with critical-minerals or commodity exposure. People building agent systems and running into similar problems.

What you see in the output: where coalitions form and fracture, where binding constraints emerge from somewhere unexpected, where one actor's response opens a vulnerability or closes one. The findings aren't predictions. They're the dynamics of the system itself, made visible.

Finance

Fund analyst

Spots positions about to shift. Reads coordination dynamics.

Operations

Supply chain head

Spots supply-side vulnerabilities. Tests procurement strategies.

Policy

Government policy lead

Surfaces policy gaps. Stress-tests interventions.

Defense

Defense procurement analyst

Maps strategic dependencies. Identifies single points of failure.

What's built so far.

Dvora is in prototype. The core engine works, but there's a long way to go. Here's what runs today.

The data layer mattered most, so that's where I started. An ingestion engine reads source documents (industry reports, regulatory filings, academic papers, government strategy documents) and produces a curated supply chain graph. Four agents handle this: extraction, modeling, mapping, and quality. Entity identity gets enriched against Wikidata, OpenSanctions, GLEIF, and SEC EDGAR. I'm not rebuilding canonical data that already exists in public sources.

Before any agent runs, a deterministic setup tier pulls live data from ten external APIs (trade flows, commodity production, macroeconomic indicators, sanctions, export controls, geopolitical risk, public filings) and runs three computational layers on top (concentration metrics, network topology, critical-minerals classification). The output is a per-scenario world state, cached and reproducible.
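To make "concentration metrics" concrete, here is a minimal sketch of one common choice, the Herfindahl-Hirschman Index over producer shares. This assumes HHI is among the metrics in play, which isn't confirmed above; the function name and the sample volumes are made up for illustration and are not Dvora's code or data.

```python
def herfindahl_index(volumes: dict) -> float:
    """Return the HHI on a 0 to 10,000 scale from raw production volumes."""
    total = sum(volumes.values())
    if total <= 0:
        raise ValueError("total volume must be positive")
    # Convert each volume to a percentage share, square it, and sum.
    return sum((100.0 * v / total) ** 2 for v in volumes.values())


# Hypothetical volumes, for illustration only.
sample = {"producer_a": 70.0, "producer_b": 20.0, "producer_c": 10.0}
hhi = herfindahl_index(sample)
```

Because the function is pure and takes only cached inputs, the same world state always yields the same score, which fits the reproducibility goal of a deterministic setup tier.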

Then the simulation runs. Every entity in the graph gets instantiated as its own Claude Agent SDK instance with its own system prompt, model assignment, and access to a five-tool toolkit (forecasting, optimization, network analysis, propagation, web search).

The custom message bus routes everything between them with strict schema validation, write-ahead persistence, and structural visibility rules. The runner orchestrates rounds, manages rate limits, and persists state after every round, so the simulation is restartable from any checkpoint.
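Per-round persistence of this kind can be sketched as follows. This is an assumption about one workable approach, not the runner's actual code: the file layout, `save_checkpoint`, and `load_latest` are all hypothetical names.

```python
import json
import os
import tempfile
from pathlib import Path


def save_checkpoint(run_dir: Path, round_no: int, state: dict) -> Path:
    """Write the state for one completed round, atomically."""
    run_dir.mkdir(parents=True, exist_ok=True)
    target = run_dir / f"round_{round_no:04d}.json"
    # Write to a temp file first so a crash never leaves a half-written checkpoint.
    fd, tmp = tempfile.mkstemp(dir=run_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump({"round": round_no, "state": state}, fh)
        fh.flush()
        os.fsync(fh.fileno())
    os.replace(tmp, target)  # atomic rename onto the final name
    return target


def load_latest(run_dir: Path) -> tuple:
    """Return (last completed round, state), or (0, {}) for a fresh run."""
    files = sorted(run_dir.glob("round_*.json"))
    if not files:
        return 0, {}
    data = json.loads(files[-1].read_text(encoding="utf-8"))
    return data["round"], data["state"]
```

A runner built this way would call `load_latest` at startup and resume at the round after the one returned.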

What comes out: full transcript, per-agent reasoning traces, tool call logs, state-block history, deterministic run summary.

Here's the pipeline, end to end.

Pipeline overview.

Documents in. Curated graph. Running simulation.

Ingest

Four-agent team turns source documents into a curated supply chain graph.

Extraction · Modeling · Mapping · Quality

Enriched against Wikidata · OpenSanctions · GLEIF · SEC EDGAR

Setup

Deterministic Python pulls live data and runs computational layers. Output is a cached, reproducible world state.

DATA

Trade flows
Production
Macro indicators
Sanctions
Export controls
Geopolitical risk
Public filings

COMPUTE

Concentration metrics
Network topology
Critical-minerals classification

Simulate

Every entity runs as its own Claude Agent SDK instance with a tier-routed model. Custom message bus mediates everything.

Agent SDK · per entity
Tier-routed: Opus / Sonnet / Haiku
Message bus · schema · persistence · visibility

OUTPUT

Transcript · Reasoning traces · Tool logs · Run summary

Pipeline · documents to simulation.

First test run: rare earth magnets, May 2026.

May 2026, rare earth magnets supply chain under MOFCOM HREE export controls.

These numbers are from the first test run. They vary by scenario and supply chain.

148

Agents

Rounds

1,241

LLM calls

12.9 h

Runtime

$105.26

API cost

Three coordination chains worth pulling out:

Finding 01 / 03

Finding 1: Inter-ministerial coordination on a magnet-embedded threshold

In rounds 1 through 4, Chinese ministries (MIIT, NDRC, MOFCOM, MLR) negotiated a combined Dy+Tb mass-fraction threshold for finished magnet exports. This addresses a leakage vector that an upstream-only oxide control would miss. Each step responded to an operational problem the prior step exposed.

JL Mag → MIIT, NDRC, MOFCOM (Round 1)

"Confirming clarification: are finished sintered NdFeB magnets containing controlled HREE elements treated as controlled articles, or does the control terminate at oxide / metal stage?"

MIIT → NDRC, MOFCOM, JL Mag, CNREG, CREG (Round 2)

"Preliminary guidance: a tiered combined Dy+Tb mass-fraction threshold is appropriate. 0.10% civilian green channel / 2.00% high-coercivity defense classification. Verification via XRF screening + ICP-MS confirmation."

CNREG → MIIT, NDRC, MOFCOM, MLR (Round 3)

"Concur on the 0.10/2.00 architecture. Reciprocal portfolio discipline accepted with technical-reality carve-out: incidental co-products from light-rare-earth extraction (Bayan-Obo) cannot be reclassified as portfolio incursion. Proposing quarterly director-level reconciliation."

MLR → CREG, CNREG, MIIT (Round 4)

"60-day customs traceability baseline initiated. Operational flags from CREG: ion-adsorption clay variable assay, Burma-origin multi-hop documentation chain, toll-processing beneficial-ownership disclosure."

Demonstrates: heterogeneous agent interaction at multi-step depth. Each step responds to a specific operational problem the prior step exposed.

Finding 02 / 03

Finding 2: Provenance challenge surfacing a real procurement vulnerability

In round 1, Apple challenged MP Materials on whether their existing US supply commitment actually protected against heavy-REE exposure routed through Chinese processors. MP's response, JPMorgan's structured-finance proposal to bridge the gap, and DLA's stockpile inquiry formed a cross-actor chain on a real vulnerability.

Apple → MP Materials (Round 1)

"Our $500M offtake covers Nd-Pr from Mountain Pass for iPhone speakers, Taptic Engines, and MagSafe components. Where does the Dy/Tb in the finished magnet come from? We require a 21-day heavy-REE provenance audit covering all magnet components in our supply commitment."

MP Materials → Apple (Round 1)

"Acknowledge. Heavy-REE fraction in current production sources through Shenghe-routed concentrates. We are opening a formal HREE feedstock procurement track with Lynas in parallel. Title III parallelism request to DoD to extend $110/kg price floor structure to Dy/Tb."

JPMorgan → MP Materials, Apollo, EU Commission (Round 2)

"Structuring proposal: 150-200 bps governance-pricing penalty for opaque-provenance projects. Critical Minerals HREE Capital Gap Analysis to identify 4-6 non-Chinese HREE-bearing projects with NI 43-101 resources."

DLA → MP Materials, Lynas, USA Rare Earth (Round 3)

"Year 1 stockpile target: 50 MT Dy + 10 MT Tb + 50 MT Y. Year 2+: 500 MT HREE. Emergency procurement under Defense Stockpile authority. Indicative pricing requested by Day 15."

Demonstrates: emergent operational specificity from agent interaction. Apple's question wasn't prompted. MP's answer addresses a specific real vulnerability (Shenghe routing). The chain of subsequent commitments forms only because each agent had its own institutional context.

Finding 03 / 03

Finding 3: Permit as binding constraint

Lynas, asked about US heavy-REE expansion timelines, named a specific Texas wastewater permitting situation as the binding constraint. Not capital, not technology, not ore. The kind of operational specificity a real Government Affairs lead would write.

Lynas → DoD, DLA, DFC (Round 1)

"500 mt/yr heavy REE by 2027 is not a commitment we can make from current US footprint without the Seadrift permit unblocked. Wastewater discharge approval requires TCEQ + EPA Region 6 alignment. Requesting a federal interagency convening with TCEQ, EPA Region 6, and DoD at one table."

DoD → Lynas, DOE, EPA, DOI (Round 2)

"Convening mechanism: EO-13953 Emergency Critical Minerals Coordination Group, 7-day plenary. DLA Day 15 NDS inventory assessment to baseline current strategic material position."

EPA Region 6 → Lynas, TCEQ, DOE (Round 3)

"DOE-EPA-TCEQ scoping call within 5 business days. Wastewater discharge permitting under CWA section 402 NPDES. EPA can provide technical assistance and emergency review pathways; statutory authority remains with TCEQ."

Demonstrates: institutional specificity from per-entity context. The Seadrift permit, TCEQ jurisdiction, and EPA Region 6 are real-world specifics the agent surfaced because the entity's data substrate included them.

Here's what one agent does in a single round.

How a single agent works.

Per-entity reasoning, mediated by the message bus.

INPUTS

System prompt

Entity identity, role, behavioral disposition

Entity data file

Legal name, supply chain position, history, relationships

World state slice

Scenario parameters, indicators, cached external data

Inbox

Point-to-point, broadcast, observed messages routed by the bus

Tool results

Forecasting, optimization, network, search outputs

AGENT CORE

Claude Agent SDK

Per-entity instance

Tier-routed model

T1 → Opus

T2 → Sonnet

T3 → Haiku

OUTPUTS

Outbound messages

Validated by bus, routed by visibility rule (P2P, broadcast, observed)

Declared state changes

Public state updates, private state updates

Reasoning trace PRIVATE

Private audit log, full chain-of-thought

Tool calls

Args and results, logged separately

MESSAGE BUS

Mediating layer · custom asyncio implementation

Schema validation · Write-ahead persistence · Visibility rules · Regulatory auto-add · Recipient routing

One agent, one round.

Per-entity AI agents.

What it is: every entity in the supply chain runs as its own independent agent, built on the Claude Agent SDK. The agent has its own institutional context, its own LLM calls, its own reasoning trace, its own state.

Why this choice: a single model role-playing many entities produces narrative consistency, not interaction. Real per-entity agents reason in parallel and respond to each other's specific output. That's what produces emergent dynamics.
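The difference shows up in the shape of the round loop. Below is a minimal sketch with stub agents in place of real model calls. It assumes simultaneous moves within a round, with messages delivered only after everyone has acted, which may not match how Dvora's runner sequences things. `StubAgent` and `run_rounds` are hypothetical names, and nothing here uses the Claude Agent SDK.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class StubAgent:
    """Stand-in for a per-entity agent; a real one would call an LLM."""
    name: str
    peers: list
    inbox: list = field(default_factory=list)

    async def act(self, round_no: int) -> list:
        await asyncio.sleep(0)  # placeholder for the model call
        seen = len(self.inbox)
        self.inbox.clear()
        text = f"{self.name} r{round_no}: responding to {seen} message(s)"
        return [(peer, text) for peer in self.peers]


async def run_rounds(agents: dict, rounds: int) -> list:
    transcript = []
    for round_no in range(1, rounds + 1):
        # Every agent reasons concurrently on what it received last round.
        outputs = await asyncio.gather(*(a.act(round_no) for a in agents.values()))
        # Deliver after the whole round, so no agent sees same-round output early.
        for sender_output in outputs:
            for recipient, text in sender_output:
                agents[recipient].inbox.append(text)
                transcript.append(text)
    return transcript


agents = {
    "supplier": StubAgent("supplier", ["buyer"]),
    "buyer": StubAgent("buyer", ["supplier"]),
}
transcript = asyncio.run(run_rounds(agents, rounds=2))
```

Each agent's round-2 output depends on what the other actually sent in round 1, rather than on one narrator keeping a story consistent.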

Custom message bus with strict visibility rules.

What it is: all inter-agent communication routes through a custom asyncio message bus. Messages get validated against a strict schema. Visibility is enforced structurally: point-to-point, broadcast, or observed (regulators with jurisdiction see it automatically).

Why this choice: information asymmetry is what makes negotiation realistic. A regulator should only see what it has jurisdiction to see. A private commercial conversation should stay private. Building this into the bus instead of agent prompts means the structure can't be bypassed.
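A minimal sketch of structural visibility, assuming a message carries a topic and each regulator has a set of topics it has jurisdiction over. The `Message` fields, the `jurisdiction` mapping, and the entity names are all hypothetical, and the real bus schema may look quite different.

```python
from dataclasses import dataclass
from enum import Enum


class Visibility(Enum):
    P2P = "p2p"
    BROADCAST = "broadcast"
    OBSERVED = "observed"


@dataclass(frozen=True)
class Message:
    sender: str
    recipients: tuple
    visibility: Visibility
    topic: str
    body: str


def validate(msg: Message, known: set) -> None:
    """Reject malformed messages before they are routed."""
    if msg.sender not in known:
        raise ValueError(f"unknown sender: {msg.sender}")
    if not msg.body.strip():
        raise ValueError("empty body")
    if msg.visibility is not Visibility.BROADCAST and not msg.recipients:
        raise ValueError("non-broadcast message needs recipients")
    unknown = set(msg.recipients) - known
    if unknown:
        raise ValueError(f"unknown recipients: {sorted(unknown)}")


def audience(msg: Message, known: set, jurisdiction: dict) -> set:
    """Resolve who may see a message; the sender is excluded from its own audience."""
    validate(msg, known)
    if msg.visibility is Visibility.BROADCAST:
        seen = set(known)
    else:
        seen = set(msg.recipients)
        if msg.visibility is Visibility.OBSERVED:
            # Regulators with jurisdiction over the topic are added automatically.
            seen |= {reg for reg, topics in jurisdiction.items() if msg.topic in topics}
    return seen - {msg.sender}


known = {"miner", "magnet_maker", "trade_ministry", "env_agency"}
jurisdiction = {"trade_ministry": {"export"}, "env_agency": {"wastewater"}}
msg = Message("miner", ("magnet_maker",), Visibility.OBSERVED, "export", "Offer: 20 t oxide")
viewers = audience(msg, known, jurisdiction)
```

Since the audience is computed by the bus from the message's declared fields, an agent's prompt has no way to widen or narrow who sees what.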

Tier-based model routing.

What it is: each entity is assigned a tier reflecting strategic importance. Tier 1 entities default to Opus, tier 2 to Sonnet, tier 3 to Haiku, with optional manual promotion.

Why this choice: reasoning depth should scale with what's being reasoned about. A government ministry's policy decision deserves more compute than a tier-3 long-tail entity's procurement check. This keeps cost controlled while preserving fidelity where it matters.
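The routing rule itself is small. A minimal sketch, assuming promotions are kept as a per-entity override map; the family labels are placeholders rather than real model identifiers, and `route_model` is a hypothetical name.

```python
from typing import Optional

# Placeholder family labels, not actual API model IDs.
DEFAULT_MODEL_BY_TIER = {1: "opus", 2: "sonnet", 3: "haiku"}


def route_model(entity_id: str, tier: int, promotions: Optional[dict] = None) -> str:
    """Pick a model family for an entity; a manual promotion wins over the tier default."""
    if promotions and entity_id in promotions:
        return promotions[entity_id]
    if tier not in DEFAULT_MODEL_BY_TIER:
        raise ValueError(f"unknown tier {tier} for {entity_id}")
    return DEFAULT_MODEL_BY_TIER[tier]
```

For example, `route_model("small_trader", 3, {"small_trader": "sonnet"})` returns `"sonnet"`, while the same call without the override returns `"haiku"`.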

Canonical entity substrate.

What it is: entity identity (legal name, country, public identifiers, parent/subsidiary structure, classification) gets sourced from canonical public databases like Wikidata, OpenSanctions, GLEIF, and SEC EDGAR. The ingestion engine extracts only the chain-specific structure on top.

Why this choice: canonical entity data is a solved problem. Building it from scratch reproduces work the public databases have already done. The differentiation is the simulation, not the data pipeline.

What's next.

Three things matter most, in order.

First, the deliverable layer. A synthesis output that turns a 9 MB transcript into a 30-page scenario report with ranked vulnerabilities, cascading effects, and evidence trails. This is what closes the gap between an interesting simulation run and something actually useful.

Second, calibration. Running the engine against historical events whose outcomes are known, to surface what kinds of dynamics the simulation captures well and where it misses.

Third, scenario engagements. Running Dvora against specific customer questions: a fund's specific position, a manufacturer's specific exposure, a policy office's specific decision. Not generic disruption runs.

Get in touch.

I'm building this in public. If supply chains, scenario analysis, geopolitical risk, or agent-based modeling is your space, or if you're building something similar and running into the same problems, I'd like to hear from you. Open to ideas, pushback, or pointers to people I should be reading.

adinzweigbaum@gmail.com · linkedin.com/in/adin-zweigbaum-11996019b