Minimalist Scalac hero graphic with a black Rust crab at the center, connected to chat, routing, context, code, UI, and database components in an A2A multi-agent system.

Rust as the A2A Orchestrator: What We Learned Building a Multi-Agent System

A recruiter types a query into a chat interface. Something like: “Show me candidates who applied in the last two weeks and haven’t been contacted yet.”

Before the model sees that message, five things happen. The orchestrator checks the user’s role. It makes a lightweight LLM call to detect sub-intent — is this a search request, a workflow action, or something in between? It loads the domain-specific context the model will need. It decides which agent handles the request. Then it passes a shaped prompt, not raw input, to the model.

All of that pipeline is Rust.

We covered this system’s full architecture in Part 1: the MCP server in TypeScript, the JVM-side domain agent built on ADK-Java, the A2A communication layer, the LiteLLM proxy routing calls to Claude Haiku 4.5. This article goes inside the Rust layer — what it does, how we built it, and what we learned when the ecosystem around us was still being assembled.

The orchestrator isn’t a dispatcher. It’s the layer that decides what the model sees. That reframe matters, and it shapes every engineering decision this article is about.

What the Orchestrator Actually Does

The system is a conversational AI interface for an enterprise recruitment platform. Users — recruiters, admins, hiring managers — can ask about candidates, pipeline stages, and job postings. They can trigger actions. They get tables, buttons, and interactive flows back, not just text.

Managing that is more complex than routing a string to an API.

Two levels of intent

The orchestrator handles intent at two levels.

The first is role-based. When a session starts, the orchestrator reads the user’s identity from the Bearer token and establishes context: this is a recruiter, this is an admin, this is a read-only viewer. That context is static for the session. It narrows the domain — what the model is allowed to do, which agents are available, which skills apply.

The second level is per-message. Within a conversation, users pivot. A recruiter starts asking about candidates and then switches to asking about pipeline analytics. The orchestrator makes a lightweight LLM call on each incoming message to check for sub-intent shift. If the intent changes, the routing changes.

This two-level structure keeps the per-message call cheap — you’re not re-establishing full session context each time, just watching for drift from the established role baseline.
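A minimal sketch of the two levels, assuming a role fixed at session start and a per-message classifier. The type names, the intent variants, and the keyword matcher that stands in for the lightweight LLM call are all illustrative, not the production code:

```rust
/// Level 1: session role, read once from the Bearer token.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role { Recruiter, Admin, Viewer }

/// Level 2: per-message sub-intent. Variant names are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Intent { CandidateSearch, PipelineAnalytics, WorkflowAction }

/// Stand-in for the lightweight LLM call that classifies one message.
pub trait IntentClassifier {
    fn classify(&self, role: Role, message: &str) -> Intent;
}

pub struct Session {
    pub role: Role,              // static for the whole session
    pub current: Option<Intent>, // last detected sub-intent
}

impl Session {
    pub fn new(role: Role) -> Self {
        Session { role, current: None }
    }

    /// Returns Some(intent) when routing has to change, None when the
    /// message stays within the current sub-intent.
    pub fn observe(&mut self, c: &dyn IntentClassifier, message: &str) -> Option<Intent> {
        let detected = c.classify(self.role, message);
        if self.current == Some(detected) {
            None
        } else {
            self.current = Some(detected);
            Some(detected)
        }
    }
}

/// Keyword matcher used only so the sketch runs without a model.
pub struct KeywordClassifier;

impl IntentClassifier for KeywordClassifier {
    fn classify(&self, _role: Role, message: &str) -> Intent {
        let m = message.to_lowercase();
        if m.contains("analytics") || m.contains("conversion") {
            Intent::PipelineAnalytics
        } else if m.contains("schedule") || m.contains("reject") {
            Intent::WorkflowAction
        } else {
            Intent::CandidateSearch
        }
    }
}
```

The role never changes inside `observe`; only the sub-intent is re-evaluated, which is what keeps the per-message check small.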

Chart 1 · Orchestrator decision layer: what happens before the model sees a message (a decision layer, not a dispatcher).

Flow: User message (raw input) → L1 Role check (Bearer token, session context) → L2 Intent LLM call (sub-intent detection) → Agent Skill load (shaped context) → Route via A2A (domain agent) → UI event stream → Frontend.

Legend: L1 / L2 (orchestrator intelligence), pass-through node, output.

Source: Engineering interview · Scalac production architecture

UI generation via tool calls

When users expect tables, buttons, and action flows rather than plain text, the model needs a way to express structured output. The approach: the model calls tools.

In LLM systems, a “tool call” is a structured output from the model — instead of generating text, the model returns a function name and a JSON payload. The application intercepts that and executes it. This is how LLMs get access to APIs, databases, and actions.

The orchestrator uses the same mechanism for UI. The model calls a tool with a typed JSON payload describing what it wants to show: a candidate table with specific columns, a button labelled “Schedule Interview,” an action flow for rejection. The orchestrator intercepts that tool call, converts it to a typed UI event, and streams the event to the frontend. The frontend renders based on event type — it doesn’t parse free text, it handles typed messages.
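The interception step can be sketched roughly as follows. The tool names (`show_table`, `show_button`), the event variants, and the tiny `Json` enum that stands in for a real JSON library are assumptions made for illustration; the actual payload schema is not shown in this article:

```rust
use std::collections::BTreeMap;

/// Tiny stand-in for a parsed JSON value (a real system would use a JSON library).
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Str(String),
    Arr(Vec<Json>),
    Obj(BTreeMap<String, Json>),
}

/// What the model returns instead of text: a function name plus a payload.
pub struct ToolCall {
    pub name: String,
    pub payload: Json,
}

/// Typed events for the frontend, which never parses free text.
#[derive(Debug, PartialEq)]
pub enum UiEvent {
    Table { columns: Vec<String> },
    Button { label: String },
}

fn field<'a>(payload: &'a Json, key: &str) -> Result<&'a Json, String> {
    match payload {
        Json::Obj(map) => map.get(key).ok_or_else(|| format!("missing field `{key}`")),
        _ => Err("payload must be an object".to_string()),
    }
}

fn as_str(value: &Json) -> Result<String, String> {
    match value {
        Json::Str(s) => Ok(s.clone()),
        _ => Err("expected a string".to_string()),
    }
}

/// Validates the payload and converts it; unknown tools are rejected, not rendered.
pub fn to_ui_event(call: &ToolCall) -> Result<UiEvent, String> {
    match call.name.as_str() {
        "show_table" => match field(&call.payload, "columns")? {
            Json::Arr(items) => Ok(UiEvent::Table {
                columns: items.iter().map(as_str).collect::<Result<_, _>>()?,
            }),
            _ => Err("`columns` must be an array".to_string()),
        },
        "show_button" => Ok(UiEvent::Button {
            label: as_str(field(&call.payload, "label")?)?,
        }),
        other => Err(format!("unknown UI tool `{other}`")),
    }
}
```

Because the conversion returns a `Result`, a malformed payload becomes an error the orchestrator can handle rather than something the frontend has to guess at.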

The model doesn’t control the UI directly. It expresses intent through a tool call. The orchestrator owns the rendering contract.

Conversation state in Postgres

Every message, user and assistant interleaved, is stored in Postgres. On each new request, the full conversation history is sent to the model. No sliding window, no summarization, no state machine. This mirrors how stateless chat-completion APIs are typically used: the caller resends the full context on every call.

For now, it’s simple and fast enough. The team is aware it becomes a cost and latency concern as conversations grow long, but that’s a scaling question for after launch.
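To make the full-history approach concrete, here is a small sketch with an in-memory `Vec` standing in for the Postgres table. All names are hypothetical, and the token estimate uses the rough "about four characters per token" rule of thumb, not a real tokenizer:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Speaker { User, Assistant }

#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub speaker: Speaker,
    pub text: String,
}

/// In-memory stand-in for the Postgres-backed message table.
#[derive(Default)]
pub struct ConversationStore {
    rows: Vec<Message>,
}

impl ConversationStore {
    pub fn append(&mut self, speaker: Speaker, text: &str) {
        self.rows.push(Message { speaker, text: text.to_string() });
    }

    /// Full history in insertion order: no window, no summary.
    pub fn history(&self) -> &[Message] {
        &self.rows
    }
}

/// Stores the new user message, then returns everything the model will see.
pub fn build_request(store: &mut ConversationStore, user_text: &str) -> Vec<Message> {
    store.append(Speaker::User, user_text);
    store.history().to_vec()
}

/// Rough size estimate (~4 characters per token; an approximation only).
pub fn approx_tokens(messages: &[Message]) -> usize {
    messages.iter().map(|m| m.text.chars().count() / 4 + 1).sum()
}
```

Each call to `build_request` returns a strictly longer list than the previous one, which is exactly why cost and latency grow with conversation length.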

No Redis, no Kafka, no durable message queue. This was a deliberate MVP decision. The consequences of losing an in-flight task on a restart haven’t materialized yet, because the system hasn’t reached real users.

Why Rust — and What Was Actually on the Table

The system’s architecture is now clear. So: why Rust for the orchestration layer?

Four languages were considered. The decision is worth examining because it shaped every constraint that follows.

Four languages, one choice

Language | Why it was considered | Why it was set aside or chosen
Python | Default for AI work: LangGraph, CrewAI, every new library lands here first | The lead engineer had direct experience with productionizing Python in distributed systems: runtime errors, instability under load, bad production reputation at scale.
Scala | Three engineers on the team had Scala backgrounds | Not chosen because the client preferred Rust for this layer. The decision was organizational, not a rejection of Scala’s technical fit.
Go | Minimal resource usage, strong API server story | Lost to Rust. The Scala-to-Go mental model transfer is harder than Scala-to-Rust. Strong typing and domain modeling philosophy carry over more naturally.
Rust | Chosen | CTO of the client preferred it. Scala-to-Rust jump is smaller for engineers who think in types. Accepted.

This was not a benchmark-driven decision. It was a combination of organizational preference, prior experience with Python’s production failure modes, and team background. We’ve made a similar case for typed languages in agentic systems before — see why type safety matters more than prompts when the model is one component in a larger distributed system.

The trade-off we accepted

Choosing Rust for an AI orchestration layer in January 2026 meant accepting a specific constraint: the ecosystem was thin. Python has LangGraph, CrewAI, AutoGen, and a new agent framework every two weeks. Rust had Rig — a solid LLM abstraction library — and not much else. No agent framework. Limited protocol SDKs. Almost no prior art for this specific problem.

Whether that trade-off was worth it is still an open question. More on that later.

Building the Agent Loop from Scratch

The agent loop — the cycle of send prompt → receive response → decide what to do → act → repeat — is the core of any agentic system. In Python, frameworks like LangGraph handle this. In Rust, as of January 2026, nothing did.

What Rig gave us

Rig (currently at v0.39.0, released June 2026) is the leading Rust library for LLM-powered applications. It provides a unified API across 20+ model providers, handles serialization, and as of recent versions includes OpenTelemetry support for tracing. It’s well-maintained — we submitted feature requests directly, and some were adopted into the library.

What Rig doesn’t provide: the agent loop. It doesn’t decide when to call a tool, when to loop, when to stop. It abstracts away the LLM provider. The orchestration logic above that layer is yours to write.

This is by design. Rig is an abstraction layer, not an orchestration framework. The distinction matters.

The manual agent loop

We built the loop ourselves. Simplified:

  1. Build prompt: conversation history + current message + loaded Agent Skill
  2. Call the model via Rig
  3. Parse response: is this text, or a tool call?
  4. If text: stream to user, check if conversation continues
  5. If tool call: intercept, identify type (UI event or domain agent action), execute
  6. If domain agent: route via A2A, wait for result, incorporate into context
  7. Check for sub-intent shift
  8. Loop or terminate
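The steps above can be sketched as a bounded loop. This is a simplified illustration under stated assumptions, not the production loop: the `Model` and `DomainAgent` traits stand in for the Rig call and the A2A client, the sub-intent check (step 7) is left out, and `ScriptedModel` and `EchoAgent` exist only so the sketch runs without a network:

```rust
use std::collections::VecDeque;

/// What a model call can return in this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum ModelReply {
    Text(String),
    UiTool { name: String },        // becomes a UI event for the frontend
    DomainTool { request: String }, // forwarded to a domain agent
}

/// Stand-ins for the model client and the A2A client.
pub trait Model { fn complete(&mut self, context: &[String]) -> ModelReply; }
pub trait DomainAgent { fn run(&mut self, request: &str) -> String; }

#[derive(Debug, PartialEq)]
pub enum Output { Text(String), UiEvent(String) }

/// Runs one user turn. `max_steps` bounds the loop so a confused model
/// cannot keep calling tools forever.
pub fn run_turn(
    model: &mut dyn Model,
    agent: &mut dyn DomainAgent,
    mut context: Vec<String>,
    max_steps: usize,
) -> Result<Vec<Output>, String> {
    let mut outputs = Vec::new();
    for _ in 0..max_steps {
        match model.complete(&context) {
            // Plain text ends the turn.
            ModelReply::Text(t) => {
                outputs.push(Output::Text(t));
                return Ok(outputs);
            }
            // UI tool call: emit an event and tell the model it was rendered.
            ModelReply::UiTool { name } => {
                context.push(format!("tool:{name} -> rendered"));
                outputs.push(Output::UiEvent(name));
            }
            // Domain action: wait for the agent and fold the result into context.
            ModelReply::DomainTool { request } => {
                let result = agent.run(&request);
                context.push(format!("agent result: {result}"));
            }
        }
    }
    Err(format!("no final answer after {max_steps} steps"))
}

/// Test doubles: a model that replays a fixed script and an agent that echoes.
pub struct ScriptedModel(pub VecDeque<ModelReply>);
impl Model for ScriptedModel {
    fn complete(&mut self, _context: &[String]) -> ModelReply {
        self.0.pop_front().unwrap_or(ModelReply::Text(String::new()))
    }
}
pub struct EchoAgent;
impl DomainAgent for EchoAgent {
    fn run(&mut self, request: &str) -> String { format!("done: {request}") }
}
```

Retries, timeouts, and streaming are deliberately omitted here; in a hand-written loop each of those is additional custom code layered on this skeleton.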

“There’s nothing. No framework, just Rig. We built the entire agent loop manually. We have full control over what happens.”

Full control is accurate. We know exactly what runs at every step. There are no hidden behaviors, no framework magic to debug, no version update that silently changes routing logic.

The cost is also real. Every piece of the loop — error handling, retry logic, timeout management, streaming — was custom work. When Rig added features we had built ourselves, we migrated custom code back to the library. That migration is ongoing.

The ecosystem in mid-2026

Teams starting today face a different situation. By mid-2026, Rust agent frameworks exist that didn’t when we began: AutoAgents uses the Ractor actor model for multi-agent coordination; ADK-Rust ports Google’s Agent Development Kit to Rust; OpenFANG frames itself as a full agent operating system. Tokio remains the uncontested async runtime, with JoinSet and CancellationToken as the canonical pattern for sub-agent lifecycle management.

This doesn’t mean our build-from-scratch approach was wrong in January 2026 — there was no viable alternative. It means the build-vs-framework trade-off is different for teams starting now.

The A2A Protocol Reality

Building the agent loop from scratch was expensive work we controlled entirely. The A2A protocol was someone else’s spec — and it was moving.

Implementing A2A without an SDK

A2A — the Agent-to-Agent protocol developed by Google and now hosted by the Linux Foundation — defines how agents communicate: task submission, status updates, result retrieval. It has official SDKs for Python and Java. When we started, there was no Rust SDK.

We implemented A2A v0.3 ourselves. The JVM domain agent team used the official Google Java SDK, also on v0.3. Both sides converge on the same protocol spec, serialized to JSON over HTTP. Interoperability worked.

This arrangement was functional. It also created an ownership structure: our team owns the Rust protocol implementation. Any change to the spec is our problem to track and re-implement.

A2A v1.0 and the version lock

A2A v1.0 was released in early 2026. The jump from v0.3 introduces breaking changes in the interaction protocol — the way tasks are submitted and status is communicated. The AgentCard format (how agents advertise their capabilities) remained backward-compatible, but the interaction model did not.

A community Rust SDK appeared targeting v1.0. The official Google Java SDK, which the domain agent team uses, has not yet been updated to v1.0 at the time of writing.

The result: both teams are waiting. The Rust side could migrate — the community SDK exists. The JVM side cannot migrate until Google ships the official Java v1.0 update. Until then, both sides stay on v0.3.

“We were blocked yesterday. We’re waiting for Google to ship the new version of the Java library.”

This is the coordination reality of a polyglot system on evolving protocols: migration speed is determined by the slowest SDK, not the fastest team. Our ability to move quickly on the Rust side was irrelevant.
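The version lock reduces to a small calculation: the system can only speak the highest protocol version that every participant supports. A hedged sketch, with hypothetical type names and with versions modeled as `(major, minor)` tuples:

```rust
/// Protocol version as (major, minor); tuples compare lexicographically.
pub type Version = (u32, u32);

pub struct AgentSdk {
    pub name: &'static str,
    /// Versions this side's SDK or hand-written implementation can speak.
    pub supported: Vec<Version>,
}

/// Highest version every participant supports, or None if there is no overlap.
pub fn shared_version(agents: &[AgentSdk]) -> Option<Version> {
    let first = agents.first()?;
    first
        .supported
        .iter()
        .copied()
        .filter(|v| agents.iter().all(|a| a.supported.contains(v)))
        .max()
}
```

With a Rust side that supports `(0, 3)` and `(1, 0)` and a JVM side that supports only `(0, 3)`, the shared version stays at `(0, 3)` no matter how quickly the Rust side could move.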

The same pattern, twice

A second protocol governs the UI event stream — the mechanism for sending structured events to the frontend. We implemented this protocol ourselves starting on v0.8. When v0.9 was released, the spec changed significantly. We had to re-implement. At the same time, the image generation feature — which the protocol supported in theory — was cut. The model’s image outputs were too slow for the product; the feature and the protocol scope were reduced together.

The pattern: two protocols, both changed within the project timeline. AI standards in early 2026 iterate on a roughly two-month cycle. If you build on Rust — where you’re often implementing protocols yourself rather than consuming a mature SDK — you’re accepting responsibility for tracking and applying those changes.

That’s not a Rust problem specifically. It’s an ecosystem maturity problem. The difference is that in Python, someone else usually does the implementation work first.

Agent Skills: The Problem That Mattered Most

The protocol challenges were real. But the hardest engineering problem wasn’t the protocol. It emerged from something more fundamental: how do you give a small model a complex domain without drowning it in context?

The system prompt wall

The system needed to guide users through a multi-step CRM workflow. A recruitment pipeline has stages — sourcing, screening, scheduling, offer, close. Each stage has conditions, allowed actions, and hand-off criteria. The model needs to understand where the user is in the workflow, what’s allowed at this stage, and what the next step is.

The first approach: put all of that in the system prompt.

“The hard problem was modelling a multi-step workflow and passing it to the model without it becoming a wall of system prompt.”

A monolithic system prompt with full workflow knowledge is expensive in tokens, harder to maintain, and pushes even capable models toward degraded reasoning on the parts they’ve loaded but don’t currently need. Add a second domain, a second workflow — the prompt grows, and the problem compounds.

What Agent Skills are

The solution: Agent Skills — an architecture that mirrors the approach formalized in the agentskills.io specification, which Google ADK has also adopted.

An Agent Skill is a self-contained unit of context. It encapsulates the instructions, tools, and domain knowledge an agent needs for a specific capability — not the whole system, just one coherent piece.

The format has three layers. L1 is metadata: a short description in the skill’s frontmatter, always loaded, roughly 100 tokens per skill. L2 is the instruction body: the actual task instructions, loaded when the skill is activated. L3 is resources: reference documents, schemas, assets, loaded on demand.

An agent with 10 skills starts each call with approximately 1,000 tokens of L1 context — the menu — instead of 10,000 tokens of full instruction text spread across every skill.
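The arithmetic is easy to check in code. The sketch below models only L1 and L2 (L3 resources are left out), uses the round figures quoted above, and introduces hypothetical names throughout:

```rust
/// One Agent Skill, reduced to the token cost of its first two layers.
pub struct Skill {
    pub name: &'static str,
    pub l1_tokens: usize, // metadata, always in context
    pub l2_tokens: usize, // instruction body, loaded only on activation
}

/// Context cost with every skill's L1 loaded, plus L2 for the active skill, if any.
pub fn context_tokens(skills: &[Skill], active: Option<&str>) -> usize {
    skills
        .iter()
        .map(|s| {
            let l2 = if Some(s.name) == active { s.l2_tokens } else { 0 };
            s.l1_tokens + l2
        })
        .sum()
}

/// Cost of the monolithic alternative: every instruction body in one system prompt.
pub fn monolithic_tokens(skills: &[Skill]) -> usize {
    skills.iter().map(|s| s.l2_tokens).sum()
}
```

For 10 skills at roughly 100 L1 tokens and 1,000 L2 tokens each, `context_tokens` gives 1,000 with no skill active and 2,000 with one active, against 10,000 for `monolithic_tokens`.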

Chart 2 · Context window comparison: Agent Skills vs. monolithic prompt, baseline token load (~90% reduction in baseline context).

Monolithic system prompt: 10,000 tokens, always loaded. All skills are in context on every call.
Agent Skills, baseline (L1 only): ~1,000 tokens. 10 skills × ~100 tokens of L1 metadata each (the menu), always loaded; about 90% not loaded.
Agent Skills, active skill (L1 + L2): ~2,000 tokens. L1 baseline plus ~1,000 tokens of L2 instructions for the triggered skill only; about 80% not loaded.

Source: agentskills.io specification · Google ADK documentation · Scalac production architecture

The model reads what it needs to know about each capability, then loads the full instructions for the one that applies. Everything else stays out of the context window.

What changed for us

We applied this to the CRM workflow. The domain skill carries the workflow structure: here are the stages, here are the tools for this stage, here are the conditions for advancing. The orchestrator loads the appropriate skill based on detected intent and user role. The model receives shaped, relevant context — not a comprehensive brief it has to scan to find the relevant section.
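One way to picture skill selection and prompt shaping, as a sketch only: the field names, the intent label `crm_workflow`, and the rendered text format are assumptions, since the real skill format is not reproduced in this article.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role { Recruiter, Admin, Viewer }

/// One workflow stage: which tools apply and when the user may move on.
pub struct StageRule {
    pub name: &'static str,
    pub tools: &'static [&'static str],
    pub advance_when: &'static str,
}

/// A domain skill keyed by the intent it serves and the roles allowed to use it.
pub struct DomainSkill {
    pub intent: &'static str,
    pub allowed_roles: &'static [Role],
    pub stages: Vec<StageRule>,
}

/// Picks the skill matching the detected intent, if the role may use it.
pub fn select_skill<'a>(
    skills: &'a [DomainSkill],
    role: Role,
    intent: &str,
) -> Option<&'a DomainSkill> {
    skills
        .iter()
        .find(|s| s.intent == intent && s.allowed_roles.contains(&role))
}

/// Renders the selected skill's workflow structure as the context the model receives.
pub fn shaped_context(skill: &DomainSkill) -> String {
    skill
        .stages
        .iter()
        .map(|r| {
            format!(
                "{}: tools [{}]; advance when {}",
                r.name,
                r.tools.join(", "),
                r.advance_when
            )
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

A viewer asking for a recruiter-only workflow gets `None` from `select_skill`, so the skill body never enters the context at all.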

“We got Claude Haiku 4.5 — a small model — to guide a user through an entire CRM workflow, start to finish, without hallucinations.”

That’s the result. A model in the lightweight tier, on a constrained context, navigating a multi-step domain workflow correctly. The breakthrough wasn’t the language or the runtime. It was how the context was structured.

This matters beyond our specific system. Any agentic application that models complex domain workflows will hit the system prompt wall. Agent Skills offer a way through: load only what’s needed, when it’s needed.

Where We Are Now — and What We Don’t Know Yet

The system is in the final pre-production phase: integrated end to end, in daily use by developers and data scientists, and being prepared for real-user rollout.

A team is building evaluation infrastructure: LLM traces flow to MLflow, and automated evals are being built for tool call correctness, response quality, and hallucination detection. Replay testing, which runs historical conversations and checks the outputs, is just starting.

One dependency before launch: another team is adding the AI interface to the main product UI. We built a demo on their stack to unblock the work and stop waiting.

What real users will expose

Developer testing and real users are not the same thing. Users who know the domain will query the system in ways that developers who designed it don’t anticipate. Edge cases in intent detection will surface. Agent Skill boundaries will be tested by requests that sit between categories. The absence of durable messaging will matter if the system restarts during a long-running operation.

“We’re in a lucky position — we don’t have users yet. We can still make changes without breaking anything. Once real users arrive, the rules change.”

Put differently: the team is still in a useful window. Real-user rollout hasn’t happened yet, so architecture changes can still be made before external usage hardens the system. These aren’t concerns about the design being wrong — they’re a deliberate list of what the team is watching for, made possible by testing internally before anyone outside the company depends on the system.

The honest Rust verdict

“You’re asking too early. What I’m curious about: was Rust the right call versus Python, which has full ecosystem support and every new feature first? We don’t know yet.”

It’s too early to claim Rust was categorically the best choice — and the team isn’t claiming that. What can be said is narrower and more useful: Rust gave the team full control over the orchestration loop, at the cost of owning more integration and protocol work than a Python stack would have required. That’s not an absence of a verdict. It’s a specific, bounded trade-off, and the team can name exactly what it cost and what it bought.

This case study is evidence about what it’s like to build, not a final benchmark verdict on Rust versus Python.

When Rust Makes Sense as the Orchestration Layer

Not a recommendation — a set of conditions based on what this team actually experienced.

For context: as of mid-2026, benchmarks published for the Rust AI agent ecosystem (Zylos Research, April 2026) report roughly 5x lower memory usage and ~43% lower latency than Python equivalents in agent workloads. These are framework benchmarks, not our numbers, and we can’t confirm them from our own system yet. But the structural reasons behind the numbers (no GIL, no GC pauses, true parallelism via Tokio) are real and observable in development.

Rust is a reasonable choice when:

  • You need full ownership of the agent loop. No framework between you and the execute cycle means no hidden behavior and no version updates that silently change routing logic. If your production system needs auditable, deterministic orchestration, that control matters.
  • Your team has a Scala or other typed-language background. The mental model transfer is real: strong typing, algebraic data types, and domain modeling through the type system carry over in ways that Go doesn’t support as naturally.
  • The protocols you need have Rust SDKs, or you’re willing to implement them. By mid-2026, a community Rust SDK for A2A v1.0 exists. MCP has Rust client libraries. The ecosystem is no longer empty, but it’s still thinner than Python’s or Java’s.
  • You plan to track library releases actively. Rig is moving fast. Features we built custom in January 2026 were in the library by June. Plan to migrate back as the library catches up.

Rust is a harder choice when:

  • You need to iterate quickly on agent design. Python’s framework density lets you experiment with agent architectures in days. Rust’s build-from-scratch approach costs more per iteration.
  • The protocols you depend on don’t have Rust SDKs yet, and you can’t maintain protocol implementations internally. The v0.3/v1.0 lock we hit illustrates the dependency surface: in a polyglot system, you move at the speed of the slowest SDK.
  • You’re in a discovery phase where domain complexity is still unknown. The thin ecosystem is easier to accept when you know exactly what you’re building.
  • Your team has no systems programming background. The learning curve is real — though AI-assisted development helped compress it significantly. One engineer joined this project with no prior Rust or LLM experience and was contributing at normal velocity within weeks.

One distinction worth making: the protocol churn we experienced — two specs changing within the project timeline — would have been less painful in Python, where the community typically absorbs the migration cost. That’s not an argument against Rust. It’s an accurate description of what it costs to be early in a thin ecosystem.

Conclusion

The language choice was the first decision. The hardest problem came much later — how to give a small model a complex domain without drowning it in context. That turned out to be an architecture problem. Agent Skills solved it. Rust was where it ran.

If you’re building a multi-agent orchestration layer and want to talk through the engineering trade-offs, reach out: projects@scalac.io or send us a message through this form.

And if you haven’t read Part 1 — the full system architecture, from MCP to the JVM domain agent to A2A to LiteLLM — start there.

Published: 01.07.2026