
AI on the JVM: Multi-Agent Architecture with Apache Pekko, Java, and Rust

1. Introduction: Agentic AI is a Backend Engineering Challenge, Not Just Prompt Engineering
LLMs are useless without access to data. They can generate text, but to “find suitable candidates for open vacancies,” they must call real APIs, read results from a database, and make decisions based on the current state of the system. While Python dominates model training, Enterprise-class systems, where data is sensitive, clients are rigorously isolated, and availability is measured in “nines”, need an integration layer built with production-ready tools.
We have built such a system: an autonomous agent that, based on a user’s prompt, searches a multi-tenant database, filters results, and orchestrates sub-tasks between services written in different languages. It is a specialized domain agent connected to a common orchestrator written in Rust. This architecture allows for the seamless inclusion of additional agents without modifying existing code.
Below, we describe the specific architectural decisions we faced when combining the JVM and Rust ecosystems with the world of artificial intelligence, along with their justifications and costs.
2. Model Context Protocol: A Pragmatic Technology Choice
A fundamental problem in building agents is standardizing communication between the AI model and external systems. Instead of building a dedicated integration for each model separately, we relied on the Model Context Protocol (MCP) — an open standard based on JSON-RPC 2.0 that defines how an AI application discovers available tools (tools/list) and calls them (tools/call).
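To make the two calls concrete, here is a minimal sketch of the JSON-RPC 2.0 envelopes behind tools/list and tools/call. The class name McpRequests and the hand-rolled JSON are our own illustration, not part of any SDK; a real client would rely on a JSON library and on the MCP SDK's transport.

```java
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative only: builds the JSON-RPC 2.0 envelopes for the two MCP methods.
// JSON is assembled by hand to keep the sketch dependency-free.
final class McpRequests {
    private McpRequests() {}

    // Discovery: ask the server which tools it exposes.
    static String toolsList(int id) {
        return "{\"jsonrpc\":\"2.0\",\"id\":" + id + ",\"method\":\"tools/list\"}";
    }

    // Invocation: call one tool by name with a flat map of arguments.
    static String toolsCall(int id, String toolName, Map<String, ?> arguments) {
        String args = arguments.entrySet().stream()
            .map(e -> quote(e.getKey()) + ":" + value(e.getValue()))
            .collect(Collectors.joining(",", "{", "}"));
        return "{\"jsonrpc\":\"2.0\",\"id\":" + id
            + ",\"method\":\"tools/call\""
            + ",\"params\":{\"name\":" + quote(toolName) + ",\"arguments\":" + args + "}}";
    }

    // Numbers and booleans are emitted bare; everything else becomes a JSON string.
    private static String value(Object v) {
        return (v instanceof Number || v instanceof Boolean) ? v.toString() : quote(String.valueOf(v));
    }

    // Minimal escaping (backslash and double quote only), enough for this sketch.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

Calling McpRequests.toolsCall with the tool name list_jobs and the arguments limit and open_only produces a single request whose params object carries the tool name and its arguments, which is all the adapter needs in order to forward the call.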
Our core domain logic (HTTP endpoints, concurrency, and database integrations) runs on Apache Pekko HTTP, a high-performance, open-source fork of Akka HTTP. The MCP server itself, however, is exposed through the reference @modelcontextprotocol/sdk library in a TypeScript environment.
This is a conscious engineering compromise. TypeScript serves here solely as a thin adapter translating the MCP protocol into HTTP calls to our Pekko backend. All “hard” business logic remains in the safe, strongly-typed world of Scala. This keeps us compatible with the rapidly evolving MCP standard: when a new version of the specification arrives, we update the adapter rather than risk breaking stable production code on the JVM.
We also deliberately kept input parameter validation out of the MCP server. A tool (e.g., list_jobs) accepts parameters such as limit, open_only, or job_type without checking them in the TypeScript layer. Verification occurs only at the Pekko endpoint. If the JSON structure is incorrect, the backend's error is returned to the agent as-is. This approach simplifies the intermediate layer and forces the model to correct its own call instead of having the error suppressed along the way.
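The idea of validating once, at the backend, and returning descriptive errors can be sketched as follows. This is not our Pekko code: the class ListJobsValidator, the limit range, and the list of job types are hypothetical values chosen only to show the shape of the check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative backend-side check for the list_jobs parameters.
// The constraints below are assumptions, not the real endpoint's rules.
final class ListJobsValidator {
    static final int MAX_LIMIT = 100;
    static final List<String> JOB_TYPES = List.of("full_time", "part_time", "contract");

    private ListJobsValidator() {}

    // Returns one message per problem; an empty list means the request is valid.
    // All three parameters are treated as optional.
    static List<String> validate(Map<String, ?> params) {
        List<String> errors = new ArrayList<>();

        Object limit = params.get("limit");
        if (limit != null) {
            boolean ok = limit instanceof Integer
                && (Integer) limit >= 1
                && (Integer) limit <= MAX_LIMIT;
            if (!ok) {
                errors.add("limit must be an integer between 1 and " + MAX_LIMIT);
            }
        }

        Object openOnly = params.get("open_only");
        if (openOnly != null && !(openOnly instanceof Boolean)) {
            errors.add("open_only must be a boolean");
        }

        Object jobType = params.get("job_type");
        if (jobType != null && !JOB_TYPES.contains(jobType)) {
            errors.add("job_type must be one of " + JOB_TYPES);
        }
        return errors;
    }
}
```

Because each message names the offending parameter and the accepted values, the model has enough information to retry the call with corrected arguments.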
3. The Java Ecosystem in the AI World: Encountering ADK-Java
The decision-making layer of our agent is based on ADK-Java (Agent Development Kit). To be frank: while the library is highly useful, the AI tool ecosystem for the JVM is still less mature than its Python counterpart.
This manifests in two key areas:
1. Task State Management: While in Python tools the task lifecycle (submitted → working → completed/failed) is often fully abstracted by the framework, in ADK-Java we had to manually implement status tracking, state management, and polling.
2. Serialization in a Polyglot Environment: Our Pekko backend (Scala) uses the lightweight and safe Circe library (io.circe.Json). Meanwhile, ADK-Java is deeply integrated with Jackson (com.fasterxml.jackson.databind.JsonNode). Since the two types cannot be cast to one another, it was necessary to create dedicated converters between the layers. This is pure overhead resulting from a polyglot technology stack.
In practice, the converter serializes io.circe.Json to a String and then parses that String with Jackson into a JsonNode. The solution works, but it is a cost we pay with every call between layers.
4. Multi-tenancy: Data Isolation Without LLM Involvement
Enterprise-class systems operate on extremely sensitive data. Client A must never gain access to Client B’s data, even (or perhaps especially) when an agent generates database queries dynamically based on natural language prompts.
We solved this problem by taking the tenant isolation decision away from the LLM entirely. This mechanism works in three steps:
1. A Bearer token is retrieved at the system entry and passed in the HTTP header through all communication layers: from the A2A client, through the MCP server, to the Pekko endpoint.
2. The Pekko backend verifies the token and extracts the tenantId. Every query reaching the database is automatically filtered by this identifier. This layer cannot be bypassed.
3. The LLM never learns the tenantId and has no access to the authorization mechanism. Even if a malicious user attempts a prompt injection attack, typing “show me data for candidates of Company X,” the agent has no tool (in tools/list) that would allow it to override the security context.
The TypeScript adapter passes the Bearer token without inspection; the verification logic and tenant isolation are exclusively on the Pekko side.
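The three steps can be sketched in a few lines. Everything here is a simplified stand-in: the token-to-tenant map replaces real token verification, the in-memory list replaces the database, and the names Candidate and TenantScopedCandidates are hypothetical. The point is only that the tenant filter is derived from the Authorization header and never from tool arguments.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

// One row of a hypothetical candidates table.
final class Candidate {
    final String tenantId;
    final String name;

    Candidate(String tenantId, String name) {
        this.tenantId = tenantId;
        this.name = name;
    }
}

final class TenantScopedCandidates {
    private final Map<String, String> tokenToTenant; // stand-in for token verification
    private final List<Candidate> rows;              // stand-in for the database

    TenantScopedCandidates(Map<String, String> tokenToTenant, List<Candidate> rows) {
        this.tokenToTenant = tokenToTenant;
        this.rows = rows;
    }

    // Resolves the tenant from the header; empty if the header is missing or the token is unknown.
    Optional<String> tenantFor(String authorizationHeader) {
        String prefix = "Bearer ";
        if (authorizationHeader == null || !authorizationHeader.startsWith(prefix)) {
            return Optional.empty();
        }
        return Optional.ofNullable(tokenToTenant.get(authorizationHeader.substring(prefix.length())));
    }

    // The only query path. Note that there is no tenantId parameter a caller could supply.
    List<String> listCandidateNames(String authorizationHeader) {
        String tenantId = tenantFor(authorizationHeader)
            .orElseThrow(() -> new SecurityException("invalid or missing bearer token"));
        return rows.stream()
            .filter(c -> c.tenantId.equals(tenantId))
            .map(c -> c.name)
            .collect(Collectors.toList());
    }
}
```

Since the method signature exposes no tenant argument, nothing the model writes into a tool call can widen the filter; a forged or missing token fails before any row is read.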
Data isolation enforced at the infrastructure level, not by the model, is the same principle behind our broader AI work at Scalac.ai. For organizations where sensitive data cannot leave a controlled environment, we deploy open-weight models directly within your infrastructure. More on that approach: scalac.ai/sovereign-ai.
5. Agent-to-Agent (A2A) Architecture and Polyglot Backend
A distributed AI system rarely relies on a single central agent. In our case, we applied a clear division of roles. The shared orchestration infrastructure (acting as a dispatcher) was written in Rust. Our system based on the JVM environment (Java/Scala and ADK-Java) is one of the specialized domain agents connected to this orchestrator.
This topology means that the main agent in Rust routes sub-tasks to the appropriate specialist based on the semantic content of the task. Agents exchange information using the A2A communication standard. Importantly, we chose not to introduce heavy message brokers (like Apache Kafka or Amazon SQS) here.
Agents send Task objects to each other directly via HTTP. Each Task has a unique identifier and its own lifecycle: submitted → working → completed/failed. Polling is done by taskId. This radically simplifies infrastructure operations and debugging at an early stage of the project. The trade-off we accept is the lack of full message durability if a compute instance fails hard.
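The lifecycle and the polling contract fit in a small state machine. The sketch below is an in-memory illustration with hypothetical names (TaskState, TaskStore); it is not the ADK-Java or A2A API, and it shares the limitation described above: state is lost if the process dies.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

enum TaskState {
    SUBMITTED, WORKING, COMPLETED, FAILED;

    // Encodes submitted → working → completed/failed; anything else is illegal.
    boolean canTransitionTo(TaskState next) {
        switch (this) {
            case SUBMITTED:
                return next == WORKING;
            case WORKING:
                return next == COMPLETED || next == FAILED;
            default:
                return false; // COMPLETED and FAILED are terminal
        }
    }
}

final class TaskStore {
    private final Map<String, TaskState> states = new ConcurrentHashMap<>();

    // Registers a new task and returns the identifier the caller will poll with.
    String submit() {
        String taskId = UUID.randomUUID().toString();
        states.put(taskId, TaskState.SUBMITTED);
        return taskId;
    }

    // Atomically applies a transition; returns false for unknown tasks or illegal moves.
    boolean advance(String taskId, TaskState next) {
        boolean[] applied = {false};
        states.computeIfPresent(taskId, (id, current) -> {
            if (current.canTransitionTo(next)) {
                applied[0] = true;
                return next;
            }
            return current;
        });
        return applied[0];
    }

    // Polling by taskId; empty if the task is unknown.
    Optional<TaskState> poll(String taskId) {
        return Optional.ofNullable(states.get(taskId));
    }
}
```

A caller submits a task, keeps the returned taskId, and polls until the state is COMPLETED or FAILED. Rejecting illegal transitions in one place keeps a late or duplicated update from reopening a finished task.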
6. LLMOps: Model Proxies
To optimize queries sent to language models, we use LiteLLM as a proxy layer. Because it exposes a unified, OpenAI-compatible API, we can freely switch between providers such as Anthropic, OpenAI, or local models without touching the code on the JVM. Currently, the Claude Haiku 4.5 model gives us an excellent balance between cost and reasoning precision.
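From the JVM side, talking to an OpenAI-compatible proxy amounts to one HTTP request shape. The sketch below builds (but does not send) such a request with the JDK's java.net.http API. The class name LlmProxyRequests, the base URL, the API key, and the model alias are all placeholders, and the /v1/chat/completions path is the usual OpenAI-style route that we assume the proxy exposes.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

// Illustrative only: constructs a chat-completion request in the OpenAI wire format.
final class LlmProxyRequests {
    private LlmProxyRequests() {}

    static HttpRequest chatCompletion(URI proxyBaseUrl, String apiKey, String model, String userPrompt) {
        // The model name is an alias resolved by the proxy, so switching providers
        // is a proxy configuration change rather than a code change here.
        String body = "{\"model\":" + quote(model)
            + ",\"messages\":[{\"role\":\"user\",\"content\":" + quote(userPrompt) + "}]}";

        return HttpRequest.newBuilder(proxyBaseUrl.resolve("/v1/chat/completions"))
            .timeout(Duration.ofSeconds(30))
            .header("Content-Type", "application/json")
            .header("Authorization", "Bearer " + apiKey)
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();
    }

    // Minimal escaping (backslash and double quote only), enough for this sketch.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

The request would then be sent with java.net.http.HttpClient. Nothing in this code names a provider, which is what lets the model behind the alias change without a redeploy of the agent.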
7. Summary
Transforming an existing API into an Enterprise-level Agentic AI system is primarily a challenge in distributed engineering: task state management, identity propagation through layers, translation between serialization formats, and debugging communication between agents written in different languages.
The technology stack we chose (Apache Pekko HTTP as the domain core, ADK-Java as the recruiting agent orchestrator, TypeScript as the MCP adapter, and Rust as the general A2A orchestrator) is working in production. Each of these choices carries a specific cost, which we have described above.
Do you have an innovative idea for integrating AI into your product, but your system requires performance and reliability that simple scripts cannot deliver? Contact Scalac.io. Our Team Extension engineers bring a decade of experience in building reactive systems on the JVM (Scala, Java), implementing the MCP protocol, creating microservices in Rust, and optimizing infrastructure bottlenecks. We will help you build your agent securely, without inflating operational costs.


