THE SIGNAL: What matters in distributed systems | #4

June 2026 | Issue #4

Welcome back. AI coding agents pushed GitHub to 275 million commits a week, and Microsoft’s own cloud couldn’t keep up: availability slipped below the 99.9% enterprise SLA threshold before Microsoft started routing overflow to AWS. Meanwhile, a 119-point Hacker News thread argues that multi-agent coding tools quietly rediscovered a 40-year-old impossibility result.


Today: What to watch this month


The Architecture Debate: Is Multi-Agent Coding a Distributed Systems Problem?

A post by Kiran that spread fast this month argues that multi-agent AI coding systems inherit the coordination problem that Fischer, Lynch, and Paterson proved unsolvable in general in 1985, and draws the same line to the Byzantine Generals problem. The FLP result says that in an asynchronous system where even one participant can crash, no deterministic protocol can guarantee that agreement ever terminates. Point several coding agents at the same codebase and you’ve built a consensus problem, whether you meant to or not. The Hacker News thread pulled 119 points and 62 comments, real engagement for a theory post.

Where the argument holds. Teams running parallel coding agents against shared state hit exactly the symptoms distributed systems engineers know by heart: stale reads, conflicting writes, work that silently duplicates because no agent knew another agent already finished it. Sequential stages with deterministic verification gates (plan, design, code, then a hard check before merge) work in practice precisely because they avoid the shared-mutable-state trap that makes consensus hard in the first place.

Where it breaks down. LLM agents are not independent actors in the Byzantine fault tolerance sense. They share weights and training data, so two agents facing the same ambiguous prompt tend to drift toward the same wrong answer instead of failing independently. That undermines the Byzantine framing in particular: those protocols assume only a bounded minority of participants is faulty at any one time, and correlated failures can exceed that bound all at once. Commenters also point out that humans coordinate projects at Linux-kernel scale without a formal consensus proof in sight. Task scoping and clear specs do more work than any protocol.

Scalac angle: don’t reach for consensus algorithms to fix multi-agent coding chaos. Reach for the boring distributed-systems trick that actually works here: reduce shared mutable state. Give each agent an isolated workspace, a narrow task, and a deterministic gate before merge. If two agents must touch the same files, that’s a scheduling problem, not a Byzantine fault tolerance problem. Solve it with locks and ordering, not philosophy.
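
To make "locks and ordering" concrete, here is a minimal sketch of one way to serialize agents that must touch the same files. Everything in it is an assumption for illustration: FileLockTable, claim, and run_agents are hypothetical names, not part of any agent framework, and real agents would run as separate processes rather than threads. The one idea it shows is that every agent acquires its file locks in the same sorted order, which rules out lock-order deadlocks.

```python
import threading
import time
from contextlib import contextmanager


class FileLockTable:
    """One lock per file path (hypothetical helper for illustration)."""

    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()

    def _lock_for(self, path):
        # Create the per-file lock lazily, under a guard so two agents
        # never end up with different lock objects for the same path.
        with self._guard:
            return self._locks.setdefault(path, threading.Lock())

    @contextmanager
    def claim(self, paths):
        # Sort so every agent acquires locks in the same global order.
        ordered = sorted(set(paths))
        held = []
        try:
            for path in ordered:
                lock = self._lock_for(path)
                lock.acquire()
                held.append(lock)
            yield ordered
        finally:
            # Release in reverse order, even if the agent's work raised.
            for lock in reversed(held):
                lock.release()


def run_agents(table, tasks):
    """Run each (name, paths) task in its own thread and return the event log."""
    log = []

    def work(name, paths):
        with table.claim(paths):
            log.append((name, "start"))
            time.sleep(0.01)  # stand-in for the agent editing files
            log.append((name, "end"))

    threads = [threading.Thread(target=work, args=task) for task in tasks]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return log


if __name__ == "__main__":
    # Two agents list the same files in opposite order; sorting inside
    # claim() means they still cannot deadlock.
    tasks = [("agent-a", ["Model.scala", "Routes.scala"]),
             ("agent-b", ["Routes.scala", "Model.scala"])]
    print(run_agents(FileLockTable(), tasks))
```

In this sketch the two agents run one after the other rather than interleaving, which is the point: overlapping file sets become a queueing decision, and agents with disjoint file sets still run in parallel.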

Notes from the Trenches: GitHub’s Infrastructure Cracks Under the AI Agent Load

The problem: GitHub processed 1 billion commits across all of 2025. By April 2026, that pace hit 275 million commits a week, putting the platform on track for roughly 14 billion commits this year. GitHub COO Kyle Daigle put it plainly: “There were 1 billion commits in 2025. Now, it’s 275 million per week.” Claude Code alone went from about 100,000 commits a week in September 2025 to 2.6 million by April, a 25x jump in six months. AI agent pull requests grew from 4 million to 17 million over the same stretch, a 325% increase.
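
The figures above are easy to sanity-check. The sketch below redoes the arithmetic from the rounded numbers quoted in this section; the function names are ours, and the only assumption is a 52-week year. With the rounded inputs the Claude Code multiple comes out at 26x, consistent with the quoted 25x given that the September figure is "about 100,000".

```python
WEEKS_PER_YEAR = 52  # assumption: a flat 52-week year


def annualized(per_week):
    """Project a weekly count over a full year."""
    return per_week * WEEKS_PER_YEAR


def growth_multiple(before, after):
    """How many times larger 'after' is than 'before'."""
    return after / before


def percent_increase(before, after):
    """Relative growth expressed as a percentage."""
    return (after - before) / before * 100


commits_per_year = annualized(275_000_000)              # 14_300_000_000
claude_multiple = growth_multiple(100_000, 2_600_000)   # 26.0
pr_increase = percent_increase(4_000_000, 17_000_000)   # 325.0

if __name__ == "__main__":
    print(commits_per_year, claude_multiple, pr_increase)
```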

The infrastructure, built for human-scale commit patterns, buckled: on April 1-2, Copilot’s backend exhausted resources and went down for 2.7 hours, code search stayed offline for 8.7 hours, and emergency rate limiting degraded the Copilot Cloud Agent for four hours. The following week, agent session wait times hit 54 minutes against a normal baseline of 15-40 seconds, and session startup failures briefly spiked to 97.5%. May added nine more outages. By June, availability had dropped below the 99.9% enterprise SLA threshold.
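
For a sense of scale, a 99.9% target leaves very little room. The sketch below computes the downtime budget for a 30-day month and the availability a single component would report after an outage of a given length. It is illustrative only: the 30-day window is our assumption, the function names are hypothetical, and real SLAs are calculated per service under contract terms this section does not describe.

```python
MINUTES_PER_DAY = 24 * 60


def downtime_budget_minutes(sla_percent, days=30):
    """Minutes of downtime allowed in the window before the SLA is missed."""
    return days * MINUTES_PER_DAY * (1 - sla_percent / 100)


def availability_percent(downtime_minutes, days=30):
    """Availability over the window, given total minutes of downtime."""
    total = days * MINUTES_PER_DAY
    return (total - downtime_minutes) / total * 100


if __name__ == "__main__":
    # A 99.9% target over 30 days allows about 43.2 minutes of downtime.
    print(downtime_budget_minutes(99.9))
    # An 8.7-hour outage alone puts that component at about 98.79%.
    print(availability_percent(8.7 * 60))
```

By that measure, the 8.7-hour code search outage on its own is roughly twelve times a monthly 99.9% budget.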

The solution, for now. Microsoft confirmed on June 16 that it’s routing GitHub’s burst compute and storage load, specifically GitHub Actions runners and Codespaces, to AWS, a competitor’s cloud. Azure’s own data center expansion is underway, but construction and hardware procurement take 18-24 months. Until that capacity comes online, routing overflow to AWS is a bridge, not a fix.

Scalac angle: capacity models built around human developer behavior don’t transfer to agent-driven traffic. Agents commit more often, open more pull requests, and run in bursts tied to session length rather than working hours. If you’re scaling infrastructure for agent workloads, load-test against agent-shaped traffic specifically, not a scaled-up version of your human usage curve, and keep a real multi-cloud escape hatch ready before you need it. Microsoft had one. Building yours after the outage starts is too late.
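
A toy model shows why "agent-shaped" matters for load tests. The sketch below puts the same daily commit volume through two arrival patterns and compares the busiest minute. Both patterns are assumptions made up for illustration (uniform arrivals over an eight-hour workday versus a handful of ten-minute sessions at arbitrary times of day), not measurements from GitHub or anyone else.

```python
import random
from collections import Counter


def human_arrivals(n_commits, rng, workday_minutes=480):
    """Commit minutes spread uniformly over a working day (assumed model)."""
    return [rng.randrange(workday_minutes) for _ in range(n_commits)]


def agent_arrivals(n_commits, rng, sessions=4, session_minutes=10,
                   day_minutes=1440):
    """Same volume packed into a few short sessions at any time of day."""
    starts = [rng.randrange(day_minutes - session_minutes)
              for _ in range(sessions)]
    return [rng.choice(starts) + rng.randrange(session_minutes)
            for _ in range(n_commits)]


def peak_per_minute(arrivals):
    """Largest number of commits landing in any single minute."""
    return max(Counter(arrivals).values())


if __name__ == "__main__":
    rng = random.Random(7)  # fixed seed so runs are repeatable
    print("human peak/min:", peak_per_minute(human_arrivals(2000, rng)))
    print("agent peak/min:", peak_per_minute(agent_arrivals(2000, rng)))
```

With 2,000 commits squeezed into at most 40 session minutes, the busiest minute has to carry at least 50 of them, while the uniform model averages about four a minute. A load test that only scales up the human curve never produces that peak.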


Signal Over Noise: Three Critical Changes This Month

1. Apache Kafka 4.3.1 closes a native memory leak in Kafka Streams

Kafka 4.3.1 fixes roughly 15 issues, and the headline one is KAFKA-20616: createOffsetsCFOptions() created a ColumnFamilyOptions object that was never stored or closed, and on the JNI side that silently allocates an 8 MB LRUCache every time. The leak multiplies per segment and per task, hitting windowed stores hardest. A second bug on the close path skipped handle cleanup entirely when an exception fired during EOSv2 shutdown, because the cleanup ran outside a finally block. If you run Kafka Streams with RocksDB state stores, especially windowed ones, upgrade to 4.3.1 and check native memory usage trends over the following week, not just at deploy time.
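
The close-path bug is a general pattern, so here is a language-neutral sketch of it. This is not Kafka's code: FakeNativeHandle, flush, and both close functions are hypothetical stand-ins, written in Python only to show the control flow. The leaky version skips cleanup when an earlier step throws; the safe version moves cleanup into a finally block.

```python
class FakeNativeHandle:
    """Stand-in for a native-backed object that must be closed explicitly."""

    open_count = 0  # how many handles are currently open

    def __init__(self):
        FakeNativeHandle.open_count += 1
        self.closed = False

    def close(self):
        if not self.closed:
            self.closed = True
            FakeNativeHandle.open_count -= 1


def flush(fail):
    """Pretend shutdown step that may raise."""
    if fail:
        raise RuntimeError("flush failed during shutdown")


def close_store_leaky(handle, fail):
    flush(fail)     # if this raises, the next line never runs
    handle.close()


def close_store_safe(handle, fail):
    try:
        flush(fail)
    finally:
        handle.close()  # runs whether or not flush raised


def leaked_megabytes(leaked_objects, mb_per_object=8):
    """Native memory held by unclosed objects, at 8 MB each as described above."""
    return leaked_objects * mb_per_object


if __name__ == "__main__":
    for close_store in (close_store_leaky, close_store_safe):
        handle = FakeNativeHandle()
        try:
            close_store(handle, fail=True)
        except RuntimeError:
            pass
        print(close_store.__name__, "open handles:", FakeNativeHandle.open_count)
        handle.close()  # reset for the next round
```

The multiplication is what turns a small leak into an incident: with hypothetical counts of 3 segments across 4 tasks, 12 unclosed objects at 8 MB each already hold 96 MB of native memory that heap metrics will not show.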

2. JDK 27 locks its feature set

JDK 27 forked off for its rampdown phase on June 4, and the feature list is now frozen. Compact object headers become the default header layout; in the JEP’s own SPECjbb2015 benchmarks that cuts heap usage 22% and CPU time 8%, with 15% fewer garbage collections under G1 and Parallel. G1 becomes the default garbage collector in every environment, not just server-class machines. JFR gets data redaction for secrets (JEP 536), and jcmd gains post-mortem core dump analysis (JEP 528). Oracle deprecates the macOS/x64 port for removal. If you run distributed systems on the JVM at scale, the heap reduction alone is worth planning a JDK 27 upgrade around once it ships, independent of any other feature.

3. Scala’s codebase passes an independent security audit

Quarkslab’s audit of the Scala codebase, commissioned through the Open Source Technology Improvement Fund and published in March, found no critical or major issues: 5 medium, 1 low, and 2 informational findings, all fixed. Scala 3.8.4, released June 5, ships those fixes. Alongside it: the REPL now ships as a separate artifact, scala3-repl, a breaking change for any project or tool that assumed it came bundled. If your tooling shells out to the Scala REPL, add the explicit dependency before you upgrade, not after something breaks in CI.


Community Voice

Scala tooling caught a real practitioner’s attention this month. Yann Moisan writes about running Claude Code against a Scala codebase, working around the lack of native Scala support by wiring up Metals, the Scala language server, as a standalone MCP server. Moisan sums up the payoff: “This setup is particularly useful when you want Claude Code to reason about your Scala project with full awareness of its structure, dependencies, and compiler errors rather than treating it as plain text.” On the build-tool tradeoff, he’s specific: “The tradeoff is stability vs. efficiency. With sbt, metals and sbt compile independently (double compilation).”

The pattern generalizes past Scala. Any statically typed language with a mature language server gets the same upgrade: point your agent at the compiler’s own understanding of the code, not just the text of it.


In the Know

GitHub Copilot for JetBrains deprecates its local agent harness — Microsoft consolidates on Copilot CLI as the default harness across every surface. JetBrains counters the same window with Mellum2, a self-hosted, on-premises model. One harness everywhere versus own your model and your harness: a real fork in agent-tooling philosophy.

SIP-80: Target-Typed Companion Shorthand — Less .apply ceremony in generic code.

sbt 2.0.0-RC15 — Another release candidate down. Plugin compatibility remains the last blocker before GA.


Top Resources

Repo to watch: kstreamplify

Michelin’s open-source Kafka Streams utility library: bootstrapping, error handling, serialization, Kubernetes integration, in pure Java. Shipped v1.8.0 on June 29. Pairs directly with this month’s Kafka Streams memory-leak fix if you’re standardizing on shared Streams tooling.

Article to read: “An Ex-Meta L8’s Agentic Engineering Setup”

Kun Chen, a former Meta principal engineer, runs a terminal-first setup: WezTerm, Neovim, and Tmux, Claude Code for Anthropic’s models and OpenCode for everything else, voice input through a local Whisper model, and custom tooling to manage parallel git worktrees and validate agent output before it merges. The operating principle: direct the agents like an engineering manager, stay focused on what to build and whether it works, not on typing the code yourself.

Paper to read: “Reaching Agreement Among Reasoning LLM Agents”

A formal consensus protocol for multi-agent LLM systems with provable safety and liveness guarantees, cutting latency 1.2-20x versus ad-hoc orchestration baselines while keeping answer quality within 2.5% of uncoordinated baselines. It applies classical consensus theory directly to the systems that the post in this issue’s Debate argues already need it.

What is SIGNAL?

SIGNAL is a monthly, opinionated newsletter for JVM and Scala teams who run real distributed systems. Each issue has three sections — Architecture Debate, Notes from the Trenches, Signal Over Noise — and focuses only on changes and incidents that matter in production, not on hype or vendor marketing.

Scalac builds high‑throughput systems in Scala, Java, and Rust; SIGNAL is the filter we wish we had when making our own architecture bets.


