
Stop Replaying Bugs. You Already Watched Them Happen.

A shift from reconstruction to observation.

Contents

  1. Error Reproduction Is Backwards
  2. You Already Have the Data
  3. From Snapshot to Environment
  4. The Firecracker VM
  5. Fixture Rules, Not Database Snapshots
  6. Stubbing the Outside World
  7. The Master-Slave Agent Split
  8. The Four-Tier Triage Pipeline
  9. The Economics
  10. What Changes

01. Error Reproduction Is Backwards

Here's the workflow most teams follow: an error fires in production, someone opens the stack trace, and the guessing begins. What were the inputs? What was the database state? Which API responses came back? You try to recreate the conditions locally, seed a database with plausible data, mock the external calls, run the code, and hope the same failure shows up. More often than not, it doesn't.

You're reconstructing a causal chain from incomplete information. The reproduction rate is low because you're working backwards from an effect and trying to guess the cause. Most teams spend more time trying to reproduce bugs than they spend actually fixing them. The fix itself is usually a few lines. The reproduction is the bottleneck.

This is the default approach because, historically, there was no alternative. You didn't have the failure state captured. You only had the aftermath. But that constraint doesn't hold anymore.

[Diagram: traditional reproduction (stack trace → guess inputs → seed DB → mock APIs → run → ???; low confidence, high effort) versus observation-based reproduction (error package → set state → run → confirmed).]

02. You Already Have the Data

If your observability layer captures the IO timeline, local variables, request context, state reads, and environment metadata at failure time, you're not guessing anymore. You have a snapshot of the application's state at the exact moment it broke. The request that triggered the error, the database queries that ran and the rows they returned, the HTTP calls that went out and the responses that came back. All of it, frozen in time.

The question changes shape entirely. It's no longer "what happened?" because you already know what happened. It's "how do I put another environment into this exact state and watch it break again?" That's a different problem. The first is forensic guesswork. The second is engineering.

Most modern observability stacks already collect pieces of this data. Structured logs capture request context. Distributed traces capture call graphs. The missing piece is usually the local variable state and the full IO timeline at the throw site. Once you capture those, you have everything you need to skip the guessing phase entirely.
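As a concrete illustration, the captured failure state could be modeled as a small structured record. This is a sketch of one plausible shape, not any particular product's schema; all field names here are illustrative.

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class IOEvent:
    kind: str            # "sql" | "http" | "redis" | "s3"
    request: Any         # query text, URL, or command issued
    response: Any        # rows returned, response body, etc.
    latency_ms: float    # observed duration of the call

@dataclass
class ErrorPackage:
    error_type: str
    stack_trace: str
    local_vars: dict[str, Any]       # variables at the throw site
    request_context: dict[str, Any]  # route, method, headers of the trigger
    io_timeline: list[IOEvent] = field(default_factory=list)

pkg = ErrorPackage(
    error_type="KeyError",
    stack_trace="...",
    local_vars={"user_id": 47},
    request_context={"route": "/billing", "method": "POST"},
    io_timeline=[IOEvent("sql", "SELECT * FROM users WHERE id = 47",
                         [{"id": 47, "role": "admin", "suspended": True}], 1.2)],
)
```

Everything downstream (fixtures, stubs, triage) can be derived from a record like this; nothing else from production is needed.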

03. From Snapshot to Environment

Reproduction becomes a deterministic operation. Take the captured data, use it to set up an environment that matches the failure state, run the code path, confirm the failure. You're not replaying inputs and hoping for the same output. You're directly forcing the state and letting the code execute against it.

The difference is mechanical. Traditional reproduction is probabilistic: you approximate the conditions and see if the bug shows up. Observation-based reproduction is deterministic: you set the exact conditions and run the exact code path. The failure either reproduces or it doesn't. No guessing involved.

This also means reproduction attempts are repeatable. If the first attempt confirms the failure, you've got your repro case. If it doesn't, the captured data was incomplete or the root cause lies elsewhere. Either way, you know within seconds instead of hours.

04. The Firecracker VM

You need four things from the reproduction environment. Isolation, because the forced state might be destructive and you don't want it touching anything real. Speed, because waiting minutes for a VM to boot defeats the purpose. Network access, because the application makes real calls to stubbed endpoints. And disposability, because the VM should be thrown away after confirmation.

Firecracker microVMs give you all four. Firecracker is a lightweight virtual machine monitor built by AWS to power Lambda and Fargate. It's open source and designed for launching microVMs at very high rates and densities. Each VM gets its own kernel, its own filesystem, its own network namespace. Boot to ready in under 150ms.

The lifecycle is simple: receive an error package, spin up a Firecracker VM with the right rootfs, inject the fixture data and stub configuration, start the application, trigger the code path, observe the result, destroy the VM. The whole thing takes seconds. The compute cost is negligible.
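The boot portion of that lifecycle maps onto Firecracker's HTTP API, which is driven over a unix socket. Here is a sketch of the request sequence an orchestrator would issue; the payload shapes follow Firecracker's public API, while the kernel and rootfs paths are placeholders.

```python
def firecracker_boot_sequence(kernel: str, rootfs: str) -> list[tuple[str, str, dict]]:
    """Return the (method, path, body) calls to boot one microVM."""
    return [
        # Point the VM at a kernel image and boot arguments.
        ("PUT", "/boot-source", {
            "kernel_image_path": kernel,
            "boot_args": "console=ttyS0 reboot=k panic=1",
        }),
        # Attach the prepared rootfs (application + injected fixtures).
        ("PUT", "/drives/rootfs", {
            "drive_id": "rootfs",
            "path_on_host": rootfs,
            "is_root_device": True,
            "is_read_only": False,
        }),
        # Start the instance.
        ("PUT", "/actions", {"action_type": "InstanceStart"}),
    ]

calls = firecracker_boot_sequence("/images/vmlinux", "/images/repro-rootfs.ext4")
```

Tearing the VM down is equally cheap: kill the Firecracker process and delete the socket, and nothing persists.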

[Diagram: an error package feeds a Firecracker VM containing fixture data, the application, and stub services (payment API stub, Redis stub, S3 stub). Under 150ms boot, isolated kernel, disposable.]

05. Fixture Rules, Not Database Snapshots

You don't need a full copy of production data. The error package's IO timeline shows the exact queries that ran and what they returned. A structured fixture DSL takes those query traces and creates minimal data. If the query was SELECT * FROM users WHERE id = 47 and it returned { id: 47, role: 'admin', suspended: true }, the fixture creates exactly that row. Nothing more.

This approach is fast. Creating three or four rows in a SQLite database takes microseconds. It's deterministic because the data comes directly from the captured IO, not from a snapshot that might have drifted. And it doesn't need access to production databases, which matters for security and compliance.

The fixture DSL is declarative. It reads the IO timeline, extracts every database interaction, and generates the minimal set of tables and rows needed to satisfy those queries. If the application code runs a join across two tables, the fixture creates both tables with exactly the rows needed for that join to produce the same result as production.
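A minimal sketch of that generation step, assuming a hypothetical trace format: read each captured query, create the table it touched, and insert only the rows it returned. Real SQL parsing would be more careful than the regex here.

```python
import re
import sqlite3

def build_fixture(db: sqlite3.Connection, trace: list[dict]) -> None:
    """Create the minimal tables/rows needed to satisfy a captured query trace."""
    for event in trace:
        match = re.search(r"FROM\s+(\w+)", event["query"], re.IGNORECASE)
        if not match or not event["rows"]:
            continue
        table = match.group(1)
        cols = list(event["rows"][0])
        # SQLite allows untyped columns, which suits replayed data.
        db.execute(f"CREATE TABLE IF NOT EXISTS {table} ({', '.join(cols)})")
        for row in event["rows"]:
            placeholders = ", ".join("?" for _ in cols)
            db.execute(f"INSERT INTO {table} VALUES ({placeholders})",
                       [row[c] for c in cols])

db = sqlite3.connect(":memory:")
build_fixture(db, [{
    "query": "SELECT * FROM users WHERE id = 47",
    "rows": [{"id": 47, "role": "admin", "suspended": 1}],
}])
row = db.execute("SELECT role FROM users WHERE id = 47").fetchone()
# row is ("admin",): exactly the production result, from three values of data
```

The resulting database contains one table with one row, yet the application's query against it returns the same result it saw in production.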

06. Stubbing the Outside World

If the IO timeline shows an outgoing HTTP call to a payment API that returned a 503, the environment stubs that endpoint to return 503. If a Redis call timed out after 3 seconds, the stub simulates a 3-second timeout. If an S3 GetObject returned a specific byte payload, the stub returns that exact payload. Every external dependency is replaced with a stub that replays the exact behavior observed during the failure.

The application doesn't know the difference. From its perspective, the payment API is slow, Redis is timing out, and S3 is returning the expected data. The stubs are configured entirely from the IO timeline. No manual mock setup, no guessing at API contracts, no maintaining a separate test fixture for each service.

This is where the captured data pays for itself. Traditional reproduction requires someone to look up what the payment API returns on a 503, figure out the response body format, and write a mock. With captured IO, the stub is auto-generated. The response headers, body, status code, and latency are all recorded. The stub replays them verbatim.
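A stub of this kind is small enough to sketch end to end. The version below replays a recorded 503 from a hypothetical payment endpoint, including its latency; the recording format is illustrative, and a real stub would also replay headers and handle every method.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.error import HTTPError
from urllib.request import urlopen

# Recorded IO, keyed by (method, path): status, body, and observed latency.
RECORDED = {
    ("GET", "/v1/charge/123"): {"status": 503,
                                "body": b"upstream unavailable",
                                "latency_s": 0.0},
}

class ReplayHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        rec = RECORDED.get(("GET", self.path))
        if rec is None:
            self.send_error(404)
            return
        time.sleep(rec["latency_s"])  # reproduce the observed delay
        self.send_response(rec["status"])
        self.send_header("Content-Length", str(len(rec["body"])))
        self.end_headers()
        self.wfile.write(rec["body"])

    def log_message(self, *args):  # keep the stub quiet
        pass

server = HTTPServer(("127.0.0.1", 0), ReplayHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    urlopen(f"http://127.0.0.1:{server.server_port}/v1/charge/123")
    status = 200
except HTTPError as exc:
    status = exc.code  # the replayed 503
server.shutdown()
```

Pointing the application's payment-API base URL at this server makes it experience the exact failure it saw in production.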

07. The Master-Slave Agent Split

Two agents, asymmetric by design. The master agent handles triage: it reads the error package, classifies causality, decides whether reproduction is warranted, and orchestrates the VM lifecycle. It reasons infrequently but with high fidelity.

The slave agent operates inside the VM. It applies fixture rules, configures stubs, starts the application, triggers the failing code path, and reports the outcome. Mechanical execution, no reasoning required.

The cost structure follows the split. Intelligence at the decision layer, efficiency at the execution layer.
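The split can be sketched as a declarative plan flowing one way and a verdict flowing back. Everything here is illustrative: the master reasons over the error package once, and the slave executes the plan without reasoning at all.

```python
def master_plan(error_package: dict) -> dict:
    """Reasoning layer: decide what the VM needs from the captured IO."""
    return {
        "fixtures": [e for e in error_package["io"] if e["kind"] == "sql"],
        "stubs": [e for e in error_package["io"] if e["kind"] == "http"],
        "entrypoint": error_package["route"],
    }

def slave_execute(plan: dict, run) -> dict:
    """Execution layer: apply the plan mechanically, report the outcome."""
    outcome = run(plan["entrypoint"])
    return {"reproduced": outcome == "error", "entrypoint": plan["entrypoint"]}

plan = master_plan({"io": [{"kind": "sql", "query": "..."},
                           {"kind": "http", "url": "..."}],
                    "route": "/billing"})
# Stand-in for triggering the code path inside the VM.
report = slave_execute(plan, lambda entrypoint: "error")
```

Because the plan is plain data, the expensive model never enters the VM and the cheap executor never makes a judgment call.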

[Diagram: the master agent sends instructions into a Firecracker VM, where the slave agent applies fixtures and stubs, runs the app, and reports the result back.]

08. The Four-Tier Triage Pipeline

Not every error warrants a VM. The triage pipeline applies four progressive filters.

Tier 1: Fingerprint deduplication. Identical stack trace and context as an existing issue. Increment, link, move on.

Tier 2: Root cause classification from captured data. The IO timeline makes causality self-evident. A connection timeout preceding the throw, a missing row after a query. No execution needed.

Tier 3: Novel error with unclear causality. The sequence of events is visible but the causal link is not. Possible race conditions, accumulated state corruption. Spin up a VM, reproduce, confirm.

Tier 4: Escalation. Reproduction failed or the failure depends on conditions outside the IO timeline: hardware timing, kernel behavior, intermittent network partitions. Route to a developer.

Most errors resolve at Tier 1 or 2. Compute is spent only where causality demands it.
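The four filters reduce to a short decision function. This is a sketch with hypothetical field names; in practice each predicate would be backed by real analysis of the error package.

```python
def triage(pkg: dict, known_fingerprints: set[str]) -> int:
    """Assign an error package to a triage tier (1-4)."""
    if pkg["fingerprint"] in known_fingerprints:
        return 1  # duplicate: increment the existing issue and link
    if pkg.get("causal_io_event"):
        return 2  # cause self-evident from the IO timeline; no execution
    if pkg.get("vm_reproducible", True):
        return 3  # novel, unclear causality: spin up a VM and confirm
    return 4      # outside the timeline's reach: route to a developer

tier_dup = triage({"fingerprint": "abc", "causal_io_event": None},
                  known_fingerprints={"abc"})
tier_known_cause = triage({"fingerprint": "xyz",
                           "causal_io_event": "connection timeout"},
                          known_fingerprints={"abc"})
```

Only errors reaching tier 3 ever touch a VM, which is what keeps the compute bill proportional to genuine uncertainty rather than to raw error volume.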

09. The Economics

The master agent activates only for Tier 3+ errors. Per-call cost is bounded by the structured nature of error packages. The slave agent handles mechanical execution at marginal cost. Firecracker VMs are ephemeral -- no idle compute, no standing infrastructure.

A single prevented manual reproduction session (typically 1-2 hours of developer time) offsets thousands of automated VM spin-ups. The cost difference is two to three orders of magnitude.
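A back-of-envelope check of that claim, using assumed figures (90 minutes of developer time at a $100/hr loaded rate, roughly ten seconds of microVM compute, one bounded model call):

```python
manual_cost = 1.5 * 100.0        # $150: one 90-minute manual repro session
vm_cost = 10 / 3600 * 0.05       # ~10s of compute at an assumed $0.05/hr
llm_cost = 0.02                  # assumed cost of one bounded triage call
automated_cost = vm_cost + llm_cost

ratio = manual_cost / automated_cost
# ratio lands in the thousands: three orders of magnitude, matching the claim
```

The exact figures matter less than their shape: the automated path's cost is dominated by the model call, and even that is trivially small next to human time.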

10. What Changes

The debugging workflow inverts. The question shifts from "how do I reproduce this?" to "do I need to?" For the ~80% resolved from captured data alone, a developer goes straight to writing the patch. For the ~20% requiring execution, a confirmed repro case lands in the queue within seconds, full context attached.

Developer time moves from reconstruction to resolution. The work that remains is the work that matters: understanding the bug, writing the fix, verifying it, and shipping it.
