
# Deployment Strategies Visualized


There’s no universal deployment strategy, so let’s walk through the most common ones and what each one trades away.

The strategy you pick is one of the highest-leverage decisions you make when shipping software. Get it wrong and a routine release becomes a 2 AM incident. Get it right and your team ships confidently multiple times a day without breaking a sweat. The tradeoffs are real: downtime tolerance, infrastructure cost, rollback speed, and observability maturity all pull in different directions. Understanding what each strategy actually does to your traffic, your database connections, and your on-call rotation is the only way to make a rational choice.

## Recreate (Big Bang)

This is the “turn it off and on again” of deployments. You stop all instances of the old version, then start the new one. Yes, there’s downtime. No, that’s not necessarily bad. If your users can tolerate a maintenance window, why overcomplicate things?

*[Animated diagram: all instances switch simultaneously; simple, but causes downtime]*

What happens in practice: the orchestrator (Kubernetes, ECS, whatever you’re using) sends a termination signal to every running v1 pod simultaneously. In-flight requests get a grace period to drain, then they’re gone. The load balancer deregisters all targets. Your service is returning 503s. New v2 pods start coming up, pass health checks, get registered with the load balancer, and traffic resumes. The gap between “all v1 gone” and “first v2 healthy” is your downtime window. On a well-provisioned cluster this might be 30 seconds. On a cold-start-heavy JVM service it could be three minutes.
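In Kubernetes terms, this behavior maps to the Recreate strategy type. A minimal sketch, with the service name and image as placeholders:

```yaml
# Hypothetical Deployment using the Recreate strategy: Kubernetes
# terminates every old pod before starting any new one, so there is
# a downtime window between "all v1 gone" and "first v2 healthy".
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-api              # placeholder service name
spec:
  replicas: 3
  strategy:
    type: Recreate                # all-at-once replacement
  selector:
    matchLabels:
      app: payments-api
  template:
    metadata:
      labels:
        app: payments-api
    spec:
      containers:
        - name: app
          image: example.com/payments-api:v2   # placeholder image
          ports:
            - containerPort: 8080
```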

The failure mode is simple, and that’s actually its main advantage. If v2 is broken, it fails to pass health checks and never receives traffic. Your monitoring fires, you redeploy v1, and you’re done. There’s no mixed-version state to reason about, no database compatibility window to manage, no partial rollout to untangle. Every schema migration, every breaking API change, every config overhaul is safe to ship because you never have two versions talking to the same database at the same time.

Where it breaks down is obvious: anything that can’t tolerate downtime is out. But don’t dismiss it for internal tooling, batch processing systems, or services behind a maintenance page. The operational simplicity has real value.

## Rolling Deployment

Instead of switching everything at once, you update instances one by one (or in small batches). Traffic keeps flowing to healthy instances throughout the process.

For a brief period, you’ll have both v1 and v2 running simultaneously. If your API and database schema aren’t backward compatible, you might have a bad time. You can mitigate this by versioning your APIs and applying the expand-and-contract pattern for DB migrations.

*[Animated diagram: instances update sequentially; zero downtime, but mixed versions temporarily]*

The orchestrator picks one instance (or a batch, controlled by maxUnavailable and maxSurge in Kubernetes), drains its connections, terminates it, and starts a replacement running v2. Once that replacement passes its readiness probe, the load balancer routes traffic to it and the process repeats for the next instance. At peak mixed-version state, you might have half your fleet on v1 and half on v2, all serving real requests.
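In a Kubernetes Deployment, the batch size is controlled by the rollingUpdate fields mentioned above. A sketch of just the strategy stanza (the replica count and values are illustrative):

```yaml
# Fragment of a Deployment spec: with 4 replicas, maxUnavailable: 1
# and maxSurge: 1 let the controller replace pods one at a time,
# briefly running up to 5 pods (4 desired + 1 surge) during the swap.
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # at most one pod below the desired count
      maxSurge: 1         # at most one pod above the desired count
```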

The readiness probe is doing a lot of work here. If v2 has a startup bug that passes the probe but produces errors under load, you’ll roll it out to your entire fleet before you notice. The deployment controller doesn’t know about your p99 latency or your error rate. It only knows “did the pod start?”. This is where rolling deployments have a hidden sharp edge: they feel safe, but they have no built-in circuit breaker based on application-level signals.
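A typical readiness probe looks something like the sketch below (the endpoint and timings are assumptions). Note that nothing in it consults error rates or latency, which is exactly the blind spot described above:

```yaml
# Fragment of a pod spec. The probe gates traffic on an HTTP check,
# not on application-level signals like p99 latency or error rate.
containers:
  - name: app
    image: example.com/orders-api:v2   # placeholder image
    readinessProbe:
      httpGet:
        path: /healthz                 # assumed health endpoint
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
      failureThreshold: 3              # 3 failures = pulled from rotation
```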

The mixed-version window is the other thing that bites teams. If you added a new database column in this release, v2 will write to it and v1 will ignore it. That’s fine. But if you removed a column, v1 is about to throw errors trying to read something that’s gone. The expand-and-contract pattern (add the column, deploy, migrate data, deploy again to remove the old column) exists precisely for this reason. Rolling deployments make that discipline non-optional.

Cost-wise, rolling is the cheapest of the zero-downtime strategies. You’re running roughly the same number of instances throughout, just swapping them one at a time.

## Blue-Green Deployment

This strategy keeps two identical environments: Blue (what’s currently live) and Green (the new version waiting in the wings). Once you’ve validated Green, you flip the switch and all traffic moves over instantly.

The rollback is straightforward: Just point traffic back to Blue. Done. The downside is you’re paying for two environments.

*[Animated diagram: traffic switches between the Blue and Green environments; instant rollback capability]*

The traffic switch itself is typically a DNS record update or a load balancer target group swap. DNS carries TTL risk, so most teams prefer the load balancer approach: you update the listener rule and within seconds all new connections go to Green. Existing connections to Blue continue until they close naturally. That’s the part people forget: the cutover isn’t truly instantaneous for persistent connections. WebSocket sessions, long-polling clients, or gRPC streams that opened against Blue will stay on Blue until they reconnect. Plan for a brief period where both environments are handling real traffic.
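One common way to implement the cutover in Kubernetes is a Service selector flip; a sketch, assuming Blue and Green Deployments labeled with a version label:

```yaml
# Hypothetical blue-green cutover: the Service selects pods by a
# `version` label. Changing `blue` to `green` and re-applying routes
# all new connections to the Green environment; flipping it back is
# the instant rollback.
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web
    version: green      # was `version: blue` before the cutover
  ports:
    - port: 80
      targetPort: 8080
```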

The rollback story is genuinely as good as advertised. You flip the load balancer back to Blue and you’re done. No pod restarts, no rolling back individual instances. The entire state of your previous deployment is sitting there warm, ready to serve.

The failure mode to watch for is database state. If v2 ran migrations before you cut over, rolling back to v1 means v1 is now talking to a migrated schema. If those migrations were additive, v1 handles it fine. If you dropped a column or changed a type, v1 is broken. Blue-green gives you instant application rollback, but it doesn’t give you database rollback. That asymmetry has burned a lot of teams.

The cost is the obvious downside: you’re maintaining two full production environments. For large-scale services that’s a meaningful line item. Some teams manage this by spinning up Green on-demand before a release and tearing it down after Blue is decommissioned, though that adds orchestration complexity.

## Canary Deployment

Named after the canaries miners used to detect toxic gases, this strategy sends a small percentage of traffic to the new version first. If those users don’t experience issues, you gradually increase the percentage.

You need solid metrics to know if that 5% of traffic is happy or not. Error rates, latency percentiles, business metrics, pick your signals and watch them closely.

*[Animated diagram: traffic gradually shifts from v1 to v2 based on metrics]*

Traffic splitting is typically done at the load balancer or service mesh layer. In Kubernetes you might use two Deployments (v1 and v2) with replica counts that approximate your desired split: 19 replicas on v1 and 1 on v2 gives you roughly 5% canary traffic. Istio or Linkerd give you weighted routing that’s independent of replica count, which is cleaner. Either way, the canary instances are running in production, hitting production databases, talking to production downstream services.
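With a mesh like Istio, the weighted split is declared directly; a sketch assuming a DestinationRule (not shown) that defines the v1 and v2 subsets:

```yaml
# Hypothetical Istio VirtualService sending 5% of traffic to the
# canary regardless of how many replicas each version runs.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: checkout
spec:
  hosts:
    - checkout
  http:
    - route:
        - destination:
            host: checkout
            subset: v1    # stable version
          weight: 95
        - destination:
            host: checkout
            subset: v2    # canary
          weight: 5
```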

When it works, it’s extremely effective. You’ve limited your blast radius. A bad release that breaks 5% of requests is painful but survivable. You catch it, roll back the canary (scale v2 to zero), and only a small fraction of your users saw the error. Compare that to a rolling deployment that’s already at 80% v2 when you notice the problem.

When it fails, it’s because teams treat the canary as a checkbox rather than a real signal. “5% of traffic looked fine, let’s ship.” Fine at 5% with a small sample of requests doesn’t always mean fine at 100% with edge cases you hadn’t seen yet. A bug that only surfaces under specific query patterns might not show up until you hit 30% traffic. The canary delays the blast radius, it doesn’t eliminate it.

The other hidden cost is operational: you’re managing two versions in production simultaneously, which means your logging, tracing, and dashboards need to distinguish between them. If you can’t filter your error rate by deployment version, you can’t actually use the canary signal meaningfully.

## Progressive Rollout

Think of this as canary deployments with autopilot. Instead of manually deciding when to increase traffic, you define gates: “if error rate stays below 1% for 10 minutes, proceed to the next stage.”

*[Animated diagram: automated gates advance traffic from 1% to 10% to 50% to 100% based on error rates, latency, etc.]*

Each gate in a progressive rollout is an assertion against your observability stack. Tools like Argo Rollouts or Flagger query your metrics backend (Prometheus, Datadog, whatever) at each stage transition. If the canary’s error rate or latency breaches its threshold, the controller automatically rolls back: traffic shifts back to the stable version, the canary deployment is scaled down, and the on-call gets paged. No human needs to be watching the dashboard at 3 AM.

The sequence looks like this: deploy to 1% of traffic and wait for the soak period. Metrics pass? Advance to 10%, soak again. And so on through 25%, 50%, 100%. Any gate failure aborts the whole thing. The gate metrics and soak durations are configurable, and tuning them is where most of the craft lies. Too tight and you get false positives blocking valid releases. Too loose and the gate is theater.
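With Argo Rollouts, that sequence is declared as canary steps with pauses, and a referenced analysis template supplies the gate; a sketch in which the names, weights, and durations are all illustrative:

```yaml
# Hypothetical Argo Rollouts strategy: advance through traffic weights
# with a soak period at each step while a background analysis queries
# the metrics backend; a failed assertion aborts and rolls back.
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: api
spec:
  strategy:
    canary:
      steps:
        - setWeight: 1
        - pause: {duration: 10m}    # soak at 1%
        - setWeight: 10
        - pause: {duration: 10m}
        - setWeight: 50
        - pause: {duration: 10m}    # completing the steps promotes to 100%
      analysis:
        templates:
          - templateName: success-rate   # assumed AnalysisTemplate name
```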

The setup cost is real. You need reliable metrics, and they need to be meaningful. If your error rate metric includes noise from unrelated services or your p99 latency bounces wildly on a good day, the automated gates will either miss real problems or block deployments unnecessarily. Progressive rollouts raise the floor on your observability maturity. Teams that have never had a clean error-rate-by-version graph in their dashboards will struggle.

When it works well, it’s genuinely freeing. Your deployment pipeline becomes a background process. Engineers ship, the system validates, and rollback (if needed) happens faster than any human could react.

## Choosing Your Strategy

The right deployment strategy depends entirely on your application’s needs. There’s no one-size-fits-all answer, and you’re not locked into a single approach: these strategies can be mixed and matched.

Consider what matters most for your application:

  • Can you tolerate downtime? Recreate deployments are simple and eliminate mixed-version conflicts entirely.
  • Need zero downtime? Rolling, blue-green, canary, and progressive all keep your service available.
  • How fast must you roll back? Blue-green offers instant rollback; rolling deployments take longer to reverse.
  • Can you afford extra infrastructure? Blue-green doubles your resource cost; rolling uses about the same capacity.
  • How mature is your monitoring? Canary and progressive require solid observability to be effective.
  • What’s your risk tolerance? Canary and progressive let you test with a subset of users first.

You can also combine strategies. For example, use rolling deployments for routine updates, but switch to canary for major changes. Or use blue-green for your API while doing rolling updates on your workers.

A pattern I’ve seen work well in practice: stateless application services use rolling deployments for day-to-day feature work, where the operational cost is low and the risk is manageable. But when a release includes a database schema change, the database migration layer gets its own blue-green treatment. You spin up the new schema alongside the old, run the expand phase (adding new columns, not dropping old ones), validate that both v1 and v2 application code works against the new schema, then cut over the application with a rolling deployment. Once you’re confident in the new state, you do the contract phase (dropping the old columns) in a follow-up release. The database and application each get the strategy that matches their rollback requirements, rather than forcing everything through a single approach.

JP Fontenele

Thanks for reading! Feel free to check out my other posts or reach out via GitHub.

