Meetings Mar 20, 2026

Postmortem — March 12 API Outage

Engineering

https://meet.google.com/pm-outage-001

AI Summary

Overview

The root cause was a misconfigured rate limiter deployed in v3.4.2, which caused cascading timeouts under load. The incident lasted 47 minutes and affected 12% of API requests. The team identified three remediation steps and updated the deployment checklist.

Key Points

  • Root cause: rate limiter misconfiguration in v3.4.2
  • Impact: 47-minute incident, 12% of API requests affected
  • Detection was delayed by 8 minutes because of a missing alert threshold
  • Rollback completed in 6 minutes once the cause was identified

Decisions Made

  • Add rate limiter config to deployment checklist
  • Lower alert threshold for 5xx error rate from 5% to 2%
  • Implement canary deploys for config changes
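
The threshold change above can be illustrated with a minimal sketch. It assumes the alert fires when the share of 5xx responses in a window reaches the threshold; the function names, the window contents, and the `>=` comparison are hypothetical and are not taken from the team's monitoring setup.

```python
def error_rate(status_codes):
    """Return the fraction of responses in the window with a 5xx status."""
    if not status_codes:
        return 0.0
    errors = sum(1 for code in status_codes if 500 <= code <= 599)
    return errors / len(status_codes)


def should_alert(status_codes, threshold):
    """Fire when the 5xx rate reaches the threshold (assumed semantics)."""
    return error_rate(status_codes) >= threshold


OLD_THRESHOLD = 0.05  # 5%, the previous setting
NEW_THRESHOLD = 0.02  # 2%, the setting decided in the postmortem

# Hypothetical window: 1000 responses, 30 of them 503s (a 3% error rate).
window = [200] * 970 + [503] * 30

print(should_alert(window, OLD_THRESHOLD))  # False: 3% is below 5%
print(should_alert(window, NEW_THRESHOLD))  # True: 3% is above 2%
```

In this made-up window, a 3% error rate stays silent under the old threshold and alerts under the new one.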

Action Items

  • Lower 5xx alert threshold from 5% to 2% (Jordan Lee · Due Mar 27)
  • Add rate limiter config to deployment checklist (Jordan Lee · Due Mar 21)
  • Roll back v3.4.2 rate limiter configuration (Marcus Park · Due Mar 18)

Commitments Made

  • Implement canary deploys for config changes (Jordan Lee → Engineering Team · Due Apr 13, 2026)
  • Add rate limiter to deployment checklist (Jordan Lee → Engineering Team · Due Mar 21, 2026)
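
As a rough illustration of the canary commitment, the sketch below sends a config change to a small subset of hosts first and promotes it only if the canary 5xx rate stays close to the baseline. The host names, the 10% canary fraction, the one-percentage-point tolerance, and both function names are assumptions made for this example; they do not describe the team's actual rollout tooling.

```python
import random


def pick_canary_hosts(hosts, fraction, seed=0):
    """Choose a small subset of hosts to receive the new config first."""
    count = max(1, int(len(hosts) * fraction))
    return random.Random(seed).sample(hosts, count)


def canary_verdict(canary_error_rate, baseline_error_rate, max_increase=0.01):
    """Return "rollback" if the canary 5xx rate exceeds the baseline by more
    than max_increase (an assumed tolerance), otherwise "promote"."""
    if canary_error_rate - baseline_error_rate > max_increase:
        return "rollback"
    return "promote"


# Hypothetical fleet of 20 API hosts; 10% of them act as canaries.
hosts = [f"api-{i:02d}" for i in range(20)]
canaries = pick_canary_hosts(hosts, fraction=0.1)
print(len(canaries))  # 2

print(canary_verdict(0.004, 0.003))  # promote: error rate barely moved
print(canary_verdict(0.120, 0.003))  # rollback: error rate jumped
```

The idea is that a bad config, such as the v3.4.2 rate limiter settings, would be caught on the canary hosts before it reaches the whole fleet.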

Started: Mar 20, 2026 02:05
Ended: Mar 20, 2026 03:06
Team: Engineering