GitHub Merge Queue Emergency Bypass Governance: Approval Criteria for Rollback Incidents (2026)
During severe incidents, teams sometimes consider bypassing the merge queue or temporarily relaxing required checks to land a rollback quickly. The danger lies not only in the bypass itself but in the lack of governance around who can approve it, under which criteria, for how long, and with what audit trail.
This runbook gives a practical governance model for emergency bypass decisions in GitHub protected-branch repos: criteria, dual-control approvals, expiry policy, and post-incident restoration.
1. What emergency bypass means
Emergency bypass does not mean "disable protections and merge anything." In this guide, bypass means a temporary, bounded policy delta used only to land a validated rollback when the normal queue path cannot meet the incident recovery SLA.
| Option | Typical use | Risk level |
|---|---|---|
| Retry within baseline policy | Transient queue instability | Low |
| Temporary incident check profile | One required check is non-deterministic or infra-blocked | Medium |
| Manual bypass with dual approval | Active production impact with rollback SLA breach risk | High |
| Permanent policy relaxation | Not valid for emergency handling | Unacceptable |
2. 90-second approval matrix
| Condition | Decision | Who must approve |
|---|---|---|
| No active customer impact, rollback ETA under SLA | No bypass; continue queue-safe remediation | Incident commander only |
| Active customer impact, rollback blocked by CI pathology | Temporary incident check profile | Incident commander + service owner |
| SEV-1 and projected SLA breach in next 10 minutes | Manual bypass with compensating controls | Incident commander + on-duty engineering manager |
| Evidence missing (rollback diff not validated) | Reject bypass request | Any approver can veto |
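The matrix above can be encoded as a small decision function so that responders get the same answer under pressure. This is a minimal sketch, not a definitive implementation: the `IncidentState` fields, the role identifiers, and the row precedence (missing evidence first, then SEV-1 with imminent breach, then CI-blocked rollback) are assumptions for illustration and should be mapped to your own incident tooling.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    NO_BYPASS = "no bypass; continue queue-safe remediation"
    INCIDENT_CHECK_PROFILE = "temporary incident check profile"
    MANUAL_BYPASS = "manual bypass with compensating controls"
    REJECT = "reject bypass request"


@dataclass(frozen=True)
class IncidentState:
    # Hypothetical field names; adapt to your incident ticket schema.
    severity: str                  # e.g. "SEV-1"
    active_customer_impact: bool
    rollback_diff_validated: bool
    blocked_by_ci: bool
    minutes_to_sla_breach: float   # projected time until the rollback SLA is breached


def decide(state: IncidentState) -> tuple[Decision, tuple[str, ...]]:
    """Return the matrix decision and the roles that must approve it."""
    # Assumed precedence: missing evidence always wins, since any approver can veto.
    if not state.rollback_diff_validated:
        return Decision.REJECT, ("any_approver",)
    if state.severity == "SEV-1" and state.minutes_to_sla_breach <= 10:
        return Decision.MANUAL_BYPASS, ("incident_commander", "duty_manager")
    if state.active_customer_impact and state.blocked_by_ci:
        return Decision.INCIDENT_CHECK_PROFILE, ("incident_commander", "service_owner")
    return Decision.NO_BYPASS, ("incident_commander",)
```

Checking evidence first means an unvalidated rollback diff can never be escalated to a bypass, regardless of severity.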
3. Approval criteria before bypass
- Incident severity: active customer impact or imminent outage, documented in incident ticket.
- Rollback confidence: rollback target commit and scope are verified by the owning team.
- Queue failure evidence: reason baseline path cannot land rollback in time (timeouts, cancellation loops, runner saturation).
- Compensating controls: post-merge smoke tests, canary scope, extra monitoring, and on-call hold.
- Expiry window: exact timestamp when temporary policy reverts automatically or by owner.
- Restoration owner: named person accountable for restoring normal branch protections.
If any of these six criteria is missing, hold the bypass and continue queue stabilization first.
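The six-criteria gate lends itself to a mechanical check before anyone is asked to approve. The sketch below assumes the bypass request is captured as a dictionary; the key names are hypothetical and simply mirror the six criteria listed above.

```python
# Hypothetical keys, one per criterion in the checklist above.
REQUIRED_CRITERIA = (
    "incident_severity",
    "rollback_confidence",
    "queue_failure_evidence",
    "compensating_controls",
    "expiry_window",
    "restoration_owner",
)


def missing_criteria(request: dict) -> list[str]:
    """Return criteria that are absent or blank, in checklist order."""
    missing = []
    for name in REQUIRED_CRITERIA:
        value = request.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


def may_proceed(request: dict) -> bool:
    """A bypass request may proceed to approval only when nothing is missing."""
    return not missing_criteria(request)
```

Returning the list of missing items, rather than a bare boolean, lets the recorder paste the exact gaps into the PR comment.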
4. Approver roles and dual control
- Requester: usually the rollback PR owner or incident responder who proposes bypass with evidence.
- Approver A: incident commander validating urgency and impact.
- Approver B: service owner (or delegate) validating rollback correctness and risk controls.
- Recorder: any responder assigned to update incident timeline and PR comments in real time.
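Dual control is easy to state and easy to violate under time pressure, so it can help to validate it in code. The following sketch checks that two distinct approvers are present. By default it also refuses to count the requester as an approver, which is a stricter reading than the roles above require and is labeled as an assumption; set `requester_may_approve=True` to relax it. All names are illustrative.

```python
def dual_control_problems(
    requester: str,
    approver_a: str,
    approver_b: str,
    requester_may_approve: bool = False,  # assumption: stricter default than the text requires
) -> list[str]:
    """Return a list of dual-control violations; an empty list means the approval is valid."""

    def norm(handle: str) -> str:
        # GitHub handles are case-insensitive; drop a leading "@" if present.
        return handle.strip().lstrip("@").lower()

    req, a, b = norm(requester), norm(approver_a), norm(approver_b)
    problems = []
    if not a or not b:
        problems.append("two approvers are required")
    elif a == b:
        problems.append("approvers must be two distinct identities")
    if not requester_may_approve and req and req in {a, b}:
        problems.append("requester cannot approve their own request")
    return problems
```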
5. Queue-safe execution workflow
- Post a bypass proposal comment in the rollback PR with the criteria checklist and expiry timestamp.
- Collect dual approval in writing (PR + incident channel).
- Apply the minimal temporary policy delta (never a broad disable-all).
- Merge the rollback, then run the predefined smoke test pack within 5 minutes.
- Restore baseline protections immediately after stabilization.
- Create follow-up tasks for root-cause fixes (CI stability, runner capacity, policy drift).
PR bypass note template:
Emergency bypass request
Incident: INC-2026-02-16-142
Severity: SEV-1
Rollback PR: #8421
Reason baseline queue path failed: required-ci timeout + cancellation loop
Compensating controls: smoke suite + canary 5% + on-call watch
Expiry: 2026-02-16T21:00:00Z
Approver A (IC): @alice
Approver B (Service Owner): @bob
Restoration owner: @carol
Post-bypass restoration checklist:
Verify baseline policy restored:
- merge queue required checks list restored
- temporary override removed
- incident timeline updated with exact timestamps
- follow-up tickets created and linked
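The first two checklist items can be verified by diffing the required-check list recorded before the bypass against the list in effect afterward. This sketch only does the comparison; how the two lists are obtained (for example, from a saved policy export) is left open, and the check names in the usage example are hypothetical apart from `required-ci`, which appears in the template above.

```python
def restoration_gaps(baseline_checks, current_checks) -> dict[str, list[str]]:
    """Compare required-check names before the bypass and after restoration."""
    baseline, current = set(baseline_checks), set(current_checks)
    return {
        "missing": sorted(baseline - current),     # checks not yet restored
        "unexpected": sorted(current - baseline),  # leftovers such as a temporary override
    }


def is_restored(baseline_checks, current_checks) -> bool:
    """True only when the current policy matches the baseline exactly."""
    gaps = restoration_gaps(baseline_checks, current_checks)
    return not gaps["missing"] and not gaps["unexpected"]
```

For example, `restoration_gaps(["required-ci", "lint"], ["lint", "incident-smoke"])` reports `required-ci` as missing and `incident-smoke` as unexpected, which can be attached to the incident timeline as restoration evidence.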
6. Audit template and expiry policy
| Field | Required value |
|---|---|
| Incident ID | Immutable incident reference |
| Bypass scope | Exactly which check or rule changed |
| Approvers | Two distinct identities with roles |
| Start + expiry | UTC timestamps with max window (for example 30-60 min) |
| Restoration proof | Link to commit, policy screen capture, or API diff |
Recommended expiry policy: auto-expire within 60 minutes unless reapproved with fresh evidence.
7. Guardrail metrics
| Metric | Target | Alert threshold |
|---|---|---|
| Emergency bypass count per month | 0-2 | > 4 |
| Bypass entries missing expiry | 0% | > 0% |
| Bypass without dual approval | 0% | > 0% |
| Time to restore baseline policy | < 30 minutes | > 60 minutes |
If bypass count keeps rising, the system problem is usually CI determinism or runner capacity, not governance paperwork. Fix root causes and bypass volume drops naturally.
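The alert thresholds in the table can be evaluated over a month of bypass records. The record shape below is hypothetical; the thresholds are taken from the table (more than 4 bypasses, any entry missing expiry or dual approval, any restoration slower than 60 minutes).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BypassRecord:
    # Hypothetical record shape; populate from your audit entries.
    has_expiry: bool
    has_dual_approval: bool
    minutes_to_restore: float


def guardrail_alerts(monthly_records: list[BypassRecord]) -> list[str]:
    """Return the alert conditions triggered by one month of bypass records."""
    alerts = []
    if len(monthly_records) > 4:
        alerts.append("bypass count above 4 per month")
    if any(not r.has_expiry for r in monthly_records):
        alerts.append("bypass entry missing expiry")
    if any(not r.has_dual_approval for r in monthly_records):
        alerts.append("bypass without dual approval")
    if any(r.minutes_to_restore > 60 for r in monthly_records):
        alerts.append("baseline restoration took over 60 minutes")
    return alerts
```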
8. FAQ
When is emergency bypass acceptable in GitHub merge queue?
Only when customer impact is active or imminent, rollback evidence is clear, and compensating controls plus expiry are documented before execution.
Who should approve bypass during incidents?
Use dual control: the incident commander and the service owner (or an approved delegate). The requester should not be the sole approver.
Should we permanently disable required checks to speed rollback?
No. Keep bypass temporary and restore baseline protections as part of incident closure.
What evidence is needed before approval?
Incident severity, rollback scope, known-good target, queue-failure reason, and post-merge safety controls.
How do we keep emergency bypass auditable?
Record approvers, timestamps, policy delta, expiry, and restoration evidence in both PR discussion and incident timeline.