GitHub Merge Queue Required Checks Timed Out or Cancelled: Rollback Incident Runbook (2026)

Published February 16, 2026 · 10 min read

Your rollback PR enters the merge queue, the required checks start, and then one job times out. You requeue, and the next run is cancelled before it completes. During an incident, this timeout/cancel loop can block recovery even when the rollback itself is correct.

This guide gives a practical workflow for required checks that time out or get cancelled in the GitHub merge queue: classify the failure mode fast, remove the sources of instability, and land the rollback safely without abandoning branch protection.

Table of contents

  1. Timeout vs cancelled: fast classification
  2. Root causes behind timeout/cancel loops
  3. 8-minute incident triage workflow
  4. Stabilization playbook for rollback PRs
  5. Workflow + CLI recipes
  6. Guardrails and SLO thresholds
  7. FAQ

1. Timeout vs cancelled: fast classification

| Observed signal | Likely class | Immediate action |
| --- | --- | --- |
| Job reaches max runtime and exits with timeout | Execution timeout | Inspect slow step, dependency latency, and runner contention. |
| Job is cancelled shortly after queue entry updates | Queue invalidation cancellation | Reduce branch churn and requeue once on fresh snapshot. |
| Job cancelled by concurrency rule on same ref | Workflow concurrency conflict | Adjust concurrency group to avoid canceling incident rollback runs. |
| Timeouts spike only during incidents | Capacity collapse + queue pressure | Prioritize incident runners and defer non-critical workflows. |
| Timeout followed by random pass/fail outcomes | Mixed timeout + flake | Apply deterministic test profile and controlled retry policy. |

Rule of thumb: treat timeout as runtime pathology and cancelled as queue-control pathology until data proves otherwise.
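
The classification table can be encoded as a small decision function so that every failed run gets exactly one label. The following is a minimal sketch in Python, not a GitHub API client: the `RunSignal` fields and the returned label strings are hypothetical names chosen for this example, and the triager is assumed to fill them in by hand or from their own tooling.

```python
from dataclasses import dataclass


@dataclass
class RunSignal:
    """Observed facts about one failed required-check run (hypothetical fields)."""
    outcome: str                              # "timeout" or "cancelled"
    base_updated_before_cancel: bool = False  # base or queue entry changed just before the cancel
    superseded_on_same_ref: bool = False      # a newer run in the same concurrency group started
    outcome_varies_on_rerun: bool = False     # reruns of the same commit pass and fail at random


def classify(run: RunSignal) -> str:
    """Map one failed run to a single failure class from the table."""
    if run.outcome == "timeout":
        # A timeout whose reruns flip between pass and fail is the mixed case.
        if run.outcome_varies_on_rerun:
            return "mixed-timeout-flake"
        return "execution-timeout"
    if run.outcome == "cancelled":
        # Check the concurrency rule first: it points at workflow config,
        # which is fixed differently from queue churn.
        if run.superseded_on_same_ref:
            return "concurrency-conflict"
        if run.base_updated_before_cancel:
            return "queue-invalidation"
        return "cancelled-unclassified"
    raise ValueError(f"unexpected outcome: {run.outcome!r}")
```

For example, `classify(RunSignal("cancelled", base_updated_before_cancel=True))` returns `"queue-invalidation"`. The capacity-collapse row is left out on purpose, because it describes a trend across many runs and cannot be decided from a single run.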

2. Root causes behind timeout/cancel loops

High-risk pattern: raising timeout limits without fixing queue invalidation. This hides symptoms while rollback latency keeps growing.

3. 8-minute incident triage workflow

  1. Label each failed run as timeout or cancelled (never mix labels).
  2. Measure check start delay vs execution duration for rollback runs.
  3. Check whether the protected branch was updated around each cancellation timestamp.
  4. Inspect workflow concurrency groups for cancel collisions on rollback refs.
  5. Confirm required workflows include the merge_group event and correct job guards.
  6. Identify top 1-2 slowest steps in timed-out jobs.
  7. Apply incident runner priority and pause non-urgent merges briefly.
  8. Requeue rollback once after controls are in place; avoid unlimited reruns.
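
Step 2 of the workflow above compares how long a check waited for a runner with how long it actually ran. A minimal sketch of that split is shown below. It assumes you already have three UTC timestamps per run (queued, started, finished) in the format used elsewhere in this guide; the function and key names are hypothetical.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse a UTC timestamp such as 2026-02-16T18:08:00Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def split_latency(queued_at: str, started_at: str, finished_at: str) -> dict:
    """Split total check latency into start delay and execution time (seconds)."""
    queued, started, finished = (parse_ts(t) for t in (queued_at, started_at, finished_at))
    start_delay = (started - queued).total_seconds()
    execution = (finished - started).total_seconds()
    return {
        "start_delay_s": start_delay,
        "execution_s": execution,
        # Whichever part is larger is the first place to look.
        "dominant": "start-delay" if start_delay > execution else "execution",
    }


result = split_latency(
    "2026-02-16T18:00:00Z",  # check queued
    "2026-02-16T18:07:00Z",  # runner picked it up
    "2026-02-16T18:10:00Z",  # job finished or was stopped
)
print(result)
```

In this example the run waited 420 seconds and executed for 180 seconds, so `dominant` is `"start-delay"`, which points at runner capacity and not at a slow step.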

Keep an incident ledger with one reason code per rerun (for example: timeout-network, cancel-base-churn, cancel-concurrency). This is critical for post-incident fixes.

4. Stabilization playbook for rollback PRs

A) Timeout containment

B) Cancellation containment

C) Governance containment

Target outcome: reduce rollback queue-to-merge latency while keeping change control auditable. Every incident shortcut must have an owner and an expiry.

5. Workflow + CLI recipes

Queue-safe rollback refresh:

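# Switch to the rollback branch for this incident
git checkout rollback/incident-2026-02-16
# Fetch the latest state of the protected branch
git fetch origin
# Replay the rollback on top of the current main
git rebase origin/main
# Update the remote branch; the push is refused if the remote moved since your last fetch
git push --force-with-lease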

Merge queue workflow with explicit timeout and safer concurrency:

name: required-ci
on:
  pull_request:
  merge_group:

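concurrency:
  group: required-ci-${{ github.event_name }}-${{ github.ref }}
  # Cancel superseded pull_request runs only; never cancel merge_group runs
  cancel-in-progress: ${{ github.event_name == 'pull_request' }}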

jobs:
  verify:
    if: ${{ github.event_name == 'pull_request' || github.event_name == 'merge_group' }}
    runs-on: ubuntu-latest
    timeout-minutes: 20
    steps:
      - uses: actions/checkout@v4
      - name: Install deps
        timeout-minutes: 6
        run: pip install -r requirements.txt --require-hashes
      - name: Run tests
        timeout-minutes: 12
        run: pytest -q --maxfail=1
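
The workflow above sets a job timeout and per-step timeouts. One sanity check worth automating is that the step budgets fit inside the job budget. The sketch below assumes the values have already been read out of the workflow file into plain Python values; `check_timeout_budget` is a hypothetical helper, not part of any GitHub tooling.

```python
def check_timeout_budget(job_timeout: int, step_timeouts: dict[str, int]) -> int:
    """Return the unallocated minutes left in the job budget.

    Raises ValueError when the step budgets add up to more than the job timeout,
    because the job-level limit would then fire before the step limits can.
    """
    allocated = sum(step_timeouts.values())
    slack = job_timeout - allocated
    if slack < 0:
        raise ValueError(
            f"steps allocate {allocated} min but the job allows only {job_timeout} min"
        )
    return slack


# Values copied from the required-ci workflow above.
slack = check_timeout_budget(20, {"Install deps": 6, "Run tests": 12})
print(slack)  # 2
```

The two minutes of slack have to cover the checkout step, which has no explicit timeout in the workflow.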

Incident retry ledger example:

# timestamp check outcome reason
2026-02-16T18:08:00Z required-ci timeout timeout-dependency-latency
2026-02-16T18:14:00Z required-ci cancelled cancel-base-churn
2026-02-16T18:21:00Z required-ci pass retry-after-stabilization
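
A ledger in this shape is easy to summarize after the incident. The following sketch parses the whitespace-separated entries shown above and counts outcomes; the function names are hypothetical, and it assumes exactly four fields per line with no spaces inside a field.

```python
from collections import Counter

LEDGER = """\
# timestamp check outcome reason
2026-02-16T18:08:00Z required-ci timeout timeout-dependency-latency
2026-02-16T18:14:00Z required-ci cancelled cancel-base-churn
2026-02-16T18:21:00Z required-ci pass retry-after-stabilization
"""


def parse_ledger(text: str) -> list[dict]:
    """Parse ledger lines into dicts, skipping blank lines and # comments."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        timestamp, check, outcome, reason = line.split()
        entries.append(
            {"timestamp": timestamp, "check": check, "outcome": outcome, "reason": reason}
        )
    return entries


def tally_outcomes(entries: list[dict]) -> Counter:
    """Count how many runs ended in each outcome."""
    return Counter(entry["outcome"] for entry in entries)


entries = parse_ledger(LEDGER)
print(tally_outcomes(entries))
```

For the three entries above, the tally is one `timeout`, one `cancelled`, and one `pass`. Grouping by `reason` instead of `outcome` gives the per-cause counts that feed post-incident fixes.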

6. Guardrails and SLO thresholds

| Signal | Healthy target | Incident threshold |
| --- | --- | --- |
| Rollback required-check timeout rate | < 3% | > 10% in 30 minutes |
| Rollback required-check cancellation rate | < 5% | > 15% in 30 minutes |
| Queue invalidation rate (rollback entries) | < 8% | > 20% |
| Rollback queue-to-merge time | < 15 minutes | > 30 minutes |

Use dual triggers, one on the timeout rate and one on the cancellation rate. A timeout spike without a cancellation spike usually indicates runtime bottlenecks. A cancellation spike without a timeout spike usually indicates queue churn and policy or concurrency issues.
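
The two-signal reading can be sketched as a small function. The thresholds below are the incident thresholds from the table (10% for timeouts, 15% for cancellations, both over a 30-minute window). The label strings are hypothetical, and the "both-spiking" branch is an assumption of this sketch, since the text only describes the two single-spike cases.

```python
TIMEOUT_INCIDENT_RATE = 0.10  # > 10% of rollback check runs timed out in the window
CANCEL_INCIDENT_RATE = 0.15   # > 15% of rollback check runs were cancelled in the window


def rate(count: int, total: int) -> float:
    """Fraction of runs with a given outcome; 0.0 when there were no runs."""
    return count / total if total else 0.0


def diagnose(timeout_rate: float, cancel_rate: float) -> str:
    """Read the two triggers together, for one 30-minute window."""
    timeout_spike = timeout_rate > TIMEOUT_INCIDENT_RATE
    cancel_spike = cancel_rate > CANCEL_INCIDENT_RATE
    if timeout_spike and cancel_spike:
        return "both-spiking"  # assumption: investigate runtime and queue control together
    if timeout_spike:
        return "runtime-bottleneck"
    if cancel_spike:
        return "queue-churn-or-concurrency"
    return "within-thresholds"


# 2 timeouts and 1 cancellation out of 12 rollback check runs in the window.
print(diagnose(rate(2, 12), rate(1, 12)))  # runtime-bottleneck
```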

7. FAQ

What is the difference between timed-out and cancelled required checks in merge queue?

Timed-out checks exceed allowed runtime. Cancelled checks are interrupted by queue invalidation, concurrency cancellation, manual actions, or policy changes before completion.

Should we remove required checks to land rollback faster during an incident?

Avoid permanent removal. Use a temporary incident required-check profile with explicit expiry and restore the baseline after stabilization.

Why do cancellations repeat even when rollback code is unchanged?

Merge queue evaluates integration snapshots. Base updates, queue reordering, or concurrency settings can invalidate the same rollback entry repeatedly.
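
One way to test the base-update explanation is to line up cancellation timestamps against pushes to the protected branch. The sketch below assumes you have both lists as UTC timestamps. The function name and the 120-second window are illustrative assumptions, not values from GitHub.

```python
from datetime import datetime, timedelta


def parse_ts(value: str) -> datetime:
    """Parse a UTC timestamp such as 2026-02-16T18:14:00Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def cancelled_by_base_churn(
    cancelled_at: str, base_updates: list[str], window_seconds: int = 120
) -> bool:
    """True if any base-branch update landed within the window before the cancel."""
    cancel_time = parse_ts(cancelled_at)
    window = timedelta(seconds=window_seconds)
    for update in base_updates:
        gap = cancel_time - parse_ts(update)
        # Only updates that happened before the cancel, and close to it, count.
        if timedelta(0) <= gap <= window:
            return True
    return False


print(cancelled_by_base_churn("2026-02-16T18:14:00Z", ["2026-02-16T18:13:10Z"]))  # True
```

A cancellation that never lines up with a base update is more likely to come from a concurrency rule or a manual action, so check those next.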

What is the safest first action when rollback checks keep timing out?

Classify whether the issue is start-delay saturation, execution bottleneck, or queue cancellation. Stabilize runner priority and queue inputs before another rerun.

How do we prevent timeout and cancellation loops after incident closure?

Track timeout/cancellation metrics, enforce timeout budgets per workflow step, refine concurrency keys, and define a merge freeze protocol for severe rollback incidents.

Related rollback guides

Checks Keep Restarting Guide: Stop requeue loops caused by queue invalidation and branch churn.
Flaky Required Checks Guide: Stabilize nondeterministic CI failures during rollback incidents.
Saturation vs Starvation Guide: Separate runner bottlenecks from queue-control instability quickly.
Required Checks Rollback Guide: Baseline queue-safe rollback workflow for protected branches.
Rollback Stuck Guide: Full triage workflow when rollback PRs cannot land.
Required Check Name Mismatch Guide: Fix waiting-for-status deadlocks caused by expected check names drifting from emitted contexts.
Stale Review Dismissal Guide: Unblock rollback PRs that repeatedly lose approvals after queue updates.
merge_group Trigger Guide: Fix queue checks that never start in merge_group context.