GitHub Merge Queue Required Checks Timed Out or Cancelled: Rollback Incident Runbook (2026)

Published February 16, 2026 · 10 min read

Your rollback PR enters the merge queue, required checks start, and then one job times out. You requeue, and the next run is cancelled before it completes. During incidents, this timeout/cancel loop can block recovery even when the rollback itself is correct.

This guide gives a practical workflow for required checks that time out or are cancelled in GitHub merge queue: classify the failure mode fast, remove sources of instability, and land the rollback safely without abandoning branch protection.

Timeout vs cancelled: fast classification
Root causes behind timeout/cancel loops
8-minute incident triage workflow
Stabilization playbook for rollback PRs
Workflow + CLI recipes
Guardrails and SLO thresholds
FAQ

1. Timeout vs cancelled: fast classification

Observed signal	Likely class	Immediate action
Job reaches max runtime and exits with timeout	Execution timeout	Inspect slow step, dependency latency, and runner contention.
Job is cancelled shortly after a queue entry update	Queue invalidation cancellation	Reduce branch churn and requeue once on a fresh snapshot.
Job cancelled by a concurrency rule on the same ref	Workflow concurrency conflict	Adjust the concurrency group so it does not cancel incident rollback runs.
Timeouts spike only during incidents	Capacity collapse + queue pressure	Prioritize incident runners and defer non-critical workflows.
Timeout followed by random pass/fail outcomes	Mixed timeout + flake	Apply deterministic test profile and controlled retry policy.

Rule of thumb: treat a timeout as a runtime problem and a cancellation as a queue-control problem until the data proves otherwise.
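Encoding the per-run rows of the table as a small decision function gives every rerun the same label. Below is a minimal Python sketch. `FailedRun`, `classify`, the `"timeout"`/`"cancelled"` labels, and the 120-second window are hypothetical choices, not GitHub API fields. The last two rows describe trends across many runs, so they belong in aggregate metrics rather than per-run logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailedRun:
    """Hypothetical summary of one failed required-check run."""
    outcome: str                                     # triage label: "timeout" or "cancelled"
    cancelled_by_concurrency: bool = False           # superseded by a concurrency rule
    secs_after_queue_update: Optional[float] = None  # gap to nearest queue-entry update, if any


def classify(run: FailedRun, queue_window_s: float = 120.0) -> str:
    """Map one failed run to a per-run class from the classification table."""
    if run.outcome == "timeout":
        return "execution-timeout"
    if run.outcome == "cancelled":
        # Concurrency evidence is explicit, so check it before the timing heuristic.
        if run.cancelled_by_concurrency:
            return "concurrency-conflict"
        gap = run.secs_after_queue_update
        if gap is not None and 0 <= gap <= queue_window_s:
            return "queue-invalidation"
        return "cancelled-unexplained"  # e.g. a manual cancel; needs a human look
    raise ValueError(f"unexpected outcome label: {run.outcome!r}")
```

Checking the explicit concurrency signal before the timing heuristic means a run that was both superseded and near a queue update gets the more specific label.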

2. Root causes behind timeout/cancel loops

Unbounded job runtime: long integration tests, hung network calls, or missing step-level timeout guards.
Runner pressure: rollback jobs wait so long for a runner that the merge queue's status-check timeout expires before they report.
Concurrency misconfiguration: cancel-in-progress: true cancels active rollback checks on new queue attempts.
Snapshot churn: protected branch moves repeatedly, invalidating in-flight queue entries.
Policy churn: required check names or workflow mappings changed mid-incident.
Dependency instability: package index latency, API rate limits, or transient DNS issues inflate runtimes.

High-risk pattern: raising timeout limits without fixing queue invalidation. This hides symptoms while rollback latency keeps growing.
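Step-level `timeout-minutes` in GitHub Actions is the outer guard. Inside a setup or test script, you can also bound each external call so that one hung dependency does not consume the whole job budget. Below is a minimal standard-library Python sketch. `run_step` is a hypothetical helper name, and the sleeping child process stands in for a hung network call.

```python
import subprocess
import sys


def run_step(cmd: list[str], timeout_s: float) -> str:
    """Run one command with a hard time limit; return 'pass', 'fail', or 'timeout'."""
    try:
        result = subprocess.run(cmd, timeout=timeout_s, capture_output=True)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before re-raising.
        return "timeout"
    return "pass" if result.returncode == 0 else "fail"


if __name__ == "__main__":
    # Simulated hung call: sleeps far longer than its one-second budget.
    hung = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_step(hung, timeout_s=1.0))  # prints "timeout" after about one second
```

Returning a distinct `"timeout"` value keeps the runtime pathology separate from ordinary failures, which matches the labeling rule in the triage workflow.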

3. 8-minute incident triage workflow

Label each failed run as timeout or cancelled (never mix labels).
Measure check start delay vs execution duration for rollback runs.
Check whether the protected branch was updated near each cancellation timestamp.
Inspect workflow concurrency groups for cancel collisions on rollback refs.
Confirm that required workflows trigger on the merge_group event and that job-level guards do not skip it.
Identify top 1-2 slowest steps in timed-out jobs.
Apply incident runner priority and pause non-urgent merges briefly.
Requeue rollback once after controls are in place; avoid unlimited reruns.
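The start-delay measurement and the base-update check in the workflow above reduce to timestamp arithmetic. Below is a minimal Python sketch. It assumes you can export queued, started, and finished timestamps for each rollback job and push times for the protected branch. The function names, the 50% delay share, and the two-minute correlation window are hypothetical thresholds to tune.

```python
from datetime import datetime, timedelta


def split_latency(queued_at: datetime, started_at: datetime,
                  finished_at: datetime) -> tuple[float, float]:
    """Return (start_delay_s, execution_s) for one rollback check run."""
    start_delay = (started_at - queued_at).total_seconds()
    execution = (finished_at - started_at).total_seconds()
    return start_delay, execution


def dominant_cause(start_delay_s: float, execution_s: float,
                   delay_share: float = 0.5) -> str:
    """Attribute wall-clock time to runner wait or to execution."""
    total = start_delay_s + execution_s
    if total <= 0:
        return "unknown"
    if start_delay_s / total >= delay_share:
        return "start-delay-saturation"
    return "execution-bottleneck"


def base_moved_near(cancelled_at: datetime, base_updates: list[datetime],
                    window: timedelta = timedelta(minutes=2)) -> bool:
    """True if the protected branch moved within `window` of a cancellation."""
    return any(abs(cancelled_at - t) <= window for t in base_updates)
```

For example, a job that waited 12 minutes for a runner and then ran for 8 is classified as start-delay saturation. That points at runner priority rather than at the tests.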

Keep an incident ledger with one reason code per rerun (for example: timeout-network, cancel-base-churn, cancel-concurrency). This is critical for post-incident fixes.
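The ledger discipline can be enforced in code, together with the "requeue once, avoid unlimited reruns" rule. Below is a minimal Python sketch. `RerunLedger`, the prefix convention (`timeout-` / `cancel-`), and the budget of two failed attempts are hypothetical. The reason codes reuse the examples above.

```python
from dataclasses import dataclass, field

# Hypothetical convention: a reason code must start with its outcome's prefix.
REASON_PREFIX = {"timeout": "timeout-", "cancelled": "cancel-"}


@dataclass
class RerunLedger:
    max_failed_attempts: int = 2  # hypothetical rerun budget per check
    entries: list[tuple[str, str, str]] = field(default_factory=list)

    def record(self, check: str, outcome: str, reason: str) -> None:
        """Append one run with exactly one reason code."""
        if not reason or any(ch in reason for ch in " ,;"):
            raise ValueError("use exactly one reason code per rerun")
        prefix = REASON_PREFIX.get(outcome)
        if prefix is not None and not reason.startswith(prefix):
            raise ValueError(f"reason {reason!r} does not match outcome {outcome!r}")
        self.entries.append((check, outcome, reason))

    def may_requeue(self, check: str) -> bool:
        """Allow another requeue only while failed attempts stay under budget."""
        failed = sum(1 for c, o, _ in self.entries
                     if c == check and o in REASON_PREFIX)
        return failed < self.max_failed_attempts
```

Rejecting mismatched or compound reason codes at write time keeps the post-incident analysis clean. Once the budget is exhausted, stop requeueing and escalate.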

4. Stabilization playbook for rollback PRs

A) Timeout containment

Add explicit step-level timeouts to long-running integration tasks.
Cache dependencies and pin mirrors to reduce network variance.
Move high-variance checks out of the required rollback gate when the incident profile permits it.

B) Cancellation containment

Freeze non-incident merges for a short window (10-15 minutes).
Rebase rollback branch once, then avoid additional churn until merge.
Adjust concurrency keys so rollback queue jobs are not cancelled by unrelated PR pushes.
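You can model the concurrency rule before editing YAML. Below is a minimal Python sketch of one queue-aware policy. It assumes groups are keyed by event name plus ref, as in the workflow in section 5, and that in-progress cancellation is enabled only for `pull_request` runs. The function names are hypothetical. The sketch models only in-progress cancellation: GitHub's concurrency also replaces an older pending run in the same group with a newer one, regardless of `cancel-in-progress`.

```python
def concurrency_for(event_name: str, ref: str) -> tuple[str, bool]:
    """Return (group, cancel_in_progress) under a queue-aware policy."""
    group = f"required-ci-{event_name}-{ref}"
    # Only PR runs may supersede each other; merge_group runs are never cancelled this way.
    return group, event_name == "pull_request"


def cancels_in_progress(running: tuple[str, str], incoming: tuple[str, str]) -> bool:
    """Would the incoming (event, ref) run cancel the running one via this rule?"""
    running_group, _ = concurrency_for(*running)
    incoming_group, cancel = concurrency_for(*incoming)
    return cancel and running_group == incoming_group
```

Putting the event name in the group key keeps PR pushes and merge queue runs in separate groups. Tying the cancel flag to the event keeps a rollback's queue run alive even if the same ref is re-entered.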

C) Governance containment

Use a preapproved incident-required-check profile with automatic expiry.
Track who changed queue policy and when during the incident.
Open postmortem tasks before closing incident to restore strict baseline checks.

Target outcome: reduce rollback queue-to-merge latency while keeping change control auditable. Every incident shortcut must have an owner and an expiry.
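The owner-plus-expiry rule can be made structural, so that a relaxed gate cannot outlive its window. Below is a minimal Python sketch. `IncidentCheckProfile` and `effective_required_checks` are hypothetical names. Applying the resulting check set to a branch ruleset is left to your own tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class IncidentCheckProfile:
    owner: str                       # who signed off on the relaxed gate
    reason: str                      # audit note, e.g. the incident ID
    required_checks: frozenset[str]
    activated_at: datetime
    ttl: timedelta                   # automatic expiry window

    def __post_init__(self) -> None:
        if not self.owner or self.ttl <= timedelta(0):
            raise ValueError("incident profile needs an owner and a positive expiry")

    def is_active(self, now: datetime) -> bool:
        return self.activated_at <= now < self.activated_at + self.ttl


def effective_required_checks(baseline: frozenset[str],
                              profile: Optional[IncidentCheckProfile],
                              now: datetime) -> frozenset[str]:
    """Use the incident profile only while it is active; otherwise the baseline."""
    if profile is not None and profile.is_active(now):
        return profile.required_checks
    return baseline
```

Because expiry is computed rather than toggled by hand, the baseline returns automatically even if nobody remembers to revert the profile.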

5. Workflow + CLI recipes

Queue-safe rollback refresh:

git checkout rollback/incident-2026-02-16
git fetch origin
git rebase origin/main
git push --force-with-lease origin rollback/incident-2026-02-16

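Merge queue workflow with explicit timeouts and queue-safe concurrency:

name: required-ci
on:
  pull_request:
  merge_group:
    types: [checks_requested]

concurrency:
  group: required-ci-${{ github.event_name }}-${{ github.ref }}
  # Newer PR pushes may supersede older PR runs; merge queue runs are never cancelled in progress.
  cancel-in-progress: ${{ github.event_name == 'pull_request' }}

jobs:
  verify:
    if: ${{ github.event_name == 'pull_request' || github.event_name == 'merge_group' }}
    runs-on: ubuntu-latest
    # Step budgets below (6 + 12 = 18) leave about 2 minutes for checkout and setup.
    timeout-minutes: 20
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"  # assumption: match your project's Python version
      - name: Install deps
        timeout-minutes: 6
        run: pip install -r requirements.txt --require-hashes
      - name: Run tests
        timeout-minutes: 12
        run: pytest -q --maxfail=1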
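A few of these budgets can be checked mechanically before workflow changes merge. Below is a minimal Python sketch that inspects an already-parsed workflow, such as the dict a YAML loader returns. `lint_workflow` and its rules are hypothetical. It flags four problems:

- a missing `merge_group` trigger
- an unconditional `cancel-in-progress: true`
- a job without `timeout-minutes` (GitHub then applies its 360-minute default)
- step budgets that exceed the job budget

```python
def lint_workflow(wf: dict) -> list[str]:
    """Flag merge-queue timeout/concurrency risks in a parsed workflow dict."""
    problems = []
    # PyYAML follows YAML 1.1, where a bare `on` key loads as the boolean True.
    triggers = wf.get("on", wf.get(True, {}))
    if isinstance(triggers, str):
        triggers = [triggers]
    if "merge_group" not in triggers:
        problems.append("no merge_group trigger")
    concurrency = wf.get("concurrency")
    # An expression string such as "${{ ... }}" is conditional, so only literal True is flagged.
    if isinstance(concurrency, dict) and concurrency.get("cancel-in-progress") is True:
        problems.append("cancel-in-progress is unconditionally true")
    for name, job in wf.get("jobs", {}).items():
        limit = job.get("timeout-minutes")
        if limit is None:
            problems.append(f"{name}: no job timeout-minutes (platform default applies)")
            continue
        step_total = sum(s.get("timeout-minutes", 0) for s in job.get("steps", []))
        if step_total > limit:
            problems.append(f"{name}: step budgets {step_total} > job budget {limit}")
    return problems
```

Running this in a pre-merge check on workflow files catches the "raise the timeout but keep cancelling" pattern before it reaches an incident.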

Incident retry ledger example:

# timestamp check outcome reason
2026-02-16T18:08:00Z required-ci timeout timeout-dependency-latency
2026-02-16T18:14:00Z required-ci cancelled cancel-base-churn
2026-02-16T18:21:00Z required-ci pass retry-after-stabilization

6. Guardrails and SLO thresholds

Signal	Healthy target	Incident threshold
Rollback required-check timeout rate	< 3%	> 10% in 30 minutes
Rollback required-check cancellation rate	< 5%	> 15% in 30 minutes
Queue invalidation rate (rollback entries)	< 8%	> 20%
Rollback queue-to-merge time	< 15 minutes	> 30 minutes

Use dual triggers. A timeout-rate spike without a cancellation spike usually indicates a runtime bottleneck. A cancellation spike without a timeout spike usually points to queue churn or policy/concurrency issues.
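The dual-trigger rule can be evaluated over a trailing window. Below is a minimal Python sketch that applies the timeout and cancellation thresholds from the table to a list of `(timestamp, outcome)` pairs for rollback runs. The function names and outcome labels are hypothetical. The queue-invalidation and queue-to-merge rows are omitted because they need queue-entry data rather than per-run outcomes.

```python
from datetime import datetime, timedelta

# Incident thresholds from the table: timeout > 10%, cancellation > 15% in 30 minutes.
TIMEOUT_LIMIT = 0.10
CANCEL_LIMIT = 0.15
WINDOW = timedelta(minutes=30)


def window_rates(events: list[tuple[datetime, str]], now: datetime) -> tuple[float, float]:
    """Timeout and cancellation rates for rollback runs in the trailing window."""
    recent = [outcome for ts, outcome in events if now - WINDOW <= ts <= now]
    if not recent:
        return 0.0, 0.0
    n = len(recent)
    return recent.count("timeout") / n, recent.count("cancelled") / n


def dual_trigger(events: list[tuple[datetime, str]], now: datetime) -> str:
    """Classify the current window using both rate signals together."""
    t_rate, c_rate = window_rates(events, now)
    timeout_spike = t_rate > TIMEOUT_LIMIT
    cancel_spike = c_rate > CANCEL_LIMIT
    if timeout_spike and cancel_spike:
        return "both-spiking"
    if timeout_spike:
        return "runtime-bottleneck"
    if cancel_spike:
        return "queue-churn-or-concurrency"
    return "within-thresholds"
```

Evaluating both rates over the same window is what lets one signal's absence narrow the diagnosis.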

7. FAQ

What is the difference between timed-out and cancelled required checks in merge queue?

Timed-out checks exceed their allowed runtime, which often points to hung jobs, slow dependencies, or runner contention. Cancelled checks are interrupted before completion by queue invalidation, concurrency cancellation, manual action, or policy changes.

Should we remove required checks to land rollback faster during an incident?

Avoid permanent removal. Use a preapproved, temporary incident required-check profile with explicit expiry, audit notes, and owner sign-off, then restore the baseline after stabilization.

Why do cancellations repeat even when rollback code is unchanged?

Merge queue evaluates integration snapshots. Base updates, queue reordering, or concurrency settings can invalidate the same rollback entry repeatedly.

What is the safest first action when rollback checks keep timing out?

Classify whether the issue is start-delay saturation, execution bottleneck, or queue cancellation. Stabilize runner priority and queue inputs before another rerun.

How do we prevent timeout and cancellation loops after incident closure?

Track timeout/cancellation metrics, enforce timeout budgets per workflow step, refine concurrency keys, and define a merge freeze protocol for severe rollback incidents.