# GitHub Merge Queue Checks Keep Restarting: Stop Requeue Loops During Rollback (2026 Guide)
Incident rollback PR approved, checks start, then the queue entry restarts again. And again. This is one of the most frustrating merge queue failure modes because it looks like progress while silently extending outage duration.
This guide gives a practical incident workflow to stop requeue loops without disabling protections: isolate branch churn, flaky required checks, and policy-driven invalidations, then land rollback safely.
## 1. Quick diagnosis matrix
| Observed signal | Likely class | First action |
|---|---|---|
| Single rollback PR requeues repeatedly after checks begin | Queue invalidation loop | Freeze non-incident merges briefly and rebase rollback branch once. |
| Checks fail inconsistently with no code changes | Flaky required checks | Switch required check set to stable subset for incident windows if policy allows. |
| Many PRs start checks slowly but do not restart | Runner saturation | Scale runners and deprioritize non-incident workflows. |
| Checks never start and logs stay empty | Trigger mismatch | Verify merge_group workflow triggers and job guards. |
| Queue entry becomes stale after every main branch update | Branch churn | Create short stabilization window until rollback merges. |
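The matrix above can be encoded as a simple lookup for runbooks or chat-ops tooling. This is an illustrative sketch: the signal keys and action strings are assumptions, not a GitHub API.

```python
# Minimal triage lookup mirroring the diagnosis matrix.
# Signal keys and recommended actions are illustrative placeholders.
DIAGNOSIS = {
    "single_pr_requeues": ("queue invalidation loop",
                           "freeze non-incident merges, rebase rollback once"),
    "inconsistent_failures": ("flaky required checks",
                              "switch to stable required subset if policy allows"),
    "slow_starts_no_restarts": ("runner saturation",
                                "scale runners, deprioritize non-incident jobs"),
    "checks_never_start": ("trigger mismatch",
                           "verify merge_group triggers and job guards"),
    "stale_after_main_update": ("branch churn",
                                "open a short stabilization window"),
}

def triage(signal: str) -> tuple[str, str]:
    """Return (likely class, first action) for an observed queue signal."""
    return DIAGNOSIS.get(signal, ("unknown", "collect the queue event timeline"))
```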
## 2. Why checks keep restarting in merge queue
Merge queue validates a generated integration snapshot, not only your PR branch. Anything that repeatedly invalidates that snapshot can force restarts:
- Base branch churn: frequent merges update protected tip before rollback checks finish.
- Flaky required checks: unstable tests produce random failures and retries that prolong queue occupancy.
- Policy mutation during incident: changing required checks or approval rules mid-incident invalidates active queue entries.
- Workflow guard drift: job conditions behave differently for `merge_group` than `pull_request`.
- Runner priority inversion: long non-incident jobs consume capacity, increasing restart windows.
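Workflow guard drift can be spotted mechanically. A minimal sketch, assuming you have already extracted each job's `if:` string from your workflow YAML (the guard strings below are hypothetical examples):

```python
# Detect jobs whose event guard admits pull_request but not merge_group.
# Only handles the common github.event_name pattern, which is enough for triage.
def runs_on_merge_group(if_condition: str) -> bool:
    """True if a simple event_name guard would admit merge_group events.

    An empty guard means the job runs for every triggering event.
    """
    if not if_condition.strip():
        return True
    return "merge_group" in if_condition

# Hypothetical extracted guards, one per job.
guards = {
    "test": "github.event_name == 'pull_request' || github.event_name == 'merge_group'",
    "lint": "github.event_name == 'pull_request'",  # drifted: silently skips in the queue
}
drifted = [job for job, guard in guards.items() if not runs_on_merge_group(guard)]
```

A job that silently skips under `merge_group` leaves its required check permanently pending, which reads as "checks never start" in the diagnosis matrix.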
## 3. 5-minute triage for requeue loops
- Confirm whether restarts affect one rollback PR or the whole queue.
- Inspect queue event timeline for "entry updated" or "base moved" churn.
- Check if required checks list changed in the last 30 minutes.
- Verify all required workflows run on `merge_group` and are not event-filtered out.
- Measure retry count per required check. Flag any check retrying more than twice.
- Inspect runner depth to separate capacity delay from restart invalidation.
If requeues are isolated to rollback entries during high merge velocity, enforce a short stabilization window: pause non-critical merges for 10-15 minutes, then requeue rollback once on a fresh snapshot.
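The retry-count step above can be scripted. A sketch, assuming you can list one entry per check-run attempt for the rollback entry (the input format is an assumption):

```python
# Flag required checks retrying more than twice on the same rollback entry.
# Input is one check name per attempt, e.g. from your CI API or audit log.
from collections import Counter

def flag_flaky(check_attempts: list[str], max_retries: int = 2) -> list[str]:
    """Return checks whose retry count (attempts - 1) exceeds max_retries."""
    attempts = Counter(check_attempts)
    return sorted(name for name, n in attempts.items() if n - 1 > max_retries)

# "integration" ran 4 times (3 retries), so it crosses the threshold.
attempts = ["unit", "unit",
            "integration", "integration", "integration", "integration"]
flaky = flag_flaky(attempts)
```

Checks that cross the threshold are candidates for the stable-subset swap described in the stabilization playbook; checks that only start late point at capacity instead.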
## 4. Queue stabilization playbook
### A) Stabilize queue inputs
- Temporarily reduce merge churn on the protected branch.
- Avoid changing branch protection or required check names mid-incident.
- Rebase rollback PR exactly once before requeue.
### B) Stabilize required checks
- Tag historically flaky checks as non-blocking when incident policy allows.
- Prioritize deterministic smoke/integration checks for rollback validation.
- Use workflow-level concurrency to prevent duplicate runs on stale snapshots.
### C) Stabilize capacity
- Route rollback workflows to dedicated incident runner pools.
- Pause heavy nightly jobs and long integration pipelines temporarily.
- Expose queue invalidation and retry metrics in incident dashboards.
## 5. CLI and workflow recipes
Queue-safe rollback branch refresh:

```bash
git checkout rollback/incident-2026-02-16
git fetch origin
git rebase origin/main
# --force-with-lease refuses to overwrite remote commits you have not seen
git push --force-with-lease
```
Workflow trigger + concurrency guard for merge queue stability. Note that `github.ref` is a poor concurrency key here: with `cancel-in-progress: true` it can cancel in-flight runs and fail queue entries. Scope the group by `head_ref` for pull requests, give merge queue runs a unique group via `run_id`, and only allow cancellation for pull request runs:

```yaml
name: required-ci
on:
  pull_request:
  merge_group:
concurrency:
  # head_ref dedupes per-PR runs; run_id keeps each merge_group run in its own group.
  group: required-ci-${{ github.head_ref || github.run_id }}
  # Cancelling a merge_group run fails the queue entry, so only cancel PR runs.
  cancel-in-progress: ${{ github.event_name == 'pull_request' }}
jobs:
  test:
    if: ${{ github.event_name == 'pull_request' || github.event_name == 'merge_group' }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make test
```
Retry accounting during incident:

```bash
# Capture queue events and retries for postmortem
# (replace with your internal API/CLI workflow)
echo "rollback_pr=1234 retries=2 invalidations=1" >> incident-queue-log.txt
```
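For the postmortem, the key=value lines above can be parsed back into structured records. A minimal sketch, assuming the hypothetical log format shown here:

```python
# Parse one incident log line like "rollback_pr=1234 retries=2 invalidations=1"
# into a dict of integers. The field names follow the log format above.
def parse_entry(line: str) -> dict[str, int]:
    return {key: int(value)
            for key, value in (field.split("=", 1) for field in line.split())}

entry = parse_entry("rollback_pr=1234 retries=2 invalidations=1")
```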
## 6. SLO thresholds and alerting
| Signal | Healthy baseline | Incident alert threshold |
|---|---|---|
| Rollback check start latency | < 3 minutes | > 7 minutes for 2 consecutive runs |
| Rollback queue-to-merge time | < 15 minutes | > 30 minutes |
| Queue invalidation rate | < 5% | > 10% during incident window |
| Required check retry count | 0-1 retries | > 2 retries on same rollback PR |
Alert on combinations, not single signals. For example, high invalidation with normal runner utilization strongly indicates queue churn rather than capacity shortage.
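The combination rule can be expressed as a small classifier. The invalidation threshold mirrors the table above; the runner-utilization cutoffs are illustrative assumptions you would tune to your fleet:

```python
# Classify queue instability from two signals combined, not either alone.
# 10% invalidation threshold matches the SLO table; utilization cutoffs are assumed.
def classify(invalidation_rate: float, runner_utilization: float) -> str:
    if invalidation_rate > 0.10 and runner_utilization < 0.80:
        return "queue churn"        # stabilize queue inputs, not capacity
    if invalidation_rate <= 0.10 and runner_utilization >= 0.95:
        return "capacity shortage"  # scale runners, deprioritize non-incident jobs
    if invalidation_rate > 0.10:
        return "mixed"              # churn under load: stabilize and scale
    return "healthy"
```

For example, a 15% invalidation rate with runners half idle classifies as queue churn, matching the guidance that high invalidation with normal utilization means churn rather than capacity shortage.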
## 7. FAQ
### Why do merge queue checks keep restarting during rollback incidents?
Most restart loops come from queue snapshot invalidation: branch tip moving rapidly, flaky required checks, or policy changes that force queue reevaluation before checks settle.
### What is the first safe action when rollback PR checks keep requeuing?
Pause non-urgent merges for a short window, rebase rollback branch once to latest protected tip, then requeue with stable required checks and unchanged policy gates.
### Are repeated restarts always a runner capacity issue?
No. Capacity issues usually delay starts across many PRs. Restart loops usually hit one rollback entry repeatedly even when runners are available.
### Should we disable branch protection to break the loop?
Only if your formal emergency policy requires it and approvers accept risk explicitly. In most cases, queue stabilization resolves the loop faster and keeps auditability intact.
### Which SLOs help detect restart loops early?
Track queue invalidation rate, rollback check-start latency, queue-to-merge duration, and retry count per required check. These metrics catch instability before rollbacks miss incident deadlines.