Hey lelegard,

You are definitely not alone in seeing this, and your data perfectly illustrates a headache that a lot of open-source maintainers are facing right now. That 4+ hour delay isn't a problem with your code; it's an infrastructure bottleneck.

Even though you wisely avoided the exact 00:00 UTC mark, scheduling at 00:40 UTC puts your workflow right in the path of the global peak traffic wave. Thousands of developers schedule builds in that first hour of the UTC day. Because GitHub's cron queue runs on shared global infrastructure, a massive backlog at midnight causes a compounding ripple effect that pushes later jobs further and further back. As GitHub has grown over the last year, that queue has simply become more congested.

Since you need a reliable nightly build, here are two ways to fix this:

It's awesome that you kept such a detailed log of your build times. It really highlights how platform scale affects open-source infrastructure. Good luck tracking down a quieter slot for the build!
-
Your data suggests something more interesting than normal scheduler jitter. The trend appears progressively increasing rather than randomly distributed, which makes me hesitant to attribute it solely to expected queue latency. A few observations:
I would separate the problem into two distinct phases:
1. Scheduler latency
2. Runner acquisition latency

Right now those two are blended together. I would inspect:

If workflow creation itself is delayed by several hours, that points toward scheduling backlog. If workflow creation happens close to [...]

I would also run a controlled experiment:

    on:
      workflow_dispatch:
      schedule:
        - cron: '40 0 * * *'

Manually trigger the identical workflow near the same time window and compare queue behavior. Another useful signal would be moving the scheduled time to a deliberately unusual slot (for example [...]).

The most interesting aspect here is not the existence of delay (scheduled workflows have always had some elasticity) but the apparent monotonic increase over time. That trend suggests either changing infrastructure characteristics or changing load patterns rather than ordinary scheduling variance.
-
We're seeing issues where our workflows that run every 15 minutes are only running every 90 minutes, starting today. Seems like something changed around midnight UTC today where they started getting slower and slower (30 minutes to start, now up to 90 minutes). Our cron trigger:
-
Thanks @hzijad, @kailashv2, @johan-lindqvist for your interest.

My data were extracted by a Python script which uses the GitHub API to collect information on all runs of that specific workflow. I posted only one line per month, but I have a log of all executions, numbered 1939 to 2369. Data are retained back to April 2025. The oldest run is numbered 1939, meaning that information on the previous 1938 runs of that workflow was erased (I can understand that). So, we have a view over one year only. If my memory serves me right, initial runs were reasonably aligned with the scheduled time, years ago.

I will try to collect more information and move the schedule to some unusual time of day.

It is weird that people target "night time" UTC instead of "night time" in their own time zone. Being located in Europe, UTC is almost my time and it makes sense to use around midnight UTC. But I don't understand why American or Asian developers schedule their jobs in the middle of their working days.
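For readers who want to reproduce this kind of log, here is a minimal sketch of such a collection script. It is not the script mentioned above: the function names are hypothetical, and it assumes the public "list workflow runs for a workflow" REST endpoint with its `event`, `per_page` and `page` query parameters and the `run_number`, `created_at` and `run_started_at` fields of each run.

```python
import json
import urllib.request
from datetime import datetime, timezone

API = "https://api.github.com"


def parse_ts(value):
    """Parse an API timestamp such as '2026-05-25T05:08:22Z' as UTC."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def summarize_runs(payload):
    """Reduce one response page to (run_number, created, started) tuples."""
    rows = []
    for run in payload.get("workflow_runs", []):
        created = parse_ts(run["created_at"])
        # Assumption: fall back to created_at if run_started_at is missing.
        started = parse_ts(run.get("run_started_at") or run["created_at"])
        rows.append((run["run_number"], created, started))
    return rows


def fetch_page(owner, repo, workflow_file, page=1, token=None):
    """Fetch one page of scheduled runs (network call, unauthenticated if no token)."""
    url = (f"{API}/repos/{owner}/{repo}/actions/workflows/{workflow_file}/runs"
           f"?event=schedule&per_page=100&page={page}")
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example (performs a real HTTP request, so it is left commented out):
# rows = summarize_runs(fetch_page("tsduck", "tsduck", "nightly-build.yml"))
```

Keeping the parsing in `summarize_runs`, separate from the HTTP call, makes it easy to check the timestamp handling on a saved response without hitting the API rate limits.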
-
How would you get the first two from the GitHub API? I can only get "created" and "started" timestamps and they are always identical. I don't know if "created" means "enqueued". Concerning the planned scheduled date, I can only assume what's in the yml file. I have updated the nightly build schedule to 14:37 UTC (not so nightly now). We'll see if it affects the delay. I cannot update that too often because each push results in 2 hours of CI/CD execution.
-
Hey lelegard,

You're completely right, and you aren't missing anything in your Python script. The GitHub API fundamentally collapses this telemetry, making it impossible to see the "hidden" queue time.

To answer your question directly: you cannot pull a separate "scheduled execution" or "queued" timestamp from the API. For scheduled crons, GitHub doesn't create the workflow object until it is actually ready to execute it. The created_at timestamp is generated the moment the workflow is instantiated in their database, and because a runner is assigned immediately after, created_at and started_at will always look identical. The hours of drift you are seeing happen before the API even knows the run exists. GitHub's internal cron engine is holding onto your trigger, delayed by the global backlog, before it finally spawns the workflow object.

Your point about developers targeting midnight UTC is spot on, and it explains why the drift is compounding. Most people don't map crons to their local time zones; they just copy-paste boilerplate templates from documentation, which almost always default to 0 0 * * *. On top of that, enterprise tools and automated dependency bots (like Dependabot) are hardcoded to trigger at the turn of the UTC day. You've been fighting a massive wave of default settings.

Moving your schedule to 14:37 UTC is the perfect fix. It gets you completely out of that midnight bottleneck and avoids round numbers. Don't worry about pushing any more changes. Just let this one ride for a few days, and you should see those Created timestamps finally align with your scheduled time!
-
For the API question in your follow-up: I do not think you are missing a separate timestamp. For scheduled workflows, the public workflow run object is created only when GitHub actually materializes the scheduled run. That means [...]

So, from the public API, you generally have:

The hidden part is the time between the cron expression's expected fire time and GitHub creating the workflow run. That is the drift you are measuring, but GitHub does not expose it as a first-class [...]

For measurement, I would compute it explicitly in your script:

If [...]

A useful in-workflow breadcrumb is to print both the actual UTC time and the intended slot at the top of the first job:

    - name: Record schedule timing
      run: |
        date -u +'%Y-%m-%dT%H:%M:%SZ'
        echo "Expected cron slot: 00:40 UTC"

That will not reveal GitHub's internal queued time, but it makes future logs easier to audit.

Moving to 14:37 UTC is a good experiment. If the drift collapses, it strongly suggests cron backlog before run creation. If it stays large, then I would start suspecting a broader scheduler issue and open a support ticket with the run IDs and your computed drift table.
-
The additional data you collected is actually very informative. The interesting signal here is that [...] That shifts suspicion toward the period before the workflow run object even exists. A few observations:

One thing I would be careful about though: I would avoid assuming that [...] Your experiment with [...]

If I were measuring this, I would compute:

    expected_time = scheduled_cron_time
    scheduler_drift = created_at - expected_time
    runner_delay = started_at - created_at

From the data shown so far:

If moving to [...]

If it remains in the multi-hour range even outside common scheduling windows, I would start suspecting either:

The nice thing is that your change only requires patience now. You already set up what is essentially a controlled experiment.
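The three quantities above can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `expected_slot` and `split_delay` are hypothetical, the cron is assumed to fire once a day at a fixed UTC time, and the total delay is assumed to stay under 24 hours so that the most recent slot before `created_at` is the one that triggered the run.

```python
from datetime import datetime, timedelta, timezone


def expected_slot(created_at, hour, minute):
    """Most recent daily cron slot (hour:minute UTC) at or before created_at."""
    slot = created_at.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if slot > created_at:
        # The run was created before today's slot, so it belongs to yesterday's.
        slot -= timedelta(days=1)
    return slot


def split_delay(created_at, started_at, hour=0, minute=40):
    """Return (scheduler_drift, runner_delay) for one scheduled run."""
    expected_time = expected_slot(created_at, hour, minute)
    scheduler_drift = created_at - expected_time
    runner_delay = started_at - created_at
    return scheduler_drift, runner_delay


# One run from the original post: cron at 00:40 UTC, run created and started
# at the same instant, as reported for the API timestamps in this thread.
created = datetime(2026, 5, 25, 5, 8, 22, tzinfo=timezone.utc)
drift, runner = split_delay(created, created)
print("scheduler drift:", drift)
print("runner delay:", runner)
```

With identical "created" and "started" timestamps, the whole delay lands in the scheduler term, which is the situation described earlier in the thread.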
-
The first run after moving the schedule to 14:37 started at ... 17:46:12. So, the midnight traffic jam was not the culprit. Do you use scheduled jobs? Do you experience such delays? Is it possible that GitHub prioritises scheduled jobs by average duration of previous runs? This job runs for 1h 50mn when there is something to build and only a few seconds if the project was not modified in the last 24 hours.
-
Some tests from other times of the day:
The time of day definitely has an influence, but the delay is still significant at all times. I have opened a support issue here: actions/runner#4468
-
Hey y'all, sorry I missed you raising this. Keep poking me here, I am here and reading/listening. We are working on this as part of the wider work <3
-
@hzijad had the right idea with external triggers, and @johan-lindqvist's concern about a vicious cycle is fair, but I don't think it applies here lol.
Use cron-job.org to POST to the dispatch endpoint on whatever schedule you need.

Headers:

Body:

A fine-grained PAT with Actions read/write on the repo is all you need. If you want it more auditable, a GitHub App with actions: write is cleaner, but for a personal project the PAT is fine in my opinion.

On the account type question: scheduled jobs don't get prioritized by plan. It's queue position. The only way around it is dedicated runners, which enterprise orgs can set up, but that's not an option for most people here.

Hope this answers the question!
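For anyone who prefers to script the external trigger, here is a minimal Python sketch of the same request. It is not the exact configuration from this comment: the function names are hypothetical, the headers and body follow the public "create a workflow dispatch event" REST endpoint as I understand it, and the target workflow is assumed to declare a `workflow_dispatch` trigger.

```python
import json
import urllib.request


def build_dispatch_request(owner, repo, workflow_file, ref, token):
    """Build (but do not send) a workflow_dispatch request for one workflow."""
    url = (f"https://api.github.com/repos/{owner}/{repo}"
           f"/actions/workflows/{workflow_file}/dispatches")
    # "ref" is the branch or tag whose copy of the workflow file should run.
    body = json.dumps({"ref": ref}).encode("utf-8")
    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
        "X-GitHub-Api-Version": "2022-11-28",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def send(request):
    """Send the request and return the HTTP status (a 2xx value means accepted)."""
    with urllib.request.urlopen(request) as response:
        return response.status


# Example (performs a real HTTP request, so it is left commented out;
# the branch name and token variable are placeholders):
# send(build_dispatch_request("tsduck", "tsduck", "nightly-build.yml", "<branch>", token))
```

Whatever runs this, a hosted cron service or a machine you control, becomes the clock, so the start time no longer depends on GitHub's own schedule trigger.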
-
What you’re seeing is most likely scheduler queueing rather than runner queueing. A common misconception is that the cron trigger fires exactly on schedule and then waits for a runner. In reality, GitHub’s scheduled workflows are best-effort and may be delayed before the workflow run is even created. A few observations from your data:
This suggests the bottleneck is probably in GitHub’s scheduling infrastructure rather than runner availability. Some possibilities:
One thing worth checking is the distinction between:
Using the Actions API, you may be able to determine whether the delay occurs before the workflow run is created or after the run enters the queue. If the run itself is not created until several hours after the scheduled cron time, that would strongly indicate scheduler-side backlog. If the run is created on time but the first job starts hours later, that would point more toward runner capacity.

Given the steady increase from roughly 2 hours in mid-2025 to over 4 hours recently, my guess would be platform-level scheduling backlog rather than anything specific to your workflow definition.

I’d be interested to know whether other maintainers of large open-source projects are seeing similar trends in scheduled workflow latency during the same period.
-
Maybe, maybe not. It would make sense to assign a slightly lower priority to recurring jobs which regularly use a lot of resources. My job has a consistent delay. Other users report consistent delays as well, but not as long as mine. So, there must be something attached to each job. Since mine runs for 1h40 most of the time and just a few seconds the rest of the time, that could be a possibility, since 1h40 is maybe a lot for a free public repo.
My repo has pushes almost every day. I wouldn't call this low activity.
No, we can't; that was explained in previous posts of this thread. The only dates you can get from the API are "created" and "started", which are exactly the same all the time. If they referred to distinct events, sometimes there would be a difference of at least a few seconds.
-
🏷️ Discussion Type
Question
💬 Feature/Topic Area
Schedule & Cron Jobs
Discussion Details
Hi,
I have observed a steady drift in scheduled workflows on GitHub runners. I know that the scheduled time is only the earliest possible start time and that the workflow starts some time after it.
However, over the past few months, the drift has been increasing, with an average delay of more than 4 hours now. In 2025, the average delay was 1h 40mn, already quite a lot, but stable. Since then, the delay has been steadily increasing, reaching 4h 30mn now.
I have a "nightly build" workflow which is scheduled at 00:40 UTC (to avoid peak traffic at 00:00):
Here are some actual starting dates and times, UTC, in the last year:
2026-05-25 05:08:22
2026-05-01 04:37:13
2026-04-01 03:52:31
2026-03-01 03:32:43
2026-02-01 03:39:12
2026-01-01 02:57:05
2025-12-01 02:57:02
2025-11-01 02:22:19
2025-10-01 02:21:29
2025-09-01 02:31:11
2025-08-01 03:00:41
2025-07-01 02:39:34
2025-06-01 02:50:41
2025-05-01 02:27:41
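To turn the start times above into delays, a small Python sketch can subtract the 00:40 UTC slot from each one. The helper name `delay_after_slot` is hypothetical, and the sketch assumes every run started on the same UTC day as its scheduled slot, which holds for all the values listed.

```python
from datetime import datetime, timedelta

# Start times copied from the list above (UTC).
STARTS = [
    "2026-05-25 05:08:22", "2026-05-01 04:37:13", "2026-04-01 03:52:31",
    "2026-03-01 03:32:43", "2026-02-01 03:39:12", "2026-01-01 02:57:05",
    "2025-12-01 02:57:02", "2025-11-01 02:22:19", "2025-10-01 02:21:29",
    "2025-09-01 02:31:11", "2025-08-01 03:00:41", "2025-07-01 02:39:34",
    "2025-06-01 02:50:41", "2025-05-01 02:27:41",
]


def delay_after_slot(start, hour=0, minute=40):
    """Delay between the daily slot (hour:minute) and the observed start."""
    started = datetime.strptime(start, "%Y-%m-%d %H:%M:%S")
    slot = started.replace(hour=hour, minute=minute, second=0)
    return started - slot


# Map each date to its delay and print them in chronological order.
delays = {start[:10]: delay_after_slot(start) for start in STARTS}
for day, delay in sorted(delays.items()):
    print(day, delay)
```

For the most recent entry this gives 4h 28mn 22s, against 1h 47mn 41s for the oldest one.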
The initial job of the workflow runs on an ubuntu-latest GitHub runner. Then, jobs are dispatched on ubuntu-latest and windows-latest. The workflow is https://github.com/tsduck/tsduck/actions/workflows/nightly-build.yml

Any explanation? I don't complain because this is free CI for an open source project. However, the instability and continuously increasing delay are worrying.