r/wg21
P3941R2 — Scheduler Affinity WG21
Posted by u/concurrency_paperchaser · 6 hr. ago

Document: P3941R2
Author: Dietmar Kühl
Date: 2026-02-23
Audience: SG1, LEWG, LWG

If you've been using std::execution coroutines — specifically std::execution::task — and wondering how affine_on actually guarantees a coroutine resumes on the correct scheduler after every co_await, this paper is the answer. Or rather, it's the answer to five US National Body ballot comments (US 232–236) pointing out that the current C++26 specification doesn't deliver the guarantee it advertises.

The headline change: affine_on loses its explicit scheduler parameter. Instead of affine_on(sndr, sch) you write affine_on(sndr), and the scheduler is obtained from the receiver's environment at connect-time. That's the right time — when you're building a work graph the receiver isn't known yet, so the old signature was pushing the scheduler in too early. Small fix, correct semantics.
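
A toy model of that signature change, for illustration only. Nothing here is the std::execution API: toy_scheduler, toy_env, toy_receiver, toy_affine_on, and toy_connect are made-up stand-ins, and the sketch does not model which environment query the real affine_on consults. It only shows the shape: the sender carries no scheduler, and the operation picks one up once a receiver is attached.

```cpp
#include <string_view>

// Hypothetical stand-ins, not the std::execution types.
struct toy_scheduler { std::string_view name; };
struct toy_env { toy_scheduler scheduler; };

struct toy_receiver {
    toy_env env;
    constexpr const toy_env& get_env() const { return env; }
};

// New shape: the sender stores no scheduler at all.
struct toy_affine_on {};

// The operation state is where the scheduler finally becomes known.
struct toy_operation { toy_scheduler resume_on; };

// Connecting is the first point where a receiver exists, so this is
// where the toy reads the scheduler out of the receiver's environment.
constexpr toy_operation toy_connect(toy_affine_on, const toy_receiver& r) {
    return toy_operation{r.get_env().scheduler};
}
```

The same toy_affine_on value can be connected to receivers with different environments and will resume on a different scheduler each time, which is the flexibility the old two-argument form gave up.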

The more interesting changes are downstream. A new get_start_scheduler query is introduced, cleanly separating "what scheduler should new work go to" from "what scheduler started this operation" — two questions that get_scheduler currently conflates. A hard infallibility requirement is added: schedulers used with affine_on must have no set_error completions and no set_stopped completions when given an unstoppable token, and this is checked statically. And change_coroutine_scheduler — the mechanism for switching execution contexts mid-coroutine — is removed entirely.

That last one is the spicy bit. The paper argues the removal is correct: objects created before a scheduler change have their destructors run on the wrong scheduler, which is unsound. The recommended replacement is co_await on(sch, nested-task). The correctness argument holds. The ergonomic cost is real. Worth reading if you have coroutines touching multiple schedulers.

▲ 287 points (91% upvoted) · 22 comments
u/STL_Moderator 1 point 6 hr. ago · Moderator

Reminder: paper authors occasionally read these threads. Keep it technical and constructive.

u/daily_lewg_watcher 134 points 5 hr. ago

P2300 has more follow-on papers than most languages have standard library entries. We are never going to reach inbox zero on sender/receiver.

u/undefined_behavior_enjoyer_42 78 points 4 hr. ago

The P2300 cinematic universe. P2300 is Iron Man. P3941 is, I don't know, the fourth Ant-Man. Necessary for canon. Watched by nine people.

u/just_use_rust_lol 218 points 5 hr. ago

Tokio figured this out. spawn_on plus task-local storage. Zero papers, zero NB ballot comments, ships in a library version. Asking for a friend: why does this require a wording paper?

u/embedded_for_20_years 56 points 4 hr. ago

Tokio's executor model gives you 'this task runs on a specific worker thread.' That's a weaker property than scheduler affinity in a typed execution framework — you're not statically guaranteeing which kind of execution agent you resume on, just which instance of one thread pool. Different problem space. But I understand the exhaustion.

u/just_use_rust_lol 34 points 4 hr. ago

different language, different memory model, different committee politics, I remain unmoved

u/senior_coroutine_sufferer 167 points 5 hr. ago

Great, another paper addressing NB ballot comments on a coroutine scheduling primitive that no major compiler has shipped an implementation of yet. The abstraction is being standardized before the implementation exists and nobody at the WG21 level sees a problem with that.

u/stdexec_implementer 298 points 4 hr. ago

The infallibility requirement is doing a lot of work here, and the consequence is worth spelling out.

The paper requires that schedulers used with affine_on be infallible: no set_error completions, and no set_stopped completions when given an unstoppable token. Of the four standard schedulers, three can satisfy this: inline_scheduler, task_scheduler, and run_loop::scheduler. The one that cannot is parallel_scheduler.
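
Roughly what a static check of this kind looks like, as a sketch under assumptions. The tag types, toy_completions, is_infallible_v, and both toy schedulers are hypothetical and are not the paper's wording or the real completion-signature machinery. It is written as a C++17 trait so it compiles anywhere; a concept would be the natural C++20 spelling.

```cpp
#include <type_traits>

// Toy completion tags and signature list; hypothetical, not std::execution.
struct set_value_tag {};
struct set_error_tag {};
struct set_stopped_tag {};

template <class... Tags>
struct toy_completions {};

// contains_v<Tag, List>: does the list mention Tag?
template <class Tag, class List>
inline constexpr bool contains_v = false;

template <class Tag, class... Tags>
inline constexpr bool contains_v<Tag, toy_completions<Tags...>> =
    (std::is_same_v<Tag, Tags> || ...);

// Each toy scheduler advertises what its schedule sender can complete
// with when the stop token is unstoppable.
struct toy_inline_scheduler {
    using completions = toy_completions<set_value_tag>;
};
struct toy_bounded_pool_scheduler {
    // Enqueueing may fail, so an error completion is advertised.
    using completions = toy_completions<set_value_tag, set_error_tag>;
};

// Infallible: neither an error nor a stopped completion is advertised.
template <class Sch>
inline constexpr bool is_infallible_v =
    !contains_v<set_error_tag, typename Sch::completions> &&
    !contains_v<set_stopped_tag, typename Sch::completions>;

// A toy affine_on gate that rejects fallible schedulers at compile time.
template <class Sch>
constexpr bool toy_affine_on_check(Sch) {
    static_assert(is_infallible_v<Sch>,
                  "scheduler advertises set_error or set_stopped");
    return true;
}
```

Calling toy_affine_on_check(toy_bounded_pool_scheduler{}) fails to compile with that one-line message. Note that the check only sees what the type advertises, not what it does at runtime.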

Here's the irony: parallel_scheduler is the scheduler you'd actually reach for when CPU affinity matters. NUMA-aware computation, pinning work to a specific core family for cache locality, game engine systems that must stay on their designated thread — all of these reach for a parallel thread pool. The schedulers that can satisfy infallibility are either doing no scheduling at all (inline_scheduler, which runs inline on whatever thread calls start) or are single-threaded event loops (run_loop::scheduler). You get a statically-correct affinity guarantee in exactly the scenarios where you either don't need it or have already achieved it by other means.

// Works: infallible, but pointless for CPU affinity
std::execution::affine_on(task_body)
    | std::execution::on(inline_scheduler{});

// The thing you actually want, excluded by design:
std::execution::affine_on(task_body)
    | std::execution::on(parallel_scheduler{});  // static error: not infallible

The correctness argument for the constraint is sound — an affine_on that might fail to reschedule provides no guarantee at all, and silent wrong-scheduler execution is worse than a compile error. But the practical consequence is that the feature is only available for the trivial cases, and production users with bounded thread pools have no specified path until someone writes the wrapper the paper mentions but doesn't define.

This is what 'ships correct but incomplete' looks like.

u/template_archaeologist 41 points 3 hr. ago

Wait — so the primary use case for scheduler affinity (keeping parallel compute on the right CPU topology) is precisely what's excluded by the infallibility requirement? I must be misreading the paper.

u/async_infra_dev 67 points 3 hr. ago

You're reading it correctly. The constraint follows from a real correctness requirement: if start() of the rescheduling operation can fail, control could return on an arbitrary execution agent and the affinity guarantee collapses entirely. I've hit this exact failure mode with an earlier stdexec build where a bounded thread pool was under heavy load — silent scheduler changes, no diagnostics, genuinely nasty bug to track down.

The paper's position is: ship a correct feature that covers the cases we can guarantee, and handle fallible schedulers later with an explicit opt-out parameter. That's defensible. My concern is whether LEWG accepts that trajectory or asks for the opt-out now before C++26 closes.

u/infallible_or_bust 29 points 2 hr. ago

I'd push back on framing the exclusion as a limitation rather than a feature. A scheduler that can fail under resource pressure cannot satisfy scheduler affinity — period. 'Best-effort affinity' is a different, weaker contract, and conflating the two in a single API name would be worse than what's here.

What I'd have liked to see is an explicit best_effort_affine_on or a template parameter marking the weaker contract. But collapsing both behaviors into one name and adding a runtime fallback would mean users can't reason about which guarantee they actually have. The hard exclusion is honest. Grumpy about parallel_scheduler, fine. But the alternative — a silently weakened guarantee — would be worse.

u/async_infra_dev 44 points 2 hr. ago

A separate name for the weaker guarantee is reasonable and I don't entirely disagree. But the paper's own future directions section says:

The affine_on algorithm can be relaxed later by adding an explicit opt-out argument to both affine_on and the task's environment template parameter.

So they know the constraint is tight. They're taking the safe path now, expecting it to loosen. Fine as a strategy — unless 'later' means C++29 and parallel_scheduler users have no specified path for an entire standard cycle. These NB ballot comments are against the C++26 working draft. That's a timeline issue, not a design purity issue.

u/infallible_or_bust 19 points 1 hr. ago

Okay, that's the one thing I'll concede. The design is correct. The timeline is the problem. If the infallibility constraint ships in C++26 and the relaxation mechanism doesn't arrive until C++29, then parallel_scheduler users have three years without a specified path forward. That's a gap in the committee's work plan, not in the paper's logic. Worth tracking.

u/stdexec_implementer 54 points 3 hr. ago

The change_coroutine_scheduler removal is getting less attention than it deserves. The correctness argument is real: if you change the scheduler mid-coroutine, objects created before the change are destroyed on the new scheduler instead of the one they were constructed on. But the recommended replacement isn't equivalent for all use cases.

// Old (removed) — flat sequential code across schedulers:
co_await change_coroutine_scheduler(io_sched);
auto a = co_await operation_a();
auto b = co_await operation_b();
co_await change_coroutine_scheduler(original_sched);

// New (recommended) — structurally correct, but:
auto [a, b] = co_await on(io_sched, []() -> task<std::pair<A,B>> {
    auto a = co_await operation_a();
    auto b = co_await operation_b();
    co_return {a, b};
}());

The nested version scopes the scheduler change properly. But it forces you to restructure flat sequential code into nested lambdas, which interacts with local variable capture, error propagation, and coroutine stack depth. A flat coroutine doing a dozen operations across two or three schedulers becomes a tree of lambdas. Not wrong. But the migration path for existing code that used change_coroutine_scheduler is non-trivial, and the paper doesn't provide a guide.
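
For anyone who wants the destructor problem in miniature, here is a toy model with no coroutines at all. Everything in it is hypothetical: the "current scheduler" is just a global string, toy_on stands in for on(sch, nested-task), and flat_style models the case where the coroutine exits (early return, exception) before switching back.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model: the "current scheduler" is a global string, not a real one.
inline std::string current_sched = "main";
inline std::vector<std::string> dtor_log;

// Records where it was constructed and where it was destroyed.
struct tracked {
    std::string born_on = current_sched;
    ~tracked() { dtor_log.push_back(born_on + "->" + current_sched); }
};

// Flat style: switch schedulers mid-scope and leave before switching back.
inline void flat_style() {
    current_sched = "main";
    tracked t;             // constructed on "main"
    current_sched = "io";  // models co_await change_coroutine_scheduler(io)
}                          // t is destroyed while "io" is current

// Nested style: the switch is scoped to the nested work and then undone.
template <class F>
void toy_on(const std::string& sch, F nested) {
    assert(!sch.empty());
    const std::string saved = current_sched;
    current_sched = sch;
    nested();
    current_sched = saved;
}

inline void nested_style() {
    current_sched = "main";
    tracked t;                             // constructed on "main"
    toy_on("io", [] { tracked inner; });   // inner lives and dies on "io"
}                                          // t is destroyed back on "main"
```

After flat_style() the log holds "main->io", the mismatch the paper objects to. After nested_style() it holds "io->io" followed by "main->main". The toy says nothing about the capture and error-propagation costs of nesting, which are the real migration pain.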

u/constexpr_coroutine_throwaway 7 points 2 hr. ago

I originally read this as change_coroutine_scheduler being fixed rather than removed, which changed my whole take. Ignore what I said in the other thread.

Edit: Also misread the infallibility requirement as applying to the coroutine rather than the scheduler. This is what skimming at midnight produces. My analysis was wrong in two independent ways simultaneously.

u/template_archaeologist 89 points 4 hr. ago

affine_on is genuinely one of the worst names in the proposal. 'Affine' in type theory means a resource that can be used at most once. 'Affine' in geometry means a linear transform without the origin constraint. Here it means 'has affinity to a scheduler.' The Venn diagram of people who know what scheduler affinity means and people who will be confused by the word 'affine' is precisely the LEWG attendee list.

u/daily_lewg_watcher 34 points 3 hr. ago

The paper lists it as an open issue:

The name affine_on is not ideal, and better suggestions would be welcome.

So everyone knows. There is no replacement proposal. WG21 naming discussions are their own sub-genre of infinite regress — see also: mdspan, stop_source, every ranges:: adapter.

u/affinity_skeptic_embedded 37 points 3 hr. ago

One angle nobody's hit yet: the infallibility check is purely static. It verifies completion signatures at compile time. A scheduler that wraps a fallible thread pool but lies about its completion signatures — advertising no set_error in its type while calling set_error at runtime — passes the check and violates the contract silently.

This isn't unique to this paper; the entire sender/receiver model trusts that types accurately describe their behavior. But for affine_on specifically, the infallibility guarantee is load-bearing in a way that other sender properties aren't. If it fails you don't get a compile error or a runtime exception — you get silent execution on the wrong scheduler. In embedded contexts where every execution-context transition gets audited, I'd want a debug-mode runtime assertion path, not just a type-level check. The paper doesn't mention this failure mode.
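
A minimal sketch of the kind of debug-only check I mean. affinity_check is a hypothetical helper, not anything from the paper or stdexec, and it assumes the scheduler is bound to exactly one thread (a run_loop-style context); a pool would need some other notion of identity.

```cpp
#include <cassert>
#include <thread>

// Hypothetical debug helper. Assumes one scheduler == one thread.
class affinity_check {
    std::thread::id expected_;

public:
    explicit affinity_check(std::thread::id expected) : expected_(expected) {}

    // True when the calling thread is the one we expected to resume on.
    bool on_expected_thread() const {
        return std::this_thread::get_id() == expected_;
    }

    // Call right after every resumption; checks nothing under NDEBUG.
    void verify() const { assert(on_expected_thread()); }
};
```

Construct it with the id of the scheduler's thread and call verify() after each resumption point. A scheduler whose type promises infallibility but which resumes elsewhere would then trip the assertion in debug builds instead of running silently on the wrong context.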

u/definitely_not_a_build_victim 63 points 2 hr. ago

requires infallible_scheduler<std::decay_t<Sch>> — I love it when correctness requirements become template constraints that generate 47 lines of error output explaining that your scheduler does not meet the infallibility criterion because its schedule sender's completion signatures include... honestly I stopped reading at line 23. The constraint is right. The diagnostic is going to be spectacular.

u/daily_lewg_watcher 28 points 47 minutes ago

Coming back to this after thinking about it more: the get_start_scheduler addition is the quietest change in the paper but probably the most load-bearing one for future compatibility. Every scheduler-aware algorithm that needs to answer 'where did this operation originate' will depend on this query. Currently get_scheduler serves two conflicting purposes — 'post new work here' and 'this operation was started here' — and the conflation has caused problems across multiple P2300 follow-on papers.
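
A toy illustration of why two queries are easier to reason about than one. The names toy_env, toy_get_scheduler, toy_get_start_scheduler, and the two helper functions are all made up, and the "pool"/"ui" values are invented; the real queries operate on environments and return schedulers, not strings.

```cpp
#include <string_view>

// Hypothetical environment carrying both answers separately.
struct toy_env {
    std::string_view scheduler;        // "where should new work go?"
    std::string_view start_scheduler;  // "where was this operation started?"
};

constexpr std::string_view toy_get_scheduler(const toy_env& e) {
    return e.scheduler;
}
constexpr std::string_view toy_get_start_scheduler(const toy_env& e) {
    return e.start_scheduler;
}

// Resuming after an await wants to go back to where the operation started.
constexpr std::string_view resume_target(const toy_env& e) {
    return toy_get_start_scheduler(e);
}

// Spawning child work wants the "post new work here" answer.
constexpr std::string_view spawn_target(const toy_env& e) {
    return toy_get_scheduler(e);
}
```

With an environment like toy_env{"pool", "ui"}, resume_target gives "ui" and spawn_target gives "pool". A single query would have to return one of the two for both callers, which is the conflation described above.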

Adding a distinct query now, before more algorithms take a dependency on the ambiguous behavior, is the right call even if it looks like a footnote in this revision. Future-Dietmar will thank current-Dietmar.