Document: P3864R1
Authors: Guy Davidson, Jan Schultke
Date: 2026-02-22
Audience: SG6 (Numerics), SG22 (Compatibility)
The paper proposes five new standard library functions — cr_add, cr_sub, cr_mul, cr_div, cr_sqrt — that guarantee IEEE 754 roundTiesToEven regardless of the current floating-point environment. They live in <cmath>, are constexpr noexcept, and only apply to types where numeric_limits<T>::is_iec559 is true. The naming follows a reservation in C’s Annex F that has been sitting unused for roughly twenty-five years.
The problem is narrow but real: if your code runs in a non-default FP rounding mode — interval arithmetic, rigorous error bounds, directed-rounding loops — you currently have to manually save the mode, compute, and restore. Every library function you call carries the same obligation or it silently corrupts the caller’s environment. cr_add(x, y) says "I want roundTiesToEven here, always, regardless of MXCSR." The caller’s rounding mode is irrelevant.
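The manual save/compute/restore discipline described above can be sketched as follows (the function name is my own; strictly conforming code that toggles the rounding mode would also enable #pragma STDC FENV_ACCESS):

```cpp
#include <cfenv>

// Sketch of today's hand-written obligation: save the caller's rounding
// mode, compute under roundTiesToEven, restore. This is exactly what
// cr_add(x, y) would absorb into one call.
double add_round_to_nearest(double x, double y) {
    const int saved = std::fegetround();  // caller may be in FE_TOWARDZERO
    std::fesetround(FE_TONEAREST);
    const double result = x + y;
    std::fesetround(saved);               // restore the caller's environment
    return result;
}
```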
For the 99% of code that never touches fesetround, cr_add(x, y) compiles to exactly x + y. Either you need it — in which case it’s genuinely useful — or you don’t, and you’ll never notice it.
This is a companion to P3375 (reproducible floating-point results), which tackles the same design space at a higher level. P3864 is the narrower, more surgical proposal: five functions, one rounding guarantee, currently targeting SG6 and SG22. Canonical paper: https://wg21.link/p3864r1
Reminder: be civil. The paper authors sometimes read these threads.
So cr_add(x, y) is x + y with a longer name? I want to make sure I understand what we’re adding to the standard before I have opinions about it.

Only if you’ve never called fesetround. If your code is running in roundTowardZero (interval arithmetic, rigorous bounds), x + y rounds toward zero. cr_add(x, y) rounds to nearest even, always. That’s the entire product.

Okay, noted. But then the declaration is constexpr. So constexpr float z = cr_add(1.0f/3.0f, 2.0f/3.0f); must produce a roundTiesToEven result. The paper doesn’t say how. C++ constant evaluation of floating-point has no mandated rounding mode — it’s whatever the compiler’s constant evaluator does. On x86 hosts the FPU defaults to roundTiesToEven, so this works in practice. But the paper’s guarantee collapses if you compile on a host with a non-default FP environment, or if a future implementation’s constant evaluator uses a different strategy. The guarantee is stronger at runtime (you can enforce the mode or use hardware support) than at compile time (you’re hoping the evaluator’s default matches). That asymmetry should at least be noted in the wording.
Edit: looked more into the consteval FP space — proposals like P3375 are pushing more numerical work into constant evaluation. The gap I’m describing isn’t hypothetical for long.
Edit2: apparently this is already a concern that SG6 has discussed for other proposals. Probably on their radar. Still worth explicit wording.
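The asymmetry can be made concrete with plain +, since cr_add does not exist yet; the static_assert below holds on implementations whose constant evaluator rounds to nearest even, which is every mainstream one today:

```cpp
// Compile-time evaluation uses whatever rounding the constant evaluator
// applies; at runtime the same expression follows the current FP
// environment. Under roundTiesToEven, the two nearest-float thirds
// (0.33333334f and 0.6666667f) sum to a value that rounds to exactly 1.0f.
constexpr float compile_time_sum = 1.0f / 3.0f + 2.0f / 3.0f;
static_assert(compile_time_sum == 1.0f,
              "constant evaluator rounded to nearest even");
```

Under roundTowardZero at runtime, the same addition can land one ulp below 1.0f, which is the divergence the wording would need to rule out.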
Nobody doing numerics uses constexpr floats. That’s what compile-time integers are for. Non-issue.

The concern is valid but the fix is simple wording: specify that constant evaluation of these functions must also produce roundTiesToEven results. Most implementations will do the right thing anyway, because the host FPU defaults to roundTiesToEven and the constant evaluator uses it.

The more interesting question is whether constexpr is even useful here. The main ergonomic win is at runtime — you’re calling this precisely because the FP environment is in some non-default state. At compile time the FP environment is whatever the compiler decides, so the function’s guarantee is doing different work.

Rust sidesteps this entire problem class by not exposing a mutable global FP rounding mode. You can debate whether that’s a feature or a limitation, but it does mean Rust programmers will never need to write this paper.
Rust also can’t easily interface with numerical code that depends on directed rounding. LAPACK, interval arithmetic libraries, certified numerical solvers — these exist in C and Fortran and need interop. Different trade-off, not clearly better for scientific computing.
There it is. Comment four.
Thinking through what the codegen looks like for the directed-rounding use case on x86-64.
Basic FP operations don’t have per-instruction rounding mode encoding on x86 without AVX-512 EVEX — you change MXCSR globally. So if your code is running in roundTowardZero and calls cr_add, the implementation must: save MXCSR → set roundTiesToEven → do the add → restore MXCSR. Three memory operations per arithmetic call.

That’s correct behavior. It’s also what you’d write by hand today. The ergonomics win is real. But the paper presents this as purely a library ergonomics improvement without discussing implementation cost, which will be the first question from anyone shipping performance-sensitive numerical code in SG6.
The use case isn’t "mix one correctly-rounded add into performance-critical code." It’s directed-rounding loops where you need a handful of roundTiesToEven operations per iteration — Kahan summation correction steps, certified error bound computations, the correction term in a two-sum algorithm.
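The two-sum correction term mentioned above is a concrete example: Knuth's TwoSum recovers the exact rounding error of an addition, but it is only valid when every operation rounds to nearest, which is precisely the guarantee cr_add would pin down inside a directed-rounding loop (struct and function names here are mine):

```cpp
// Knuth's TwoSum: s is the rounded sum, e the exact rounding error, so
// a + b == s + e exactly, provided each operation rounds to nearest even.
// In a directed-rounding loop, these additions are the cr_add call sites.
struct TwoSum { double s, e; };

TwoSum two_sum(double a, double b) {
    const double s  = a + b;
    const double bp = s - a;          // portion of b that landed in s
    const double ap = s - bp;         // portion of a that landed in s
    const double e  = (a - ap) + (b - bp);
    return {s, e};
}
```

For example, two_sum(1.0, 1e-20) yields s == 1.0 and e == 1e-20, recovering the low-order bits the rounded sum discarded.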
Also: RISC-V has rounding mode bits in the instruction encoding, per-instruction. cr_add on RV64GC compiles to a single fadd.s with rm=rne — zero overhead. AArch64’s FPCR is per-thread and context-switch safe. The save-MXCSR path is a quality-of-implementation issue on x86, not a fundamental constraint.

The paper targets IEC 559 types generically and wisely leaves implementation strategy to QoI. Whether GCC/Clang/MSVC will do the smart thing on each target is a separate question from whether the abstraction is correct.

Fair point on RISC-V — the frm encoding is per-instruction and a compiler targeting RV64GC can emit fadd.s with rm=rne directly. Zero-overhead abstraction on that ISA.

My concern is that the paper says nothing about implementation expectations on ISAs that can’t do this, and the first question in SG6 will be “what does this compile to on x86.” Answering “save-restore MXCSR” is correct, but saying it explicitly in the paper would preempt half the discussion.
Agreed on that narrow point. There’s also an argument the paper is missing: cr_* calls are visible to the optimizer in a way that fesetround calls are not. The compiler must assume fesetround invalidates all FP assumptions — it’s an opaque call to an external function with side effects on a global. But cr_add has known semantics, and a sufficiently smart compiler could batch the mode changes across a sequence of adjacent cr_* calls, doing one save and one restore for the whole block instead of one per operation.

Nobody will implement this in the first five years. But it’s the right abstraction for eventual optimization, and it’s a better argument for the library approach than “ergonomics.”

The optimizer visibility argument is the one I hadn’t considered. With fesetround the optimizer sees an opaque call and has to fence all FP operations around it. With cr_add as a known intrinsic it could in principle fold adjacent cr_* calls into a single mode-switch bracket. That’s genuinely better than the hand-written alternative, not just ergonomically equivalent.

Okay. I’ve talked myself into liking this more than I did twenty minutes ago. Still think the paper should add a note on x86 codegen expectations.
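That single-bracket idea is also available today as a hand-written library pattern; a sketch of the bracketing an optimizer could in principle synthesize for adjacent cr_* calls (class and function names are mine, not from the paper):

```cpp
#include <cfenv>

// One save and one restore around a whole block of operations that all
// want roundTiesToEven, instead of one save/restore pair per operation.
class RoundToNearestGuard {
    int saved_;
public:
    RoundToNearestGuard() : saved_(std::fegetround()) {
        std::fesetround(FE_TONEAREST);
    }
    ~RoundToNearestGuard() { std::fesetround(saved_); }
};

double sum3_to_nearest(double a, double b, double c) {
    RoundToNearestGuard guard;  // one mode switch for the whole block
    return a + b + c;           // both additions round to nearest even
}
```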
I have been writing _controlfp_s and fesetround wrappers with RAII save-restore guards in our interval arithmetic library for ten years. This paper is personally meaningful to me. Tracking every revision.

C’s Annex F has reserved the cr_ prefix for correctly-rounded functions since C99. They were supposed to arrive eventually. Twenty-five years later C++ might actually ship them. The standardization timeline is a rich tapestry.

CORE-MATH (https://core-math.gitlabpages.inria.fr/) has open-source C implementations of correctly-rounded math functions, benchmarked against glibc. For basic operations like addition and multiplication, correctly-rounded implementations can be within 10–20% of the speed of non-guaranteed implementations on modern hardware — often less on hardware that supports directed rounding natively.
The paper doesn’t reference it, which is a missed opportunity. SG6 will ask “what does a good implementation look like” and the answer already exists, is public, and is actively maintained.
Working on deterministic sensor fusion on bare-metal Cortex-M4. The FPSCR rounding mode field exists on M4F but touching it is expensive — pipeline drain. A compiler that could implement cr_add inline without touching FPSCR by emitting the right instruction sequence would actually matter to us.

The is_iec559 gate is where we hit a wall. The Cortex-M4 FPU is a partial IEC 559 implementation — no signaling NaN handling, no denormals in the hardware unit on M4 (DAZ/FTZ always on). So we’re in the “use the software fallback” camp regardless of what the compiler tries to do.

Not blocking the paper. The right long-term answer is for embedded FPU vendors to improve IEC 559 conformance, not to weaken is_iec559. Just noting that the deployment picture on embedded is more complicated than the paper implies.

[removed by moderator]
What did they say?
Something about mastering floating-point in 30 days. The usual.
SG6 reviews this. SG22 signs off on C compatibility of the cr_ naming. It goes to LEWG for API review. LWG for wording. Plenary. If we’re lucky it ships in C++29.

committee gonna committee

C++29 is optimistic. C2y might reserve cr_ for something slightly different than what P3864 expects, SG22 will want a revision, LEWG will want the naming revisited, and then—