r/wg21
P3864R1 — Correctly rounded floating-point maths functions WG21
Posted by u/numerics_nerd_404 · 5 hr. ago

Document: P3864R1

Authors: Guy Davidson, Jan Schultke

Date: 2026-02-22

Audience: SG6 (Numerics), SG22 (Compatibility)

Five new standard library functions — cr_add, cr_sub, cr_mul, cr_div, cr_sqrt — that guarantee IEEE 754 roundTiesToEven regardless of the current floating-point environment. They live in <cmath>, are constexpr noexcept, and only apply to types where numeric_limits<T>::is_iec559 is true. The naming follows a reservation in C’s Annex F that has been sitting unused for roughly twenty-five years.

The problem is narrow but real: if your code runs in a non-default FP rounding mode (interval arithmetic, rigorous error bounds, directed-rounding loops), you currently have to save the mode, switch to roundTiesToEven, compute, and restore by hand. Every library function you call that changes the mode carries the same obligation to restore it, or it silently corrupts the caller’s environment. cr_add(x, y) says "I want roundTiesToEven here, always, regardless of MXCSR." The caller’s rounding mode is irrelevant.

For the 99% of code that never touches fesetround, cr_add(x, y) compiles to exactly x + y. Either you need it — in which case it’s genuinely useful — or you don’t, and you’ll never notice it.

This is a companion to P3375 (reproducible floating-point results), which tackles the same design space at a higher level. P3864 is the narrower, more surgical proposal: five functions, one rounding guarantee, currently targeting SG6 and SG22. Canonical paper: https://wg21.link/p3864r1

▲ 47 points (81% upvoted) · 23 comments
u/r_wg21_janitor 1 point 5 hr. ago · Moderator

Reminder: be civil. The paper authors sometimes read these threads.

u/just_an_app_dev_lol 89 points 4 hr. ago

So cr_add(x, y) is x + y with a longer name? I want to make sure I understand what we’re adding to the standard before I have opinions about it.

u/fp_mode_throwaway 134 points 4 hr. ago

Only if you’ve never called fesetround. If your code is running in roundTowardZero (interval arithmetic, rigorous bounds), x + y rounds toward zero. cr_add(x, y) rounds to nearest even, always. That’s the entire product.

u/constexpr_fp_skeptic 203 points 3 hr. ago

The parameter type must satisfy the type trait std::numeric_limits<T>::is_iec559.

Okay, noted. But then the declaration is:

constexpr iec-559-type cr_add(iec-559-type x, iec-559-type y) noexcept;

constexpr. So constexpr float z = cr_add(1.0f/3.0f, 2.0f/3.0f); must produce a roundTiesToEven result. The paper doesn’t say how. C++ constant evaluation of floating-point has no mandated rounding mode — it’s whatever the compiler’s constant evaluator does. On x86 hosts the FPU defaults to roundTiesToEven, so this works in practice. But the paper’s guarantee collapses if you compile on a host with a non-default FP environment, or if a future implementation’s constant evaluator uses a different strategy.

The guarantee is stronger at runtime (you can enforce the mode or use hardware support) than at compile time (you’re hoping the evaluator’s default matches). That asymmetry should at least be noted in the wording.

Edit: I looked further into constant evaluation of floating-point. Proposals like P3375 are pushing more numerical work into constant evaluation, so the gap I’m describing won’t stay hypothetical for long.

Edit2: apparently this is already a concern that SG6 has discussed for other proposals. Probably on their radar. Still worth explicit wording.

u/template_wizard_throwaway 12 points 3 hr. ago

Nobody doing numerics uses constexpr floats. That’s what compile-time integers are for. Non-issue.

u/actually_reads_proposals 67 points 2 hr. ago

The concern is valid but the fix is simple wording: specify that constant evaluation of these functions must also produce roundTiesToEven results. Most implementations will do the right thing anyway because the host FPU defaults to roundTiesToEven and the constant evaluator uses it.

The more interesting question is whether constexpr is even useful here. The main ergonomic win is at runtime — you’re calling this precisely because the FP environment is in some non-default state. At compile time the FP environment is whatever the compiler decides, so the function’s guarantee is doing different work.

u/not_a_rust_evangelist_i_promise 71 points 4 hr. ago

Rust sidesteps this entire problem class by not exposing a mutable global FP rounding mode. You can debate whether that’s a feature or a limitation, but it does mean Rust programmers will never need to write this paper.

u/embedded_for_20_years 44 points 3 hr. ago

Rust also can’t easily interface with numerical code that depends on directed rounding. LAPACK, interval arithmetic libraries, certified numerical solvers — these exist in C and Fortran and need interop. Different trade-off, not clearly better for scientific computing.

u/compiler_codegen_realist 58 points 3 hr. ago

Thinking through what the codegen looks like for the directed-rounding use case on x86-64.

Basic FP operations don’t have per-instruction rounding mode encoding on x86 without AVX-512 EVEX — you change MXCSR globally. So if your code is running in roundTowardZero and calls cr_add, the implementation must: save MXCSR → set roundTiesToEven → do the add → restore MXCSR. Three memory operations per arithmetic call.

That’s correct behavior. It’s also what you’d write by hand today. The ergonomics win is real. But the paper presents this as purely a library ergonomics improvement without discussing implementation cost, which will be the first question from anyone shipping performance-sensitive numerical code in SG6.

u/interval_arithmetic_enjoyer 82 points 2 hr. ago

The use case isn’t "mix one correctly-rounded add into performance-critical code." It’s directed-rounding loops where you need a handful of roundTiesToEven operations per iteration — Kahan summation correction steps, certified error bound computations, the correction term in a two-sum algorithm.

Also: RISC-V has rounding mode bits in the instruction encoding, per-instruction. cr_add on RV64GC compiles to a single fadd.s rm=rne — zero overhead. AArch64’s FPCR is per-thread and context-switch safe. The save-MXCSR path is a quality-of-implementation issue on x86, not a fundamental constraint.

The paper targets IEC 559 types generically and wisely leaves implementation strategy to QoI. Whether GCC/Clang/MSVC will do the smart thing on each target is a separate question from whether the abstraction is correct.

u/compiler_codegen_realist 49 points 2 hr. ago

Fair point on RISC-V: the rm field is encoded per instruction (the frm CSR is only consulted when an instruction selects the dynamic mode), so a compiler targeting RV64GC can emit fadd.s with rne directly. Zero-overhead abstraction on that ISA.

My concern is that the paper says nothing about implementation expectations on ISAs that can’t do this, and the first question in SG6 will be "what does this compile to on x86." Answering "save-restore MXCSR" is correct, but saying it explicitly in the paper would preempt half the discussion.

u/interval_arithmetic_enjoyer 41 points 1 hr. ago

Answering "save-restore MXCSR" is correct, but saying it explicitly in the paper would preempt half the discussion.

Agreed on that narrow point. There’s also an argument the paper is missing: cr_* calls are visible to the optimizer in a way that fesetround calls are not. The compiler must assume fesetround invalidates all FP assumptions — it’s an opaque call to an external function with side effects on a global. But cr_add has known semantics, and a sufficiently smart compiler could batch the mode changes across a sequence of adjacent cr_* calls, doing one save and one restore for the whole block instead of one per operation.

Nobody will implement this in the first five years. But it’s the right abstraction for eventual optimization, and it’s a better argument for the library approach than “ergonomics.”

u/compiler_codegen_realist 37 points 47 minutes ago

The optimizer visibility argument is the one I hadn’t considered. With fesetround the optimizer sees an opaque call and has to fence all FP operations around it. With cr_add as a known intrinsic it could in principle fold adjacent cr_* calls into a single mode-switch bracket. That’s genuinely better than the hand-written alternative, not just ergonomically equivalent.

Okay. I’ve talked myself into liking this more than I did twenty minutes ago. Still think the paper should add a note on x86 codegen expectations.

u/fesetround_victim 178 points 4 hr. ago

I have been writing _controlfp_s and fesetround wrappers with RAII save-restore guards in our interval arithmetic library for ten years. This paper is personally meaningful to me. Tracking every revision.

u/c_library_archaeologist 241 points 5 hr. ago

C’s Annex F has reserved the cr_ prefix for correctly-rounded functions since C99. They were supposed to arrive eventually. Twenty-five years later C++ might actually ship them. The standardization timeline is a rich tapestry.

u/core_math_project_watcher 93 points 4 hr. ago

CORE-MATH (https://core-math.gitlabpages.inria.fr/) has open-source C implementations of correctly-rounded math functions, benchmarked against glibc. For basic operations like addition and multiplication, correctly-rounded implementations can be within 10–20% of the speed of non-guaranteed implementations on modern hardware — often less on hardware that supports directed rounding natively.

The paper doesn’t reference it, which is a missed opportunity. SG6 will ask “what does a good implementation look like” and the answer already exists, is public, and is actively maintained.

u/embedded_fp_veteran 56 points 2 hr. ago

Working on deterministic sensor fusion on bare-metal Cortex-M4. The FPSCR rounding mode field exists on M4F but touching it is expensive — pipeline drain. A compiler that could implement cr_add inline without touching FPSCR by emitting the right instruction sequence would actually matter to us.

The is_iec559 gate is where we hit a wall. Cortex-M4 FPU is a partial IEC 559 implementation — no signaling NaN handling, no denormals in the hardware unit on M4 (DAZ/FTZ always on). So we’re in the “use the software fallback” camp regardless of what the compiler tries to do.

Not blocking the paper. The right long-term answer is for embedded FPU vendors to improve IEC 559 conformance, not to weaken is_iec559. Just noting that the deployment picture on embedded is more complicated than the paper implies.

u/process_cynic_supreme 143 points 4 hr. ago

SG6 reviews this. SG22 signs off on C compatibility of the cr_ naming. It goes to LEWG for API review. LWG for wording. Plenary. If we’re lucky it ships in C++29.

committee gonna committee

u/eternal_standardization_watch 72 points 3 hr. ago

C++29 is optimistic. C2y might reserve cr_ for something slightly different than what P3864 expects, SG22 will want a revision, LEWG will want the naming revisited, and then—