r/wg21
P3876R1 - Extending <charconv> support to more character types WG21
Posted by u/charconv_gap_watcher · 5 hr. ago

Document: P3876R1

Authors: Jan Schultke, Peter Bindels

Date: 2026-02-22

Audience: SG16 (Unicode and text)

std::to_chars and std::from_chars have existed since C++17, and they are genuinely great — fast, locale-independent, no allocation, no exceptions. There is exactly one problem: they only accept char*. Want to serialize a number into a std::u8string? Want to parse one from a wchar_t buffer on Windows? You either reinterpret_cast your way toward UB or bounce through a temporary char buffer and copy. Neither is great.

P3876R1 proposes adding function template overloads for all five character types: char (the existing overloads, unchanged), wchar_t, char8_t, char16_t, and char32_t. The key insight that makes this clean: every character to_chars ever emits (digits 0–9, letters a–z, the plus and minus signs, a decimal point) lives in the Basic Latin / ASCII block (U+0000–U+007F). Unicode encodings guarantee that code units for non-ASCII code points are always ≥ 0x80, so from_chars can treat any such code unit as a non-matching character and stop there, without any encoding-specific logic whatsoever.
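To make the "no encoding-specific logic" point concrete, here is a minimal sketch of a digit loop that works for any character type. The names parse_result and parse_decimal are made up for illustration and are not from the paper; the sketch assumes an ASCII-compatible encoding and skips overflow checking.

```cpp
#include <cstddef>

// Hypothetical result type for this sketch; not a name from the paper.
template <class CharT>
struct parse_result {
    const CharT* ptr;  // first code unit that was not consumed
    unsigned value;    // digits accumulated so far (overflow is not checked)
};

// Parses leading decimal digits from [first, last) for any character type.
// Assumes an ASCII-compatible encoding, so '0'..'9' are code units 0x30..0x39.
template <class CharT>
constexpr parse_result<CharT> parse_decimal(const CharT* first, const CharT* last) {
    unsigned value = 0;
    // Any code unit outside '0'..'9' ends the loop. That includes every code
    // unit >= 0x80, so non-ASCII text needs no special handling.
    while (first != last && *first >= CharT('0') && *first <= CharT('9')) {
        value = value * 10 + static_cast<unsigned>(*first - CharT('0'));
        ++first;
    }
    return {first, value};
}
```

Called on the UTF-16 text u"123\u00E9", the loop consumes the three digits and stops at the é code unit without ever decoding it.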

The design has one awkward corner: to_chars_result cannot be turned into a class template without breaking ABI and aggregate-initialization syntax for all existing code. The paper's solution is to add four new named result structs (u8to_chars_result, u16to_chars_result, u32to_chars_result, wto_chars_result) plus alias templates (to_chars_result_t<T>) so generic code stays clean. The alias maps char to the existing to_chars_result — no type identity breakage for old code.

SG16 has been tracking this gap as issue #38. This revision also coordinates with P3652R1 (constexpr floating-point charconv) and supersedes LWG4522.

▲ 54 points (92% upvoted) · 27 comments
u/r_cpp_janitor 1 point 5 hr. ago · Moderator

Reminder: paper authors sometimes read these threads. Keep discussion technical and constructive.

u/windows_wchar_survivor 112 points 4 hr. ago

If you have ever written Windows code interfacing with LPCWSTR APIs and tried to serialize a number into that buffer without an extra allocation — you understand exactly why this paper exists.

// Before: the copy dance
wchar_t buf[32];
char tmp[32];
auto [end, ec] = std::to_chars(tmp, tmp + 32, value);
std::copy(tmp, end, buf); // fine-ish but annoying

// After: just works
wchar_t buf[32];
auto [end, ec] = std::to_chars(buf, buf + 32, value);

The reinterpret_cast<wchar_t*>(char_buf) version is what I see in codebases that gave up. This paper makes it unnecessary.
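For anyone stuck on C++17 today, the copy dance can at least be wrapped once. A minimal sketch follows; int_to_wchars is a hypothetical helper of my own, not anything from the paper, and it assumes the wide encoding is ASCII-compatible (fine for UTF-16 on Windows, not for EBCDIC targets).

```cpp
#include <charconv>
#include <cstddef>
#include <system_error>

// Hypothetical helper: formats an int into a wchar_t buffer by going through
// a temporary char buffer. Returns the number of wide characters written, or
// 0 if the output range is too small.
// Assumption: each ASCII code unit widens to the same value in wchar_t.
inline std::size_t int_to_wchars(wchar_t* first, wchar_t* last, int value) {
    char tmp[24];  // comfortably fits any int up to 64 bits in base 10
    const std::to_chars_result r = std::to_chars(tmp, tmp + sizeof tmp, value);
    if (r.ec != std::errc{}) return 0;
    const std::size_t n = static_cast<std::size_t>(r.ptr - tmp);
    if (n > static_cast<std::size_t>(last - first)) return 0;
    for (std::size_t i = 0; i < n; ++i)
        first[i] = static_cast<wchar_t>(tmp[i]);  // widen each ASCII code unit
    return n;
}
```

It still pays for the temporary buffer and the per-character widening, which is exactly the overhead a direct wchar_t overload would remove.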

u/not_a_com_programmer 41 points 4 hr. ago

That is every Windows API ever. You are not alone.

u/overload_resolution_victim 97 points 4 hr. ago

"A non-template approach would yield an absurd overload set of 110 overloads"

I love that the argument for using templates here is not "templates are elegant" but "the alternative is literally 110 function declarations that grow to 300 with <stdfloat>." Sometimes the committee gets there.

u/char8t_skeptic_throwaway 9 points 3 hr. ago

Wait, why do we have char8_t again? I thought char with UTF-8 source was already fine.

u/sg16_actual_follower 34 points 3 hr. ago

char does not guarantee UTF-8. char8_t does: it is explicitly the type for UTF-8 encoded data. Without it you cannot distinguish a byte array from a UTF-8 string at the type level. P0482 covers the full motivation if you want the deep dive.

u/char8t_skeptic_throwaway -3 points 3 hr. ago

skill issue

u/api_surface_auditor 178 points 3 hr. ago

The result-type design in section 3.5 is worth slowing down on. The paper adds eight new named types — u8to_chars_result, u16to_chars_result, u32to_chars_result, wto_chars_result, and four matching from_chars variants. That looks like an API surface explosion on first read.

The steelman: you cannot make to_chars_result a class template without breaking existing code. The C++17 API is out in the wild with aggregate initialization — to_chars_result{ptr, ec} — and name-mangled into compiled object files. Making it an alias for basic_to_chars_result<char> would change the type identity in every TU that has already compiled a to_chars call. That is an ABI break with no opt-out.

The insight that makes the paper's approach actually work: to_chars_result_t<char> maps to the existing to_chars_result. Not a new type. Not a structural alias for a new type. The actual same concrete struct. This means:

  1. Generic code writing auto r = std::to_chars<CharT>(buf, buf + N, value) and storing the result as to_chars_result_t<CharT> compiles cleanly for all five character types.
  2. Existing code that names to_chars_result directly sees no change whatsoever.
  3. You cannot accidentally mix result types from different character widths — the type system catches it.

The alias template is doing exactly what a type parameter on the original struct would have done if the C++17 designers had thought ahead. It just lives in a different place in the standard.
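A rough sketch of how such a mapping could be spelled, assuming a trait-plus-alias layout. Everything inside namespace sketch is hypothetical: the struct names mirror the ones mentioned above, but the trait result_for and the exact definitions are my own guesses, not the paper's wording.

```cpp
#include <charconv>
#include <system_error>
#include <type_traits>

namespace sketch {  // every name in this namespace is hypothetical

// Stand-ins for two of the new result structs described in the thread.
struct wto_chars_result   { wchar_t*  ptr; std::errc ec; };
struct u32to_chars_result { char32_t* ptr; std::errc ec; };

// Primary template left undefined: unsupported character types fail to compile.
template <class CharT> struct result_for;
template <> struct result_for<char>     { using type = std::to_chars_result; };
template <> struct result_for<wchar_t>  { using type = wto_chars_result; };
template <> struct result_for<char32_t> { using type = u32to_chars_result; };

template <class CharT>
using to_chars_result_t = typename result_for<CharT>::type;

// The char mapping is the existing C++17 struct itself, not a look-alike.
static_assert(std::is_same_v<to_chars_result_t<char>, std::to_chars_result>);
// Results for different character types are unrelated types.
static_assert(!std::is_same_v<to_chars_result_t<wchar_t>,
                              to_chars_result_t<char32_t>>);

}  // namespace sketch
```

The first static_assert is the property that keeps old code untouched; the second is the "cannot mix widths" guarantee from point 3.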

The verdict: this is the right call given the constraints. The lesson for future API designers: template your result types from day one, even if you ship only one instantiation. The standard is paying a naming tax here because charconv did not.

"A non-template approach would yield an absurd overload set of 110 overloads"

This line alone in the design section probably saved us from a very bad day at some future mailing.

u/charconv_gap_watcher 29 points 2 hr. ago

Good breakdown. The alias identity question was the first thing I checked when I saw the new result types: if to_chars_result_t<char> were NOT the same type as the existing to_chars_result, any generic helper that already stores results in a concrete to_chars_result variable would stop compiling. The paper threads this correctly.

u/template_everything_2019 22 points 2 hr. ago

I still think they could have done:

template<class T>
struct basic_to_chars_result { T* ptr; std::errc ec; /* ... */ };

using to_chars_result = basic_to_chars_result<char>;

and made the template primary. Why not go that route instead of adding eight new named types?

u/abi_constraint_enjoyer 56 points 2 hr. ago

Section 3.5.1 addresses this directly. The problem with the base-class approach, or the using to_chars_result = basic_to_chars_result<char> approach, is that it changes the type identity of to_chars_result. After the change, to_chars_result is just another name for basic_to_chars_result<char>, so std::is_same_v<to_chars_result, basic_to_chars_result<char>> holds, but that specialization is a different type from the non-template struct that shipped in C++17, even though the two are structurally identical. Any TU that compiled against the original struct and a TU that compiled against the new alias have different mangled names for the return type of to_chars. That is an ODR violation waiting to happen at link time, and a hard ABI break for any precompiled library.

The paper is explicit: even the base-class inheritance route "technically breaks the API" in subtle ways — aggregate status, reflection behavior, type identity in overload resolution. You are not adding a supertype; you are changing what to_chars_result is.
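The type identity question is easy to check in isolation. A small sketch, with namespaces before and after as stand-ins for the header as shipped and the hypothetical "template first" redesign; neither is real standard library code.

```cpp
#include <system_error>
#include <type_traits>

namespace before {  // shape of the struct as shipped: plain, non-template
struct to_chars_result { char* ptr; std::errc ec; };
}

namespace after {   // the hypothetical "template first" redesign
template <class T>
struct basic_to_chars_result { T* ptr; std::errc ec; };
using to_chars_result = basic_to_chars_result<char>;
}

// An alias is just another name for its target: these are the same type.
static_assert(std::is_same_v<after::to_chars_result,
                             after::basic_to_chars_result<char>>);

// The original struct and the new specialization have identical members but
// are different types, so their mangled names differ.
static_assert(!std::is_same_v<before::to_chars_result, after::to_chars_result>);
```

So the alias itself is transparent. What breaks is everything that was compiled while the name still referred to the standalone struct.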

u/template_everything_2019 38 points 1 hr. ago

Wait, so the alias itself is transparent, but basic_to_chars_result<char> and the old non-template struct are different types with different mangled names. A function that takes a to_chars_result parameter then gets a different symbol depending on which header it was compiled against, even though its source never changed. So you cannot even link old and new object files together safely. That is worse than I thought.

u/abi_constraint_enjoyer 44 points 1 hr. ago

Exactly. Which is why the paper's solution is the only option that does not break anything: keep the two existing struct definitions (to_chars_result and from_chars_result) untouched, add eight new named structs for the other four character types, and provide to_chars_result_t<T> as an alias template that maps each T to the correct struct. Every type is a distinct, well-named concrete struct. Generic code uses the alias and works. Old code uses the concrete name and works. Nobody's ODR gets violated.

It is more API surface than you would want. It is also the only correct answer given the constraints.

u/[deleted] -19 points 2 hr. ago

[removed by moderator]

u/thread_archaeologist 7 points 2 hr. ago

what did they say

u/r_cpp_janitor 1 point 2 hr. ago · Moderator

Rule 3.

u/thread_archaeologist 23 points 2 hr. ago

Rust-related, I assume.

u/laughs_in_compile_times 17 points 2 hr. ago

"All characters produced and consumed by to_chars and from_chars fall in the Basic Latin (ASCII) block."

Solid paper. My one concern: are the function template instantiations going to meaningfully add to compile times in large TUs that include <charconv>? Five character types across 22+ arithmetic types is a lot of potential instantiation surface.

u/constexpr_charconv_watcher 34 points 1 hr. ago

Templates are only instantiated on use. If your TU never calls to_chars<char8_t>(...), the compiler sees the template definition and moves on. The instantiation surface only materializes if you actually exercise the new overloads. The header overhead from adding template declarations is negligible.
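A toy illustration of instantiate-on-use, in case it helps anyone else reading. digits_needed is a made-up function with nothing to do with the real charconv implementation; the point is only that the static_assert in its body is checked per instantiation.

```cpp
#include <type_traits>

// Hypothetical function template: its body only compiles for integral T.
template <class T>
constexpr int digits_needed(T value) {
    static_assert(std::is_integral_v<T>, "only instantiated for integral types");
    int n = 1;
    while (value /= 10) ++n;  // count the remaining base-10 digits
    return n;
}

// Only this use instantiates anything: digits_needed<int>.
// digits_needed<double> is never generated, so its static_assert never fires.
constexpr int five_digits = digits_needed(12345);
```

The declaration costs a parse; code generation and the body checks happen only for the specializations a TU actually names.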

u/laughs_in_compile_times 14 points 1 hr. ago

I am aware of how templates work, thank you.

u/lwg_issue_archaeologist 41 points 2 hr. ago

Worth reading section 3.4 if you track std::format. The paper explicitly supersedes LWG4522 — which Schultke also filed — about whether std::format(wformat_string<...>) transcodes through char or calls the wchar_t overload of to_chars directly. Currently the standard wording is arguably wrong (LWG4522 is the proposed fix). Once to_chars<wchar_t> exists, you call it directly and the transcoding question goes away.

The coordination risk: if LWG adopts 4522's wording before this paper clears SG16 → LEWG → LWG, you have two conflicting wording changes targeting the same paragraph. Schultke calls this out directly in the paper, but it depends on sequencing that the committee does not always get right. Something to watch.

u/sg16_actual_follower 18 points 1 hr. ago

SG16 generally moves faster on these coordination issues than the wider committee does. The realistic risk is if LWG4522 ships in C++26 and then this paper also targets C++26 — but given R1 only just appeared in the 2026-02 mailing, C++29 seems more likely. Plenty of time to sort the sequencing.

u/mainframe_dev_yes_really 33 points 1 hr. ago

Looking at section 5.2 — as someone who has actually shipped C++ on z/OS with EBCDIC: yes, the _Encode<charT>(char32_t code_point) approach from the libc++ review is the right call. On EBCDIC systems, wchar_t does not encode the Basic Latin block at the same code points as Unicode. You cannot just write static_cast<wchar_t>('+') and expect it to be U+002B PLUS SIGN — it is EBCDIC 0x4E on the wire.

The paper's implementation sketch using compile-time dispatch on the character type to invoke an EBCDIC encoder is how you actually make this work portably. It is nice to see it addressed instead of footnoted.
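For readers who have not seen the pattern, a minimal sketch of compile-time dispatch on the character type. This is my own illustration, loosely modelled on the _Encode idea, not the libc++ code or the paper's sketch. The assumption it leans on is that ordinary and wide character literals are encoded by the compiler in the target's literal encoding.

```cpp
#include <cstddef>
#include <type_traits>

namespace sketch {  // hypothetical names, loosely modelled on the _Encode idea

// The characters to_chars can emit, listed by Unicode code point.
inline constexpr char32_t code_points[] = U"0123456789abcdefghijklmnopqrstuvwxyz+-.";
// The same characters as ordinary and wide literals. The compiler encodes these
// in the literal encoding of the target, so on an EBCDIC platform '+' is 0x4E.
inline constexpr char    narrow[] =  "0123456789abcdefghijklmnopqrstuvwxyz+-.";
inline constexpr wchar_t wide[]   = L"0123456789abcdefghijklmnopqrstuvwxyz+-.";

template <class CharT>
constexpr CharT encode(char32_t cp) {
    if constexpr (std::is_same_v<CharT, char> || std::is_same_v<CharT, wchar_t>) {
        // Look the code point up and return the matching platform literal.
        for (std::size_t i = 0; code_points[i] != 0; ++i) {
            if (code_points[i] == cp) {
                if constexpr (std::is_same_v<CharT, char>) return narrow[i];
                else return wide[i];
            }
        }
        return CharT();  // not a character this sketch knows about
    } else {
        // char16_t and char32_t (and char8_t in C++20): Basic Latin code
        // points are single code units with the same numeric value.
        return static_cast<CharT>(cp);
    }
}

}  // namespace sketch
```

On an ASCII-based platform both branches give the same numbers; the table lookup only changes the result where the literal encoding is something else.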

u/charconv_gap_watcher 11 points 58 minutes ago

Appreciate the confirmation from someone who has actually operated in that environment. The paper notes the libc++ EBCDIC path was reviewed separately — good to know the approach holds up.

u/networking_still_not_in_std 104 points 1 hr. ago

Great paper. Now when do we get <networking> so I can actually send the numbers I just serialized somewhere?

u/api_surface_auditor 48 points 47 minutes ago

One thing I missed in my earlier comment: the floating-point overloads have a conditional constexpr dependency on P3652R1. Integer conversion is unconditionally constexpr for all five character types (that already holds for char). But float conversion is constexpr for char only after P3652R1 ships, and the new template overloads follow the same condition.

If P3652R1 stalls, you end up with a split API: integer to_chars is constexpr across all char types, float to_chars is not constexpr for any of them. That is probably fine — it mirrors the existing state — but it means the "upgrade path" for constexpr float formatting in non-char contexts depends on two papers landing in the right order. Worth tracking if you care about compile-time number formatting.

Edit: the paper does acknowledge this dependency explicitly in section 3.6. Not a hidden issue, just easy to miss on first read.