New: iter_graphemes() #165

jquast · 2026-01-14T22:53:01Z

Add iter_graphemes() function for Unicode grapheme cluster iteration following UAX #29. Enables segmentation of "user-perceived" characters: emoji sequences, combining marks, regional indicators, Indic conjuncts.

- New file grapheme.py contains the core algorithm.
- New file table_grapheme.py is auto-generated by bin/update-tables.py.
- New file bisearch.py is extracted from wcwidth.py so that it can also be shared with grapheme.py.
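For readers unfamiliar with the table lookup, here is a minimal sketch of a binary search over sorted, non-overlapping codepoint ranges. The name `range_bisearch` and the two-row `TOY_TABLE` are hypothetical, chosen for illustration; this is not the code in bisearch.py, and the real tables are generated from Unicode data files.

```python
def range_bisearch(codepoint, table):
    """Return 1 if codepoint lies in any (start, end) range of table, else 0.

    Assumes table is sorted by start and its ranges do not overlap.
    """
    lo, hi = 0, len(table) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        start, end = table[mid]
        if codepoint > end:
            lo = mid + 1      # target is to the right of this range
        elif codepoint < start:
            hi = mid - 1      # target is to the left of this range
        else:
            return 1          # start <= codepoint <= end
    return 0


# Toy table for illustration only (not the generated table):
# the Combining Diacritical Marks block and the Variation Selectors block.
TOY_TABLE = (
    (0x0300, 0x036F),
    (0xFE00, 0xFE0F),
)

is_mark = range_bisearch(0x0301, TOY_TABLE)      # combining acute accent
is_letter = range_bisearch(ord('e'), TOY_TABLE)  # plain ASCII letter
```

The sketch returns the integers 1 and 0 rather than `True` and `False`; both styles work in an `if` test.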

A few examples from docs/intro.rst:

  >>> # cafe + combining acute accent
  >>> list(iter_graphemes('cafe\u0301'))
  ['c', 'a', 'f', 'é']
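The combining-mark case in the example above can be approximated with the standard library alone. The sketch below is a deliberately simplified illustration, not the UAX #29 algorithm and not the implementation in grapheme.py: the hypothetical helper `simple_clusters` only attaches characters with a non-zero canonical combining class to the preceding character, and it knows nothing about Hangul, ZWJ emoji sequences, regional indicators, or Indic conjuncts.

```python
import unicodedata


def simple_clusters(text):
    """Yield each base character together with its trailing combining marks.

    Simplification: only the canonical combining class is consulted, so this
    handles 'e' + U+0301 but none of the other UAX #29 boundary rules.
    """
    cluster = ''
    for ch in text:
        if cluster and unicodedata.combining(ch) == 0:
            # ch starts a new cluster; emit the one collected so far
            yield cluster
            cluster = ch
        else:
            # first character, or a combining mark extending the cluster
            cluster += ch
    if cluster:
        yield cluster


clusters = list(simple_clusters('cafe\u0301'))
```

Here `'cafe\u0301'` holds five code points, while `clusters` holds four items, the last being `'e\u0301'`.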

- Add new `width()` function for measuring terminal-aware strings, with support for control codes, escape sequences (SGR, OSC, CSI), cursor movement, and tab stops.
- Add `iter_sequences()` function to iterate over text containing escape sequences
- New file `control_codes.py` for control characters, categorized
- New file `escape_sequences.py` for terminal sequence patterns, categorized
- Extract `_bisearch`, duplicates #165

A few examples from docs/intro.rst:

  >>> wcwidth.width('\x1b[38;2;255;150;100mWARN\x1b[0m')
  4
  >>> list(wcwidth.iter_sequences('\x1b[31mred\x1b[0m'))
  [('\x1b[31m', True), ('red', False), ('\x1b[0m', True)]
  >>> wcwidth.width('\U0001F1FF\U0001F1FC')
  2
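To illustrate the idea of separating escape sequences from printable text, here is a rough standard-library sketch. The names `CSI_PATTERN`, `split_sequences`, and `visible_len` are hypothetical, and the pattern covers CSI sequences only (which include SGR color codes); it ignores OSC, cursor movement, tab stops, and wide or zero-width characters, so it is not a substitute for the functions described above.

```python
import re

# CSI sequence: ESC [ + parameter bytes (0x30-0x3F)
# + intermediate bytes (0x20-0x2F) + one final byte (0x40-0x7E).
CSI_PATTERN = re.compile(r'\x1b\[[0-?]*[ -/]*[@-~]')


def split_sequences(text):
    """Yield (segment, is_sequence) pairs, in order of appearance."""
    pos = 0
    for match in CSI_PATTERN.finditer(text):
        if match.start() > pos:
            yield text[pos:match.start()], False
        yield match.group(), True
        pos = match.end()
    if pos < len(text):
        yield text[pos:], False


def visible_len(text):
    """Count code points outside CSI sequences (correct for narrow text only)."""
    return sum(len(seg) for seg, is_seq in split_sequences(text) if not is_seq)


parts = list(split_sequences('\x1b[31mred\x1b[0m'))
warn_len = visible_len('\x1b[38;2;255;150;100mWARN\x1b[0m')
```

For plain ASCII input such as the `WARN` example, counting the non-sequence code points gives the same answer as a cell-width measurement; for CJK or emoji text it would not.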

New ``wrap()`` function is an emoji, control and terminal sequence, wide, zero-width, and grapheme-aware version of textwrap.wrap(). This PR builds on #168 and #165 combined.

  >>> # Wrapping CJK text (each character is 2 cells wide)
  >>> wrap('コンニチハ', 4)
  ['コン', 'ニチ', 'ハ']

  >>> # Text with ANSI color sequences
  >>> wrap('\x1b[31mhello world\x1b[0m', 5)
  ['\x1b[31mhello', 'world\x1b[0m']
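The CJK case can be pictured with a small standard-library sketch. The helpers `cell_width` and `wrap_cells` are hypothetical and far simpler than the function described above: they break purely on cell count, with no word boundaries, escape sequences, zero-width characters, or grapheme clusters taken into account.

```python
import unicodedata


def cell_width(ch):
    """Approximate cell width: 2 for East Asian Wide/Fullwidth, else 1."""
    return 2 if unicodedata.east_asian_width(ch) in ('W', 'F') else 1


def wrap_cells(text, width):
    """Split text into lines whose summed cell widths do not exceed width.

    Simplification: breaks between any two characters, never at word
    boundaries, and treats every non-wide character as one cell.
    """
    lines, line, used = [], '', 0
    for ch in text:
        w = cell_width(ch)
        if line and used + w > width:
            # the next character would overflow, so start a new line
            lines.append(line)
            line, used = '', 0
        line += ch
        used += w
    if line:
        lines.append(line)
    return lines


wrapped = wrap_cells('コンニチハ', 4)
```

Each katakana character occupies two cells, so a width of 4 fits two characters per line and `wrapped` contains three lines.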

jquast added 6 commits January 14, 2026 16:33

Add iter_graphemes() for Unicode grapheme cluster iteration

e6c8fa0

Implements Unicode Standard Annex #29 grapheme cluster boundaries. Handles Hangul syllables, emoji ZWJ sequences, regional indicators, combining characters, and Indic scripts. New exports: iter_graphemes, _bisearch

Merge remote-tracking branch 'origin/master' into jq/next-new-grapheme

0dc7128

remove defensive return

c169f90

more reformat

82b9a43

revise tests, prepare history change

0b9ff3f

correct PR link and number

4ed4ca2

jquast marked this pull request as ready for review January 14, 2026 22:59

jquast added 2 commits January 14, 2026 18:10

versions and history changes (and an awful docfix!)

1153c88

We recommended ``wcwidth<2`` for years when it should have been ``wcwidth<1``. I really hope nobody copied & pasted our recommendation .. :(

match original behavior, almost changed types!

272ec5d

It's a private function anyway, so this is still ok. Below the turtles, 0/1 is very much the definition of Falsey and Truthy.

jquast mentioned this pull request Jan 15, 2026

New: width() terminal-aware string measurement #166

Merged

jquast changed the title ~~New: iter_graphemes() function~~ New: iter_graphemes() Jan 15, 2026

This was referenced Jan 15, 2026

New: ljust(), rjust(), center() #167

Closed

What would wcwidth look like if it were built-in to Python? #94

Closed

New: wrap() #169

Merged

Bugfix: Variation Selector-16/ZWJ and starting sequences in wrap() jquast/blessed#338

Merged

jquast merged commit 875011d into master Jan 17, 2026
36 checks passed

jquast deleted the jq/next-new-grapheme branch January 17, 2026 17:45

jquast merged 8 commits into master from jq/next-new-grapheme
