Pretext Explorer

How Text Layout Actually Works

An interactive deep dive into Pretext's algorithms for measuring and laying out multilingual text without touching the DOM. Every demo on this page is live — drag sliders, edit text, and watch the engine work.

1. The Problem: DOM Measurement Cost

Browsers calculate text height through layout reflow. When your UI has hundreds of text blocks and you need each one's height, you face a choice: measure them all through the DOM (expensive), or guess (inaccurate).

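The cost gets worse when reads and writes interleave. Each getBoundingClientRect() call that follows a style mutation forces a synchronous reflow: the browser must bring layout up to date before it can answer, which in the worst case means recalculating the entire document.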
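To see why interleaving hurts, here is a toy model of layout invalidation. It is not a real DOM: ToyDocument, setWidth, and readHeight are hypothetical names, and the "reflow" is just a counter. The sketch only shows how the order of reads and writes changes the number of forced reflows.

```typescript
// Toy model of layout invalidation. NOT a real DOM: every name here is
// hypothetical and exists only to count forced reflows.
class ToyDocument {
  reflowCount = 0
  private dirty = false
  private widths: number[]

  constructor(blockCount: number) {
    this.widths = new Array<number>(blockCount).fill(300)
  }

  // A "write": changing a style only marks layout as stale.
  setWidth(index: number, width: number): void {
    this.widths[index] = width
    this.dirty = true
  }

  // A "read": stale layout must be recomputed before the answer is valid.
  readHeight(index: number): number {
    if (this.dirty) {
      this.reflowCount++
      this.dirty = false
    }
    // Pretend every block holds 6000 square pixels of text.
    return Math.ceil(6000 / this.widths[index])
  }
}

function measureInterleaved(doc: ToyDocument, n: number): number {
  for (let i = 0; i < n; i++) {
    doc.setWidth(i, 200) // write
    doc.readHeight(i)    // read right after a write: forces a reflow
  }
  return doc.reflowCount
}

function measureBatched(doc: ToyDocument, n: number): number {
  for (let i = 0; i < n; i++) doc.setWidth(i, 200) // all writes first
  for (let i = 0; i < n; i++) doc.readHeight(i)    // then all reads
  return doc.reflowCount
}

const interleavedReflows = measureInterleaved(new ToyDocument(200), 200)
const batchedReflows = measureBatched(new ToyDocument(200), 200)
console.log({ interleavedReflows, batchedReflows })
```

In this model the interleaved loop pays one reflow per block, while the batched loop pays one reflow in total. Batching helps, but it still goes through layout at least once per batch.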

Live comparison: DOM vs Pretext

Click "Run" to measure 200 text blocks both ways and see the timing difference.

Timings reported by the demo: DOM interleaved, DOM batched, prepare(), and layout() ×10.

2. The Two-Phase Pipeline

Pretext splits text layout into two phases. prepare() does the expensive one-time work: normalize whitespace, segment the text, apply script-specific rules, and measure each segment using CanvasRenderingContext2D.measureText(). The result is an opaque handle containing cached widths.

layout() then uses that handle with pure arithmetic — no DOM reads, no canvas calls, no string operations. It simply walks the cached widths to count lines and compute height. This is the resize hot path: call it on every frame if you need to.

analyze → measure → prepared state → layout()
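import { prepare, layout } from '@chenglou/pretext'

// Example inputs (illustrative values)
const text = 'The quick brown fox jumps over the lazy dog.'
const maxWidth = 320  // container width in px
const lineHeight = 24 // line height in px

// One-time: segment + measure (~19ms for 500 texts)
const prepared = prepare(text, '16px Inter')

// Hot path: pure arithmetic (~0.09ms for 500 texts)
const { height, lineCount } = layout(prepared, maxWidth, lineHeight)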
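To make the split concrete, here is a toy version of the same two-phase idea. It is a sketch, not Pretext's implementation: toyPrepare, toyLayout, and the injected measure function are hypothetical, and the segmentation is a plain split on spaces. The point is that the second phase touches only numbers.

```typescript
// Hypothetical stand-in for a real measurer such as canvas measureText().
type Measure = (segment: string) => number

interface ToyPrepared {
  widths: number[]   // one cached width per segment
  isSpace: boolean[] // whether that segment is whitespace
}

// Phase 1: normalize whitespace, segment, measure. Runs once per text.
function toyPrepare(text: string, measure: Measure): ToyPrepared {
  const segments = text
    .trim()
    .replace(/\s+/g, ' ')
    .split(/( )/)
    .filter(s => s.length > 0)
  return {
    widths: segments.map(measure),
    isSpace: segments.map(s => s === ' '),
  }
}

// Phase 2: arithmetic over cached widths. No strings, no measuring.
function toyLayout(
  p: ToyPrepared,
  maxWidth: number,
  lineHeight: number,
): { lineCount: number; height: number } {
  let lineCount = p.widths.length > 0 ? 1 : 0
  let lineWidth = 0
  for (let i = 0; i < p.widths.length; i++) {
    const w = p.widths[i]
    if (p.isSpace[i]) {
      // A space never triggers a break; it is dropped at line start.
      if (lineWidth > 0) lineWidth += w
      continue
    }
    if (lineWidth > 0 && lineWidth + w > maxWidth) {
      lineCount++
      lineWidth = 0
    }
    lineWidth += w
  }
  return { lineCount, height: lineCount * lineHeight }
}

// Assumption for the demo: every character is 8px wide.
const fixedWidth: Measure = s => s.length * 8
const toyHandle = toyPrepare('the  quick brown   fox', fixedWidth)
for (const width of [1000, 80, 40]) {
  console.log(width, toyLayout(toyHandle, width, 20))
}
```

Only toyPrepare ever sees the string. Re-running toyLayout at a new width reuses the same handle, which is what makes it cheap enough for a resize handler.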

3. Text Analysis: Segmentation & Break Kinds

The analysis phase turns raw text into a stream of semantically meaningful segments. Each segment has a break kind that tells the line breaker how to handle it. Pretext distinguishes eight break kinds:

text, space, preserved-space, tab, glue (NBSP), zero-width-break, soft-hyphen, hard-break

Type or edit the text below and watch how Pretext segments it. Notice how punctuation gets merged into adjacent words, and how special Unicode characters become explicit engine concepts.

Interactive segmenter

Key insight: Most correctness fixes in Pretext live in analysis, not in the line breaker. Script-specific rules — Arabic punctuation glue, CJK kinsoku, Myanmar medial glue, URL grouping — are all preprocessing decisions that happen before any width is measured.
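As a rough illustration of the analysis step, the sketch below classifies characters into break kinds and groups runs into segments. The mapping from characters to kinds is an assumption made for this example (only the kind names come from the list above), preserved-space is left out, and none of the script-specific rules are modeled.

```typescript
type BreakKind =
  | 'text' | 'space' | 'tab' | 'glue'
  | 'zero-width-break' | 'soft-hyphen' | 'hard-break'

interface Segment { text: string; kind: BreakKind }

// Assumed character-to-kind mapping, for illustration only.
function classify(ch: string): BreakKind {
  switch (ch) {
    case ' ': return 'space'
    case '\t': return 'tab'
    case '\n': return 'hard-break'
    case '\u00A0': return 'glue'             // no-break space
    case '\u200B': return 'zero-width-break' // zero-width space
    case '\u00AD': return 'soft-hyphen'
    default: return 'text'
  }
}

function toySegment(input: string): Segment[] {
  const out: Segment[] = []
  for (let i = 0; i < input.length; i++) {
    const ch = input.charAt(i)
    const kind = classify(ch)
    const last = out[out.length - 1]
    // Runs of ordinary text, or of spaces, collapse into one segment.
    // Punctuation counts as text here, so "Hi," stays a single segment.
    if (last && last.kind === kind && (kind === 'text' || kind === 'space')) {
      last.text += ch
    } else {
      out.push({ text: ch, kind })
    }
  }
  return out
}

const demoSegments = toySegment('Hi, you\u00A0two\n')
console.log(demoSegments.map(s => s.kind).join(','))
```

Every special character ends up as its own explicit segment, so a line breaker working from this output never has to inspect the string again.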

4. Measurement: Canvas as Ground Truth

After analysis, Pretext measures each segment's width using CanvasRenderingContext2D.measureText(). These widths are cached in a Map<font, Map<segment, metrics>> and reused across texts that share segments.
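A two-level cache of that shape might look like the sketch below. WidthCache and MeasureFn are hypothetical names, and only the width is stored instead of a full metrics object. The measurer is injected so the example runs without a canvas.

```typescript
// Hypothetical measurer signature. In a browser it could set ctx.font = font
// and return ctx.measureText(segment).width.
type MeasureFn = (font: string, segment: string) => number

class WidthCache {
  misses = 0
  private byFont = new Map<string, Map<string, number>>()
  private measure: MeasureFn

  constructor(measure: MeasureFn) {
    this.measure = measure
  }

  width(font: string, segment: string): number {
    let perFont = this.byFont.get(font)
    if (!perFont) {
      perFont = new Map<string, number>()
      this.byFont.set(font, perFont)
    }
    let w = perFont.get(segment)
    if (w === undefined) {
      // First sighting of this segment in this font: measure and remember.
      w = this.measure(font, segment)
      perFont.set(segment, w)
      this.misses++
    }
    return w
  }
}

// Fake measurer for the demo: 8px per character, regardless of font.
const widthCache = new WidthCache((_font, segment) => segment.length * 8)
widthCache.width('16px Inter', 'the')
widthCache.width('16px Inter', 'cat')
widthCache.width('16px Inter', 'the') // cache hit, no new measurement
widthCache.width('12px Inter', 'the') // different font, measured again
console.log(widthCache.misses)
```

Keying on the font first means a segment shared by many texts is measured once per font, no matter how often it appears.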

The bars below show the measured width of each segment from the text above. Taller bars = wider segments.

Segment widths (px)

5. Line Breaking: The Core Algorithm

With cached segment widths, line breaking becomes arithmetic. The engine walks segments left to right, accumulating width. When a segment would overflow, it breaks the line — either at the most recent breakable position (after a space or zero-width break), or inside the word at grapheme boundaries if the word itself is wider than the container.

Drag the width slider and watch lines reflow in real time. The red border marks maxWidth. Notice how trailing whitespace "hangs" past the edge without triggering a break — matching CSS behavior.
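The walk described above can be sketched as a greedy loop. This is a simplified illustration, not Pretext's line breaker: breakLines and charWidth are hypothetical, only spaces count as break opportunities, and Array.from splits by code point as a stand-in for real grapheme segmentation.

```typescript
type CharWidth = (ch: string) => number

function breakLines(text: string, maxWidth: number, charWidth: CharWidth): string[] {
  const widthOf = (s: string): number =>
    Array.from(s).reduce((sum, ch) => sum + charWidth(ch), 0)
  // Alternating word and space-run segments.
  const segments = text.split(/( +)/).filter(s => s.length > 0)
  const lines: string[] = []
  let line = ''
  let lineWidth = 0
  const flush = (): void => {
    lines.push(line)
    line = ''
    lineWidth = 0
  }

  for (const seg of segments) {
    if (seg.charAt(0) === ' ') {
      // Whitespace hangs: it is appended but never causes a break,
      // and it is dropped at the start of a line.
      if (line !== '') {
        line += seg
        lineWidth += widthOf(seg)
      }
      continue
    }
    const w = widthOf(seg)
    // The word would overflow: break at the last space.
    if (line !== '' && lineWidth + w > maxWidth) flush()
    if (w <= maxWidth) {
      line += seg
      lineWidth += w
      continue
    }
    // The word alone is wider than the container: break inside it.
    for (const ch of Array.from(seg)) {
      const cw = charWidth(ch)
      if (line !== '' && lineWidth + cw > maxWidth) flush()
      line += ch
      lineWidth += cw
    }
  }
  if (line !== '') flush()
  return lines
}

// Assumption for the demo: every character is 10px wide.
const demoLines = breakLines('ab cdefgh ij', 40, () => 10)
console.log(JSON.stringify(demoLines))
```

The returned lines keep their trailing spaces on purpose, so you can see that "ab " stays on one line even though the space is never counted against the next word's fit.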

Live line breaking

6. CJK & Kinsoku: Punctuation Attachment

CJK scripts (Chinese, Japanese, Korean) break between every character by default. But punctuation has rules: closing punctuation such as 、 or 。 must not start a line (kinsoku-start), and opening brackets such as 「 or （ must not end a line (kinsoku-end).

Pretext handles this during analysis by merging prohibited punctuation into adjacent characters. Watch how the comma stays attached to the preceding character as the width changes.
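The merging idea can be sketched in a few lines. The two character sets below are small illustrative samples, not Pretext's tables, and kinsokuUnits is a hypothetical name. Each returned unit is unbreakable, so a line breaker that only breaks between units cannot violate either rule.

```typescript
// Small sample sets for illustration; real kinsoku tables are much larger.
const NO_LINE_START = new Set(['、', '。', '，', '」', '）'])
const NO_LINE_END = new Set(['「', '（'])

function kinsokuUnits(text: string): string[] {
  const units: string[] = []
  let pendingOpen = ''
  for (const ch of Array.from(text)) {
    if (NO_LINE_END.has(ch)) {
      // Opening bracket: hold it and attach it to the next character.
      pendingOpen += ch
      continue
    }
    if (NO_LINE_START.has(ch) && units.length > 0 && pendingOpen === '') {
      // Closing punctuation: attach it to the previous unit.
      units[units.length - 1] += ch
      continue
    }
    units.push(pendingOpen + ch)
    pendingOpen = ''
  }
  if (pendingOpen !== '') units.push(pendingOpen)
  return units
}

const demoUnits = kinsokuUnits('今日は、「晴れ」です。')
console.log(demoUnits.join('|'))
```

Because the merge happens before measurement, each unit gets a single cached width and the line breaker needs no punctuation logic of its own.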

CJK kinsoku rules

7. Soft Hyphens: Discretionary Breaks

A soft hyphen (&shy;, U+00AD) is invisible until the engine chooses it as a break point. If selected, a visible - appears at the end of the line. The line breaker must decide: can I fit more of this word, or should I break at the soft hyphen and show the dash?
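That decision can be sketched for a single word. The function below is a hypothetical helper, not Pretext's logic: it assumes a fixed character width and a hyphen as wide as any other character, and it charges the hyphen against the available width only when a break is actually taken.

```typescript
const SHY = '\u00AD'

// Returns the part that stays on the current line and the part carried over.
function breakAtSoftHyphen(
  word: string,
  available: number,
  charWidth: number,
): { head: string; tail: string } {
  const parts = word.split(SHY)
  const whole = parts.join('')
  // The whole word fits: no break, and the soft hyphens stay invisible.
  if (whole.length * charWidth <= available) return { head: whole, tail: '' }

  let head = ''
  let taken = 0
  for (let i = 0; i < parts.length - 1; i++) {
    const candidate = head + parts[i]
    // Breaking here costs one extra character: the visible hyphen.
    if ((candidate.length + 1) * charWidth > available) break
    head = candidate
    taken = i + 1
  }
  // Not even the first part fits with its hyphen: move the word down whole.
  if (taken === 0) return { head: '', tail: word }
  return { head: head + '-', tail: parts.slice(taken).join(SHY) }
}

// Assumption for the demo: every character is 10px wide.
const shyWord = 'trans' + SHY + 'atlantic'
for (const available of [130, 100, 50]) {
  console.log(available, breakAtSoftHyphen(shyWord, available, 10))
}
```

At 130px the word fits unbroken, at 100px it splits into "trans-" and "atlantic", and at 50px even "trans-" is too wide, so the whole word moves to the next line.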

Drag the width to see the soft hyphen activate and deactivate. The word "trans­atlantic" has a soft hyphen between "trans" and "atlantic".

Soft hyphen behavior

8. Shrink Wrap: Finding the Tightest Fit

One of Pretext's most useful tricks is finding the narrowest container width that keeps the same line count. This "multiline shrink wrap" is missing from web CSS, but it is easy to build on layout(): because the call is pure arithmetic, you can run it inside a binary search over candidate widths.
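Here is a minimal sketch of that search. It does not call Pretext: countLines is a hypothetical greedy line counter over precomputed word widths, standing in for a real layout call. The search works because line count never increases as the container gets wider.

```typescript
// Hypothetical stand-in for a real layout call: greedy line count.
function countLines(wordWidths: number[], spaceWidth: number, maxWidth: number): number {
  let lines = 1
  let used = 0
  for (const w of wordWidths) {
    const needed = used === 0 ? w : used + spaceWidth + w
    if (used > 0 && needed > maxWidth) {
      lines++
      used = w
    } else {
      used = needed
    }
  }
  return lines
}

// Smallest integer width that does not add lines compared with maxWidth.
function shrinkWrap(wordWidths: number[], spaceWidth: number, maxWidth: number): number {
  const target = countLines(wordWidths, spaceWidth, maxWidth)
  let lo = 1
  let hi = maxWidth // invariant: hi always satisfies the target
  while (lo < hi) {
    const mid = Math.floor((lo + hi) / 2)
    if (countLines(wordWidths, spaceWidth, mid) <= target) {
      hi = mid // still fits: try narrower
    } else {
      lo = mid + 1 // too narrow: an extra line appeared
    }
  }
  return lo
}

const demoWords = [30, 20, 40, 10] // word widths in px
const tightest = shrinkWrap(demoWords, 10, 100)
console.log(tightest, countLines(demoWords, 10, tightest))
```

The search runs in whole pixels, so it needs about log2(maxWidth) layout calls. A real implementation with fractional widths would pick its own step or tolerance.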

Click "Shrink" to watch the binary search run. The text box animates to the tightest width that preserves line count.

Shrink wrap binary search
Mina cut the release note to three crisp lines, then realized the support caveat still needed one more sentence.

9. Variable Width: Text Around Obstacles

layoutNextLine() lets you lay out one line at a time with a different width for each. This enables text that flows around images, pull quotes, or any arbitrary shape — with no DOM measurement at all.
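The per-line idea can be sketched without Pretext. Nothing below is layoutNextLine()'s real signature: takeLine, widthAt, and flowAround are hypothetical helpers, words have a fixed character width, and the obstacle is a rectangle pinned to one side of the container.

```typescript
interface Obstacle { top: number; bottom: number; width: number }

// Width available to a line whose top edge is at y.
function widthAt(y: number, lineHeight: number, containerWidth: number, ob: Obstacle): number {
  const overlaps = y < ob.bottom && y + lineHeight > ob.top
  return overlaps ? containerWidth - ob.width : containerWidth
}

// Fits words greedily starting at `start`; returns the index after the last one.
// Always takes at least one word, so the caller is guaranteed to make progress.
function takeLine(words: string[], start: number, maxWidth: number, charWidth: number): number {
  let used = 0
  let i = start
  while (i < words.length) {
    const w = words[i].length * charWidth
    const needed = used === 0 ? w : used + charWidth + w
    if (used > 0 && needed > maxWidth) break
    used = needed
    i++
  }
  return i
}

function flowAround(
  text: string,
  containerWidth: number,
  lineHeight: number,
  charWidth: number,
  ob: Obstacle,
): { y: number; width: number; text: string }[] {
  const words = text.split(' ').filter(w => w.length > 0)
  const lines: { y: number; width: number; text: string }[] = []
  let cursor = 0
  let y = 0
  while (cursor < words.length) {
    // Each line asks for its own width before it is laid out.
    const width = widthAt(y, lineHeight, containerWidth, ob)
    const end = takeLine(words, cursor, width, charWidth)
    lines.push({ y, width, text: words.slice(cursor, end).join(' ') })
    cursor = end
    y += lineHeight
  }
  return lines
}

const flowed = flowAround('aa bb cc dd ee ff', 80, 20, 10, { top: 20, bottom: 40, width: 50 })
console.log(flowed)
```

Only the second line overlaps the obstacle, so it gets 30px while the lines above and below get the full 80px. Moving the obstacle changes nothing but the widths fed into each step.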

The demo below flows text around a rectangular obstacle. Drag the obstacle to see text reflow live. Lines beside the obstacle are narrower.

Variable-width layout with obstacle

10. Performance: prepare() vs layout()

The architectural bet of Pretext is that prepare() can be relatively expensive (it runs once), as long as layout() stays extremely cheap (it runs on every resize/reflow). Click "Benchmark" to measure both phases on a batch of texts.

Benchmark: 200 texts
Timings reported by the demo: prepare() total, layout() total, and layout() per text.

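Why this matters: In a chat app with 500 visible messages, layout() costs ~0.09ms total for the batch. Traditional DOM measurement of the same batch costs 30–150ms and forces full document reflow. At 60fps each frame has a budget of about 16.7ms, so the DOM path spends roughly 2 to 9 frames on measurement alone, while layout() uses well under 1% of a single frame. That's the difference between smooth scrolling and visible jank.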

Further Reading

Pretext is open source. The repository contains accuracy sweeps, benchmark harnesses, and long-form corpus canaries that keep the engine honest across Chrome, Safari, and Firefox.
