Background
If you've ever built an AI chat interface, you know the drill: tokens stream in from the LLM, the UI needs to update in real time, and everything feels fine — until it doesn't. A long response with code blocks starts to stutter. A conversation with 50+ messages makes scrolling sluggish. The Markdown parser that seemed lightweight enough suddenly becomes a bottleneck.
I was using a custom lightweight Markdown parser (deliberately avoiding the heavy react-markdown + rehype/remark plugin chain), which was a good starting point. But I identified three specific bottlenecks that were still causing jank:
- Every SSE chunk triggered a full React re-render — sometimes dozens per second
- The entire Markdown content was re-parsed on every update — even though 90% of it hadn't changed
- All messages stayed in the DOM forever — long conversations caused DOM bloat
Let me walk through how I solved each one.
Optimization 1: Stream Chunk Merging with requestAnimationFrame
Before: Every Chunk = Every Render
The typical streaming loop reads chunks from a ReadableStream and updates state on each one:
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  const chunk = decoder.decode(value, { stream: true });
  accumulated += chunk;
  // Every single chunk triggers a state update!
  setMessages((prev) => {
    const updated = [...prev];
    const lastMsg = updated[updated.length - 1];
    if (lastMsg && lastMsg.role === 'assistant') {
      updated[updated.length - 1] = { ...lastMsg, content: accumulated };
    }
    return updated;
  });
}
SSE chunks can arrive every 10-50ms. Each setMessages call triggers a React re-render, which means the Markdown parser runs, the DOM updates, and the browser recalculates layout — all potentially dozens of times per second. Most of these renders are wasted because the user can't perceive differences at that speed.
After: RAF-Based Throttling
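Most displays refresh at 60 Hz (one frame every ~16.7ms), and the browser paints at most once per refresh, so updating the UI more often than that is wasted work. requestAnimationFrame is a good primitive for this: its callback runs once before the next paint, and combined with a "one callback pending" flag it collapses all chunks that arrive within a frame into a single state update. It also adapts automatically to displays with higher refresh rates.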
const streamingContentRef = useRef('');
const rafIdRef = useRef<number>(0);

// Before entering the streaming loop:
let accumulated = '';
streamingContentRef.current = '';

const flushStreamUpdate = () => {
  setMessages((prev) => {
    const updated = [...prev];
    const lastMsg = updated[updated.length - 1];
    if (lastMsg && lastMsg.role === 'assistant') {
      updated[updated.length - 1] = {
        ...lastMsg,
        content: streamingContentRef.current,
      };
    }
    return updated;
  });
  rafIdRef.current = 0;
};

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  const chunk = decoder.decode(value, { stream: true });
  accumulated += chunk;
  streamingContentRef.current = accumulated;
  // Only schedule a render if one isn't already pending
  if (rafIdRef.current === 0) {
    rafIdRef.current = requestAnimationFrame(flushStreamUpdate);
  }
}

// Stream ended: drain any bytes the decoder is still buffering,
// cancel the pending frame (if any), and render the final content now
accumulated += decoder.decode();
streamingContentRef.current = accumulated;
if (rafIdRef.current !== 0) {
  cancelAnimationFrame(rafIdRef.current);
}
flushStreamUpdate();
The key insight: streamingContentRef is updated synchronously on every chunk (so we never lose data), but setMessages is called at most once per animation frame. If 5 chunks arrive within one frame, they are rendered together in a single update that contains all of their text, which is exactly what we want.
Don't forget cleanup on unmount:
useEffect(() => {
  return () => {
    if (rafIdRef.current !== 0) {
      cancelAnimationFrame(rafIdRef.current);
      rafIdRef.current = 0;
    }
  };
}, []);
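The coalescing pattern above can also be looked at in isolation from React. The following is a minimal sketch, not the code from my app: `createCoalescer`, the `Scheduler` type, and `createFakeScheduler` are hypothetical names introduced here so that the logic can run (and be tested) without a browser.

```typescript
// Hypothetical abstraction over requestAnimationFrame / cancelAnimationFrame.
type Scheduler = {
  request: (cb: () => void) => number;
  cancel: (id: number) => void;
};

function createCoalescer<T>(flush: (latest: T) => void, scheduler: Scheduler) {
  let latest: T | undefined;
  let hasValue = false;
  let pendingId = 0; // 0 means "nothing scheduled", like rafIdRef above

  const run = () => {
    pendingId = 0;
    if (hasValue) flush(latest as T);
  };

  return {
    // Record the newest value and schedule at most one flush per frame.
    push(value: T) {
      latest = value;
      hasValue = true;
      if (pendingId === 0) pendingId = scheduler.request(run);
    },
    // Cancel any pending frame and flush immediately (end of stream).
    flushNow() {
      if (pendingId !== 0) scheduler.cancel(pendingId);
      run();
    },
    // Cancel without flushing (unmount).
    dispose() {
      if (pendingId !== 0) scheduler.cancel(pendingId);
      pendingId = 0;
    },
  };
}

// Fake scheduler for demonstration: callbacks run only when tick() is called.
function createFakeScheduler() {
  let nextId = 1;
  const queue = new Map<number, () => void>();
  return {
    request: (cb: () => void) => {
      const id = nextId++;
      queue.set(id, cb);
      return id;
    },
    cancel: (id: number) => {
      queue.delete(id);
    },
    tick: () => {
      const callbacks = [...queue.values()];
      queue.clear();
      callbacks.forEach((cb) => cb());
    },
  };
}

const rendered: string[] = [];
const fakeScheduler = createFakeScheduler();
const coalescer = createCoalescer<string>((text) => rendered.push(text), fakeScheduler);

let streamedText = '';
for (const chunk of ['He', 'llo', ', ', 'wor', 'ld']) {
  streamedText += chunk;
  coalescer.push(streamedText); // five chunks arrive within one frame
}
fakeScheduler.tick(); // the frame fires once, so `rendered` holds a single entry
```

In a browser, the scheduler would be something like `{ request: (cb) => requestAnimationFrame(cb), cancel: (id) => cancelAnimationFrame(id) }`. Wrapping the calls in arrow functions matters: calling `requestAnimationFrame` as a method of another object throws an "Illegal invocation" error.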
Why Not debounce/throttle?
I considered debounce and throttle first, but requestAnimationFrame is a better fit for this use case:
| Approach | Timing | Frame Alignment | Behavior with streaming chunks |
|---|---|---|---|
| debounce(16ms) | Fires 16ms after the last call | No, may fire mid-frame | Can postpone every update while chunks keep arriving less than 16ms apart |
| throttle(16ms) | Fires at most once every 16ms | No, the timer drifts relative to frames | Can land two updates in one frame or none in the next |
| requestAnimationFrame | Fires once before the next paint | Yes, aligned with paint | Each painted frame shows the latest data; callbacks pause in background tabs, so the end-of-stream flush still matters |
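To make the debounce row concrete, here is a small simulation. It is a simplified model under stated assumptions, not a measurement: chunks arrive every 10ms, frames are 16ms long, and `debounceFlushTimes` and `frameFlushTimes` are hypothetical helpers that only compute when each strategy would update the UI.

```typescript
// Trailing-edge debounce: a timer fires only if no later call lands before it expires.
function debounceFlushTimes(arrivals: number[], waitMs: number): number[] {
  const flushes: number[] = [];
  arrivals.forEach((t, i) => {
    const next = arrivals[i + 1];
    if (next === undefined || next - t >= waitMs) flushes.push(t + waitMs);
  });
  return flushes;
}

// Frame-aligned flushing: each chunk is rendered at the next frame boundary,
// and chunks that share a boundary share one flush.
function frameFlushTimes(arrivals: number[], frameMs: number): number[] {
  const frames = new Set<number>();
  for (const t of arrivals) {
    frames.add(Math.floor(t / frameMs) + 1); // index of the next frame boundary
  }
  return [...frames].map((n) => n * frameMs);
}

// Ten chunks, one every 10ms: 0, 10, 20, ..., 90
const arrivals = Array.from({ length: 10 }, (_, i) => i * 10);

const debounced = debounceFlushTimes(arrivals, 16); // [106]
const frameAligned = frameFlushTimes(arrivals, 16); // [16, 32, 48, 64, 80, 96]
```

In this model the debounced UI shows nothing until 106ms, after the last chunk, while the frame-aligned version updates on every frame that has new data.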
Optimization 2: Segmented Markdown Rendering with Cache
Before: Full Re-Parse on Every Update
During streaming, the Markdown content grows incrementally. But the parser re-parses the entire content from scratch on every update. If the AI has already written 500 words and is adding one more, we're re-parsing all 500 words for no reason — only the last segment is actually changing.
After: Split by Block Boundaries + Cache Completed Segments
The core idea: split the Markdown content into segments at block-level boundaries, cache the parsed React nodes for completed segments, and only re-parse the last (active) segment.
Splitting Content into Segments
function splitContentSegments(content: string): string[] {
  const segments: string[] = [];
  let current = '';
  let inCodeBlock = false;
  for (const line of content.split('\n')) {
    // Respect code block boundaries — never split inside a code block
    if (line.startsWith('```')) {
      inCodeBlock = !inCodeBlock;
      current += (current ? '\n' : '') + line;
      if (!inCodeBlock) {
        segments.push(current);
        current = '';
      }
      continue;
    }
    if (inCodeBlock) {
      current += (current ? '\n' : '') + line;
      continue;
    }
    // Empty line = block boundary
    if (line.trim() === '' && current.trim()) {
      segments.push(current);
      current = '';
      continue;
    }
    current += (current ? '\n' : '') + line;
  }
  if (current.trim()) {
    segments.push(current);
  }
  return segments;
}
The splitting logic respects code block boundaries (``` pairs) so we never break a code block in half. Empty lines serve as natural paragraph separators.
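A worked example may help here. The snippet below repeats the splitting logic in compact form so that it runs on its own, then applies it to a small sample. The sample text and the `FENCE` constant are illustrative choices, not part of the original code.

```typescript
const FENCE = '`'.repeat(3); // three backticks, built this way to keep the example readable

// Same logic as splitContentSegments above, repeated so the snippet is standalone.
function splitSegments(content: string): string[] {
  const segments: string[] = [];
  let current = '';
  let inCodeBlock = false;
  const append = (line: string) => {
    current += (current ? '\n' : '') + line;
  };
  for (const line of content.split('\n')) {
    if (line.startsWith(FENCE)) {
      inCodeBlock = !inCodeBlock;
      append(line);
      if (!inCodeBlock) {
        segments.push(current); // a closing fence completes the segment
        current = '';
      }
    } else if (!inCodeBlock && line.trim() === '' && current.trim()) {
      segments.push(current); // a blank line outside a fence completes the segment
      current = '';
    } else {
      append(line);
    }
  }
  if (current.trim()) segments.push(current);
  return segments;
}

const sample = [
  '# Title',
  '',
  'Some text',
  '',
  FENCE + 'ts',
  'const a = 1;',
  '', // blank line inside the fence: must NOT split
  'const b = 2;',
  FENCE,
  '',
  'Streaming tail',
].join('\n');

const sampleSegments = splitSegments(sample);
// 4 segments: the heading, the paragraph, the whole code block, and the tail

// Mid-stream, an unclosed fence simply stays in the last (active) segment:
const openFenceSegments = splitSegments(['Intro', '', FENCE + 'js', 'let x'].join('\n'));
// 2 segments: 'Intro' and the still-open code block
```

The property the cache relies on is visible here: everything except the final segment is made only of lines that can no longer change.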
Rendering with Cache
const [segmentCache] = useState<Map<string, React.ReactNode>>(() => new Map());
const renderedSegments = useMemo(() => {
  const segments = splitContentSegments(content);
  const result: React.ReactNode[] = [];
  for (let i = 0; i < segments.length; i++) {
    const seg = segments[i];
    const isLast = i === segments.length - 1;
    // Key by position + text: the cached node carries a position-based React key,
    // so identical text at two positions must not share one cache entry
    const cacheKey = `${i}:${seg}`;
    // Completed segments: check cache first
    if (!isLast && segmentCache.has(cacheKey)) {
      result.push(segmentCache.get(cacheKey)!);
    } else {
      // Active segment (or cache miss): parse fresh
      const nodes = (
        <Fragment key={`seg-${i}`}>
          {parseMarkdown(seg, translations, i * 10000)}
        </Fragment>
      );
      // Cache completed segments for future reuse
      if (!isLast) {
        segmentCache.set(cacheKey, nodes);
      }
      result.push(nodes);
    }
  }
  // Evict old entries when cache grows too large
  if (segmentCache.size > MAX_CACHE_SIZE) {
    const entries = [...segmentCache.entries()];
    segmentCache.clear();
    entries.slice(-MAX_CACHE_SIZE / 2).forEach(([k, v]) => segmentCache.set(k, v));
  }
  return result;
}, [content, translations, segmentCache]);
The cache key is the segment's index plus its raw text. This works because during streaming, completed segments neither move nor change: only the last segment grows. Keying on the text alone would let two identical segments (say, the same one-line code block appearing twice) share one cached node and its React key, which produces duplicate keys in the rendered list. When a segment is "completed" (a new segment starts after it), it's cached. On subsequent renders, it's a direct cache hit. One caveat: the key does not include translations, so if translations can change while a message is mounted, the cache should be cleared when they do.
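I use useState with a lazy initializer instead of useRef for the cache Map. This avoids the react-hooks/refs lint rule (from eslint-plugin-react-hooks), which warns about reading refs during render. Note that writing to the Map inside useMemo is still a side effect during render; it is tolerable here only because a cached entry is identical to what a fresh parse of the same segment would produce.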
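The effect of the cache on parse work can be checked without React. The following is a sketch under assumptions: `splitOnBlankLines` is a simplified stand-in for the real splitter, `parse` is a placeholder for the Markdown parser, and `createSegmentRenderer` is a hypothetical name. Entries are keyed by position plus text.

```typescript
// Simplified stand-in for the real splitter (ignores code fences).
function splitOnBlankLines(content: string): string[] {
  return content.split(/\n{2,}/).filter((s) => s.trim() !== '');
}

function createSegmentRenderer<T>(parse: (segment: string, index: number) => T) {
  const cache = new Map<string, T>();
  let parseCalls = 0;

  function render(content: string): T[] {
    const segments = splitOnBlankLines(content);
    return segments.map((seg, i) => {
      const isLast = i === segments.length - 1;
      const key = `${i}:${seg}`; // position + text
      const cached = isLast ? undefined : cache.get(key);
      if (cached !== undefined) return cached;
      parseCalls++;
      const parsed = parse(seg, i);
      if (!isLast) cache.set(key, parsed); // only completed segments are cached
      return parsed;
    });
  }

  return { render, getParseCalls: () => parseCalls };
}

const segmentRenderer = createSegmentRenderer((seg) => ({ html: `<p>${seg}</p>` }));

// Three streaming updates of the same message:
segmentRenderer.render('First paragraph'); // parses 1 segment
segmentRenderer.render('First paragraph\n\nSecond'); // parses 2: the first was never cached while it was last
segmentRenderer.render('First paragraph\n\nSecond paragraph'); // parses 1: the first is a cache hit

const parseCallsSoFar = segmentRenderer.getParseCalls(); // 4, versus 5 without the cache
```

The saving looks small with two segments, but it grows with every completed segment: each later update parses only the active one.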
Results
For a 1000-word AI response split into ~15 segments:
| Metric | Before | After |
|---|---|---|
| Segments parsed per update | 15 | 1 (last segment only) |
| Cache hit rate (after first render) | N/A | ~93% (14 of 15 segments) |
| Parse work per update | O(total content) | O(last segment), plus a linear pass to split lines |
Optimization 3: Virtual Scrolling for Message List
Before: All Messages in the DOM
A typical chat message list renders all messages with a simple map:
{messages.map((message, index) => (
  <ChatMessage key={message.id} message={message} showCursor={...} />
))}
In a 50-message conversation, all 50 message components (each potentially containing a full Markdown renderer) stay in the DOM. This causes:
- DOM bloat: Hundreds of DOM nodes for off-screen messages
- Slow scroll: Browser must calculate layout for all nodes
- Wasted memory: React must maintain fiber nodes for invisible components
After: Custom Virtual Scrolling
I built a virtual message list that only renders messages visible in the viewport (plus a small buffer). Here's the architecture:
┌─────────────────────────┐
│ Top Spacer (height) │ ← Off-screen messages replaced by spacer
├─────────────────────────┤
│ Message #5 (visible) │ ← Actually rendered
│ Message #6 (visible) │ ← Actually rendered
│ Message #7 (visible) │ ← Actually rendered
├─────────────────────────┤
│ Bottom Spacer (height) │ ← Off-screen messages replaced by spacer
└─────────────────────────┘
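The two spacers in the diagram come from a running sum of message heights. Here is a minimal sketch of that arithmetic. `computeOffsets`, `computeSpacers`, and the `ESTIMATED_HEIGHT` value of 120 are assumptions for illustration, and the offsets array deliberately has one extra trailing entry (the total height), which may differ from how you store offsets in your own list.

```typescript
// Assumed fallback for messages that have not been measured yet.
const ESTIMATED_HEIGHT = 120;

// offsets[i] is the top of message i; offsets[ids.length] is the total height.
// Heights are expected to already include the gap between messages.
function computeOffsets(ids: string[], heights: Record<string, number>): number[] {
  const offsets = [0];
  for (const id of ids) {
    const height = heights[id] ?? ESTIMATED_HEIGHT;
    offsets.push(offsets[offsets.length - 1] + height);
  }
  return offsets;
}

// Spacer sizes when only messages start..end (inclusive) are rendered.
function computeSpacers(offsets: number[], start: number, end: number) {
  const total = offsets[offsets.length - 1];
  if (end < start) return { top: 0, bottom: total }; // nothing rendered
  return { top: offsets[start], bottom: total - offsets[end + 1] };
}

const messageIds = ['a', 'b', 'c', 'd', 'e'];
const measured: Record<string, number> = { a: 100, b: 200, c: 50, e: 80 }; // 'd' not measured yet

const layoutOffsets = computeOffsets(messageIds, measured);
// [0, 100, 300, 350, 470, 550]

const spacers = computeSpacers(layoutOffsets, 1, 2); // render only 'b' and 'c'
// { top: 100, bottom: 200 }; 100 + (200 + 50) + 200 = 550, the full list height
```

Because top spacer, rendered messages, and bottom spacer always add up to the total, the scrollbar keeps its size and position while messages mount and unmount.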
Height Tracking with ResizeObserver
Chat messages have variable heights (short text vs. long code blocks). I use ResizeObserver to measure actual rendered heights and MutationObserver to watch for new elements being added to the DOM:
useEffect(() => {
  const container = scrollRef.current;
  if (!container) return;

  const resizeObserver = new ResizeObserver((entries) => {
    const updates: Record<string, number> = {};
    for (const entry of entries) {
      const id = (entry.target as HTMLElement).getAttribute('data-msg-id');
      if (id) {
        updates[id] = entry.contentRect.height + MESSAGE_GAP;
      }
    }
    if (Object.keys(updates).length > 0) {
      setHeights((prev) => {
        let changed = false;
        const next = { ...prev };
        for (const [id, height] of Object.entries(updates)) {
          if (prev[id] === undefined || Math.abs(prev[id] - height) > 2) {
            next[id] = height;
            changed = true;
          }
        }
        return changed ? next : prev;
      });
    }
  });

  const observeElements = () => {
    container.querySelectorAll('[data-msg-id]').forEach((el) => {
      resizeObserver.observe(el);
    });
  };
  observeElements();

  const mutationObserver = new MutationObserver(() => {
    observeElements();
  });
  mutationObserver.observe(container, { childList: true, subtree: true });

  return () => {
    resizeObserver.disconnect();
    mutationObserver.disconnect();
  };
}, []);
The 2px tolerance in height comparison prevents infinite loops from sub-pixel rounding differences.
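The tolerance check is easier to reason about as a pure function. The sketch below uses a hypothetical helper name, `mergeHeights`, and shows the two properties that matter: small differences are ignored, and the previous object is returned unchanged when nothing meaningful moved. React skips the re-render when a state setter receives the same reference, which is what breaks the measure, render, measure cycle.

```typescript
function mergeHeights(
  prev: Record<string, number>,
  updates: Record<string, number>,
  tolerancePx = 2,
): Record<string, number> {
  let next: Record<string, number> | null = null;
  for (const [id, height] of Object.entries(updates)) {
    const known = prev[id];
    if (known === undefined || Math.abs(known - height) > tolerancePx) {
      if (next === null) next = { ...prev }; // copy lazily, only on the first real change
      next[id] = height;
    }
  }
  return next ?? prev; // same reference means "no change"
}

const prevHeights = { a: 100, b: 200 };

// A 1.4px difference is within tolerance: the same object comes back.
const unchanged = mergeHeights(prevHeights, { a: 101.4 });

// A 30px change and a newly measured message produce a new object.
const changed = mergeHeights(prevHeights, { b: 230, c: 60 });
// { a: 100, b: 230, c: 60 }
```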
Visible Range Calculation
With height data available, computing which messages are visible is straightforward:
const visibleRange = useMemo(() => {
  if (messages.length === 0) return { start: 0, end: -1 };
  let start = 0;
  let end = messages.length - 1;
  const bufferHeight = OVERSCAN * (ESTIMATED_HEIGHT + MESSAGE_GAP);
  const viewTop = scrollTop - bufferHeight;
  const viewBottom = scrollTop + viewportHeight + bufferHeight;
  // Scan for first visible message
  for (let i = 0; i < messages.length; i++) {
    const h = heights[messages[i].id] ?? ESTIMATED_HEIGHT + MESSAGE_GAP;
    if (offsets[i] + h >= viewTop) {
      start = Math.max(0, i - OVERSCAN);
      break;
    }
  }
  // Scan for last visible message
  for (let i = messages.length - 1; i >= 0; i--) {
    if (offsets[i] <= viewBottom) {
      end = Math.min(messages.length - 1, i + OVERSCAN);
      break;
    }
  }
  // Always include the last message (for streaming visibility)
  end = Math.max(end, messages.length - 1);
  return { start, end };
}, [scrollTop, viewportHeight, offsets, heights, messages]);
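With OVERSCAN = 3, the range is padded on both sides so that fast scrolling doesn't reveal blank gaps. Note that the padding is applied twice: bufferHeight widens the viewport by roughly three estimated message heights, and start and end are then moved a further three indices outward, so the number of extra messages rendered on each side is usually more than 3.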
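The linear scans are fine for a few hundred messages. For much longer lists, the same two lookups can be done with binary search, since offsets never decrease. This is an optional variation rather than what the code above does, and it assumes an offsets array with one extra trailing entry, so that `offsets[i]` is the top of message `i` and `offsets[i + 1]` is its bottom. `findFirstVisible` and `findLastVisible` are hypothetical names.

```typescript
// First message whose bottom edge is at or below viewTop.
// Returns the message count when every message ends above viewTop.
function findFirstVisible(offsets: number[], viewTop: number): number {
  let lo = 0;
  let hi = offsets.length - 1; // search space is [lo, hi)
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (offsets[mid + 1] >= viewTop) hi = mid; // message mid reaches into view
    else lo = mid + 1;
  }
  return lo;
}

// Last message whose top edge is at or above viewBottom.
// Returns -1 when even the first message starts below viewBottom.
function findLastVisible(offsets: number[], viewBottom: number): number {
  let lo = 0;
  let hi = offsets.length - 1;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (offsets[mid] > viewBottom) hi = mid; // message mid starts below the view
    else lo = mid + 1;
  }
  return lo - 1;
}

// Five messages with tops at 0, 100, 300, 350, 470 and a total height of 550.
const exampleOffsets = [0, 100, 300, 350, 470, 550];

const firstVisible = findFirstVisible(exampleOffsets, 320); // 2 (spans 300..350)
const lastVisible = findLastVisible(exampleOffsets, 480); // 4 (starts at 470)
```

Overscan and the "always include the last message" rule would be applied to these two indices afterwards, exactly as in the linear version.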
Smart Auto-Scroll
One of the trickiest parts of a chat virtual list is auto-scrolling. The behavior should be:
- User is at the bottom → auto-scroll to follow new content
- User has scrolled up → don't force-scroll (they're reading history)
- User sends a new message → always scroll to bottom
const isAtBottomRef = useRef(true);
const prevMsgCountRef = useRef(0); // declaration was missing; initial value assumed

const handleScroll = useCallback(() => {
  const el = scrollRef.current;
  if (!el) return;
  setScrollTop(el.scrollTop);
  // 80px threshold accounts for input area and padding
  isAtBottomRef.current = el.scrollHeight - el.scrollTop - el.clientHeight < 80;
}, []);

// Force scroll to bottom when user sends a message.
// Declared before the auto-scroll effect: effects run in declaration order,
// so the flag is already set when the auto-scroll effect checks it
useEffect(() => {
  if (messages.length > prevMsgCountRef.current) {
    const lastMsg = messages[messages.length - 1];
    if (lastMsg?.role === 'user') {
      isAtBottomRef.current = true;
    }
  }
  prevMsgCountRef.current = messages.length;
}, [messages]);

// Auto-scroll when at bottom
useEffect(() => {
  if (!scrollRef.current || !isAtBottomRef.current) return;
  requestAnimationFrame(() => {
    if (scrollRef.current && isAtBottomRef.current) {
      scrollRef.current.scrollTop = scrollRef.current.scrollHeight;
    }
  });
}, [messages, isLoading]);
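The three rules boil down to two small decisions, shown here as pure functions with concrete numbers. `isNearBottom` and `nextFollowState` are hypothetical helper names used only for this illustration.

```typescript
type Role = 'user' | 'assistant';

// True when the viewport's bottom edge is within thresholdPx of the content's end.
function isNearBottom(
  scrollHeight: number,
  scrollTop: number,
  clientHeight: number,
  thresholdPx = 80,
): boolean {
  const distanceFromBottom = scrollHeight - scrollTop - clientHeight;
  return distanceFromBottom < thresholdPx;
}

// Whether the list should keep following new content after a message arrives.
function nextFollowState(wasAtBottom: boolean, newMessageRole: Role | null): boolean {
  // A message sent by the user always re-enables following;
  // anything else leaves the current state untouched.
  return newMessageRole === 'user' ? true : wasAtBottom;
}

// 2000px of content in a 600px viewport:
const nearEnd = isNearBottom(2000, 1340, 600); // 60px from the bottom: true
const readingHistory = isNearBottom(2000, 1000, 600); // 400px from the bottom: false

const afterUserSend = nextFollowState(false, 'user'); // true: jump back down
const afterAssistantChunk = nextFollowState(false, 'assistant'); // false: leave the reader alone
```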
Pitfalls and Lessons Learned
1. The react-hooks/refs Lint Rule
My first implementation used useRef for the segment cache and height tracking, reading refs during render. The react-hooks/refs lint rule (from eslint-plugin-react-hooks) flags this because reading refs during render can lead to stale UI (refs don't trigger re-renders). I switched to:
- Segment cache: useState<Map> with lazy initializer, so the Map persists across renders but counts as "state"
- Height tracking: useState<Record<string, number>> updated via ResizeObserver callbacks
2. setState Inside useLayoutEffect Triggers Lint Error
The react-hooks/set-state-in-effect rule (also from eslint-plugin-react-hooks) flags synchronous setState calls inside effects, because they trigger an extra render pass right after the one that just finished. My initial approach used useLayoutEffect to measure heights and call setHeights synchronously. The fix was to use ResizeObserver + MutationObserver instead: the setState calls now happen in the observer callbacks, which run later in response to layout and DOM changes, not in the effect body itself.
3. Virtual Scrolling + Streaming = Always Render Last Message
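A subtle but critical detail: during streaming, the last message is constantly growing. If the user is scrolled to the bottom, the virtual list must always include the last message in the visible range, even if the scroll position would normally exclude it. Without this, the streaming text would disappear from the DOM when the message grows tall enough to push it out of the viewport. The trade-off is that the clamp is unconditional: end is always the last index, so every message from start to the end of the list stays mounted, and a user who scrolls far up in a long conversation loses virtualization below the viewport. Applying the clamp only while a response is streaming, or only while the user is at the bottom, would bound that cost.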
// Always include the last message
end = Math.max(end, messages.length - 1);
4. Cache Eviction Strategy
The segment cache uses a simple "clear half when full" strategy. Because a Map iterates in insertion order, keeping the last half of the entries keeps the most recently added ones. A more sophisticated LRU would be ideal, but for a chat interface where users typically scroll in one direction, this simple approach works well. A MAX_CACHE_SIZE of 200 is generous enough that eviction rarely triggers.
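For comparison, here is roughly what the LRU alternative could look like. This is a sketch of the general technique, not code from my app: `LruCache` is a hypothetical class that relies on the fact that a Map iterates in insertion order, so deleting and re-inserting a key moves it to the "most recent" end.

```typescript
class LruCache<K, V> {
  private readonly entries = new Map<K, V>();
  private readonly maxSize: number;

  constructor(maxSize: number) {
    this.maxSize = maxSize;
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert to mark the entry as most recently used
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    if (this.entries.has(key)) this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxSize) {
      // The first key in iteration order is the least recently used
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
  }

  has(key: K): boolean {
    return this.entries.has(key);
  }

  get size(): number {
    return this.entries.size;
  }
}

const lru = new LruCache<string, number>(2);
lru.set('a', 1);
lru.set('b', 2);
lru.get('a'); // touch 'a', so 'b' becomes the least recently used
lru.set('c', 3); // over capacity: 'b' is evicted, 'a' and 'c' remain
```

The difference from "clear half when full" is that a segment which keeps getting cache hits is never evicted just because it was inserted early.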
Summary
| Optimization | Technique | Key Benefit |
|---|---|---|
| Stream throttling | requestAnimationFrame batching | Reduces renders from one per chunk to at most one per frame (~60/sec on a 60 Hz display) |
| Segmented rendering | Block-level splitting + segment cache | Avoids re-parsing 90%+ of unchanged content |
| Virtual scrolling | ResizeObserver + visible range calculation | DOM size stays roughly bounded as the conversation grows |
These three optimizations complement each other: RAF throttling reduces render frequency, segmented caching reduces parse work per render, and virtual scrolling reduces DOM size. The result is a chat interface that stays smooth even with long streaming responses and long conversation histories, all without pulling in heavy dependencies like react-window or react-virtuoso.