Introduction
When building an AI-powered chat interface, one of the most frustrating user experiences is watching a carefully crafted response vanish into thin air because of a brief network hiccup. The stream stops, the error message appears, and the user is left with nothing — not even the partial answer they were already reading.
This was exactly the problem I faced. My chat implementation used standard HTTP streaming to deliver AI responses in real time. It worked beautifully on stable connections, but the moment the network flickered, everything fell apart. The error handling was naive: it simply replaced whatever content had been received with a generic "network error" message. Users lost context, patience, and trust.
This article documents how I rebuilt the error handling layer to be truly resilient. The solution involves two core ideas: preserving partial content when a stream is interrupted, and automatically retrying with exponential backoff to recover from transient failures.
The Problem: Fragile Streaming Error Handling
The original streaming logic was straightforward. A fetch request initiated the stream, a ReadableStream reader consumed chunks, and each chunk was appended to an accumulated string that updated the UI:
let accumulated = '';
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  const chunk = decoder.decode(value, { stream: true });
  accumulated += chunk;
  // Update UI with accumulated content
  setMessages(/* ... accumulated ... */);
}
The catch block, however, was the weak point:
catch (err) {
  if (err instanceof DOMException && err.name === 'AbortError') {
    // User cancelled: fine, just stop
    setIsLoading(false);
    return;
  }
  // ❌ The problem: accumulated content is discarded
  setMessages((prev) => {
    const updated = [...prev];
    const lastMsg = updated[updated.length - 1];
    if (lastMsg && lastMsg.role === 'assistant') {
      updated[updated.length - 1] = {
        ...lastMsg,
        content: 'Network connection error. Please check your connection and try again.',
      };
    }
    return updated;
  });
}
Notice the critical flaw: accumulated held all the content received before the error, but the catch block completely ignored it. The message was overwritten with a static error string. For a response that had been streaming for 10 seconds, this meant 10 seconds of valuable content simply disappeared.
There was also no recovery mechanism. The only option was a manual retry button that re-sent the entire user message, causing the AI to regenerate the response from scratch. This was wasteful and slow.
Core Concept 1: Separating Content from Error State
The first insight was that content and error state are orthogonal. A message can simultaneously contain partial content and be in an error state. These should not be conflated into a single string.
I extended the message type to support an optional error field:
interface ChatMessage {
  id: string;
  role: 'user' | 'assistant' | 'system';
  content: string;
  timestamp: number;
  mode?: 'resume' | 'general';
  error?: string; // NEW: error state separate from content
}
This separation allows the UI to render both the partial content and the error indicator. Users can see what was received before the interruption, rather than staring at a blank error message.
The catch block was rewritten to preserve accumulated:
catch (err) {
  if (err instanceof DOMException && err.name === 'AbortError') {
    setIsLoading(false);
    return;
  }
  if (accumulated.trim()) {
    // ✅ Preserve partial content, attach error separately
    setMessages((prev) => {
      const updated = [...prev];
      const lastMsg = updated[updated.length - 1];
      if (lastMsg && lastMsg.role === 'assistant') {
        updated[updated.length - 1] = {
          ...lastMsg,
          content: accumulated, // Keep what we got
          error: '...', // Error info goes here
        };
      }
      return updated;
    });
  } else {
    // Nothing was received, so show the generic error
    setMessages(/* ... network error ... */);
  }
}
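The state update inside that catch block can also be pulled out into a pure function, which makes it easy to test without a network or a React tree. The following is a minimal sketch, not the article's actual code: `applyStreamError` and `GENERIC_NETWORK_ERROR` are hypothetical names, and placing the generic message in the `error` field when nothing was received is an assumption.

```typescript
interface ChatMessage {
  id: string;
  role: 'user' | 'assistant' | 'system';
  content: string;
  timestamp: number;
  mode?: 'resume' | 'general';
  error?: string;
}

// Hypothetical fallback text for the case where no content arrived at all.
const GENERIC_NETWORK_ERROR =
  'Network connection error. Please check your connection and try again.';

// Hypothetical helper: returns a new array in which the trailing assistant
// message records the failure. The input array is never mutated.
function applyStreamError(
  messages: ChatMessage[],
  accumulated: string,
  errorText: string,
): ChatMessage[] {
  const last = messages[messages.length - 1];
  // Only a trailing assistant message is ever rewritten.
  if (!last || last.role !== 'assistant') return messages;

  const hasPartial = accumulated.trim().length > 0;
  const patched: ChatMessage = hasPartial
    ? { ...last, content: accumulated, error: errorText } // keep what arrived
    : { ...last, content: '', error: GENERIC_NETWORK_ERROR }; // assumption: error field only

  return [...messages.slice(0, -1), patched];
}
```

Inside the hook this would be used as `setMessages((prev) => applyStreamError(prev, accumulated, errorText))`, so the catch block itself stays short.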
On the UI side, the message component now renders the content as usual, and conditionally displays the error in a distinct visual block below it:
┌─────────────────────────────┐
│ This is the partial answer │ ← content (preserved)
│ that was received before │
│ the network interrupted... │
├─────────────────────────────┤
│ ⚠️ Auto-retrying... (1/3) │ ← error (new)
│ [Retry] │
└─────────────────────────────┘
This simple architectural change dramatically improves the user experience. Even if recovery fails, the user hasn't lost the partial response they were reading.
Core Concept 2: Exponential Backoff Auto-Retry
Preserving content is only half the battle. The other half is recovering from the failure automatically when possible.
I implemented an auto-retry mechanism with exponential backoff. The design goals were:
- Automatic: The user shouldn't need to click anything for transient failures.
- Bounded: Don't retry forever. Cap the attempts and the delay.
- Non-intrusive: Don't block the user from sending new messages while retrying.
- Cancellable: If the user interacts with the chat, cancel any pending retry.
The Backoff Algorithm
The retry delay follows an exponential backoff with a ceiling:
const MAX_AUTO_RETRIES = 3;
const INITIAL_RETRY_DELAY_MS = 1000;
const MAX_RETRY_DELAY_MS = 8000;

function calculateBackoffDelay(attempt: number): number {
  return Math.min(
    INITIAL_RETRY_DELAY_MS * Math.pow(2, attempt),
    MAX_RETRY_DELAY_MS
  );
}
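For attempt indices 0 through 3 this yields delays of 1s, 2s, 4s, and 8s, and every later attempt stays at the 8-second cap. Because the retry code passes the zero-based attempt index and MAX_AUTO_RETRIES is 3, only the first three delays (1s, 2s, 4s) are actually used; the cap matters only if the retry limit is raised.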
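To make the arithmetic concrete, the schedule can be enumerated directly. This is a small worked example built on the constants above; `backoffSchedule` is a hypothetical helper added only for illustration.

```typescript
const MAX_AUTO_RETRIES = 3;
const INITIAL_RETRY_DELAY_MS = 1000;
const MAX_RETRY_DELAY_MS = 8000;

function calculateBackoffDelay(attempt: number): number {
  return Math.min(
    INITIAL_RETRY_DELAY_MS * Math.pow(2, attempt),
    MAX_RETRY_DELAY_MS,
  );
}

// Hypothetical helper: the delay used before each of `retries` attempts,
// using zero-based attempt indices.
function backoffSchedule(retries: number): number[] {
  return Array.from({ length: retries }, (_, attempt) =>
    calculateBackoffDelay(attempt),
  );
}

// With the article's limit of 3 retries, indices 0..2 are used.
const schedule = backoffSchedule(MAX_AUTO_RETRIES); // [1000, 2000, 4000]

// Worst-case time spent waiting before giving up.
const totalWaitMs = schedule.reduce((sum, ms) => sum + ms, 0); // 7000

// With a higher limit, the cap takes over from index 3 onward.
const cappedSchedule = backoffSchedule(6); // [1000, 2000, 4000, 8000, 8000, 8000]
```

Under these settings a user waits at most 7 seconds in total across all automatic retries, not counting the time each retried request takes.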
Retry State Machine
A ref-based counter tracks retry attempts across renders:
const autoRetryCountRef = useRef(0);
const autoRetryTimerRef = useRef<ReturnType<typeof setTimeout> | null>(null);
When a stream error occurs and partial content exists, the logic checks if more retries are available:
if (accumulated.trim()) {
  const canAutoRetry = autoRetryCountRef.current < MAX_AUTO_RETRIES;
  if (canAutoRetry) {
    autoRetryCountRef.current++;
    const errorContent = `Auto-retrying... (${autoRetryCountRef.current}/${MAX_AUTO_RETRIES})`;
    // Show retrying status in the error field
    setMessages(/* ... content: accumulated, error: errorContent ... */);
    // Schedule the retry
    const delay = calculateBackoffDelay(autoRetryCountRef.current - 1);
    autoRetryTimerRef.current = setTimeout(() => {
      // Re-send the last user message with trimmed context
      // ...
    }, delay);
  } else {
    // All retries exhausted
    setMessages(/* ... error: "Auto-retry failed. Please retry manually." ... */);
  }
}
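The branching above can be summarized as a pure decision function that takes the number of retries already used and returns what to do next. This is a sketch for illustration only: `decideRetry` and the `RetryDecision` type are hypothetical names, and the status strings are copied from the snippet above.

```typescript
const MAX_AUTO_RETRIES = 3;
const INITIAL_RETRY_DELAY_MS = 1000;
const MAX_RETRY_DELAY_MS = 8000;

function calculateBackoffDelay(attempt: number): number {
  return Math.min(INITIAL_RETRY_DELAY_MS * Math.pow(2, attempt), MAX_RETRY_DELAY_MS);
}

// Hypothetical result type: either schedule another attempt or give up.
type RetryDecision =
  | { kind: 'retry'; attempt: number; delayMs: number; status: string }
  | { kind: 'exhausted'; status: string };

// Hypothetical helper mirroring the if/else above without touching refs or state.
function decideRetry(retriesSoFar: number): RetryDecision {
  if (retriesSoFar >= MAX_AUTO_RETRIES) {
    return { kind: 'exhausted', status: 'Auto-retry failed. Please retry manually.' };
  }
  const attempt = retriesSoFar + 1; // one-based, as shown to the user
  return {
    kind: 'retry',
    attempt,
    delayMs: calculateBackoffDelay(attempt - 1), // zero-based index for the backoff
    status: `Auto-retrying... (${attempt}/${MAX_AUTO_RETRIES})`,
  };
}
```

Keeping the decision separate from the side effects (updating the ref, calling `setMessages`, starting the timer) is a design choice rather than a requirement, but it makes the off-by-one between the displayed attempt number and the backoff index easy to check.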
The Retry Action
When the timer fires, the retry logic needs to reconstruct the conversation state. The key challenge is avoiding duplicate user messages. The original retry function had a subtle bug where it would sometimes leave the original user message in the array, causing the AI to see it twice.
The corrected approach removes both the failed assistant message and its preceding user message, then re-sends:
autoRetryTimerRef.current = setTimeout(() => {
  const currentMessages = messagesRef.current;
  const lastUserMessage = [...currentMessages]
    .reverse()
    .find((m) => m.role === 'user');
  if (!lastUserMessage) return;
  // Remove the failed assistant message
  let trimmedMessages = currentMessages.slice(0, -1);
  // Also remove the user message to prevent duplication
  const lastTrimmed = trimmedMessages[trimmedMessages.length - 1];
  if (lastTrimmed?.role === 'user' && lastTrimmed.content === lastUserMessage.content) {
    trimmedMessages = trimmedMessages.slice(0, -1);
  }
  setMessages(trimmedMessages);
  // Defer the re-send by one tick so the state update above is queued first
  setTimeout(() => {
    sendMessageRef.current(lastUserMessage.content, trimmedMessages);
  }, 0);
}, delay);
Notice the use of sendMessageRef, a mutable ref that always points to the latest sendMessage function. This is crucial because the setTimeout callback closes over the ref object and reads .current only when the timer fires, so it calls the current sendMessage rather than a stale function instance.
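The trimming step is the part most worth testing in isolation. Below is a minimal sketch of it as a pure function; `planRetry` and `RetryPlan` are hypothetical names, and like the callback above it assumes the failed assistant message is the last element of the array.

```typescript
interface ChatMessage {
  id: string;
  role: 'user' | 'assistant' | 'system';
  content: string;
  timestamp: number;
  mode?: 'resume' | 'general';
  error?: string;
}

// Hypothetical shape: what to re-send and the history to send it against.
interface RetryPlan {
  content: string;
  history: ChatMessage[];
}

// Hypothetical helper: computes the retry inputs without mutating `messages`.
function planRetry(messages: ChatMessage[]): RetryPlan | null {
  const lastUser = [...messages].reverse().find((m) => m.role === 'user');
  if (!lastUser) return null;

  // Drop the failed assistant message at the tail (assumed to be last).
  let history = messages.slice(0, -1);

  // Drop the matching user message too, because re-sending appends it again.
  const tail = history[history.length - 1];
  if (tail?.role === 'user' && tail.content === lastUser.content) {
    history = history.slice(0, -1);
  }
  return { content: lastUser.content, history };
}
```

With a function like this, the timer callback reduces to calling `planRetry(messagesRef.current)`, bailing out on `null`, and passing the result to `setMessages` and `sendMessageRef.current`.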
Cancellation Safety
Retries must not outlive their relevance. The cancelAutoRetry function is called in every scenario that invalidates a pending retry:
- User sends a new message
- User clicks manual retry
- User switches chat mode
- User clears messages
- Component unmounts
const cancelAutoRetry = useCallback(() => {
  if (autoRetryTimerRef.current) {
    clearTimeout(autoRetryTimerRef.current);
    autoRetryTimerRef.current = null;
  }
  autoRetryCountRef.current = 0;
}, []);
Additionally, the timer callback validates that the message still has an error field before proceeding. If the user has already interacted with the chat (e.g., sent a new message), the error field will be gone, and the retry aborts.
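The combination of a bounded counter, a cancellable timer, and a relevance check at fire time can be sketched outside React as a small class. This is an illustrative sketch rather than the article's implementation: `AutoRetryController`, the `Timers` interface, and `realTimers` are hypothetical names, and the timer functions are injected only so the behavior can be exercised without waiting for real timeouts.

```typescript
// Hypothetical seam around the timer functions so tests can substitute fakes.
interface Timers {
  set(fn: () => void, ms: number): unknown;
  clear(handle: unknown): void;
}

const realTimers: Timers = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

class AutoRetryController {
  private handle: unknown = null;
  private count = 0;
  private readonly maxRetries: number;
  private readonly timers: Timers;

  constructor(maxRetries: number, timers: Timers = realTimers) {
    this.maxRetries = maxRetries;
    this.timers = timers;
  }

  get pending(): boolean {
    return this.handle !== null;
  }

  // Schedules `task` unless the retry budget is spent. Returns whether a
  // timer was started. `isStillRelevant` is checked when the timer fires.
  schedule(task: () => void, delayMs: number, isStillRelevant: () => boolean): boolean {
    if (this.count >= this.maxRetries) return false;
    this.clearTimer();
    this.count++;
    this.handle = this.timers.set(() => {
      this.handle = null;
      if (isStillRelevant()) task(); // e.g. the last message still has an error
    }, delayMs);
    return true;
  }

  // Mirrors cancelAutoRetry: stop any pending timer and reset the counter.
  cancel(): void {
    this.clearTimer();
    this.count = 0;
  }

  private clearTimer(): void {
    if (this.handle !== null) {
      this.timers.clear(this.handle);
      this.handle = null;
    }
  }
}
```

In the hook, `isStillRelevant` would correspond to checking that the last message still carries an `error` field, and `cancel()` would be called from the same places as `cancelAutoRetry`.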
Pitfalls and Lessons Learned
Pitfall 1: Stale Closures in setTimeout
My first attempt at auto-retry captured the sendMessage function directly in the setTimeout callback. Because sendMessage was a useCallback with many dependencies, the closure would reference an old version of the function after state changes. The retry would use stale messages and produce incorrect context.
The solution was the sendMessageRef pattern:
const sendMessageRef = useRef<(content: string, overrideMessages?: ChatMessage[]) => void>(() => {});
// ...
// Keep the ref pointing at the latest sendMessage
sendMessageRef.current = sendMessage;
// ...
setTimeout(() => {
  // .current is read when the timer fires, not when it is scheduled
  sendMessageRef.current(lastUserMessage.content, trimmedMessages);
}, delay);
Refs are mutable and don't trigger re-renders, making them perfect for accessing the "latest" version of a callback from asynchronous contexts.
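The difference between capturing a function and capturing a ref can be reproduced without React. In this sketch a plain object with a `current` property stands in for the ref, and `makeSend` is a hypothetical factory that simulates a callback being recreated on each render with whatever state it sees at that moment.

```typescript
// Stand-in for a React ref: a stable container with a mutable `current`.
interface Ref<T> {
  current: T;
}

type Send = () => string;

let version = 1;

// Hypothetical factory: each call captures the state visible "at render time".
const makeSend = (): Send => {
  const captured = version;
  return () => `sent with state v${captured}`;
};

const sendRef: Ref<Send> = { current: makeSend() };
const directSend = sendRef.current; // what a callback captures if it grabs the function

// Two deferred callbacks, as a timer would hold them.
const viaCapturedFunction = () => directSend();
const viaRef = () => sendRef.current();

// A "re-render" happens before the timer fires.
version = 2;
sendRef.current = makeSend();

const staleResult = viaCapturedFunction(); // "sent with state v1"
const freshResult = viaRef(); // "sent with state v2"
```

Both callbacks were created before the update, but only the one that dereferences the ref at call time observes the newer function.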
Pitfall 2: Duplicate User Messages on Retry
The original manual retry function had a subtle bug. It removed the last assistant message but left the user message in place, then called sendMessage which appended a new user message. The AI would see the same user message twice.
The fix removes both the assistant and user messages before re-sending:
let trimmedMessages = currentMessages.slice(0, -1); // Remove assistant
const lastTrimmed = trimmedMessages[trimmedMessages.length - 1];
if (lastTrimmed?.role === 'user') {
  trimmedMessages = trimmedMessages.slice(0, -1); // Remove user too
}
Pitfall 3: Orphaned Timers
Without proper cleanup, auto-retry timers could fire after the user had moved on to a new conversation. This would cause confusing behavior where an old message suddenly reappeared.
The comprehensive cleanup strategy involves:
- Calling cancelAutoRetry() on every user-initiated state change
- Checking lastMsg?.error in the timer callback before acting
- Cleaning up in the component unmount effect
Summary
Building resilient streaming requires thinking beyond the happy path. The key takeaways from this implementation:
| Technique | Purpose |
|---|---|
| Separate error field | Preserve partial content while indicating failure |
| Exponential backoff | Retry transient failures without overwhelming the server |
| sendMessageRef pattern | Avoid stale closures in asynchronous callbacks |
| Dual message cleanup | Prevent duplicate user messages on retry |
| Comprehensive cancellation | Prevent orphaned retries from causing confusion |
The result is a chat interface that degrades gracefully under poor network conditions. Users see their partial answers preserved, watch automatic recovery attempts, and always retain the option to retry manually if all else fails.
For AI applications where responses can take significant time to generate, preserving partial progress isn't just a nice-to-have — it's essential for maintaining user trust.