Building Resilient AI Chat Streaming: Preserving Partial Content and Auto-Retry with Exponential Backoff

Introduction

When building an AI-powered chat interface, one of the most frustrating user experiences is watching a carefully crafted response vanish into thin air because of a brief network hiccup. The stream stops, the error message appears, and the user is left with nothing — not even the partial answer they were already reading.

This was exactly the problem I faced. My chat implementation used standard HTTP streaming to deliver AI responses in real time. It worked beautifully on stable connections, but the moment the network flickered, everything fell apart. The error handling was naive: it simply replaced whatever content had been received with a generic "network error" message. Users lost context, patience, and trust.

This article documents how I rebuilt the error handling layer to be truly resilient. The solution involves two core ideas: preserving partial content when a stream is interrupted, and automatically retrying with exponential backoff to recover from transient failures.

The Problem: Fragile Streaming Error Handling

The original streaming logic was straightforward. A fetch request initiated the stream, a ReadableStream reader consumed chunks, and each chunk was appended to an accumulated string that updated the UI:

let accumulated = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  const chunk = decoder.decode(value, { stream: true });
  accumulated += chunk;

  // Update UI with accumulated content
  setMessages(/* ... accumulated ... */);
}

The catch block, however, was the weak point:

catch (err) {
  if (err instanceof DOMException && err.name === 'AbortError') {
    // User cancelled — fine, just stop
    setIsLoading(false);
    return;
  }

  // ❌ The problem: accumulated content is discarded
  setMessages((prev) => {
    const updated = [...prev];
    const lastMsg = updated[updated.length - 1];
    if (lastMsg && lastMsg.role === 'assistant') {
      updated[updated.length - 1] = {
        ...lastMsg,
        content: 'Network connection error. Please check your connection and try again.',
      };
    }
    return updated;
  });
}

Notice the critical flaw: accumulated held all the content received before the error, but the catch block completely ignored it. The message was overwritten with a static error string. For a response that had been streaming for 10 seconds, this meant 10 seconds of valuable content simply disappeared.

There was also no recovery mechanism. The only option was a manual retry button that re-sent the entire user message, causing the AI to regenerate the response from scratch. This was wasteful and slow.

Core Concept 1: Separating Content from Error State

The first insight was that content and error state are orthogonal. A message can simultaneously contain partial content and be in an error state. These should not be conflated into a single string.

I extended the message type to support an optional error field:

interface ChatMessage {
  id: string;
  role: 'user' | 'assistant' | 'system';
  content: string;
  timestamp: number;
  mode?: 'resume' | 'general';
  error?: string; // NEW: error state separate from content
}

This separation allows the UI to render both the partial content and the error indicator. Users can see what was received before the interruption, rather than staring at a blank error message.

The catch block was rewritten to preserve accumulated:

catch (err) {
  if (err instanceof DOMException && err.name === 'AbortError') {
    setIsLoading(false);
    return;
  }

  if (accumulated.trim()) {
    // ✅ Preserve partial content, attach error separately
    setMessages((prev) => {
      const updated = [...prev];
      const lastMsg = updated[updated.length - 1];
      if (lastMsg && lastMsg.role === 'assistant') {
        updated[updated.length - 1] = {
          ...lastMsg,
          content: accumulated,      // Keep what we got
          error: '...',              // Error info goes here
        };
      }
      return updated;
    });
  } else {
    // Nothing was received — show generic error
    setMessages(/* ... network error ... */);
  }
}
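The state update inside the catch block can also be pulled out into a pure function, which makes the preserve-or-replace decision testable without React. The following is a minimal sketch, not the original implementation: `applyStreamFailure`, `GENERIC_ERROR`, and the trimmed-down `ChatMessage` shape are hypothetical names, and placing the generic message in the `error` field when nothing was received is an assumption.

```typescript
// Trimmed-down message shape for illustration (the article's type has more fields).
interface ChatMessage {
  id: string;
  role: 'user' | 'assistant' | 'system';
  content: string;
  error?: string;
}

// Hypothetical constant; the text matches the message used in the original catch block.
const GENERIC_ERROR =
  'Network connection error. Please check your connection and try again.';

// Returns a new array in which the trailing assistant message keeps whatever
// was streamed so far and carries the failure in its `error` field.
function applyStreamFailure(
  messages: ChatMessage[],
  accumulated: string,
  error: string
): ChatMessage[] {
  const last = messages[messages.length - 1];
  if (!last || last.role !== 'assistant') return messages;

  const updated = messages.slice(0, -1);
  if (accumulated.trim()) {
    // Partial content exists: keep it and attach the error separately.
    updated.push({ ...last, content: accumulated, error });
  } else {
    // Nothing was received: assumption for this sketch, show only the generic error.
    updated.push({ ...last, content: '', error: GENERIC_ERROR });
  }
  return updated;
}
```

Inside the component this would be used as `setMessages((prev) => applyStreamFailure(prev, accumulated, errorText))`, where `errorText` is whatever status string the caller wants to show.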

On the UI side, the message component now renders the content as usual, and conditionally displays the error in a distinct visual block below it:

┌─────────────────────────────┐
│  This is the partial answer │  ← content (preserved)
│  that was received before   │
│  the network interrupted... │
├─────────────────────────────┤
│  ⚠️  Auto-retrying... (1/3) │  ← error (new)
│  [Retry]                    │
└─────────────────────────────┘

This simple architectural change dramatically improves the user experience. Even if recovery fails, the user hasn't lost the partial response they were reading.

Core Concept 2: Exponential Backoff Auto-Retry

Preserving content is only half the battle. The other half is recovering from the failure automatically when possible.

I implemented an auto-retry mechanism with exponential backoff. The design goals were:

Automatic: The user shouldn't need to click anything for transient failures.
Bounded: Don't retry forever. Cap the attempts and the delay.
Non-intrusive: Don't block the user from sending new messages while retrying.
Cancellable: If the user interacts with the chat, cancel any pending retry.

The Backoff Algorithm

The retry delay follows an exponential backoff with a ceiling:

const MAX_AUTO_RETRIES = 3;
const INITIAL_RETRY_DELAY_MS = 1000;
const MAX_RETRY_DELAY_MS = 8000;

function calculateBackoffDelay(attempt: number): number {
  return Math.min(
    INITIAL_RETRY_DELAY_MS * Math.pow(2, attempt),
    MAX_RETRY_DELAY_MS
  );
}

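With attempts numbered from 0, this produces delays of 1s, 2s, and 4s for the three automatic retries. The 8-second cap only comes into play if MAX_AUTO_RETRIES is raised to 4 or more, at which point it prevents excessive waiting.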
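To check which delays these constants actually yield, the schedule can be tabulated directly. This is a small worked example; `backoffSchedule` is a hypothetical helper added for illustration, while the constants and `calculateBackoffDelay` are the ones defined above.

```typescript
const MAX_AUTO_RETRIES = 3;
const INITIAL_RETRY_DELAY_MS = 1000;
const MAX_RETRY_DELAY_MS = 8000;

function calculateBackoffDelay(attempt: number): number {
  return Math.min(
    INITIAL_RETRY_DELAY_MS * Math.pow(2, attempt),
    MAX_RETRY_DELAY_MS
  );
}

// Hypothetical helper: delays for attempts 0 .. attempts-1.
function backoffSchedule(attempts: number): number[] {
  return Array.from({ length: attempts }, (_, attempt) =>
    calculateBackoffDelay(attempt)
  );
}

// Three retries never reach the cap.
const schedule = backoffSchedule(MAX_AUTO_RETRIES); // [1000, 2000, 4000]

// Worst-case time spent waiting before the last automatic retry starts.
const totalWaitMs = schedule.reduce((sum, ms) => sum + ms, 0); // 7000

// With more attempts, the cap flattens the curve at 8000 ms.
const extended = backoffSchedule(5); // [1000, 2000, 4000, 8000, 8000]
```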

Retry State Machine

A ref-based counter tracks retry attempts across renders:

const autoRetryCountRef = useRef(0);
const autoRetryTimerRef = useRef<ReturnType<typeof setTimeout> | null>(null);

When a stream error occurs and partial content exists, the logic checks if more retries are available:

if (accumulated.trim()) {
  const canAutoRetry = autoRetryCountRef.current < MAX_AUTO_RETRIES;

  if (canAutoRetry) {
    autoRetryCountRef.current++;
    const errorContent = `Auto-retrying... (${autoRetryCountRef.current}/${MAX_AUTO_RETRIES})`;

    // Show retrying status in the error field
    setMessages(/* ... content: accumulated, error: errorContent ... */);

    // Schedule the retry
    const delay = calculateBackoffDelay(autoRetryCountRef.current - 1);
    autoRetryTimerRef.current = setTimeout(() => {
      // Re-send the last user message with trimmed context
      // ...
    }, delay);
  } else {
    // All retries exhausted
    setMessages(/* ... error: "Auto-retry failed. Please retry manually." ... */);
  }
}
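The branch above mixes the decision (retry or give up, and after how long) with React side effects. One way to make the decision itself easy to test is to express it as a pure function of the current counter. This is a sketch under that assumption: `decideRetry` and `RetryDecision` are hypothetical names, while the constants and status strings are taken from the snippets above.

```typescript
const MAX_AUTO_RETRIES = 3;
const INITIAL_RETRY_DELAY_MS = 1000;
const MAX_RETRY_DELAY_MS = 8000;

type RetryDecision =
  | { kind: 'retry'; nextCount: number; delayMs: number; status: string }
  | { kind: 'exhausted'; status: string };

// Given how many auto-retries have already been started, decide what to do next.
function decideRetry(currentCount: number): RetryDecision {
  if (currentCount >= MAX_AUTO_RETRIES) {
    return {
      kind: 'exhausted',
      status: 'Auto-retry failed. Please retry manually.',
    };
  }
  const nextCount = currentCount + 1;
  // The backoff attempt index is zero-based, hence nextCount - 1.
  const delayMs = Math.min(
    INITIAL_RETRY_DELAY_MS * Math.pow(2, nextCount - 1),
    MAX_RETRY_DELAY_MS
  );
  return {
    kind: 'retry',
    nextCount,
    delayMs,
    status: `Auto-retrying... (${nextCount}/${MAX_AUTO_RETRIES})`,
  };
}
```

The caller would then write `nextCount` back to `autoRetryCountRef.current`, put `status` in the message's `error` field, and schedule the timer with `delayMs`.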

The Retry Action

When the timer fires, the retry logic needs to reconstruct the conversation state. The key challenge is avoiding duplicate user messages. The original retry function had a subtle bug where it would sometimes leave the original user message in the array, causing the AI to see it twice.

The corrected approach removes both the failed assistant message and its preceding user message, then re-sends:

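autoRetryTimerRef.current = setTimeout(() => {
  autoRetryTimerRef.current = null; // This timer has fired; nothing left to cancel
  const currentMessages = messagesRef.current;

  // Abort if the failed message is gone or no longer in an error state
  const lastMsg = currentMessages[currentMessages.length - 1];
  if (!lastMsg || lastMsg.role !== 'assistant' || !lastMsg.error) return;

  const lastUserMessage = [...currentMessages]
    .reverse()
    .find((m) => m.role === 'user');

  if (!lastUserMessage) return;

  // Remove the failed assistant message
  let trimmedMessages = currentMessages.slice(0, -1);

  // Also remove the user message to prevent duplication
  const lastTrimmed = trimmedMessages[trimmedMessages.length - 1];
  if (lastTrimmed?.role === 'user' && lastTrimmed.content === lastUserMessage.content) {
    trimmedMessages = trimmedMessages.slice(0, -1);
  }

  setMessages(trimmedMessages);
  // Defer the send by one tick so the trimmed state is applied first
  setTimeout(() => {
    sendMessageRef.current(lastUserMessage.content, trimmedMessages);
  }, 0);
}, delay);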

Notice the use of sendMessageRef, a mutable ref that always points to the latest sendMessage function. This is crucial because the setTimeout callback closes over the ref object and reads .current only when it runs, so it calls the current sendMessage rather than a stale function instance.
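The trimming step is the part most worth testing in isolation, since an off-by-one there silently corrupts the context sent to the model. A minimal sketch of it as a pure function follows; `planRetry` and `RetryPlan` are hypothetical names, the `ChatMessage` shape is trimmed down for illustration, and the error-state guard reflects the validation described for the timer callback.

```typescript
interface ChatMessage {
  id: string;
  role: 'user' | 'assistant' | 'system';
  content: string;
  error?: string;
}

interface RetryPlan {
  content: string;        // user text to re-send
  history: ChatMessage[]; // context with the failed exchange removed
}

// Returns what to re-send, or null if a retry no longer makes sense.
function planRetry(messages: ChatMessage[]): RetryPlan | null {
  const failed = messages[messages.length - 1];
  // Only retry when the last message is an assistant message still in an error state.
  if (!failed || failed.role !== 'assistant' || !failed.error) return null;

  // Drop the failed assistant message.
  let history = messages.slice(0, -1);

  const lastUser = [...history].reverse().find((m) => m.role === 'user');
  if (!lastUser) return null;

  // Drop the trailing user message too: re-sending appends it again.
  const tail = history[history.length - 1];
  if (tail?.role === 'user') {
    history = history.slice(0, -1);
  }

  return { content: lastUser.content, history };
}
```

With this in place, the timer callback reduces to calling `planRetry(messagesRef.current)` and, if the result is non-null, applying `history` and re-sending `content`.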

Cancellation Safety

Retries must not outlive their relevance. The cancelAutoRetry function is called in every scenario that invalidates a pending retry:

User sends a new message
User clicks manual retry
User switches chat mode
User clears messages
Component unmounts

const cancelAutoRetry = useCallback(() => {
  if (autoRetryTimerRef.current) {
    clearTimeout(autoRetryTimerRef.current);
    autoRetryTimerRef.current = null;
  }
  autoRetryCountRef.current = 0;
}, []);

Additionally, the timer callback validates that the message still has an error field before proceeding. If the user has already interacted with the chat (e.g., sent a new message), the error field will be gone, and the retry aborts.
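The counter and the timer handle always change together, so they can be bundled into one small object whose invariants are easy to check outside React. This is a sketch of that idea rather than the original code: `AutoRetryController` and its method names are hypothetical, and in the component the same state lives in `autoRetryCountRef` and `autoRetryTimerRef`.

```typescript
class AutoRetryController {
  private timer: ReturnType<typeof setTimeout> | null = null;
  private count = 0;
  private readonly maxRetries: number;

  constructor(maxRetries: number) {
    this.maxRetries = maxRetries;
  }

  get attempts(): number {
    return this.count;
  }

  get pending(): boolean {
    return this.timer !== null;
  }

  // Schedules `action` after `delayMs`; returns false once retries are exhausted.
  schedule(action: () => void, delayMs: number): boolean {
    if (this.count >= this.maxRetries) return false;
    this.clearTimer(); // never keep two retries in flight
    this.count++;
    this.timer = setTimeout(() => {
      this.timer = null;
      action();
    }, delayMs);
    return true;
  }

  // Mirrors cancelAutoRetry: drop any pending timer and reset the counter.
  cancel(): void {
    this.clearTimer();
    this.count = 0;
  }

  private clearTimer(): void {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
  }
}
```

Calling `cancel()` from each of the scenarios listed above (new message, manual retry, mode switch, clear, unmount) then guarantees that no callback fires afterwards and that the next failure starts again from attempt 1.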

Pitfalls and Lessons Learned

Pitfall 1: Stale Closures in setTimeout

My first attempt at auto-retry captured the sendMessage function directly in the setTimeout callback. Because sendMessage was a useCallback with many dependencies, the closure would reference an old version of the function after state changes. The retry would use stale messages and produce incorrect context.

The solution was the sendMessageRef pattern:

const sendMessageRef = useRef<(content: string, overrideMessages?: ChatMessage[]) => void>(() => {});
// ...
sendMessageRef.current = sendMessage;
// ...
setTimeout(() => {
  sendMessageRef.current(lastUserMessage.content, trimmedMessages);
}, delay);

Refs are mutable and don't trigger re-renders, making them perfect for accessing the "latest" version of a callback from asynchronous contexts.
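The difference between capturing a function and capturing a ref is easy to reproduce without React or timers. In the sketch below, `MutableRef`, `makeSender`, and the version numbers are hypothetical stand-ins: each "version" plays the role of the `sendMessage` instance created by a given render.

```typescript
// Same shape as the object returned by React's useRef.
interface MutableRef<T> {
  current: T;
}

type Send = (content: string) => string;

// Stand-in for the sendMessage created during render number `version`.
function makeSender(version: number): Send {
  return (content) => `v${version} sent: ${content}`;
}

const sendMessageRef: MutableRef<Send> = { current: makeSender(1) };

// Stale pattern: the callback captures the function that existed at scheduling time.
const captured = sendMessageRef.current;
const staleCallback = () => captured('retry');

// Ref pattern: the callback captures the ref and reads .current when it runs.
const freshCallback = () => sendMessageRef.current('retry');

// Simulate a re-render that produces a new sendMessage before the timer fires.
sendMessageRef.current = makeSender(2);

const staleResult = staleCallback(); // "v1 sent: retry"
const freshResult = freshCallback(); // "v2 sent: retry"
```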

Pitfall 2: Duplicate User Messages on Retry

The original manual retry function had a subtle bug. It removed the last assistant message but left the user message in place, then called sendMessage which appended a new user message. The AI would see the same user message twice.

The fix removes both the assistant and user messages before re-sending:

let trimmedMessages = currentMessages.slice(0, -1); // Remove assistant
const lastTrimmed = trimmedMessages[trimmedMessages.length - 1];
if (lastTrimmed?.role === 'user') {
  trimmedMessages = trimmedMessages.slice(0, -1); // Remove user too
}

Pitfall 3: Orphaned Timers

Without proper cleanup, auto-retry timers could fire after the user had moved on to a new conversation. This would cause confusing behavior where an old message suddenly reappeared.

The comprehensive cleanup strategy involves:

Calling cancelAutoRetry() on every user-initiated state change
Checking lastMsg?.error in the timer callback before acting
Cleaning up in the component unmount effect

Summary

Building resilient streaming requires thinking beyond the happy path. The key takeaways from this implementation:

Technique	Purpose
Separate `error` field	Preserve partial content while indicating failure
Exponential backoff	Retry transient failures without overwhelming the server
`sendMessageRef` pattern	Avoid stale closures in asynchronous callbacks
Dual message cleanup	Prevent duplicate user messages on retry
Comprehensive cancellation	Prevent orphaned retries from causing confusion

The result is a chat interface that degrades gracefully under poor network conditions. Users see their partial answers preserved, watch automatic recovery attempts, and always retain the option to retry manually if all else fails.

For AI applications where responses can take significant time to generate, preserving partial progress isn't just a nice-to-have — it's essential for maintaining user trust.
