Streaming API Implementation Guide
Overview
The streaming APIs allow the frontend to receive real-time updates as agents process queries via Server-Sent Events (SSE). Instead of waiting for a complete response, the frontend receives events as they occur:
- Token-by-token streaming from the LLM (real-time typing effect)
- Tool execution notifications (show "searching...", "processing...", etc.)
- Agent thinking/processing status
- Final response with full context
- Error events for graceful handling
Architecture:
- SSE via API routes: Real-time streaming during generation
- Socket.io: Continues to emit completed messages (existing behavior preserved)
This enables a responsive UI where users see the agent "thinking" and generating responses in real-time, while your existing Socket.io notification system stays intact.
Quick Start
Basic Text Streaming
```typescript
const streamChat = async (message: string, chatInstanceId: string, chatModelId: string) => {
  const response = await fetch('/api/chat?stream=true', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ message, chatInstanceId, chatModelId }),
  });
  if (!response.body) return;

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // Buffer across reads: an SSE event may be split between chunks
    buffer += decoder.decode(value, { stream: true });
    const events = buffer.split('\n\n');
    buffer = events.pop() ?? ''; // keep the trailing partial event
    for (const line of events) {
      if (line.startsWith('data: ')) {
        handleStreamEvent(JSON.parse(line.slice(6)));
      }
    }
  }
};
```
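The loop above delegates to a `handleStreamEvent` callback that is not shown. Here is a minimal sketch of one, folding events into a simple UI state object; the `StreamState` shape is an assumption for illustration, not part of the API, and a real app would call into its rendering layer instead:

```typescript
// Sketch: fold stream events into a simple UI state object.
// Event shapes follow the "Stream Events" reference in this guide.
interface StreamState {
  status: 'idle' | 'streaming' | 'tool' | 'done' | 'error';
  text: string;        // content rendered so far
  activeTool?: string; // tool currently running, if any
}

const state: StreamState = { status: 'idle', text: '' };

function handleStreamEvent(event: { type: string; [k: string]: any }): StreamState {
  switch (event.type) {
    case 'stream_start':
      state.status = 'streaming';
      state.text = '';
      break;
    case 'token':
      // cumulativeContent already includes this token, so prefer it
      state.text = event.cumulativeContent ?? state.text + event.content;
      break;
    case 'tool_start':
      state.status = 'tool';
      state.activeTool = event.toolName;
      break;
    case 'tool_end':
      state.status = 'streaming';
      state.activeTool = undefined;
      break;
    case 'agent_response':
      state.text = event.content; // authoritative final content
      break;
    case 'stream_end':
      state.status = event.success ? 'done' : 'error';
      break;
    case 'error':
      state.status = 'error';
      break;
  }
  return state;
}
```

Keeping all mutation in one dispatcher makes it easy to reuse the same handler for the text and voice endpoints, which emit the same core events.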
API Endpoints
1. Text Chat Streaming
Endpoint: POST /api/chat?stream=true
Request Body:
```typescript
{
  message: string;          // User message
  chatInstanceId: string;   // Chat instance ID
  chatModelId: string;      // Chat model ID
  stream?: boolean;         // Alternative to the `stream` query parameter
  includeTokens?: boolean;  // Include token events (default: true)
}
```
Query Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `stream` | `"true"` | - | Enable streaming mode |
| `includeTokens` | `"true"` / `"false"` | `"true"` | Include token-level events |
Example:
```typescript
// Full example with all options
const response = await fetch('/api/chat?stream=true&includeTokens=true', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer YOUR_TOKEN', // If auth is required
  },
  body: JSON.stringify({
    message: 'Hello, how can you help me?',
    chatInstanceId: 'ci_xxxxx',
    chatModelId: 'cm_xxxxx',
  }),
});
```
2. Voice Mode Streaming (Text Response)
Endpoint: POST /api/chat/voice (with stream: true in form data)
This endpoint transcribes audio and streams the text response (no TTS audio generation).
Request (multipart/form-data):
| Field | Type | Required | Description |
|---|---|---|---|
| `audio` | File | Yes | Audio file (webm, mp3, wav, etc.) |
| `chatModelId` | string | Yes | Chat model ID |
| `chatInstanceId` | string | Yes | Chat instance ID |
| `stream` | `"true"` | No | Enable streaming mode |
Example:
```typescript
const formData = new FormData();
formData.append('audio', audioBlob, 'recording.webm');
formData.append('chatModelId', 'cm_xxxxx');
formData.append('chatInstanceId', 'ci_xxxxx');
formData.append('stream', 'true');

const response = await fetch('/api/chat/voice', {
  method: 'POST',
  body: formData,
});

// Parse SSE events
const reader = response.body?.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (reader) {
  const { done, value } = await reader.read();
  if (done) break;
  // Buffer across reads: an SSE event may be split between chunks
  buffer += decoder.decode(value, { stream: true });
  const events = buffer.split('\n\n');
  buffer = events.pop() ?? ''; // keep the trailing partial event
  for (const line of events) {
    if (line.startsWith('data: ')) {
      // Handle: transcription, stream_start, token, agent_response, message_saved, stream_end
      handleStreamEvent(JSON.parse(line.slice(6)));
    }
  }
}
```
Non-streaming fallback: Omit stream field to get the standard JSON response.
3. Voice Response Streaming (with TTS Audio)
Endpoint: POST /api/chat/voice-response (with stream: true in form data)
Request (multipart/form-data):
| Field | Type | Required | Description |
|---|---|---|---|
| `audio` | File | Yes | Audio file (webm, mp3, wav, etc.) |
| `chatModelId` | string | Yes | Chat model ID |
| `chatInstanceId` | string | Yes | Chat instance ID |
| `stream` | `"true"` | No | Enable streaming mode |
| `voice` | string | No | TTS voice preference (default: `"nova"`) |
| `ttsEnabled` | `"true"` / `"false"` | No | Generate TTS audio (default: true) |
| `voiceConversation` | `"true"` / `"false"` | No | Voice conversation mode |
Example:
```typescript
const formData = new FormData();
formData.append('audio', audioBlob, 'recording.webm');
formData.append('chatModelId', 'cm_xxxxx');
formData.append('chatInstanceId', 'ci_xxxxx');
formData.append('stream', 'true');
formData.append('voice', 'nova');

const response = await fetch('/api/chat/voice-response', {
  method: 'POST',
  body: formData,
});

// Parse SSE events the same way as for text streaming
```
Stream Events
All events follow a base structure. Note that, because events arrive as JSON over SSE, `Date` fields are serialized as strings (typically ISO-8601) on the wire; parse them back if you need `Date` objects on the client.

```typescript
interface BaseStreamEvent {
  type: StreamEventType;   // Event type identifier
  timestamp: Date;         // When the event occurred
  conversationId: string;  // Chat instance ID
  agentId: string;         // Agent that emitted this event
}
```
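Each concrete event extends this base with a literal `type`, so the definitions below form a discriminated union that TypeScript narrows automatically. A minimal sketch with two of the events, reduced to the fields used here:

```typescript
// Reduced versions of two event shapes from the reference below;
// `summarize` narrows the union on the literal `type` field.
interface TokenEvent { type: 'token'; content: string; cumulativeContent: string }
interface ErrorEvent { type: 'error'; message: string; code?: string }
type StreamEvent = TokenEvent | ErrorEvent;

function summarize(event: StreamEvent): string {
  switch (event.type) {
    case 'token':
      return event.cumulativeContent;   // narrowed to TokenEvent
    case 'error':
      return `error: ${event.message}`; // narrowed to ErrorEvent
  }
}
```

Declaring the full union once in your client gives you compile-time checks that every event type is handled.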
Event Types Reference
| Event Type | Description | When Emitted |
|---|---|---|
| `stream_start` | Stream has begun | First event |
| `token` | New token from the LLM | During generation |
| `tool_start` | Tool execution started | When the agent calls a tool |
| `tool_end` | Tool execution completed | When the tool returns |
| `agent_thinking` | Agent is processing | During reasoning |
| `agent_response` | Final agent response | When complete |
| `message_saved` | Message saved to the DB | After the agent responds |
| `stream_end` | Stream ended | Last event |
| `error` | Error occurred | On error |
Voice-Specific Events
| Event Type | Description |
|---|---|
| `transcription` | STT transcription result |
| `tts_start` | TTS generation started |
| `tts_complete` | TTS audio ready |
| `tts_error` | TTS generation failed |
Event Type Definitions
stream_start
```typescript
interface StreamStartEvent {
  type: 'stream_start';
  timestamp: Date;
  conversationId: string;
  agentId: string;
  query: string; // Original user query
}
```
token
```typescript
interface TokenEvent {
  type: 'token';
  timestamp: Date;
  conversationId: string;
  agentId: string;
  content: string;           // The new token
  cumulativeContent: string; // All tokens so far
}
```
tool_start
```typescript
interface ToolStartEvent {
  type: 'tool_start';
  timestamp: Date;
  conversationId: string;
  agentId: string;
  toolName: string;   // Name of the tool
  toolInput: any;     // Input passed to the tool
  toolCallId: string; // Unique ID for this call
}
```
tool_end
```typescript
interface ToolEndEvent {
  type: 'tool_end';
  timestamp: Date;
  conversationId: string;
  agentId: string;
  toolName: string;
  toolInput: any;
  toolOutput: any;    // Tool result
  toolCallId: string;
  durationMs: number; // Execution time
}
```
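Because `toolCallId` is unique per call, the UI can pair `tool_start` and `tool_end` events even when several tools run concurrently. A minimal sketch; the `pendingTools` map and `toolLog` array are illustrative names, not part of the API:

```typescript
// Sketch: track in-flight tool calls keyed by toolCallId so that
// overlapping calls to the same tool don't clobber each other.
const pendingTools = new Map<string, { toolName: string; startedAt: number }>();
const toolLog: string[] = [];

function onToolStart(e: { toolCallId: string; toolName: string }) {
  pendingTools.set(e.toolCallId, { toolName: e.toolName, startedAt: Date.now() });
}

function onToolEnd(e: { toolCallId: string; toolName: string; durationMs: number }) {
  pendingTools.delete(e.toolCallId);
  // durationMs comes from the server, so no local clock math is needed
  toolLog.push(`${e.toolName} finished in ${e.durationMs}ms`);
}
```

While `pendingTools` is non-empty, the UI can show a "searching..." / "processing..." indicator per active tool.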
agent_thinking
```typescript
interface AgentThinkingEvent {
  type: 'agent_thinking';
  timestamp: Date;
  conversationId: string;
  agentId: string;
  description?: string; // What the agent is doing
}
```
agent_response
```typescript
interface AgentResponseEvent {
  type: 'agent_response';
  timestamp: Date;
  conversationId: string;
  agentId: string;
  content: string; // Complete response content
  response?: any;  // Full response object
}
```
message_saved
```typescript
interface MessageSavedEvent {
  type: 'message_saved';
  timestamp: Date;
  conversationId: string;
  agentId: string;
  messageId: string; // Saved message ID
  message: {
    id: string;
    text: string;
    created_at: Date;
    updated_at: Date;
    isSent: boolean;
    chatInstanceId: string;
    user: {
      id: string;
      email: string;
      name: string;
      image: string | null;
    };
  };
}
```
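A common use of `message_saved` is to replace the optimistic, locally-streamed bubble with the persisted record, so the UI ends up holding the database's `id` and timestamps. A minimal sketch; the `ChatMessage` list shape and the temporary-id convention are assumptions for illustration:

```typescript
// Sketch: swap a locally-streamed placeholder for the saved message.
// `pendingId` is whatever temporary id the UI assigned while streaming.
interface ChatMessage { id: string; text: string; pending?: boolean }

function reconcileSaved(
  messages: ChatMessage[],
  pendingId: string,
  saved: { id: string; text: string },
): ChatMessage[] {
  return messages.map((m) =>
    m.id === pendingId ? { id: saved.id, text: saved.text } : m,
  );
}
```

After reconciliation, later Socket.io notifications that reference the saved message id will match what the UI already displays.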
stream_end
```typescript
interface StreamEndEvent {
  type: 'stream_end';
  timestamp: Date;
  conversationId: string;
  agentId: string;
  success: boolean;
  totalDurationMs?: number;
  reason?: string; // If success is false
}
```
error
```typescript
interface ErrorEvent {
  type: 'error';
  timestamp: Date;
  conversationId: string;
  agentId: string;
  message: string; // Error message
  code?: string;   // Error code
}
```
transcription (Voice only)
```typescript
interface TranscriptionEvent {
  type: 'transcription';
  timestamp: Date;
  conversationId: string;
  agentId: 'stt';
  transcription: {
    text: string;
    language?: string;
    duration?: number;
    confidence?: number;
  };
  performance: {
    transcriptionMs: number;
  };
}
```
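Since `transcription` arrives before any `token` events, the UI can echo the user's transcribed speech immediately while the response is still streaming. A minimal formatting sketch; how you render the resulting string is up to your UI layer:

```typescript
// Sketch: build a display line from a transcription event so the
// user's spoken input appears before the agent's streamed answer.
function formatTranscription(event: {
  transcription: { text: string; language?: string };
  performance: { transcriptionMs: number };
}): string {
  const lang = event.transcription.language ? ` [${event.transcription.language}]` : '';
  return `${event.transcription.text}${lang} (transcribed in ${event.performance.transcriptionMs}ms)`;
}
```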