Streaming API Implementation Guide
Overview
The streaming APIs allow the frontend to receive real-time updates via Server-Sent Events (SSE) as agents process queries. Instead of waiting for a complete response, the frontend receives events as they occur:
- Token-by-token streaming from the LLM (real-time typing effect)
- Tool execution notifications (show "searching...", "processing...", etc.)
- Agent thinking/processing status
- Final response with full context
- Error events for graceful handling
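A minimal dispatcher over these event types might look like the sketch below. The `type` names and payload fields (`content`, `tool`, `message`) are assumptions based on the list above; check them against the events your backend actually emits:

```typescript
// Hypothetical SSE event shapes; field names are assumptions, not the
// actual backend schema.
type StreamEvent =
  | { type: 'token'; content: string }
  | { type: 'tool_call'; tool: string }
  | { type: 'agent_thinking' }
  | { type: 'agent_response'; content: string }
  | { type: 'error'; message: string };

// Map each event to a short status string the UI can render.
const describeStreamEvent = (event: StreamEvent): string => {
  switch (event.type) {
    case 'token':
      return event.content;              // append to the visible message
    case 'tool_call':
      return `Running ${event.tool}...`; // e.g. "searching...", "processing..."
    case 'agent_thinking':
      return 'Thinking...';
    case 'agent_response':
      return event.content;              // full final text
    case 'error':
      return `Error: ${event.message}`;
  }
};
```

In a real UI you would append `token` content to the current message and replace it wholesale when the final response arrives, rather than returning strings.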
Architecture:
- SSE via API routes: Real-time streaming during generation
- Socket.io: Continues to emit completed messages (existing behavior preserved)
This enables a responsive UI where users see the agent "thinking" and generating responses in real-time, while your existing Socket.io notification system stays intact.
Quick Start
Basic Text Streaming
```typescript
const streamChat = async (message: string, chatInstanceId: string, chatModelId: string) => {
  const response = await fetch('/api/chat?stream=true', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      message,
      chatInstanceId,
      chatModelId,
    }),
  });

  const reader = response.body?.getReader();
  if (!reader) return;

  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // Buffer chunks: a single read may end in the middle of an SSE event.
    buffer += decoder.decode(value, { stream: true });

    // Complete events are delimited by a blank line.
    const events = buffer.split('\n\n');
    buffer = events.pop() ?? ''; // keep the trailing partial event for the next read

    for (const event of events) {
      if (event.startsWith('data: ')) {
        handleStreamEvent(JSON.parse(event.slice(6)));
      }
    }
  }
};
```
API Endpoints
1. Text Chat Streaming
Endpoint: POST /api/chat?stream=true
Request Body:
```typescript
{
  message: string;         // User message
  chatInstanceId: string;  // Chat instance ID
  chatModelId: string;     // Chat model ID
  stream?: boolean;        // Can also be passed in the body (falls back to the query param)
  includeTokens?: boolean; // Include token-level events (default: true)
}
```
Query Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `stream` | `"true"` | - | Enable streaming mode |
| `includeTokens` | `"true"` / `"false"` | `"true"` | Include token-level events |
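The query string above can be assembled with `URLSearchParams` rather than concatenated by hand; `buildChatUrl` here is an illustrative helper, not part of the API:

```typescript
// Build the streaming URL from the query parameters in the table above.
const buildChatUrl = (includeTokens: boolean): string => {
  const params = new URLSearchParams({
    stream: 'true',
    includeTokens: String(includeTokens),
  });
  return `/api/chat?${params.toString()}`;
};
```

`URLSearchParams` also takes care of encoding if you later add parameters with reserved characters.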
Example:
```typescript
// Full example with all options
const response = await fetch('/api/chat?stream=true&includeTokens=true', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer YOUR_TOKEN', // If auth is required
  },
  body: JSON.stringify({
    message: 'Hello, how can you help me?',
    chatInstanceId: 'ci_xxxxx',
    chatModelId: 'cm_xxxxx',
  }),
});
```
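To drive the real-time typing effect, `token` events can be folded into a growing string, with the final response (if present) replacing the accumulated text so the UI always ends on the authoritative full message. The event names here are assumptions, as above:

```typescript
// Fold streamed events into the text shown to the user.
// Event type names ('token', 'agent_response') are assumptions;
// adjust to your backend's schema.
const accumulateTokens = (
  events: Array<{ type: string; content?: string }>,
): string => {
  let text = '';
  for (const event of events) {
    if (event.type === 'token' && event.content) {
      text += event.content; // incremental typing effect
    } else if (event.type === 'agent_response' && event.content) {
      text = event.content;  // final text wins over accumulated tokens
    }
  }
  return text;
};
```

In React, the same logic typically lives in the stream handler, calling `setState` on each token instead of collecting events into an array first.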
2. Voice Mode Streaming (Text Response)
Endpoint: POST /api/chat/voice (with stream: true in form data)
This endpoint transcribes audio and streams the text response (no TTS audio generation).
Request (multipart/form-data):
| Field | Type | Required | Description |
|---|---|---|---|
| `audio` | File | Yes | Audio file (webm, mp3, wav, etc.) |
| `chatModelId` | string | Yes | Chat model ID |
| `chatInstanceId` | string | Yes | Chat instance ID |
| `stream` | `"true"` | No | Enable streaming mode |
Example:
```typescript
const formData = new FormData();
formData.append('audio', audioBlob, 'recording.webm');
formData.append('chatModelId', 'cm_xxxxx');
formData.append('chatInstanceId', 'ci_xxxxx');
formData.append('stream', 'true');

const response = await fetch('/api/chat/voice', {
  method: 'POST',
  body: formData,
});

// Parse SSE events
const reader = response.body?.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (reader) {
  const { done, value } = await reader.read();
  if (done) break;

  // Buffer chunks: a single read may end in the middle of an SSE event.
  buffer += decoder.decode(value, { stream: true });

  const events = buffer.split('\n\n');
  buffer = events.pop() ?? ''; // keep the trailing partial event for the next read

  for (const event of events) {
    if (event.startsWith('data: ')) {
      // Handle: transcription, stream_start, token, agent_response, message_saved, stream_end
      handleStreamEvent(JSON.parse(event.slice(6)));
    }
  }
}
```
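The voice stream interleaves the event types noted in the comment above. Once the stream ends, a small helper can pull out the two pieces most UIs need: the user's transcription and the agent's final response. The `text` payload field is an assumption; verify it against the actual event payloads:

```typescript
// Hypothetical voice-stream event; the 'text' field name is an assumption.
interface VoiceEvent {
  type: string;
  text?: string;
}

// Extract the transcription and final response from a completed voice stream.
const summarizeVoiceStream = (
  events: VoiceEvent[],
): { transcription: string; response: string } => {
  let transcription = '';
  let response = '';
  for (const event of events) {
    if (event.type === 'transcription' && event.text) transcription = event.text;
    if (event.type === 'agent_response' && event.text) response = event.text;
  }
  return { transcription, response };
};
```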
Non-streaming fallback: omit the `stream` field to receive the standard JSON response.