AI editor SSE stream cut off — Gemini quota and timeout fixes
The VeloCMS AI writing assistant streams responses from Google Gemini using Server-Sent Events. If the stream cuts off mid-sentence and the editor shows a partial response, there are three likely causes: Gemini's per-minute request quota has been exhausted, Railway's 60-second request timeout has been reached, or the browser's EventSource connection has dropped due to a network interruption.
Cause 1 — Gemini per-minute quota
Google Gemini 2.0 Flash has a default quota of 15 RPM (requests per minute) and 1,000,000 TPM (tokens per minute) on the free tier. On a shared VeloCMS SaaS plan where your GEMINI_API_KEY is the platform key, this quota is shared across all your AI-assist requests. If you're writing a long article and making multiple AI continuations in quick succession, you can hit the RPM limit. The symptom is a stream that starts and then cuts off after 2–3 seconds with no error message in the editor.
If you're on the Pro or Business plan and have set your own BYOK Gemini API key under Admin → Settings → AI, your quota is separate from other VeloCMS users'. Set up a BYOK key to get dedicated quota — the setup guide is at 'How to set up a BYOK AI key'.
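When the quota is exhausted, Gemini typically rejects the request with an HTTP 429 and a RESOURCE_EXHAUSTED status, which is why the editor sees a silent cutoff rather than an error. A minimal sketch of classifying that failure server-side so the route can emit an explicit SSE error event instead of just closing the stream. The helper names and the error shape here are illustrative, not VeloCMS internals:

```typescript
// Hypothetical helper: classify a Gemini API failure as a quota
// (rate-limit) error so the streaming route can tell the editor why
// the stream ended instead of silently closing the connection.
interface GeminiError {
  status?: number;  // HTTP status, e.g. 429
  message?: string; // e.g. "RESOURCE_EXHAUSTED: Quota exceeded"
}

function isQuotaError(err: GeminiError): boolean {
  // Gemini signals RPM/TPM exhaustion with HTTP 429 and a
  // RESOURCE_EXHAUSTED status message.
  return (
    err.status === 429 ||
    (err.message ?? "").includes("RESOURCE_EXHAUSTED")
  );
}

// Build an SSE error event the editor can render as a callout.
function errorEventPayload(err: GeminiError): string {
  if (isQuotaError(err)) {
    return `event: error\ndata: ${JSON.stringify({
      code: "quota_exceeded",
      retryAfterSeconds: 60, // per-minute quota resets each minute
    })}\n\n`;
  }
  return `event: error\ndata: ${JSON.stringify({ code: "unknown" })}\n\n`;
}
```

With this in place, a quota cutoff surfaces in the editor as a "try again in a minute" message rather than a truncated sentence.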
Cause 2 — Railway 60-second timeout
Railway's HTTP proxy enforces a 60-second request timeout. The Gemini SSE streaming route is a long-lived HTTP connection — for a 2,000-token response at Gemini's token emission rate, the stream can run for 30–90 seconds. Requests that take over 60 seconds hit Railway's proxy timeout, which closes the connection without a clean SSE done event. The fix is a streaming keepalive: the VeloCMS route handler sends an SSE comment every 15 seconds while waiting for Gemini tokens, which resets Railway's timeout counter.
// In /api/ai/generate/route.ts -- keepalive pattern:
const encoder = new TextEncoder();

// Emit an SSE comment every 15 seconds so Railway's proxy sees
// traffic on the connection and resets its timeout counter.
const keepalive = setInterval(() => {
  controller.enqueue(encoder.encode(": keep-alive\n\n"));
}, 15_000);

try {
  // Forward each Gemini chunk to the client as an SSE data event.
  for await (const chunk of geminiStream) {
    controller.enqueue(
      encoder.encode(`data: ${JSON.stringify({ text: chunk })}\n\n`)
    );
  }
} finally {
  // Always stop the keepalive timer and close the stream,
  // even if the Gemini iterator throws.
  clearInterval(keepalive);
  controller.close();
}

Cause 3 — Client-side fetch abort
Modern browsers abort fetch requests and EventSource connections when the user navigates away from the page, the browser tab becomes idle for an extended period on mobile, or a Service Worker intercepts the request. The VeloCMS editor handles this with a reconnect strategy — when the EventSource fires an error event, the editor waits 2 seconds and reconnects, appending to the partial text already received. If reconnection fails after 3 attempts, the editor shows an error callout with a Retry button.
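The reconnect strategy described above can be sketched as follows. The function names, the injectable connect callback, and the exact retry parameters are illustrative assumptions, not the editor's actual implementation; the attempt count (3) and delay (2 seconds) match the behavior described in this article:

```typescript
// Sketch of the editor's reconnect strategy: retry up to maxAttempts
// times with a fixed delay, appending newly streamed text to the
// partial text already received. The connect function is injected so
// the strategy is independent of EventSource vs. fetch streaming.
async function streamWithReconnect(
  // connect resolves when the stream ends cleanly and rejects when
  // the connection drops; fromChars tells the server how much text
  // the client already has, so it can continue rather than restart.
  connect: (fromChars: number, onChunk: (t: string) => void) => Promise<void>,
  maxAttempts = 3,
  delayMs = 2000,
): Promise<string> {
  let text = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await connect(text.length, (chunk) => { text += chunk; });
      return text; // stream ended with a clean done event
    } catch {
      if (attempt === maxAttempts) {
        throw new Error(`stream failed after ${maxAttempts} attempts`);
      }
      // Wait before reconnecting, as the editor does.
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return text;
}
```

Because chunks are accumulated in the onChunk callback rather than in the promise result, text received before a dropped connection is preserved across reconnects.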
Diagnosing with browser DevTools
Open Chrome DevTools, go to the Network tab, and filter by EventStream. Click the /api/ai/generate request and open the EventStream tab. Each SSE event appears as a row. If the stream ends cleanly, the last row is a done event. If it cuts off without a done event, the connection was dropped externally — check the Time column to see whether it stopped at exactly 60 seconds (Railway timeout) or at a random point (quota or network issue).
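The same check can be scripted against a captured transcript (for example, the raw text copied out of DevTools or from `curl -N` against the endpoint). A small sketch, assuming the stream's final event is named `done` as described above; the function name is illustrative:

```typescript
// Classify a raw SSE transcript as a clean or dropped stream.
// SSE events are separated by blank lines; lines starting with ":"
// are comments (the keepalives) and carry no data.
type StreamEnd = "clean" | "dropped";

function classifyTranscript(raw: string): StreamEnd {
  const events = raw
    .split("\n\n")
    .map((e) => e.trim())
    .filter((e) => e.length > 0 && !e.startsWith(":")); // drop keepalives
  const last = events[events.length - 1] ?? "";
  // A clean stream ends with the explicit done event; anything else
  // means the connection was cut before the server finished.
  return last.startsWith("event: done") ? "clean" : "dropped";
}
```

A transcript ending in a keepalive comment or a bare data event classifies as dropped, which matches what the EventStream tab shows when Railway or the quota limit cuts the connection.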