Attachments: research_notes.pdf (PDF, 589 B), interface_sketch.png (PNG, 286.3 KB)

Can you explain how streaming responses work in modern LLM interfaces? I'm writing a bachelor thesis on this topic and need to understand the technical implementation.

Streaming responses allow users to see the model's output as it's generated, rather than waiting for the complete response. This improves perceived performance and enables a more conversational feel.

Key Benefits

Lower perceived latency: Users see progress immediately. Studies show that perceived wait time drops significantly when content appears progressively, even if total generation time is unchanged. The relationship can be modeled as $L_{\text{perceived}} \approx L_{\text{actual}} \cdot e^{-t/\tau}$, where $t$ is the time since the request and $\tau$ is a time constant. For streaming throughput:

$$\text{throughput} = \frac{\text{tokens}}{\text{time}}$$

Progressive disclosure: Long responses feel less overwhelming when delivered in chunks. Users can start reading while the model continues generating, reducing cognitive load.

Interrupt capability: Users can stop generation if the response is heading in the wrong direction. This is especially valuable for creative or exploratory tasks.
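As a toy illustration, the two quantities above can be computed directly. The function names and numbers here are made up for illustration; the perceived-latency model is the heuristic stated earlier, not a measured law:

```typescript
// Tokens generated per second of wall-clock time.
function throughput(tokens: number, seconds: number): number {
  return tokens / seconds;
}

// Heuristic perceived-latency model from the text: actual latency discounted
// exponentially as content appears over time t (tau is a tuning constant).
function perceivedLatency(actual: number, t: number, tau: number): number {
  return actual * Math.exp(-t / tau);
}

console.log(throughput(150, 3)); // 50 tokens/s
```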

Technical Implementation

Here's a minimal example of reading a streamed response body with the Fetch API (the same reader loop underlies Server-Sent Events consumption, with an extra parsing step for `data:` lines):

```typescript
const response = await fetch("/api/chat", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ messages }),
});

const reader = response.body?.getReader();
if (!reader) throw new Error("response has no readable body");

const decoder = new TextDecoder();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  const chunk = decoder.decode(value, { stream: true });
  // Append chunk to UI; batch updates for performance
}
```

The interface batches updates to avoid excessive re-renders while maintaining smooth streaming. Typical implementations use a debounce or requestAnimationFrame to coalesce rapid token arrivals.

What about voice input and multimodal support? How do those integrate with the streaming flow?

Voice input and multimodal capabilities are increasingly important for LLM interfaces. They integrate with streaming in different ways:

Voice Input

Speech-to-text transcription (e.g., Whisper API) converts spoken prompts to text before the chat request. The mic button in the input area triggers recording; when the user stops, the transcript is inserted into the input field. From there, submission follows the same streaming path as typed messages. The key UX consideration is providing clear feedback during recording (waveform, duration) and a smooth handoff to the streaming response.
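The recording-to-submission handoff can be sketched as follows. The `/api/transcribe` endpoint and its `{ text }` response shape are assumptions for illustration, not this interface's documented API:

```typescript
// Given a finished audio recording, post it for transcription and return the
// transcript to be inserted into the chat input. Submission then follows the
// same streaming path as a typed message.
async function transcribeRecording(
  audio: Blob,
  endpoint = "/api/transcribe", // hypothetical endpoint
  fetchFn: typeof fetch = fetch,
): Promise<string> {
  const form = new FormData();
  form.append("file", audio, "recording.webm");
  const res = await fetchFn(endpoint, { method: "POST", body: form });
  if (!res.ok) throw new Error(`transcription failed: ${res.status}`);
  const { text } = (await res.json()) as { text: string };
  return text;
}
```

Injecting `fetchFn` keeps the helper testable; in the browser the default global `fetch` is used.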

Multimodal Support

Models like GPT-4o accept images and documents alongside text. PDFs are typically extracted server-side (e.g., via unpdf or similar); images are sent as base64 or URLs. The streaming flow remains the same: the model generates a text response token-by-token. For image generation (DALL-E), the response may include function call results with image URLs, which are rendered inline as they arrive.
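A request payload along these lines might look like the sketch below; the content-part shape follows the common OpenAI-style convention and is an assumption about this interface's wire format:

```typescript
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

// Build a user message carrying text plus an optional base64-encoded image.
// PDFs would instead be extracted to text server-side before reaching the model.
function buildMultimodalMessage(
  prompt: string,
  imageBase64?: string,
  mimeType = "image/png",
): { role: "user"; content: ContentPart[] } {
  const content: ContentPart[] = [{ type: "text", text: prompt }];
  if (imageBase64) {
    // Images travel as data URLs (or plain https URLs).
    content.push({
      type: "image_url",
      image_url: { url: `data:${mimeType};base64,${imageBase64}` },
    });
  }
  return { role: "user", content };
}
```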

Context Panel

System instructions, user goals, and active files are visible in the Media Shelf on the right. This transparency helps users understand what the model "sees." Toggling context visibility affects what is sent with each request but does not change the streaming mechanism itself.
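The effect of toggling context can be sketched as a pure function that assembles the system prompt from enabled items only (the names and prompt format here are illustrative, not the app's actual implementation):

```typescript
interface ContextItem {
  name: string;
  content: string;
  enabled: boolean; // toggled from the Media Shelf
}

// Only enabled items are appended to the system prompt; the streaming
// mechanism downstream is unaffected by what is included here.
function buildSystemContext(system: string, items: ContextItem[]): string {
  const active = items.filter((i) => i.enabled);
  return [system, ...active.map((i) => `## ${i.name}\n${i.content}`)].join("\n\n");
}
```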

Try the Canvas view to explore non-linear conversation organization with branching.

How does branching work in the spatial canvas? I want to fork a conversation and explore different directions.

Branching in the spatial canvas lets you fork the conversation at any message and explore alternative directions without losing the original thread.

How It Works

  1. Switch to Canvas view using the tab above the chat area.
  2. Hover over any node (user or assistant message) to reveal the Branch button.
  3. Click Branch to open an input panel. Type your alternative follow-up and submit.
  4. A new branch appears as a separate column of nodes, connected by a purple dashed edge from the fork point.
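Under the hood, a fork can be modeled as a new chat that shares history up to the fork point and continues with the alternative follow-up. The sketch below is an illustrative data model, not the app's actual schema:

```typescript
interface Message {
  id: string;
  role: "user" | "assistant";
  text: string;
}

interface Chat {
  id: string;
  messages: Message[];
  forkedFrom?: { chatId: string; messageId: string }; // the purple dashed edge
}

// Copy history up to and including the fork point, then append the new turn.
// The source chat is left untouched.
function forkChat(source: Chat, forkMessageId: string, followUp: Message, newId: string): Chat {
  const idx = source.messages.findIndex((m) => m.id === forkMessageId);
  if (idx === -1) throw new Error(`message ${forkMessageId} not in chat`);
  return {
    id: newId,
    messages: [...source.messages.slice(0, idx + 1), followUp],
    forkedFrom: { chatId: source.id, messageId: forkMessageId },
  };
}
```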

Visual Layout

  • Main timeline: Vertical column on the left (x=100). Nodes are connected top-to-bottom.
  • Branches: Each branch occupies its own horizontal segment to the right. The first branch is at x=600, the next at x=1100, and so on. This prevents overlap and keeps the graph readable.
  • Purple edges: Dashed lines indicate branch connections. Solid purple edges connect nodes within a branch.
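The column placement described above reduces to a small function (a direct transcription of the coordinates given, not the app's code):

```typescript
// Main timeline sits at x=100; branch i (0-based) at 600 + 500 * i,
// so branches never overlap horizontally.
function columnX(branchIndex: number | null): number {
  if (branchIndex === null) return 100; // main timeline
  return 600 + 500 * branchIndex;
}
```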

Use Cases

  • Alternative approaches: "What if we used a different algorithm?"
  • Deeper dive: "Explain step 2 in more detail."
  • Comparison: "How does this compare to approach X?"
  • Recovery: Branch from an earlier point if the conversation went off track.

Each branch is a separate chat in the sidebar. You can switch between them or continue any branch from the canvas.

What about the dynamic widgets for text transformation? When do they appear?

Dynamic widgets are context-aware floating toolbars that appear when you select text in a message. They enable "micro-iterations" without re-prompting the full conversation.

When They Appear

  • Select any text in an assistant (or user) message.
  • A toolbar appears near the selection with transformation options.
  • Options depend on content type: text vs code get different widgets.

Available Actions

| Text | Code |
| --- | --- |
| Magic Edit, Shorten, Expand | Magic Edit, Refactor, Explain |
| Rephrase, Summarize, Critique | Critique |
| Custom (user-defined prompt) | Custom (user-defined prompt) |

Workflow

  1. Select the portion you want to change.
  2. Click an action (e.g., "Summarize" or "Explain").
  3. A preview appears showing the transformation.
  4. Accept to replace the selection in-place, or Reject to discard.
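The in-place replacement in step 4 amounts to splicing the accepted text over the selected character range. A minimal sketch:

```typescript
// Replace only the selected [start, end) region of a message, leaving the
// rest of the text intact.
function applyEdit(message: string, start: number, end: number, replacement: string): string {
  if (start < 0 || end > message.length || start > end) {
    throw new Error("selection out of range");
  }
  return message.slice(0, start) + replacement + message.slice(end);
}
```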

Only the selected region is replaced; the rest of the message stays intact. This reduces prompt verbosity and improves precision (research shows ~72% reduction in prompt length with localized edits).