Streaming LLM responses to Shiny apps
Source: vignettes/streaming_shiny_ipc.Rmd
This vignette shows a minimal example of how to stream an LLM response gathered with ‘tidyprompt’ to a Shiny app in real time.
Shiny apps run on a single R process, meaning that when you call an
LLM synchronously, the UI will be blocked until the response is
complete. Therefore, when integrating LLM calls into Shiny apps, you
typically want to use send_prompt() asynchronously (e.g.,
with the ‘future’ and ‘promises’ packages).
If you just want to show the final LLM response after it is complete, this is fairly straightforward. But if you want to show the LLM response as it streams in, token by token, things get a bit more complex: the asynchronous process has to communicate back to the main Shiny R process so the UI can be updated in real time as new tokens arrive.
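For the simpler, non-streaming case, a minimal sketch could look like the app below. It assumes the same setup as the full example later in this vignette (a provider from llm_provider_openai() and future::plan(multisession)); since Shiny render functions accept promises, the UI stays responsive while the call runs in the background.

# Minimal sketch: show only the final response, without streaming
library(shiny)
library(future)
library(promises)
library(tidyprompt)

future::plan(future::multisession)
provider <- llm_provider_openai()

ui <- fluidPage(
  textInput("prompt", "Prompt", value = "Tell me a joke about R."),
  actionButton("run", "Ask model"),
  verbatimTextOutput("response")
)

server <- function(input, output, session) {
  output$response <- renderText({
    req(input$run) # wait until the button has been clicked
    user_prompt <- isolate(input$prompt)
    # Returning a promise from a render function is supported by Shiny
    future_promise(
      tidyprompt::send_prompt(
        prompt = user_prompt,
        llm_provider = provider,
        return_mode = "only_response"
      ),
      globals = list(user_prompt = user_prompt, provider = provider)
    )
  })
}

shinyApp(ui, server)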
For LLM providers created with ‘tidyprompt’ that support
streaming responses, you can set the stream_callback field
of your llm_provider object. This is a function that is
called for each token (or text chunk) as it arrives from the LLM. You
can use this to push the tokens into the Shiny app in
real time. The ‘ipc’ package can facilitate this: it provides
inter-process communication between the background R process (where the
LLM call runs) and the main Shiny R process (where the UI runs). Each
time a new output token arrives, the stream_callback can
push the token into a reactive value in the main Shiny
process.
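To see the callback mechanism in isolation (outside of Shiny and without ‘ipc’), you could attach a callback that simply prints each chunk to the console. The sketch below mirrors the callback signature and send_prompt() usage from the full example further down; the exact arguments passed to stream_callback may depend on your ‘tidyprompt’ version and provider.

# Sketch: print each streamed chunk to the console as it arrives
library(tidyprompt)

provider <- llm_provider_openai()
provider$parameters$stream <- TRUE
provider$stream_callback <- function(token, meta) {
  cat(token) # 'token' is the newly received text chunk
  invisible(TRUE)
}

response <- send_prompt(
  prompt = "Write a haiku about R.",
  llm_provider = provider,
  return_mode = "only_response"
)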
Below is a minimal example that shows how to achieve this. We will:

- create an llm_provider with streaming enabled;
- define a stream_callback that writes tokens into an ipc::shinyQueue;
- start a future in a separate R process (future::plan(multisession)) where we call send_prompt();
- consume the queue from the Shiny main process to update the UI, showing a live stream of LLM output.
Example
# Install required packages if not already installed
packages <- c("shiny", "ipc", "future", "promises", "tidyprompt")
for (pkg in packages) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
}
# Load required packages
library(shiny)
library(ipc)
library(future)
library(promises)
library(tidyprompt)
# Enable asynchronous processing
future::plan(future::multisession)
# Base provider (OpenAI; has streaming enabled by default)
base_provider <- llm_provider_openai()
ui <- fluidPage(
  titlePanel("tidyprompt streaming demo"),
  sidebarLayout(
    sidebarPanel(
      textInput(
        "prompt",
        "Prompt",
        value = "Tell me a short story about a cat and a robot."
      ),
      actionButton("run", "Ask model"),
      helpText("Tokens will appear below as they stream in.")
    ),
    mainPanel(
      verbatimTextOutput("partial_response")
    )
  )
)
server <- function(input, output, session) {
  # Queue to bridge async future back into Shiny
  queue <- shinyQueue()
  queue$consumer$start(100) # process queue every 100 ms

  # Reactive that holds the accumulated streamed text
  partial_response <- reactiveVal("")

  # Streaming callback run inside the provider
  stream_cb <- function(token, meta) {
    # meta$partial_response is the accumulated text so far
    queue$producer$fireAssignReactive(
      "partial_response",
      meta$partial_response
    )
    invisible(TRUE)
  }

  # Clone provider for this session and attach callback + streaming
  provider <- base_provider$clone()
  provider$parameters$stream <- TRUE
  provider$stream_callback <- stream_cb

  # Expose the reactive value to the UI
  output$partial_response <- renderText({
    req(partial_response())
    partial_response()
  })

  observeEvent(input$run, {
    # Reset current streamed text on each run
    partial_response("")
    user_prompt <- input$prompt

    future_promise(
      {
        tidyprompt::send_prompt(
          prompt = user_prompt,
          llm_provider = provider,
          return_mode = "only_response"
        )
      },
      globals = list(
        user_prompt = user_prompt,
        provider = provider
      )
    ) %>%
      then(
        onFulfilled = function(value) {
          # Final response once streaming finishes
          partial_response(value)
        },
        onRejected = function(error) {
          showNotification(
            paste("Error:", error$message),
            type = "error"
          )
          print(error)
        }
      )
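    # Return NULL (not the promise) so the observer does not hold up the
    # reactive flush; otherwise the streamed tokens would only appear once
    # the promise resolves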
    NULL
  })
}
shinyApp(ui = ui, server = server)
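To try the example, save the full code above as app.R and launch it with shiny::runApp(); note that llm_provider_openai() requires a valid OpenAI API key to be configured.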