Make LLM answer as a data frame via structured output
Source: R/answer_as_dataframe.R
This function builds on answer_as_json() to extract a data frame from an
LLM response using structured output. The supplied schema should describe a
single row of the desired data frame, or an array of such rows. Internally,
answer_as_dataframe() standardizes the schema to a JSON object with a
rows field containing an array of row objects. This shape works well with
both text-based JSON extraction and native structured-output backends,
including 'ellmer', where arrays of objects are converted to data frames.
Arguments
- prompt
A single string or a tidyprompt() object
- schema
A JSON schema list or an 'ellmer' type definition describing a single row, an array of rows, or a wrapper object containing a rows array.
- min_rows
(optional) Minimum number of rows required in the returned data frame
- max_rows
(optional) Maximum number of rows allowed in the returned data frame
- schema_strict
If TRUE, the wrapped schema will be strictly enforced. Passed through to answer_as_json()
- schema_in_prompt_as
Passed through to answer_as_json() when using a text-based JSON path
- type
Passed through to answer_as_json() to control the structured output backend
Value
A tidyprompt() with an added prompt_wrap() that ensures the LLM response
is returned as a data frame.
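To make the returned value more concrete, the following base-R sketch shows how a parsed response in the rows shape could be turned into a data frame and checked against min_rows and max_rows. The helper rows_to_dataframe() and the hand-written parsed list are hypothetical and are not part of 'tidyprompt'. The sketch simply raises an error when a bound is violated; how answer_as_dataframe() itself reacts to a violation is not described on this page.

```r
# Hypothetical helper for illustration only; not tidyprompt's implementation.
rows_to_dataframe <- function(parsed, min_rows = NULL, max_rows = NULL) {
  # Turn each row object (a named list) into a one-row data frame.
  row_frames <- lapply(parsed$rows, function(row) {
    as.data.frame(row, stringsAsFactors = FALSE)
  })
  # Stack the one-row data frames; an empty `rows` array yields NULL.
  df <- do.call(rbind, row_frames)
  n <- if (is.null(df)) 0L else nrow(df)

  # Enforce the optional row-count bounds.
  if (!is.null(min_rows) && n < min_rows) {
    stop("Expected at least ", min_rows, " rows, got ", n)
  }
  if (!is.null(max_rows) && n > max_rows) {
    stop("Expected at most ", max_rows, " rows, got ", n)
  }
  df
}

# Hand-written stand-in for a parsed structured-output response.
parsed <- list(rows = list(
  list(name = "Alice", age = 32L, city = "Berlin"),
  list(name = "Bob", age = 28L, city = "Utrecht")
))

people <- rows_to_dataframe(parsed, min_rows = 1, max_rows = 10)
```

Here people has one row per entry in rows and one column per row property.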
Details
Prefer supplying an 'ellmer' row schema created with
ellmer::type_object(...) when possible. This is usually the clearest way to
describe the columns you want, and it maps cleanly to native 'ellmer'
structured output. These 'ellmer' schema definitions can also be used with
non-'ellmer' LLM providers, because 'tidyprompt' converts between 'ellmer'
schema definitions and JSON-schema representations as needed.
answer_as_dataframe() accepts the following schema shapes:
- A single row schema, such as ellmer::type_object(...) or a JSON schema object whose properties describe the columns of one row.
- An array-of-rows schema, such as ellmer::type_array(row_schema) or a JSON schema with type = "array" and row objects under items.
- A wrapper object whose only property is a rows field containing an array of row objects (matching the shape produced internally by answer_as_dataframe()). Schemas with additional sibling properties alongside rows are treated as row schemas, not wrappers.
Regardless of which of these forms you supply, answer_as_dataframe()
normalizes it to a row-oriented structured-output schema before delegating to
answer_as_json().
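The normalization described above can be sketched in base R for plain JSON-schema lists. The helper normalize_to_rows_wrapper() is hypothetical and only mirrors the rules stated on this page; it is not the function 'tidyprompt' uses internally, and it does not handle 'ellmer' type definitions.

```r
# Hypothetical helper for illustration only; not tidyprompt's implementation.
normalize_to_rows_wrapper <- function(schema) {
  props <- schema$properties

  # A wrapper is an object whose ONLY property is a `rows` array.
  # Any sibling property next to `rows` makes it a row schema instead.
  is_wrapper <- identical(schema$type, "object") &&
    length(props) == 1 &&
    identical(names(props), "rows") &&
    identical(props$rows$type, "array")
  if (is_wrapper) {
    return(schema)
  }

  # An array-of-rows schema carries the row schema under `items`;
  # anything else is treated as the schema of a single row.
  row_schema <- if (identical(schema$type, "array")) schema$items else schema

  list(
    type = "object",
    properties = list(
      rows = list(type = "array", items = row_schema)
    ),
    required = "rows",
    additionalProperties = FALSE
  )
}

row_schema <- list(
  type = "object",
  properties = list(
    name = list(type = "string"),
    age = list(type = "integer")
  ),
  required = c("name", "age"),
  additionalProperties = FALSE
)
array_schema <- list(type = "array", items = row_schema)

from_row <- normalize_to_rows_wrapper(row_schema)
from_array <- normalize_to_rows_wrapper(array_schema)
from_wrapper <- normalize_to_rows_wrapper(from_row)
```

Under these assumptions, all three inputs end up as the same wrapper object, which is the point of the normalization: downstream code only has to handle one shape.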
See also
Other pre_built_prompt_wraps:
add_image(),
add_text(),
answer_as_boolean(),
answer_as_category(),
answer_as_integer(),
answer_as_json(),
answer_as_list(),
answer_as_multi_category(),
answer_as_named_list(),
answer_as_numeric(),
answer_as_regex_match(),
answer_as_text(),
answer_by_chain_of_thought(),
answer_by_react(),
answer_using_r(),
answer_using_sql(),
answer_using_tools(),
prompt_wrap(),
quit_if(),
set_system_prompt()
Other answer_as_prompt_wraps:
answer_as_boolean(),
answer_as_category(),
answer_as_integer(),
answer_as_json(),
answer_as_list(),
answer_as_multi_category(),
answer_as_named_list(),
answer_as_numeric(),
answer_as_regex_match(),
answer_as_text()
Examples
# `answer_as_dataframe()` accepts multiple schema shapes.
# Prefer an ellmer row schema when possible, because it is concise and maps
# cleanly to native ellmer structured output.
# These ellmer schema definitions also work with non-ellmer LLM providers,
# because tidyprompt converts between ellmer schemas and JSON schemas for you.
if (requireNamespace("ellmer", quietly = TRUE)) {
  person_row_schema_ellmer <- ellmer::type_object(
    name = ellmer::type_string(),
    age = ellmer::type_integer(),
    city = ellmer::type_string()
  )
  # Also accepted: an array of row objects.
  person_array_schema_ellmer <- ellmer::type_array(person_row_schema_ellmer)
}
# Also accepted: a JSON schema describing one row.
person_row_schema_json <- list(
  type = "object",
  properties = list(
    name = list(type = "string"),
    age = list(type = "integer"),
    city = list(type = "string")
  ),
  required = c("name", "age", "city"),
  additionalProperties = FALSE
)
# Also accepted: a wrapper object with a `rows` array.
person_wrapper_schema_json <- list(
  type = "object",
  properties = list(
    rows = list(
      type = "array",
      items = person_row_schema_json
    )
  ),
  required = "rows",
  additionalProperties = FALSE
)
if (FALSE) { # \dontrun{
  prompt <- paste(
    "Extract the people in the following notes as a table:",
    "Alice (32, Berlin), Bob (28, Utrecht)."
  )
  # Preferred: ellmer row schema.
  # This works both with ellmer-backed providers and with regular tidyprompt
  # providers, because tidyprompt converts the schema when needed.
  if (requireNamespace("ellmer", quietly = TRUE)) {
    prompt |>
      answer_as_dataframe(person_row_schema_ellmer) |>
      send_prompt()
    #    name age    city
    # 1 Alice  32  Berlin
    # 2   Bob  28 Utrecht
    # Also works: ellmer array-of-rows schema.
    prompt |>
      answer_as_dataframe(person_array_schema_ellmer) |>
      send_prompt()
  }
  # Also works: JSON schema for one row.
  prompt |>
    answer_as_dataframe(person_row_schema_json, type = "text-based") |>
    send_prompt()
  # Also works: JSON wrapper schema with a `rows` array.
  prompt |>
    answer_as_dataframe(person_wrapper_schema_json, type = "text-based") |>
    send_prompt()
} # }