How to Get JSON Output from LLMs: A Practical Guide

Tutorial on forcing JSON output using Llama.cpp or Gemini’s API

Photo by Etienne Girardet on Unsplash
Large Language Models (LLMs) are great at generating text, but getting structured output such as JSON usually requires careful prompting and hoping the model complies. Fortunately, JSON mode is becoming increasingly common in LLM frameworks and services: it lets you define the exact output schema you want.

This post is about constrained generation using JSON mode. We will use a complex, nested and realistic JSON schema to guide LLM frameworks/APIs like Llama.cpp or the Gemini API into generating structured data, specifically tourist location information. This builds on a previous post about constrained generation using Guidance, but focuses on the more widely adopted JSON mode.

Save time and effort building LLM apps using guided generation

Although JSON mode is more limited than Guidance, its broader support makes it more accessible, especially with cloud-based LLM providers.

During a personal project, I discovered that while JSON mode was easy to use with Llama.cpp, working with Gemini API required some extra steps. This post shares those solutions to help you use JSON mode effectively.

Our JSON schema: a document about a tourist location

Our sample schema represents a TouristLocation. It is a non-trivial structure with nested objects, lists, enums and various data types such as strings and numbers.

Here’s a simplified version:

{
  "name": "string",
  "location_long_lat": ["number", "number"],
  "climate_type": {"type": "string", "enum": ["tropical", "desert", "temperate", "continental", "polar"]},
  "activity_types": ["string"],
  "attraction_list": [
    {
      "name": "string",
      "description": "string"
    }
  ],
  "tags": ["string"],
  "description": "string",
  "most_notably_known_for": "string",
  "location_type": {"type": "string", "enum": ["city", "country", "establishment", "landmark", "national park", "island", "region", "continent"]},
  "parents": ["string"]
}

You can write this type of schema manually or you can generate it using the Pydantic library. Here is how you can do it with a simplified example:

from typing import List

from pydantic import BaseModel, Field


class TouristLocation(BaseModel):
    """Model for a tourist location"""

    high_season_months: List[int] = Field(
        [], description="List of months (1-12) when the location is most visited"
    )

    tags: List[str] = Field(
        ...,
        description="List of tags describing the location (e.g. accessible, sustainable, sunny, cheap, pricey)",
        min_length=1,
    )
    description: str = Field(..., description="Text description of the location")


# Example usage and schema output
location = TouristLocation(
    high_season_months=[6, 7, 8],
    tags=["beach", "sunny", "family-friendly"],
    description="A beautiful beach with white sand and clear blue water.",
)

schema = location.model_json_schema()
print(schema)

This code defines a simplified version of the TouristLocation data class using Pydantic. It has three fields:

  • high_season_months: A list of integers representing the months of the year (1-12) in which the location is most visited. By default, this is an empty list.
  • tags: A list of strings describing the location with tags such as “accessible”, “sustainable”, etc. This field is required (marked by the ... default) and must contain at least one element (min_length=1).
  • description: A string field containing a textual description of the location. This field is also required.

The code then creates an instance of the TouristLocation class and uses model_json_schema() to obtain the JSON Schema representation of the model. This schema defines the structure and types of the expected data for this class.

model_json_schema() returns:

{'description': 'Model for a tourist location',
 'properties': {'description': {'description': 'Text description of the location',
                                'title': 'Description',
                                'type': 'string'},
                'high_season_months': {'default': [],
                                       'description': 'List of months (1-12) when the location is most visited',
                                       'items': {'type': 'integer'},
                                       'title': 'High Season Months',
                                       'type': 'array'},
                'tags': {'description': 'List of tags describing the location (e.g. accessible, sustainable, sunny, cheap, pricey)',
                         'items': {'type': 'string'},
                         'minItems': 1,
                         'title': 'Tags',
                         'type': 'array'}},
 'required': ['tags', 'description'],
 'title': 'TouristLocation',
 'type': 'object'}

Now that we have our schema, let’s see how we can enforce it. First in Llama.cpp with its Python wrapper and second using Gemini’s API.

Method 1: The Simple Approach with Llama.cpp

Llama.cpp is a C++ library for running Llama models locally. It is beginner-friendly and has an active community. We will use it via its Python wrapper, llama-cpp-python.

Here’s how to generate TouristLocation data:

import time

from llama_cpp import Llama

# Model init:
checkpoint = "lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF"

model = Llama.from_pretrained(
    repo_id=checkpoint,
    n_gpu_layers=-1,
    filename="*Q4_K_M.gguf",
    verbose=False,
    n_ctx=12_000,
)

messages = [
    {
        "role": "system",
        "content": "You are a helpful assistant that outputs in JSON."
        f"Follow this schema {TouristLocation.model_json_schema()}",
    },
    {"role": "user", "content": "Generate information about Hawaii, US."},
    {"role": "assistant", "content": f"{location.model_dump_json()}"},
    {"role": "user", "content": "Generate information about Casablanca"},
]

response_format = {
    "type": "json_object",
    "schema": TouristLocation.model_json_schema(),
}

start = time.time()

outputs = model.create_chat_completion(
    messages=messages, max_tokens=1200, response_format=response_format
)

print(outputs["choices"][0]["message"]["content"])

print(f"Time: {time.time() - start}")

The code first imports the necessary libraries and initializes the LLM. It then defines a list of messages for a conversation with the model: a system message instructing the model to output JSON following our schema, a user request for information about Hawaii, an example assistant response that follows the schema (acting as a one-shot example), and finally the user request for Casablanca.

Llama.cpp uses context-free grammars to constrain the structure and generate valid JSON output for a new city.
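If you want to work with that grammar directly rather than through response_format, llama-cpp-python can build it from the schema. Below is a minimal sketch, assuming the LlamaGrammar.from_json_schema helper available in recent llama-cpp-python releases (response_format performs an equivalent conversion internally):

import json

from llama_cpp.llama_grammar import LlamaGrammar

# Build a GBNF grammar from the Pydantic-generated JSON schema
# (assumption: LlamaGrammar.from_json_schema exists in your installed release)
grammar = LlamaGrammar.from_json_schema(
    json.dumps(TouristLocation.model_json_schema())
)

# Pass the grammar explicitly instead of using response_format
outputs = model.create_chat_completion(
    messages=messages, max_tokens=1200, grammar=grammar
)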

In the output we get the following generated string:

{'activity_types': ['shopping', 'food and wine', 'cultural'],
 'attraction_list': [{'description': 'One of the largest mosques in the world and a symbol of Moroccan architecture',
                      'name': 'Hassan II Mosque'},
                     {'description': 'A historic walled city with narrow streets and traditional shops',
                      'name': 'Old Medina'},
                     {'description': 'A historic square with a beautiful fountain and surrounding buildings',
                      'name': 'Mohammed V Square'},
                     {'description': 'A beautiful Catholic cathedral built in the early 20th century',
                      'name': 'Casablanca Cathedral'},
                     {'description': 'A scenic waterfront promenade with beautiful views of the city and the sea',
                      'name': 'Corniche'}],
 'climate_type': 'temperate',
 'description': 'A large and bustling city with a rich history and culture',
 'location_type': 'city',
 'most_notably_known_for': 'Its historic architecture and cultural significance',
 'name': 'Casablanca',
 'parents': ['Morocco', 'Africa'],
 'tags': ['city', 'cultural', 'historical', 'expensive']}

This string can then be parsed into an instance of our Pydantic class.
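With Pydantic v2 that round trip is a one-liner; model_validate_json raises a ValidationError if the output somehow drifts from the schema. A minimal sketch using the simplified TouristLocation model defined earlier:

raw_json = outputs["choices"][0]["message"]["content"]

# Validate and parse in one step; raises pydantic.ValidationError on mismatch
casablanca = TouristLocation.model_validate_json(raw_json)
print(casablanca.tags)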

Method 2: Overcoming the Gemini API’s Quirks

The Gemini API, Google’s managed LLM service, claims limited JSON-mode support for Gemini 1.5 Flash in its documentation. However, it can be made to work with a few tweaks.

Here are the general instructions to make it work:

schema = TouristLocation.model_json_schema()
schema = replace_value_in_dict(schema.copy(), schema.copy())
del schema["$defs"]
delete_keys_recursive(schema, key_to_delete="title")
delete_keys_recursive(schema, key_to_delete="location_long_lat")
delete_keys_recursive(schema, key_to_delete="default")
delete_keys_recursive(schema, key_to_delete="minItems")

print(schema)
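To see what this cleanup does, consider applying just the delete_keys_recursive calls to the simplified schema printed earlier (the full schema also needs the $ref resolution). The title, default, and minItems keys disappear, leaving roughly:

{'description': 'Model for a tourist location',
 'properties': {'description': {'description': 'Text description of the location',
                                'type': 'string'},
                'high_season_months': {'description': 'List of months (1-12) when the location is most visited',
                                       'items': {'type': 'integer'},
                                       'type': 'array'},
                'tags': {'description': 'List of tags describing the location (e.g. accessible, sustainable, sunny, cheap, pricey)',
                         'items': {'type': 'string'},
                         'type': 'array'}},
 'required': ['tags', 'description'],
 'type': 'object'}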

import os

import google.generativeai as genai
from google.generativeai.types import ContentDict

messages = [
    ContentDict(
        role="user",
        parts=[
            "You are a helpful assistant that outputs in JSON."
            f"Follow this schema {TouristLocation.model_json_schema()}"
        ],
    ),
    ContentDict(role="user", parts=["Generate information about Hawaii, US."]),
    ContentDict(role="model", parts=[f"{location.model_dump_json()}"]),
    ContentDict(role="user", parts=["Generate information about Casablanca"]),
]

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

# Using `response_mime_type` with `response_schema` requires a Gemini 1.5 Pro model
model = genai.GenerativeModel(
    "gemini-1.5-flash",
    # Set the `response_mime_type` to output JSON
    # Pass the schema object to the `response_schema` field
    generation_config={
        "response_mime_type": "application/json",
        "response_schema": schema,
    },
)

response = model.generate_content(messages)
print(response.text)

Here’s how to overcome Gemini’s limitations:

  1. Replace $ref with full definitions: Gemini stumbles over schema references ($ref), which appear whenever you have a nested object definition. Replace each reference with the full definition from your schema, as the helper below does.
def replace_value_in_dict(item, original_schema):
    # Source: https://github.com/pydantic/pydantic/issues/889
    if isinstance(item, list):
        return [replace_value_in_dict(i, original_schema) for i in item]
    elif isinstance(item, dict):
        if list(item.keys()) == ["$ref"]:
            # Resolve a reference path such as "#/$defs/Attraction"
            definitions = item["$ref"][2:].split("/")
            res = original_schema.copy()
            for definition in definitions:
                res = res[definition]
            return res
        else:
            return {
                key: replace_value_in_dict(i, original_schema)
                for key, i in item.items()
            }
    else:
        return item
  2. Remove unsupported keys: Gemini does not yet handle keys such as “title”, “anyOf” or “minItems”. Remove these from your schema. This makes the schema less readable and less restrictive, but there is no alternative if we want to use Gemini.
def delete_keys_recursive(d, key_to_delete):
    if isinstance(d, dict):
        # Delete the key if it exists
        if key_to_delete in d:
            del d[key_to_delete]
        # Recursively process all values in the dictionary
        for k, v in d.items():
            delete_keys_recursive(v, key_to_delete)
    elif isinstance(d, list):
        # Recursively process all items in the list
        for item in d:
            delete_keys_recursive(item, key_to_delete)
  3. One-shot prompting for enums: Gemini sometimes struggles with enums, outputting all possible values joined by “|” in a single string instead of a single selection, which is invalid according to our schema. Use one-shot prompting, i.e. provide a correctly formatted example, to guide it toward the desired behavior, as illustrated below.
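To make the enum problem concrete, here is a hypothetical comparison (not verbatim Gemini output) of the invalid pipe-separated string versus the single value that the one-shot example in messages steers the model toward:

"climate_type": "tropical | desert | temperate | continental | polar"   <- invalid: all options joined by "|"
"climate_type": "temperate"                                             <- valid: a single enum value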

By applying these transformations and providing clear examples, you can successfully generate structured JSON output using the Gemini API.

Conclusion

JSON mode allows you to extract structured data directly from your LLMs, making them more useful for practical applications. While frameworks like Llama.cpp provide simple implementations, you may run into issues with cloud services like Gemini API.

Hopefully, this post has given you a better practical understanding of how JSON mode works and how to use it, even with Gemini’s API, where support is still only partial.

Now that I have gotten Gemini to work somewhat with JSON mode, I can finish implementing my LLM workflow, which requires structuring data in a specific way.

The main code of this post can be found here: https://gist.github.com/CVxTz/8eace07d9bd2c5123a89bf790b5cc39e


How to Get JSON Output from LLMs: A Practical Guide was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.