JSON Mode in LLMs: How to Get Structured Outputs Every Time
Getting an LLM to return valid, parseable JSON consistently is one of those problems that sounds trivial until you're debugging a production outage at 2am because the model returned "Here is the JSON you requested:" followed by a Markdown code fence around what you needed.
This guide covers four approaches: the json_object response format, JSON schema-constrained outputs, the instructor library with Pydantic models, and a manual validation fallback. By the end you'll know which technique fits your situation and how to implement each one.
Why Plain Prompting Fails
If you've tried asking a model to "respond only in JSON," you already know the failure modes:
- The model wraps JSON in a Markdown code fence (```json ... ```)
- It adds a sentence before or after ("Sure! Here's the JSON:")
- It generates JSON with the wrong keys or structure
- It produces valid JSON but with hallucinated extra fields
- It returns valid JSON 95% of the time and broken JSON the other 5%
That 5% is what kills you in production. Prompt engineering can reduce failures but can't eliminate them. JSON mode and schema constraints are the right tools for closing that gap.
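To make these failure modes concrete, here is a small sketch that runs `json.loads` over three hypothetical model replies. The strings are invented for illustration, not captured from a real model. Only the bare object parses.

```python
import json

fence = "`" * 3  # a Markdown code fence marker, built here to keep this block readable

# Hypothetical replies illustrating the failure modes listed above.
replies = {
    "bare": '{"company": "Acme Corp"}',
    "preamble": 'Sure! Here\'s the JSON:\n{"company": "Acme Corp"}',
    "fenced": fence + 'json\n{"company": "Acme Corp"}\n' + fence,
}


def parses(text: str) -> bool:
    """Return True if the whole string is valid JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


results = {name: parses(text) for name, text in replies.items()}
print(results)
```

Any parser that expects the whole reply to be JSON rejects the second and third replies, which is why stripping fences and preambles after the fact tends to turn into a growing pile of special cases.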
Approach 1: JSON Object Mode
The simplest way to force JSON output is response_format={"type": "json_object"}. The API guarantees the response will be a valid JSON object -- no prose, no code fences, no preamble.
```python
from openai import OpenAI
import json

client = OpenAI(
    api_key="your_generalcompute_api_key",
    base_url="https://api.generalcompute.com/v1",
)

response = client.chat.completions.create(
    model="llama-4-maverick",
    messages=[
        {
            "role": "system",
            "content": "You are a data extraction assistant. Always respond with JSON.",
        },
        {
            "role": "user",
            "content": "Extract the company name, founding year, and headquarters city from: 'Acme Corp was founded in 1987 in Austin, Texas.'",
        },
    ],
    response_format={"type": "json_object"},
)

data = json.loads(response.choices[0].message.content)
print(data)
# {"company_name": "Acme Corp", "founding_year": 1987, "headquarters_city": "Austin"}
```
A few things to keep in mind with json_object mode:
You still need to describe the shape in your prompt. The API guarantees syntactically valid JSON but doesn't constrain which keys appear or what types the values have. If you ask for company info, the model picks the field names. This is fine for one-off tasks but unreliable for anything that downstream code has to parse.
Include the word "JSON" in your system prompt. Some implementations require this for the mode to activate correctly. A safe pattern: "Respond only with a JSON object containing..." followed by a description of the expected fields.
The model can return nested objects and arrays. There's no depth limit -- you can get back complex structures as long as you describe them in your prompt.
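One way to act on these points is to generate the prompt's field description from a single mapping and reuse that mapping to check the parsed result. This is a minimal sketch: `EXPECTED_FIELDS`, `build_system_prompt`, and `missing_keys` are hypothetical helper names, not part of any SDK.

```python
import json

# Hypothetical field spec: key name -> short description for the prompt.
EXPECTED_FIELDS = {
    "company_name": "string, the company's name",
    "founding_year": "integer, four-digit year",
    "headquarters_city": "string, city only",
}


def build_system_prompt(fields: dict[str, str]) -> str:
    """Describe the expected JSON shape in plain language."""
    lines = [f'- "{key}": {desc}' for key, desc in fields.items()]
    return (
        "Respond only with a JSON object containing exactly these keys:\n"
        + "\n".join(lines)
    )


def missing_keys(raw: str, fields: dict[str, str]) -> list[str]:
    """Return expected keys that are absent from the model's JSON reply."""
    data = json.loads(raw)
    return [key for key in fields if key not in data]


prompt = build_system_prompt(EXPECTED_FIELDS)

# Example reply where the model chose its own name for one field.
reply = '{"company_name": "Acme Corp", "founded": 1987, "headquarters_city": "Austin"}'
print(missing_keys(reply, EXPECTED_FIELDS))
```

Keeping the prompt and the check tied to the same mapping means a renamed field only has to change in one place.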
Approach 2: JSON Schema Mode
JSON schema mode goes further: you provide a JSON schema, and the model is constrained to return output that matches it exactly. This is the approach to use when you need guaranteed field names, types, and structure.
```python
response = client.chat.completions.create(
    model="llama-4-maverick",
    messages=[
        {
            "role": "user",
            "content": "Extract structured data from: 'Sarah Chen, senior software engineer at DataFlow, reached out via sarah@dataflow.io to discuss the Q3 integration project.'",
        }
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "contact_extraction",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "title": {"type": "string"},
                    "company": {"type": "string"},
                    "email": {"type": "string"},
                    "topic": {"type": "string"},
                },
                "required": ["name", "title", "company", "email", "topic"],
                "additionalProperties": False,
            },
        },
    },
)

data = json.loads(response.choices[0].message.content)
# Guaranteed: data["name"], data["email"], etc. always present and typed correctly
```
With strict: True, the model won't add fields not in the schema and won't omit required fields. This is what you want for ETL pipelines, API responses, and anything that feeds into typed application code.
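Strict mode in OpenAI-compatible APIs generally expects every object in the schema to list all of its properties under `required` and to set `additionalProperties` to false. Whether GeneralCompute enforces exactly these rules is an assumption here. The hypothetical `strict_schema_problems` helper below sketches a pre-flight check for them.

```python
def strict_schema_problems(schema: dict, path: str = "$") -> list[str]:
    """Collect places where an object schema breaks the assumed strict-mode rules."""
    problems = []
    if schema.get("type") == "object":
        props = schema.get("properties", {})
        missing = sorted(set(props) - set(schema.get("required", [])))
        if missing:
            problems.append(f"{path}: not in 'required': {missing}")
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: 'additionalProperties' is not false")
        for name, sub in props.items():
            problems += strict_schema_problems(sub, f"{path}.{name}")
    elif schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        problems += strict_schema_problems(schema["items"], f"{path}[]")
    # Also descend into anyOf branches, e.g. nullable nested objects.
    for i, option in enumerate(schema.get("anyOf", [])):
        problems += strict_schema_problems(option, f"{path}.anyOf[{i}]")
    return problems


good = {
    "type": "object",
    "properties": {"name": {"type": "string"}},
    "required": ["name"],
    "additionalProperties": False,
}
bad = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "email": {"type": "string"}},
    "required": ["name"],
}

print(strict_schema_problems(good))
print(strict_schema_problems(bad))
```

Running a check like this in a unit test catches schema mistakes before they surface as API errors.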
Building Schemas for Common Patterns
Arrays of objects:
```python
schema = {
    "type": "object",
    "properties": {
        "items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "product_name": {"type": "string"},
                    "quantity": {"type": "integer"},
                    "unit_price": {"type": "number"},
                },
                "required": ["product_name", "quantity", "unit_price"],
                "additionalProperties": False,
            },
        }
    },
    "required": ["items"],
    "additionalProperties": False,
}
```
Enums for categorical fields:
```python
"sentiment": {
    "type": "string",
    "enum": ["positive", "negative", "neutral", "mixed"]
}
```
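Optional fields using anyOf. In strict mode the key typically still has to be listed in required; the null branch lets the model return null when no value exists:
```python
"middle_name": {
    "anyOf": [
        {"type": "string"},
        {"type": "null"}
    ]
}
```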
Nested objects:
```python
"address": {
    "type": "object",
    "properties": {
        "street": {"type": "string"},
        "city": {"type": "string"},
        "zip": {"type": "string"},
        "country": {"type": "string"},
    },
    "required": ["street", "city", "zip", "country"],
    "additionalProperties": False,
}
```
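JSON Schema itself allows nesting to arbitrary depth. Strict-mode implementations often support only a subset of the spec and may cap nesting depth or schema size, so check your provider's documentation before relying on a deep or unusual schema.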
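The fragments above are property definitions, so they need to sit inside an enclosing object schema before use. Here is a sketch of one way to assemble them. `make_response_format` is a hypothetical helper, and the `order_review` name and the choice of fields are made up for illustration.

```python
import json


def make_response_format(name: str, properties: dict) -> dict:
    """Wrap property schemas in a strict json_schema response_format payload."""
    return {
        "type": "json_schema",
        "json_schema": {
            "name": name,
            "strict": True,
            "schema": {
                "type": "object",
                "properties": properties,
                # Every property is listed as required; nullable ones may be null.
                "required": list(properties),
                "additionalProperties": False,
            },
        },
    }


properties = {
    "sentiment": {
        "type": "string",
        "enum": ["positive", "negative", "neutral", "mixed"],
    },
    "middle_name": {"anyOf": [{"type": "string"}, {"type": "null"}]},
    "address": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "country": {"type": "string"},
        },
        "required": ["city", "country"],
        "additionalProperties": False,
    },
}

response_format = make_response_format("order_review", properties)
print(json.dumps(response_format, indent=2))
```

The resulting dict can be passed as `response_format=response_format` in the same `client.chat.completions.create` call shown earlier.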
Approach 3: Pydantic + Instructor
Writing JSON schemas by hand gets tedious fast. The instructor library lets you define your output structure as a Pydantic model and handles the schema generation, API call, and response parsing automatically.
```bash
pip install instructor pydantic
```
```python
import instructor
from openai import OpenAI
from pydantic import BaseModel, Field
from typing import Optional

raw_client = OpenAI(
    api_key="your_generalcompute_api_key",
    base_url="https://api.generalcompute.com/v1",
)
client = instructor.from_openai(raw_client)


class ContactInfo(BaseModel):
    name: str
    title: str
    company: str
    email: str
    topic: Optional[str] = None


contact = client.chat.completions.create(
    model="llama-4-maverick",
    response_model=ContactInfo,
    messages=[
        {
            "role": "user",
            "content": "Extract: 'Sarah Chen, senior engineer at DataFlow, reached out via sarah@dataflow.io about the Q3 integration project.'",
        }
    ],
)

print(contact.name)     # "Sarah Chen"
print(contact.company)  # "DataFlow"
print(contact.email)    # "sarah@dataflow.io"
```
Instructor returns a fully validated Pydantic instance. If the model returns something that doesn't match (a wrong type, a missing required field), instructor re-prompts the model with the validation error. The retry budget is configurable through the max_retries argument.
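The retry behavior can be pictured as a loop that feeds the validation error back to the model. The sketch below is a simplified, standard-library illustration of that idea, not instructor's actual implementation. `call_model` stands in for any function that takes a message list and returns text, and `validate_contact` is a hypothetical hand-written check.

```python
import json
from typing import Callable


def validate_contact(data: dict) -> None:
    """Raise ValueError if required string fields are missing or mistyped."""
    for key in ("name", "email"):
        if not isinstance(data.get(key), str):
            raise ValueError(f"field '{key}' must be a string")


def extract_with_retries(
    call_model: Callable[[list[dict]], str],
    messages: list[dict],
    max_retries: int = 2,
) -> dict:
    """Call the model, re-prompting with the error after each invalid reply."""
    messages = list(messages)  # avoid mutating the caller's list
    for attempt in range(max_retries + 1):
        raw = call_model(messages)
        try:
            data = json.loads(raw)
            validate_contact(data)
            return data
        except ValueError as err:  # json.JSONDecodeError subclasses ValueError
            if attempt == max_retries:
                raise
            # Show the model its own reply and the reason it was rejected.
            messages.append({"role": "assistant", "content": raw})
            messages.append(
                {"role": "user", "content": f"Invalid output: {err}. Try again."}
            )
    raise RuntimeError("unreachable")


# Stub model: the first reply omits 'email', the second is valid.
replies = iter([
    '{"name": "Sarah Chen"}',
    '{"name": "Sarah Chen", "email": "sarah@dataflow.io"}',
])
result = extract_with_retries(
    lambda msgs: next(replies),
    [{"role": "user", "content": "Extract the contact."}],
)
print(result)
```

Each retry costs another model call, so a small budget is usually a sensible starting point.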
Nested Models
Pydantic handles nested models cleanly:
```python
from pydantic import BaseModel
from typing import List, Optional  # Optional is needed for Invoice.notes


class LineItem(BaseModel):
    description: str
    quantity: int
    unit_price: float

    @property
    def total(self) -> float:
        return self.quantity * self.unit_price


class Invoice(BaseModel):
    invoice_number: str
    customer_name: str
    items: List[LineItem]
    notes: Optional[str] = None

    @property
    def subtotal(self) -> float:
        return sum(item.total for item in self.items)


# `client` is the instructor-patched client created in the previous example.
invoice = client.chat.completions.create(
    model="llama-4-maverick",
    response_model=Invoice,
    messages=[
        {
            "role": "user",
            "content": """
Extract the invoice data:
Invoice #2847 for Acme Corp.
- 3x Widget A at $12.50 each
- 1x Widget B at $87.00
Note: Net 30 payment terms.
""",
        }
    ],
)

print(f"Invoice: {invoice.invoice_number}")
print(f"Customer: {invoice.customer_name}")
for item in invoice.items:
    print(f"  {item.description}: {item.quantity} x ${item.unit_price:.2f}")
print(f"Subtotal: ${invoice.subtotal:.2f}")
```
Instructor translates the Pydantic model's type hints and field annotations into a JSON schema it passes to the model. The result comes back as a validated Python object, not a dict you have to type-check yourself.
Field Descriptions for Better Results
Pydantic's Field(description=...) becomes part of the schema that instructor sends to the model. This is worth using -- descriptions guide the model when field names are ambiguous:
```python
from pydantic import BaseModel, Field
from typing import Literal


class SupportTicket(BaseModel):
    summary: str = Field(description="One-sentence summary of the issue")
    category: Literal["billing", "technical", "account", "feature_request"]
    urgency: Literal["low", "medium", "high", "critical"] = Field(
        description="Urgency based on business impact. Critical = service down."
    )
    affected_users: int = Field(
        description="Estimated number of users affected. Use 1 if only the reporter."
    )
    needs_escalation: bool = Field(
        description="True if the ticket requires a senior engineer or manager."
    )
```
The descriptions become part of the prompt context. "Use 1 if only the reporter" is the kind of disambiguation that would otherwise require careful prompt engineering.
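You can check what the model will see by printing the schema Pydantic generates. This sketch assumes Pydantic v2 (`model_json_schema`) and uses a trimmed copy of the ticket model. The exact payload instructor sends may wrap or adjust this schema depending on its mode.

```python
from typing import Literal

from pydantic import BaseModel, Field


class SupportTicket(BaseModel):
    summary: str = Field(description="One-sentence summary of the issue")
    urgency: Literal["low", "medium", "high", "critical"] = Field(
        description="Urgency based on business impact. Critical = service down."
    )
    affected_users: int = Field(
        description="Estimated number of users affected. Use 1 if only the reporter."
    )


schema = SupportTicket.model_json_schema()
props = schema["properties"]

# Descriptions and Literal choices are carried into the generated schema.
print(props["affected_users"]["description"])
print(props["urgency"]["enum"])
```

Inspecting the schema this way is a quick sanity check that a description you wrote actually reaches the model.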
Approach 4: Manual Schema + JSON Parse (Fallback Pattern)
Sometimes you want schema validation without the instructor dependency, or you're working in a language without a good instructor port. Here's a minimal validation pattern in Python:
```python
import json

from jsonschema import validate, ValidationError

schema = {
    "type": "object",
    "properties": {
        "score": {"type": "integer", "minimum": 1, "maximum": 10},
        "reasoning": {"type": "string"},
        "pass": {"type": "boolean"},
    },
    "required": ["score", "reasoning", "pass"],
    "additionalProperties": False,
}

# `client` here can be a plain OpenAI client; no instructor wrapper is needed.
response = client.chat.completions.create(
    model="llama-4-maverick",
    messages=[
        {
            # json_object mode does not see the schema, so describe the shape here.
            "role": "system",
            "content": (
                "Respond only with a JSON object containing: score (integer 1-10), "
                "reasoning (string), pass (boolean)."
            ),
        },
        {"role": "user", "content": "Rate this code review comment: 'LGTM'"},
    ],
    response_format={"type": "json_object"},
)

raw = response.choices[0].message.content
try:
    data = json.loads(raw)
    validate(instance=data, schema=schema)
except (json.JSONDecodeError, ValidationError) as e:
    raise ValueError(f"Model returned invalid output: {e}") from e
```
jsonschema is a widely used third-party package (pip install jsonschema), not part of the Python standard library. It validates a Python dict against a JSON Schema. Combined with json_object mode, which takes care of producing parseable JSON, you get validation without much overhead.
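One case worth guarding against: in OpenAI-compatible APIs, a response cut off by the token limit is reported with a `finish_reason` of `"length"`, and the JSON may then be incomplete. Below is a hedged sketch of a guard. `parse_completed_json` is a hypothetical helper, and whether your provider reports truncation the same way is an assumption to verify.

```python
import json
from typing import Optional


def parse_completed_json(content: Optional[str], finish_reason: str) -> dict:
    """Parse a reply only if generation finished normally."""
    if finish_reason == "length":
        raise ValueError("response truncated by token limit; JSON may be incomplete")
    if not content:
        raise ValueError("response had no content")
    data = json.loads(content)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return data


# With a real response you would pass:
#   choice = response.choices[0]
#   parse_completed_json(choice.message.content, choice.finish_reason)
print(parse_completed_json('{"score": 7, "reasoning": "Too terse", "pass": false}', "stop"))

try:
    parse_completed_json('{"score": 7, "reasoning": "Too te', "length")
except ValueError as err:
    print(f"rejected: {err}")
```

Checking the finish reason first gives a clearer error than a bare parse failure and points directly at the fix, which is usually a higher token limit.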
Node.js Example
The same approaches work in Node.js with the OpenAI SDK:
```javascript
import OpenAI from "openai";
import { z } from "zod";

const client = new OpenAI({
  apiKey: process.env.GENERALCOMPUTE_API_KEY,
  baseURL: "https://api.generalcompute.com/v1",
});

// JSON object mode
const response = await client.chat.completions.create({
  model: "llama-4-maverick",
  messages: [
    {
      role: "user",
      content:
        "Return JSON with fields: language, framework, purpose for: 'A Next.js app for scheduling appointments'",
    },
  ],
  response_format: { type: "json_object" },
});

const data = JSON.parse(response.choices[0].message.content ?? "{}");

// Validate with Zod
const AppInfoSchema = z.object({
  language: z.string(),
  framework: z.string(),
  purpose: z.string(),
});

const parsed = AppInfoSchema.parse(data);
console.log(parsed.framework); // "Next.js"
```
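For Node.js, Zod fills the role that Pydantic fills in Python. A TypeScript port of instructor (instructor-js) builds on Zod, but if you'd rather not add that dependency, combining json_schema mode with Zod validation covers most use cases. The example above uses json_object mode, so the Zod parse is what enforces the shape; with json_schema mode it becomes a second line of defense.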
When to Use Each Approach
| Situation | Recommended approach |
|---|---|
| Simple one-off extraction, shape doesn't matter much | json_object mode |
| Fixed schema, no external deps | json_schema mode with hand-written schema |
| Python app, complex nested models | instructor + Pydantic |
| Node.js app | json_schema mode + Zod validation |
| Need retry on validation failure | instructor (handles this automatically) |
| Batch processing with strict contracts | json_schema mode, strict: True |
A Note on Model Choice
Not all models handle structured outputs equally well. Larger models follow complex nested schemas more reliably. For simple flat objects, a smaller fast model works fine. For deep nesting, multiple optional fields, or schemas with many enum constraints, you'll get better accuracy from a larger model.
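If you're running high-volume extraction where cost matters, test your schema against both a smaller and a larger model before committing. Measure the schema compliance rate on a representative sample: if the smaller model comes close, the cost savings usually win; if it falls well short, the larger model can pay for itself in fewer retries and failed records.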
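Schema compliance rate can be measured with a small harness: collect raw outputs from each model on the same inputs and count how many pass your check. The sketch below uses made-up sample outputs. `compliance_rate` and `is_valid_rating` are hypothetical names, and in practice the output lists would come from real API calls.

```python
import json
from typing import Callable


def is_valid_rating(raw: str) -> bool:
    """Check for an object with an integer 'score' in 1..10 and a string 'reasoning'."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    score = data.get("score")
    return (
        isinstance(score, int)
        and not isinstance(score, bool)  # bool is a subclass of int in Python
        and 1 <= score <= 10
        and isinstance(data.get("reasoning"), str)
    )


def compliance_rate(outputs: list[str], check: Callable[[str], bool]) -> float:
    """Fraction of outputs that pass the check (0.0 for an empty list)."""
    return sum(check(o) for o in outputs) / len(outputs) if outputs else 0.0


# Made-up outputs standing in for two models' replies to the same prompts.
small_model = [
    '{"score": 3, "reasoning": "ok"}',
    '{"score": "3"}',
    '{"score": 11, "reasoning": "x"}',
    '{"score": 8, "reasoning": "good"}',
]
large_model = [
    '{"score": 3, "reasoning": "ok"}',
    '{"score": 4, "reasoning": "fine"}',
    '{"score": 9, "reasoning": "x"}',
    '{"score": 8, "reasoning": "good"}',
]

print(f"small: {compliance_rate(small_model, is_valid_rating):.0%}")
print(f"large: {compliance_rate(large_model, is_valid_rating):.0%}")
```

A sample of four is only for illustration; a real comparison needs enough inputs to cover the awkward cases your pipeline actually sees.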
Getting Started
The GeneralCompute API is OpenAI-compatible, so all the examples above work by pointing your client at https://api.generalcompute.com/v1. JSON mode and schema-constrained outputs are supported across all chat completion models.
If you're building a data pipeline, extraction service, or any workflow where downstream code depends on the LLM's output shape, schema constraints are worth using from the start. The few minutes it takes to write a Pydantic model or JSON schema will save you from debugging malformed outputs in production.
Check the GeneralCompute docs for the full list of supported models and their structured output capabilities.