JSON Mode in LLMs: How to Get Structured Outputs Every Time

Getting an LLM to return valid, parseable JSON consistently is one of those problems that sounds trivial until you're debugging a production outage at 2am because the model returned "Here is the JSON you requested:" followed by a Markdown code fence around what you needed.

This guide covers four approaches: the json_object response format, JSON schema-constrained outputs, the instructor library with Pydantic models, and a manual validation fallback. By the end you'll know which technique fits your situation and how to implement each one.

Why Plain Prompting Fails

If you've tried asking a model to "respond only in JSON," you already know the failure modes:

The model wraps JSON in a Markdown code fence (```json ... ```)
It adds a sentence before or after ("Sure! Here's the JSON:")
It generates JSON with the wrong keys or structure
It produces valid JSON but with hallucinated extra fields
It returns valid JSON 95% of the time and broken JSON the other 5%

That 5% is what kills you in production. Prompt engineering can reduce failures but can't eliminate them. JSON mode and schema constraints are the right tools.
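To make the cost of the prompt-only route concrete, here is a minimal sketch of the cleanup code it tends to force on you. The helper name parse_loose_json and the sample reply are made up for illustration; the point is that each failure mode in the list above needs its own special case.

```python
import json
import re

# Build the fence marker from single backticks so this snippet can itself
# sit inside a fenced block.
FENCE = "`" * 3
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def parse_loose_json(text: str) -> dict:
    """Best-effort parse of a reply that may wrap a JSON object in prose or a fence."""
    match = FENCE_RE.search(text)
    candidate = match.group(1) if match else text
    # Keep only the outermost {...} span to drop any surrounding sentences.
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model reply")
    return json.loads(candidate[start : end + 1])


# Hypothetical reply showing two of the failure modes at once.
reply = (
    "Sure! Here's the JSON:\n"
    + FENCE
    + 'json\n{"company": "Acme Corp", "year": 1987}\n'
    + FENCE
)
print(parse_loose_json(reply))
```

This still breaks on replies with braces in the surrounding prose, and it does nothing about wrong keys or extra fields, which is why the approaches below constrain the output instead of repairing it afterwards.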

Approach 1: JSON Object Mode

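The simplest way to force JSON output is response_format={"type": "json_object"}. The API guarantees the response will be a valid JSON object: no prose, no code fences, no preamble. The one caveat is truncation: if generation stops early, for example at a max_tokens limit, the JSON can be cut off mid-object, so it's still worth handling json.loads errors.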

from openai import OpenAI
import json

client = OpenAI(
    api_key="your_generalcompute_api_key",
    base_url="https://api.generalcompute.com/v1",
)

response = client.chat.completions.create(
    model="llama-4-maverick",
    messages=[
        {
            "role": "system",
            "content": "You are a data extraction assistant. Always respond with JSON.",
        },
        {
            "role": "user",
            "content": "Extract the company name, founding year, and headquarters city from: 'Acme Corp was founded in 1987 in Austin, Texas.'",
        },
    ],
    response_format={"type": "json_object"},
)

data = json.loads(response.choices[0].message.content)
print(data)
# Example output (json_object mode lets the model choose the key names):
# {"company_name": "Acme Corp", "founding_year": 1987, "headquarters_city": "Austin"}

A few things to keep in mind with json_object mode:

You still need to describe the shape in your prompt. The API guarantees syntactically valid JSON but doesn't constrain which keys appear or what types the values have. If you ask for company info, the model picks the field names. This is fine for one-off tasks but unreliable for anything that parses downstream.

Include the word "JSON" in your system prompt. Some implementations require this: OpenAI's API, for example, rejects json_object requests whose messages never mention JSON. A safe pattern: "Respond only with a JSON object containing..." followed by a description of the expected fields.

The model can return nested objects and arrays. There's no depth limit -- you can get back complex structures as long as you describe them in your prompt.
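Because json_object mode only guarantees syntax, downstream code still has to check the shape. The sketch below shows one lightweight way to do that with the standard library alone; EXPECTED_FIELDS and check_shape are hypothetical names for this example, and the two sample strings stand in for model replies.

```python
import json

# Hypothetical contract for the company-extraction example: key -> expected type.
EXPECTED_FIELDS = {
    "company_name": str,
    "founding_year": int,
    "headquarters_city": str,
}


def check_shape(raw: str, expected: dict) -> dict:
    """Parse a json_object reply and verify the keys and value types we rely on."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    missing = sorted(expected.keys() - data.keys())
    if missing:
        raise ValueError(f"missing keys: {missing}")
    wrong = sorted(k for k, t in expected.items() if not isinstance(data[k], t))
    if wrong:
        raise ValueError(f"wrong value types for: {wrong}")
    return data


good = '{"company_name": "Acme Corp", "founding_year": 1987, "headquarters_city": "Austin"}'
print(check_shape(good, EXPECTED_FIELDS)["founding_year"])

# Valid JSON, but the model picked its own field names.
drifted = '{"name": "Acme Corp", "founded": "1987", "hq": "Austin"}'
try:
    check_shape(drifted, EXPECTED_FIELDS)
except ValueError as err:
    print(f"rejected: {err}")
```

A check like this is fine for a handful of flat fields. Once the structure gets nested, a real schema is easier to maintain.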

Approach 2: JSON Schema Mode

JSON schema mode goes further: you provide a JSON schema, and the model is constrained to return output that matches it exactly. This is the approach to use when you need guaranteed field names, types, and structure.

response = client.chat.completions.create(
    model="llama-4-maverick",
    messages=[
        {
            "role": "user",
            "content": "Extract structured data from: 'Sarah Chen, senior software engineer at DataFlow, reached out via sarah@dataflow.io to discuss the Q3 integration project.'",
        }
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "contact_extraction",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "title": {"type": "string"},
                    "company": {"type": "string"},
                    "email": {"type": "string"},
                    "topic": {"type": "string"},
                },
                "required": ["name", "title", "company", "email", "topic"],
                "additionalProperties": False,
            },
        },
    },
)

data = json.loads(response.choices[0].message.content)
# Guaranteed: data["name"], data["email"], etc. always present and typed correctly

With strict: True, the model won't add fields not in the schema and won't omit required fields. This is what you want for ETL pipelines, API responses, and anything that feeds into typed application code.

Building Schemas for Common Patterns

Arrays of objects:

schema = {
    "type": "object",
    "properties": {
        "items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "product_name": {"type": "string"},
                    "quantity": {"type": "integer"},
                    "unit_price": {"type": "number"},
                },
                "required": ["product_name", "quantity", "unit_price"],
                "additionalProperties": False,
            },
        }
    },
    "required": ["items"],
    "additionalProperties": False,
}

Enums for categorical fields:

"sentiment": {
    "type": "string",
    "enum": ["positive", "negative", "neutral", "mixed"]
}

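Optional (nullable) fields using anyOf. Under strict mode as OpenAI defines it, the key still has to be listed in required; the model returns null when there is no value: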

"middle_name": {
    "anyOf": [
        {"type": "string"},
        {"type": "null"}
    ]
}

Nested objects:

"address": {
    "type": "object",
    "properties": {
        "street": {"type": "string"},
        "city": {"type": "string"},
        "zip": {"type": "string"},
        "country": {"type": "string"},
    },
    "required": ["street", "city", "zip", "country"],
    "additionalProperties": False,
}

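Nesting works to arbitrary depth. The constraint engine handles it correctly as long as your schema is valid JSON Schema. Constrained decoding implementations often support only a subset of JSON Schema keywords, so check your provider's docs before relying on less common ones.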
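The fragments above share two habits: every object sets additionalProperties to False and lists all of its properties in required. OpenAI's strict mode requires both; whether another OpenAI-compatible provider enforces the same rules is an assumption worth checking. A small recursive linter can catch slips before you send a schema. The function name strict_schema_problems and the sample schema are hypothetical.

```python
def strict_schema_problems(schema: dict, path: str = "$") -> list[str]:
    """Collect places where a schema breaks the two strict-mode habits used above."""
    problems = []
    if schema.get("type") == "object":
        props = schema.get("properties", {})
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be False")
        missing = sorted(set(props) - set(schema.get("required", [])))
        if missing:
            problems.append(f"{path}: not listed in required: {missing}")
        for name, sub in props.items():
            problems += strict_schema_problems(sub, f"{path}.{name}")
    if schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        problems += strict_schema_problems(schema["items"], f"{path}[]")
    for i, branch in enumerate(schema.get("anyOf", [])):
        problems += strict_schema_problems(branch, f"{path}.anyOf[{i}]")
    return problems


# Sample schema combining the patterns above, with "zip" left out of
# required on purpose so the linter has something to report.
profile_schema = {
    "type": "object",
    "properties": {
        "sentiment": {"type": "string", "enum": ["positive", "negative"]},
        "middle_name": {"anyOf": [{"type": "string"}, {"type": "null"}]},
        "address": {
            "type": "object",
            "properties": {"city": {"type": "string"}, "zip": {"type": "string"}},
            "required": ["city"],
            "additionalProperties": False,
        },
    },
    "required": ["sentiment", "middle_name", "address"],
    "additionalProperties": False,
}

print(strict_schema_problems(profile_schema))
```

The linter only checks these two rules. It does not validate that the schema is otherwise well-formed JSON Schema.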

Approach 3: Pydantic + Instructor

Writing JSON schemas by hand gets tedious fast. The instructor library lets you define your output structure as a Pydantic model and handles the schema generation, API call, and response parsing automatically.

pip install instructor pydantic

import instructor
from openai import OpenAI
from pydantic import BaseModel, Field
from typing import Optional

raw_client = OpenAI(
    api_key="your_generalcompute_api_key",
    base_url="https://api.generalcompute.com/v1",
)

client = instructor.from_openai(raw_client)

class ContactInfo(BaseModel):
    name: str
    title: str
    company: str
    email: str
    topic: Optional[str] = None

contact = client.chat.completions.create(
    model="llama-4-maverick",
    response_model=ContactInfo,
    messages=[
        {
            "role": "user",
            "content": "Extract: 'Sarah Chen, senior engineer at DataFlow, reached out via sarah@dataflow.io about the Q3 integration project.'",
        }
    ],
)

print(contact.name)        # "Sarah Chen"
print(contact.company)     # "DataFlow"
print(contact.email)       # "sarah@dataflow.io"

Instructor returns a fully validated Pydantic instance. If the model returns something that doesn't match -- wrong type, missing required field -- instructor automatically retries with a correction prompt. The retry budget is configurable.
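In instructor the retry budget is set with the max_retries argument to create. To show the idea behind retry-with-correction without a network call, here is a rough standard-library sketch. It is not instructor's implementation: create_with_retries, validate_contact, and fake_model are hypothetical names, and the fake model simply replays canned replies.

```python
import json


def create_with_retries(call_model, messages, validate, max_retries=2):
    """Call the model, validate the reply, and feed errors back until it passes."""
    history = list(messages)
    last_error = None
    for _ in range(max_retries + 1):
        raw = call_model(history)
        try:
            # json.JSONDecodeError subclasses ValueError, so one except covers both.
            return validate(json.loads(raw))
        except ValueError as err:
            last_error = err
            # Show the model its own output plus the validation error.
            history.append({"role": "assistant", "content": raw})
            history.append(
                {
                    "role": "user",
                    "content": f"That output was invalid: {err}. Return corrected JSON only.",
                }
            )
    raise ValueError(f"no valid output after {max_retries + 1} attempts: {last_error}")


def validate_contact(data):
    if not isinstance(data, dict) or not isinstance(data.get("email"), str):
        raise ValueError("email must be present and a string")
    return data


# Canned replies standing in for a real model: the first one omits the email.
replies = iter(
    [
        '{"name": "Sarah Chen"}',
        '{"name": "Sarah Chen", "email": "sarah@dataflow.io"}',
    ]
)


def fake_model(history):
    return next(replies)


result = create_with_retries(
    fake_model,
    [{"role": "user", "content": "Extract the contact as JSON."}],
    validate_contact,
)
print(result["email"])
```

Each retry costs another model call, so a small budget (one to three attempts) is a sensible starting point.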

Nested Models

Pydantic handles nested models cleanly:

from pydantic import BaseModel
from typing import List, Optional

class LineItem(BaseModel):
    description: str
    quantity: int
    unit_price: float

    @property
    def total(self) -> float:
        return self.quantity * self.unit_price

class Invoice(BaseModel):
    invoice_number: str
    customer_name: str
    items: List[LineItem]
    notes: Optional[str] = None

    @property
    def subtotal(self) -> float:
        return sum(item.total for item in self.items)

invoice = client.chat.completions.create(
    model="llama-4-maverick",
    response_model=Invoice,
    messages=[
        {
            "role": "user",
            "content": """
            Extract the invoice data:
            Invoice #2847 for Acme Corp.
            - 3x Widget A at $12.50 each
            - 1x Widget B at $87.00
            Note: Net 30 payment terms.
            """,
        }
    ],
)

print(f"Invoice: {invoice.invoice_number}")
print(f"Customer: {invoice.customer_name}")
for item in invoice.items:
    print(f"  {item.description}: {item.quantity} x ${item.unit_price:.2f}")
print(f"Subtotal: ${invoice.subtotal:.2f}")

Instructor translates the Pydantic model's type hints and field annotations into a JSON schema it passes to the model. The result comes back as a validated Python object, not a dict you have to type-check yourself.

Field Descriptions for Better Results

Pydantic's Field(description=...) becomes part of the schema that instructor sends to the model. This is worth using -- descriptions guide the model when field names are ambiguous:

from pydantic import BaseModel, Field
from typing import Literal

class SupportTicket(BaseModel):
    summary: str = Field(description="One-sentence summary of the issue")
    category: Literal["billing", "technical", "account", "feature_request"]
    urgency: Literal["low", "medium", "high", "critical"] = Field(
        description="Urgency based on business impact. Critical = service down."
    )
    affected_users: int = Field(
        description="Estimated number of users affected. Use 1 if only the reporter."
    )
    needs_escalation: bool = Field(
        description="True if the ticket requires a senior engineer or manager."
    )

The descriptions become part of the prompt context. "Use 1 if only the reporter" is the kind of disambiguation that would otherwise require careful prompt engineering.
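You can inspect what a model definition turns into before any request is made. The sketch below assumes Pydantic v2, where model_json_schema() produces the schema; TicketTriage is a cut-down hypothetical model for this example. It prints Pydantic's own schema, which is what instructor starts from, so the exact payload sent to the API may differ depending on how instructor is configured.

```python
from typing import Literal

from pydantic import BaseModel, Field


class TicketTriage(BaseModel):
    summary: str = Field(description="One-sentence summary of the issue")
    urgency: Literal["low", "high"] = Field(
        description="High means the service is down."
    )


schema = TicketTriage.model_json_schema()

# Field descriptions are carried into the schema verbatim.
print(schema["properties"]["summary"]["description"])

# A Literal with several values becomes an enum constraint.
print(schema["properties"]["urgency"]["enum"])
```

Printing the full schema while you iterate on a model is a quick way to confirm that descriptions and Literal choices appear where you expect.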

Approach 4: Manual Schema + JSON Parse (Fallback Pattern)

Sometimes you want schema validation without the instructor dependency, or you're working in a language without a good instructor port. Here's a minimal validation pattern in Python:

import json
from jsonschema import validate, ValidationError

schema = {
    "type": "object",
    "properties": {
        "score": {"type": "integer", "minimum": 1, "maximum": 10},
        "reasoning": {"type": "string"},
        "pass": {"type": "boolean"},
    },
    "required": ["score", "reasoning", "pass"],
    "additionalProperties": False,
}

response = client.chat.completions.create(
    model="llama-4-maverick",
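    messages=[
        {
            "role": "user",
            # json_object mode needs the expected shape (and the word "JSON") in the prompt.
            "content": "Rate this code review comment: 'LGTM'. Respond only with a JSON object containing score (integer 1-10), reasoning (string), and pass (boolean).",
        }
    ],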
    response_format={"type": "json_object"},
)

raw = response.choices[0].message.content

try:
    data = json.loads(raw)
    validate(instance=data, schema=schema)
except (json.JSONDecodeError, ValidationError) as e:
    # Chain the original exception so the traceback shows what failed.
    raise ValueError(f"Model returned invalid output: {e}") from e

jsonschema is a widely used third-party package (pip install jsonschema), not part of the Python standard library, that validates a Python dict against a JSON Schema. Combined with json_object mode (which guarantees parseable JSON), you get validation without much overhead.

Node.js Example

The same approaches work in Node.js with the OpenAI SDK:

import OpenAI from "openai";
import { z } from "zod";

const client = new OpenAI({
  apiKey: process.env.GENERALCOMPUTE_API_KEY,
  baseURL: "https://api.generalcompute.com/v1",
});

// JSON object mode
const response = await client.chat.completions.create({
  model: "llama-4-maverick",
  messages: [
    {
      role: "user",
      content:
        "Return JSON with fields: language, framework, purpose for: 'A Next.js app for scheduling appointments'",
    },
  ],
  response_format: { type: "json_object" },
});

const data = JSON.parse(response.choices[0].message.content ?? "{}");

// Validate with Zod
const AppInfoSchema = z.object({
  language: z.string(),
  framework: z.string(),
  purpose: z.string(),
});

const parsed = AppInfoSchema.parse(data);
console.log(parsed.framework); // "Next.js"

For Node.js, Zod fills the role that Pydantic fills in Python. A JavaScript port of instructor exists (instructor-js), but combining json_schema mode with Zod validation covers most use cases without an extra dependency.

When to Use Each Approach

| Situation | Recommended approach |
|---|---|
| Simple one-off extraction, shape doesn't matter much | json_object mode |
| Fixed schema, no external deps | json_schema mode with hand-written schema |
| Python app, complex nested models | instructor + Pydantic |
| Node.js app | json_schema mode + Zod validation |
| Need retry on validation failure | instructor (handles this automatically) |
| Batch processing with strict contracts | json_schema mode, strict: True |

A Note on Model Choice

Not all models handle structured outputs equally well. Larger models follow complex nested schemas more reliably. For simple flat objects, a smaller fast model works fine. For deep nesting, multiple optional fields, or schemas with many enum constraints, you'll get better accuracy from a larger model.

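If you're running high-volume extraction where cost matters, test your schema against both a smaller and a larger model before committing. Compare schema compliance rate and per-request cost on a representative sample: sometimes the smaller model is compliant enough to justify the savings, and sometimes the larger model's reliability is worth the price.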
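A comparison like this can be as simple as scoring saved replies from each model against the same check. The sketch below uses only the standard library; compliance_rate and has_score are hypothetical helpers, and the sample strings are placeholders standing in for saved replies, not measurements from any real model.

```python
import json
from typing import Callable, Iterable


def compliance_rate(raw_outputs: Iterable[str], is_valid: Callable[[object], bool]) -> float:
    """Fraction of raw replies that parse as JSON and pass the validity check."""
    outputs = list(raw_outputs)
    if not outputs:
        raise ValueError("need at least one output to score")
    passed = 0
    for raw in outputs:
        try:
            if is_valid(json.loads(raw)):
                passed += 1
        except json.JSONDecodeError:
            pass  # unparseable output counts as a failure
    return passed / len(outputs)


def has_score(data: object) -> bool:
    """Toy check: a JSON object with an integer "score" field."""
    return isinstance(data, dict) and isinstance(data.get("score"), int)


# Placeholder strings standing in for saved replies; these are not measurements.
small_model_outputs = ['{"score": 7}', '{"score": "7"}', '{"score": 3}', "score: 5"]
large_model_outputs = ['{"score": 7}', '{"score": 6}', '{"score": 3}', '{"score": 5}']

print(compliance_rate(small_model_outputs, has_score))
print(compliance_rate(large_model_outputs, has_score))
```

In practice you would swap has_score for your real schema check and run it over a few hundred representative inputs per model.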

Getting Started

The GeneralCompute API is OpenAI-compatible, so all the examples above work by pointing your client at https://api.generalcompute.com/v1. JSON mode and schema-constrained outputs are supported across all chat completion models.

If you're building a data pipeline, extraction service, or any workflow where downstream code depends on the LLM's output shape, schema constraints are worth using from the start. The few minutes it takes to write a Pydantic model or JSON schema will save you from debugging malformed outputs in production.

Check the GeneralCompute docs for the full list of supported models and their structured output capabilities.