Perplexity Sonar is the search-grounded LLM family powering Perplexity's AI-native search, built on Llama 3.3 with proprietary retrieval orchestration. The lineup spans Sonar, Sonar Pro, Sonar Reasoning, and Sonar Online Models, all optimized for citation-first synthesis and priced from $1 per million tokens (base Sonar, input and output) to $3 input / $15 output per million (Sonar Pro).
Perplexity Sonar is the search-grounded LLM family powering Perplexity's AI-native search products and developer API. Sonar is built on Meta's Llama 3.3 foundation with proprietary tuning and retrieval orchestration optimized specifically for the synthesis-with-citations workflow that defines Perplexity. Rather than competing as a general-purpose frontier base model, Sonar is purpose-built for one job: take a user query, retrieve relevant web context, and produce a synthesized answer with explicit source citations.
The Sonar lineup includes a base Sonar model ($1 per million tokens for both input and output), Sonar Pro ($3 input / $15 output per million), Sonar Reasoning (reasoning-tuned variant), and Sonar Online Models (continuously web-grounded for real-time queries). Perplexity's $750M Microsoft Azure infrastructure commitment (January 2026) supports the scale at which these models serve Perplexity's consumer product (~$200M ARR by February 2026, up from ~$80M ARR in late 2024) and the growing third-party developer API.
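The retrieval-synthesis-citation workflow is exposed through an OpenAI-compatible API, which keeps integration minimal. Below is a sketch of a grounded query, assuming the https://api.perplexity.ai base URL and "sonar"-family model identifiers that Perplexity documents; the top-level citations field and the search_recency_filter parameter are drawn from Perplexity's public API docs but should be verified against current documentation before relying on them.

```python
# Sketch: querying Sonar through the OpenAI-compatible endpoint.
# Assumes the openai Python client and a PERPLEXITY_API_KEY env var;
# model names, base URL, and response fields are assumptions to verify
# against Perplexity's current API documentation.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["PERPLEXITY_API_KEY"],
    base_url="https://api.perplexity.ai",
)

response = client.chat.completions.create(
    model="sonar",  # or "sonar-pro" / "sonar-reasoning"
    messages=[
        {"role": "system", "content": "Answer concisely and cite sources."},
        {"role": "user", "content": "What changed in EU AI regulation this month?"},
    ],
    # Freshness control for current-events queries; passing it via
    # extra_body is an assumption about the SDK plumbing.
    extra_body={"search_recency_filter": "week"},
)

print(response.choices[0].message.content)
# Sonar returns retrieved source URLs alongside the completion; the
# field name here is an assumption based on current docs.
for i, url in enumerate(getattr(response, "citations", []) or [], start=1):
    print(f"[{i}] {url}")
```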
Search-Grounded Synthesis (defining capability): Sonar is built specifically for the workflow of retrieving web content, synthesizing across sources, and producing answers with explicit citations. This is the technical foundation of Perplexity's product differentiation.
Citation-First Output: Outputs include citation markers tied to retrieved source content, a workflow that has become the reference template for AI-native search (see the parsing sketch after this list).
Real-Time Web Grounding (Online Models): Sonar Online Models maintain continuous web grounding — useful for queries about current events, breaking news, and recently updated information.
Reasoning Variant: Sonar Reasoning extends the base model with deeper deliberation for queries that require multi-step inference rather than pure retrieval-and-synthesis.
Cost-Effective Pricing: At $1/M tokens for the base model, Sonar is among the cheapest production LLM APIs, explicitly designed for high-volume search-style workloads where latency and cost matter as much as raw capability (see the cost sketch after this list).
Llama 3.3 Foundation: Inherits Llama 3.3's general-purpose capabilities (instruction following, coding, multilingual) as a baseline, with retrieval orchestration layered on top.
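Because the inline markers and the source list arrive separately, downstream code typically has to join them. A hypothetical post-processing sketch, assuming the answer text uses bracketed numeric markers like [1] that index 1-based into an ordered list of source URLs; verify against the actual response shape before depending on this:

```python
import re

def map_citations(answer: str, sources: list[str]) -> dict[int, str]:
    """Map bracketed markers like [2] in the answer text to source URLs.

    Assumes markers are 1-indexed into the ordered `sources` list;
    out-of-range markers are silently dropped.
    """
    markers = {int(m) for m in re.findall(r"\[(\d+)\]", answer)}
    return {n: sources[n - 1] for n in sorted(markers) if 0 < n <= len(sources)}

answer = "The regulation took effect in August [1], with phased obligations [2][3]."
sources = [
    "https://example.com/eu-ai-act",
    "https://example.com/timeline",
    "https://example.com/obligations",
]
print(map_citations(answer, sources))
# {1: 'https://example.com/eu-ai-act', 2: 'https://example.com/timeline', ...}
```

A join like this is also the natural place to enforce the verification step noted under limitations below, since it surfaces exactly which source backs which span of the answer.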
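To make the pricing concrete, here is a back-of-envelope monthly cost estimate at the listed rates. The per-query token counts are illustrative assumptions, not measurements; retrieved web context typically dominates the input side.

```python
def monthly_cost(queries: int, in_tokens: int, out_tokens: int,
                 in_price: float, out_price: float) -> float:
    """Total dollars per month, with prices quoted per million tokens."""
    return queries * (in_tokens * in_price + out_tokens * out_price) / 1_000_000

# Illustrative workload: 1M queries/month, ~1,500 input tokens
# (query + retrieved context) and ~300 output tokens per answer.
print(monthly_cost(1_000_000, 1500, 300, 1.0, 1.0))   # base Sonar ($1/$1):  1800.0
print(monthly_cost(1_000_000, 1500, 300, 3.0, 15.0))  # Sonar Pro ($3/$15):  9000.0
```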
Not a Frontier Base Model: Sonar is purpose-built for search-and-synthesis. For tasks that don't benefit from retrieval grounding (creative writing, novel reasoning problems, code generation in unfamiliar contexts), frontier base models (GPT-5.5, Claude Opus 4.7, Gemini 3.1 Pro) generally outperform Sonar.
Retrieval Quality Bound: Sonar's outputs are bounded by the quality and relevance of retrieved sources. Queries about content not well-covered on the open web often produce weaker results than queries with strong retrieval matches.
Citation Reliability: Although Sonar produces citations, the citations don't guarantee that the cited sources actually support the synthesized claim. Verification is still required for high-stakes use cases.
Copyright Litigation: Perplexity's citation-and-synthesis approach has been contested by publishers who argue it substitutes for visits to their own properties. Ongoing legal disputes may affect how Sonar can retrieve and cite copyrighted news content over time.
Foundation Model Dependency: Sonar is built on Meta's Llama foundation, which Perplexity does not control. If Meta changes Llama licensing or capabilities, or if Perplexity wants to switch foundations, the migration cost would be meaningful.
May 7, 2026