Perplexity Sonar is the search-grounded LLM family powering Perplexity's AI-native search, built on Llama 3.3 with proprietary retrieval orchestration. The lineup spans Sonar, Sonar Pro, Sonar Reasoning, and Sonar Online Models, all optimized for citation-first synthesis and priced from $1 per million tokens (base Sonar, input and output) to $3 input / $15 output per million (Sonar Pro).
Perplexity Sonar is the search-grounded LLM family powering Perplexity's AI-native search products and developer API. Sonar is built on Meta's Llama 3.3 foundation with proprietary tuning and retrieval orchestration optimized specifically for the synthesis-with-citations workflow that defines Perplexity. Rather than competing as a general-purpose frontier base model, Sonar is purpose-built for one job: take a user query, retrieve relevant web context, and produce a synthesized answer with explicit source citations.
The Sonar lineup includes a base Sonar model ($1 per million tokens for both input and output), Sonar Pro ($3 input / $15 output per million), Sonar Reasoning (reasoning-tuned variant), and Sonar Online Models (continuously web-grounded for real-time queries). Perplexity's $750M Microsoft Azure infrastructure commitment (January 2026) supports the scale at which these models serve Perplexity's consumer product (~$200M ARR by February 2026, up from ~$80M ARR in late 2024) and the growing third-party developer API.
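The retrieval-synthesis-citation workflow is exposed through an OpenAI-compatible API, which keeps integration minimal. Below is a sketch of a grounded query, assuming the https://api.perplexity.ai base URL and "sonar"-family model identifiers that Perplexity documents; the top-level citations field and the search_recency_filter parameter are drawn from Perplexity's public API docs but should be verified against current documentation before relying on them.

```python
# Sketch: querying Sonar through the OpenAI-compatible endpoint.
# Assumes the openai Python client and a PERPLEXITY_API_KEY env var;
# model names, base URL, and response fields are assumptions to verify
# against Perplexity's current API documentation.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["PERPLEXITY_API_KEY"],
    base_url="https://api.perplexity.ai",
)

response = client.chat.completions.create(
    model="sonar",  # or "sonar-pro" / "sonar-reasoning"
    messages=[
        {"role": "system", "content": "Answer concisely and cite sources."},
        {"role": "user", "content": "What changed in EU AI regulation this month?"},
    ],
    # Freshness control for current-events queries; passing it via
    # extra_body is an assumption about the SDK plumbing.
    extra_body={"search_recency_filter": "week"},
)

print(response.choices[0].message.content)
# Sonar returns retrieved source URLs alongside the completion; the
# field name here is an assumption based on current docs.
for i, url in enumerate(getattr(response, "citations", []) or [], start=1):
    print(f"[{i}] {url}")
```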
Search-Grounded Synthesis (defining capability): Sonar is built specifically for the workflow of retrieving web content, synthesizing across sources, and producing answers with explicit citations. This is the technical foundation of Perplexity's product differentiation.
Citation-First Output: Outputs include citation markers tied to retrieved source content, a workflow that has become the reference template for AI-native search (see the parsing sketch after this list).
Real-Time Web Grounding (Online Models): Sonar Online Models maintain continuous web grounding — useful for queries about current events, breaking news, and recently updated information.
Reasoning Variant: Sonar Reasoning extends the base model with deeper deliberation for queries that require multi-step inference rather than pure retrieval-and-synthesis.
Cost-Effective Pricing: At $1/M tokens for the base model, Sonar is among the cheapest production LLM APIs, explicitly designed for high-volume search-style workloads where latency and cost matter as much as raw capability (see the cost sketch after this list).
Llama 3.3 Foundation: Inherits Llama 3.3's general-purpose capabilities (instruction following, coding, multilingual) as a baseline, with retrieval orchestration layered on top.
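Because the inline markers and the source list arrive separately, downstream code typically has to join them. A hypothetical post-processing sketch, assuming the answer text uses bracketed numeric markers like [1] that index 1-based into an ordered list of source URLs; verify against the actual response shape before depending on this:

```python
import re

def map_citations(answer: str, sources: list[str]) -> dict[int, str]:
    """Map bracketed markers like [2] in the answer text to source URLs.

    Assumes markers are 1-indexed into the ordered `sources` list;
    out-of-range markers are silently dropped.
    """
    markers = {int(m) for m in re.findall(r"\[(\d+)\]", answer)}
    return {n: sources[n - 1] for n in sorted(markers) if 0 < n <= len(sources)}

answer = "The regulation took effect in August [1], with phased obligations [2][3]."
sources = [
    "https://example.com/eu-ai-act",
    "https://example.com/timeline",
    "https://example.com/obligations",
]
print(map_citations(answer, sources))
# {1: 'https://example.com/eu-ai-act', 2: 'https://example.com/timeline', ...}
```

A join like this is also the natural place to enforce the verification step noted under limitations below, since it surfaces exactly which source backs which span of the answer.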
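To make the pricing concrete, here is a back-of-envelope monthly cost estimate at the listed rates. The per-query token counts are illustrative assumptions, not measurements; retrieved web context typically dominates the input side.

```python
def monthly_cost(queries: int, in_tokens: int, out_tokens: int,
                 in_price: float, out_price: float) -> float:
    """Total dollars per month, with prices quoted per million tokens."""
    return queries * (in_tokens * in_price + out_tokens * out_price) / 1_000_000

# Illustrative workload: 1M queries/month, ~1,500 input tokens
# (query + retrieved context) and ~300 output tokens per answer.
print(monthly_cost(1_000_000, 1500, 300, 1.0, 1.0))   # base Sonar ($1/$1):  1800.0
print(monthly_cost(1_000_000, 1500, 300, 3.0, 15.0))  # Sonar Pro ($3/$15):  9000.0
```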
Not a Frontier Base Model: Sonar is purpose-built for search-and-synthesis. For tasks that don't benefit from retrieval grounding (creative writing, novel reasoning problems, code generation in unfamiliar contexts), frontier base models (GPT-5.5, Claude Opus 4.7, Gemini 3.1 Pro) generally outperform Sonar.
Retrieval Quality Bound: Sonar's outputs are bounded by the quality and relevance of retrieved sources. Queries about content not well-covered on the open web often produce weaker results than queries with strong retrieval matches.
Citation Reliability: Although Sonar produces citations, the citations don't guarantee that the cited sources actually support the synthesized claim. Verification is still required for high-stakes use cases.
Copyright Litigation: Perplexity's citation-and-synthesis approach has been contested by publishers who argue it substitutes for visits to their own properties. Ongoing legal disputes may affect how Sonar can retrieve and cite copyrighted news content over time.
Foundation Model Dependency: Sonar is built on Meta's Llama foundation, which Perplexity does not control. If Meta changes Llama licensing or capabilities, or if Perplexity wants to switch foundations, the migration cost would be meaningful.
May 7, 2026