Grok Imagine is xAI's flagship text-to-video and image-to-video model with synchronized audio, ranked #1 on public text-to-video leaderboards as of May 2026. v1.0 (Feb 2026) delivers 10-second 720p clips, and Extend from Frame (Mar 2026) chains 15-second clips into longer continuous sequences.
Grok Imagine is xAI's flagship video generation model, covering text-to-video and image-to-video with synchronized audio. It launched in July 2025 with six-second text-to-video clips that included audio, then evolved rapidly through early 2026: the API launched on January 28, 2026 ($0.05/second); v1.0 shipped on February 3, 2026 (10-second 720p clips, with what xAI called its "biggest leap yet" in prompt-following accuracy); and "Extend from Frame" chaining shipped on March 2, 2026, enabling sequential 15-second clips that keep visual continuity through a final-frame handoff.
As of May 2026, Grok Imagine ranks #1 on public text-to-video leaderboards (arena score 724, ahead of Google's Veo 3.1 at 618 and Alibaba's WAN Video 2.6 at 577). It generated approximately 1.25 billion videos in January 2026 alone, a scale of consumer adoption that few generative-video models match.
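To make the API price concrete, the Python sketch below estimates spend from clip duration. It assumes the $0.05/second rate quoted above is billed per second of generated video and that nothing else is charged; the function names are illustrative and are not part of xAI's API.

```python
from decimal import Decimal
from typing import Iterable

# Launch price quoted in this document. Assumption: billed per second of output video.
PRICE_PER_SECOND = Decimal("0.05")  # USD


def clip_cost(seconds: int, price_per_second: Decimal = PRICE_PER_SECOND) -> Decimal:
    """Estimated cost in USD of one generated clip of the given length."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    return price_per_second * seconds


def batch_cost(durations: Iterable[int]) -> Decimal:
    """Estimated cost in USD of several clips, e.g. one chained sequence."""
    return sum((clip_cost(s) for s in durations), Decimal("0"))


if __name__ == "__main__":
    print(clip_cost(10))             # one 10-second v1.0 clip -> 0.50
    print(batch_cost([15, 15, 15]))  # three chained 15-second clips -> 2.25
```

Decimal is used instead of float so that per-second prices add up exactly when many clips are totalled.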
Text-to-Video: Generate short clips with synchronized audio from natural-language prompts.
Image-to-Video: Animate static images using the same prompt-driven control surface.
Video Editing: Restyle scenes, add or remove objects, and control motion across clips.
Best-in-Class Instruction Following: xAI describes Grok Imagine's prompt-following as best-in-class among generative video models, and marketed v1.0 as the largest single improvement on this dimension.
Synchronized Audio: Generates audio (ambient sound, effects, dialogue) aligned to the visual content, rather than adding it in a separate post-processing step.
Extend from Frame: The final frame of one clip becomes the first frame of the next, enabling longer continuous sequences while preserving character and scene continuity.
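The final-frame handoff behind Extend from Frame can be sketched as a simple loop. The Python below is a toy model of the idea only. The `Clip` type, `fake_generate`, and `extend_from_frame` are hypothetical stand-ins invented for illustration, frames are represented as plain strings, and nothing here reflects xAI's actual API or implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Clip:
    prompt: str
    seconds: int
    first_frame: str  # placeholder for image data
    last_frame: str   # placeholder for image data


def fake_generate(prompt: str, seconds: int, first_frame: Optional[str] = None) -> Clip:
    """Hypothetical stand-in for a video generator (not a real API call)."""
    start = first_frame if first_frame is not None else f"frame<{prompt}:start>"
    return Clip(prompt, seconds, start, f"frame<{prompt}:end>")


def extend_from_frame(
    prompts: List[str],
    generate: Callable[[str, int, Optional[str]], Clip] = fake_generate,
    seconds: int = 15,
) -> List[Clip]:
    """Chain clips so each one starts from the previous clip's final frame."""
    clips: List[Clip] = []
    handoff: Optional[str] = None  # the first clip has no seed frame
    for prompt in prompts:
        clip = generate(prompt, seconds, handoff)
        clips.append(clip)
        handoff = clip.last_frame  # final frame seeds the next clip
    return clips


if __name__ == "__main__":
    sequence = extend_from_frame(["dawn over a harbor", "a boat departs", "open sea"])
    for clip in sequence:
        print(clip.first_frame, "->", clip.last_frame)
    print(sum(c.seconds for c in sequence), "seconds total")
```

Passing the generator in as a parameter keeps the chaining logic separate from whatever backend produces the clips, so the same loop would work with a real client once its interface is known.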
The 10-second base clip duration trails Google Veo 4 (15–30 seconds) and Luma Ray3 (HDR and longer outputs), and the 720p resolution is below the 4K offerings of Veo 4 and LTX-2. Public leaderboard scores aggregate human preference judgments and do not speak directly to limits in physical realism or character consistency. Grok Imagine's volume use case is consumer-social rather than professional production; pro-grade controllability still favors Runway Gen-4.5 and Veo 4 in studio workflows.
May 8, 2026