Wan 2.7 is Alibaba Tongyi Lab's flagship open-source video generation model, publicly released in late March 2026 (Wan 2.7-Video and Wan 2.7-Image), with broader API and DashScope availability rolled out in early April 2026. It produces 15-second 1080p clips and is the third major iteration of the Wan video family — the first to introduce native synchronized audio, first/last-frame control, multi-reference conditioning (up to 5 reference videos), instruction-based video editing, and a "thinking mode" for image generation.
In early April 2026, Alibaba made waves by topping the Artificial Analysis text-to-video leaderboard under the codename "HappyHorse-1.0" — later confirmed to be a stealth-tested Wan 2.7. Combined with the Wan 2.2 series (the first open-source video model with an MoE architecture), Alibaba is now arguably the top open-weight video generation lab globally.
First-and-Last-Frame Control: Specify both opening and closing frames; the model generates everything in between. The first such capability in an open-source video model.
15-Second Clips: 3× the duration of earlier Wan generations; competitive with frontier closed models.
Multi-Reference Conditioning: Up to 5 reference video inputs to guide character continuity, environment style, and motion patterns simultaneously.
Native Audio: Audio output baked into the generation pipeline — synchronized with the video, removing the need for a separate post-hoc audio model.
Instruction-Based Video Editing: Change backgrounds, lighting, or style via natural language; full inpaint/outpaint and re-style of existing video.
Thinking Mode (Image): Wan 2.7-Image incorporates a reasoning step before generation, improving prompt adherence and composition logic.
Open-Source Distribution: Free to run locally on consumer hardware across many of the released tier variants — a significant differentiator vs. closed competitors like Veo 4, Runway Gen-4.5, Kling 2.6, and Hailuo 2.3.
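To make the capability list above concrete, here is a minimal sketch of what a first/last-frame generation request might look like when called through an HTTP API such as DashScope. The model id (`wan2.7-flf2v`), field names, and overall payload schema are assumptions for illustration only — consult the official API reference for the real contract.

```python
def build_flf_request(prompt, first_frame_url, last_frame_url,
                      duration_s=15, resolution="1080p", audio=True):
    """Assemble a JSON-serializable payload for a hypothetical
    first/last-frame (FLF) video generation call."""
    if not 1 <= duration_s <= 15:
        # Wan 2.7 clips top out at 15 seconds
        raise ValueError("duration_s must be between 1 and 15 seconds")
    return {
        "model": "wan2.7-flf2v",  # hypothetical model id
        "input": {
            "prompt": prompt,
            "first_frame_url": first_frame_url,  # opening frame to honor
            "last_frame_url": last_frame_url,    # closing frame to honor
        },
        "parameters": {
            "duration": duration_s,
            "resolution": resolution,
            "audio": audio,  # native synchronized audio, no post-hoc model
        },
    }

payload = build_flf_request(
    "A paper boat drifts down a rain-soaked alley at dusk",
    "https://example.com/frames/open.png",
    "https://example.com/frames/close.png",
)
```

In practice such a payload would be POSTed to an async task endpoint and the finished clip retrieved by polling a task id, which is the usual pattern for long-running video synthesis jobs.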
While Wan 2.7 reaches 1080p / 15 seconds, [[Google DeepMind/Veo 4|Veo 4]] still leads on absolute frontier specs (4K @ 120fps, 15–30s clips). Audio quality and speech-sync are competitive but trail Veo 4 and [[Kuaishou/Kling 2.6|Kling 2.6]] on cinematic dialog scenes. As with other Chinese-origin models, U.S. enterprise adoption involves additional review around data handling and compliance.
May 9, 2026