Introduction to Wan 2.1 Models
By Wan 2.1
07 Mar 2025
Wan 2.1 represents a groundbreaking suite of open-source video foundation models that sets new standards in video generation technology. This article explores its key features and capabilities.
There is a new leader in open-source video generation! Alibaba's new Wan 2.1 model is now the leading open-weights model in the Artificial Analysis Video Arena, surpassing former titleholder Mochi 1.
Wan 2.1 is a 14B parameter model (a 1.3B variant was also released) and stands out for its ability to generate realistic-looking video with high-fidelity motion.
Key details regarding Wan 2.1:
➤ The 14B model is available in image-to-video and text-to-video variants. The 1.3B model only supports text-to-video.
➤ The 14B model supports 720p output, while the 1.3B model outputs at 480p.
➤ Generates natively at 16 fps. Compared to other models that generate at 24 fps, this can result in a slight stuttering effect.
➤ Supports multilingual text input in both English and Chinese.
➤ The 1.3B model requires only 8.2 GB of VRAM, allowing many consumer-grade GPUs to run inference. Alibaba claims an RTX 4090 can generate a 5-second 480p video in ~4 minutes.
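The variant details above can be summarized as a small lookup table. The following is a minimal sketch in Python: the class, field, and function names are hypothetical and are not part of any Wan 2.1 API, and only the values stated in the list above are encoded.

```python
from dataclasses import dataclass

# Values come from the key details listed above. The names WanVariant,
# VARIANTS, variants_for, and frame_count are illustrative assumptions,
# not part of any official Wan 2.1 package.
@dataclass(frozen=True)
class WanVariant:
    name: str
    tasks: frozenset
    max_resolution: str
    native_fps: int = 16


VARIANTS = (
    WanVariant("14B", frozenset({"text-to-video", "image-to-video"}), "720p"),
    WanVariant("1.3B", frozenset({"text-to-video"}), "480p"),
)


def variants_for(task: str) -> list[str]:
    """Return the names of the variants that support the given task."""
    return [v.name for v in VARIANTS if task in v.tasks]


def frame_count(seconds: float, fps: int = 16) -> int:
    """Nominal number of frames for a clip of the given length."""
    return round(seconds * fps)


print(variants_for("image-to-video"))  # ['14B']
print(variants_for("text-to-video"))   # ['14B', '1.3B']
print(frame_count(5))                  # 80 frames at the native 16 fps
print(frame_count(5, fps=24))          # 120 frames at 24 fps
```

The last two lines illustrate the frame-rate point: a 5-second clip at 16 fps contains two thirds as many frames as the same clip at 24 fps, which is why motion can look slightly less smooth.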
Wan 2.1 consistently outperforms both existing open-source models and commercial solutions across multiple benchmarks. Its comprehensive evaluation across 14 major dimensions and 26 sub-dimensions demonstrates superior capabilities in motion quality, visual quality, style rendering, and multi-targeting scenarios.
One of the most remarkable aspects of Wan 2.1 is its accessibility. The T2V-1.3B model requires only 8.19 GB of VRAM, making it compatible with consumer-grade GPUs. On an RTX 4090, it can generate a 5-second 480p video in approximately 4 minutes without any optimization techniques.
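To put the quoted generation time in perspective, here is a rough back-of-the-envelope calculation in Python. It assumes the 16 fps native frame rate and treats the "approximately 4 minutes" figure as exactly 240 seconds. The function names are illustrative only.

```python
# Figures quoted in the text: a 5-second 480p clip takes roughly
# 4 minutes on an RTX 4090. The 16 fps native frame rate is assumed.
CLIP_SECONDS = 5
NATIVE_FPS = 16
GENERATION_SECONDS = 4 * 60


def seconds_per_frame(clip_seconds: float, fps: int, generation_seconds: float) -> float:
    """Average wall-clock seconds spent per generated frame."""
    return generation_seconds / (clip_seconds * fps)


def slowdown_vs_realtime(clip_seconds: float, generation_seconds: float) -> float:
    """How many seconds of compute are needed per second of video."""
    return generation_seconds / clip_seconds


print(seconds_per_frame(CLIP_SECONDS, NATIVE_FPS, GENERATION_SECONDS))  # 3.0
print(slowdown_vs_realtime(CLIP_SECONDS, GENERATION_SECONDS))           # 48.0
```

Under these assumptions, generation costs about 3 seconds per frame, or roughly 48 seconds of compute per second of output video.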
Wan 2.1 handles multiple tasks, including text-to-video and image-to-video generation.
A unique feature of Wan 2.1 is its ability to generate both Chinese and English text within videos, making it the first video model with bilingual text generation capabilities.
The Wan-VAE component delivers exceptional efficiency in:
Wan 2.1 represents a significant advancement in video generation technology, offering state-of-the-art performance while maintaining accessibility for consumer-grade hardware. Its comprehensive feature set and multiple model variants make it a versatile solution for various video generation needs.