Introducing VACE for Wan 2.1 Open-Source Model


13 Mar 2025


What is VACE?

VACE (All-in-One Video Creation and Editing) is a unified framework developed by Alibaba’s Tongyi Lab, designed to handle video generation, editing, and enhancement with a single model. It combines multi-modal inputs with broad task coverage, supporting operations such as:

  • Move-Anything: Dynamically relocate objects within a video (e.g., a boy chasing adventure across a sunlit frame).

<video src="https://ali-vilab.github.io/VACE-Page/assets/videos/MoveAnything_1.mp4" className="w-96 max-w-full rounded-lg border border-gray-200 dark:border-gray-700" />

  • Swap-Anything: Replace backgrounds or subjects seamlessly (e.g., transforming a desert into a snowy landscape).

<video src="https://ali-vilab.github.io/VACE-Page/assets/videos/SwapAnything_2.mp4" className="w-96 max-w-full rounded-lg border border-gray-200 dark:border-gray-700" />

  • Animate-Anything: Convert static images into cinematic animations (e.g., a French café scene with a lion sipping coffee).

<video src="https://ali-vilab.github.io/VACE-Page/assets/videos/AnimateAnything_1.mp4" className="w-96 max-w-full rounded-lg border border-gray-200 dark:border-gray-700" />

  • Expand-Anything: Extend video boundaries for enhanced storytelling (e.g., panoramic vineyard views).

<video src="https://ali-vilab.github.io/VACE-Page/assets/videos/ExpandAnything_1.mp4" className="w-96 max-w-full rounded-lg border border-gray-200 dark:border-gray-700" />

  • Reference-Anything: Generate videos based on reference images (e.g., a cat playing with a ball).

<video src="https://ali-vilab.github.io/VACE-Page/assets/videos/ReferenceAnything_1.mp4" className="w-96 max-w-full rounded-lg border border-gray-200 dark:border-gray-700" />

  • Swap-Anything: Replace backgrounds or subjects seamlessly (e.g., transforming a desert into a snowy landscape).

<video src="https://ali-vilab.github.io/VACE-Page/assets/videos/SwapAnything_1.mp4" className="w-96 max-w-full rounded-lg border border-gray-200 dark:border-gray-700" />

The video shows an old movie-style scene in retro tones, with a little penguin and a kitten having a joyous bike race. Little penguins and kittens, both dressed in orange-and-red race suits, ride vintage multi-wheeled bikes on a nostalgic dirt road flanked by spectators...
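
One way to picture how a single model can cover all of the operations listed in this section is to express each task as the same triple of inputs: a text prompt, a list of context frames, and a per-frame mask marking what should be regenerated. The sketch below is a toy illustration of that idea, not VACE's actual interface. The `EditRequest` class, the helper functions, the gray placeholder value, and the mask convention (1 = generate, 0 = keep) are all hypothetical names and choices made for this example.

```python
from dataclasses import dataclass
from typing import List

Frame = List[List[int]]  # toy grayscale frame: rows of pixel values
Mask = List[List[int]]   # same size as a frame: 1 = regenerate, 0 = keep

GRAY = 128  # assumed placeholder for pixels the model has to fill in


@dataclass
class EditRequest:
    """Hypothetical unified request: every task uses the same three fields."""
    prompt: str
    frames: List[Frame]
    masks: List[Mask]


def filled(height: int, width: int, value: int) -> List[List[int]]:
    """Return a height x width grid filled with a constant value."""
    return [[value] * width for _ in range(height)]


def animate_image(prompt: str, image: Frame, num_frames: int) -> EditRequest:
    """Animate-style task: keep the first frame, generate all later frames."""
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    h, w = len(image), len(image[0])
    blanks = num_frames - 1
    frames = [image] + [filled(h, w, GRAY) for _ in range(blanks)]
    masks = [filled(h, w, 0)] + [filled(h, w, 1) for _ in range(blanks)]
    return EditRequest(prompt, frames, masks)


def expand_video(prompt: str, video: List[Frame], pad: int) -> EditRequest:
    """Expand-style task: pad each frame left and right, mask only the padding."""
    frames, masks = [], []
    for frame in video:
        frames.append([[GRAY] * pad + row + [GRAY] * pad for row in frame])
        masks.append([[1] * pad + [0] * len(row) + [1] * pad for row in frame])
    return EditRequest(prompt, frames, masks)


if __name__ == "__main__":
    clip = [[[10, 20], [30, 40]]]  # one 2x2 frame
    request = expand_video("panoramic vineyard", clip, pad=1)
    print(request.masks[0][0])  # [1, 0, 0, 1]
```

Under this framing, swapping or moving an object would differ only in which pixels the mask marks for regeneration, which is why one request type can serve several tasks.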

Wan 2.1: The Technical Powerhouse

At VACE’s core lies Wan 2.1, a state-of-the-art video generation model with:

  • Hybrid Architecture:
    • 3D Variational Autoencoder (Wan-VAE): Enables efficient encoding/decoding of 1080p videos while preserving temporal coherence.
    • Diffusion Transformer (DiT): Enhances multi-modal fusion for generating high-fidelity video, audio, and text effects.
  • Performance Breakthroughs:
    • Consumer-Grade GPU Support: Runs smoothly on RTX 4070-class hardware, democratizing access to advanced video tools.
    • SOTA Results: Outperforms models like Sora in motion handling and physical realism, as measured by VBench.
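
To make the efficiency point about the 3D VAE concrete, the following sketch computes how much a causal video VAE shrinks a clip before the diffusion transformer processes it. The compression factors used here (4x in time, 8x8 in space, 16 latent channels, with the first frame encoded on its own) are assumptions based on commonly reported Wan-VAE figures and are not stated in this article; the `latent_shape` helper and `LatentShape` class are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatentShape:
    channels: int
    frames: int
    height: int
    width: int

    def numel(self) -> int:
        """Total number of values in the latent tensor."""
        return self.channels * self.frames * self.height * self.width


def latent_shape(
    num_frames: int,
    height: int,
    width: int,
    t_stride: int = 4,         # assumed temporal compression factor
    s_stride: int = 8,         # assumed spatial compression factor per axis
    latent_channels: int = 16  # assumed number of latent channels
) -> LatentShape:
    """Latent size for a causal 3D VAE that encodes the first frame alone
    and every following group of `t_stride` frames as one latent frame."""
    if num_frames < 1 or (num_frames - 1) % t_stride != 0:
        raise ValueError("num_frames must be of the form 1 + k * t_stride")
    if height % s_stride or width % s_stride:
        raise ValueError("height and width must be divisible by s_stride")
    return LatentShape(
        channels=latent_channels,
        frames=1 + (num_frames - 1) // t_stride,
        height=height // s_stride,
        width=width // s_stride,
    )


if __name__ == "__main__":
    shape = latent_shape(num_frames=81, height=480, width=832)
    pixels = 3 * 81 * 480 * 832  # RGB values in the input clip
    print(shape)                  # 16 x 21 x 60 x 104 latent
    print(round(pixels / shape.numel(), 1))  # about 46.3x fewer values
```

Under these assumed factors, an 81-frame 480x832 clip becomes 21 latent frames of 60x104, so the transformer works on roughly 46 times fewer values than the raw pixels.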

Key Features for Creators

  • Video Rerendering: Retain content, structure, or motion while altering styles (e.g., retro film effects for penguin-kitten bike races).
  • Multi-Language Text Generation: Create dynamic text effects in Chinese and English (e.g., oil-painting-style narratives).
  • Cross-Modal Workflows: Generate videos from text, images, or existing footage with precision.

<video src="https://ali-vilab.github.io/VACE-Page/assets/videos/Depth_src_1.mp4" className="w-96 max-w-full rounded-lg border border-gray-200 dark:border-gray-700" />

The camera begins intimately focused on a cluster of grapevines, a close-up showcasing the ripe, plump grapes, sunlight filtering through the leaves and illuminating their amber translucence. The camera slowly moves forward and initiates a gentle upward rotation, gradually revealing the rolling hills of a vast vineyard, rows of vines stretching in neat lines towards distant hills...

<video src="https://ali-vilab.github.io/VACE-Page/assets/videos/Pose_src_1.mp4" className="w-96 max-w-full rounded-lg border border-gray-200 dark:border-gray-700" />

In a documentary style, a row of four meerkats dances together in the African savanna at noon...

<video src="https://ali-vilab.github.io/VACE-Page/assets/videos/Inpainting_src_2.mp4" className="w-96 max-w-full rounded-lg border border-gray-200 dark:border-gray-700" />

The video showcases a charming French-style café in Paris, where a lion dressed in a suit elegantly sips coffee. The lion leisurely holds a coffee cup in one hand and takes a sip to savor it. The café is elegantly decorated, with soft tones and gentle lighting illuminating the space around the lion...

<video src="https://ali-vilab.github.io/VACE-Page/assets/videos/Inpainting_src_1.mp4" className="w-96 max-w-full rounded-lg border border-gray-200 dark:border-gray-700" />

A person is painting on a canvas outdoors, using a palette with various colors of paint. The person is wearing a dark blue jacket and a matching beret, and is seated on a wooden chair. The canvas depicts a landscape with a body of water and mountains in the background. The person is carefully applying...

<video src="https://ali-vilab.github.io/VACE-Page/assets/videos/Outpainting_src_1.mp4" className="w-96 max-w-full rounded-lg border border-gray-200 dark:border-gray-700" />

An elegant lady is passionately playing the violin, with an entire symphony orchestra behind her...

<video src="https://ali-vilab.github.io/VACE-Page/assets/videos/Layout_src_1.mp4" className="w-96 max-w-full rounded-lg border border-gray-200 dark:border-gray-700" />

An eagle is flying over a calm blue ocean under a clear sky. The eagle, with its brown and white feathers and yellow beak, descends towards the water, its wings spread wide. As it approaches the surface, it dives into the water, creating a splash...

Applications Across Industries

  • Film & Advertising: Rapid prototyping of commercials (e.g., a mature woman showcasing luxury handbags).
  • Education: Transform static lessons into engaging animations (e.g., a painter creating landscapes).
  • Gaming & Art: Produce anime-style scenes (e.g., a surfer facing turbulent seas).

Why VACE Stands Out

  • Open-Source Flexibility: Access training code, inference pipelines, and pre-trained models on GitHub (coming soon).
  • Domain Generalization: Robust performance across unseen scenarios, from meerkat documentaries to symphony orchestras.

Final Thoughts

VACE and Wan 2.1 mark a leap forward in AI video technology, blending academic innovation (accepted at ICLR 2025) with practical usability. Whether you’re a developer, artist, or marketer, this framework gives you one model for generating, editing, and extending video.

Get Started: Explore the VACE paper, GitHub repo, and ACE++ extensions to revolutionize your video workflows.

Model is Coming Soon, Stay Tuned!
