Introducing Qwen-Image - Advanced Text Rendering and Image Editing Model
By Alibaba
05 Aug 2025
We are thrilled to announce the release of Qwen-Image, a groundbreaking 20B MMDiT image foundation model that achieves significant advances in complex text rendering and precise image editing. This innovative model represents a major leap forward in AI-powered image generation, offering unprecedented capabilities in handling multilingual text rendering and sophisticated image manipulation tasks.
Qwen-Image excels at complex text rendering, including multi-line layouts, paragraph-level semantics, and fine-grained details. The model supports both alphabetic languages (e.g., English) and logographic languages (e.g., Chinese) with exceptional fidelity, making it ideal for creating content that requires accurate text representation.
Through an enhanced multi-task training paradigm, Qwen-Image achieves exceptional performance in preserving both semantic meaning and visual realism during editing operations. This capability enables users to perform precise modifications while maintaining the integrity of the original image.
Evaluated on multiple public benchmarks, Qwen-Image consistently outperforms existing models across diverse generation and editing tasks, establishing a strong foundation model for image generation that sets new standards in the field.
Qwen-Image has been comprehensively evaluated across multiple public benchmarks. The model achieves state-of-the-art performance on all of them, demonstrating its strong capabilities in both image generation and editing. Particularly noteworthy is its performance in Chinese text generation, where it outperforms existing state-of-the-art models by a significant margin.
One of Qwen-Image's outstanding capabilities is its ability to achieve high-fidelity text rendering in different scenarios. Let's examine a Chinese rendering case:
Miyazaki Anime Style Scene
Scene featuring shop signs like "云存储", "云计算", and "云模型" with realistic depth of field and accurate character poses.
The model not only captures Miyazaki's anime style but also renders the shop signs "云存储", "云计算", and "云模型", as well as "千问" on the wine jars, realistically and accurately with proper depth of field. The poses and expressions of the characters are also perfectly preserved.
Let's look at another example of Chinese rendering:
Traditional Chinese Couplets
Accurately rendered couplets with calligraphy effects and Yueyang Tower in the center.
The model accurately draws the left and right couplets and the horizontal scroll, applies calligraphy effects, and accurately generates the Yueyang Tower in the middle. The blue and white porcelain on the table also looks very realistic.
So, how does the model perform on English? Let's look at an English rendering example:
Bookstore Window Display
Accurately generated "New Arrivals This Week" and book titles including "The light between worlds", "When stars are scattered", "The silent patient", and "The night circus".
In this example, the model not only accurately outputs "New Arrivals This Week" but also accurately generates the cover text of four books: "The light between worlds", "When stars are scattered", "The silent patient", and "The night circus".
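If you want to check a result like this yourself, one option is to compare the strings you asked for against whatever an OCR tool reads back from the generated image. The snippet below is only a minimal sketch of that idea: `text_match_report` is a hypothetical helper, the `recognized` list stands in for OCR output you would have to obtain separately, and this is not how the benchmarks mentioned above score the model.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so layout differences are ignored."""
    return " ".join(text.lower().split())


def text_match_report(expected: list[str], recognized: list[str]) -> dict[str, float]:
    """Map each expected string to its best similarity (0.0 to 1.0)
    among the recognized strings."""
    candidates = [normalize(r) for r in recognized]
    report = {}
    for target in expected:
        norm = normalize(target)
        # Best match over all recognized strings; 0.0 if nothing was recognized.
        report[target] = max(
            (SequenceMatcher(None, norm, c).ratio() for c in candidates),
            default=0.0,
        )
    return report


if __name__ == "__main__":
    # Strings requested in the bookstore example above.
    expected = [
        "New Arrivals This Week",
        "The light between worlds",
        "When stars are scattered",
        "The silent patient",
        "The night circus",
    ]
    # Placeholder for OCR output (assumption: produced by a tool of your choice).
    recognized = ["NEW ARRIVALS THIS WEEK", "The night circus"]
    for title, score in text_match_report(expected, recognized).items():
        print(f"{score:.2f}  {title}")
```

A score of 1.0 means the string was read back exactly (ignoring case and spacing); lower scores point to missing or misspelled text.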
Let's look at a more complex case of English rendering:
Emotional Wellbeing Infographic
Complex layout with 6 submodules, each with icons, titles, and descriptive text.
In this case, the model needs to generate 6 submodules, each with its own icon, title, and corresponding introductory text. Qwen-Image has completed the layout beautifully.
What about smaller text? Let us test it:
Handwritten Poetry on Paper
Accurate rendering of handwritten poetry despite the text occupying less than one-tenth of the image.
In this case, the paper occupies less than one-tenth of the entire image and the paragraph of text is relatively long, but the model still accurately generates the text on the paper.
What if it's bilingual? For the same scenario, let's try this prompt:
Bilingual Handwritten Content
Seamless switching between English and Chinese in handwritten text rendering.
As you can see, the model can switch freely between the two languages within the same piece of rendered text.
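The captions in this post put every string to be rendered in double quotes. If you write many such prompts, a small helper can keep that convention consistent. The following is a minimal sketch: `build_text_prompt` and its wording are hypothetical, not part of any Qwen-Image API, and whether this exact phrasing helps the model is an assumption you would need to test.

```python
def build_text_prompt(scene: str, texts: list[str]) -> str:
    """Compose a prompt that states each string to render verbatim, in double quotes."""
    scene = scene.strip()
    if not texts:
        # Nothing to render as text; return the scene description unchanged.
        return scene
    quoted = ", ".join(f'"{t}"' for t in texts)
    return (
        f"{scene} The image contains the following text, "
        f"written exactly as quoted: {quoted}."
    )


if __name__ == "__main__":
    # Mixed English and Chinese strings, as in the bilingual example above.
    print(build_text_prompt(
        "A handwritten note on a wooden desk.",
        ["Hello, world", "你好，世界"],
    ))
```

Because the strings are passed through unchanged, the same helper works for English, Chinese, or mixed text.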
Qwen-Image's text capabilities make it easy to create posters, such as:
Movie Poster: "Imagination Unleashed"
Complex poster with multiple text elements including title, subtitle, cast, director, and release information.
Since the model can make posters, it can of course also make presentation slides (PPTs) directly. Let's look at a case of making a PPT in Chinese:
Professional PPT Design
Enterprise-quality PPT with Alibaba branding, technical content, and traditional Chinese cultural elements.
Beyond text rendering, Qwen-Image also excels at general image generation, supporting a wide range of artistic styles. From photorealistic scenes to impressionistic paintings, from anime styles to minimalist designs, the model responds flexibly to creative prompts, making it a versatile tool for artists, designers, and storytellers.
In terms of image editing, Qwen-Image supports a variety of operations. This allows even ordinary users to easily achieve professional-level image editing.
Superior Text Rendering
Advanced Image Editing
Cross-Benchmark Excellence
User-Friendly Interface
To try the latest Qwen-Image model:
Visit Qwen Chat
Experiment with Text Rendering
Explore Image Editing
Qwen-Image represents a significant advancement in AI-powered image generation technology. Its exceptional text rendering capabilities, combined with powerful image editing features and strong cross-benchmark performance, make it a valuable tool for creators, businesses, and developers alike.
The model's ability to handle complex multilingual text rendering while maintaining high-quality image generation opens up new possibilities for content creation across various industries. From marketing materials to educational content, from entertainment to commercial applications, Qwen-Image provides the tools needed to create professional-quality visual content with unprecedented accuracy and flexibility.
We hope that Qwen-Image can further promote the development of image generation, lower the technical barriers to visual content creation, and inspire more innovative applications. At the same time, we also look forward to the active participation and feedback of the community to jointly build an open, transparent, and sustainable generative AI ecosystem.