Alibaba Wan AI | Unified Image Generation & Editing

Wan 2.7 Image

One model. Two superpowers. Generate stunning images from text or edit existing ones — with built-in Thinking Mode, 9-reference inputs, and 4K Pro output.

Create with Wan 2.7 Image

Two Core Workflows, One Unified Model

No app switching. Generate or edit — the same model handles both with equal precision.

Text-to-Image

Describe what you want and get a high-quality image. Thinking Mode reasons through complex prompts for better composition, accurate spatial relationships, and fewer artifacts. Supports up to 5,000 characters of prompt detail.

Up to 4K resolution (Pro)
Thinking Mode for complex prompts
Image set mode (up to 12 images)
Seed-based reproducibility
Flexible aspect ratios

Image Editing

Upload up to 9 reference images and describe your changes in plain words. The model edits, restyles, or fuses images together — style transfer, element swapping, background replacement, or blending multiple references into one output.

Up to 9 reference images
Style transfer & element swapping
Plain language instructions
Multi-reference fusion
Subject & structure preservation

Standard vs Pro

Choose the tier that fits your quality and output requirements

Feature            | Wan 2.7 Image                     | Wan 2.7 Image Pro
Max Resolution     | 2K (2048×2048)                    | 4K (4096×4096)
Thinking Mode      | Yes                               | Yes
4K Text-to-Image   | No                                | Yes
Image Editing      | Up to 9 images                    | Up to 9 images
Image Set Mode     | Up to 12 images                   | Up to 12 images
Best For           | Everyday creation, fast iteration | Print, large-format, commercial

Key Features

Thinking Mode

Built-in chain-of-thought reasoning that runs before image generation. When enabled, the model reasons through spatial relationships, composition balance, and multi-element prompts before committing to output — delivering better prompt adherence and fewer artifacts on complex requests. Enabled by default for text-to-image.

9-Reference Image Editing

Upload up to 9 reference images in a single editing call. The model uses them for style guidance, subject reference, background replacement, or multi-image fusion. Describe the change in plain language and the model applies it while preserving structure and subject consistency across all inputs.

4K Resolution (Pro)

Wan 2.7 Image Pro outputs up to 4096×4096 pixels — print-ready resolution for large-format assets, commercial campaigns, and high-DPI display. The standard model reaches 2K (2048×2048), which covers most digital and social media use cases. 4K is available for text-to-image generation.

Image Set Generation

Generate up to 12 coherent, related images from a single prompt in one request. Ideal for creating a character across different scenes, product shots from multiple angles, or storyboard sequences. All images in a set share consistent style, lighting, and subject identity — no manual consistency work needed.

12-Language Text Rendering

Text inside images renders crisply and print-ready across 12 languages, with support for up to 3,000 tokens of text content. Precise HEX color matching lets you specify exact brand colors, and palette extraction from reference photos ensures color accuracy across design systems.

Refined Persona Sculpting

Fine-grained facial control lets you dial in every detail — bone structure, eye shape, skin contours, and texture. Wan 2.7 Image excels at generating realistic faces with instruction-following precision, making it particularly strong for portrait work, character design, and brand persona creation.

Seed-Based Reproducibility

Set a seed value to reproduce any generation exactly. Combined with flexible sizing options (1K, 2K, 4K, or custom dimensions), this makes Wan 2.7 Image ideal for iterative workflows — test variations, lock in the best result, and reproduce it at any scale without losing the composition.
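The reproducibility contract above can be sketched in a few lines. Since the actual Wan API schema isn't shown on this page, the snippet below uses a deterministic local stub in place of a real generation call — the function name `render` and its parameters are illustrative only. The point it demonstrates is the contract itself: identical prompt, seed, and settings always yield the identical result, while changing any one of them yields a new variation.

```python
import hashlib
import json

def render(prompt: str, seed: int, size: str = "2K") -> str:
    """Stand-in for a generation call (hypothetical signature).

    Deterministic given (prompt, seed, size) — mirroring the
    reproducibility guarantee described above. A real call would
    return an image; this stub returns a stable digest instead.
    """
    payload = json.dumps(
        {"prompt": prompt, "seed": seed, "size": size}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

a = render("red bicycle on an empty beach at dawn", seed=42)
b = render("red bicycle on an empty beach at dawn", seed=42)
c = render("red bicycle on an empty beach at dawn", seed=43)

assert a == b  # same prompt + same seed + same settings -> same result
assert a != c  # new seed -> new variation of the same prompt
```

In an iterative workflow you would vary the seed while exploring, then record the winning seed alongside the prompt so the exact composition can be regenerated later at a different size.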

Precise Color Control

Specify exact HEX color codes in your prompt or extract palettes directly from reference photos. The model applies these colors with precision across the generated image — critical for brand consistency in commercial work, product photography, and design systems where color accuracy is non-negotiable.
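For pipelines that feed brand colors into prompts, it can help to validate the HEX codes before sending the request. This small sketch (not part of the Wan API — just standard Python) pulls the 6-digit HEX codes out of a prompt string so a pipeline can check them against an approved brand palette:

```python
import re

# Matches 6-digit HEX color codes such as #FF5733.
HEX_RE = re.compile(r"#(?:[0-9a-fA-F]{6})\b")

def extract_hex(prompt: str) -> list[str]:
    """Return the HEX color codes a prompt asks the model to honor."""
    return HEX_RE.findall(prompt)

prompt = "Poster with headline in #FF5733 on a #1A1A2E background"
print(extract_hex(prompt))  # → ['#FF5733', '#1A1A2E']
```

Comparing the extracted list against a fixed brand palette catches typos in color codes before they reach the model, where a wrong code would silently produce off-brand output.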

Style Transfer & Fusion

Pass multiple reference images and describe how to blend them. The model can apply the style of one image to the subject of another, swap elements between references, or fuse multiple inputs into a single coherent output. Particularly effective for adapting existing assets to new visual contexts.

Technical Specifications

Model Architecture

Type: Unified Image Generation & Editing

Reasoning: Built-in Thinking Mode (CoT)

Developer: Alibaba (Wan AI)

Release: April 2026

License: Commercial cloud API

Resolution & Output

Standard: Up to 2K (2048×2048)

Pro: Up to 4K (4096×4096)

Size Options: 1K, 2K, 4K, custom

Batch Output: 1–4 (standard), 1–12 (set mode)

Formats: JPG, PNG, WebP

Input Capabilities

Prompt Length: Up to 5,000 characters

Reference Images: Up to 9 per request

Image Formats: JPG, PNG, BMP, WebP

Text in Image: Up to 3,000 tokens

Languages: 12 languages supported

Thinking Mode

Type: Chain-of-Thought reasoning

Default: Enabled for text-to-image

Applies to: Text-to-image only

Trade-off: Higher quality, longer generation

Best for: Complex, multi-element prompts

Image Set Mode

Max Images: Up to 12 per request

Consistency: Shared style, lighting, subject

Use Cases: Character sheets, product angles

Storyboards: Sequential scene generation

Prompt: Structured multi-image descriptions

Control & Precision

Seed Range: 0–2,147,483,647

Color Control: HEX codes, palette extraction

Facial Detail: Bone structure, skin, eye shape

Editing: Plain language instructions

Visual Pointing: Point-and-describe editing

Use Cases

Commercial & Brand Assets

Generate brand-consistent imagery with exact HEX color matching and palette extraction from reference photos. 4K Pro output ensures assets stay sharp across all display formats — from social media to billboard-scale print.

Product & E-commerce

Use image set mode to generate product shots from multiple angles in a single request. Style transfer lets you adapt existing product photos to new backgrounds or contexts without reshooting. Consistent lighting and color across all outputs.

Portrait & Character Design

Fine-grained facial control makes Wan 2.7 Image exceptional for portrait work. Dial in bone structure, eye shape, and skin contours with instruction precision. Create consistent character sheets across multiple poses and expressions using image set mode.

Presentations & Infographics

Generate slides, infographics, and data visualizations with crisp multi-language text rendering. Thinking Mode handles complex layout prompts — describe a slide structure and the model reasons through composition before generating. Up to 3,000 tokens of text content per image.

Storyboarding & Concept Art

Generate up to 12 sequential scene images in one request with image set mode. Consistent characters and environments across all frames make Wan 2.7 Image ideal for pre-production storyboards, concept art series, and narrative illustration projects.

Developer & API Workflows

Seed-based reproducibility and flexible sizing make Wan 2.7 Image reliable for automated pipelines. Generate, test variations, lock in the best seed, and reproduce at any resolution. Batch generation (1–12 images per request) reduces API calls for high-volume workflows.

Current Limitations

Thinking Mode Speed

Thinking Mode significantly increases generation time — around 51 seconds for a single image in Pro. For high-volume, time-sensitive workflows, disabling Thinking Mode trades some quality for speed.

Multi-Reference Mixing

When using more than 4 reference images, characters from all inputs may not blend smoothly. The model works best with 1–4 references for clean fusion; 5–9 references can produce inconsistent mixing between subjects.

Cloud-Only Access

Unlike earlier Wan video models that released open weights on GitHub, Wan 2.7 Image is currently cloud-only. Self-hosted deployment is not yet available, which limits use cases requiring on-premise or air-gapped environments.

4K Editing Restriction

4K resolution is only available for text-to-image generation — not for image editing mode. When editing with reference images, the maximum output is 2K even on the Pro tier. Plan your workflow accordingly for large-format editing tasks.

Content Restrictions

Wan 2.7 Image applies content moderation filters. The model excels at realistic photo generation but enforces content policies that restrict certain categories of imagery. This is standard for cloud-hosted commercial models.

Large File Sizes at 4K

4K output images can reach 25MB per file. For workflows generating large batches at maximum resolution, storage and transfer costs add up quickly. Consider using 2K for digital-only outputs and reserving 4K for final print-ready assets.

Frequently Asked Questions

What is Wan 2.7 Image?

Wan 2.7 Image is Alibaba's latest unified AI model for image generation and editing, released in April 2026. It handles both creating new images from text prompts and editing existing images — in a single model, without switching tools. Key capabilities include built-in Thinking Mode for enhanced reasoning, up to 9 reference images for editing, 12-language text rendering, and 4K output in the Pro tier.

What is the difference between Wan 2.7 Image and Wan 2.7 Image Pro?

Both versions share the same core capabilities — text-to-image, image editing, Thinking Mode, 9-reference inputs, and image set generation. The key difference is resolution: the standard model outputs up to 2K (2048×2048), while Pro outputs up to 4K (4096×4096) for text-to-image generation. Pro is the choice for print-ready assets, large-format display, and commercial productions where maximum resolution matters.

What is Thinking Mode and when should I use it?

Thinking Mode is a built-in chain-of-thought reasoning layer that activates before image generation. It reasons through spatial relationships, composition balance, and multi-element prompts before committing to output — resulting in better prompt adherence and fewer artifacts. It's enabled by default for text-to-image and works best for complex prompts with multiple elements, specific spatial requirements, or detailed compositional instructions. Disable it for faster generation on simpler prompts.

How does image editing work in Wan 2.7?

Upload up to 9 reference images along with a text prompt describing what you want changed. The model operates in editing mode when images are provided — it can apply style transfer, swap elements between images, replace backgrounds, or fuse multiple references into a single output. You can also point to specific areas visually and describe the change in plain language. No masking or layer isolation required.

What is image set mode?

Image set mode generates up to 12 coherent, related images from a single prompt in one request. All images in the set share consistent style, lighting, and subject identity. It's ideal for character sheets across different poses, product shots from multiple angles, or sequential storyboard frames. Use structured prompts that describe each image in the set for best results.

Is 4K resolution available for image editing?

No — 4K resolution is only available for text-to-image generation in the Pro tier. When editing with reference images, the maximum output resolution is 2K (2048×2048), even on Pro. If you need large-format edited output, generate at 4K first and then use the result as a reference for further editing at 2K.

How does text rendering work in Wan 2.7 Image?

Wan 2.7 Image renders text inside images crisply and print-ready across 12 languages, supporting up to 3,000 tokens of text content per image. You can specify exact HEX color codes for text or extract color palettes from reference photos. This makes it particularly effective for generating slides, infographics, posters, and marketing materials where text legibility and brand color accuracy are critical.

How does seed-based reproducibility work?

Set a seed value (0–2,147,483,647) to reproduce any generation exactly. The same prompt, same seed, and same settings will always produce the same image. This is essential for iterative workflows — test variations by changing only one variable at a time, lock in the best result with its seed, and reproduce it at any resolution or with minor prompt adjustments.

Is Wan 2.7 Image open-source?

Wan 2.7 Image is currently cloud-only and not open-source, unlike earlier Wan video models which released open weights on GitHub. Alibaba has confirmed commitment to open-sourcing Wan and Qwen models, so open weights may follow. For now, access is via cloud API only.

How can I access Wan 2.7 Image?

You can access both Wan 2.7 Image and Wan 2.7 Image Pro directly through SharkFoto. Simply visit SharkFoto.com, select your preferred version from the available AI image models, and start creating. SharkFoto provides seamless access to all Wan 2.7 Image features — text-to-image generation, image editing, Thinking Mode, image set generation, and 4K Pro output — without any complex API setup.

Generate or Edit — One Model Does Both

Experience Alibaba's unified AI image model. Thinking Mode reasoning, 9-reference editing, 12-language text, and 4K Pro output — all in one place on SharkFoto.