Get started with Qwen Image Edit, a specialized model engineered for precision image manipulation and creative editing.
Qwen-Image-Edit is a cutting-edge vision-language model designed to modify existing images based on natural-language instructions. Unlike standard generation models that create images from scratch, it understands the content of an input image and applies specific edits, such as adding objects, changing backgrounds, or altering styles, while preserving the original structure and context. The model excels at understanding complex editing requests, maintaining visual consistency, and delivering high-quality results for both realistic and artistic modifications.

Using Qwen Image Edit Inference API

This model is accessible to users on Build Tier 1 or higher. The API accepts an input image file along with text prompts to guide the editing process.
import requests

url = "https://platform.qubrid.com/api/v1/qubridai/image/edit"
headers = {
  "Authorization": "Bearer Qubrid_API_KEY"
}

# Prepare the file and data

files = {
  'image': ('input.png', open('input.png', 'rb'), 'image/png')
}
data = {
  "model": "Qwen/Qwen-Image-Edit",
  "prompt": "Add a rainbow in the sky",
  "negative_prompt": "blur, low quality, distortion",
  "true_cfg_scale": 4,
  "num_inference_steps": 40,
  "seed": 42,
  "use_inpainting": "false"
}

response = requests.post(url, headers=headers, files=files, data=data)

# Inspect the JSON response

result = response.json()
print(result)
This will produce a JSON response containing the URL or data of the edited image:
{
  "created": 1764851500,
  "data": [
    {
      "url": "https://qubrid-storage.s3.amazonaws.com/generated/edited-image-xyz.png"
    }
  ]
}
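Saving the edited image is then a matter of pulling the URL out of the response JSON and downloading it. A minimal sketch, assuming the response shape shown above (the `extract_image_url` and `save_edited_image` helpers are our own, not part of any SDK):

```python
import urllib.request

def extract_image_url(result: dict) -> str:
    """Pull the first edited-image URL out of the response JSON."""
    return result["data"][0]["url"]

def save_edited_image(result: dict, path: str = "edited.png") -> None:
    """Download the edited image and write it to disk (requires network access)."""
    with urllib.request.urlopen(extract_image_url(result), timeout=60) as resp, \
         open(path, "wb") as f:
        f.write(resp.read())

# The response shape shown above:
sample = {
    "created": 1764851500,
    "data": [{"url": "https://qubrid-storage.s3.amazonaws.com/generated/edited-image-xyz.png"}],
}
print(extract_image_url(sample))
```

Note that the returned URL points to object storage, so download the result promptly if you need a local copy.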

Available Models

The Qwen Image Edit model is optimized for instruction-following image manipulation.

Qwen-Image-Edit
  • Model String: Qwen/Qwen-Image-Edit
  • Input Type: Image (PNG/JPG) + Text Prompt
  • Capabilities: Object addition/removal, background replacement, style transfer, color correction
  • Best for: Creative design, photo retouching, e-commerce asset generation

Qwen Image Edit Best Practices

To achieve the best results with Qwen Image Edit, consider these parameters and prompting strategies.

Recommended Parameters
  • Prompt: Be descriptive about the change you want. Instead of just “rainbow”, use “Add a bright rainbow arching over the mountains in the background”.
  • True CFG Scale: Controls how strictly the model follows the text prompt.
    • Low (1-3): More creative freedom, less adherence to prompt.
    • Medium (4-7): Balanced (Recommended).
    • High (8+): Strict adherence, potentially less natural blending.
  • Num Inference Steps: Higher steps (e.g., 40-50) generally yield higher quality details but take longer to process.
  • Negative Prompt: Specify what you want to avoid (e.g., “blur, distortion, low resolution, extra fingers”).
  • Use Inpainting: Set to true if you are providing a mask or want the model to infer a mask for specific area editing.
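One way to apply these recommendations is to bundle them into request presets. The helper below is a hypothetical convenience of our own; the preset names and step counts are illustrative choices based on the ranges above, not API-defined defaults:

```python
# Hypothetical presets based on the recommended ranges above.
PRESETS = {
    "draft": {"true_cfg_scale": 4, "num_inference_steps": 20},  # fast iteration
    "final": {"true_cfg_scale": 5, "num_inference_steps": 50},  # full quality
}

def build_edit_payload(prompt, preset="final",
                       negative_prompt="blur, low quality, distortion",
                       seed=None):
    """Build the form-data dict for the /image/edit request."""
    data = {
        "model": "Qwen/Qwen-Image-Edit",
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "use_inpainting": "false",
        **PRESETS[preset],
    }
    if seed is not None:
        data["seed"] = seed  # fix the seed for reproducible comparisons
    return data

payload = build_edit_payload(
    "Add a bright rainbow arching over the mountains in the background",
    preset="draft", seed=42)
print(payload["num_inference_steps"])  # → 20
```

Passing the same seed across runs makes it easier to compare the effect of changing a single parameter.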
Prompting Best Practices
  • Focus on the Edit: The prompt should describe the result you want to see or the action to perform.
  • Preserve Context: If you want to keep the rest of the image unchanged, ensure your prompt doesn’t contradict the existing scene unless intended.
  • Iterative Editing: For complex changes, it is often better to perform one edit at a time (e.g., change background first, then add an object).
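The iterative-editing advice can be sketched as a simple chain in which each edit's output becomes the next edit's input. Here `apply_edit` is a hypothetical callable standing in for the `requests.post` call shown earlier:

```python
def chain_edits(initial_image_path, prompts, apply_edit):
    """Apply one edit at a time; each result feeds the next request.
    `apply_edit(image_path, prompt)` is a hypothetical wrapper that submits
    a single /image/edit request and returns the path of the saved result."""
    current = initial_image_path
    for prompt in prompts:
        current = apply_edit(current, prompt)
    return current

# Dry run with a stub in place of the real API call:
def fake_edit(path, prompt):
    return path + "+edit"

print(chain_edits("photo.png",
                  ["Replace the background with a beach at sunset",
                   "Add a striped umbrella in the sand"],
                  fake_edit))
# → photo.png+edit+edit
```

Chaining single-purpose edits this way keeps each prompt focused, which tends to produce more predictable results than one long multi-part instruction.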

Qwen Image Edit Use Cases

  • E-commerce: Change product backgrounds or add lifestyle elements to product shots.
  • Real Estate: Virtual staging, changing sky conditions (day to dusk), or removing clutter.
  • Creative Design: Rapidly prototyping variations of a design concept.
  • Photo Retouching: Removing unwanted objects or people from photographs.
  • Marketing: Adapting a single visual asset for different campaigns or seasonal themes.

Managing Context and Costs

Image Optimization

  • Input Resolution: Ensure input images are of reasonable resolution. Extremely high-resolution images may be resized or incur higher latency.
  • File Size: Compress images (e.g., standard JPEG/PNG) before uploading to reduce network transfer time.
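As a pre-flight check before uploading, you can cap the longer side of the image while preserving aspect ratio. The 2048-pixel limit below is an illustrative choice of ours, not a documented API constraint; apply the computed dimensions with your image library of choice (e.g. Pillow's `Image.resize`):

```python
def downscale_dims(width, height, max_side=2048):
    """Return target dimensions with the longer side capped at max_side,
    preserving aspect ratio (no-op if the image is already small enough)."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return round(width * scale), round(height * scale)

print(downscale_dims(4032, 3024))  # typical phone photo → (2048, 1536)
print(downscale_dims(1280, 720))   # already small → (1280, 720)
```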

Cost Optimization

  • Step Count: Lower num_inference_steps for draft iterations to save on compute time, then increase for the final render.
  • Batching: The API processes one image per request, so batch editing is a client-side loop; when applying similar prompts to many images, submit the requests sequentially (or with bounded concurrency) and reuse your HTTP session.
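Since each request carries exactly one image, a "batch" reduces to a client-side loop. In this sketch, `submit` is a hypothetical wrapper around the `requests.post` call from the example above that returns the result URL:

```python
def edit_batch(image_paths, prompt, submit):
    """Send one request per image and collect result URLs keyed by input path.
    `submit(path, prompt)` is a hypothetical wrapper around the POST request
    shown earlier; it returns the edited image's URL."""
    results = {}
    for path in image_paths:
        results[path] = submit(path, prompt)
    return results

# Dry run with a stub:
urls = edit_batch(["a.png", "b.png"], "Add a rainbow in the sky",
                  lambda path, prompt: f"https://example.com/{path}")
print(urls)
```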

Technical Architecture

Model Architecture

  • Foundation: Built on advanced diffusion transformer architectures fine-tuned for instruction-based image editing.
  • Vision-Language Alignment: Uses a powerful vision encoder to understand the input image semantics and aligns them with the text prompt to guide the diffusion process.
  • Precision: Designed to minimize artifacts and “hallucinations” in the unedited parts of the image, ensuring high fidelity to the original source where changes are not requested.