
Fine-Tuning Stable Diffusion XL on Image-Text Pair Dataset

Qubrid AI
GPU SPEED | AI ACCELERATION

How to perform custom training of Stable Diffusion XL with image-text datasets.

Stable Diffusion XL, or SDXL (stable-diffusion-xl-base-1.0), is an image generation model developed by Stability AI. Compared to previous SD models, including SD 2.1, it is tailored towards more photorealistic outputs with more detailed imagery and composition. The Qubrid AI Platform is a tool designed to streamline deploying, managing, and fine-tuning AI models like Stable Diffusion XL. It provides an intuitive interface and robust infrastructure that simplify the otherwise complex process of model training and deployment.

Let’s get started by signing in to the Qubrid AI Platform and trying out the SDXL model, which generates stunning images from just a text input, as shown in Figure 1.

Qubrid AI - Model Inferencing

While the original SDXL model is powerful, it may not always produce perfect results for specific styles or themes, such as images of "The Simpsons." As you can see in the image below, the model's initial output isn’t quite what we'd expect.

 Qubrid AI - Model Inferencing

To improve this, we'll fine-tune the model on a dedicated Simpsons dataset, allowing it to generate much more accurate, on-point images.

Beyond Inference: Fine-Tuning the Model

Qubrid AI offers much more than model inferencing, including the ability to fine-tune a model on a specific dataset. In the steps below, we explore how to fine-tune Stable Diffusion XL on an open-source image-text pair dataset of The Simpsons.

Our platform is designed to be user-friendly, so getting started with fine-tuning is a breeze. Simply go to the Model Studio, select Image Generation, choose your desired model (in this case, Stable Diffusion XL), and hit Fine-Tune. Isn’t that easy?

After hitting the fine-tuning button on the Qubrid AI Platform, you will be presented with a JupyterLab environment equipped with several essential tools and resources to facilitate the fine-tuning process.

Here's what you will find inside JupyterLab:

  • GPU for Training

  • Pre-Trained Model

  • Fine-Tuning Sample Notebook

  • Pre-requisite Packages

By providing these tools and resources, the Qubrid AI Platform simplifies the process of fine-tuning the Stable Diffusion XL model, allowing you to focus on optimizing the model's performance. Figure 4 shows what the fine-tuning notebook looks like when launched on the Qubrid AI Platform.

 Qubrid AI - Fine-Tuning Notebook Stable Diffusion-XL

1. Verify that you have a GPU for training the SDXL model.

Ensure that your instance includes a GPU, which is crucial for efficiently training the Stable Diffusion-XL model and significantly speeding up the fine-tuning process.

On the Qubrid AI Platform, you can access various types of GPUs such as T4, A10G, L4, and more, making it suitable for training different AI models. Additionally, L40S and H100 GPUs will soon be available on the Qubrid AI Platform.

 Qubrid AI - GPU for training Stable Diffusion-XL
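The GPU check in step 1 can be sketched as follows. This is a minimal example, not the platform notebook's own cell: it assumes an NVIDIA driver is installed so that the `nvidia-smi` command is on the path, and the helper name `list_gpus` is made up for illustration.

```python
import shutil
import subprocess


def list_gpus():
    """Return one 'name, total memory' string per visible NVIDIA GPU."""
    # nvidia-smi ships with the NVIDIA driver; without it there is nothing to query.
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    # A non-zero exit code usually means the driver could not reach a GPU.
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = list_gpus()
if gpus:
    for index, description in enumerate(gpus):
        print(f"GPU {index}: {description}")
else:
    print("No NVIDIA GPU detected; fine-tuning SDXL on CPU is not practical.")
```

Inside the notebook, `torch.cuda.is_available()` answers the same question from PyTorch's side once the prerequisite packages are installed.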

2. Install the pre-requisite packages saved in the requirements.txt file.

The platform provides these pre-requisite packages directly, which keeps the setup process smooth (as shown in Figure 2) and leaves the environment ready for fine-tuning without additional configuration.
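As a rough sketch of this step, the packages can be installed from Python with the same interpreter that runs the notebook kernel. The helper names below are illustrative, and the example assumes `requirements.txt` sits in the current working directory; the notebook itself may simply use a `pip install -r requirements.txt` cell.

```python
import subprocess
import sys
from pathlib import Path


def pip_install_command(requirements_file="requirements.txt"):
    """Build the pip command for the interpreter that runs this code."""
    return [sys.executable, "-m", "pip", "install", "-r", str(requirements_file)]


def install_requirements(requirements_file="requirements.txt"):
    """Install the listed packages, failing early if the file is missing."""
    path = Path(requirements_file)
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found in the working directory")
    subprocess.check_call(pip_install_command(path))


# Show the command that would run; call install_requirements() to execute it.
print(" ".join(pip_install_command()))
```

Using `sys.executable -m pip` rather than a bare `pip` makes sure the packages land in the environment the kernel actually imports from.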

3. Download the dataset.

In this step, we use a dataset from Hugging Face. In this case, we download the db-simpsons-dataset from the JerryMo/db-simpsons-dataset repository, which contains image-text pairs related to the American animated series.
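One way to fetch the repository is with `snapshot_download` from the `huggingface_hub` package, sketched below. The target folder `data/simpsons` and the wrapper name `download_dataset` are assumptions for illustration, and the layout of the files inside the repository is not described in this post, so inspect the downloaded folder before wiring it into training.

```python
DATASET_REPO = "JerryMo/db-simpsons-dataset"  # repository named in this step
LOCAL_DIR = "data/simpsons"  # hypothetical target folder


def download_dataset(repo_id=DATASET_REPO, local_dir=LOCAL_DIR):
    """Download every file of a Hugging Face dataset repository to local_dir."""
    # Imported here so the sketch loads without the package installed;
    # huggingface_hub must be available before the function is called.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        repo_type="dataset",  # the default is "model", so this must be explicit
        local_dir=local_dir,
    )
```

Calling `download_dataset()` needs network access and returns the path of the local folder that now holds the dataset files.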

4. Configure model training.

This training run uses an L4 GPU, as shown in the figure below.

 Qubrid AI - Model Training configuration for SDXL
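The notebook's actual settings are only visible in the figure, so the following is purely an illustrative sketch of what a training configuration might look like. It assumes a LoRA run driven by the `train_text_to_image_lora_sdxl.py` example script from Hugging Face diffusers, which this post does not confirm; every hyperparameter value and the output folder name are placeholders, chosen conservatively for a single L4.

```python
from dataclasses import dataclass


@dataclass
class TrainConfig:
    # Model and dataset identifiers mentioned in this post.
    pretrained_model: str = "stabilityai/stable-diffusion-xl-base-1.0"
    dataset_name: str = "JerryMo/db-simpsons-dataset"
    # Everything below is an illustrative assumption, not the notebook's values.
    output_dir: str = "sdxl-simpsons-lora"
    resolution: int = 1024
    train_batch_size: int = 1
    gradient_accumulation_steps: int = 4
    learning_rate: float = 1e-4
    max_train_steps: int = 1000
    mixed_precision: str = "fp16"
    seed: int = 42

    @property
    def effective_batch_size(self):
        """Samples contributing to each optimizer step on one GPU."""
        return self.train_batch_size * self.gradient_accumulation_steps

    def to_command(self, script="train_text_to_image_lora_sdxl.py"):
        """Build an `accelerate launch` command line for the assumed script."""
        return [
            "accelerate", "launch", script,
            f"--pretrained_model_name_or_path={self.pretrained_model}",
            f"--dataset_name={self.dataset_name}",
            f"--output_dir={self.output_dir}",
            f"--resolution={self.resolution}",
            f"--train_batch_size={self.train_batch_size}",
            f"--gradient_accumulation_steps={self.gradient_accumulation_steps}",
            f"--learning_rate={self.learning_rate}",
            f"--max_train_steps={self.max_train_steps}",
            f"--mixed_precision={self.mixed_precision}",
            f"--seed={self.seed}",
            "--gradient_checkpointing",  # trades compute for lower GPU memory use
        ]


config = TrainConfig()
print(" ".join(config.to_command()))
```

A small per-step batch with gradient accumulation and gradient checkpointing is a common way to fit SDXL training into limited GPU memory; larger GPUs can raise `train_batch_size` instead.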

5. Load and test the fine-tuned model.

Now that training is complete, it’s time to put our fine-tuned model to the test. We’ll use the same prompt as before, but this time the results are impressive.

 Qubrid AI - Model Training configuration for SDXL
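Loading and testing can be sketched with the diffusers library as shown below. This assumes the fine-tuning produced LoRA weights saved in a local folder, which this post does not state; a fully fine-tuned checkpoint would instead be passed straight to `from_pretrained`. The function names, folder name, and prompt in the usage comment are hypothetical.

```python
def load_finetuned_pipeline(lora_dir, base_model="stabilityai/stable-diffusion-xl-base-1.0"):
    """Load the SDXL base model and apply LoRA weights from lora_dir (assumed format)."""
    # Imported lazily so the sketch loads without a GPU stack installed.
    import torch
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(base_model, torch_dtype=torch.float16)
    pipe.load_lora_weights(lora_dir)
    return pipe.to("cuda")


def generate(pipe, prompt, seed=0, steps=30):
    """Generate one image; a fixed seed keeps before/after comparisons repeatable."""
    import torch

    generator = torch.Generator(device="cuda").manual_seed(seed)
    return pipe(prompt=prompt, num_inference_steps=steps, generator=generator).images[0]


# Example usage (requires a GPU plus the torch and diffusers packages):
# pipe = load_finetuned_pipeline("sdxl-simpsons-lora")   # hypothetical folder
# image = generate(pipe, "a family eating dinner, Simpsons style")  # hypothetical prompt
# image.save("simpsons.png")
```

Reusing the same prompt and seed for the base and fine-tuned models makes the comparison about the weights rather than about sampling noise.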

The fine-tuned model not only captures the style of "The Simpsons" but also understands the nuances and context of the characters and scenes. The difference is striking: what was once a generic output is now a vivid, on-point image that feels true to the original show. This is the magic of fine-tuning: transforming a general model into a specialized one that excels at generating images tailored to a specific dataset, like our Simpsons dataset here.

Closing Remarks

In this blog, we walked step by step through fine-tuning Stable Diffusion XL on an image-text pair dataset using the Qubrid AI Platform. If you're interested in fine-tuning other AI models, such as LLMs, text-to-image, or speech recognition models, be sure to check out AI Hub.