In this tutorial I'll teach you how to set up Stable Diffusion locally on your machine.
Please note this is for Windows.
You'll need a beefy machine with an NVIDIA GPU to run it.
For lower-end machines, you can install https://github.com/cmdr2/stable-diffusion-ui.
I'll be running through AUTOMATIC1111's web UI install, which I've found to be very easy to use.
1. Installing the Packages
Start off by visiting https://github.com/AUTOMATIC1111/stable-diffusion-webui.
Download the repository as a zip and extract it.
Note: If you intend to keep your web UI updated rather than redownloading the latest build, I'd recommend cloning the repo with Git or GitHub Desktop instead.
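If you go the clone route, the command is simply (run it from wherever you want the web UI to live):
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git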
Download Python (latest version: 3.10).
Important: When installing Python, make sure you tick the "Add Python to PATH" checkbox.
If you already have a version of Python installed, there might be version conflicts. I'd recommend just having one version installed (uninstall the other).
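To double-check that the right Python is being picked up, open a new Command Prompt and run:
python --version
It should report 3.10.x.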
Install Git
Not entirely necessary, but useful nonetheless.
On the AUTOMATIC1111 GitHub page you were previously on, scroll down and open the dependencies link.
We will be downloading the trained network for Stable Diffusion v1.4.
Scroll down on that page and you'll see a File Storage link.
Download that big 4 GB trained network model.
While that's downloading, let's grab an upscaler. These are useful for increasing the resolution of the image.
Back on the dependencies page there will be a link to the Model Database.
I've chosen UniScale-Balanced/Strong as it is a universal ESRGAN model (we don't want Real-ESRGAN due to compatibility issues) and it has "a nice balance between sharpness and realism".
You can find this by doing a Ctrl+F find on that Model Database page.
Download the 4x-UniScale-Balanced [72000g].pth file.
2. Setting It Up
Let's put this all together.
Place the upscaler model in the ESRGAN folder.
Place the Stable Diffusion model in the models folder.
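Assuming you extracted the zip, the relevant bits of your folder should look roughly like this (the exact checkpoint filename depends on what you downloaded; sd-v1-4.ckpt is typical for the v1.4 model):
stable-diffusion-webui-master/
    ESRGAN/
        4x-UniScale-Balanced [72000g].pth
    models/
        sd-v1-4.ckpt
    webui.bat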
With the web UI installed and the model / upscaler in the correct directories, let's fire up the webui.bat file.
It should do its thing for a while.
It will eventually install various dependencies.
Once complete, it will show a local URL.
Head over to that URL and you shall see the Stable Diffusion UI!
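On a default install the address is usually something like http://127.0.0.1:7860.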
Be careful not to close the command prompt, or the web UI will stop working. If you do close it, you can just rerun the webui.bat file (it's always faster the second time).
3. Using the UI
Quick run-through:
Tabs: This is where you navigate between txt2img, img2img, and Extras (for upscaling).
Prompt: This is where you write the prompt. Negative prompts are things you don't want to see in the output.
Sampling Steps: Change this to modify how detailed the output is. I tend to put it around 50-80.
Sampling method: Don't worry about this too much; the different samplers tend to produce similar results, just by different routes of getting there.
Batch settings: I tend to just modify the Batch count. It's a great way to explore a seed. The cool thing is that it saves a grid of all the images in the batch, as well as each image individually.
CFG Scale: This affects the strength of your prompt. A low CFG scale will use the prompt less, while a high scale will try to stick more closely to the prompt. Values too high will end up with oversaturated colours/high contrast. I keep it around 7-8.
Width/Height: This will affect the composition quite drastically. For good landscape shots, set the width larger than the height. For portraits, do the opposite. Or just keep it square.
Seed: The biggest selling point of Stable Diffusion is its stability: if you keep the seed the same and change the prompt slightly, the output will still retain some common elements/composition. Set the seed to -1 to generate something random each time.
Script: One of the most useful parts of this web UI. It lets you generate X/Y plots or feed in lists of prompts from a file.
Generate: Once you've typed a prompt in, press that Generate button. Go on. You deserve it.
Output: Once you've generated the image, it will appear in the output. Click on the image(s) to view them in high res. You can send them to img2img/inpainting for additional processing, or send them to Extras for upscaling.
All your images are saved in the stable-diffusion-webui-master/outputs folder.
4. Img2Img
Head over to the img2img tab.
Drop a 512x512 image into the empty space.
Give it a prompt and hit Generate.
For Starry Night I gave it a prompt of "scifi technology landscape".
As you can see, it kept the composition of Starry Night similar but completely changed the objects in the scene.
Adjusting the Denoising strength changes how strongly img2img alters the source image. Running the same seed at different Denoising strengths lets you explore this morph.
You can create a psychedelic gif using this technique, especially if you run the last frame back as the img2img source image and choose a different prompt.
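As a rough sketch of that workflow: if you save each frame into a folder with sequential names (frame_001.png, frame_002.png, and so on - just an example naming scheme) and have ffmpeg installed, you can stitch them together with:
ffmpeg -framerate 8 -i frame_%03d.png morph.gif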
5. Inpainting
Inpainting lets you replace parts of the image with something new.
It works best when you are inpainting on images that don't have too much detail.
In this example I'll start off in txt2img with "photo of a beautiful empty minimalist room with a pool". I'll then press [Send to inpaint].
Now in the img2img tab I'll use the drawing tool to erase a part of the image I want to put something new in. I'll give it the prompt "photo of a pool with an inflatable beachball".
I'll set the Batch count to 6 and hit Generate to make a bunch of these images.
The cool thing about inpainting is that it even generated reflections in the water!
You can also try different prompts.
"photo of a room with a bedroom"
"photo of a room with a scifi cosmic galaxy orb"
You can also apply this on uploaded images, or upload a .png file with a transparent border to outpaint areas (outpaint = reveal more content around the sides to extend the original image).
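If you have ImageMagick installed, one quick way to add a transparent border to a .png for outpainting (the filenames here are just examples) is:
magick input.png -alpha set -bordercolor none -border 128 input_outpaint.png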
6. Exploring the Prompt-Space
To discover great prompts, I would recommend checking out Lexica for inspiration:
Select an image you like and copy the prompt...
...and paste it into Stable Diffusion and press Generate.
There are some techniques I can recommend when exploring the capabilities of your prompt.
The first is to increase the Batch count to 16 and view the results. A wider pool of outputs will give you a better idea of what the prompt can generate. Generating only a few images means you are looking through a small pinhole in terms of creative diversity.
Automatic1111's web ui formats this into a nice grid for your viewing convenience.
This prompt I took from Lexica is: "iridescent opalescent landscape, warm tones, bioluminescent : by michal karcz, daniel merriam, victo ngai and guillermo del toro : ornate, dynamic, particulate, intricate, elegant, highly detailed, centered, artstation, smooth, sharp focus, octane render"
The next technique is to use the X/Y plot.
Set Batch count back to 1 and set Script to X/Y plot.
The X/Y plot will let you view how two variables affect your output image.
You provide comma-separated values for each variable you want to test.
For this example I'm testing CFG Scale values of 4, 7, 9, and 12 against Seed values of 123, 5959, and 4001.
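In the script's fields that looks something like:
X type: CFG Scale, X values: 4, 7, 9, 12
Y type: Seed, Y values: 123, 5959, 4001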
I'll be using another prompt I took off Lexica: "manhattan on a floating island in the sky, waterfalls falling down, low poly art, isometric art, 3d render, ray tracing, high detail, artstation, concept art, behance, smooth, sharp focus, ethereal lighting"
This helps me understand how changing the CFG Scale affects the outcome of the image. As I increase it, the output sticks closer to the "low poly art" / "isometric art" parts of the prompt.
By only testing one CFG Scale or Seed, I end up limiting myself.
7. Upscaling
Stable Diffusion can generate images at larger resolutions, but increasing the width and height dramatically increases processing time and VRAM usage. Lucky for us, we have upscaling models that focus solely on increasing the resolution.
It is far more efficient to render an image at 512x512 and upscale it to 2048x2048 than to have Stable Diffusion generate a 2048x2048 image directly, especially when you are generating tons of images.
Head to the Extras tab and drag your lower res image in.
Let's set Resize to 4 (so it upscales by 400%), and set Upscaler 1 to our 4x-UniScale-Balanced or to Real-ESRGAN 4x plus (which often works even better).
Then press Generate.
The output image will be 4 times larger.
GFPGAN visibility is a useful slider if you want to ensure faces are restored in the process.
Just a heads up, it does a fair bit of wrinkle smoothing, so you might want to lower the percentage (put it around 50%) if you want to retain some of those details.
Left = original, Right = GFPGAN (at 100%)
8. Additional Tips
Want to run Stable Diffusion on your phone/anywhere in the world?
No worries - just open up the webui.bat file, scroll to the
%PYTHON% launch.py
line, and add a
--share
to the end.
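So the edited line should read:
%PYTHON% launch.py --share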
Now when you launch it, it will give you a public URL to use on connected devices.
More Advanced Capabilities are available here:
Simply place the downloaded scripts into the scripts directory.
These include:
- Prompt Wildcards
- txt2mask
- Better mask drawing UI
- Animator
- Vid2Vid via img2img
- Txt2Vector
- Run n times (run a seed more than 16 times)
Enjoy!