Train a Textual Inversion
The first thing you need to do is give the AI an idea of what you want. You could probably do that with a text prompt alone, but the way the AI reads text prompts - weightings, tokens and so on - makes that pretty hard to get right. Language is also limited: how would you describe, for example, a particular hairstyle that the AI doesn't already understand?
So the best way is to show it. Collect a few images of the type of thing you're trying to create - it doesn't have to be many, 10-12 will do. Don't use 12 images of the exact same thing, even in different poses/locations/etc, or your Textual Inversion will end up too specific to that one thing rather than the general style you're going for. For my Textual Inversions I found pictures of chav girls that had the things I thought were particularly hot - big hair, a "bitchy" face, etc - but with different hair colours, clothes, locations, and so on.
There are guides out there for how to create a Textual Inversion so I won't get into it here, but I do have a couple of useful tips:
- Use BIRME to resize your images. It's an online tool that lets you crop a set of images to a given size, adjust the cropped region for each (as in, the bit of the image you keep) and download the results as a ZIP file. Here is a link pre-loaded with the 512x512 size that Stable Diffusion expects: https://www.birme.net/?target_width=512&target_height=512 (or see the Python sketch at the end of this section if you'd rather do the resizing locally).
- Use the following very simple prompt template when training your Textual Inversion - during training, [name] is replaced with your embedding's name and [filewords] with each image's caption (taken from a matching .txt file if one exists, otherwise from the image's filename):
a photo of [name], [filewords]
- You don't need the giant number of steps Automatic1111 suggests. Our images don't need to be that accurate since we're going for a "look" rather than a particular person, and we're going to be doing stuff to eliminate any base SD 1.5 weirdness later on. 1000 steps per image is fine (so around 12,000 total for a 12-image set).
Make sure to only use the base SD 1.5 model when training Textual Inversions. Anything else will either produce bad results or just plain not work. Images don't need to be perfect at this stage and we are expecting jank!
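As promised in the BIRME tip above, here's a minimal local alternative for the resizing step - my own sketch, not part of the original workflow. It centre-crops each image to a square and resizes it to the 512x512 that SD 1.5 expects, using Pillow. The folder names are placeholders:

    # Minimal local stand-in for BIRME: centre-crop to square, resize to 512x512.
    # Folder names are placeholders - point them wherever your images live.
    from pathlib import Path
    from PIL import Image, ImageOps

    SRC = Path("raw_images")       # hypothetical input folder
    OUT = Path("training_images")  # hypothetical output folder
    OUT.mkdir(exist_ok=True)

    for path in SRC.glob("*"):
        if path.suffix.lower() not in {".jpg", ".jpeg", ".png", ".webp"}:
            continue
        img = Image.open(path).convert("RGB")
        # ImageOps.fit centre-crops to the target aspect ratio, then resizes
        ImageOps.fit(img, (512, 512)).save(OUT / f"{path.stem}.png")

Unlike BIRME you don't get to nudge each crop region by hand, so eyeball the results before training on them.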
Generate Your Base Images
Now that you have your Textual Inversion trained, it's time to create some images with it. Again, make sure at this point that you are only using the base SD 1.5 model! The reason we're still using the base model rather than one of the dozens of realistic models out there is that those models are extremely opinionated. They are very good at churning out generic hot blondes because that's what they've been trained on, so that's what they think women look like. That style you found so hot that you just gathered all those images? Chances are a realistic model will compare it to what it thinks a woman looks like, go "this doesn't look right", and either discard it entirely or only let it influence the image very slightly.
We're also not going to be using Hires Fix here as that comes later.
My main tip here is to use wildcards. A lot of wildcards. Pretty much all my images are generated from just a few prompts, each of which mostly consists of wildcards. Hair colour, eye colour, the colour of their clothes, the type of clothes they're wearing, the pose they're striking, the room they're in, the weather if they're outside, etc. all get drawn from wildcard files with anywhere from half a dozen to several dozen possible options.
My suggestion is to work on a base prompt with concrete options until you get one that consistently produces decent images, then replace all of those options with wildcards, so "A happy (Textual Inversion Here) woman with blonde hair wearing a purple dress dancing in a nightclub" becomes "A __emotions__ (Textual Inversion Here) woman with __haircolours__ hair wearing a __clothescolours__ __clothes__ __poses__ in a __locations__". For the negative prompt, you can use one of the hundreds of generic negative prompts out there and add in any weird artifacts that are particular to your Textual Inversion. Mine for some reason loves to do weird things with foreheads, so I put "marks on forehead" in my negative prompt.
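If you're curious what the wildcard mechanism actually does, here's a minimal sketch of the idea - my own illustration, not the actual code of any wildcards script or the Dynamic Prompts extension: each __name__ token gets swapped for a random line from name.txt. The real wildcard files are just plain text, one option per line, exactly like the ones this script writes out:

    # Sketch of wildcard expansion: __name__ -> random line from name.txt
    import random
    import re
    from pathlib import Path

    WILDCARD_DIR = Path("wildcards")  # hypothetical folder; real extensions have their own
    WILDCARD_DIR.mkdir(exist_ok=True)

    # Example wildcard files: one option per line, just as you'd write them by hand
    (WILDCARD_DIR / "emotions.txt").write_text("happy\nsmug\nbored\nsurprised\n")
    (WILDCARD_DIR / "haircolours.txt").write_text("blonde\nplatinum blonde\nblack\nauburn\npink\n")

    def expand(prompt: str) -> str:
        # Replace each __name__ token with a random line from wildcards/name.txt
        def pick(match: re.Match) -> str:
            options = (WILDCARD_DIR / f"{match.group(1)}.txt").read_text().splitlines()
            return random.choice([o for o in options if o.strip()])
        return re.sub(r"__(\w+)__", pick, prompt)

    print(expand("A __emotions__ woman with __haircolours__ hair"))

Every generation rolls the dice again, which is how a handful of prompts turns into an endless stream of varied images.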
Once I'm happy with my base prompt and have created my wildcard files, I'll crank out as many low-resolution images from my wildcarded prompt as I have time for. Even the best Textual Inversions and the best prompts will produce some duds. Go through them, pick out the ones you think could actually make a hot image if they were redrawn better, and save them somewhere before moving on to the next step.
Batch Generate High Quality Images
Now we're going to do the fun part. First off, this is when we switch to a realistic model - we've let the base model generate the image we want, and now we're just going to get the realistic model to re-draw it better rather than letting it decide the whole composition. Once you've done that, go to the img2img tab in Automatic1111, then the Batch sub-tab in the Generation menu. Give it the directory you just saved your source images to, and an output directory for it to save its work to.
Next, customize the following settings:
- Sampling Method: Use your favourite sampling method here (I use DPM++ 2M SDE Karras)
- Batch Count: 3 (Or higher if you have a fast computer; this is how many images we're going to generate from each source image, which gives us a better chance of getting a good output image)
- Denoising Strength: 0.45 (This seems to strike a good balance between keeping the content of the source image and cleaning up most of the jank. If you leave it at the default 0.75 it'll replace too much of the original image)
- Install the ADetailer extension if you haven't already and enable it, selecting the mediapipe_face_mesh_eyes_only option. This will tidy up any artifacts left in the eyes after the upscale without redrawing too much of the face.
Now start the batch and let it run!
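If you'd rather script the batch than click through the UI, Automatic1111 also exposes an HTTP API when you launch it with the --api flag. Here's a rough sketch of the same job via the /sdapi/v1/img2img endpoint - the paths and prompts are placeholders, the payload just mirrors the settings above, and exact sampler names can vary between webui versions, so treat this as a starting point rather than gospel:

    # Rough sketch: batch img2img through the Automatic1111 web API (--api flag).
    # Paths and prompts are placeholders; payload mirrors the settings above.
    import base64
    from pathlib import Path

    import requests

    URL = "http://127.0.0.1:7860/sdapi/v1/img2img"
    SRC = Path("source_images")  # the directory you saved your picks to
    OUT = Path("output_images")
    OUT.mkdir(exist_ok=True)

    for img_path in sorted(SRC.glob("*.png")):
        payload = {
            "init_images": [base64.b64encode(img_path.read_bytes()).decode()],
            "prompt": "a photo of (Textual Inversion Here) woman",  # placeholder
            "negative_prompt": "marks on forehead",                 # placeholder
            "denoising_strength": 0.45,
            "sampler_name": "DPM++ 2M SDE Karras",  # name may differ by webui version
            "n_iter": 3,  # same as Batch Count: 3 attempts per source image
        }
        r = requests.post(URL, json=payload)
        r.raise_for_status()
        for i, img_b64 in enumerate(r.json()["images"]):
            (OUT / f"{img_path.stem}_{i}.png").write_bytes(base64.b64decode(img_b64))

ADetailer can also be driven through the API via the alwayson_scripts field, but the exact argument format depends on the extension version, so I've left it out of the sketch.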
This is pretty much my process, so if you follow these steps you should be able to get something close to what I do (or even better, depending on your tastes and the amount of effort you put in). Hope this helps!