2022-09-20 Stable Diffusion Roundup
Lots happening in the past week or so with Stable Diffusion! Let's take a look.
This is a great blog post because it describes textual inversion, which lets Stable Diffusion produce very compelling "ugly sonic" images even though it was never trained on "ugly sonic". It does so using its existing weights, with only a handful of source images (plus textual prompts). This seems like a very powerful mechanism for extending an already-trained model to produce images of things it would otherwise be unaware of.
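To make the "existing weights" point concrete, here is a toy sketch of the core idea as I understand it: the model stays frozen and only a new embedding vector is learned. Everything in it is a stand-in I made up for illustration (`FROZEN_W`, `TARGET`, `learn_embedding`); real textual inversion optimizes a token embedding against the diffusion training loss, not a 2x2 matrix.

```python
# Toy illustration only: a frozen 2x2 linear "model" and one trainable embedding.
# FROZEN_W and TARGET are hypothetical stand-ins, not anything from Stable Diffusion.

FROZEN_W = ((2.0, 0.0), (0.0, 0.5))  # stands in for the pretrained weights
TARGET = (1.0, 1.0)                  # stands in for the handful of example images


def model(embedding):
    """Apply the frozen weights to an embedding (matrix-vector product)."""
    return tuple(sum(w * e for w, e in zip(row, embedding)) for row in FROZEN_W)


def loss(embedding):
    """Squared error between the model output and the target."""
    return sum((o - t) ** 2 for o, t in zip(model(embedding), TARGET))


def learn_embedding(steps=200, lr=0.1):
    """Gradient descent on the embedding only; FROZEN_W is never updated."""
    emb = [0.0, 0.0]
    for _ in range(steps):
        residual = [o - t for o, t in zip(model(emb), TARGET)]
        # Gradient of the loss with respect to the embedding: 2 * W^T * residual
        grad = [
            2 * sum(FROZEN_W[i][j] * residual[i] for i in range(2))
            for j in range(2)
        ]
        emb = [e - lr * g for e, g in zip(emb, grad)]
    return emb


new_token_embedding = learn_embedding()
```

The only thing that changes during "training" is `new_token_embedding`; the weights are an immutable tuple, which is the toy version of leaving the pretrained model untouched.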
Another interesting application of Stable Diffusion: inpainting. This again applies Stable Diffusion to a use case where it gives artists more control over the final product. In this case, the example notebook shows that Stable Diffusion, given an input image, a masked area to alter, and a text prompt, is capable of altering the clothing worn by a person in a photo.
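The role of the mask can be sketched in a few lines. This is not how the notebook does it (real inpainting pipelines apply the mask during the denoising process), and `composite` plus the tiny grids below are hypothetical. It only shows the contract: pixels outside the mask are kept, pixels inside it come from the generated image.

```python
# Toy sketch of mask compositing; images are nested lists of grayscale values.
# `composite` is a made-up helper, not part of any Stable Diffusion library.

def composite(original, generated, mask):
    """Keep original pixels where mask is 0; take generated pixels where mask is 1."""
    return [
        [g if m else o for o, g, m in zip(o_row, g_row, m_row)]
        for o_row, g_row, m_row in zip(original, generated, mask)
    ]


original = [[10, 10, 10], [10, 10, 10]]   # the input photo
generated = [[99, 98, 97], [96, 95, 94]]  # what the model proposes
mask = [[0, 1, 1], [0, 0, 1]]             # 1 marks the area to alter

result = composite(original, generated, mask)
```

In the clothing example, the mask would cover the clothes, so the person's face and the background fall in the "keep" region.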
This project is a Blender plugin that generates textures on demand using a simple UI, and features much simpler prompts than are required when working with raw Stable Diffusion. I think this is trend-setting: crafting prompts is one of the weirder parts of working with Stable Diffusion, and I fully expect tools that leverage Stable Diffusion to try to make that aspect of the process much more intuitive, as this does.
It also supports an img2img mode, which allows textures to be created with a bias towards the attributes of the supplied image.
Max Woolf wrote up a very good overview of how image generation actually works, breaking it down piece by piece. If this all still feels like "magic", this is a good article to read!
Matthias Bühlmann has done some really interesting work applying Stable Diffusion to the problem of image compression. There are two interesting findings here, I think:
- Stable Diffusion can compress images to a smaller size than JPEG and WebP. This alone is remarkable.
- While other image compression algorithms introduce "artifacts" that generally appear as blockiness or noise, Stable Diffusion's artifacts are more like "hallucinations": it generates detail that doesn't exist in the source, so the result can differ from "ground truth".
Matthias' post highlights this second effect with a picture of the San Francisco skyline. In the image compressed using Stable Diffusion, an imaginary skyline and set of buildings appear in the far distance, leading the viewer to believe the image is very high quality. The JPEG image is better at "admitting" that it doesn't really know what's there.
This is somewhat disquieting: this approach seems much less like "compression" and rather more like Stable Diffusion is, much like a human, trying to "remember" what it saw originally.
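For a rough sense of why the latent representation is so small, here is some back-of-the-envelope arithmetic. It relies on the fact that Stable Diffusion's autoencoder downsamples each side by 8 and uses 4 latent channels. Storing one byte per latent value is my assumption for the sake of the estimate; it is not necessarily the scheme used in Matthias' post, and it says nothing about how JPEG or WebP compare.

```python
# Rough size arithmetic for a 512x512 image; helper names are hypothetical.

def raw_rgb_bytes(width, height):
    """Uncompressed size of an 8-bit RGB image in bytes."""
    return width * height * 3


def latent_value_count(width, height, downsample=8, channels=4):
    """Number of values in the autoencoder latent for an image of this size."""
    return (width // downsample) * (height // downsample) * channels


raw = raw_rgb_bytes(512, 512)           # 512 * 512 * 3
latents = latent_value_count(512, 512)  # 64 * 64 * 4

# Assumption: each latent value is stored in a single byte.
ratio = raw / latents
```

Under that assumption the latent is 48 times smaller than the raw pixels before any further tricks are applied, which also hints at where the "hallucinations" come from: the decoder has to fill in everything the latent does not carry.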
I can't believe I didn't think to visit this earlier, but it's a great place to see what others are doing with the tech and chat. Some notable stuff I ran across there in the 30 minutes I spent:
- Really good results using img2img to take rough sketchwork and turn it into something much more finished
- Discovering new types of images SD can generate: a simple prompt asking for a stereoscopic portrait produces exactly that.
- Genuinely impressive work extending SD through Deforum to animation. This is a step toward animation in which the subject matter stays stable from frame to frame.