The following materials offer an introduction to the fascinating world of AI Image Generation, using the specific example of OpenAI’s DALL-E 2. You can learn the basics of how to use the software, as well as more advanced techniques to help make the most of the full capabilities of AI generation tools. Consideration is also given to the important role that creativity and imagination play in the process. By the end, you should have a better understanding of how to use DALL-E 2 to create projects that are both meaningful and unique. So, let's get started…

Introducing DALL-E 2

In the last year, many articles (and social media posts) have been circulating regarding AI image generators based upon new ‘decoder diffusion models’. The most widely reported has been OpenAI’s DALL-E 2, along with Google’s IMAGEN. Other smaller-scale and/or open-access models include Midjourney, Nightcafe, Starryai, and Dalle-mini. In an article for The New York Times, Kevin Roose notes that what is impressive about a model such as DALL-E 2 is not simply that it can generate new art, but how it performs this task: ‘These aren’t composites made out of existing internet images — they’re wholly new creations’.

<aside> 👀 Watch the following video, ‘DALL-E 2 Explained’, produced by OpenAI, the makers of the image tool DALL-E 2. It explains how their tool works, but also, more broadly, how AI ‘diffusion’ techniques work and what they might enable us to do.

</aside>

https://youtu.be/qTgPSKKjfVg

AI Image Diffusion Models

The diffusion model works by progressively corrupting (diffusing) training data until the data becomes pure noise. The model then trains a neural network to reverse the process. As Jonathan Ho (at Google) explains: ‘Running this reversed corruption process synthesises data from pure noise by gradually denoising it until a clean sample is produced’. While the model involves a form of reverse engineering, as found with the adversarial layer of GANs, the technique is categorically different and is fast overtaking the use of GANs. Indeed, as Carlos Pérez notes: ‘GANs have been surpassed by Diffusion models that are orders of magnitude more efficient’; not least because the underpinning decoder technology ‘is an Ordinary Differential Equation … that can be solved by many numerical methods developed in the past decades. It requires no training!’
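To make the idea concrete, the sketch below illustrates only the forward ‘corruption’ half of the process in a few lines of Python. The noise schedule, step count and toy data are illustrative assumptions rather than OpenAI’s actual implementation; in a real diffusion model, a neural network is then trained to predict and remove the noise added at each step, so that running the chain in reverse turns pure noise into a clean image.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_diffuse(x0, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Progressively corrupt clean data x0 with Gaussian noise until it is pure noise."""
    betas = np.linspace(beta_start, beta_end, num_steps)  # illustrative noise schedule
    x = x0.copy()
    for beta in betas:
        noise = rng.normal(size=x.shape)
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * noise
    return x  # after enough steps, x is statistically indistinguishable from noise

clean_image = rng.uniform(0.0, 1.0, size=(64, 64))  # stand-in for a training image
noised = forward_diffuse(clean_image)
print(round(noised.mean(), 3), round(noised.std(), 3))  # roughly zero mean, unit variance
```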

Importantly, two key factors will allow for more widespread, ‘off-the-shelf’ use. Firstly, these models work in two parts, moving from a sentence to an image. They are layered upon increasingly impressive large language models, providing a powerful and accessible interface that allows the user to request image generation based simply upon natural language sentences. Secondly, the diffusion method, which as noted first reduces data to noise before building up an image (effectively from scratch), provides the means for the original generation of imagery, thereby introducing a new creative property to computer vision. The real ‘art’ of this model is its ability to form prediction models of pixels in a similar manner to predicting words in language models. Despite the subtlety of imagery, which does not adhere to ‘grammar’ in the sense that language does, this technique effectively works in the same way as text prediction; i.e. by training models on the smallest units of meaning (whether words or pixels). Furthermore, the integration of large language models and image diffusion knits together word and image, making these models very simple to use. In so doing, these models solve the apparent ‘problem’ of the complexity of images, which human culture has obsessed over for millennia, and which computers can now format quickly as information.
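DALL-E 2 itself is available only through OpenAI’s own interface, but the same two-part, sentence-to-image workflow can be tried with open diffusion models. As a minimal sketch, assuming the Hugging Face diffusers library, PyTorch and a CUDA GPU are installed, the following uses Stable Diffusion (a different model, but one built on the same text-encoder-plus-diffusion-decoder principle) to generate an image from a plain-language prompt:

```python
# pip install diffusers transformers accelerate torch   (assumed environment)
import torch
from diffusers import StableDiffusionPipeline

# Load a pretrained text-to-image diffusion pipeline (text encoder + diffusion decoder).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

# A natural-language sentence is the only input the user needs to supply.
prompt = "an astronaut riding a horse in a photorealistic style"
image = pipe(prompt, num_inference_steps=50).images[0]  # denoising runs for 50 steps
image.save("astronaut.png")
```

The prompt is first encoded by a language model; the diffusion decoder then gradually denoises random noise, guided by that encoding, into the finished image.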

New Creative Processes

<aside> ❓ AI image generation tools produce some startling results, yet the technology is still very new. What might we expect in the future as we start to use such tools? How will these tools impact on creative processes and on the creative economy?

</aside>

Despite the infancy of the technology, its impact is already evident. In a promotional video for the Google Pixel 6 Pro, production designer Hannah Beachler (working on Black Panther 2) talks through her creative process, which involves building up huge collections of images for moods, tones and storytelling:

https://www.youtube.com/watch?v=I-dENaxSdKY&t=16s