Introducing DALL-E 3 inside ChatGPT

Starting today, DALL-E 3 is available inside ChatGPT. Here’s why DALL-E 3 will improve at a faster rate than MidJourney:

Multi-turn dialogue

Multi-turn dialogue is an excellent UI for collecting human feedback. People will explain what’s wrong with the generated image in free-form language, giving very fine-grained annotations for each refinement. These chat logs are natively compatible with a multimodal LLM’s training set.

GPT-4’s vision ability (image -> internal representation) can also be improved with the very same data.
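To make the feedback idea concrete, here is a minimal sketch of how a refinement chat could be turned into training records. Everything in it is an assumption for illustration: the Turn and FeedbackExample types, the extract_feedback helper, and the image IDs are hypothetical, and nothing here describes OpenAI's actual data pipeline.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Turn:
    role: str                        # "user" or "assistant"
    text: str
    image_id: Optional[str] = None   # set when the assistant returns an image


@dataclass
class FeedbackExample:
    rejected_image: str   # image the user was unhappy with
    critique: str         # free-form complaint or refinement request
    revised_image: str    # image generated in response to the critique


def extract_feedback(turns: List[Turn]) -> List[FeedbackExample]:
    """Pair each image with the user's critique and the next image (hypothetical helper)."""
    examples: List[FeedbackExample] = []
    last_image: Optional[str] = None
    pending_critique: Optional[str] = None
    for turn in turns:
        if turn.role == "assistant" and turn.image_id is not None:
            # A new image closes out any critique made about the previous one.
            if last_image is not None and pending_critique is not None:
                examples.append(
                    FeedbackExample(last_image, pending_critique, turn.image_id)
                )
            last_image = turn.image_id
            pending_critique = None
        elif turn.role == "user" and last_image is not None:
            # Only the latest user message before the next image is kept.
            pending_critique = turn.text
    return examples
```

A chat with three generated images and two refinement requests would yield two records, each linking a rejected image, the free-form critique, and the revised image.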

Far superior algorithmic efficiency

MidJourney mostly ignored copyright issues and has spun the data flywheel for much longer, which means it likely has a much larger dataset to work with than OpenAI. Yet its quality still pales in comparison.

OpenAI’s got new algorithms (e.g. Consistency Models: https://arxiv.org/abs/2303.01469) that are far more data-efficient than the standard diffusion stack out there. Model improvement per extra unit of training data is superior. It’s not “just engineering”.

Ecosystem

Integration with ChatGPT is such a killer move. It’s almost trivial to add the existing puzzle pieces to DALL-E 3, such as the Code Interpreter and Browser. Want to apply a filter? Just call the OpenCV API instead of running the model. Want a reference image? Call the Search plugin to emulate Bard (with Google Lens integration).
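The "call a tool instead of running the model" idea can be sketched as a small router. This is a toy illustration under stated assumptions: handle_edit, FILTER_TOOLS, and the pure-Python filters are hypothetical stand-ins, images are nested lists of RGB tuples, and a real system would more likely call an image library such as OpenCV for the filter and an actual image model for everything else.

```python
from typing import Callable, Dict, List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def grayscale(image: Image) -> Image:
    """Deterministic filter: replace each RGB pixel with its luma value."""
    out: Image = []
    for row in image:
        new_row = []
        for r, g, b in row:
            y = round(0.299 * r + 0.587 * g + 0.114 * b)
            new_row.append((y, y, y))
        out.append(new_row)
    return out


def invert(image: Image) -> Image:
    """Deterministic filter: invert every channel."""
    return [[(255 - r, 255 - g, 255 - b) for r, g, b in row] for row in image]


# Hypothetical registry of cheap, deterministic tools.
FILTER_TOOLS: Dict[str, Callable[[Image], Image]] = {
    "grayscale": grayscale,
    "invert": invert,
}


def handle_edit(image: Image, request: str) -> Tuple[str, Image]:
    """Route a request to a filter tool if one matches, else to the model."""
    key = request.strip().lower()
    if key in FILTER_TOOLS:
        return "tool", FILTER_TOOLS[key](image)
    # Placeholder: a real system would invoke the image model here.
    return "model", image
```

A request like "grayscale" never touches the generative model, which is cheaper and leaves the rest of the image exactly as it was; anything the registry does not recognize falls through to the model path.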

Existing user base

MidJourney has 16M users. ChatGPT’s got 100M. Distribution is not an issue. MidJourney’s UI, by contrast, is clunky and beginner-unfriendly.

Credits: @DrJimFan
