DALL·E 3 in ChatGPT

OpenAI has introduced the ability to generate unique images from a simple conversation directly in ChatGPT. The feature is available to ChatGPT Plus and Enterprise users. A user describes what they have in mind, and ChatGPT presents a selection of images that can be refined and iterated on. Modifications can be requested directly in the chat. The feature is powered by OpenAI's image model, DALL·E 3.

Research

DALL·E 3 builds on numerous research advances, both from within and outside OpenAI. Compared with its predecessor, DALL·E 3 generates images that are not only more visually appealing but also clearer and more precise. It can depict intricate details such as text, hands, and faces.

It also performs especially well when given long, detailed prompts, and it supports both landscape and portrait orientations. These improvements came from using a state-of-the-art image captioner to generate better textual descriptions for the images in the training set.

DALL·E 3 was then trained on these improved captions, which produced a model that follows user-provided descriptions more closely. Further details are available in the accompanying research paper.

Responsible Creation and Rollout

DALL·E 3 uses a multi-layered safety system to limit its ability to generate potentially harmful images, such as violent, explicit, or hateful content. Before an image is shown to the user, safety checks are run on both the user's prompt and the resulting image.

During development, feedback from early users and expert evaluators helped OpenAI identify and address gaps in its safety systems, particularly as the model's capabilities grew. This feedback also highlighted potential risks, such as the generation of explicit content or of misleading images.

Ahead of the launch of DALL·E 3 in ChatGPT, OpenAI took steps to reduce the likelihood of the model imitating the styles of living artists or generating images of public figures, and to improve the diversity of representation in the images it generates. For more detail on how DALL·E 3 was prepared for broad deployment, see the DALL·E 3 system card.

User feedback is central to ongoing improvement. ChatGPT users can send feedback directly to OpenAI's research team by using the flag icon, for example when they encounter unsafe outputs or outputs that do not accurately reflect their prompt. OpenAI regards engaging with a broad and diverse user base, and learning from real-world use, as fundamental to developing and deploying AI responsibly.

Provenance Classifier

OpenAI is currently testing an early version of a provenance classifier, a tool designed to help determine whether an image was generated by DALL·E 3. In preliminary tests, it was over 99% accurate at identifying unmodified DALL·E-generated images. Its accuracy remained above 95% when images had undergone common modifications such as cropping, resizing, JPEG compression, or the addition of elements from real images.

Although these early results are promising, the classifier can only indicate how likely it is that an image was generated by DALL·E. It cannot provide conclusive proof. The classifier may eventually become part of a set of tools that help people determine whether audio or visual content is AI-generated. Addressing this challenge will require collaboration across the AI ecosystem, including the platforms that deliver content to end users. OpenAI expects to learn a great deal about how the tool performs and where it is most useful, and to refine its approach over time.
