DeepFloyd, a research group backed by Stability AI, has introduced DeepFloyd IF, a text-to-image model that can generate realistic images with text integrated into them. Despite advances in generative AI, text-to-image models still struggle to render legible logos, text, calligraphy, or fonts. DeepFloyd IF, trained on a dataset of more than a billion images and text, addresses this challenge: it can produce images from prompts that contain legible text, even in multiple languages, thanks to a large language model that interprets the prompt. The model uses multiple diffusion steps to generate high-quality images directly in pixel space, and its modular architecture stacks several processes together.

Nightcafe, the generative art platform, was granted early access to DeepFloyd IF. According to Nightcafe CEO Angus Russell, the model's ability to generate legible and accurately spelled text in images is perhaps its most significant breakthrough.

The model’s introduction comes at a time when generative AI vendors are under fire from artists who claim the vendors are profiting from their work without permission. Accordingly, DeepFloyd IF is licensed in a way that prohibits commercial use, for now. However, Russell anticipates that DeepFloyd IF’s ability to generate text in images will unlock new possibilities for generative art, such as logo design, web design, posters, billboards, and even memes. The model is also expected to improve the generation of hands and other intricate features.

Despite this feat, the base model still generates images that are not as aesthetically pleasing as those from some other diffusion models. Fine-tuning could help address this shortcoming. DeepFloyd IF may also suffer from the same biases as other generative AI models. Research shows that image-generating AI, including Stable Diffusion, tends to produce images of people who look white and male, especially in positions of authority. The DeepFloyd team acknowledges this potential for bias and recommends that communities and cultures that use other languages be sufficiently accounted for.

In conclusion, DeepFloyd IF is a significant step forward for generative AI because it addresses a long-standing weakness of text-to-image models, but it is still not the holy grail of text-to-image generation. There is much left to explore in generative AI, and DeepFloyd IF has opened up new possibilities for the creative use of AI in art and design.