I just got back from a trip to India where I spent some time visiting the Bengaluru Fort. Inside the fort was a carving of what the historian guiding us believes is intended to represent a lion, but seems to have been made by someone who has never actually seen a lion before – only heard a description of one:
You’ll see it’s not a perfectly realistic representation of a lion, but for someone who has never seen a lion, it’s pretty impressive. It struck me that this might be an amusing, real-world example of the limitations of text-to-image synthesis using generative AI without fine-tuning (most people spend their vacations pondering metaphors to explain jargon terms for emerging technology trends... that's not weird, right?).
Generative AI is a family of machine learning techniques that generate new content, such as text, images, or audio. This article focuses on one application: generating images from textual descriptions. It's like having an artist who can read your mind and create a visual representation of your thoughts. With generative AI, you can create custom images that are tailored to your specific needs, whether it's designing a billboard ad, creating an illustration for a book, or generating a realistic 3D model of a product.
Business Insider recently wrote an article explaining how a fashion brand you’re probably familiar with, Revolve, used generative AI to great success in a recent Out-of-Home (OOH) billboard campaign. Instead of investing time and money into securing models, building sets, renting locations, and doing a photoshoot, the creative for their billboards was developed with the help of generative AI. They were able to customize the clothing, the environment, and the models’ features; they were even able to swap out models’ faces. The end result was absolutely stunning: on brand, compelling, and well-suited to the context of a large format OOH advertising unit such as a billboard.
Credit: AI-Generated Billboard Design by Revolve Featured on Business Insider
While generative AI can be incredibly useful for creating custom images, it's not always perfect – as with our lion in Bengaluru Fort. A generalized model, trained on a large dataset of images and textual descriptions, might not be able to create the images you need for a particular application. That's why fine-tuning a generative AI model with a targeted dataset is crucial: it gives the model the context it needs to generate effective images.
If you're a brand with a broad appeal where abstract creative fits, such as Revolve, the current generative AI technologies may work for you (although even they had to use multiple generative AI technologies such as Midjourney, ControlNet, and Stable Diffusion models - a topic we'll cover in a future article).
But if you're like most companies, generative AI will start to have an impact across your organization only once models trained on large, general datasets are fine-tuned with datasets specific to your own context.
How Generative AI Enables Text-to-Image Synthesis
Text-to-image synthesis is the process of generating an image from a textual description using a trained machine learning model.
Credit: AI-Generated Billboard Design by Revolve Featured on Business Insider
For example, imagine you want to design a new billboard ad for a product, but you don't have any photos of the product yet. With text-to-image synthesis, you can describe the product in words, and the computer can generate a visual representation of what the product might look like.
There are several different approaches to generative AI, but in the context of text-to-image synthesis, the most common approach is to train a neural network to generate images from textual descriptions. A neural network is a type of machine learning model loosely inspired by the structure of the human brain. The network consists of multiple layers of interconnected nodes, or "neurons," that work together to process data and make predictions or generate new content.
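The idea of layers of interconnected neurons can be sketched in a few lines of Python. This is only a toy illustration: the weights below are hand-picked, hypothetical numbers rather than anything learned, and real image models have millions or billions of them.

```python
import math

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """One layer: each neuron takes a weighted sum of the inputs,
    adds its bias, and applies the sigmoid function."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

def forward(inputs, layers):
    """Pass the inputs through each layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = layer(activations, weights, biases)
    return activations

# Hypothetical hand-picked weights: 2 inputs -> 3 hidden neurons -> 1 output.
TOY_NETWORK = [
    ([[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]], [0.0, 0.1, -0.1]),
    ([[0.7, -0.5, 0.2]], [0.05]),
]

output = forward([1.0, 0.5], TOY_NETWORK)
print(output)  # a single number between 0 and 1
```

Training a network means adjusting those weights and biases automatically until the outputs are useful, instead of picking them by hand as done here.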
To generate images from textual descriptions, the neural network is trained on a large dataset of images and their corresponding textual descriptions. The model then learns to associate the textual descriptions with specific visual features and can generate images that match the descriptions.
One of the key challenges in text-to-image synthesis is learning to translate textual descriptions into visual features. For example, if the textual description says "a yellow banana with brown spots," the model needs to learn how to generate an image that accurately represents the color and texture of a banana.
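To make "learning to associate words with visual features" a bit more concrete, here is a deliberately simplified sketch. It assumes each training image has been reduced to a single average color, and it "learns" a color for each word by averaging. The training pairs and function names are made up for illustration; real models learn far richer features with neural networks.

```python
from collections import defaultdict

# Hypothetical training pairs: (caption, average RGB color of the image).
TRAINING_PAIRS = [
    ("yellow banana", (230, 210, 40)),
    ("yellow lemon", (240, 220, 60)),
    ("brown spots", (110, 70, 30)),
    ("brown bear", (100, 60, 20)),
]

def learn_word_colors(pairs):
    """Average the colors of all images whose caption contains each word."""
    sums = defaultdict(lambda: [0, 0, 0])
    counts = defaultdict(int)
    for caption, color in pairs:
        for word in caption.split():
            counts[word] += 1
            for channel in range(3):
                sums[word][channel] += color[channel]
    return {w: tuple(s / counts[w] for s in sums[w]) for w in sums}

def describe_to_color(description, word_colors):
    """Blend the learned colors of the known words in a new description."""
    known = [word_colors[w] for w in description.split() if w in word_colors]
    if not known:
        return None  # nothing in the description was seen during training
    return tuple(sum(c[i] for c in known) / len(known) for i in range(3))

word_colors = learn_word_colors(TRAINING_PAIRS)
print(word_colors["yellow"])  # (235.0, 215.0, 50.0)
print(describe_to_color("yellow banana with brown spots", word_colors))
# (170.0, 140.0, 36.25)
```

Notice that a word the toy model has never seen contributes nothing, which is the same gap our lion carver ran into.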
How this is done depends on the architecture. Earlier text-to-image systems typically encoded the description with a recurrent neural network (RNN) and generated the image with convolutional neural network (CNN) layers, often producing a low-resolution image first and then refining it in stages into a high-resolution one. More recent diffusion models, such as Stable Diffusion, encode the description with a transformer-based text encoder, then start from random noise and remove that noise over many steps, guided by the encoded text, until an image matching the description emerges.
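The notion of starting rough and refining step by step can be shown with a toy loop. This is an analogy only, not how any production model is implemented: here the "target" is given up front as a hypothetical four-pixel image, whereas a real model has no target and uses a trained network to decide each refinement from the text.

```python
import random

def refine(target, steps=20, rate=0.3, seed=0):
    """Start from random values and move a fraction of the way
    toward the target on every step, recording the worst error."""
    rng = random.Random(seed)
    image = [rng.random() for _ in target]  # random starting "pixels"
    history = []
    for _ in range(steps):
        image = [p + rate * (t - p) for p, t in zip(image, target)]
        history.append(max(abs(t - p) for p, t in zip(image, target)))
    return image, history

# Hypothetical 4-pixel grayscale "image" standing in for what the text describes.
TARGET = [0.9, 0.8, 0.2, 0.1]

final_image, errors = refine(TARGET)
print("error after first step:", errors[0])
print("error after last step: ", errors[-1])
```

Each pass leaves the picture a little closer to the goal, which is the intuition behind multi-step refinement.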
One of the advantages of generative AI for text-to-image synthesis is its ability to generate images that are not limited by the constraints of the physical world. For example, generative AI can generate images of imaginary creatures or objects that don't exist in the real world. This allows for greater creativity and flexibility in image creation.
The Limitations of a Generalized Model for Specific Applications
While a generalized model for text-to-image synthesis can be useful for generating images in a wide range of contexts, it has limited utility for applications that require a high degree of specificity. A generalized model is trained on a large and diverse dataset of images and textual descriptions, which may not capture the nuances or particular features of the product you’re working with. For example, if you’re designing a billboard ad for a product, you need an image that accurately represents that product and its features, and a generalized model may never have seen it.
This is part of why images generated using current generative AI tools may seem more like a Salvador Dalí painting than something you'd expect your marketing team to create (fun fact: "DALL-E", a popular generative AI tool for text-to-image synthesis, was named by combining the artist "Dalí" with the robot character "WALL-E").
To overcome this limitation, "fine-tuning" a generative AI model with a specific dataset is necessary. Fine-tuning means continuing to train an already-trained model on a smaller and more specific dataset to improve its performance on a particular task or domain.
For example, to fine-tune a generative AI model for creating impactful billboard designs, one could use a smaller dataset of billboard designs that are known to be effective. The dataset could include images of billboards with different styles, layouts, and messaging, as well as corresponding textual descriptions.
The fine-tuning process would involve continuing to train the model on the smaller dataset, with the goal of improving its ability to generate billboard designs that are visually appealing and effective at conveying a specific message. This could involve adjusting the model architecture, training procedure, or loss function to better fit the characteristics of the new dataset.
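Fine-tuning can be sketched with the smallest model possible: a straight line fitted by gradient descent. The sketch below first trains on a broad, hypothetical "general" dataset and then keeps training the same parameters on a small, hypothetical "specific" dataset. Both datasets and all names are invented for illustration; an image model works the same way in spirit but with vastly more parameters.

```python
def mse(params, data):
    """Mean squared error of the line y = w*x + b on (x, y) pairs."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def train(params, data, steps, lr):
    """Plain gradient descent on mean squared error,
    starting from the parameters passed in."""
    w, b = params
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical data: a broad "general" set and a small domain-specific set.
GENERAL = [(x / 10, 2 * (x / 10)) for x in range(-20, 21)]   # follows y = 2x
SPECIFIC = [(0.0, 1.0), (0.5, 2.0), (1.0, 3.0), (1.5, 4.0)]  # follows y = 2x + 1

pretrained = train((0.0, 0.0), GENERAL, steps=500, lr=0.1)
finetuned = train(pretrained, SPECIFIC, steps=200, lr=0.1)  # start from pretrained

print("loss on specific data before fine-tuning:", mse(pretrained, SPECIFIC))
print("loss on specific data after fine-tuning: ", mse(finetuned, SPECIFIC))
```

The key design choice is that the second call to `train` starts from `pretrained` rather than from scratch, so the general knowledge is kept and only nudged toward the specific data.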
Once the model is fine-tuned, it can be used to generate billboard designs by inputting a textual description of the desired design. The model will then generate an image intended to match the description, in a style shaped by the examples in the fine-tuning dataset.
The Importance of Context in Text-to-Image Synthesis
The context in which an image is generated plays a critical role in the quality and effectiveness of the image. This is because the context provides important information about the purpose, audience, and messaging of the image.
For example, if you’re designing a billboard ad for a product, the context of the ad (e.g. location, audience, and other sights competing for visual attention) will influence the design and messaging of the ad. A generative AI model that has been fine-tuned with a specific dataset can incorporate this context into its image generation process, resulting in more effective and impactful images.
The context of the application may also affect the choice of generative AI technique used for text-to-image synthesis. For example, diffusion models such as Stable Diffusion and generative adversarial networks (GANs) make different trade-offs between image quality, variety, and generation speed, so the right choice depends on what the application needs.
On the internet, the media experience is relatively universal (screen sizes are similar, ad creatives are generally standardized, and the experiential contexts within which people see ads are usually the same). The real world is much more diverse and dynamic. The viewing experience for any given ad is heavily affected by the many variations in size, behavioral contexts (e.g. driving vs. standing in an elevator seeing an ad via Captivate to avoid making eye contact with strangers), and even the weather! If it's raining or even if it's just dark outside, that will affect whether the generated image lands with the audience as intended.
Because of this, fine-tuning generative AI models to account for the experiential context of the audience will be crucial if OOH artists are going to get real value from this technology.
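One possible way to expose this kind of context to a model during fine-tuning is to fold it into the text paired with each training image. The sketch below assumes that approach; the record fields, the file path, and the `caption_with_context` helper are all hypothetical and not taken from any specific tool.

```python
from dataclasses import dataclass

@dataclass
class BillboardExample:
    """One hypothetical fine-tuning record: an image plus its viewing context."""
    image_path: str
    description: str
    viewer_activity: str  # e.g. "driving", "standing in an elevator"
    lighting: str         # e.g. "daylight", "dark"
    weather: str          # e.g. "clear", "rain"

def caption_with_context(example):
    """Fold the viewing context into the text paired with the image."""
    return (
        f"{example.description}, seen while {example.viewer_activity}, "
        f"{example.lighting}, {example.weather}"
    )

example = BillboardExample(
    image_path="billboards/0001.png",  # hypothetical path
    description="law firm billboard with large serif headline",
    viewer_activity="driving",
    lighting="dark",
    weather="rain",
)
print(caption_with_context(example))
# law firm billboard with large serif headline, seen while driving, dark, rain
```

With captions like this, the same words can later be used in a prompt to ask for a design suited to a rainy night-time drive rather than a sunny sidewalk.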
The Future of Generative AI's Role in Creative Design for Advertising
While there are limitations to using generative AI for creating marketing art, there is also great potential for this technology in the future. To realize this potential, however, it will be necessary to build fine-tuned models that are optimized for specific applications and contexts.
When it comes to Out-of-Home advertising creative, a fine-tuned generative AI model could be used to generate images that are optimized for the specific location, audience, messaging, and competition of the campaign. By fine-tuning the model using relevant data and expertise, marketers can create images that are not only visually appealing but also highly effective.
In addition, building fine-tuned models can help to address the potential biases and limitations of generative AI. By training the model on diverse, carefully reviewed datasets and incorporating ethical considerations into the design process, marketers can reduce the risk that the generated images are unfair or unrepresentative.
AI is not replacing human beings in the creation of marketing art. Designers no longer create ads using paint in an era of digital design, and now they may create images with words typed on a keyboard instead of being limited to what they can draw with a mouse.
The future of generative AI's role in creative design for advertising will still require a collaborative approach between human designers and AI-generated images. By combining the creativity and expertise of human designers with the power and flexibility of generative AI, marketers can create truly impactful and engaging images that capture attention and reflect their brand.
How to Keep Up With Generative AI's Rapid Evolution
It may feel like generative AI is just another hype trend with few near-term practical applications for most companies, but generative AI is unlike previous hype trends (e.g. blockchain technologies, Web3/"Metaverse" technologies, and other futuristic technologies that are still ahead of their time). Comparing what a tool like OpenAI's ChatGPT is capable of today with what it could do just a few months ago shows how quickly these systems evolve and improve.
These tools may seem relatively useless at first, a simple novelty. That can be discouraging and lead people to dismiss them as irrelevant to near-term business needs. As an example, we had a user of the OneScreen.ai platform try to use Midjourney to design a billboard for a law firm that wanted to radiate a premium brand vibe. But the initial results were, shall we say, misaligned with the company's brand goals:
But AI technologies learn quickly as they're used by more and more people for more and more specific applications. The human brain is trained on a shared foundational dataset of knowledge during the education of our early years, but only has an impact once that general knowledge of language, history, art, etc. is combined with specialized experiences, training, and context to fine-tune us to do things like write blog articles about the importance of fine-tuning to the future of generative AI for commercial applications.
Generative AI will be the same. That's why it's important to understand how these technologies work and how they can be made to work more effectively for your purposes over time.
AI Plus Humans, Not AI vs. Humans
I want to end by sharing a video we created using Cameo to thank Andy Sriubus (the Chief Commercial Officer at Outfront - one of the largest Out-of-Home media operators in the world) for taking time out of his busy day to give us advice when we were first launching OneScreen.ai. I mentioned in my Cameo request that part of our discussion with Andy revolved around the need for technology to aid human buyers and sellers in the OOH industry, but that the adoption of technology is being slowed in the industry because of people's fear that it will replace them - and I won't be surprised if this line ends up in a future Star Trek production (video starts at the relevant time):
Learn More in Our Free Ebook on AI Applications for ABM Marketers
... but we haven't finished the landing page yet 😅 so click the image below and when you fill out that form we'll email you the PDF manually. One day soon AI tools will automate boring manual work like creating download pages for ebooks! Until then, bear with us :)
Click the image below 👇: