
Prompt Engineering And Why It Matters To The AI Revolution

Prompt engineering is a natural language processing (NLP) concept that involves discovering inputs that yield desirable or useful results. Prompting is the equivalent of telling the genie in the magic lamp what to do. In this case, the magic lamp is DALL-E, ready to generate any image you wish for.

Understanding prompt engineering

Just as a carelessly worded wish can turn against you, the way you express what the machine needs to do when you prompt it can dramatically change the output.

And the most interesting part?
Prompting was not a feature deliberately designed by AI experts. It was an emergent feature. In short, as these huge machine learning models were developed, prompting became the way to have the machine execute tasks.

No one asked for it; it just happened!

In a 2021 paper, researchers from Stanford highlighted how transformer-based models have become foundation models.

foundational-models-machine-learning

As explained in the same paper:

The story of AI has been one of increasing emergence and homogenization. With the introduction of machine learning, how a task is performed emerges (is inferred automatically) from examples; with deep learning, the high-level features used for prediction emerge; and with foundation models, even advanced functionalities such as in-context learning emerge. At the same time, machine learning homogenizes learning algorithms (e.g., logistic regression), deep learning homogenizes model architectures (e.g., Convolutional Neural Networks), and foundation models homogenizes the model itself (e.g., GPT-3).

Prompt engineering is a process used in AI in which one or more tasks are converted to a prompt-based dataset that a language model is then trained on.
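To make that concrete, here is a minimal sketch in Python of what converting a task into a prompt-based dataset might look like. The review texts and labels are made up for illustration:

```python
# Hypothetical labeled examples for a sentiment task.
raw_examples = [
    ("The delivery was fast and the food was warm.", "positive"),
    ("My order arrived two hours late.", "negative"),
]

# Each (text, label) pair becomes a prompt/completion pair that a
# language model can be trained (or fine-tuned) on.
prompt_dataset = [
    {"prompt": f"Review: {text}\nSentiment:", "completion": f" {label}"}
    for text, label in raw_examples
]

for pair in prompt_dataset:
    print(repr(pair["prompt"]), "->", repr(pair["completion"]))
```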

The motivation behind prompt engineering can be difficult to understand at face value, so let’s describe the idea with an example.

Imagine that you are establishing an online food delivery platform and you possess thousands of images of different vegetables to include on the site.

The only problem is that none of the image metadata describes which vegetables are in which photos.

At this point, you could tediously sort through the images and place potato photos in the potato folder, broccoli photos in the broccoli folder, and so forth.

You could also run all the images through a classifier to sort them more easily but, as you discover, training the classifier model still requires labeled data. 

Using prompt engineering, you can write a text-based prompt that you feel will produce the best image classification results.

For example, you could tell the model to show “an image containing potatoes”. The structure of this prompt – or the statement that defines how the model recognizes images – is fundamental to prompt engineering. 

Writing the best prompt is often a matter of trial and error. Indeed, the prompt “an image containing potatoes” is quite different from “a photo of potatoes” or “a collection of potatoes”.
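As a toy illustration, here is a minimal sketch of how you might compare those three phrasings against the same image, using OpenAI's publicly released CLIP model via the Hugging Face transformers library. The file name potatoes.jpg is a hypothetical placeholder:

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("potatoes.jpg")  # hypothetical local image
prompts = [
    "an image containing potatoes",
    "a photo of potatoes",
    "a collection of potatoes",
]

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# logits_per_image holds one similarity score per prompt; softmax turns
# the scores into probabilities over the three phrasings.
probs = outputs.logits_per_image.softmax(dim=1)
for prompt, prob in zip(prompts, probs[0]):
    print(f"{prob.item():.3f}  {prompt}")
```

Running a comparison like this over a sample of your images is one way to discover, by trial and error, which phrasing classifies your dataset best.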

Prompt engineering best practices

Like most processes, the quality of the inputs determines the quality of the outputs. Designing effective prompts increases the likelihood that the model will return a response that is both favorable and contextual.

Writing good prompts is a matter of understanding what the model “knows” about the world and then applying that information accordingly.

Some liken it to the game of charades, where the actor provides just enough information for their partner to figure out the word or phrase.

Think of the model as representing the partner in charades. Just enough information is provided via the training prompt for the model to work out the patterns and accomplish the task at hand.

There is no point in overloading the model with all the information at once; the goal is to give it just enough context to infer the task on its own.

Prompt engineering and the CLIP model

The CLIP (Contrastive Language-Image Pre-training) model was developed by the AI research laboratory OpenAI in 2021.

According to researchers, CLIP is “a neural network trained on a variety of (image, text) pairs. It can be instructed in natural language to predict the most relevant text snippet, given an image, without directly optimizing for the task, similarly to the zero-shot capabilities of GPT-2 and 3.”

CLIP is a neural network trained on over 400 million image-text pairs, each consisting of an image matched with a caption.

Using this training, one can input an image into the model along with a set of candidate captions, and it will pick out the caption it believes matches the image most closely.

The above quote also touches on the zero-shot capabilities of CLIP, which make it somewhat special among machine learning models.

Most classifiers trained to recognize apples and oranges, for example, are expected to perform well on classifying apples and oranges but generally won’t detect bananas.

Models such as CLIP, GPT-2, and GPT-3, however, can recognize bananas anyway. In other words, they can execute tasks that they weren't explicitly trained to perform. This ability is known as zero-shot learning.
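Here is a sketch of that zero-shot behavior, using the same CLIP setup as above. The label “banana” never appears as a training target, yet the model can still score it; fruit.jpg is a hypothetical placeholder:

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Candidate classes, including one ("banana") that no classifier head
# was ever explicitly trained to detect.
labels = ["apple", "orange", "banana"]
prompts = [f"a photo of a {label}" for label in labels]

image = Image.open("fruit.jpg")  # hypothetical local image
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=1)

best = probs[0].argmax().item()
print(f"Predicted: {labels[best]} ({probs[0][best].item():.2%})")
```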

Examples of prompt engineering

As of 2022, the evolution of AI models is accelerating. And this is making prompt engineering more and more important.

We first got text-to-text with language models like GPT-3, BERT, and others.

Then we got text-to-image with DALL-E, Imagen, Midjourney, and Stable Diffusion.

At this stage, we're moving to text-to-video with Meta's Make-A-Video, and now Google is developing its own Imagen Video.

Effective AI models today focus on achieving more with much, much less data!

One example is DreamFusion: Text-to-3D using 2D Diffusion, built by the Google Research lab.

In short, AI diffusion models are generative models, meaning they produce outputs similar to the data on which they have been trained.

By definition, diffusion models work by progressively adding noise to the training data and then generating outputs by learning to reverse that noising process.
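As a deliberately oversimplified sketch of that idea (a single noise step on toy 2D data, rather than the many scheduled steps real diffusion models use), the training loop looks roughly like this:

```python
import torch
import torch.nn as nn

# A tiny "denoiser" network: given noisy 2D points, predict the noise
# that was added to them.
denoiser = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(denoiser.parameters(), lr=1e-3)

data = torch.randn(256, 2)  # stand-in for real training data

for step in range(1000):
    noise = torch.randn_like(data)
    noisy = data + noise               # forward (noising) process
    predicted_noise = denoiser(noisy)  # the model learns to undo it
    loss = nn.functional.mse_loss(predicted_noise, noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Once the denoiser is trained, generation runs the process in reverse: start from pure noise and repeatedly subtract the predicted noise until a plausible sample emerges.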

DreamFusion, by Google Research, is able to translate text into 3D imagery without a large-scale dataset of labeled 3D data (which doesn't exist today).

And that’s the thing!

As explained by the research group:

“Adapting this approach to 3D synthesis would require large-scale datasets of labeled 3D data and efficient architectures for denoising 3D data, neither of which currently exist. In this work, we circumvent these limitations by using a pretrained 2D text-to-image diffusion model to perform text-to-3D synthesis.”

Why is this relevant?

In a web that has been primarily text-based or 2D-image-based for over two decades, it's now time to enable richer formats, like 3D, which can work well in AR environments.

In short, imagine wearing a pair of Google's AR glasses while these AI models underneath enhance the real world with 3D objects generated on the fly, making AR experiences far more compelling.

Meta AI's Make-A-Video system, for instance, takes a prompt and gives back a short-form video.

At the same time, OpenAI announced speech-to-text with Whisper.
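Whisper was also released as an open-source Python package, so a speech-to-text sketch takes only a few lines; interview.mp3 is a hypothetical file name:

```python
import whisper  # pip install openai-whisper

model = whisper.load_model("base")           # small pretrained checkpoint
result = model.transcribe("interview.mp3")   # hypothetical audio file
print(result["text"])                        # the transcribed speech
```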

Combined, these AI models create a multimodal environment in which a single person or a small team can leverage all of these tools for content generation, filmmaking, medicine, and more!

This means a few industries that could not be entered before become more easily scalable, as barriers to entry are knocked down.

It’s possible to test/launch/iterate much faster, thus enabling markets to evolve more quickly.

After almost 30 years of the Internet, many industries (from healthcare to education) are still locked into old paradigms.

A decade of AI might completely reshuffle them.

These AI models are all driven by prompts, yet prompting has such subtleties that small variations in wording can lead the machine to produce dramatically different outputs.

Just in October 2022:

  • Stability AI announced $101 million in funding for open-source artificial intelligence.
  • Jasper AI, a startup developing what it describes as an “AI content” platform, raised $125 million at a $1.5 billion valuation. Jasper is in the process of acquiring AI startup Outwrite, a grammar and style checker with more than a million users.
  • OpenAI, valued at nearly $20 billion, entered advanced talks with Microsoft for more funding.

Today, prompting lets you generate a growing variety of outputs.

open-ai-use-cases
Some of OpenAI's use cases that can be addressed via prompting, from Q&A to classifiers and code generators. The number of use cases that prompting enables is growing exponentially.
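As a sketch of what one of these use cases looks like in practice, here is a minimal text-classification prompt sent through OpenAI's completions API as it existed at the time of writing; the API key and review text are placeholders:

```python
import openai  # pip install openai (pre-1.0 API, as of 2022)

openai.api_key = "YOUR_API_KEY"  # placeholder

response = openai.Completion.create(
    model="text-davinci-002",
    prompt=(
        "Classify the sentiment of this review as positive or negative.\n\n"
        "Review: The soup arrived cold.\n"
        "Sentiment:"
    ),
    max_tokens=5,
    temperature=0,  # deterministic output for classification
)
print(response.choices[0].text.strip())
```

The same model handles Q&A, summarization, or code generation simply by swapping out the prompt text.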

For fun, I used DreamStudio, the platform built by Stability AI, to generate a portrait of Elon Musk in Caravaggio's style:

By prompting the machine with “Elon Musk portrait in Caravaggio's style” the AI generated the image above.

Another cool application? You can design your own shoes with prompting:

I prompted DreamStudio to generate a pair of custom sneakers.
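DreamStudio itself is a web interface, but the underlying open-source Stable Diffusion model can be prompted the same way in code. Here is a minimal sketch using the Hugging Face diffusers library; the model version and output file name are illustrative:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the open-source Stable Diffusion weights; a CUDA GPU is assumed.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

prompt = "a portrait of Elon Musk in Caravaggio's style"
image = pipe(prompt).images[0]  # run the reverse-diffusion sampling loop
image.save("portrait.png")
```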

Key takeaways:

  • Prompt engineering is a natural language processing (NLP) concept that involves discovering inputs that yield desirable or useful results.
  • Like most processes, the quality of the inputs determines the quality of the outputs in prompt engineering. Designing effective prompts increases the likelihood that the model will return a response that is both favorable and contextual.
  • Developed by OpenAI, the CLIP (Contrastive Language-Image Pre-training) model, trained on over 400 million image-caption pairs, is an example of a model that uses natural-language prompts to classify images.

Read Next: AI Chips, AI Business Models, Enterprise AI, How Much Is The AI Industry Worth?, AI Economy.

Connected Business Frameworks

Artificial Intelligence vs. Machine Learning

artificial-intelligence-vs-machine-learning
Generalized AI consists of devices or systems that can handle all sorts of tasks on their own. The extension of generalized AI eventually led to the development of machine learning. As an extension of AI, machine learning (ML) uses computer algorithms to create programs that automate actions. Without explicitly programmed actions, systems can learn and improve from experience, exploring large sets of data to find common patterns and formulating analytical models through learning.

AIOps

aiops
AIOps is the application of artificial intelligence to IT operations. It has become particularly useful for modern IT management in hybridized, distributed, and dynamic environments. AIOps has become a key operational component of modern digital-based organizations, built around software and algorithms.

Machine Learning Ops

mlops
Machine Learning Ops (MLOps) describes a suite of best practices that successfully help a business run artificial intelligence. It consists of the skills, workflows, and processes to create, run, and maintain machine learning models to help various operational processes within organizations.

Continuous Intelligence

continuous-intelligence-business-model
Business intelligence models have transitioned to continuous intelligence, where a dynamic technology infrastructure is coupled with continuous deployment and delivery. In short, software offered in the cloud integrates with the company's data, leveraging AI/ML to provide real-time answers to current issues the organization might be experiencing.

Continuous Innovation

continuous-innovation
Continuous innovation is a process that requires a continuous feedback loop to develop a valuable product and build a viable business model. It is a mindset where products and services are designed and delivered to tune them around the customers' problems rather than the technical solutions of their founders.

Technological Modeling

technological-modeling
Technological modeling is a discipline that provides the basis for companies to sustain innovation, developing incremental products while also looking at breakthrough innovative products that can pave the way for long-term success. In a sort of barbell strategy, technological modeling suggests a two-sided approach: on the one hand, keep sustaining continuous innovation as a core part of the business model; on the other hand, place bets on future developments that have the potential to break through and take a leap forward.

Business Engineering

business-engineering-manifesto

Tech Business Model Template

business-model-template
A tech business model is made of four main components: value model (value propositions, mission, vision), technological model (R&D management), distribution model (sales and marketing organizational structure), and financial model (revenue modeling, cost structure, profitability, and cash generation/management). These elements, coming together, can serve as the basis to build a solid tech business model.

