Prompt engineering is a natural language processing (NLP) concept that involves discovering inputs that yield desirable or useful results. Prompting is the equivalent of telling the genie in the magic lamp what to do. In this case, the magic lamp is DALL-E, ready to generate any image you wish for.
In-context learning via prompting
In biology, emergence is an incredible property: parts that come together can, as a result of their interactions, show new behaviors (called emergent) that you can't see at a smaller scale.
The even more incredible thing is that even if the smaller-scale version seems similar to the larger-scale one, the larger scale has more parts and more interactions, so it can eventually show a completely different set of behaviors.
And there is no way of predicting what this behavior might be.
That’s the beauty (for better or worse) of scale!
In the current AI revolution, the most exciting aspect is the rise of emergent properties of machine learning models working at scale.
And it all started with the ability to train those AI models in an unsupervised manner. Indeed, unsupervised learning has been one of the key tenets of this AI revolution, and it has unblocked much of the AI progress of the last few years.
Before 2017, most AI worked by leveraging supervised learning on small, structured datasets, which could train machine learning models only on very narrow tasks.
After 2017, with a new architecture called the transformer, things started to change.
This new architecture could be used with an unsupervised learning approach. The machine learning model could be pre-trained on a very large, unstructured dataset with a very simple objective function: predicting the next piece of text from the text that precedes it.
The exciting aspect is that, in order to learn how to perform this prediction properly (what might seem a very simple task), the machine learning model started to learn a bunch of patterns and heuristics about the data on which it was trained.
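To see why no labels are needed, here is a minimal Python sketch of how raw text supplies its own prediction targets: every word becomes the target for the words that come before it. It is an illustration only, assuming whitespace-separated words (real models use subword tokens), and the function name `next_token_pairs` is hypothetical.

```python
def next_token_pairs(text: str, context_size: int = 3) -> list[tuple[tuple[str, ...], str]]:
    """Turn unlabeled text into (context, target) pairs for next-word prediction.

    Illustrative only: splits on whitespace instead of using a real tokenizer.
    """
    words = text.split()
    pairs = []
    for i in range(1, len(words)):
        # The context is up to `context_size` words immediately before position i.
        context = tuple(words[max(0, i - context_size):i])
        # The "label" is simply the next word in the text: no human annotation needed.
        pairs.append((context, words[i]))
    return pairs


if __name__ == "__main__":
    for context, target in next_token_pairs("the cat sat on the mat"):
        print(context, "->", target)
```

Every sentence of ordinary text yields several training examples this way, which is what makes pre-training on very large, unlabeled datasets possible.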
This enabled the machine learning model to learn a wide variety of tasks.
Rather than trying to perform a single task, the large language model started to infer patterns from the data and re-used those when performing new tasks.
This has been a core revolution. In addition, the other turning point, which came out with the GPT-3 paper, was the ability to prompt these models.
In short, it enabled these models to pick up a user's context through natural language instructions, which could dramatically change the model's output.
This other aspect was also emergent, as no one expressly asked for it. This is how we got in-context learning, via prompting, as a core, emergent property of current machine learning models.
Understanding prompt engineering
Prompt engineering is a key, emergent property of the current AI paradigm.
One of its most interesting aspects is that it came out of scaling up the transformer architecture to train large language models.
Just as the wishes you express to a genie can turn against you, the way you phrase what the machine needs to do can dramatically change its output.
And the most interesting part?
Prompting was not a feature deliberately developed by AI experts. It was an emergent feature. In short, as these huge machine learning models were developed, prompting became the way to get the machine to carry out a task.
No one asked for it; it just happened!
In a 2021 paper, "On the Opportunities and Risks of Foundation Models," researchers from Stanford highlighted how transformer-based models had become foundation models.
As explained in the same paper:
The story of AI has been one of increasing emergence and homogenization. With the introduction of machine learning, how a task is performed emerges (is inferred automatically) from examples; with deep learning, the high-level features used for prediction emerge; and with foundation models, even advanced functionalities such as in-context learning emerge. At the same time, machine learning homogenizes learning algorithms (e.g., logistic regression), deep learning homogenizes model architectures (e.g., Convolutional Neural Networks), and foundation models homogenizes the model itself (e.g., GPT-3).
Prompt engineering is a process used in AI where one or several tasks are converted to a prompt-based dataset that a language model is then trained to learn.
The motivation behind prompt engineering can be difficult to understand at face value, so let’s describe the idea with an example.
Imagine that you are establishing an online food delivery platform and you possess thousands of images of different vegetables to include on the site.
The only problem is that none of the image metadata describes which vegetables are in which photos.
At this point, you could tediously sort through the images and place potato photos in the potato folder, broccoli photos in the broccoli folder, and so forth.
You could also run all the images through a classifier to sort them more easily but, as you discover, training the classifier model still requires labeled data.
Using prompt engineering, you can write a text-based prompt that you feel will produce the best image classification results.
For example, you could ask the model to match images against the prompt "an image containing potatoes". The structure of this prompt (the statement that defines how the model recognizes images) is fundamental to prompt engineering.
Writing the best prompt is often a matter of trial and error. Indeed, the prompt “an image containing potatoes” is quite different from “a photo of potatoes” or “a collection of potatoes.”
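Here is a minimal Python sketch of that trial-and-error loop, assuming a zero-shot classifier that takes one text prompt per class (as CLIP-style models do). Each candidate template is filled in with every label; you would then keep whichever template classifies a small hand-labeled sample best. The templates, labels, and the `build_prompts` function are hypothetical.

```python
# Candidate prompt templates to compare by trial and error (hypothetical).
TEMPLATES = [
    "an image containing {}",
    "a photo of {}",
    "a collection of {}",
]

# Class labels for the vegetable images (hypothetical).
LABELS = ["potatoes", "broccoli", "carrots"]


def build_prompts(template: str, labels: list[str]) -> list[str]:
    """Fill one template with every class label, giving one text prompt per class."""
    return [template.format(label) for label in labels]


if __name__ == "__main__":
    for template in TEMPLATES:
        # Each list would be handed to the classifier; keep the template with the best accuracy.
        print(build_prompts(template, LABELS))
```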
Prompt engineering best practices
As with most processes, the quality of the inputs determines the quality of the outputs. Designing effective prompts increases the likelihood that the model will return a response that is both favorable and contextual.
Writing good prompts is a matter of understanding what the model “knows” about the world and then applying that information accordingly.
Some believe it is akin to the game of charades where the actor provides just enough information for their partner to figure out the word or phrase using their intellect.
Think of the model as the partner in charades: the prompt provides just enough information for the model to work out the patterns and accomplish the task at hand.
There is no point in overloading the model with all the information at once and disrupting its natural flow.
Prompt engineering and the CLIP model
The CLIP (Contrastive Language-Image Pre-training) model was developed by the AI research laboratory OpenAI in 2021.
According to researchers, CLIP is “a neural network trained on a variety of (image, text) pairs. It can be instructed in natural language to predict the most relevant text snippet, given an image, without directly optimizing for the task, similarly to the zero-shot capabilities of GPT-2 and 3.”
CLIP is a neural network trained on over 400 million image-text pairs, each consisting of an image matched with a caption.
Using this training, one can input an image along with a set of candidate captions, and the model will score how well each caption matches the image, picking the one it considers most relevant.
The above quote also touches on the zero-shot capabilities of CLIP, which make it somewhat special among machine learning models.
Most classifiers trained to recognize apples and oranges, for example, are expected to perform well on classifying apples and oranges but generally won’t detect bananas.
Some models, including CLIP, can recognize bananas anyway, and GPT-2 and GPT-3 show the equivalent behavior with text. In other words, they can execute tasks that they weren't explicitly trained to perform. This ability is known as zero-shot learning.
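The sketch below shows the idea behind CLIP-style zero-shot classification: embed the image and one text prompt per candidate class, then pick the prompt with the highest cosine similarity. (CLIP also scales these similarities and applies a softmax, which does not change which prompt wins.) The three-dimensional vectors are invented stand-ins for real encoder outputs, so this illustrates the scoring step only, not CLIP itself.

```python
import math

# Invented 3-D "embeddings" standing in for a trained text encoder's output.
# Real CLIP embeddings have hundreds of dimensions.
TEXT_EMBEDDINGS: dict[str, list[float]] = {
    "a photo of an apple": [0.9, 0.1, 0.0],
    "a photo of an orange": [0.1, 0.9, 0.1],
    "a photo of a banana": [0.0, 0.2, 0.95],  # a new class only needs a new prompt
}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_embedding: list[float], text_embeddings: dict[str, list[float]]) -> str:
    """Return the prompt whose embedding is most similar to the image embedding."""
    return max(text_embeddings, key=lambda prompt: cosine_similarity(image_embedding, text_embeddings[prompt]))


if __name__ == "__main__":
    banana_image = [0.05, 0.15, 0.9]  # pretend output of an image encoder
    print(zero_shot_classify(banana_image, TEXT_EMBEDDINGS))
```

Supporting bananas required no retraining: adding the prompt "a photo of a banana" to the candidate list was enough.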
Examples of prompt engineering
As of 2022, the evolution of AI models is accelerating. And this is making prompt engineering more and more important.
We first got text-to-text with language models like GPT-3, BERT, and others.
Then we got text-to-image with DALL-E, Imagen, Midjourney, and Stable Diffusion.
At this stage, we’re moving to text-to-video with Meta’s Make-A-Video, and now Google’s developing its own Imagen Video.
Effective AI models today focus on getting more with much, much less!
One example is DreamFusion: Text-to-3D using 2D Diffusion, built by Google Research.
In short, AI diffusion models are generative models, meaning they produce an output that is similar to that on which they have been trained.
Diffusion models work by adding noise to the training data and learning to reverse that noising process, so they can generate an output by recovering data from noise.
DreamFusion, by Google Research, is able to translate text into 3D models without a large-scale dataset of labeled 3D data (unavailable today).
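For the noising half of that description, DDPM-style diffusion models use a closed form that jumps directly to step t: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise, where alpha_bar_t is the running product of (1 - beta) and the noise is Gaussian. This minimal Python sketch assumes the linear beta schedule from the original DDPM paper (1e-4 to 0.02 over 1,000 steps) and covers only the forward process; the reverse process is what a trained neural network learns, and it is not shown here.

```python
import math
import random
from itertools import accumulate
from operator import mul

T = 1000  # number of noising steps
# Linear beta schedule from 1e-4 to 0.02 (the values used in the DDPM paper).
BETAS = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]
# alpha_bar[t] = (1 - beta_0) * (1 - beta_1) * ... * (1 - beta_t)
ALPHA_BARS = list(accumulate((1.0 - beta for beta in BETAS), mul))


def add_noise(x0: list[float], t: int, rng: random.Random) -> list[float]:
    """Jump straight to noising step t for a data point x0 (a list of floats)."""
    a = ALPHA_BARS[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]


if __name__ == "__main__":
    rng = random.Random(0)
    x0 = [1.0, -1.0, 0.5]
    print("t=0:  ", add_noise(x0, 0, rng))      # still very close to x0
    print("t=999:", add_noise(x0, T - 1, rng))  # almost pure noise
```

Because alpha_bar shrinks toward zero, the signal fades and the noise dominates; a generative model starts from that noise and learns to walk the steps backward.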
And that’s the thing!
As explained by the research group:
“Adapting this approach to 3D synthesis would require large-scale datasets of labeled 3D data and efficient architectures for denoising 3D data, neither of which currently exist. In this work, we circumvent these limitations by using a pretrained 2D text-to-image diffusion model to perform text-to-3D synthesis.”
Why is this relevant?
On a web that has been primarily text-based or 2D image-based for over two decades, it is now time to enable richer formats, like 3D, which work well in AR environments.
In short, imagine that you're wearing Google's AR glasses, and the AI models underneath can enhance the real world with 3D objects on the fly, making AR experiences far more compelling.
At the same time, OpenAI announced speech-to-text with Whisper.
Combined, these AI models would create a multi-modal environment where a single person or small team can leverage all these tools for content generation, filmmaking, medicine, and more!
This means that industries that were hard to enter before become more easily scalable, as barriers to entry fall away.
It’s possible to test/launch/iterate much faster, thus enabling markets to evolve more quickly.
If, after almost 30 years of the Internet, many industries (from healthcare to education) are still locked into old paradigms, a decade of AI might completely reshuffle them.
These models are all prompted in a similar way, yet prompting has such subtleties that small variations in the prompt can make the machine produce very different outputs.
In October 2022 alone:
- Stability AI Announces $101 Million in Funding for Open-Source Artificial Intelligence.
- Jasper AI, a startup developing what it describes as an “AI content” platform, has raised $125 million at a $1.5 billion valuation. Jasper is in the process of acquiring AI startup Outwrite, a grammar and style checker with more than a million users.
- OpenAI, valued at Nearly $20 Billion, is in Advanced Talks with Microsoft For More Funding.
Today, with prompting, you can generate a growing number of outputs.
Another cool application? You can design your own shoes with prompting.
Prompting like coding?
On November 30, 2022, OpenAI released ChatGPT, a conversational AI interface with incredible capabilities.
Testing ChatGPT was mind-blowing!
I used it to generate job descriptions.
With a simple prompt, it gave me a pretty accurate output in a matter of a few seconds!
That made me realize this was another turning point for AI…
And that's not all: in the current AI paradigm, these models can also code incredibly well!
ChatGPT is a sibling model to InstructGPT: it was fine-tuned from a model in the GPT-3.5 series using reinforcement learning from human feedback (RLHF), which makes it more helpful and better at following instructions than the base GPT model.
With ChatGPT, you can ask about almost any topic (though this early research preview was restricted in some areas).
There is much more to it.
With ChatGPT, you can turn yourself into a coder.
All you need is prompting!
Here I prompted ChatGPT to generate the code for a stock trading web app!
How much does a prompt engineer make?
Amid the AI buzz and revolution, a prompt engineer can make anywhere between $150,000 and $300,000 per year.
As an interesting example, one job posting advertised a combined prompt engineer and librarian role.
Prompt engineering case study
Here is a prompt engineering example with some best practices included in the process.
Customer refund for a television
Imagine that a customer contacts an electronics company requesting a refund on a television they recently purchased. The company wants to use a model that would assist the customer service department by generating a plausible response.
In a trial run, a hypothetical or “test” customer contacts the company with the following query: Hello, I’d like to get a refund for the television I purchased. Is this possible?
To design the prompt (and, by extension, useful ways for the agent to interact with the customer), the company starts by telling the model the general setting and what the rest of the prompt will contain.
The prompt may read something like this: This is a conversation between a customer and customer care agent who is helpful and polite. The customer’s question: I’d like to get a refund for the television purchased. Is this possible?
Now that the model knows what to expect, it is shown the start of the response it should provide to the customer: Response by the customer care agent: Hello, we appreciate you reaching out to us. Yes,
Combining the first and second parts, the prompt makes clear that the response to the customer query comes from a customer care agent and that the answer should be positive.
Composition of the customer care language model
The above scenario can be summarized by defining the components of the model itself:
- Task description – This is a conversation between a customer and customer care agent who is helpful and polite.
- Input indicator – the customer’s question.
- Current input, and
- Output indicator – Response by the customer care agent: Hello, we appreciate you reaching out to us. Yes,
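The four components map directly onto string assembly. In this minimal Python sketch, the newline layout and the `build_prompt` name are assumptions for illustration, not a format any particular model requires. The output indicator deliberately ends in "Yes," so that the model's completion continues from there.

```python
def build_prompt(task_description: str, input_indicator: str,
                 current_input: str, output_indicator: str) -> str:
    """Assemble a completion-style prompt from its four components (hypothetical layout)."""
    return "\n".join([
        task_description,                      # sets the scene for the model
        f"{input_indicator}: {current_input}",  # labels and supplies the customer's query
        output_indicator,                      # the start of the answer the model should complete
    ])


if __name__ == "__main__":
    print(build_prompt(
        task_description="This is a conversation between a customer and a customer care agent who is helpful and polite.",
        input_indicator="The customer's question",
        current_input="I'd like to get a refund for the television I purchased. Is this possible?",
        output_indicator="Response by the customer care agent: Hello, we appreciate you reaching out to us. Yes,",
    ))
```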
Note that input and output indicators are an effective way to describe desired tasks to the model – especially when multiple examples are included in the prompt. Based on this, the model may produce three text outputs (known as completions) to complete the sentence after the comma:
- Yes, we can accept returns if the television is unused, unopened, and not damaged.
- Yes, we are happy to process a refund for your television purchase. However, please note that we require the television to be returned to your nearest store.
- Yes, this is possible. Please reply with your name, address, phone number, and receipt number at your earliest convenience. One of our customer care staff will be in touch with you as soon as possible.
While this is a somewhat simplified approach, the example shows that the model can produce several plausible completions from only a small number of customer service interactions.
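When a few example exchanges are included, the same input and output indicators repeat for each one, which helps the model infer the pattern from only a handful of interactions. A minimal Python sketch, assuming a simple "Customer:"/"Agent:" layout; the example exchanges and the `build_few_shot_prompt` name are hypothetical.

```python
# Hypothetical worked examples; in practice these would come from approved support transcripts.
EXAMPLES = [
    ("Can I change the delivery address for my order?",
     "Hello, thanks for getting in touch. Yes, you can update the address until the order ships."),
    ("Do you sell wall mounts for this television?",
     "Hello, thanks for your question. Yes, compatible wall mounts are listed on the product page."),
]


def build_few_shot_prompt(task_description: str, examples: list[tuple[str, str]], current_input: str,
                          input_indicator: str = "Customer", output_indicator: str = "Agent") -> str:
    """Build a prompt with worked examples, ending with an open slot for the model's answer."""
    lines = [task_description, ""]
    for question, answer in examples:
        lines.append(f"{input_indicator}: {question}")
        lines.append(f"{output_indicator}: {answer}")
        lines.append("")  # blank line separates examples
    lines.append(f"{input_indicator}: {current_input}")
    lines.append(f"{output_indicator}:")  # the model continues from here
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_few_shot_prompt(
        "This is a conversation between a customer and a customer care agent who is helpful and polite.",
        EXAMPLES,
        "I'd like to get a refund for the television I purchased. Is this possible?",
    ))
```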
In theory, the electronics company could fine-tune the model with examples of how it should respond to specific questions, requests, and comments.
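Fine-tuning examples of this kind are often stored as JSON Lines, one example per line. The sketch below assumes a prompt/completion layout, the shape used by OpenAI's original (legacy) fine-tuning endpoint; newer endpoints expect chat-style messages instead, so check the provider's current documentation. The example pairs are hypothetical.

```python
import json

# Hypothetical (prompt, completion) pairs showing how the agent should reply.
TRAINING_EXAMPLES = [
    ("Customer: I'd like to get a refund for the television I purchased. Is this possible?\nAgent:",
     " Hello, we appreciate you reaching out to us. Yes, please reply with your receipt number and we will start the refund."),
    ("Customer: My remote control stopped working. What should I do?\nAgent:",
     " Hello, thanks for contacting us. Please try new batteries first; if that does not help, we can send a replacement."),
]


def to_jsonl(examples: list[tuple[str, str]]) -> str:
    """Serialize (prompt, completion) pairs as JSON Lines, one JSON object per line."""
    # json.dumps escapes the newlines inside each prompt, so every example stays on one line.
    return "\n".join(
        json.dumps({"prompt": prompt, "completion": completion})
        for prompt, completion in examples
    )


if __name__ == "__main__":
    print(to_jsonl(TRAINING_EXAMPLES))
```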
- Prompt engineering is a natural language processing (NLP) concept that involves discovering inputs that yield desirable or useful results.
- As with most processes, the quality of the inputs determines the quality of the outputs in prompt engineering. Designing effective prompts increases the likelihood that the model will return a response that is both favorable and contextual.
- Developed by OpenAI and trained on over 400 million image-caption pairs, the CLIP (Contrastive Language-Image Pre-training) model is an example of a model that uses text prompts to classify images.