Large language models (LLMs) are AI tools that can read, summarize, and translate text. This enables them to predict words and craft sentences that reflect how humans write and speak.
Understanding large language models
Large language models have transformed natural language processing (NLP) because they have facilitated the development of powerful, pre-trained models for a variety of tasks.
Large language models are trained on vast datasets with hundreds of millions (or even billions) of words. Complex algorithms recognize patterns at the word level and allow the model to learn about natural language and its contextual use.
LLMs such as GPT-2 and BERT have replaced scarce in-house training data and tedious manual feature extraction with massive datasets that train large neural networks. These models rely on the transformer architecture, whose self-attention mechanism lets them parse the data and predict which words come next in a particular phrase or sentence.
For example, given the sentence “He was riding a bicycle”, an LLM can infer what a bicycle is by analyzing swathes of data about the words that tend to surround it. This makes LLMs powerful and versatile AI tools that provide accurate natural language generation, sentiment analysis, summarization, and even question answering.
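The idea of predicting a word from the words around it can be sketched with a toy model. The example below is a minimal illustration, not how a real LLM works: instead of a neural network it simply counts which word follows which in a tiny corpus, then predicts the most frequent follower.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count which word follows which across a corpus of sentences."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
    return follows

def predict_next(follows, word):
    """Return the most frequent follower of `word` seen in training."""
    candidates = follows.get(word.lower())
    return candidates.most_common(1)[0][0] if candidates else None

corpus = [
    "he was riding a bicycle",
    "she was riding a horse",
    "he was riding a bicycle to work",
]
model = train_bigrams(corpus)
print(predict_next(model, "riding"))  # → "a"
print(predict_next(model, "a"))       # → "bicycle"
```

A real LLM replaces these raw counts with billions of learned parameters and looks at far more context than a single preceding word, but the principle is the same: word patterns in the training data drive the prediction.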
How are large language models trained?
Large language models are fed text excerpts that have been partially obscured, or masked. The neural network endeavors to predict the missing parts and then compares the prediction to the actual text.
The neural network performs this task repeatedly and adjusts parameters based on the results. Over time, it builds a mathematical model of how words appear next to each other in phrases and sentences.
Note that the larger the neural network, the greater the LLM’s capacity to learn. The LLM’s output is also dependent on the size and quality of the dataset. If the model is exposed to high-quality, well-curated text, it will be exposed to a more diverse and accurate array of word sequences and make better predictions.
Large language model examples
Turing NLG is a 17-billion parameter LLM developed by Microsoft. When it was released in early 2020, it was the largest such model to date.
The model is a transformer-based generative language model. This means it can generate words to finish an incomplete sentence, answer questions with direct answers, and provide summaries of various input documents.
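The transformer architecture mentioned above centers on self-attention: each token's representation is updated by mixing in every other token's representation, weighted by similarity. The sketch below is a bare-bones, dependency-free illustration of that one operation (single head, no learned projections or scaling), not an implementation of Turing NLG.

```python
import math

def softmax(xs):
    """Normalize raw similarity scores into weights that sum to 1."""
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def self_attention(vectors):
    """For each token vector, blend all token vectors together,
    weighted by their dot-product similarity to it."""
    outputs = []
    for query in vectors:
        weights = softmax([dot(query, key) for key in vectors])
        mixed = [sum(w * vec[d] for w, vec in zip(weights, vectors))
                 for d in range(len(query))]
        outputs.append(mixed)
    return outputs

# Three toy 2-dimensional token embeddings.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = self_attention(tokens)
```

Real transformers apply learned query/key/value projections, many attention heads, and stacks of such layers, but this weighted-mixing step is what lets the model condition every generated word on the full input context.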
Gopher is a 280-billion-parameter model developed by DeepMind. Gopher was based on research into areas where the scale of the model boosted performance, such as reading comprehension, fact-checking, and the identification of toxic language.
Research has found that Gopher excels on Massive Multitask Language Understanding (MMLU), a benchmark that covers model knowledge and problem-solving ability across 57 subjects spanning STEM disciplines, the humanities, the social sciences, and more.
OpenAI’s GPT-3 was fed around 570GB of filtered text sourced largely from the publicly available dataset known as Common Crawl.
With one of the largest neural networks ever released, GPT-3 can recreate anything that has a language structure. This includes answers to questions, essays, summaries, translations, memos, and computer code.
- Large language models (LLMs) are AI tools that can read, summarize, and translate text. They can predict words and craft sentences that reflect how humans write and speak.
- Large language models are fed text excerpts that have been partially obscured, or masked. The neural network endeavors to predict the missing parts and then compares the prediction to the actual text.
- Three popular and powerful large language models include Microsoft’s Turing NLG, DeepMind’s Gopher, and OpenAI’s GPT-3.
Connected AI Concepts
Deep Learning vs. Machine Learning
OpenAI Organizational Structure
Stability AI Ecosystem