AI Language Models, GPTs, and Their Role in Government

Explore AI technologies, including small and large language models like GPTs, their architectures, capabilities, and practical applications in tasks such as text generation, summarization, language translation, and AI-assisted workflows.

AI technologies such as language models are transforming how computers process, understand, and generate human language. By comparing small and large language models, including powerful systems like GPTs, users can better understand their capabilities and practical applications in both everyday and specialized contexts.

Key Insights

  • Language models, including small (SLMs) and large (LLMs), use neural networks to analyze and generate human language, with LLMs offering greater contextual understanding and versatility.
  • Small language models are lightweight and well-suited for localized tasks such as website chatbots or mobile autocomplete features, though they operate with limited scope and data.
  • Large language models like GPTs are trained on vast datasets and use self-attention mechanisms to generate coherent, human-like responses across a wide range of tasks without specific retraining.

This lesson is a preview from our AI Fundamentals for Government Employees Course. Enroll in a course for detailed lessons, live instructor support, and project-based training.

Let's get ready to introduce language models and GPT-style systems. We'll also provide an overview of current AI technologies and their capabilities.

We'll examine some government use cases for AI applications, and we'll practice using AI assistants to support our daily work tasks. We'll also talk about improving AI outputs by crafting better prompts and testing different approaches when using AI. So let's talk about language models.

Just as we built an AI that could detect cats in our earlier example, people have built AI models that specialize in all things language. A language model is a type of AI trained on the patterns of human language. These models help computers understand, generate, and predict text, kind of like a super smart autocomplete: a tool that can guess what comes next in a sentence based on the words you've already typed.
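To make the autocomplete analogy concrete, here is a minimal sketch of a statistical next-word predictor. It simply counts which word most often follows each word in a tiny training text; this is far simpler than a real neural language model, and the corpus and function names are illustrative only:

```python
from collections import Counter, defaultdict

def train_bigram_model(text):
    """Count, for each word, which words tend to follow it."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following

def predict_next(model, word):
    """Return the most frequent next word, or None if the word is unseen."""
    counts = model.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

corpus = "the cat sat on the mat and the cat slept"
model = train_bigram_model(corpus)
print(predict_next(model, "the"))  # "cat" follows "the" most often here
```

A neural language model does the same job of predicting what comes next, but it learns much richer patterns than raw word counts.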

Now, there are a couple of types of language models that do different things. One type is the SLM, or small language model. Older statistical language models relied on word-count probabilities drawn from a body of text, but a modern SLM uses a neural network architecture to process language.

Though significantly smaller than their large counterparts, small language models are much more computationally demanding than statistical models. However, they're still lightweight enough to be deployed locally on consumer hardware or edge devices, which is a major advantage. You may have encountered small language models in a chatbot on a website, in your phone's autocomplete, or in some other type of edge computing.

A small language model doesn't have the full breadth of the internet at its disposal: you can't ask your phone provider's chatbot about the game last night or what the weather is like today, but you can ask it questions related to its specific task. In examples like those, you're probably seeing small language models at play. Their larger counterpart is aptly named the large language model.

LLMs are trained on massive amounts of data, like millions of books, articles, and websites, and the resulting models contain billions, if not trillions, of parameters. So they really understand quite a bit. LLMs can work with much larger chunks of text, sometimes entire paragraphs, documents, or even whole books, and because they're so much bigger, they can capture context and meaning in ways that small models can't.

LLMs are better at understanding the meaning of words in a specific context, so they can answer more complicated questions, generate more natural-sounding text, and even hold conversations with you. They can also be creative and generate content like stories, essays, code, or poetry, which an SLM couldn't do.

Large language models can also perform many tasks, like language translation, summarization, and question answering, without being specifically trained for each one. So hopefully that difference is pretty clear. The reason I bring it up is that we have all probably encountered GPTs, and GPTs are a specific type of large language model.
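The "many tasks, one model" idea works through prompting: you describe the task in plain text, and the same model handles it. As an illustration only (the prompt wording below is hypothetical, and `build_prompt` is not a real API), the calling code might wrap one input in different task instructions like this:

```python
def build_prompt(task, text, target_language=None):
    """Wrap the same input text in different task instructions.
    The exact wording is illustrative; any clear instruction works.
    Unused keyword arguments to str.format are simply ignored."""
    templates = {
        "summarize": "Summarize the following text in one sentence:\n{text}",
        "translate": "Translate the following text into {lang}:\n{text}",
        "answer": "Answer the question based on the text below:\n{text}",
    }
    return templates[task].format(text=text, lang=target_language)

memo = "The agency will migrate its records system next quarter."
print(build_prompt("summarize", memo))
print(build_prompt("translate", memo, target_language="Spanish"))
```

No retraining happens between tasks; only the instruction around the input changes, and the model responds accordingly.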

GPT stands for Generative Pre-trained Transformer. To break that down: generative means it makes things. It doesn't just execute one specific task, like identifying a cat, and stop there; it creates new content.

Pre-trained means the training work has already been done, so the model is ready to be used. And the word transformer refers to the model's architecture, which we won't get into in this course. You can think of a GPT as a person who loves to read and write, has read a vast number of books, articles, websites, and magazines, and is eager to give you human-like responses to your questions and tasks and simulate a conversation with you.

It's ready and waiting for you. A GPT uses a mechanism called self-attention, which allows the model to weigh the importance of different words in the text, regardless of their position, to understand context and relationships. This enables the GPT to generate coherent and relevant responses by predicting the most likely next word based on the entire input, rather than just the individual words that came immediately before it.
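To show the core idea behind self-attention, here is a toy, pure-Python sketch of scaled dot-product attention over a few made-up word vectors. It is a simplification for intuition only: real transformers use learned projections, many attention heads, and far larger vectors:

```python
import math

def softmax(xs):
    """Normalize a list of scores into weights that sum to 1."""
    exps = [math.exp(x - max(xs)) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Toy scaled dot-product self-attention.

    Each output vector is a weighted mix of ALL value vectors;
    the weights come from how well each query matches each key,
    regardless of where the words sit in the sentence."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this word's query to every word's key
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        # Blend all value vectors according to those weights
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

# Three toy 2-dimensional "word" vectors standing in for a short sentence
vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = self_attention(vecs, vecs, vecs)
```

Because every output mixes information from every position, the model can relate a word to relevant words anywhere in the input, which is what lets a GPT track context across long passages.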


Brian Simms

Brian Simms teaches for Graduate School USA in the area of Artificial Intelligence, helping federal agencies build the knowledge and skills needed to adopt AI responsibly and effectively. An AI educator and author, he focuses on practical, mission-driven applications of AI for government leaders, program managers, and technical professionals.
