Evaluating output from AI assistants requires more than simply reading the response—it demands a critical eye toward accuracy, completeness, and appropriateness. By systematically reviewing key aspects of the content, users can avoid misinformation, miscommunication, and potential reputational risk.
Key Insights
- Verify factual accuracy by cross-checking citations, source legitimacy, URLs, and quoted material to avoid falling victim to AI-generated misinformation or hallucinations.
- Assess whether the AI output fully addresses the original prompt, delivers relevant and complete answers, and presents information clearly and understandably for the intended audience.
- Evaluate tone and context suitability, ensuring the content meets the expectations of the target audience and setting, especially in professional or public-facing communication.
This lesson is a preview from our AI Prompt Engineering for the Government Workforce Course. Enroll in a course for detailed lessons, live instructor support, and project-based training.
As we get output from our AI assistant, there are a few things we want to keep in mind. The first is accuracy. Is the statement factually correct? Are sources listed? And even if sources are listed, are they legitimate? It doesn't matter if a citation takes you out to Brian Sim's blog or something like that; that's not necessarily a reputable source.
So you're going to want to fact-check all of those sources for accuracy. Next is prompt adherence: did it do everything you asked it to do? Take the two seconds to ask yourself that question before you copy and paste the output into an email to your boss.
Did it actually accomplish everything you asked it to accomplish? That's a very good thing to verify. Next, does it actually answer your question? I've certainly had the experience where everything the AI said is correct, and it did what I told it to do, but it isn't really answering my question. Maybe I need to ask the question in a different way, or maybe it answered it and did exactly what I said, but that still doesn't solve my problem.
I still don't have what I need. So you definitely want to check that the output is relevant. Is it clear? Is it easy to understand and easy to follow? If you're having trouble following it, chances are someone else will too.
A little tip: one prompt I use all the time, because when I'm engaging with an AI assistant it's often about something I don't understand, is to ask it to explain this to me as if I were five years old, or in the simplest terms it can possibly imagine. Obviously, the five-year-old level isn't always appropriate for your audience, but you can wrestle with your AI assistant to get the explanation to a level of clarity where, if it makes sense to you, it will make sense to those around you.
And in harmony with that, does the tone fit the audience? Just because I can understand it, and it's written in plain Brian language, doesn't mean it's appropriate for an executive presentation or something we want to release to the public. So that's another great thing to check. It can be correct, it can have adhered to the prompt, it can be very clear, but is it the correct tone? That's a fantastic thing to look for before putting your answer out into the world.
And finally, completeness. Are there any important parts missing? This may sound a little redundant with some of the other things to judge output by, but I keep it in here because I think it's pivotal. When you're in the middle of the throes of doing the work, it's easy to think: okay, it finally did everything I asked it to do, it seems complete, all the sources are accurate, it checks all the boxes. But this is your zoom-out step.
Does this cover everything that needs to be covered here? Or maybe it is fine, given the context it's being put into. Either way, these are six quick things to check whenever you get output from your AI assistant. And we always want to check our AI's output, because there is something out there called AI hallucinations.
A hallucination means the AI has literally just made something up. There are plenty of cases where folks have gotten into trouble because they used AI and didn't follow the steps we just listed. So what can you do? You can cross-check unfamiliar terms, statistics, or names.
Just because it sounds smart or seems really confident, because it reads like a real report and follows the structure of reports like it, doesn't necessarily mean it's correct. Again, there have been cases where folks have gotten into quite a bit of trouble for citing case law that doesn't exist; they clearly used AI, and the AI just made it up.
So we also want to be wary of confident or specific claims that don't include citations, because the AI assistant, as with that case law, may make up quotes, studies, or data just to complete the prompt, and it may not actually be true. We can always ask the AI assistant to cite its sources, but even then, don't blindly trust them.
AI may also generate fake links or incorrect references. So a really good rule of thumb is to always verify the URLs, the publication names, and any information it gives you; go out and see for yourself.
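If you have a long list of citations to run down, even a tiny script can speed up the first pass. This is only a rough sketch, assuming the third-party `requests` library and a hand-typed list of links standing in for whatever your assistant actually cited; all it does is check that each URL resolves.

```python
# Rough sketch: check whether each cited URL actually resolves.
# Assumes the third-party `requests` library; the link list below is a
# placeholder for the citations your AI assistant produced.
import requests

cited_links = [
    "https://www.usa.gov/",
    "https://example.com/made-up-report-2024",  # the kind of link an AI might invent
]

for url in cited_links:
    try:
        response = requests.get(url, timeout=10, allow_redirects=True)
        status = response.status_code
        note = "reachable" if status < 400 else f"problem (HTTP {status})"
    except requests.RequestException as err:
        note = f"unreachable ({type(err).__name__})"
    print(f"{url} -> {note}")
```

Keep in mind that a link that loads is not proof the page says what the AI claims it says; this only catches dead or invented URLs quickly, and you still need to read the source yourself.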
Asking your AI to think longer, or to use deep research or whatever your platform's equivalent of ChatGPT's deep research is, may help reduce hallucinations because it's drawing on real web data, but it's still not perfect. It can still misrepresent those sources or synthesize an incorrect summary. So again, always, always check the cited links and the quoted content. And we'll leave the topic of AI hallucinations with this.
A real red flag that your AI may be hallucinating is overly polished or vague authoritative language. When it makes a really strong assertion but does nothing to back it up, just as when a person does it, that's a warning sign. "Recent studies have shown"? Well, what recent studies? I need you to cite your sources. These are really good things to look out for to make sure you're not falling victim to AI hallucinations.
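To make that last red flag a bit more concrete, here is a rough sketch of a mechanical first pass. The phrase list and the citation heuristic are assumptions chosen for illustration; it only flags sentences for a human to re-read, and it cannot judge whether a claim is actually true.

```python
# Rough sketch: flag vague "authority" phrases that appear without a citation.
# The phrase list and the citation heuristic are illustrative assumptions.
import re

RED_FLAG_PHRASES = [
    r"recent studies have shown",
    r"studies show",
    r"experts agree",
    r"it is well known",
    r"research indicates",
]

# Very rough heuristic: a sentence "has a citation" if it contains a URL,
# a parenthetical year like (2023), or a bracketed reference like [4].
CITATION_PATTERN = re.compile(r"https?://|\(\d{4}\)|\[\d+\]")

def flag_unsupported_claims(text: str) -> list[str]:
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        has_red_flag = any(re.search(p, lowered) for p in RED_FLAG_PHRASES)
        if has_red_flag and not CITATION_PATTERN.search(sentence):
            flagged.append(sentence.strip())
    return flagged

draft = "Recent studies have shown a 40% increase in productivity. See the 2023 GAO report."
for claim in flag_unsupported_claims(draft):
    print("Check this claim:", claim)
```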