
self-study / Technology

Jun. 25, 2025

Why ChatGPT writes fake court opinions

Clint Ehrlich

Partner
Trial Lawyers for Justice

Clint Ehrlich is a partner at Trial Lawyers for Justice and a computer scientist who served as a principal investigator for the National Science Foundation. He sits on the AI Committee of the U.S. Court of Appeals for the Ninth Circuit.


The most consequential legal technology of the century goes by two letters: AI. But few lawyers understand what is actually going on behind the scenes when they submit a query to an artificial intelligence platform like ChatGPT. That is evident from the sanction orders recently imposed on even high-profile law firms for the misuse of AI.

As both a computer scientist and an appellate attorney, I see myself as a bridge between the world of artificial intelligence and that of legal practice. If all lawyers grasped the internal architecture of AI systems, they would be less prone to blindly trust them and more capable of harnessing their powers. That is the goal of this article: to explain how today's "LLMs" (Large Language Models) work, so lawyers can develop a healthy respect for their capabilities and limits.

Distilled to their essence, LLMs are like a souped-up version of the autocorrect feature on your iPhone. When you are composing a text message, autocorrect looks at your draft and suggests the word that it considers most likely based on what you have typed in the past. If you accept every suggestion that autocorrect makes, you will end up with a primitive AI impersonation of your writing style. For example, I am going to have autocorrect complete this sentence, and then I will be able to respond substantively to your email.

Those last 12 words are not mine. They are the product of a statistical model running inside my smartphone. They sound vaguely like something I would write but, as you probably noticed, they do not make much sense in context. That is because the patterns autocorrect learns are very simple. If you write "Supreme," it will suggest "Court." If you write "love," it knows "you" is likely to follow. But it does not base its suggestions on the full context of your conversation.
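To make that concrete, here is a minimal sketch of the last-word lookup autocorrect performs. The training text, word pairs, and the suggest helper are all invented for illustration; a real keyboard model is trained on your own typing history, but the principle is the same.

```python
from collections import Counter, defaultdict

# A toy "autocorrect": it learns only which word tends to follow which
# other word (a bigram model). The training text is invented purely
# for illustration.
training_text = "I love you . the Supreme Court held . the Supreme Court ruled"

follower_counts = defaultdict(Counter)
words = training_text.split()
for prev, nxt in zip(words, words[1:]):
    follower_counts[prev][nxt] += 1  # count each observed word pair

def suggest(last_word):
    """Suggest the most frequent follower of the last word typed."""
    followers = follower_counts.get(last_word)
    return followers.most_common(1)[0][0] if followers else None

print(suggest("Supreme"))  # -> Court
print(suggest("love"))     # -> you
```

Notice that the lookup sees only one word at a time. Everything else you typed before it is invisible to the model.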

LLMs do. That is what makes them so powerful. They predict the next word in a sentence based on the sequence of preceding words, even if that sequence is thousands of words long. Then they add the predicted word to the end of the sequence. That creates a new sequence, which the LLM uses to predict the new next word. Repeating that process over and over allows an LLM to extend a sequence indefinitely by tacking on more and more words to the end. (Technically, LLMs break up words into smaller chunks called tokens, and they inject randomness to avoid making the same predictions every time, but these details do not alter the basic picture we are sketching.)
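In code, that generation loop is remarkably short. The sketch below is illustrative only: predict_next_word is a hypothetical stand-in for the real model, and the candidate words and their probabilities are invented. The two points to notice are that the entire sequence is the input, and that each sampled word is appended and fed back in.

```python
import random

def predict_next_word(sequence):
    # Hypothetical stand-in for the model: a real LLM scores every
    # word in its vocabulary against the *entire* preceding sequence.
    # These candidates and probabilities are invented.
    candidates = {"court": 0.6, "judge": 0.3, "statute": 0.1}
    words = list(candidates)
    weights = list(candidates.values())
    # Sampling, rather than always taking the top word, is the
    # injected randomness mentioned above.
    return random.choices(words, weights=weights)[0]

sequence = ["The", "opinion", "was", "written", "by", "the"]
for _ in range(5):
    next_word = predict_next_word(sequence)  # predict from full context
    sequence.append(next_word)               # feed the new word back in
print(" ".join(sequence))
```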

To understand how LLMs predict convincing words, imagine a chart where the vertical axis is the length of a word, and the horizontal axis is alphabetical order. Every possible word has a unique location on the chart based on those two attributes. And because every text is a sequence of words, whether it is a novel, a brief, or a Wikipedia article, it is also a unique pattern of movement between locations on the chart.

Now, imagine that we added another axis, representing the rarity of each word. Every sequence of words still traces a unique shape on our chart, but now those shapes are patterns of movement in a three-dimensional cube instead of on a two-dimensional plane. We could train a model to learn the patterns that texts make inside that cube, but it would only know how to pick words with the appropriate length, leading letters, and rarity -- not the "right" words. To capture the patterns needed for coherent prose, instead of a cube, we would need a chart with thousands of axes.
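A toy version of that three-axis chart fits in a few lines. The rarity ranks below are invented for illustration; the point is only that each word becomes a point in the cube, and each sentence becomes a path between points.

```python
# Each word becomes a point: (length, alphabetical position of its
# first letter, rarity rank). The rarity ranks are invented.
rarity_rank = {"the": 1, "court": 120, "held": 300, "certiorari": 9000}

def to_point(word):
    length = len(word)
    alpha = ord(word[0].lower()) - ord("a") + 1  # a=1 ... z=26
    rarity = rarity_rank.get(word, 5000)         # default for unseen words
    return (length, alpha, rarity)

# A sentence traces a path through the cube, one point per word.
path = [to_point(w) for w in ["the", "court", "held", "certiorari"]]
print(path)  # [(3, 20, 1), (5, 3, 120), (4, 8, 300), (10, 3, 9000)]
```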

It is impossible for humans to picture such a complex chart, but that is what LLMs use. Our intuition breaks down after three dimensions, yet the math continues working perfectly. LLMs map words inside spaces with thousands of dimensions, and they learn the movement patterns that texts trace across those higher-dimensional spaces. This allows them to pick the right "next word" as measured by thousands of different attributes. Unlike in our toy example, those dimensions are not handpicked by human beings. The LLMs are the ones defining their own dimensions for mapping words, ones that do not track perfectly to our own human understanding.
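To see how "closeness" works in such a space, consider the sketch below. The four-dimensional coordinates are invented (real embeddings are learned by the model and run to thousands of dimensions), but the distance measure, cosine similarity, is a standard one: words that play similar roles in text end up near each other.

```python
import math

# Invented four-dimensional "embeddings." Real LLM embeddings are
# learned, not handpicked, and have thousands of dimensions.
embedding = {
    "court":    [0.9, 0.1, 0.3, 0.7],
    "tribunal": [0.8, 0.2, 0.4, 0.6],
    "banana":   [0.1, 0.9, 0.8, 0.0],
}

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Words that play similar roles in text sit near each other.
print(cosine_similarity(embedding["court"], embedding["tribunal"]))  # high
print(cosine_similarity(embedding["court"], embedding["banana"]))    # low
```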

This accounts not only for why LLMs can write so well, but also for the uncanny verisimilitude of their false answers. Even when what they are saying is wrong, their outputs still embody many of the patterns we would expect to see in the truth.

An LLM writes like a great musician who is handed the first half of a song and asked to reconstruct the rest. What the musician comes up with will not be the real ending of the song, but if you have never heard the original, you will be fooled. The musician knows how songs are supposed to sound, so what he improvises will follow conventions like rhythm and chord progression.

ChatGPT is doing the same thing when it makes up fake court opinions. It knows how legal opinions are supposed to sound, so it can come up with convincing imitations. More technically, it knows the shapes that legal opinions trace in higher-dimensional spaces, so it traces those shapes too.
