Law Practice, Ethics/Professional Responsibility
Dec. 12, 2025
Treat your (human) colleagues as allies - and your LLMs as adversaries
As large language models like ChatGPT and Claude proliferate, attorneys must balance their promise of efficiency and insight against the real risk of hallucinated or fabricated citations, treating AI with the caution due an adversary while relying on human colleagues for verification, mentorship and critical oversight.
As large language models (LLMs) like ChatGPT and Claude
improve and proliferate, they present both promise and peril to new and
experienced attorneys alike. Amid a recent increase in improper (hallucinated)
case law citations by lawyers and even some judges, attributed in most cases to
AI-generated content, practitioners would do well to remember two simple
truths: Your (human) colleagues are your allies, but your LLMs are (often) your
adversaries.
This is not to say that generative AI has no place in the
future of legal practice. On the contrary: learning to harness the power of
LLMs may well constitute a cornerstone of legal practice in the near and
distant future. Indeed, like any technological tool, AI offers tremendous
promise in terms of efficiency, accuracy and even creativity. But, like any
technology, LLMs will assume their rightful place only when attorneys learn to
leverage their advantages while avoiding, or at least mitigating, their fatal
shortcomings.
Take the recent citation controversies, which have seen
junior and senior attorneys alike fall prey to seemingly plausible but non-existent
precedent identified by LLMs. In June 2023, just months after GPT-4
demonstrated passing scores on various bar exams, the model was caught fabricating
case law in a legal brief. Since then, a plethora of law firms across the
country, including well-known names, have been admonished by courts for
over-relying on LLMs and citing cases that were either fabricated or materially
misrepresented by an AI tool. Our research identified over 300 cases in the
United States alone where a lawyer or law firm was caught using improper AI citations -- i.e.,
found responsible for doing so by a judge in a written opinion; presumably
there have been many more such instances that have gone unpublicized or even
unnoticed.
At times, senior attorneys atop the pleadings have played
the blame game, insisting to judges that junior or even unlicensed attorneys
were responsible for the fabricated legal research. Courts have rightly discounted this
excuse, pointing to supervising attorneys' ethical duty to oversee the work of
junior attorneys and chastising those in case leadership roles for attempting
to avoid their own responsibility. On most occasions, thankfully, partners have
been more forthcoming, owning the mishaps, issuing apologies and implementing
AI training programs at their firms.
But the written opinions and firm statements, taken
together, reveal little about why otherwise competent law firms and some
jurists continue to rely on fabricated or misrepresented cases. While it may be
logical to attribute these mistakes to case mismanagement and constrained
bandwidth -- concepts that are not new to legal practice and have manifested in
inapposite citations making their way into briefs since long before the
proliferation of LLMs -- the real explanation may be
less culpable: The citing attorneys, quite simply, were duped by a well-trained
persuasion machine.
How so? Attorneys, like all LLM users, are well aware of
chatbots' tendencies to hallucinate, but many don't fully appreciate exactly
how and why they do so. While we generally recognize LLMs' eagerness to please and
reinforce our beliefs, we often fail to understand the conceptual underpinnings
guiding their approach -- and how that shapes our responses. Only a more
complete grasp of how LLMs approach their tasks will allow litigators to
optimize their use without falling into the fake case trap.
LLM chatbots like ChatGPT are, first and foremost, text
generators. They produce text using an internal statistical model that
repeatedly predicts the most likely next token. During training, that model is
tuned until it does an acceptable job of predicting the internet's text. Along
the way, it learns grammar, logic and facts simultaneously from the same
training data.
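To make this concrete, here is a toy sketch in Python -- not any vendor's actual system, just an illustration built on simple word-pair counts over a made-up scrap of legal text. It "generates" by always emitting the statistically most likely next word, which is why the output reads fluently even though nothing in the process checks whether it is true.

from collections import Counter, defaultdict

# A toy stand-in for an LLM's statistical model: count, over a made-up scrap
# of legal text, which word tends to follow which. (Real models use neural
# networks over subword tokens, not word-pair counts.)
corpus = ("the court held that the motion was denied "
          "because the motion was untimely").split()

next_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_counts[prev][nxt] += 1

def most_likely_next(word):
    # Return the statistically most likely next word, if one was ever seen.
    candidates = next_counts.get(word)
    return candidates.most_common(1)[0][0] if candidates else None

# "Generate" by repeatedly emitting the most likely next word. The result
# sounds fluent, but frequency -- not truth -- is doing all the work.
token, output = "the", ["the"]
for _ in range(6):
    token = most_likely_next(token)
    if token is None:
        break
    output.append(token)

print(" ".join(output))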
Emerging research has suggested various mechanisms to
explain what the models "learn." For
example, Anthropic has shown that LLMs can be modeled as a massive collection
of independent circuits that run in parallel. Rather than
executing a rigorous, logical deduction process, these circuits compete to
influence the output. Certain circuits recognize patterns they have seen
before. Grammatical circuits are strong because grammatical patterns recur
constantly in the training data. Factual circuits are weaker, but ideally fire
strongly enough in the right circumstances to generate factually correct output.
This research also suggests that the circuit that states facts the model knows
and the circuit that declines to answer questions it cannot answer may operate independently of one another.
The faithfulness of the output then depends on the relative strength of the
circuits.
After this initial training, LLMs are
"fine-tuned" based on human feedback. In this process, different responses are
presented to human raters, and the circuits that produced the "most helpful"
output are strengthened. The result is a model tuned to appear helpful, rather
than to actually be helpful.
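For readers curious about the mechanics, a minimal sketch of that comparison follows, assuming a hypothetical scalar "helpfulness score" for each response (real systems learn such scores from rater data). The point it illustrates: fine-tuning rewards whichever response raters preferred, and nothing in the calculation asks whether either response was factually correct.

import math

def preference_loss(score_chosen, score_rejected):
    # Smaller when the human-preferred response is scored higher than the
    # rejected one; fine-tuning pushes the model to shrink this number.
    return -math.log(1.0 / (1.0 + math.exp(-(score_chosen - score_rejected))))

print(round(preference_loss(2.0, 0.5), 3))  # preferred response already scores higher: small loss
print(round(preference_loss(0.5, 2.0), 3))  # preferred response scores lower: large loss, big correction
# Nothing in this comparison checks whether either response was true.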
As LLMs grow better at producing accurate output, they
also grow better at imitating the look and feel of helpful, accurate
output. In other words, the same training and development that makes LLMs more
trustworthy also makes them convincing liars. And the same "improvements" that
indulge LLMs' tendencies to reinforce what we suspect or hope to be true lead
attorneys to imbue these tools with undeserved authority when their putative
results favor our positions.
How, then, can attorneys at all levels avoid these
pitfalls?
First, attorneys must make efforts to account for and
counteract LLMs' hallucinatory tendencies. The practice of litigation is an art
of imperfect analogy. When foursquare precedent does not exist, it is not
uncommon, and not necessarily frowned upon, for litigators to emphasize
parallels in the holdings of previous cases that may not directly relate to the
case at hand. In fact, sometimes analogies that are somewhat far afield can
assist a court in understanding and adopting a litigant's view of a particular
issue.
Consequently, it is the
responsibility of the attorney to discern, articulate and, where appropriate,
establish connections between contested matters and applicable legal precedent.
Oftentimes, this
process begins with keyword searches curated to reveal similar language in
court opinions -- a task tailor-made for LLMs, which function by recognizing,
replicating and predicting patterns in language. The risk is that LLMs can then
use these patterns against the user's best interest, creating citations based
on grammatical and linguistic patterns that "feel right," but are not
necessarily factual.
To be sure, litigators must account for and
eliminate outright hallucination by cite-checking and reviewing any case cited
or explained by an LLM, but this alone does nothing to improve or streamline
output. Attorneys can and should harness LLM capabilities (and in some cases,
minimize the risk of fake cases) by developing strategies for writing prompts
that increase the likelihood of a usable result. For example, requesting that
the LLM provide a summary of the case facts, the name of the judge who wrote
the opinion and a Westlaw or Lexis citation may increase the likelihood of a
real result. In other words, verify everything: treat your LLM as an adversary
who, however well-intentioned, does not necessarily have your (or your
client's) best interest in mind.
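As one illustration of that adversarial mindset -- a minimal sketch only, with hypothetical field names rather than any real research tool -- the snippet below refuses to treat an LLM-suggested authority as ready for use until it comes with the kinds of details a human can verify in Westlaw or Lexis:

REQUIRED_DETAILS = ("case_name", "reporter_citation", "court", "year", "judge", "fact_summary")

def ready_for_human_cite_check(suggestion: dict) -> bool:
    # True only when every verifiable detail was supplied. This proves nothing
    # about whether the case exists -- it just gives the attorney concrete
    # items to confirm in a real legal research database before citing.
    return all(suggestion.get(detail) for detail in REQUIRED_DETAILS)

suggestion = {
    "case_name": "Smith v. Jones",        # hypothetical LLM output
    "reporter_citation": "123 F.4th 456",
    "court": "9th Cir.",
    "year": "2024",
    "judge": "",                          # missing detail
    "fact_summary": "Contract dispute over ...",
}
print(ready_for_human_cite_check(suggestion))  # False: ask for the judge, then verify every field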
Additionally, an LLM may be more likely to produce
effective output when clearly instructed on the purpose and scope of its
assigned task. However, unlike a human being, an LLM is not constrained by the
need to actually comprehend information to determine its relevance. Because it
matches patterns rather than reading for understanding, it can sift enormous
volumes of text far faster than a human reader, which makes LLMs effective
tools for compiling accounts of large bodies of case law.
It is important to keep in mind that strategies of this
nature are not a failsafe but a starting point. In its current technological
state, the process of pattern extraction and synthesis will sometimes lead an
LLM to "find" a case that does not exist, or to contrive hallucinated holdings
from real cases. In other instances, it will indeed find just the right case and
dramatically streamline the research process. At still other times, it will
recognize patterns the attorney may have missed and suggest an unlikely but
effective case analogy. To get the benefits without
the risks, prompt from different perspectives and always verify outputs before
trusting them, no matter how convincing they seem.
Second, attorneys present and
future should practice using LLMs early and often. Law students, particularly
those who are just shy of becoming practitioners, are uniquely poised to learn
the ins and outs of LLMs as legal practice tools. While some law schools offer
courses that encourage or even teach LLM usage, the bulk of the response to
AI's proliferation in law schools has been to ban or strictly limit AI's
intervention in the legal learning process. Examples include disallowing
internet access during final exams (so that students cannot use ChatGPT or
related platforms) and outright bans on using any form of AI for legal
research.
Of course, a tool capable of producing information with a
relatively high degree of accuracy poses a real and tangible threat to student
absorption and retention of information. But unqualified prohibitions such as
those described should not stand in the way of teaching law students the
burgeoning art of LLM case research and synthesis. While we cannot be certain
about LLMs' future role in the legal profession, there are steps we can
take to ensure that we are prepared for it.
Ultimately, whether LLMs transcend their people-pleasing
tendencies to become effective assistive tools in litigation practice will depend
on how lawyers use them, and there is no better or safer forum to develop LLM
literacy than law school. By making conscious and well-informed choices as to
their inputs, students (and attorneys) can bolster the reliability of their
outputs. And of course, take a moment to search for your case in your legal
research database of choice, just to make sure it exists and stands for the
proposition for which the LLM has cited it. And there is good news for law
students: generative AI's tendency to hallucinate facts limits the viability of
LLMs replacing human attorneys anytime soon, as many attorneys originally feared
they would.
Third, and finally, senior attorneys can and should
reinforce the best practices of the lawyers they supervise, both by modeling
the "don't trust, but verify" approach necessary to ensure accuracy and, more
importantly, by collaborating with their junior colleagues in a way designed to
ensure success. Partners and counsel should make sure newer lawyers are trained
appropriately on LLMs and take all necessary steps to ensure the accuracy
of their output. Senior attorneys must also work cooperatively with their
junior colleagues to foster the kind of critical thinking, skepticism and
devil's advocacy that not only mitigates the most serious risks of generative
AI but also more generally ensures that the entire litigation team is engaging --
and, hopefully, rebutting -- their adversaries' most powerful arguments. By
treating LLMs as opponents and (human) colleagues as allies, attorneys can
benefit from the best of both.
