Matthew G. White
Shareholder, Baker, Donelson, Bearman, Caldwell & Berkowitz, PC
Email: mwhite@bakerdonelson.com
Alexander F. Koskey
Shareholder, Baker, Donelson, Bearman, Caldwell & Berkowitz, PC
Email: akoskey@bakerdonelson.com
Madison "MJ" McMahan
Associate, Baker, Donelson, Bearman, Caldwell & Berkowitz, PC
Email: mmcmahan@bakerdonelson.com
California is making waves with its new AI law, Assembly Bill 2013 (AB 2013), set to take effect in 2026. This groundbreaking legislation once again puts the state at the forefront of tech regulation by tackling one of AI's biggest challenges: the "black box" problem. AB 2013 demands transparency, requiring AI companies to disclose detailed information about the data they use to train their generative models and shedding light on a previously hidden layer of machine learning. With this bold move, California is leading the charge for accountability in artificial intelligence.
Unpacking the basics: algorithms, training data, and models
To understand the implications of AB 2013, it is useful to understand the fundamentals of how machine learning works. At its core, machine learning consists of three primary components: (1) an algorithm (or set of algorithms); (2) training data; and (3) the resulting model. The algorithm is essentially a set of instructions or procedures that can be fine-tuned to find patterns. During the training phase, the algorithm is fed a vast array of examples - known as training data - which allows it to recognize these patterns on its own. Once this training phase is complete, the outcome is a machine-learning model. This model is what users actually interact with; it's the tool that applies the algorithm's learned patterns to real-world data.
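To make this triad concrete, here is a minimal sketch in Python using scikit-learn. The data and task are invented purely for illustration; real generative models train on vastly larger and messier datasets, but the three components are the same.

```python
# A minimal, illustrative sketch of the three components described above.
# The data and task are invented for illustration only.
from sklearn.linear_model import LogisticRegression

# (2) Training data: a vast array of examples - here, just four made-up
# inputs, each paired with a known outcome (a binary label).
X_train = [[0.1, 0.9], [0.8, 0.2], [0.2, 0.8], [0.9, 0.1]]
y_train = [1, 0, 1, 0]

# (1) The algorithm: a set of tunable instructions for finding patterns.
algorithm = LogisticRegression()

# The training phase: the algorithm adjusts itself to fit the examples.
model = algorithm.fit(X_train, y_train)

# (3) The resulting model: what users actually interact with. It applies
# the learned patterns to new, real-world data.
print(model.predict([[0.3, 0.7]]))  # -> [1]
```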
AB 2013 focuses primarily on the training data piece of this triad. Since training data is fundamental to a model's behavior, any hidden biases or issues in the data directly impact the resulting model, often in ways that are hard to detect or understand. Under AB 2013, developers will need to disclose extensive details about their training data, including its sources, types, and whether it includes copyrighted or sensitive information. This type of documentation offers insight into how models are shaped by the data they're built on - and turns that black box into a somewhat clearer shade of gray.
What AB 2013 entails
Under AB 2013, developers of generative AI models must disclose key details about their training data, effectively pulling back the curtain on how these algorithms have been fed. No more treating data like a casserole of mystery ingredients. The documentation requirements are extensive, covering not only where each dataset came from but also when it was collected and what potential biases it may carry. By requiring this level of documentation, the law emphasizes the importance of organizations adopting responsible AI practices and gives users and other stakeholders a clearer picture of how these models function.
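As a purely hypothetical illustration of what such documentation might look like in practice, the sketch below models a single dataset disclosure as a Python dataclass. The field names are our own invention and do not reflect any schema prescribed by AB 2013.

```python
# Hypothetical sketch of a training-data disclosure record. The fields
# loosely track the kinds of details discussed above (source, collection
# dates, copyrighted or personal content, known biases); they are NOT an
# official AB 2013 schema.
from dataclasses import dataclass, field, asdict
import json

@dataclass
class DatasetDisclosure:
    name: str
    source: str                   # where the data came from
    collected_from: str           # start of the collection period
    collected_to: str             # end of the collection period
    data_types: list[str]         # e.g., text, images, audio
    contains_copyrighted: bool
    contains_personal_info: bool
    known_biases: list[str] = field(default_factory=list)

record = DatasetDisclosure(
    name="example-web-text",
    source="Public web crawl (illustrative)",
    collected_from="2022-01",
    collected_to="2023-06",
    data_types=["text"],
    contains_copyrighted=True,
    contains_personal_info=False,
    known_biases=["English-language overrepresentation"],
)

# Publishing the documentation could be as simple as serializing it.
print(json.dumps(asdict(record), indent=2))
```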
The disclosure of training data is crucial to understanding generative AI for several reasons:
1. Accountability and transparency: Generative AI models produce outputs based on patterns and information learned from their training data. Disclosing this data allows researchers, policymakers, and the public to understand the origins of the model's capabilities and limitations, fostering trust and accountability.
2. Bias detection and mitigation: Training data often contain biases that can manifest in the AI's outputs, such as reinforcing stereotypes or excluding certain demographics. Disclosing the data enables scrutiny for such biases and supports efforts to develop fairer AI systems.
3. Intellectual property and ethical considerations: The content of training datasets can include copyrighted material, personal data, or other sensitive information. Transparency ensures compliance with intellectual property laws and privacy standards, while also addressing ethical concerns.
4. Understanding model behavior: Training data shapes the knowledge and "decision-making" processes of AI. Without knowing what data was used, it is challenging to interpret why an AI model behaves a certain way or makes specific errors.
5. Facilitating research and innovation: Disclosed training data can aid researchers in replicating studies, validating claims, and building upon existing work. This openness promotes innovation and scientific progress in the field of AI.
6. Addressing the "black box" problem: AI models are often criticized for their lack of interpretability. Knowing the data used to train a generative model provides insights into how the model generates outputs, making it less opaque and easier to evaluate its reliability.
The law applies to any AI company offering generative AI tools to the public in California. This includes AI giants like OpenAI and Google, but also smaller developers hoping to stake a claim in the AI space. Some generative AI tools might be able to sidestep this law, but only if they're solely focused on data security or national defense. That means if your model's whole purpose is to keep hackers at bay or protect state secrets, you're in the clear. For everyone else, it's time to fess up.
Pros and cons: benefits of transparency and the potential for compliance challenges
Proponents of AB 2013 see it as a much-needed step toward ethical and consumer-centered AI. The law offers a way to address the longstanding issues of bias and data misuse in AI by requiring full disclosure of the datasets used to train these models. Artists and writers, for example, may finally get clarity on whether their copyrighted work has been co-opted as training fodder without their permission. For consumers, transparency provides a better understanding of how these tools operate.
But the law also sets the stage for logistical headaches. Compliance reaches back to any generative AI system released on or after January 1, 2022, which means companies will need to trace through years of data that wasn't tracked with this law in mind. Big players like Meta and OpenAI may have the resources to handle this, but smaller companies could find the requirement overwhelming or cost-prohibitive. Additionally, these transparency requirements may drive some companies to abandon open-source practices, making their models closed-source and, ironically, less transparent.
Addressing bias, deepfakes, and copyright: AB 2013's wider impact
The potential impact of AB 2013 isn't limited to transparency. It could also curb biases and unethical uses of AI, like deepfakes. Since training data directly influences a model's output, any biases in that data can cause harm in real-world applications, from hiring practices to medical recommendations. By demanding transparency around data sources, AB 2013 allows users, researchers, and policymakers to scrutinize data practices and address potential biases.
The law could also aid in the fight against deepfakes - those ultra-realistic, AI-generated videos or audio clips that can make it look like someone said or did something they never did. If developers are required to disclose data origins, consumers and regulators could more easily identify and challenge AI-generated misinformation. Similarly, AB 2013 offers a path for copyright holders to protect their rights more easily. For example, Sarah Silverman, the New York Times, and other rights holders have filed claims against AI companies for using their copyrighted works without permission. AB 2013 could give them a leg up by making it easier to investigate whether their content was included in training data, allowing them to track down unauthorized usage and challenge it in court.
California's new role in global AI regulation
California's approach in AB 2013 is no wild experiment. The European Union adopted similar disclosure requirements in the EU AI Act, which likewise requires AI companies to share summaries of the data used to train their models. In the U.S., Colorado's Artificial Intelligence Act also contains some of these requirements. These laws are setting standards that could form a global framework for responsible AI, kicking off a "global transparency race" that may push other regions to get on board or risk falling behind.
Why AB 2013 matters for the future of AI
AB 2013 represents California's commitment to creating a more ethical and consumer-focused AI environment. Notably, this comprehensive bill arrives alongside over 18 AI-related bills signed into law in California this year; clearly, the state remains committed to regulating AI. AB 2013 will challenge developers to rethink data practices that have often been shrouded in secrecy. In a world where AI-generated content is increasingly prevalent, transparency could serve as a bridge, allowing users to see the forces shaping these tools. The law sets a potential blueprint for other states - and possibly even federal regulators - encouraging a framework where AI's power is coupled with a new level of accountability and public trust.
For AI developers, this means transparency is no longer optional; for the rest of us, it's a new chapter in understanding the technologies that shape our lives. By peering into the "black box" of AI, California's AB 2013 challenges the notion that machine learning has to remain mysterious - and forces the AI industry to play by a new rulebook.
Dr. Malcolm's iconic warning in Jurassic Park - "Your scientists were so preoccupied with whether or not they could, they didn't stop to think if they should" - serves as a timeless cautionary tale about the risks of unchecked innovation. With AB 2013, California aims to address this very dilemma by mandating transparency in AI development. By shedding light on how AI systems are built and operate, the law seeks to ensure a deeper understanding of the technology, helping society navigate its complexities and avoid catastrophic missteps. If successful, this approach could enable us to harness the transformative potential of AI while sidestepping the pitfalls that led to the chaos of Jurassic Park.
As the legal landscape for AI continues to evolve, navigating the complexities of compliance, transparency, and risk mitigation is becoming more critical than ever. California's AB 2013 represents just one piece of the puzzle, and staying ahead in this rapidly changing environment requires informed, proactive guidance.