Fair use underpins how AI models learn

Fair use is at the heart of how AI models learn. Courts have long held that using copyrighted material is allowed when it serves a new purpose. AI training does exactly that. Models learn patterns and relationships rather than copying or republishing works. Copyright law protects creators while giving AI the room it needs to innovate.

How AI models learn

Generative AI, including large language models (LLMs), train on large datasets to understand how words and ideas connect. The more information an AI model can learn from, the better it gets at drafting the accurate, relevant responses that can drive breakthroughs that benefit society. As the debate about AI and copyright continues, it helps to start with the basics of exactly how models are trained.

1

Step 1

The training process begins with the model processing millions, if not billions, of text sequences. This involves analyzing vast amounts of raw text from training data and breaking it down into smaller parts.

RAW TEXT is filtered and processed.
2

Step 2

These smaller parts, called tokens, represent words, sub-words, or even characters that the model can process.

TOKENS are created.
3

Step 3

Each token is then assigned a numerical representation, called an “embedding,” that captures its meaning and statistical relationship across the training dataset. This helps transform complex data that humans can read into numbers that a computer can process.

EMBEDDINGS are created.
4

Step 4

LLMs transform words and their relationships into vectors — points in a multi-dimensional space that form a kind of map the model uses to understand and generate language. This mathematical map doesn’t need to store the original content — instead, it focuses on the relationships between words in order to generate new material.

MODEL TRAINING interprets relationships between embeddings.
5

Step 5

Once training is complete, the model uses what it has learned to generate original  responses. It predicts one word at a time based on the statistical relationships encoded during training, assembling sentences that reflect patterns drawn from the dataset as a whole. This process allows LLMs to create original outputs without need to retrieve, replicate, or store the underlying training text.

RESPONSE generates answers
from each interpretation.
6

Step 6

For example, if a user asks “Do queens live in a castle?” the model would reference the relationships between similar key words. Because “queen” is closely related to “king” and “royal” in its training data and “castle” is strongly associated with “monarchs” and “residences,” the model recognizes these concepts frequently occur together. It then responds: “Yes, as royals, kings and queens often live in castles.

RELATIONSHIPS BETWEEN WORDS guide the model’s answer.

4 key takeaways

AI models generate new 
works based on patterns

AI models learn by breaking down text and other data into smaller parts. They don’t need to store, retrieve, or reproduce the original materials they’ve been trained on.

Law protects expression, 
not ideas or competition

U.S. copyright law protects creative expression, not underlying ideas, facts, or patterns. Similarly, the law doesn’t protect against new competition — only against copying existing expression.

Courts back using copyrighted materials in AI training

Courts have found that transformative uses — including AI training on whole books or datasets — are lawful when they serve a new purpose and don’t replace the original works in the market.

Restricting AI training threatens American innovation

Limiting access to training data would undermine the development of AI technologies and run counter to the goals of copyright, which seeks to promote creativity and progress.

Understanding the AI & copyright landscape

Fair use is a core tenet of U.S. copyright law and is essential to how we build on existing ideas to create something entirely new. The doctrine draws a clear line between protected creative expression and the facts that fuel learning, support research, and 
foster discovery.

What fair use protects — and doesn’t

The fair use doctrine protects creative expression, not facts, ideas, or patterns — the elements used to train AI models.

Read more fair use facts

Separating fact from fiction in AI training

AI turns words into numbers, builds mathematical maps of language, and uses those patterns to generate new content.

Learn how AI training works

Changing law puts America’s competitive edge at risk

Restricting AI model training threatens U.S. competitiveness in a fast-moving global race.

Learn how fair use applies to AI