• New research from Stanford and Yale reveals that large language models are not just “learning” but storing and reproducing book content verbatim.
  • Four popular models (OpenAI’s GPT, Anthropic’s Claude, Google’s Gemini, and xAI’s Grok) were found to have memorized copyrighted content.
  • When prompted correctly, Claude was able to reproduce nearly the entirety of Harry Potter and the Sorcerer’s Stone, The Great Gatsby, 1984, and Frankenstein.
  • A total of 13 books were tested, with many having thousands of words extracted almost verbatim.
  • This phenomenon is called “memorization,” and it contradicts previous claims by AI companies that models do not store copies of training data.
  • OpenAI and Google previously asserted that models only save linguistic “patterns,” not original content.
  • In reality, LLMs function as a form of lossy compression, similar to MP3 or JPEG: they discard some detail while retaining the core of the original data.
  • A court in Germany compared ChatGPT to a compressed file and issued a ruling unfavorable to OpenAI in a lawsuit by GEMA.
  • AI image generators like Stable Diffusion have also been found to reproduce images nearly identical to the originals.
  • Models can copy not just wording but also structure, imagery, and artistic style.
  • Other research indicates that 8–15% of LLM-generated content already exists verbatim on the web.
  • This opens up huge legal risks: AI models could be forced off the market or have to be retrained from scratch.
  • Although Sam Altman defends AI’s “right to learn,” researchers argue that the comparison to humans is misleading.
  • Experts assess that memorization is an inherent feature, not a rare bug.
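
Figures such as "thousands of words extracted almost verbatim" or "8–15% of content exists verbatim on the web" rest on some measure of word-for-word overlap. The sketch below is one simple way such overlap could be estimated. It is not the method used by the Stanford and Yale researchers, and the function names and the 5-word window size are illustrative assumptions.

```python
def word_windows(text, n):
    """Split text into lowercase words and return every run of n consecutive words."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def verbatim_fraction(generated, reference, n=5):
    """Estimate how much of `generated` is copied from `reference`.

    Returns the share of n-word windows in `generated` that also occur,
    word for word, somewhere in `reference`. The window size n=5 is an
    arbitrary choice for illustration, not a value taken from the study.
    """
    gen_windows = word_windows(generated, n)
    if not gen_windows:
        return 0.0  # text shorter than one window: nothing to compare
    ref_windows = set(word_windows(reference, n))
    hits = sum(1 for window in gen_windows if window in ref_windows)
    return hits / len(gen_windows)
```

A score near 1.0 would suggest the output largely reproduces the reference, while a score near 0.0 would suggest little word-for-word copying. A measure this simple misses paraphrase and copying of structure or style.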

📌 New research from Stanford and Yale shows that large language models are not merely “learning” but storing and reproducing book content verbatim. This finding shakes the theoretical foundation of the generative AI industry. Instead of “learning” like humans, models operate by compressing and retrieving data, leading to a risk of copyright infringement on a massive scale. If courts deem AI models to be illegal copies, the industry could face billions of dollars in fines and be forced to restructure the entire way AI is developed in the future.


© 2026 Vietmetric