Harvard Launches Massive Public Domain Database for AI Training

  • 📰 WIRED

The Institutional Data Initiative (IDI) has unveiled a large public domain database of books intended for AI training. Spanning genres, languages, and centuries, the collection is considerably larger than existing datasets like Books3. The project aims to widen access to high-quality training data so that smaller players in the AI industry can compete on more equal terms. Microsoft, OpenAI, and Google are among the prominent supporters of the initiative, which is presented as a way to foster innovation and address concerns surrounding copyright in AI training.

Around five times the size of the notorious Books3 dataset that was used to train AI models like Meta’s Llama, the Institutional Data Initiative's database spans genres, decades, and languages, with classics from Shakespeare, Charles Dickens, and Dante included alongside obscure Czech math textbooks and Welsh pocket dictionaries.

The Institutional Data Initiative has asked Google to work together on public distribution, but the details are still being hammered out. In a statement, Kent Walker, Google's president of global affairs, said the company was 'proud to support' the project.




