Microsoft and HarperCollins partner up to train AI models on books

Microsoft is reportedly hunting for high-quality training data.
By Cecily Mauran
Microsoft and HarperCollins strike a major AI licensing deal. Credit: Omer Taha Cetin / Anadolu / Getty Images

Microsoft has signed a licensing deal with HarperCollins to train its AI models.

According to Bloomberg sources, HarperCollins will allow Microsoft's LLMs to train on nonfiction titles. Microsoft reportedly doesn't plan on creating AI-generated books, but will instead use the high-quality data to make its models more intelligent and accurate. "HarperCollins authors will have the option to participate or not," said the outlet.

404 Media first broke the news of a licensing deal with an anonymous AI company. Author Daniel Kibblesmith shared screenshots on Bluesky of an email, likely from his agent, informing him about the deal. "You are likely aware, as we all are, that there are controversies surrounding the use of copyrighted materials in the training of AI models," said the memo. "Much of the controversy comes from the fact that many companies seem to be doing so without acknowledging or compensating the original creators. And of course there is concern that these AI models might one day make us all obsolete."

According to the screenshots of the email, HarperCollins is offering a non-negotiable payment of $2,500 per title for a three-year licensing deal.

HarperCollins confirmed there is a deal with an unnamed AI company, telling Bloomberg that "its limited scope and clear guardrails around model output" respect authors' rights while presenting them with new opportunities.

Meanwhile, multiple outlets have reported that AI companies like Google, OpenAI, and Anthropic are seeing diminishing returns from the development of new models because they're running out of high-quality data to train on. Microsoft was not included in these reports, but its Copilot model relies on underlying genAI technology from OpenAI. So training AI models on nonfiction works might be a strategy to counter those diminishing improvements.

Cecily Mauran

Cecily is a tech reporter at Mashable who covers AI, Apple, and emerging tech trends. Before getting her master's degree at Columbia Journalism School, she spent several years working with startups and social impact businesses for Unreasonable Group and B Lab. Before that, she co-founded a startup consulting business for emerging entrepreneurial hubs in South America, Europe, and Asia. You can find her on Twitter at @cecily_mauran.

