
Manton Reece

I created Micro.blog. I also have 2 podcasts: Core Intuition and Timetable.

RSS preview of Blog of Manton Reece

2025-06-25 21:48:17

Day 24 of the photo challenge, although a day late… Bloom.

Training LLMs on books judged as fair use

2025-06-25 02:38:16

I know I said I’d stop blogging about AI for a while, because it has become so divisive, but this court ruling on fair use is too fascinating to ignore. From federal judge William Alsup:

…the use of the books at issue to train Claude and its precursors was exceedingly transformative and was a fair use under Section 107 of the Copyright Act. And, the digitization of the books purchased in print form by Anthropic was also a fair use but not for the same reason as applies to the training copies. Instead, it was a fair use because all Anthropic did was replace the print copies it had purchased for its central library with more convenient space-saving and searchable digital copies for its central library — without adding new copies, creating new works, or redistributing existing copies. However, Anthropic had no entitlement to use pirated copies for its central library.

This strikes me as a defensible conclusion. As I’ve written before on AI training, the invention of LLMs may require updating copyright law. But for now we have to work with what we’ve got. There are some similar themes in the text of the judge’s ruling and in my own blog post linked here about C-3PO. This ruling is much more comprehensive, though, starting to narrow in on a path forward.

In a nutshell, the judge says that legally purchased books can be used to train AI, as long as the models do not reproduce verbatim the original copyrighted works. Pirated books, of course, are a separate issue. They are unlawfully acquired! We can’t steal a book from a store, regardless of what we planned to do with it.

Crawling the web is also a unique problem that is out of scope for this decision. If someone writes on the web and makes that web page freely available, hoping that people will read it, downloading that web page is not the same thing as pirating a book. It’s a gray area in copyright that could be made clear if everyone used something like the proposed no-training Creative Commons license.

People who are deeply concerned that all AI training is theft will likely be disappointed with this decision. But the issue is so complicated, it makes sense that there will be layers to it. Some actions are theft, such as pirating books. Other actions to train AI may be fine, such as purchasing books or licensing web content that has been otherwise excluded from training. I guess the courts will continue to sort out the less obvious questions in the middle.

2025-06-25 00:13:22

Working on some more iOS improvements, currently waiting for Apple to review the beta. 🙄 Automated builds via Xcode Cloud are still working well. I mentioned on the special episode of Core Int (🤯) that builds are slow-ish. To be specific, a build took 16 minutes today. It’s fine.

2025-06-24 23:48:34

Stephen Hackett reminiscing on the Aqua introduction from 2000 and what we’ve lost without live demos:

This all makes me miss live keynotes. I know Apple likes the control it has over pre-recorded introductions, but its announcements deserve live demos, off-the-cuff remarks, and the humanity that was once more prevalent at things like WWDC or iPhone introductions.

2025-06-24 23:16:47

We know a few new things about the OpenAI / Jony Ive partnership, because of leaks and the iyO lawsuit. I’m skeptical of a screen-free device that is not a wearable. Maybe someone should break away from the rectangle form factor. Square screen, a few inches on each side, very good voice interface.

2025-06-24 02:26:58

This is a silly and inconsequential thing to rant about, but… What is the point of crushed ice? It melts too quickly. I understand it for Sonic and Chick-fil-A, but not for a coffee shop. 🤪