The latest spate of debates over AI & copyright law has focused either on whether generative models *are* copyright violations, or on whether the speaker would *like* them to be; less has been said about whether, for the purposes of the copyright system as a whole, they *should* be. Many artists believe copyright should secure them a living, or police a perceived moral right to control all possible uses of their work forever, or guarantee that the harder they work the more valuable their outputs must be, or at least ensure that, should some entity somewhere be better off because of their work, the copyright owner is empowered to dip their fingers into that entity's wallet. However, copyright is not any sort of transcendental human right instituted by God or the United Nations, but a pragmatic legal gimmick invented a few centuries ago for narrow purposes: first for state/church censorship of the public, then as an indirect state subsidy for research & creation, in which the government infringes on the freedoms of every person in order to enforce rents paid to an IP owner. The US federal Constitution, often vague or unclear, is admirably clear about the purpose of US copyright: it is explicitly and solely limited to the second, economic, purpose. To quote the [Copyright Clause](https://en.wikipedia.org/wiki/Copyright_Clause) in its entirety:

> \[the United States Congress shall have power\] To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries.

So, from the perspective of US copyright, the only question about how a copyright regime should work is the clear (but not easily-answered) one: does it "promote the Progress of Science and useful Arts"? Other copyright systems may introduce concepts like [moral rights](https://en.wikipedia.org/wiki/Moral_rights) or religion or 'social harmony' as part of their purpose, but not the US one---it is about promoting progress. Period.

If someone complains about GPT-3 being trained on scraped web text without the permission of copyright owners, the question is not whether this is 'immoral' (whosoever is defining that), but whether creating GPT-3 that way hinders the progress of science or the useful arts, and whether their progress would be accelerated if GPT-3's creators had to, say, pay \$1 per token or get pre-emptive permission from everyone on the Internet. Likewise for Stable Diffusion---no matter how many artists lose a commission to a competing Stable Diffusion user, that is *not what copyright is for*, and the question is: did a machine doing that commission advance science & the useful arts more than paying an artist 1,000× more to do it would have? These are not easy questions to answer, but they are rarely asked in these discussions, and asking them would clarify things---people are always apt to confuse what they would like with what is or should be legal, and to make claims about the law which are not even wrong.

So, with that in mind: do generative models promote, or demote, the progress of science and useful arts?

First, their effects before release: the effect of generative models in the present on the past has been nil. Outside a few circles of ML enthusiasts, there was no worldwide expectation that generative models would so abruptly become so good. (This is why the releases of Stable Diffusion & GPT-3.5 in 2022 caused such shockwaves.)
As one of those enthusiasts, I thought people were foolish for not anticipating photorealistic image generation ~2021--2023, increasingly human-level text-generation/programming, and, soon, video---but this shortsightedness has the silver lining that, since no one expected it, no one could have refused to create for reasons like expecting to not get paid or being morally outraged. So there could have been no disincentivizing in the past: everyone who wrote something online, or released [FLOSS](https://en.wikipedia.org/wiki/FLOSS) software, or posted a drawing, had adequate incentive to do so, because they did so.

As for their effect at release: they obviously promote 'science', in both the narrow current sense and the older, broader sense of 'knowledge'. Generative models are already highly scientifically useful, and we have learned an extraordinary number of fascinating things from them. They have revolutionized deep learning, and AI, and are showing up everywhere from psychology & philosophy of mind to particle physics to biology. They have been just as influential on the useful arts, like coding or writing.

For the most part, they have not demoted either area, not even in the sense of disemployment---whatever the future technological-unemployment effects of more advanced & comprehensive AI systems may be, the current systems largely remain complements to, rather than substitutes for, human labor. Aside from some commission artists, the clearcut cases of technological unemployment thus far remain niches, and often ones of little social value. (For example, academic ghostwriting for cheating students: the disemployed ghostwriters are hardly sympathetic, as their hard work in faking homework demotes progress by rendering credentials meaningless; and if we are concerned about any demoting of progress here, it is because the generative models *increase* cheating by making it so much cheaper & easier. Likewise, while generative models wreak havoc on pornographic artists, who now compete against floods of generated images, one hesitates to say this is either promoting or demoting: the nature of pornography is to be as ephemeral as a Kleenex, and, this daily need satisfied, the end-consumer benefits minimally from there being 100,000 pieces of pornography rather than 10,000, and it is irrelevant who provides them.)

What about *future* disincentives? Artists complain about AIs being able to imitate 'styles', and moot the idea of somehow being able to copyright 'styles' and extract royalties from any work which looks vaguely like theirs, forever. Practical issues aside, this is not a clear case for copyright: the point of copyright is not to ensure them sinecures, but to create progress, particularly through competition, and for that progress to then become universally accessible, maximizing the gains. An artist losing a sale to a rival who paints a similar but better painting is not a problem, but competition at work, spurring the artist to paint progressively better; only if the rival copies a superior painting entirely (and can undercut the superior painting without being able to create it) is there demotion, as the superior painter hangs up their brush. (This was reinforced by the recent [*Warhol v. Goldsmith*](https://en.wikipedia.org/wiki/Andy_Warhol_Foundation_for_the_Visual_Arts,_Inc._v._Goldsmith) decision on derivative works & transformativeness, over Warhol's silkscreens of a photograph of Prince: the transformed painting could be, and in fact was, paid for by customers as an exact substitute & alternative to the original photograph, thereby disincentivizing the original photographer.)

This is why the clause specifies "*limited* Times", and implies a public domain, and why patents require publication: the entire point is to ensure that creators must keep creating and innovating and suffering competition after a certain limited time (originally *very* limited, to just a decade or two), and cannot sit on their laurels. So, if a style is so widespread & famous, or can be so easily named & imitated, that an AI can create it, then the lack of protection is a feature, not a bug, as far as the Copyright Clause is concerned: the humans need to innovate a new style.

Newspapers come to mind as a harder case. Newspapers are incentivized to report by selling subscriptions & advertising; their reporting serves many important functions, and definitely promotes the progress of science & the arts. Generative models were irrelevant to them in the past, have not disincentivized any reporting, and probably do not disincentivize articles right now: who would rather ask GPT-3 for its speculation about today's current events, based on knowledge that cut off in 2019, than go and read an actual newspaper article? LLMs are simply not a substitute for newspapers, and so do not demote. However, this changes when retrieval is added: if an AI can download a copy of the current newspaper and write an up-to-date summary of the current news, then, even without any direct quotes of the usual copyright-violating sort, this *can* substitute completely for reading the newspaper, and thus for a subscription or advertising. For the most part, the actual writing of a newspaper article is unimportant compared to the new facts inside it, and so a long summary can be a complete replacement. This sort of paraphrasing has long been an issue that online publishers complain about, whether done by publishers themselves with a fig leaf of added commentary or background, or by fly-by-night content mills churning out rewrites; AIs, by automating the content mills, could make it much worse (a minimal code sketch of such a pipeline appears at the end of this section). At scale, this could choke off newspaper revenues, and demote progress, and is thus a serious question for how US copyright should deal with it. So, unlike artistic styles, there is a real concern here.

How about books? Books are in much less danger. Books tend to be too long to be meaningfully summarized, as they are filled with details and are often an experience in their own right. (A summary of _Hamlet_ hardly replaces the text of the play for anything but superficial uses like school reports.) And no one is going to ask an AI to print out, line by line, a book they want to read like _Harry Potter and the Philosopher's Stone_, even if it could do so accurately without silently veering into confabulation---slow, tedious, & expensive, that must be just about the worst way to read a good novel! AIs can endanger books as references, by looking up key passages and extracting key facts, thereby replacing the need to purchase or read the entire book. However, aside from the observation that the 'lost sales' here are similar to those caused by libraries, or book-lending, or other books quoting them (and no one thinks all of that should be outlawed to boost book sales), there is a trilemma for the demotion claim:

#. if a book can be replaced by a summary, then it could not have represented much progress at all (perhaps it should've been a blog post or newspaper article, or shouldn't've been written at all), and there is little loss;
#. if the AI cannot replace the book with a summary, because the book contains so many relevant facts that the AI fails to adequately cover them all, then the original book is not replaced, and there is little loss;
#. and if the AI is good enough at extracting the relevant facts that the summary can in fact replace the book, then, because those facts usually come from other works or from the author's own thinking, the AI ought to be able to look them up wherever the book got them in the first place, and do so in a superior way: across larger corpuses, on the fly, customized, usable by other AIs. Disincentivizing similar new books is then not a big deal, because the AI will just do the same thing better, and authors can shift to making use of the retrieved facts or generated thoughts in ways that compete with the AIs, thereby promoting progress.

So it's not obvious, from the perspective of promoting progress, that retrieval over large book corpuses is a bad thing after all.
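To make concrete how low the technical bar is for the newspaper-substitution worry above, here is a minimal sketch of a retrieval-and-paraphrase content mill in Python. This is an illustration under stated assumptions, not any particular product: `complete()` is a placeholder standing in for whatever LLM completion API one plugs in, and the URL is hypothetical.

```python
# Sketch of an automated content mill: fetch today's article, strip it to
# plain text, and ask an LLM for a fact-preserving paraphrase with no
# verbatim sentences. The LLM call itself is a stub (an assumption).
import re
import requests

def fetch_text(url: str) -> str:
    """Download a page and crudely strip HTML down to plain text."""
    html = requests.get(url, timeout=30).text
    html = re.sub(r"(?s)<(script|style).*?</\1>", " ", html)  # drop scripts/styles
    text = re.sub(r"<[^>]+>", " ", html)                      # drop remaining tags
    return re.sub(r"\s+", " ", text).strip()

def complete(prompt: str) -> str:
    """Placeholder for any LLM completion API (hypothetical)."""
    raise NotImplementedError("plug in an LLM API of choice")

def paraphrase_article(url: str) -> str:
    """Return a summary detailed enough to substitute for the article."""
    article = fetch_text(url)
    prompt = ("Rewrite the following news article as a detailed summary "
              "in your own words, keeping every fact but using no "
              "verbatim sentences:\n\n" + article)
    return complete(prompt)

# Hypothetical usage:
# print(paraphrase_article("https://example-newspaper.com/todays-front-page"))
```

Everything hard lives inside `complete()`; the scraping and prompting are trivial, which is the point: the marginal cost of producing a complete substitute for an article approaches zero.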