Excellent article and discussion. 👍🏽 I am hoping the parties on both sides can come up with a fair profit-sharing arrangement but that will take a lot of trust, which seems to be in short supply.
Thanks! I think trust goes out the window when we start suing each other, but even then, some sort of profit-sharing seems like a reasonable outcome here.
Canadian news outlets are currently suing OpenAI for copyright infringement in the amount of $20,000 per article. This could add up to billions in damages if they succeed.
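To see how "per article" turns into "billions", here is a rough back-of-envelope sketch. The article counts are hypothetical placeholders, not figures from the actual lawsuit; only the $20,000-per-article figure comes from the comment above.

```python
# Back-of-envelope: how statutory damages of $20,000 per article scale
# with catalogue size. The article counts are made-up round numbers.
DAMAGES_PER_ARTICLE = 20_000

for articles in (10_000, 100_000, 1_000_000):
    total = articles * DAMAGES_PER_ARTICLE
    print(f"{articles:>9,} articles -> ${total:,}")
```

Under this arithmetic, a catalogue of 100,000 articles already reaches $2 billion, so "billions" is plausible for large news archives.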
When it comes to image copyrights, we work very closely with our legal dept. when we create a graphic for a film production. If we want to make our own version of a real police badge, we have to prove that our version is significantly different. We have to show all of our work, like a high school math problem. As creators, we take inspiration from other creators all the time; in fact, it's impossible to design in a vacuum. But we don't take inspiration from millions of sources.
I think this is the major shortfall of any legal challenge by artists. If Gen AI is stealing from everyone, how can a single or a small group of artists sue the Gen AI company? How can the courts determine which part of which artist's work was used in which part of a Gen AI creation? Even the AI experts don't know exactly how it works! It's an impossible situation where the loser is the artist.
Regarding "How can the courts determine which part of which artist's work was used": That's certainly part of the problem, and it may feel like the AI companies have 'hacked copyright' with their fancy 'learning' technology, carefully shrouded in secrecy. But what some of the lawsuits are arguing is that it is illegal by itself to collect copyrighted works for training purposes. If the courts agree, this would introduce lots of red tape to the AI process, as illustrated by @Nneoma Grace Agwu-Okoro elsewhere in the comments.
I'm curious though: when you create graphics for a film production, is AI image generation an acceptable tool? And if OpenAI wins all of these court cases, making AI art 'totally legal' (there's a better legal term, probably), would that make a difference for the art department?
The short answer is: it depends. Graphics we put on the screen are evaluated on a gradient, from highly featured (as in an ECU) to deep, deep, deep background where it might be just a blurry image. Different studios have different appetites for risk. Not only that, but not all training data used in LLMs are created equal. The largest stock library in the world, iStock, offers AI images that supposedly were trained on their licensed data.
Ah, of course! It's easy to forget there are in fact licensed alternatives.
(Also, I definitely knew ECU means Extreme Close-Up. Just like everyone reading this.)
LOL Nice catch! I often forget that not everyone knows Filmspeak.
I think the cases against OpenAI and other AI companies could go either way because whether we like it or not, generative AI is here to stay. Its pace of development is a whole other issue that will be determined by the outcome of the cases.
Now onto copyright infringement and my opinion:
- The creatives can argue that these AI companies should pay them for using their work because they're well-funded, for-profit companies, but think about the logistics of that for a minute. Who will they pay, and who will be left out? What's the method for calculating fees to be paid to each creative? How will one work be valued over another (because IP is qualitative, not quantitative)? And so on.
- The AI companies can argue for fair use, transformative use, or even "research" (as someone in the comments pointed out), but this defense has maybe a 50/50 chance of success: fair use isn't a globally accepted defense to copyright infringement, it might not be accepted as a defense to every claim in the suit, and its success may hinge on the AI companies proving "transformative use", whatever that truly means.
- Legal departments can argue that AI companies should meticulously note and report, in a public registry, every copyrighted work used as training data for their generative AI systems. That way there would be complete transparency in the training process, making it as highly regulated as data privacy has now become. Whether or not this would please the creatives remains to be seen.
- Legal departments can also argue that these AI companies should use the principle of first sale, where they knowingly buy a copy of a copyrighted work and automatically receive the right to sell, display, or otherwise dispose of that particular copy, notwithstanding the interests of the copyright owner (just like movie and music streamers did at the birth of the streaming industry). But like the first point, the issue lies in the logistics of it. Unlike the music or movie industries, there aren't unified bodies of copyright owners worldwide on the many subject matters and copyright-protected categories that exist today. Asking the AI companies to do this might prove to be a herculean task.
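The logistics problem in the first point above can be made concrete with a sketch. A naive pro-rata scheme forces you to pick some quantitative proxy for value; word count is used here purely as an illustrative assumption, and the pool size, creator names, and counts are all made up.

```python
# Naive pro-rata royalty split: divide a hypothetical licensing pool
# among creators in proportion to word count. Any such formula must
# reduce qualitative value (artistic merit, originality) to a crude
# number, which is exactly the valuation problem raised above.
def pro_rata_split(pool: float, word_counts: dict[str, int]) -> dict[str, float]:
    total_words = sum(word_counts.values())
    return {name: pool * words / total_words for name, words in word_counts.items()}

shares = pro_rata_split(
    1_000_000,  # hypothetical pool in dollars
    {"novelist": 90_000, "poet": 1_200, "journalist": 8_800},
)
# Under this metric the poet's entire output earns almost nothing,
# regardless of its artistic significance.
```

However the proxy is chosen (words, images, downloads), someone's work ends up systematically undervalued, which is why "who gets paid what" is not a detail but the core difficulty.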
These are just my opinions. I am paying rapt attention to these cases as they will have a global effect and I am interested to see which way the courts go.
Insightful article as always @P.Q. Rubin
Thanks, Nneoma! I feel like you just wrote the follow-up article I wish I could write.
The logistics of copyright holders' compensation will be complicated. But I don't know if that's a good argument against compensation. AI critics will say that it's simply the consequence of the AI companies' ruthlessness in stealing everyone's content. (Just like how a criminal can be ordered to compensate their victim, but cannot possibly compensate everyone fairly after having robbed a million people.)
I remember Reddit coming under fire for selling their users' collective data to Google for AI training purposes. Reddit reportedly got $60 million, none of which went to its users. Maybe the real problem is not logistics, but power balance?
Finally, I am intrigued by the "principle of first sale" you mentioned. If I understand correctly, this doctrine applies to physical copies (books, movies, CDs), but not digital copies or licenses. That would mean that OpenAI's (presumed) digital scraping is off the table, but something like Google Books has stronger legal footing. How interesting!
@P.Q. Rubin Great overview. Sam Altman recently donated millions to Trump's inauguration fund, which shapes my thoughts about how these lawsuits will go, since Trump will be in the White House soon. I think everybody knows Sam Altman is in suck-up mode right now; all you have to do is listen to him. @bruce landay made an interesting point, and I do believe that OpenAI and other AI companies should pay creators to train their models. They don't lack the money, given the billions they will be making in the coming years.
Sam Altman handing out donations (some would say bribes) to politicians is unfortunately not exceptional. Even if it doesn’t get him out of these lawsuits, it ensures favorable legislation in the long run. This dynamic should be interesting, given that there are also significant financial interests on the other side. Let's not forget: copyright law itself is often seen as a product of corporate interests.
This is a very good overview. My thoughts on this are that we're in for a protracted legal battle. While AI companies may plead "fair use", content creators still have rights protected by law. But if the AI companies can prove that they generate content through "researching" their data trove, and then refining that into "original" content, then that could be a different angle altogether.
Thank you!
One thing I mostly left out (because I have trouble understanding it) is "transformative use", which is apparently a type of fair use. When you write "researching", are you referring to the same thing, or is "research" a distinct type of fair use?
I was using “research” as a distinct type of fair use; however, I think the proper term would be transformative use. If the AI companies can prove that their platforms add a unique purpose to the original work, they just might get away with it. How they would do that though is another issue.
I see, thanks!
As an author I would like to see the AI organizations lose the lawsuits and pay for training their models. These are well funded for profit companies and they need to take responsibility for the data they are vacuuming up.
Thanks for a good overview!
Thanks, Bruce! Surely, OpenAI has the funds to compensate all parties involved and continue its highly lucrative business. However, if a loss in court means no more scraping without permission, that sounds to me like a real problem competition-wise.
I had totally forgotten about Napster
That's exactly how the record industry likes it 😉
But yes, it's been a while, and the only reason I brought up Napster is because I couldn't think of a more recent case where a tech company got in legal trouble and ended operations because of it.
My instinct is that the AI organizations will lose.
Really? I think it's the opposite.
The reason I think so is that these AI machines write stories. Writing is an act of creation, where something emerges from nothing. AI cannot do that, which means its stories already existed in some form.
This is a profound issue that goes beyond copyright infringement. It may be more accurate to say that this is a good reason the AI companies *should* lose.
Good response, Robert, but I look at it like this (just my 2 cents): AI has a much bigger combination of words in its memory than a "creator" does, and it understands how to combine these words at faster speeds and in more varied combinations than a "creator". So if I tell it to write me a story about Robert switching off the light and running into an alien in his garage, that story was created by AI, not by me or whoever made up the sentence "switch off the light" plus a story of running into an alien; it is absolutely brand new. However, when I was responding to you, I was looking more at the angle of AI organizations (with major big bucks) going up against old laws that did not fathom the number of ways AI can help us today, and even more in the future as it learns more "words".
Thanks Al, much appreciated!
I'm curious how you as a photographer look at unauthorized scraping. You make the comparison: "If I take a photography student into a museum...", but there are some important differences from the artist's perspective: in a museum, you get control over the way your art is displayed, and you get to set the admission price and other conditions: 'no pictures, just buy a reproduction in the museum shop!'.
Scrapers work differently. To stick to the comparison: they break into all museums at night, they copy everything they can find, they remove name tags and signatures, and then they feed all of their copies into their magical machine that makes them more money than any photographer has ever made. Don't you think artists need some form of protection, or compensation at the very least?
I see where you're coming from, and how the AI's automated process can feel like a creative process. But I still think there's a difference between a student's homage and a machine's mass scraping. Maybe we can agree that AI's "inspiration" is a bit more... industrial-scale than the average art student's?