Licensing can be a dry topic, but it's what allows people to keep doing fun stuff, so hear me out. You might dismiss what you read below as little more than a personal hot-take, and that's fair. I've tried to allay those concerns with references to qualified sources, though, in case you want something to back this up.
I'm also using the term "AI" in this post. While machine learning is the method of intelligence used by current art generators and language models, there is a difference between it and AI. I'd like to speak broadly, however, to the ethics and morality of AI systems as a whole.
The Past
Art fraud, copyright issues, and plagiarism have been happening for hundreds, if not thousands, of years. Recent examples can be seen with album cover disputes in the US and with imitation cars in China (Autocar, Top Gear).
In 2013, artist Richard Prince was found to have used photographs from other photographers in his own artworks without permission or proper credit, violating copyright law (justia.com). That same year, litigation began between the estate of Marvin Gaye and Robin Thicke and Pharrell Williams over their song "Blurred Lines", which a jury found in 2015 had infringed Gaye's 1977 song "Got to Give It Up" (archive.org). George Harrison was also found liable for plagiarism over his song "My Sweet Lord", which bore a striking similarity to the Chiffons' earlier hit "He's So Fine", in litigation that ran until 1991 (archive.org).
Just as water, gas, and electricity are brought into our houses from far off to satisfy our need in response to a minimal effort, so we shall be supplied with visual or auditory images, which will appear and disappear at a simple movement of the hand, hardly more than a sign.
- Paul Valéry, "The Conquest of Ubiquity", 1928
French poet Paul Valéry predicted, almost 100 years ago, that mechanical reproduction would bring media into the home, and Walter Benjamin's essay "The Work of Art in the Age of Mechanical Reproduction" covers this in more detail. Perhaps unsurprisingly, given the author and the geopolitics of 1935, the essay is prefaced and concluded with commentary on both politics and technology. There is food for thought in both domains, but I'll focus on the tech. Benjamin explored the impact of then-current technology on the way art is created, distributed, and consumed. He argued that the ability to reproduce art mechanically, through means like photography and film, led to a loss of the "aura": the unique, original, and authentic qualities of a one-of-a-kind work of art. Benjamin examined how mechanical reproduction shifted the nature of art from cult-like status towards exhibition, challenging its traditional hierarchies. He covered how this change affected the way art is valued and perceived by audiences, with political and social implications, and how it also led to a democratization of art. While the original essence of art could be lost, Benjamin seems to have thought that much could also be gained through the new medium. So what do I think makes AI-generated work any different? Well, potentially, not a lot.
The Present
Tools like GPT-3, Stable Diffusion, DALL-E 2, Midjourney, and others have made generating text and imagery as easy as having an idea. It's easy to see why these tools are popular: no more meetings, waiting for email responses, working with freelancers, or even paying them money - just type and go. The impact on copywriting, stock photography, and any sort of content production is immense, and it will only increase as the tools improve.
It currently takes a large amount of resources not only to train, but also to maintain, the best models; GPT-3 is estimated to have cost $4.6M USD to train. The reality is that only a small number of companies even have, or can justify spending, the capital needed to run the models in the first place, let alone at scale. Microsoft has confirmed a further investment in OpenAI, reportedly worth $10B USD. The deal moves OpenAI further towards being "open" in name only, as it will see Microsoft become the exclusive cloud provider for the company.
As I see it, the tools that hit the headlines are obscured behind the planet-scale resources needed to run them, the lawyers that come with companies of that size, and the black-box nature of the technology. Training datasets like LAION-5B use Creative Commons licensing for their metadata, but leave the image copyright with the original owner; I think this simply passes the buck in terms of accountability and liability. Similarly, the Common Crawl dataset contains petabytes of data scraped from the web - many millions of gigabytes of information. One could argue that authors, artists, and creators who don't want their work stolen or reused shouldn't post it online, but that ship sailed a long time ago. Whether someone creates for fun, community, or profit, I think it's fair to generalize that publishing to blogs, social media, and portfolios is simply too important to give up. This means that machine learning practitioners may absorb these datasets without considering the consent of the content owners, or the ethical and legal issues that presents. I don't want my own work to be stolen by humans or machines, and existing laws around the world seem to have managed these issues well to date, whether they relate to trademarks, copyright, plagiarism, or fair use. I'm not a lawyer though, so I'm following the ongoing US cases against Stability AI (vs Getty, class-action) with great interest. The result of these may see similar cases follow around the world.
A recent video from an attorney with Corridor Crew captures my understanding well, and explains why I think de-coupling is a useful way to think about using AI generation tools ethically.
To summarize (under US law):
- Derivative Work: The models don't directly contain the source data, but a representation of it in a latent space. This means the generated files/text/content are made by interpolating between concepts without making direct use of source data. The issue here is that the latent space is exclusively derived from the copyrighted information used to train it. They are coupled. Legally, derivative works may only be made with the copyright owner's permission.
- Transformative Use: Work is considered transformative if it "alters an original work with new expression, meaning, or message or when it adds value to the original work by creating new information, aesthetics, insights, or understandings" - in other words, the purpose of the work has changed. The input training data and output data are still inherently coupled, but legally (in the US) a transformative work does not need permission from the copyright owner. It seems this is the angle Stability AI will take to defend themselves, and it may work.
- Fair Use (Columbia University): Copyrighted works can be used without the author's permission if:
  - The purpose of the work has changed significantly (it's transformative).
  - The work is considered public or to have had its first publication by the copyright owner.
  - The amount of the original work used is no more than what is necessary for an objective.
  - The effect on the potential market for, or value of, the work is low.
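To make the "interpolating between concepts in a latent space" idea above concrete, here is a minimal sketch using spherical interpolation (slerp), a common way to blend between two points in a generative model's latent space. The 64-dimensional random vectors stand in for real model latents and are purely illustrative:

```python
# Illustrative sketch of interpolating in a latent space: slerp between two
# latent vectors, the kind of operation generative samplers use to blend
# concepts. The vectors here are random stand-ins, not real model latents.
import numpy as np

def slerp(z0, z1, t):
    """Spherical interpolation between two latent vectors, t in [0, 1]."""
    z0n, z1n = z0 / np.linalg.norm(z0), z1 / np.linalg.norm(z1)
    omega = np.arccos(np.clip(np.dot(z0n, z1n), -1.0, 1.0))
    return (np.sin((1 - t) * omega) * z0 + np.sin(t * omega) * z1) / np.sin(omega)

rng = np.random.default_rng(0)
z_cat, z_dog = rng.normal(size=64), rng.normal(size=64)
midpoint = slerp(z_cat, z_dog, 0.5)  # a latent "between" the two concepts
print(np.allclose(slerp(z_cat, z_dog, 0.0), z_cat))  # True: endpoints preserved
```

The point is that every position along that path, including the "new" midpoint, is still entirely determined by the training-derived latents at either end; the output and the source data remain coupled.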
I think the transformative-work argument and the first three fair-use factors could make a strong defence. However, especially in the case of Getty Images' stock photography, the effect on the market and value of the work may be a sticking point for Stability AI's attorneys, as consumers can avoid stock photography websites entirely by using AI generation tools.

My hope is that Stability AI loses the two cases. Not to the detriment of the tools, as they are technologically amazing and will develop helpful use cases in the future, but so that a dangerous precedent is avoided. Large, dominant companies could continue down the route of making and using AI tools with abandon, all in the pursuit of growth. Customers would use the tools because they're cheap, easily accessible, and produce great results. Even an author who resents their own prose influencing AI models may produce illustrations for a new book with DALL-E, because the tools sit one URL away, everyone else is doing it anyway, and they can't afford to compensate their human counterparts. Artists, authors, and creators will struggle more and more as the industry shifts rapidly around them. Creativity could become a fringe pursuit, where publishing online is de facto acceptance of your work being stolen en masse and at scale. Sure, I believe that no thought or work is 100% original, but I also believe that the hyper-mechanization of creativity is a hugely detrimental step-change for those who use the works, as well as the humans who make them. The value and nuance of the process are lost, along with the "aura" and context of the outputs.
It's worth noting that the problem of exploitation and traceability exists in many forms across every industry, from the sweatshops that make clothing and shoes to the factories that manufacture electronics and other goods. Where do the things we consume come from? The way the world manufactures, purchases, and consumes goods is problematic, and the artefacts of ML/AI systems are no more than a different type of good; a good that consumers should be made aware of, not to judge those who use them, but to help them inform their own decisions. To use the parlance of physical products: I think ethical consumption and supply-chain visibility can apply here too.
The Future
Scraping data is a fundamental part of how swathes of the internet work, including search engines like Google, Yahoo, and Bing. As Jake Watson highlighted in the Corridor Crew video, search engine use is considered transformative because its purpose is so different from that of the original work, and it provides a public good. By design, users are also led to the source of the information, so attribution is provided. You can read more about the case in an EFF article. I disagree with Watson's statement that the AI tools don't deprive copyright owners of the ability to control and benefit from their original work. The owners do seem to retain all of the control they had before; however, I think the benefits are lost. Original authors will not be able to market their work or sell original pieces and prints, because a win for Stability AI in these cases would set a precedent for stealing the "aura", emotion, and value of a work to create what can, with the right prompt, amount to something we'd call forgery if it were done by a human.
For any given output (text, imagery, media, classification, regression, etc.), it can be hard, if not impossible, to determine which sources make up parts of it and how much they contributed. The same is true of work completed by real people. The issue I take is one of attribution and acknowledgement. An article for the Guardian explored this issue too; the cat is already out of the bag in this regard, as the models have been trained without artist consent. It's possible to check whether an image is already in a dataset, but this won't affect the models as they exist today.
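On that last point, one way such a check can work is by searching a dataset's published metadata for your own URLs. Below is a hedged sketch: a tiny in-memory table stands in for a LAION-style metadata shard, and the column names ("url", "text") are assumptions for illustration rather than the real schema:

```python
# Hedged sketch: searching LAION-style metadata for your own domain.
# The real LAION-5B metadata ships as large parquet shards; a small
# in-memory DataFrame stands in for one shard here, and the column
# names are illustrative assumptions.
import pandas as pd

shard = pd.DataFrame({
    "url": [
        "https://example.com/art/sunset.jpg",
        "https://myportfolio.example/painting-01.png",
        "https://stock.example.org/photo/123.jpg",
    ],
    "text": ["a sunset", "abstract painting", "stock photo"],
})

def find_my_images(shard: pd.DataFrame, domain: str) -> pd.DataFrame:
    """Return metadata rows whose URL points at the given domain."""
    return shard[shard["url"].str.contains(domain, regex=False)]

matches = find_my_images(shard, "myportfolio.example")
print(len(matches))  # 1
```

Services like "Have I Been Trained?" wrap essentially this kind of lookup in a friendlier interface, but as noted, finding your work in the metadata does nothing to remove it from models that were already trained on it.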
The EU is proposing an important act for the safety of AI systems. As concisely explained by Lilian Edwards, Professor of Law, Innovation and Society at Newcastle University (explainer, recommendations), the AI Act would introduce controls for managing risk in different systems, including requirements for transparency and the provision of information to users. I think comparable measures should be implemented for copyright and intellectual property issues in AI systems too: system consumers should be able to see, at the point of use, which data and works (may have) influenced the output text/image they receive.
De-coupling Data
Transparency, permissions, and licensing shouldn't be necessary, in my opinion, when the weights within the model and the output it produces can be decoupled from the training data. That is to say, the training data can be considered fungible: if trained on a completely different dataset, a model could still produce the same output for a series of tests, and its value would remain the same. This would be applicable to classification and regression models, such as sentiment analysis and topic extraction, rather than generative tools like transformers.
RAIL-D
The good news is that work is already underway, by people more qualified than me, to make responsible AI licenses. The RAIL specification is being developed for four artifacts in AI systems: data, applications, models, and source code. It covers both open-source and limited-use cases. The initiative hasn't yet developed sample licenses for "D/data" artifacts, so it's yet to be seen whether they could be applied to individual works or sets of works by an artist. I'd like to see that happen though, with badges or transparency marks, similar to a CE or Fairtrade logo, displayed at the point of use and linking either to licenses, when the training data is closed source, or to the data sources when they are open.
Conclusion
I won't profess to be an expert on licensing, machine learning, or art history, but I work on software every day and occasionally create terrible drawings for the amusement of myself and friends. Pandora's box has already been opened with the existing tools, and I think they are awesome, but they need to be approached with caution. Tools like ChatGPT have fine-tuned the human element of text generation to be very convincing: they can help you overcome writer's block, rephrase sentences, or give you ideas you wouldn't otherwise have thought of. The human components and "aura" are still missing from AI-generated images, however. I think the tools have crossed the line in terms of disrupting image work, but they aren't yet reliable enough to create foolproof results every time. The pace of change is fast enough, though, that it won't be long before AI-generated images are indistinguishable from human works. For this reason, I'm looking towards what can be done in the near future. I'll be keeping up with the cases against Stability AI and participating in the RAIL initiative in the meantime.
Update 29/01: LegalEagle published their own video which I think aligns with my opinions, but includes more explanation and US case law.