Harmonies of Code and Ethics
Music AI and Ethical Dilemmas Surrounding Copyright
Generative Music AI, a subfield of Artificial Intelligence (AI) and Machine Learning (ML), is actively reshaping how music is created and consumed across the full spectrum of contemporary music culture, from industry to public domain to private listening. Potential future directions include more sophisticated models that can understand and replicate the nuances of human musical expression, as well as the integration of AI into music production, music discovery and data-driven marketing. This short essay does not attempt to cover the whole field; instead it touches upon some of the research and technologies behind Music AI whilst highlighting the urgent debate around authors’ rights. Let’s dive in.
The use of AI in music composition and production has gained significant traction in recent years. A major player in this field is Suno, whose product allows any person (or automated script) to generate two minutes of high-fidelity ‘original’ music from a model using a simple text prompt, such as ‘An orchestral theme for a romantic movie’. Et voilà: to the undiscerning ear, you get exactly what you asked for. On repeated listening, though, it is hard to imagine how such a recording could ever be considered a professional work of art. Nevertheless, progress moves very fast in this space, and it is only a matter of time before the results of generative music AI are embedded in the modern soundscape.
So how does this ‘magic’ work? In a nutshell, creating a generative AI model involves ‘training’ one or more ‘neural networks’, data structures loosely inspired by the structure and function of the human brain. They consist of millions of interconnected nodes, or “neurons”, whose internal ‘weights’ and ‘biases’ are repeatedly adjusted to reflect statistical patterns in the input data. These adjustments accumulate over many passes through the training data, known as ‘epochs’. Training is an expensive process that consumes a huge amount of computation and therefore energy. The energy consumption and environmental impact of training AI models remains highly controversial, and some reports suggest a significant negative impact. In general, the training of models remains an opaque and unregulated engineering process.
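To make the idea of weights, biases and epochs concrete, here is a deliberately tiny sketch, nothing like a real music model, of a single ‘neuron’ whose weight and bias are nudged over repeated epochs until they capture the statistical pattern in some invented data:

```python
import random

# Toy illustration (not a real music model): one "neuron" whose weight and
# bias are nudged each epoch to better fit a set of invented (input, target)
# pairs, all of which happen to lie on the line y = x + 0.1.
data = [(0.0, 0.1), (0.5, 0.6), (1.0, 1.1)]

weight, bias = random.random(), random.random()
learning_rate = 0.1

for epoch in range(1000):          # the 'epochs' of training
    for x, target in data:
        prediction = weight * x + bias
        error = prediction - target
        # Nudge weight and bias to reduce the error (gradient descent)
        weight -= learning_rate * error * x
        bias -= learning_rate * error

# After training, the neuron has absorbed the pattern y ≈ x + 0.1
print(round(weight, 2), round(bias, 2))
```

A real generative model does essentially this, but with millions of weights, far richer data, and vastly more computation, which is where the energy cost comes from.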
In the context of Generative Music AI, networks are trained on large datasets of musical recordings and/or notated music, allowing them to learn the patterns and structures that underpin different musical styles and sounds. Researchers use complex data processing to extract and connect millions of musical parameters, such that from any arbitrary input, be it a text prompt, a snippet of sound or just a few notes, the neural network can extrapolate from its training data and serve forth a result: a novel composition synthesised from its training.
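This ‘learn the patterns, then extrapolate’ loop can be caricatured with something far simpler than a neural network, a first-order Markov chain over note names (the corpus and note names below are invented for illustration):

```python
import random
from collections import defaultdict

# Caricature of "learn patterns, then extrapolate": a first-order Markov
# chain stands in for a neural network. The training corpus is invented.
corpus = ["C", "E", "G", "E", "C", "E", "G", "C"]

# "Training": record which note tends to follow which
transitions = defaultdict(list)
for current, following in zip(corpus, corpus[1:]):
    transitions[current].append(following)

def generate(seed, length, rng=random.Random(0)):
    """Extrapolate a 'novel' sequence from the learned transition statistics."""
    out = [seed]
    for _ in range(length - 1):
        out.append(rng.choice(transitions[out[-1]]))
    return out

print(generate("C", 8))
```

The output is not a copy of the corpus, yet every step of it is determined by statistics extracted from the corpus, which is the crux of the copyright debate that follows.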
The state of the art in music model training is largely built on the last few decades of research in the engineering fields of machine listening and audio feature extraction. Interestingly, given the knowledge domain, very little input has come from musicological research or musical analysis at this stage. The knowledge of composers, music researchers and performers has generally not been required (so far) to get acceptable results, leading to a situation where artists have not been directly involved in the technical development of these models. Yet it is highly likely that their works have been involved indirectly, and in many cases without the creators’ consent. Remember, the training process literally involves feeding the models as much human-made music as fast as possible: the bigger the dataset used for training, the more capable the model.
Once trained, the neural network is ready for public consumption and continuous use. It is now a discrete data structure. A model becomes a piece of digital property: it can be posted online, traded, fine-tuned, stolen, used to train other models, and so on. It is important to understand that a model does not contain the actual audio files it was trained on, only the interconnected statistical results of deep analysis of ‘scraped’ source material. This is where things get murky enough, in relation to copyright, that certain tech companies, such as OpenAI and Stability AI, decided long ago that models were separate entities from the cultural artefacts they were trained on, and that the outputs they generate are therefore original, without provenance, and attributable to whoever prompted the model.
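The point that a trained model is just a portable bundle of numbers, with no source audio inside it, can be illustrated with a toy serialisation (the file name and parameter values below are invented):

```python
import json

# Illustrative point: a trained "model" is only a bundle of numbers, not the
# audio it was trained on. The parameter values here are invented.
toy_model = {
    "weights": [0.12, -0.7, 1.3],
    "bias": 0.05,
    "note": "statistical parameters only; no source audio is stored",
}

# The model can be written to a file and posted, traded or fine-tuned...
with open("toy_model.json", "w") as f:
    json.dump(toy_model, f)

# ...and anyone holding the file recovers exactly the same parameters,
with open("toy_model.json") as f:
    loaded = json.load(f)

# but the original recordings cannot be reconstructed from them.
print(loaded == toy_model)  # → True
```

It is this separation between the tradable artefact (the parameters) and the cultural material that shaped it which the companies lean on when arguing their outputs have no provenance.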
Critics have described this mass non-consensual ‘scraping’ of copyrighted images, music and text by these now immensely wealthy companies as “the biggest IP heist of the century”. The creators behind the training data don’t get a say, and they don’t get paid. It is unfathomable that these companies paid no mind to already existing licensing arrangements in the first place. For example, Antonio Rodriguez, one of Suno’s earliest investors, is quoted as saying he invested with the full knowledge that music labels and publishers could sue. As this statement shows, these companies know what they did and know that they are on the hook, since the models themselves can quite easily be prompted to generate copyright-infringing content. They hope to buy themselves out of court with the profits they can make off the back of rights exploitation. It will be interesting to see how they fare against the many lawsuits lining up to claim copyright infringement. Let’s hope that someone remembers the under-represented artists and ethnic cultures whose fundamental right to consent has been clearly violated.
Just because images and music can be consumed online in an apparently free way does not mean that such creative work is automatically free and in the public domain. This has never been the case. So why aren’t these Generative AI companies facing the music? Is it because justice and policy are moving too slowly? Hard to tell. What is clear is that the growing availability of powerful computational resources, massive investment in the sector and the continuing practice of releasing unlegislated commercial applications built from unregulated AI research mean that the human activity of music composition, production and discovery is becoming increasingly automated and is under threat. Creators are being disenfranchised and their authorship diminished. Music discovery is shaped by algorithms and AI marketing. This is the situation right now.
Interestingly, in contrast to the disruption happening in the visual arts, cinema, photography and illustration, things have been developing a little more slowly in music AI. Perhaps music AI is just that little bit more ethical as regards provenance. Careers in music were already heavily impacted by earlier waves of new technology: home recording, the digital sampler, the mp3 and the monster it brought forth, the brutal regime of the music streaming giants. Perhaps this painful trajectory is why we are seeing opportunities for transparency and advocacy rising from music-technology-focussed AI startups.
One example of an attempt at transparency and equity for creators is a company called ‘NeuTone’, formed by a group of young researchers in Tokyo and Edinburgh, building on state-of-the-art open research from IRCAM, the MTG at UPF and others. They create tools that put easy-to-use AI technology in composers’ hands in the studio, in the form of a plugin that can host trained models directly inside your music software. Some NeuTone models have been trained by composers and sound designers, and try to adhere to existing open attribution licensing, such as Creative Commons, GPL or MIT licenses. Their plugins offer the sound designer or composer creative opportunities to explore sounds that were simply not possible without Generative AI and trained models. For example, what might a cello solo sound like when passed into a model trained on whale song recordings, or on recordings of a traditional Bulgarian women’s choir? The model tries to ‘sing’ from what it knows. Arguably, these too are forms of cultural appropriation, yet they are undeniably new and interesting sounds. Another example of an opportunity for advocacy in music AI comes from the high-profile resignation last year of Ed Newton-Rex, formerly head of music AI at Stability AI. After speaking out to the press about the unsustainable ethics of the company’s stance on collecting training data, he went on to found a startup called Fairly Trained, pioneering certification of fair training-data use in Generative AI: a kind of ‘ethical’ certification like those seen in food and agriculture. We have yet to see whether such initiatives can have any impact against tech titans such as Microsoft and OpenAI, who are heavily invested in manufacturing commercial products trained on creators’ work, used without their consent. Let us hope that with more public awareness, initiatives for fair use and fair practice can gain more traction.
As Generative Music AI continues to advance, it is likely to have a significant impact on the music industry, and much of that impact could be negative and harmful: not just to creators through rights theft, but to listeners through homogenisation and cloning. Although there are many useful AI tools that open up new possibilities for innovative sound works, they risk being outshone by companies intent on automating the crafts of composition, recording and sound design. Legislation and mitigation against the potential replacement of human sound artists, composers and performers is urgent and necessary to ensure a sustainable and equitable future for music composers and performers, past, present and future. I believe that a healthy ecosystem is possible, one in which we can all enjoy the new creative technologies in our pursuits in a fair and safe way.