Why AI Might Actually Be Useful for Music Creators

After an initial flush of panic at the launch of UDIO a few weeks ago, I’ve since collected a few thoughts about AI generated music and what it might mean for music creators generally. This is not a thoroughly reasoned argument but a collection of fragments, so please excuse the rough edges and sharp corners.

The emergence of UDIO raises lots of questions about technology and the nature of creativity. While the results are awe-inspiring from the point of view of verisimilitude – especially the vocals – the content itself reflects general tendencies in pop culture over the last 20 years. These tendencies include a generalised flattening of the aesthetic landscape. What is this flatness and how does it relate to machine learning and culture in general?

There is a tendency in any art form towards homeostasis. At the beginning, when there are fewer exponents of a craft or movement (leaving aside singletons and outliers), the features of individual works will tend towards heterogeneity, while the rules, language and syntax are being worked out. This tendency emerges through the hubs of a particular school or outlook, the originators and drivers of style. Historically, if there has been a master/student relationship in the hub (by hub I mean places or collectives where styles and techniques emerge – Michelangelo’s workshop, Waterford crystal, Goa trance, the First Viennese school of composers and so on) then we would expect students to regularly reflect the style and thought of the masters until such time as a break occurs. Such breaks are to be expected; the student seeks to individuate from the master just as the child seeks to individuate from the parent. Such breaks are not universal but they are common. Equally common – maybe even more so – is the student who absorbs and expands on the lessons of the master. Still more common than these first two cases is the student who reorganises the material of the master, adding little more than marginal glosses to the master’s work. (Please note – before I am accused of colonial non-intersectionality, I use master here in the sense of originator, in a non-gendered and non-representational sense.)

In music, a great example of spectacular individuation is Claude-Achille Debussy. As a young man, Debussy came under the spell of Richard Wagner, a spell that is clearly evident in Debussy’s critical writings, if less so in his published musical output. By the time he starts publishing his own musical works, the influence of Wagner has already been totally absorbed and rejected, resulting in the creation of one of the greatest individual voices in all of music. Nothing could be further from the gigantism of the Ring cycle than the Afternoon of the Faun, and yet one could argue that it was this flight from Wagner that propelled Debussy to the opposite ends of the mythic world, a pre-Hellenic glade not even soiled yet by gods, let alone humans.

The great historical example of the second type of relationship – where a student elaborates and expands on the work of the master – is Mozart’s relationship to Haydn. Nearly all of the rules, language, syntax and procedures in the music of Mozart are already locatable in the music of Haydn. Yet there is a broadening of the expressive palette in Mozart, even though the constraints on musical syntax were so asphyxiating. It was not until the following generation, with Beethoven, that a true expressive individuation and rebellion in the First Viennese school would be established.

What does all of this have to do with machine learning? When a hub is successful in some measurable way – by its influence, spread, income, popularity and so on – it inevitably spawns imitators, charlatans, spivs, shills, opportunists and enthusiasts of all kinds. This is equally demonstrable in hip-hop, rock, shoe-gaze, Goa trance and advertising music of the 1970s as it is in the music of both Viennese schools.

It could be argued that it is the later generations of imitators that shore up the reputations of the originators, irrespective of any true artistic merit in either the original or the copy. Much of Mozart’s music is banal, emotionally thin, flippant and superficial, but it nonetheless enjoyed success and was heavily imitated. What was true of Mozart is deeply true of most 21st-century pop music. This is partly due to the tendency towards homogeneity mentioned earlier. Styles are like gases – the molecules (representing the elements of the style) bounce around until they reach a uniform average state. Even if the early phases of the style have little or no intrinsic merit, a uniform gas of even distribution is the result once the process of imitation has been exhausted. Machine learning engineers create models for the summing and averaging of styles, without having the faintest idea of the merit of the original hub, let alone the merits of the greater average gas that represents the genre. As far as the raw materials of music are concerned, machine learning may well find combinations of melodies and harmonies and rhythms that humans find irresistible. Even then, the quality that makes something irresistible is not also the quality that makes it great: heroin and sugar are two examples that come to mind.

So AI is here, generating gas. And yet there are so many influences in the creation of interesting music that resist explanation. Brian Eno’s collaboration with Robert Wyatt and Rhett Davies, a work for piano (1/1) that appears on Music for Airports, is slow, repetitive and non-narrative; it is a perfect candidate as fodder for machine learning. Yet there exists in this music a feeling that is almost impossible to put into words. Judging by Eno’s own exegesis, the feeling that I experience as a listener contradicts the composers’ original intentions. For me, the slightly out-of-tune piano generates a strongly nostalgic feeling, a sort of inverse honky-tonk, devoid of people, ambition, romance, lust or colour, yet it creates a definite emotional space.

AI could well arrive at the same kind of space, but of course it would be pointless. Much of the semiotic meaning of music is generated by the performative context of that music, even when algorithms generate the music: the serialist algorithms of the Second Viennese school are contextually driven and defined by a series of particular historical moments, such as the abyssal shadows cast by the Great War.

Music arises from need. Music for Airports suggests a need for space and contemplative dilation. Tin Pan Alley met the need for simple, catchy and entertaining tunes informed by the way of life in New York City at the financially depressed end of the 19th century. Liturgical music responds to religious need. What need does AI music respond to? Is there an algorithm to describe a homogeneous gas that is (importantly) always a retrospective sum of gestures? AI will never understand the flavour of Brian Eno’s slightly detuned piano, and how flavour relates to music for billions of individuals. More importantly, AI will never be able to predict the flavour and colour of music as it might be expressed by an individual tomorrow. AI will always be looking backwards for its vocabulary of gestures. Even if it could combine all of the possible gestures available in order to create new combinations, with the predictive ability to discern what humans might find pleasing, this predictability can only be based on pre-existing musical models. In order to create genuinely new models, AI would also need to create new instruments, new distribution networks, new recording technologies, new relational modes, and new societies.

AI will be useful for its capabilities around imitation and verisimilitude. It will be able to write endless “me too” jingles and replace generic library music with generative generic library music. It might even generate some startling accidental stylistic tropes, just as an infinite zoo of monkeys bashing away at word processors for infinite duration will eventually write Hamlet. AI music could therefore play a role in generating music for utilitarian needs, such as advertising and library music, applications that rely less on creativity than on an ability to replicate previously successful models. This could free up creators who are concerned with more nuanced expressive domains. Here AI will be useful in performing repetitive tasks that can be expressed algorithmically.

These considerations do not take into account the role of the subconscious or unconscious in culture. Most of the foregoing comments describe surface features of culture – end products. If this small essay were to have included intentionality, drives, the unconscious, and the symbolic structure of society, then the complexities would increase exponentially. AI can only ever be concerned with the end products – the perceptible material outputs from which models can be derived. A demographer or demagogue might argue that human behaviour can be predicted. This, as we say in Australia, is a furphy, a convincing-sounding fact that seems solid but is ultimately hollow. Simple behaviours – choosing one washing powder (or political leader) over another – can be predicted, just as a large population’s preference for one musical modulation over another might find its way into an AI model. But can we predict how a symphony will end based on the first few notes as written by the composer? Maybe we could predict something about the general shape. But the details? The notes themselves? The narrative thrust, the flavours, the amount of dust in the furthest corner of the furthest room in that symphony? I believe neurological and philosophical proponents of determinism would be unable to argue convincingly against free will if the examples they chose were choices between uncountable alternatives instead of the typical binary choices they commonly deploy – chocolate versus strawberry, for example.

So we arrive back at the generalised equilibrium of the demographic gas of a society. Ultimately, I believe this is why our era is so depressed: our choices in all things are more and more corralled by algorithms based on the generalised gas models of society. I expressed a similar frustration over 20 years ago in an essay that was also focused on the role of machines in our cultural life. That essay was called Frankenstein’s Piano. In that essay I wrote:

One human dream is to become a machine; the only dream of the machine is to become human.

Shaun Rigney May 2024