A Machine Learning Assisted Compositional Paradigm: 1. Introduction

I have been composing electroacoustic music since 1999. Before that I was a tonal composer writing for small ensembles, theatre and choir. In 2000 I was undertaking research at the University of Ulster on machine learning and data fusion. Later, in 2004, during the opening of SARC in Belfast, I sat in on a workshop conducted by Karlheinz Stockhausen, Curtis Roads and others. This was a significant privilege given that, sadly, Stockhausen passed away a few years later. Among the many fundamental gems I learned, as I reflect back on that workshop, two struck me as crude reality checks that would later change my compositional practice forever.

Firstly, at that time I reflected very little on my compositional practice. For instance, I would elaborate an idea and throw some samples together in a mix; then, following a rough conceptual plot, I would hurriedly write a piece based on samples and synth patches. I would just go back and check it twice using some form of aural criteria and perhaps make a number of modifications to the final composition. The result was a wide variety of outcomes, in both style and quality.

In contrast, both Roads’ and Stockhausen’s repertoires had so much cohesion. You could tell there was so much thought and rethinking happening. Every sonic granule of Roads’ pieces was delicately placed at a specific time – you could feel the craft of music composition unfolding. Similarly, Stockhausen’s early tape music has gripping conceptual coherence. These composers did not have fast computers that allowed them to manage splices of audio, or hundreds of sonic granules, the way we can today. It took them a significant period of time to craft these apparently disjointed audio atoms into aural masterpieces, almost by hand.

Secondly, I did not have real boundaries in terms of computer music options. By 2003 MAX/MSP was significantly powerful and the market was getting inundated with software-based samplers, synthesisers and all kinds of wonderful plugins. Ableton Live was very stable and the dream musique-concrete machine. Sitting in front of a computer could send me into a profound state of existential catatonia – a composer’s block that many still face today as computational power rises and the diversity of music production tools increases exponentially.

This latter challenge of hyper-resourced music tools lasted only one to two years, until I decided to become a circuit bender. I built a eurorack modular synthesiser over 4 years and gradually populated my studio with all manner of modified music machines, ranging from lo-fi sampling-based toys to flash-grip FM synths from the 80s and 90s – they all went through shameless soldering-iron hacking that produced sonic machines with limited predictability, decent sonic limitations and manageable controllability.

The aforementioned challenge of non-reflective compositional frameworks is the main issue this dissertation attempts to solve. I started exploring ways to formalise how I would reflect on and redefine my compositions. I thought long and hard about formalising my compositional process, to achieve a description that could be reproduced by another composer whilst also keeping a degree of freedom. Hence I propose a computer-assisted method for music composition in which the computer is used not as a bottomless music creation toolset but rather as a decision-making tool. This introductory section plots the basics of this method. The language is purposely informal so I can explain to fellow music makers my ‘gospel’ of music creation during the past 5 years.

The title of this dissertation is: A Machine Learning Assisted Compositional Paradigm. It is appropriate to analyse such a title and to aim to meet its requirements; in other words, to actually ensure that this thesis does what it says on the cover. The first words in the title are ‘machine learning’, therefore I will now explain what machine learning entails for the purpose of this dissertation. Later, we will have a more in-depth discussion about what machine learning is and how it is applied in the compositional methods that will be described here.

Machine learning is just that: it aims at utilising a machine, most often these days a computer, to simulate learning. Learning is the capacity of a person, an animal or a computer to analyse input, create abstractions of that input, and then produce an output which is, well, learned. In other words, an output that has some form of application, that will be useful, or that will make sense of the data.

Machine learning can be broken down into several stages. The first stage is data acquisition. The second stage is feature extraction. Then those features are fed to a form of computation or algorithmic engine. Finally, an output is produced by that algorithmic engine.
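
The four stages can be sketched as a tiny Python pipeline. This is only an illustration: the function names, the hard-coded readings and the 0.5 threshold are all invented for the example, and the 'engine' here is a hand-written rule rather than anything that has actually been learned.

```python
from statistics import mean


def acquire():
    """Stage 1, data acquisition: a hard-coded list stands in for a sensor."""
    return [0.2, 0.9, 0.4, 0.8, 0.1, 0.7]


def extract_features(samples):
    """Stage 2, feature extraction: reduce the raw data to a few numbers."""
    return {"mean": mean(samples), "range": max(samples) - min(samples)}


def engine(features, threshold=0.5):
    """Stage 3, the algorithmic engine: here a trivial, hand-written rule."""
    return "loud" if features["mean"] > threshold else "quiet"


def run_pipeline():
    samples = acquire()
    features = extract_features(samples)
    return engine(features)  # Stage 4: the output


print(run_pipeline())
```

In a real machine learning system the threshold in stage 3 would be adjusted from examples instead of being typed in by hand; the shape of the pipeline stays the same.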

For example, imagine you are playing a sport; let’s say, for argument’s sake, baseball. The pitcher will throw a ball at you, and you will try to bat that ball as it comes towards you at high speed. The input in this case is the stream of visual information reaching your brain. Before your brain does something with that data, a form of feature extraction needs to be accomplished. For instance, the ball needs to be isolated from the background, otherwise there’s far too much information. Certain angles or movements of the pitcher are recorded as numbers in your brain, and things like the speed of the ball, or the change in its apparent diameter as it comes towards you, are all computed too. At that point your brain crunches those numbers and computes an output. That output is: bat now! Once all that information is processed, your brain will tell your arms to swing and hopefully, if your batting algorithm is very good, or if the pitcher is very forgiving, you will be in a position to score a home run. With experience, your batting will improve, as you learn to better judge the flight of the ball and to control your swing.

This process of learning can be partially achieved by a computer when considering very simple problems. For instance, a computer can have sensors such as cameras as data inputs. Software can then extract certain features, or useful numbers, which are fed to some form of algorithm or formula that the machine uses to produce an output.

Another example is speech recognition. There are several software packages out there that can translate speech into text – i.e. do transcription. The computer will have a microphone. This microphone will record your speech: this is the data acquisition stage. Once your speech is recorded and converted into a series of ones and zeroes in an audio file, the computer will undertake feature extraction on those digitised sound waves. The feature extraction algorithms will calculate certain things such as how many times the waveform crosses the zero line, the number of local amplitude peaks, or a number of spectral characteristics indicating how this voice looks in the frequency domain. The computer will then take all these simplified numbers, or features, and try to produce an output. In this case the output will be a written word that matches the spoken sound ‘pencil’. The computer could detect the percussive sound of the P, then the vowel sound E, and then N-C-I-L. These phonemes will somehow be translated into the written word. Then you will see in your speech recognition program how the word pencil is typed for you.
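
Two of the features just mentioned, zero crossings and local amplitude peaks, are simple enough to sketch in a few lines of Python. This is a minimal illustration on synthetic sine waves, not the feature extractor of any real speech recogniser; the function names, sample rate and test tones are assumptions made for the example. Spectral features need a Fourier transform and are left out here.

```python
import math


def zero_crossings(signal):
    """Count sign changes between consecutive samples."""
    return sum(1 for a, b in zip(signal, signal[1:]) if (a < 0) != (b < 0))


def local_peaks(signal):
    """Count samples that are strictly larger than both neighbours."""
    return sum(
        1
        for i in range(1, len(signal) - 1)
        if signal[i - 1] < signal[i] > signal[i + 1]
    )


def sine(freq_hz, sample_rate=8000, seconds=0.1, phase=0.3):
    """Synthetic test tone; the phase offset keeps samples off exact zeros."""
    count = int(sample_rate * seconds)
    return [
        math.sin(2 * math.pi * freq_hz * n / sample_rate + phase)
        for n in range(count)
    ]


low = sine(100)   # a low tone
high = sine(400)  # a higher tone

# The higher tone crosses the zero line more often in the same time span.
print(zero_crossings(low), zero_crossings(high))
print(local_peaks(low), local_peaks(high))
```

Each function boils hundreds of raw samples down to a single number, which is exactly what makes a feature easier to work with than the recording itself.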

Following the term machine learning in the title, there is another word – assisted. In the example that I just mentioned (speech recognition), we can see that the whole process of dictation, from speech to text, is carried through to completion by the machine. What, then, do I mean by the word assisted? The word implies that in this dissertation machine learning will not be employed as a universal black-box solution for generating compositions; rather, I will use a number of machine learning approaches to inform my compositional practice.

Let’s consider a couple of examples. Remember that ‘data acquisition’ is part of machine learning. Let’s say that we are about to write a particular piece for piano. No matter how hard composers may try, they are not hermits. Therefore, before composers write that piano piece, they have already gathered a lot of data through experience about existing music and about the piano. It all counts: memories, feelings, emotions and thoughts.

Feature extraction comes after that. You might write a melody in the Dorian mode over a C major seventh chord. You have learned that the medieval Dorian mode is a scale: a series of 7 intervals that can be visualised as a pattern over a keyboard or a guitar. You used the Dorian mode, and perhaps also the concepts of excitement and harmonic fitness. You are using features rather than the raw sound data.
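
To show how compact such a feature is, the Dorian mode can be written down as its pattern of 7 steps, counted in semitones, and applied to any starting note. The sketch below is purely illustrative; the helper name and the sharps-only note spelling are simplifications chosen for the example.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# The Dorian mode as a pattern of 7 steps, in semitones
# (whole, half, whole, whole, whole, half, whole).
DORIAN_STEPS = [2, 1, 2, 2, 2, 1, 2]


def scale(root, steps):
    """Walk the step pattern from the root; the last step returns to the octave."""
    index = NOTE_NAMES.index(root)
    notes = [root]
    for step in steps[:-1]:
        index = (index + step) % 12
        notes.append(NOTE_NAMES[index])
    return notes


print(scale("D", DORIAN_STEPS))  # the white keys from D to C
```

The whole mode is captured by seven small numbers, whereas a recording of the same scale would take many thousands of samples.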

Hence, it makes sense to suggest that meditating on and analysing musical features, as encompassed by abstract concepts such as emotions or memories, may significantly inform compositional practice. It may sound a little complex at first glance, but this idea will be greatly expanded on in this dissertation.

Back to the word assisted. The word assisted in the title implies support or self-reflection. I’m going to use machine learning methods as part of the compositional process in order to inform my self-reflective compositional practice, with the aim of improving the music that I write.

The last word in the title is paradigm, which implies that rather than proposing one specific methodology, the idea of this dissertation is to share compositional guidelines that provide a practical approach others can also use. This paradigm is one I have been using in recent years in order to reflect on, tune, and improve my compositions.

Finally, it is also worth mentioning that the main type of machine learning method that will be used in this dissertation is called unsupervised (or non-supervised) machine learning. Hence it is worth briefly clarifying the difference between supervised and unsupervised learning. By doing so I hope to further explain machine learning by using simple examples.

Supervised learning aims to provide a definite output based on classes or a classification mechanism. Let’s say that you are, for instance, trying to differentiate magnolias from lilies. Then let’s pretend that a very gifted botanist tries to explain to you that, “Well, lilies will have a certain form of petals, a certain form of pistils, and certain colours. The fragrance could be important too!” After all these data are considered, and after formulating some features for each flower, you will be able to start a learning process with a bunch of different magnolias and a bunch of different lilies. After your learning phase, you will be prepared to sit the magnolias versus lilies test, in which you will be shown a lily or a magnolia at random.
If you did your learning with a decent variety of magnolias and lilies, you will be able to confidently say: “This is a lily” without much thought, or “This is definitely a magnolia.” When a computer undergoes this process it is a special type of machine learning called pattern recognition. The answers are given. The classes are given. The learning is how to classify data based on features – it is not open-ended; each response is either right or wrong.
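
The magnolias versus lilies test can be sketched with one of the simplest supervised methods, a nearest-centroid classifier: the 'learning' is just averaging the features of the labelled examples, and the 'test' is asking which average a new flower is closest to. The two features and all the numbers below are invented for illustration and are not botanical measurements.

```python
import math
from statistics import mean

# Labelled examples: (petal length, petal width) in made-up units, plus the
# class the botanist gave us.
TRAINING = [
    ((6.0, 2.0), "lily"),
    ((6.5, 2.2), "lily"),
    ((5.8, 1.9), "lily"),
    ((10.0, 5.0), "magnolia"),
    ((11.0, 5.5), "magnolia"),
    ((9.5, 4.8), "magnolia"),
]


def train(examples):
    """Learning phase: compute one average feature vector per class."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def classify(centroids, features):
    """Test phase: answer with the class whose average is nearest."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


model = train(TRAINING)
print(classify(model, (6.2, 2.1)))
print(classify(model, (10.5, 5.2)))
```

Note that the classes exist before any learning happens: the classifier can only ever answer 'lily' or 'magnolia', which is what makes the task supervised.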

Unsupervised learning, in contrast, proposes that you could tackle data in a way in which predefined classes are not so important. To illustrate, you might be shown a bunch of different flowers, let’s say 50 very different flowers. After looking at them for a while, you will extract some features from that data: the shape of the flowers, how they smell, their colours, and their textures. Slowly, you will try to group flowers that look similar to each other. You might end up with little bunches of flowers that are very similar. This is called clustering. Here, you did not have a set of master classes from the start. You did not have a goal of flower classification. The botanist was not there to tell you “this is a class of flower of the name X”. You just gathered the data in a pot and then plotted the similarities between those different features. In this way you may have discovered, from a pile of flowers, the categories of roses, petunias and geraniums.
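
Clustering can be sketched in the same spirit. The snippet below uses a very simple grouping rule (often called single linkage): a flower joins a bunch if it is close enough to any flower already in it. The data points and the distance threshold are invented for the example, and this is only one of many clustering methods, not necessarily the one used later in this dissertation.

```python
import math


def cluster(points, max_gap):
    """Group points so that neighbours within max_gap end up in the same bunch."""
    clusters = []
    for p in points:
        near, far = [], []
        for c in clusters:
            close = any(math.dist(p, q) <= max_gap for q in c)
            (near if close else far).append(c)
        # The new point glues together every bunch it is close to.
        merged = [p] + [q for c in near for q in c]
        clusters = far + [merged]
    return clusters


# Unlabelled flowers described by two made-up features.
FLOWERS = [
    (1.0, 1.0), (1.2, 0.9), (5.0, 5.0), (5.1, 5.3),
    (9.0, 1.0), (8.8, 1.2), (0.9, 1.1),
]

bunches = cluster(FLOWERS, max_gap=1.0)
for bunch in bunches:
    print(sorted(bunch))
```

No labels were supplied at any point; the bunches emerge from the similarities alone, and it is up to you to name them afterwards.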

Another form of analysis is considered in this thesis as a compositional aid: sensitivity analysis. Sensitivity analysis is a well-respected tool amongst the variety of machine learning techniques. In this type of analysis, statistical techniques are used to go back and re-examine a machine learning function after it has been built. Considering our flower classification and clustering examples, you might learn that a particular feature such as colour was very important for differentiating a lily from a magnolia, by investigating which sections of a machine learning function ‘worked harder’ at either the classifying or the clustering task.
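
One simple way to get a feel for sensitivity analysis is to remove one feature at a time and see how much worse the classifier becomes. The sketch below does this with a nearest-centroid classifier on invented data in which colour separates the two flowers and size does not. This 'leave one feature out' approach is only a stand-in chosen for illustration; the feature names and values are assumptions.

```python
import math
from statistics import mean

# (colour score, size score) on an invented 0..1 scale, plus the known class.
DATA = [
    ((0.90, 0.2), "lily"), ((0.80, 0.8), "lily"),
    ((0.85, 0.5), "lily"), ((0.95, 0.5), "lily"),
    ((0.10, 0.2), "magnolia"), ((0.20, 0.8), "magnolia"),
    ((0.15, 0.5), "magnolia"), ((0.05, 0.5), "magnolia"),
]
FEATURE_NAMES = ["colour", "size"]


def fit(rows, keep):
    """Average feature vector per class, using only the feature indices in keep."""
    groups = {}
    for features, label in rows:
        groups.setdefault(label, []).append(tuple(features[i] for i in keep))
    return {
        label: tuple(mean(column) for column in zip(*values))
        for label, values in groups.items()
    }


def accuracy(rows, keep):
    """Fraction of rows whose nearest class average matches their label."""
    centroids = fit(rows, keep)
    hits = 0
    for features, label in rows:
        x = tuple(features[i] for i in keep)
        guess = min(centroids, key=lambda name: math.dist(centroids[name], x))
        hits += guess == label
    return hits / len(rows)


def sensitivity(rows, names):
    """Drop in accuracy when each feature is removed in turn."""
    everything = list(range(len(names)))
    baseline = accuracy(rows, everything)
    return {
        name: baseline - accuracy(rows, [j for j in everything if j != i])
        for i, name in enumerate(names)
    }


print(sensitivity(DATA, FEATURE_NAMES))
```

A large drop means the classifier was leaning heavily on that feature; a drop of zero means it could have done without it.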

Let’s put our flower example aside and consider music as the subject. Let’s say that you would like to compose a piece for piano, and that you would like to use this machine learning assisted compositional paradigm to better inform your work. As you sit down you compose a musical motive. Then you compose another motive. Then you might compose a number of chord progressions. By the end of the composition process, you will end up with a bunch of little snippets of music. For the sake of consistency, let’s call these musical motives, or just motives. Now, you want to put some structure into your composition. For instance, let’s say that you want to use a paradigm such as simple emotions: sad, happy, etc. Now you make another decision: your first movement will sound happy, your second movement will sound sad, and your third movement will sound happy again. You want to compose in an ‘emotional’ sonata form. At this point you can go back and actually classify your motives. You listen to your motives again and label them as happy or sad. You come across a nice F major sequence. You think, “Oh, that sounds pretty happy.” Then you label it ‘happy’. Then you find this progression of D minor 7th that goes into an F minor. Then you think, “All right, that actually sounds sad” – you get the point.

By the end of this classification process, you will have a number of different musical motives with their labels. Each motive will be part of a class – remember lilies versus magnolias? – in this case happy or sad. Then you can use those labelled motives and compose with them by simply populating the first movement with happy motives, the second movement with sad motives, and the last movement with happy motives again. Although this process is simple, you have just used a machine learning paradigm – so simple you did not need a computer. This idea will be expanded and illustrated later with the piece ‘Emotional Sonata Form’.
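
Written as code, the whole procedure is only a few lines. The sketch below is a toy illustration of the idea and not the method behind the piece itself; apart from the F major sequence and the D minor 7th progression mentioned above, the motive names are hypothetical.

```python
# Motives with the labels the composer gave them by ear.
MOTIVES = [
    ("F major sequence", "happy"),
    ("Dm7 to Fm progression", "sad"),
    ("rising arpeggio", "happy"),        # hypothetical motive
    ("slow descending line", "sad"),     # hypothetical motive
]

# The 'emotional' sonata form: one target label per movement.
PLAN = ["happy", "sad", "happy"]


def assemble(motives, plan):
    """For each movement, collect the motives whose label matches the plan."""
    return [
        [name for name, label in motives if label == wanted]
        for wanted in plan
    ]


for number, movement in enumerate(assemble(MOTIVES, PLAN), start=1):
    print(f"Movement {number}: {movement}")
```

The labelling by ear is the classification step; the plan is simply a decision about which class goes where.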

Other examples of features might be dynamics, such as whether a motive includes a crescendo or decrescendo; the number of notes; how fast the motive is; the amount of swing that you apply to a particular bar; or just timbre – which instrument? A violin, a piano, an accordion? As we complicate matters you can see how a computer will be of great assistance. The music information retrieval literature is packed with algorithms which extract features from audio files or that somehow describe musical characteristics. These are pretty mathematical but will be described in detail in the feature extraction section of this dissertation.
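
A few of these features can be computed directly from a symbolic description of a motive. In the sketch below a motive is assumed to be a list of (MIDI pitch, velocity) pairs together with its length in beats; this representation and the feature names are choices made for the example, not the feature set described later in the dissertation.

```python
def describe(motive, beats):
    """Summarise a motive given as (midi_pitch, velocity) notes over some beats."""
    velocities = [velocity for _, velocity in motive]
    if velocities[-1] > velocities[0]:
        dynamics = "crescendo"
    elif velocities[-1] < velocities[0]:
        dynamics = "decrescendo"
    else:
        dynamics = "flat"
    return {
        "notes": len(motive),                   # how many notes
        "notes_per_beat": len(motive) / beats,  # a crude measure of speed
        "dynamics": dynamics,                   # louder or softer at the end
    }


# A hypothetical four-note motive lasting two beats, getting louder.
rising = [(60, 50), (62, 60), (64, 70), (65, 90)]
print(describe(rising, beats=2))
```

Comparing only the first and last velocity is deliberately crude; it is enough to show how a musical impression becomes a value a computer can sort and compare.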

This dissertation starts by setting the background in several ways: i. autobiographically, ii. through other composers’ similar approaches, iii. through machine learning and self-reflective practice applied to composition, along with other computer-assisted methods of composition. I will also propose a number of machine learning methods that will be fully reproducible and hopefully of use to other composers in their compositional practice. Finally, I will present a number of my compositions made with varying application of the approach, along with a full analysis of how machine learning assisted a self-reflective compositional approach in each case. Other aesthetic and production details will be explored too as they apply to individual pieces. I will also try to describe my personal experiences whilst using these paradigms, how they have impacted my musical practices, and how they have influenced the way in which I collaborate with other musicians.

I am deeply honoured that you, my dear reader, are giving your time to consider my compositional methods. I hope you enjoy reading this dissertation as much as I enjoyed writing it. I also hope that these pages will inform your compositional practice in a way that helps you enjoy your music composition and improvisation at a new level. Please join me in this little adventure called A Machine Learning Assisted Compositional Paradigm.

