The Emotional Sonata Form as a Case for Musical Surautomatism

Ox-ent-dach: Emotional Sonata Form Performance for Modular Synthesiser
and Amplified Pianoforte

Since the advent of Breton’s first Surrealist Manifesto, surrealist
techniques have been proposed as methodologies to drive creativity by
shifting control away from the artist’s deliberate actions into an
“automatic” realm. We borrow ideas from Luca and Trost’s (1945)
Dialectique de la Dialectique: “the way to surrealist painting
blossoming lies in the use of aplastic, objective and entirely
non-artistic procedures”. We extend this idea to music composition by
proposing a computer-assisted method that uses non-artistic and
non-musical (“aplastic”) procedures to promote a self-reflective
compositional approach.

This compositional workflow is proposed to guide composers’ selection
of musical themes and features contingent upon their prior emotional
responses to these themes – a form of surautomatism. The workflow
applies Plutchik’s emotion-classification framework. Here the
statistical formalism of Fisher discriminants is utilised as a
non-artistic method to (i) guide the classification of themes as per
the composer’s perceived emotions and (ii) discover prevalent
musical descriptors. This method could ultimately be regarded as a
form of guided surautomatism, with the aim of generating a final
compositional work by populating sections of a predefined musical form
– in our case an electroacoustic adaptation of the sonata form termed
emotional sonata form.

Emotional sonata form is an idea that evolved naturally during the
development of our emotional annotation process. It is a four-fold
musical structure: introduction; exposition; development; and
recapitulation. The sonata form is then populated with themes based on
musical features that were ranked as emotion-prevalent for a selected
Plutchik axis.

By means of this framework we would like to present a
composition/performance for modular analogue synthesiser and amplified
piano. The piece is entitled Ox-ent-dach, a phantasy on the anxiety
associated with landing a Boeing 747. The piece lasts 17 minutes
and is based on Plutchik’s surprise/anticipation axis. It is, as
expected, structured as per the emotional sonata form paradigm
described above.
Jesus A. Lopez-DoNaDo: Artist Bio

Jesus Lopez-DoNaDo is a Hispanic-Australian composer. He is currently
a doctoral candidate in music composition at the QLD Conservatorium
under the guidance of Prof A Brown and Dr G Dirie. He works at the
crossover of emotional psychology, music composition and computational
intelligence. In his doctoral thesis he is exploring ways to generate
a surautomatic compositional framework guided by individuals’
emotional responses. He has been an avid and productive composer since
1992. His works include scores for theatre, contemporary dance, film,
and acousmatic and acoustic ensembles. He has attended educational
sessions and learnt from the likes of Adina Izarra, Alberto Grau,
Karlheinz Stockhausen, Maria Guinan, Curtis Roads and Rosa Briceno. He
has attended residencies at Queen’s University’s Sonic Arts Research
Centre (Belfast), Conservatorio Jose Angel Lamas and Simon Bolivar
University (Caracas, Venezuela), Taller Flamenco (Seville, Spain),
Teatro del Circulo (Rosario, Argentina), Queensland University of
Technology and Griffith University (Brisbane, Australia). He is
married with two daughters, and enjoys cooking, camping and studying
the Bible daily.

Emotional Cantata Form

This piece is an electroacoustic composition based on a pool of pre-recorded choir passages and a set of modular synthesizer patches, each with associated pre-recorded audio files. All audio files were annotated by the composer, and a database was generated with FSkl calculated across audio files for each of four {MI, ¬MI} clusters: (i) {anger, fear}, (ii) {trust, disgust}, (iii) {surprise, anticipation}, and (iv) {joy, sadness}. These feature ranks were then utilised to bias the composition process, using the idea of the sonata form, and hence emotionally contrasting musical features, to populate each segment of the sonata structure. We call this piece/performance:
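The FSkl ranking above could be sketched as follows. This is a minimal illustration only, assuming FSkl is a per-feature Fisher criterion computed between the two emotion clusters of one axis; the feature values are hypothetical, not the composer’s actual data:

```python
from statistics import mean, pvariance

def fisher_score(values_a, values_b):
    """Fisher criterion for one feature across two emotion clusters:
    squared between-class separation over summed within-class variance."""
    ma, mb = mean(values_a), mean(values_b)
    va, vb = pvariance(values_a), pvariance(values_b)
    return (ma - mb) ** 2 / (va + vb + 1e-12)

# Hypothetical per-file values of one musical feature for one axis
anger = [0.81, 0.75, 0.90, 0.84]
fear  = [0.32, 0.41, 0.28, 0.37]
score = fisher_score(anger, fear)  # large score: the feature separates the pair well
```

Ranking every extracted feature by such a score would yield the “emotion-prevalent” features used to bias each cantata.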

Four short cantatas in emotional-sonata-form
anger versus fear;
trust versus disgust;
surprise versus anticipation;
joy versus sadness;

Emotional sonata form is an idea that evolved naturally during the development of our emotional annotation process. Hence, we propose using musical features that highly correlate with specific MIs to bias the musical discourse. We arbitrarily use the sonata form, which commonly uses the following structure per movement: (i) introduction; (ii) exposition; (iii) development; and (iv) recapitulation.
Here we approach the structure of each movement by using the sonata form as a structural guide for composing a musical discourse. The sonata form is then populated with themes based on musical features that are ranked as emotion-prevalent. Classically, the sonata form uses more traditional variation tools, e.g. dynamics, key modulation and tempo, to construct a musical discourse.
In our case, the introduction (i) is ad lib but the exposition (ii) is monothematic. Haydn, for instance, was well known for monothematic expositions in which only key was used to contrast the same material. In contrast with Haydn, we use the highest-ranking emotional musical feature for one MI to generate a contrasting exposition, hence exploiting the inherent antagonist pairs in that MI. During the development (iii), other highly ranked features for the MI associated with the movement are introduced to develop the main theme. Finally, in the recapitulation (iv), a variation on the exposition is revisited to construct a sense of closure.
Figure 1: Plutchik’s Wheel of Emotions (1980): Robert Plutchik created a new conception of emotions in 1980. He called it the “wheel of emotions” because it demonstrated how different emotions could blend into one another and create new emotions. Plutchik first suggested 8 primary bipolar emotions: joy versus sadness; anger versus fear; trust versus disgust; and surprise versus anticipation. From there Plutchik identified more advanced emotions based on their differences in intensities. If you look at the diagram below you can see how each emotion relates to the others [1].
From a technical standpoint, the four cantatas total 16 minutes of music. The piece is conceived for virtual choir over 8 ambisonic channels, with concomitant live electronics over 4 surround channels spatially distributed in real time by the performer via the modular synthesizer.

Our second piece was an earlier application, prior to this paper, of the same ideas. In that piece each movement was conceptualised by populating it with features that highly correlated with just one of Plutchik’s emotions, i.e. a single MI.

A Machine Learning Assisted Compositional Paradigm: 1. Introduction

I have been composing electroacoustic music since 1999. Before that I was a tonal composer writing for small ensembles, theatre and choir. In 2000 I was undertaking research at the University of Ulster on machine learning and data fusion. Later, in 2004, during the SARC opening in Belfast, I sat in on a workshop conducted by Karlheinz Stockhausen, Curtis Roads and others. This was a significant privilege given that, sadly, Stockhausen passed away a few years later. Among the many fundamental gems I took from that workshop, two struck me as crude reality-checks that would later change my compositional practice forever.

Firstly, at that time I undertook limited reflection on my compositional practice. For instance, I would elaborate an idea and throw some samples together in a mix; then, using a gross conceptual plot, I would hurriedly write a piece based on samples and synth patches. I would just go back and check it twice using some form of aural criteria and perhaps make a number of modifications to the final composition. The results varied widely in both style and quality.

In contrast, both Roads’ and Stockhausen’s repertoires had so much cohesion. You could tell there was so much thought and rethinking happening. Every sonic granule of Roads’ pieces was delicately placed at a specific time – you could feel the craft of music composition unfolding. Similarly, Stockhausen’s early tape music has gripping conceptual coherence. These composers did not have fast computers that allowed them to manage splices of audio, or hundreds of sonic granules, the way we can today. It took them time, a significant period of time, to craft these apparently disjointed audio atoms into aural masterpieces almost by hand.

Secondly, I did not have real boundaries in terms of computer music options. By 2003 MAX/MSP was significantly powerful and the market was getting inundated with software-based samplers, synthesisers and all kinds of wonderful plugins. Ableton Live was very stable and was the dream musique-concrète machine. Sitting in front of a computer could send me into a profound state of existential catatonia – a composer’s block that many still face today as computational power rises and the diversity of music production tools increases exponentially.

This latter challenge of music-tool hyper-resourcing lasted only one to two years, until I decided to become a circuit bender. I built a Eurorack modular synthesiser over 4 years and gradually populated my studio with all manner of modified music machines, ranging from lo-fi sampling-based toys to flash-grip FM synths from the 80s and 90s – they all went through shameless soldering-iron hacking that produced sonic machines with limited predictability, decent sonic limitations and manageable controllability.

The aforementioned challenge of non-reflective compositional frameworks is the main issue this dissertation attempts to solve. I started exploring ways to formalise how I reflect on and redefine my compositions. I thought long and hard about formalising my compositional process, to achieve a description that could be reproduced by another composer whilst keeping a degree of freedom. Hence I propose a computer-assisted method for music composition in which the computer is used not as a bottomless music-creation toolset but rather as a decision-making tool. This introductory section outlines the basics of this method. The language is purposely informal so I can explain to fellow music makers my ‘gospel’ of music creation over the past 5 years.

The title of this dissertation is A Machine Learning Assisted Compositional Paradigm. It is appropriate to analyse such a title and to aim to meet its requirements – in other words, to ensure that this thesis does what it says on the cover. The first words in the title are ‘machine learning’, so I will now explain what machine learning entails for the purposes of this dissertation. Later, we will have a more in-depth discussion about what machine learning is and how it is applied in the compositional methods described here.

Machine learning is just that: it aims at utilising a machine, most often these days a computer, to simulate learning. Learning is the capacity of a person, an animal, or a computer to analyse input and create abstractions of that input, and then produce an output which is, well, learned – in other words, an output that has some form of application, that is useful, or that makes sense of the data.

Machine learning can be broken down into several stages. The first stage is data acquisition. The second stage is feature extraction. Then, with those features, a form of computation or algorithmic engine is applied. Finally, an output is produced by that algorithmic engine.
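The four stages above can be expressed as a tiny pipeline. This is a schematic sketch only, not any particular library; every function name and value here is illustrative:

```python
def acquire():
    """1. Data acquisition: here, a toy 8-sample signal."""
    return [0.0, 0.4, 0.9, 0.4, 0.0, -0.4, -0.9, -0.4]

def extract_features(signal):
    """2. Feature extraction: reduce the raw data to a few numbers."""
    return {"peak": max(abs(v) for v in signal), "length": len(signal)}

def decide(features):
    """3. Algorithmic engine: here, a single hand-written rule."""
    return "loud" if features["peak"] > 0.5 else "quiet"

# 4. Output produced by chaining the stages together
label = decide(extract_features(acquire()))
```

In a real system each stage grows enormously in sophistication, but the shape of the pipeline stays the same.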

For example, suppose you are playing a sport – let’s say, for argument’s sake, baseball. The pitcher will throw a ball at you, and you will try to bat that ball as it comes towards you at high speed. The input in this case is the stream of visual information arriving in your brain. Before your brain does something with that data, a form of feature extraction needs to be accomplished. For instance, the ball needs to be isolated from the background, otherwise there is far too much information. Certain angles or movements of the pitcher are recorded as numbers in your brain, and things like the speed of the ball, or the change in the ball’s apparent diameter as it comes towards you, are all computed too. At that point, your brain crunches those numbers and computes an output. That output is: bat now! Once all that information is processed, your brain will tell your arms to swing, and hopefully, if your batting algorithm is very good, or if the pitcher is very forgiving, you will be in a position to score a home run. With experience, your batting will improve, as you learn to better judge the flight of the ball and to control your swing of the bat.

This process of learning can be partially achieved by a computer when considering very simple problems. For instance, a computer can have sensors such as cameras as data inputs. Software can then extract certain features, or useful numbers, that can be fed to an algorithmic implementation – a formula the machine uses to produce an output.

Another example is speech recognition. There are several software packages that can translate speech into text – i.e. do transcription. The computer will have a microphone. This microphone records your speech: this is the data acquisition stage. Once your speech is recorded and converted into a series of ones and zeroes in an audio file, the computer undertakes feature extraction on those digitised sound waves. The feature extraction algorithms calculate things like how many times a particular waveform crossed the zero line, the local peak amplitudes, or a number of spectral characteristics indicating how this voice looks in the frequency domain. The computer then takes all these simplified numbers, or features, and tries to produce an output. In this case the output will be a dictated word that matches the sound ‘pencil’. The computer could detect the percussive sound of the P, then the vowel sound E, and then N-C-I-L. These phonemes are somehow translated into the written word, and you then see in your speech recognition program how the word pencil is typed for you.
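The zero-line-crossing count mentioned above is one of the simplest such features. A toy version, with hand-crafted sample lists standing in for real digitised audio:

```python
def zero_crossings(samples):
    """Count sign changes in a digitised waveform -- a crude correlate
    of how noisy or pitched a stretch of sound is."""
    return sum(1 for a, b in zip(samples, samples[1:])
               if (a >= 0) != (b >= 0))

noisy  = [1, -1, 1, -1, 1, -1, 1, -1]   # alternating signs: many crossings
steady = [1, 2, 3, 2, 1, 2, 3, 2]       # all positive: no crossings
```

A noisy fricative like the S in ‘pencil’ produces many more crossings per frame than a steady voiced vowel, which is exactly why this cheap feature earns its place in the pipeline.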

Following the term machine learning in the title, there is another word: assisted. In the speech recognition example just mentioned, the whole process of dictation to text is carried to completion. So what do I mean by the word assisted? It implies that in this dissertation machine learning will not be employed as a universal black-box solution for generating compositions; rather, I will use a number of machine-learning approaches to inform my compositional practice.

Let’s consider a couple of examples. Remember that data acquisition is part of machine learning. Let’s say that we are about to write a particular piece for piano. No matter how hard composers may try, they are not hermits. Therefore, before a composer writes that piano piece, he/she has already gathered a lot of data through experience about existing music and about the piano. It all counts: memories, feelings, emotions and thoughts.

Feature extraction comes after that. You might write a melody in the Dorian mode over a C major seventh chord. You have learned that the medieval Dorian mode is a scale: a series of 7 intervals that can be visualised as a pattern over a keyboard or a guitar. You used the Dorian mode, and perhaps also the concepts of excitement and harmonic fitness. You are using features rather than the raw sound data.

Hence, it makes sense to suggest that meditating on and analysing musical features, as encompassed by abstract concepts such as emotions or memories, may significantly inform compositional practice. It may sound a little complex at first glance, but this idea will be greatly expanded on in this dissertation.

Back to the word assisted. The word assisted in the title implies support or self-reflection. I am going to use machine learning methods as part of the compositional process in order to inform my self-reflective compositional practice, with the aim of improving the music that I write.

The last word in the title is paradigm, which implies that rather than proposing a specific methodology, the idea of this dissertation is to share compositional guidelines – a practical compositional approach that others can also use. This paradigm is one I have been using in recent years to reflect on, tune, and improve my compositions.

Finally, it is also worth mentioning that the main type of machine learning used in this dissertation is unsupervised machine learning. Hence it is worth briefly clarifying the difference between supervised and unsupervised learning. By doing so I hope to further explain machine learning using simple examples.

Supervised learning aims to provide a definite output based on classes or a classification mechanism. Let’s say that you are trying to differentiate magnolias from lilies. Then let’s pretend that a very gifted botanist explains to you: “Well, lilies will have a certain form of petals, a certain form of pistils, and certain colours. The fragrance could be important too!” After all these data are considered, and after formulating some features from each flower, you can start a learning process with a bunch of different magnolias and a bunch of different lilies. After your learning phase, you will be prepared to sit the magnolias-versus-lilies test, in which you will be shown a lily or a magnolia at random.
If you did your learning with a decent variety of magnolias and lilies, you will be able to confidently say, “This is a lily” without much thought, or “This is definitely a magnolia.” When a computer undergoes this process it is a special type of machine learning called pattern recognition. The answers are given. The classes are given. The learning is how to classify data based on features – it is not open-ended; responses are true or false.
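A toy version of this supervised flower test might look like the following sketch. The nearest-centroid rule and the feature values are my own illustrative choices, not a method claimed by this dissertation:

```python
from statistics import mean

def train_centroids(labelled):
    """Supervised 'learning' in miniature: average the feature
    vectors of each named class (the classes are given up front)."""
    return {label: [mean(col) for col in zip(*vectors)]
            for label, vectors in labelled.items()}

def classify(x, centroids):
    """Assign x to the class whose centroid is closest."""
    dist = lambda a, b: sum((p - q) ** 2 for p, q in zip(a, b))
    return min(centroids, key=lambda label: dist(x, centroids[label]))

# Hypothetical [petal_length, colour_index] training examples
training = {"lily":     [[6.0, 0.20], [5.5, 0.30], [6.2, 0.25]],
            "magnolia": [[9.0, 0.80], [8.5, 0.90], [9.5, 0.85]]}
centroids = train_centroids(training)
verdict = classify([5.8, 0.22], centroids)
```

The ‘learning’ here is nothing more than summarising each class; the test is assigning a new flower to the nearest summary.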

Unsupervised learning, in contrast, proposes that you can tackle data in a way in which classes are not so important. To illustrate, you might be shown a bunch of different flowers – let’s say 50 very different flowers. After looking at them for a while, you will perform some feature extraction on that data: the shape of the flowers, how they smell, their colours, and their textures. Slowly, you will group certain flowers that look similar to each other, and you might end up with little bunches of flowers that are very similar. This is called clustering. Here, you did not have a set of master classes from the start. You did not have a goal of flower classification. The botanist was not there to tell you “this is a class of flower with the name X”. You just gathered the data in a pot and then plotted similarities between those different features. In doing so, you may have discovered from a pile of flowers the categories of roses, petunias and geraniums.
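Clustering can be sketched with a deliberately naive one-dimensional grouping rule. This is an illustration of the idea only, not a standard algorithm, and the measurements are hypothetical:

```python
def cluster(points, radius):
    """Unsupervised grouping: each point joins the first cluster whose
    founding member is within `radius`, otherwise it starts a new
    cluster. No class labels are supplied in advance."""
    clusters = []
    for p in points:
        for c in clusters:
            if abs(p - c[0]) <= radius:
                c.append(p)
                break
        else:
            clusters.append([p])
    return clusters

# Hypothetical petal lengths measured from a mixed pile of flowers
lengths = [5.1, 5.3, 9.0, 5.2, 9.2, 8.8]
groups = cluster(lengths, radius=1.0)  # two natural bunches emerge
```

No one told the procedure there were two kinds of flower; the two bunches fall out of the data itself.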

Another form of analysis is considered in this thesis as a compositional aid: sensitivity analysis. Sensitivity analysis is a well-respected tool amongst the variety of machine-learning techniques. In this type of analysis, statistical techniques are used to go back and reverse-engineer machine learning functions. In our flower classification and clustering case, you might learn that a particular feature, such as colour, was very important for differentiating a lily from a magnolia, by investigating which sections of a machine learning function ‘worked harder’ at either the classifying or the clustering task.

Let’s put our flower example aside and consider music as the subject. Let’s say that you would like to compose a piece for piano, and that you would like to use this machine learning assisted compositional paradigm to better inform your work. As you sit down you compose a musical motive. Then you compose another motive. Then you might compose a number of chord progressions. By the end of the composition process, you will have a bunch of little snippets of music. For the sake of consistency, let’s call these musical motives, or just motives. Now, you want to put some structure into your composition. For instance, let’s say that you want to use a paradigm such as simple emotions: sad, happy, etc. Now you make another decision: your first movement will sound happy, your second movement will sound sad, and your third movement will sound happy again. You want to compose in an ‘emotional’ sonata form. At this point you can go back and classify your themes. You listen to your themes again and label them as happy or sad. You come across a nice F major sequence. You think, “Oh, that sounds pretty happy,” and you label it ‘happy’. Then you find a progression from D minor 7th into F minor. You think, “All right, that actually sounds sad” – you get the point.

By the end of this classification process, you will have a number of different musical motives with their labels. Each theme will be part of a class – remember lilies versus magnolias? – in this case happy or sad. Then you can use those labelled themes and compose with them by simply populating the first movement with happy themes, the second movement with sad themes, and the last movement with happy themes again. Although this process is simple, you have just used a machine learning paradigm – so simple that you did not need a computer. This idea will be expanded and illustrated later with the piece ‘Emotional Sonata Form’.
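The whole happy/sad workflow above is small enough to write down. The motive names and labels below are hypothetical stand-ins for the annotations a composer would make by ear:

```python
# Hypothetical labelled motives from the listening pass described above
motives = [("F major sequence", "happy"),
           ("Dm7 to Fm progression", "sad"),
           ("rising arpeggio", "happy"),
           ("slow chromatic descent", "sad")]

structure = ["happy", "sad", "happy"]  # the three-movement emotional plan

def populate(structure, motives):
    """Fill each movement with every motive carrying its label."""
    return [[name for name, label in motives if label == wanted]
            for wanted in structure]

movements = populate(structure, motives)
```

The point is not the ten lines of code but the discipline: once themes carry labels, the structure of the piece becomes an explicit, inspectable decision.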

Other examples of features might be dynamics (whether a motive includes a crescendo or decrescendo); the number of notes; how fast the motive is; the amount of swing applied to a particular bar; or simply timbre – which instrument? A violin, a piano, an accordion? As matters get more complicated, you can see how a computer will be of great assistance. The music information retrieval literature is packed with algorithms that extract features from audio files or that otherwise describe musical characteristics. These are fairly mathematical but will be described in detail in the feature extraction section of this dissertation.
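A few of those descriptors can be computed mechanically once a motive is held in symbolic form. The representation and field names here are my own illustration, not drawn from any MIR library:

```python
def motive_features(motive):
    """Reduce a symbolic motive to a handful of descriptors.
    The field names are illustrative, not from any MIR library."""
    return {
        "n_notes":   len(motive["pitches"]),
        "tempo":     motive["bpm"],
        "crescendo": motive["dynamics"][-1] > motive["dynamics"][0],
    }

# A hypothetical four-note motive: MIDI pitches, tempo, two dynamic marks
m = {"pitches": [60, 62, 64, 65], "bpm": 96, "dynamics": [0.3, 0.8]}
feats = motive_features(m)
```

Extracting comparable descriptors from raw audio is harder, which is where the MIR algorithms mentioned above earn their keep.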

This dissertation starts by setting the background in several ways: i. autobiographically; ii. other composers’ similar approaches; iii. machine learning and self-reflective practice applied to composition, along with other computer-assisted methods of composition. I will also propose a number of machine learning methods that will be fully reproducible and hopefully of use to other composers in their compositional practice. Finally, I will present a number of my compositions made with varying applications of the approach, along with a full analysis of how this process of machine learning assisted a self-reflective compositional approach in each case. Other aesthetic and production details will be explored too as they apply to individual pieces. I will also try to describe my personal experiences whilst using these paradigms, and how they have impacted my musical practices and influenced the way in which I collaborate with other musicians.

I am deeply honoured that you my dear reader are giving your time to consider my compositional methods. I hope you enjoy reading this dissertation as much as I enjoyed writing it. I also hope that these pages will inform your compositional practice in a way that helps you enjoy your music composition and improvisation at a new level. Please join me in this little adventure called A Machine Learning Assisted Compositional Paradigm.

OnE with viri – lopezdonado + barret


Part of the last Trogotronic comp – proud to be part of a W T Nelson endeavour. Available from:

LopezDoNaDo performing this Friday Brisbane 15 March, 7.30 pm, Queensland Conservatorium, South Bank – Basil Jones Concert Hall ++++++ i shall see you all there @ the international premiere of 18{6{6{6}}} @ QLD Conservatorium Brisbane – Evening of March 15 2013



Those of you in Oz, especially Brisbanedorfs, are most welcome to attend my next live show: 1/2 hrs of Eurorack emotional distress at the Queensland Conservatorium Brisbane during the evening of March 15 2013.

BLURB as follows:

Title (short): 18{6{6{6}}}
Title (long) Spanish: dieciocho temas para seis movimientos de seis
individuos con seis sentimientos
Title (semi-long) English: 18 themes for 6 movements of 6 individuals
with 6 emotions

Performance synopsis:
You see, dear listener, there are many ways to classify emotions, but 2
main systems have chiefly made it into the sacrosanct realm of
psychology journals: namely Plutchik’s Wheel of Emotions (1980) and
Parrott’s Classification of Emotions (2001). This is my proposal of a
lil’ framework to abuse my modular synth based on 6 acquaintances’
emotions. I call it 18{6{6{6}}}, which is a pseudo-LISPian joke
describing how 18 Eurorack modular synth patches produced 18 fragments
of sound that got labeled with Parrott’s and Plutchik’s tertiary
classes of emotions. The story goes on to how I trusted Drs Parrott and
Plutchik’s empirical grouping of these tertiary emotions into 6 primary
emotions. I then adventured into improvising 6 movements, each using
exclusively patches that were blessed with one of 6 emotions as
perceived by my 6 close friends/colleagues. Finally, this becomes a
hypothetical imperfect parlour of 6 imperfect individuals for 6
perfect emotions and 6 imperfect music movements. To bring the spirits
up somehow, the piece is summarised by a children’s poem that goes * Lukes
is for Love * & Joseph is for Joy & ^ Sam is for Surprise! ^ $ Ann is
for Anger $ % Sid is for Sadness % ,but, # Fred is for Fear #



1st Movement: Lukes is for Love * – 6min
2nd Movement: Joseph is for Joy & – 6min
3rd Movement: Sam is for Surprise! ^ – 6min
4th Movement: Ann is for Anger $ – 6min
5th Movement: Sid is for Sadness % – 6min
6th Movement: Fred is for Fear # – 6min
*&^$%# = Names are not actual.

Adios, J LopezDoNaDo