Music, like genetic information, can be expressed at points along a spectrum of detail, from abstract representations to physical phenomena. In each case, moving an element from one representation to another along the spectrum is a process of translation.
In the case of genetics, DNA is translated (or interpreted) by cellular machinery to construct proteins, which eventually give rise to phenotypes. However, a scientist can try to understand the effect of DNA by translating in reverse: given a population with a common characteristic, do they share a particular gene? Part of the modern approach to this branch of biochemistry is the segmentation of the translation process into a sequence of increasingly detailed representations, called structures. The primary structure is a discrete representation: nucleotides, or amino acids. The quaternary structure is the physical molecule itself: its precise shape and chemical properties. (The Rosetta@Home project is a search to find likely tertiary and quaternary protein structures by translating from primary structure.)
Similarly in music: discrete musical notation is interpreted (by a musician — or an electronic playback device), and ultimately translated to vibration patterns in air and ear. One view of composition is that it is the reverse process: it is the art of imagining a sound, and determining the music required to produce it. (Though, just as there is structure in the sounds that sound pleasant, there are identifiable patterns in notation that a composer can use as seeds for experimentation.)
I like music, and I’ve been making good use of a set of online resources that provide scores of musical scores:
- The International Music Score Library Project offers scores, often in the form of scans of out-of-copyright editions from ancient times.
- The Mutopia Project is an attempt to produce modern free editions, using LilyPond (a TeX-like music typesetting system).
- The Choral Public Domain Library specialises in choral works (including medieval and renaissance).
I am not much of a musician (and I’m even less of a composer). But I’ve been working on a little project to create music from the discrete notation that I can painstakingly transliterate from the scores.
To synthesise music from first principles, we need a program that translates from notation to wave-form. Like many a translation program (a programming language compiler, for example), the synthesiser will attempt the translation task in stages. Each stage translates from an abstract representation to one that is more real, by computing additional detail that is only implicit in the input, just as genetic translation determines detail such as atom locations or chemical bonds.
The representations I have in mind are:
- Textual description of the piece. The first bar of Beethoven’s Moonlight Sonata looks like:
(T>3 G#-1 C# E)*4 + T<4 C#-1 + C#-2
- Sequence of tokens. (, T, >, 3, space, G, #, and -1 are examples of these.
- Nested sequence of commands. The segment in parentheses, above, is a command, and it in turn contains four commands: one to set the tempo, and three notes.
- Set of notes. Each note is a tuple of (instrument, pitch, volume, offset, duration).
- Set of wave forms. Each note expands to a set of wave forms: generally, one for each harmonic of that instrument.
- Queue of wave form starts/stops.
- Wave form for the entire piece.
- Binary PCM data for writing to a WAV file.
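As a sketch of the two ends of this pipeline, here is how a single note might be rendered to samples and then packed as 16-bit PCM. The function names and the single-sine “instrument” are illustrative, not the actual implementation:

```python
import math
import struct

SAMPLE_RATE = 44100

def note_to_samples(pitch_hz, volume, duration_s, sample_rate=SAMPLE_RATE):
    """Render one note as a plain sine wave (one harmonic, no envelope)."""
    n = int(duration_s * sample_rate)
    return [volume * math.sin(2 * math.pi * pitch_hz * i / sample_rate)
            for i in range(n)]

def samples_to_pcm(samples):
    """Pack floats in [-1, 1] as 16-bit little-endian PCM, ready for a WAV file."""
    return b"".join(struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
                    for s in samples)

# A 440 Hz A for half a second at half volume:
pcm = samples_to_pcm(note_to_samples(440.0, 0.5, 0.5))
```

Wrapping the PCM bytes in a WAV header (e.g. with Python’s wave module) is then a purely mechanical step.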
I actually began the project a couple of years ago. This is my first proper description of it, and I’ve somehow painted a rather ideal picture of it. The input syntax as implemented does not support time signatures that are not powers of 2; nor does a command such as the tempo marker restrict its effect to the enclosing block. However, I managed to type in the first half of Bach’s Fugue in D minor (preceded by a fragment of the accompanying Toccata). It is recognisable, but patently artificial.
The two main avenues of improvement I’d like to explore are:
- The textual representation. I’d like to optimise this for input ease — including allowing shorthand for patterns that arise in the score. Some of the basic patterns encountered in the Bach piece were: a passage where every second note is the same (whose representation should really be halved), and repetition of a passage on a sequence of varying keys (which should be reduced even more drastically). Could a simple note calculus that constructs notes from keys and numbers, combined with something like the lambda-calculus help in both cases? Instead of “A B A C A D A E” imagine “interleave A (B C D E)” where interleave is a function, either built-in or defined in advance.
- The waveforms generated for each instrument. This requires actual research! Currently I assume each instrument to be a set of harmonics combined with an envelope that describes the volume of the waveform from attack to decay. It’s a simple model of musical instruments, and is indubitably inadequate — but a simple model is better than none at all. I need to explore this more before venturing out into more complex (and complicated) mathematics.
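As a toy illustration of the interleave idea from the first point (the function name and list-of-strings representation are hypothetical, not part of the implemented syntax):

```python
def interleave(a, others):
    """Expand 'interleave A (B C D E)' into the passage A B A C A D A E."""
    result = []
    for x in others:
        result.extend([a, x])
    return result

# "A B A C A D A E" becomes:
passage = interleave("A", ["B", "C", "D", "E"])
```

A small library of such combinators, plus user-defined ones, is where the lambda-calculus flavour would come in.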
Additionally, I’d like to improve the way waveforms are aggregated. The present implementation does this additively on the time-domain waveform — it means ill-aligned waveforms can cancel or compound, with a bizarre effect on the output. It tries to mitigate this by randomising the phase for each waveform, but this mostly makes the effect inconsistent. I suspect the real solution involves Fourier transforms. (But, to adapt Jamie Zawinski: people with a problem often say, “I know, I’ll use Fourier transforms!” Now they have two problems… :p)
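The harmonics-plus-envelope model, with the per-waveform phase randomisation, might look roughly like this. This is a minimal sketch: the linear attack/decay envelope and all parameters are invented for illustration.

```python
import math
import random

SAMPLE_RATE = 44100

def render_harmonics(fundamental_hz, harmonic_amps, duration_s,
                     sample_rate=SAMPLE_RATE):
    """Sum sine harmonics, each with a random phase, under a simple
    linear attack/decay volume envelope."""
    n = int(duration_s * sample_rate)
    attack = int(0.05 * sample_rate)  # 50 ms attack, then decay to zero
    out = [0.0] * n
    for k, amp in enumerate(harmonic_amps, start=1):
        phase = random.uniform(0, 2 * math.pi)  # randomised per waveform
        freq = fundamental_hz * k
        for i in range(n):
            out[i] += amp * math.sin(2 * math.pi * freq * i / sample_rate + phase)
    for i in range(n):
        envelope = min(1.0, i / attack) * (1.0 - i / n)
        out[i] *= envelope
    return out
```

Summing the per-note outputs sample-by-sample is exactly the additive, time-domain aggregation described above, cancellation artefacts and all.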