Encoding Works of Music in Databases

The algorithms behind David Cope's Experiments in Musical Intelligence (hereafter, EMI) cannot work miracles, though a surprised listener may sometimes believe so. One must keep in mind that any information output by EMI derives directly from previously existing data stored in the database, the ground level of the program. Only continued practice can teach the listener/user to associate a given database with certain musical results. However, a few specific steps regarding the selection and editing of works must be observed to ensure an uncompromised start.

The works selected should share common stylistic traits if the user expects a stylistically coherent output. The more stylistic diversity in a database's content, the more unpredictable the musical result of recombinancy becomes. Any errors present in the database, such as wrong notes that passed undetected during the sequencing process, will most likely reappear in the final output. Depending on the number and types of errors, the side effects of replicating such "toxic data" can range from the flawed material simply never being referenced to the corruption of the musical style originally intended for replication. MIDI files downloaded from the internet offer no guarantee of being error-free; only proofreading such sequences can ensure their correctness.

Separate timbres must have their identities protected: each timbre must be assigned to a different MIDI channel. Failure to observe this distinction compromises the identification and matching of patterns, since it forces the program to work with data in a format that does not correspond to the original score.
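As a sketch of why this matters downstream, the hypothetical helper below groups events by their MIDI channel, assuming EMI's five-parameter event lists (on-time, MIDI key number, duration, channel, velocity) stored here as Python tuples. A timbre smeared across several channels, or two timbres sharing one channel, would make such a grouping, and any pattern matching built on it, meaningless.

```python
from collections import defaultdict

def events_by_channel(events):
    """Group five-parameter events by their MIDI channel (index 3),
    preserving the order of events within each channel."""
    channels = defaultdict(list)
    for event in events:
        channels[event[3]].append(event)
    return dict(channels)

# Two timbres kept distinct on channels 1 and 2:
events = [(0, 60, 500, 1, 100), (0, 48, 1000, 2, 90), (500, 62, 500, 1, 100)]
print(events_by_channel(events))
```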
A database may include recurring figures rhythmically set to different note values in multiple works, or in different movements of the same work. The classic Alberti bass pattern, for example, permeates most of Mozart's piano sonatas. Despite the apparent similarities between its recurrences, the Alberti bass may appear in some cases articulated in sixteenth notes and in others in eighth notes, depending on the movement's rhythmic context. Such discrepancies of note duration must be leveled out to a common denominator, an essential procedure for successful recombinancy at the rhythmic level.
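Leveling durations to a common denominator can be sketched as a uniform rescaling of on-times and durations, again assuming five-parameter event tuples (on-time, MIDI key number, duration, channel, velocity); the function name is illustrative, not part of EMI.

```python
def rescale_durations(events, factor):
    """Multiply every on-time and duration by `factor`, so that the same
    figure notated at different note values lines up rhythmically."""
    return [(int(on * factor), key, int(dur * factor), ch, vel)
            for on, key, dur, ch, vel in events]

# An Alberti-bass figure in sixteenth notes (250 per note, with 1000 = quarter)...
sixteenths = [(0, 48, 250, 1, 90), (250, 64, 250, 1, 90),
              (500, 55, 250, 1, 90), (750, 64, 250, 1, 90)]
# ...doubled so it matches the same figure notated in eighth notes:
eighths = rescale_durations(sixteenths, 2)
print(eighths)
```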

All works must be stored using the same tonal center. That is, the recombinancy of segments can only succeed if the segments are stored in the database according to a common key and mode. The large number of non-harmonic tones usually involved in ornamentation can mislead the identification of signatures (recurring patterns that characterize a particular composer's corpus of works, typically two or more beats in length). Hence, trills, mordents, and other ornamentation figures should be removed from the sequence prior to its storage in the database. Once removed from the data, their reinsertion in the output must be done by hand according to the user's judgment.
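Bringing every work to a common tonal center amounts to a uniform pitch shift. The sketch below, with a hypothetical `transpose` helper operating on the same assumed five-parameter event tuples, moves a fragment centered on D down a whole step to C.

```python
def transpose(events, semitones):
    """Shift every MIDI key number by `semitones`, leaving all other
    parameters untouched."""
    return [(on, key + semitones, dur, ch, vel)
            for on, key, dur, ch, vel in events]

# A fragment centered on D (key 62) shifted down two semitones to C (60):
in_d = [(0, 62, 500, 1, 100), (500, 66, 500, 1, 100), (1000, 69, 1000, 1, 100)]
print(transpose(in_d, -2))
```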

Music in EMI is represented and stored as lists of events in a database. An event is David Cope's representation of the essential attributes of a single note as a list of five parameters:

(0 60 500 1 127)

The position of a parameter inside the list indicates its attribute. From left to right, the parameters represent the attack time (on-time) in milliseconds, the MIDI key number (60 corresponds to middle C, 61 to C-sharp, 59 to B, and so forth), the duration (where 1000 usually represents a quarter note, 500 an eighth note, 2000 a half note, etc.), the MIDI channel number (which allows for instrumental or voice differentiation), and the note velocity (where 0 is the softest and 127 the loudest possible). On-times are relative to the metronome marking in use. With the metronome set to quarter note = 120, for example, an on-time of 1000 would actually be played 500 milliseconds after zero.
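The relationship between stored on-times and real time can be checked with a small conversion, assuming 1000 units per quarter note as in the example above; the function is illustrative, not part of EMI.

```python
def real_onset_ms(on_time, quarter_bpm):
    """Convert a stored on-time (1000 units per quarter note) into real
    milliseconds at the given metronome marking."""
    return on_time * 60 / quarter_bpm

# At quarter note = 120, an on-time of 1000 sounds 500 ms after zero:
print(real_onset_ms(1000, 120))
```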

Typically, events in a list appear sorted by their on-times.
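Restoring that ordering after merging or editing event lists is a one-line sort on the first parameter, sketched here in Python rather than EMI's own Lisp:

```python
events = [(500, 62, 500, 1, 100), (0, 48, 1000, 2, 90), (0, 60, 500, 1, 100)]
events.sort(key=lambda e: e[0])  # e[0] is the event's on-time
print(events)
```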