Making-of 'Chat in G-Minor'

Teaching machines how to make mistakes

Chat in G-Minor is a browser-based audiovisual installation for two iMacs. It makes use of modern browser APIs such as the Web Audio API.


Two clients take turns transmitting and receiving an image. The transmission is carried by sound rather than a digital data connection. An image to be transmitted therefore has to be translated into sounds according to a scheme known to both parties, so that the receiving side can decode the sounds back into an image. As the idea evolved, we tried different translation techniques, and the translation turned out to be the critical and most interesting part of the installation.


Generally speaking, the app logic is composed of two parts, a receiving part (Receiver) and a transmitting part (Transmitter), and the clients swap these roles with every iteration. If client A is the Receiver in iteration 1, client B is the Transmitter. In iteration 2 the roles switch: client A is the Transmitter and client B is the Receiver.
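The alternation of roles can be sketched as a small function. The name roleFor and its parameters are hypothetical and only illustrate the rule described above; they are not taken from the installation's code.

```javascript
// Hypothetical helper: decide a client's role for a given iteration.
// clientIndex is 0 for client A and 1 for client B; iterations start at 1.
function roleFor(clientIndex, iteration) {
  // Client A receives in odd iterations, client B in even ones.
  const receives = (iteration + clientIndex) % 2 === 1;
  return receives ? "Receiver" : "Transmitter";
}

console.log(roleFor(0, 1), roleFor(1, 1)); // Receiver Transmitter
console.log(roleFor(0, 2), roleFor(1, 2)); // Transmitter Receiver
```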


As mentioned before, the translation is a critical part of the installation and as such was the main focus of our research. Translating from a time-independent medium (e.g. an image) to a time-based one (e.g. sound) required introducing a simple form of sequential logic. To transmit an image over sound, the image is translated, transmitted, received and translated back pixel by pixel, one pixel at a time. This leaves individual 24-bit RGB color values (red, green and blue channels with 256 steps per channel) as the subject of translation.

As a first step, each color channel is reduced from 256 steps to 6, which shrinks the possible color range from 16.7 million colors to 216 (6 × 6 × 6). This also matches the "web-safe colors" palette introduced in the mid-90s.
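A minimal sketch of this reduction follows. It assumes rounding to the nearest web-safe level (multiples of 51) and numbers the 216 colors as r * 36 + g * 6 + b; both the rounding rule and the numbering are assumptions, as are all function names.

```javascript
const STEPS_PER_CHANNEL = 6;
const STEP_SIZE = 255 / (STEPS_PER_CHANNEL - 1); // 51, the web-safe spacing

// Reduce an 8-bit channel value (0..255) to a level 0..5.
function quantizeChannel(value) {
  return Math.round(value / STEP_SIZE);
}

// Combine the three levels into a single color index 0..215 (assumed numbering).
function colorIndex(r, g, b) {
  return quantizeChannel(r) * 36 + quantizeChannel(g) * 6 + quantizeChannel(b);
}

// Turn a color index back into the reduced RGB triple.
function indexToRgb(index) {
  const r = Math.floor(index / 36);
  const g = Math.floor(index / 6) % 6;
  const b = index % 6;
  return [r * STEP_SIZE, g * STEP_SIZE, b * STEP_SIZE];
}

console.log(colorIndex(200, 100, 30)); // 157
console.log(indexToRgb(157));          // [ 204, 102, 51 ]
```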

Approach 1

RGB color values use three channels (red, green and blue) to describe colors additively. The first approach involved splitting up the color channels and transmitting them in parallel. The audible frequency range (e.g. ~16 Hz to 16,000 Hz) is split into three evenly sized segments (16 Hz to 5,344 Hz, 5,344 Hz to 10,672 Hz and 10,672 Hz to 16,000 Hz), and each segment transmits one channel of a color value. A channel value (e.g. red, with its 6 steps) can then easily be mapped to a frequency within its segment (16 Hz to 5,344 Hz for red). This is the simplest form of translation and results in extremely atonal sounds spread over the complete audible frequency range.
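As an illustration, the mapping could look like the sketch below. The text does not say where inside a segment each level is placed; this sketch assumes that each segment is divided into six equal slots and that a level is sent at the center of its slot, so that neighboring channels never share a frequency. All names are hypothetical.

```javascript
const F_MIN = 16;      // Hz, lower end of the range used
const F_MAX = 16000;   // Hz, upper end of the range used
const CHANNEL_COUNT = 3;
const LEVEL_COUNT = 6;
const SEGMENT_WIDTH = (F_MAX - F_MIN) / CHANNEL_COUNT; // 5328 Hz per channel

// channel: 0 = red, 1 = green, 2 = blue; level: 0..5
function channelFrequency(channel, level) {
  const segmentStart = F_MIN + channel * SEGMENT_WIDTH;
  const slotWidth = SEGMENT_WIDTH / LEVEL_COUNT; // 888 Hz per level
  return segmentStart + (level + 0.5) * slotWidth; // center of the slot (assumption)
}

// Three simultaneous frequencies for one reduced color value.
function colorToFrequencies([r, g, b]) {
  return [channelFrequency(0, r), channelFrequency(1, g), channelFrequency(2, b)];
}

console.log(colorToFrequencies([0, 0, 5])); // [ 460, 5788, 15556 ]
```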

Approach 2

Mapping each of the 216 possible color values to its own tone on a scale, stepping up only in whole tones, would span more than 30 octaves and quickly end up in the ultrasonic range. To solve that, binary logic is used to represent a color value. Imagine a dictionary of chords in which each chord is a binary representation of which of its notes are played or muted (called a "binary chord" below). One chord on one base tone can then already represent 15 unique tone combinations:

  • D3 F3 A3 D4 - 1111
  • D3 F3 A3 -- - 1110
  • D3 F3 -- D4 - 1101
  • D3 F3 -- -- - 1100
  • D3 -- A3 D4 - 1011
  • D3 -- A3 -- - 1010
  • D3 -- -- D4 - 1001
  • D3 -- -- -- - 1000
  • -- F3 A3 D4 - 0111
  • -- F3 A3 -- - 0110
  • -- F3 -- D4 - 0101
  • -- F3 -- -- - 0100
  • -- -- A3 D4 - 0011
  • -- -- A3 -- - 0010
  • -- -- -- D4 - 0001

Note that the binary 0000 (i.e. -- -- -- --) is intentionally left out, since no notes would be played; hence 4 bits yield 15 possibilities. Each of those 15 combinations can now be uniquely linked to a color value. With 216 color values and 15 combinations per base tone, 15 base tones are needed (216 / 15 = 14.4, rounded up), and the tonal range spreads over about 2.5 octaves in a natural G minor scale.

We settled on this approach.
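The binary-chord idea can be sketched as follows. The sketch assumes that colors are numbered 0 to 215 and assigned to base tones in blocks of 15; the actual assignment of colors to chords and of base tones to concrete notes is not specified here, so the functions below are hypothetical.

```javascript
const CHORD_BITS = 4;                        // four notes per chord
const COMBOS_PER_BASE = 2 ** CHORD_BITS - 1; // 15, because 0000 is excluded

// Map a color index (0..215) to a base tone number and a binary chord.
function encodeColor(colorIdx) {
  const baseTone = Math.floor(colorIdx / COMBOS_PER_BASE); // 0..14
  const mask = (colorIdx % COMBOS_PER_BASE) + 1;           // 1..15, never 0
  return { baseTone, bits: mask.toString(2).padStart(CHORD_BITS, "0") };
}

// Reverse lookup: base tone and binary chord back to the color index.
function decodeChord({ baseTone, bits }) {
  return baseTone * COMBOS_PER_BASE + parseInt(bits, 2) - 1;
}

// Pick the notes of a chord that are actually played for a given bit pattern.
function chordNotes(bits, notes) {
  return notes.filter((_, i) => bits[i] === "1");
}

console.log(encodeColor(215));                               // { baseTone: 14, bits: '0110' }
console.log(chordNotes("1101", ["D3", "F3", "A3", "D4"]));   // [ 'D3', 'F3', 'D4' ]
```

Numbering the masks from 1 rather than 0 is what keeps the silent pattern 0000 out of the dictionary.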

Dictionary example with 8 color values

Color Values:



For the receiving party to be able to receive pixel color values, it needs to know when the transmission is about to start. For that, the Transmitter plays a high-pitched sound above the audible frequency range (~20,000 Hz) that the Receiver is listening for. Once the Receiver picks up this Signal, it knows that the transmission starts after a predefined signal duration (~1 s).
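Detecting the Signal could be sketched like this, assuming the Receiver has frequency data in decibels per FFT bin, as delivered by AnalyserNode.getFloatFrequencyData(). The threshold of -60 dB and the function names are assumptions for illustration only. FFT bin i covers the frequency i * sampleRate / fftSize.

```javascript
// Index of the FFT bin closest to a given frequency.
function frequencyToBin(frequency, sampleRate, fftSize) {
  return Math.round(frequency / (sampleRate / fftSize));
}

// True if the bin at the signal frequency is louder than the threshold.
// signalHz and thresholdDb are assumed defaults, not values from the installation.
function signalDetected(fftData, sampleRate, fftSize, signalHz = 20000, thresholdDb = -60) {
  const bin = frequencyToBin(signalHz, sampleRate, fftSize);
  return bin < fftData.length && fftData[bin] > thresholdDb;
}

// Simulated input: silence everywhere, then a loud 20 kHz component.
const simulated = new Float32Array(1024).fill(-100);
console.log(signalDetected(simulated, 44100, 2048)); // false
simulated[frequencyToBin(20000, 44100, 2048)] = -40;
console.log(signalDetected(simulated, 44100, 2048)); // true
```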


The Transmitter uses the canvas method CanvasRenderingContext2D.getImageData() to get the color values of the previous iteration. It iterates over all colors and builds up a transmission array based on the dictionary above. Once the transmission array is built, the Signal is played to indicate to the receiving party that the transmission is about to start. Using Window.requestAnimationFrame(), color values are then transmitted one by one, with the excellent Tone.js library serving as a higher-level abstraction layer over the Web Audio API.
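Building the transmission array might look roughly like the sketch below. It takes an ImageData-like object whose data array holds four bytes per pixel (red, green, blue, alpha), which is the layout getImageData() returns. The reduction to 216 colors and the split into base tone and chord mask are assumptions made for this sketch, and buildTransmission is a hypothetical name. Scheduling and playback via requestAnimationFrame() and Tone.js are left out.

```javascript
// In the browser the input would come from something like:
//   const imageData = ctx.getImageData(0, 0, canvas.width, canvas.height);
function buildTransmission(imageData) {
  const { data } = imageData;
  const level = (value) => Math.round(value / 51); // 256 steps -> 6 steps (assumed rounding)
  const transmission = [];
  for (let i = 0; i < data.length; i += 4) {       // step over R, G, B, A
    const colorIdx = level(data[i]) * 36 + level(data[i + 1]) * 6 + level(data[i + 2]);
    transmission.push({
      baseTone: Math.floor(colorIdx / 15), // which base tone the chord sits on
      mask: (colorIdx % 15) + 1,           // 1..15, which chord notes are played
    });
  }
  return transmission;
}

// Two pixels: white, then black.
const pixels = { width: 2, height: 1, data: new Uint8ClampedArray([255, 255, 255, 255, 0, 0, 0, 255]) };
console.log(buildTransmission(pixels));
// [ { baseTone: 14, mask: 6 }, { baseTone: 0, mask: 1 } ]
```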


After receiving the Signal, and therefore knowing the exact moment the transmission is about to start, the Receiver also uses the window's global method Window.requestAnimationFrame() to analyze the microphone input at a predefined rate.

This is the flow for each iteration (color value / pixel):

  • Using the AnalyserNode, FFT float data is obtained and stored in an array fftData.
  • Within the frequency range of all possible tones (available via the dictionary), a peak decibel value is obtained.
  • Played notes are extracted from fftData by cross-checking the notes available in the dictionary against local peak values that are no quieter than the absolute peak minus a predefined threshold.
  • A reverse lookup on the dictionary returns the color value.
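The steps above can be sketched as two small functions operating on an fftData array of decibel values. This is a simplified illustration: instead of a real local-peak search it reads the single FFT bin assigned to each note, and the note-to-bin table, the threshold and the reverse dictionary are hypothetical stand-ins for the real dictionary.

```javascript
// noteBins maps note names to FFT bin indices, e.g. { D3: 10, F3: 12 }.
function detectNotes(fftData, noteBins, thresholdDb) {
  const entries = Object.entries(noteBins);
  // Loudest value among all bins that belong to a possible note.
  const peak = Math.max(...entries.map(([, bin]) => fftData[bin]));
  // Keep every note that is no quieter than the peak minus the threshold.
  return entries
    .filter(([, bin]) => fftData[bin] >= peak - thresholdDb)
    .map(([name]) => name);
}

// Reverse lookup: the set of played notes identifies the color value.
function lookupColor(notes, reverseDictionary) {
  return reverseDictionary.get(notes.join(" "));
}

// Simulated spectrum in which D3, F3 and D4 are played and A3 is muted.
const spectrum = new Float32Array(64).fill(-100);
spectrum[10] = -30; spectrum[12] = -35; spectrum[15] = -80; spectrum[20] = -32;
const bins = { D3: 10, F3: 12, A3: 15, D4: 20 };
const played = detectNotes(spectrum, bins, 12);
console.log(played);                                            // [ 'D3', 'F3', 'D4' ]
console.log(lookupColor(played, new Map([["D3 F3 D4", 42]])));  // 42
```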