Living Sound Pictures

"Living Sound Pictures" is an application intended for mobile devices, that composes and synthesizes sound real-time based on real world input parameters. Currently in rough prototype stage, the project seeks to sonify the world around us in a manner like "sound walks", however, the sound is generated from the environment. Mobile devices allow lots of input parameters, like video, position, velocity, microphone and last but not least: Video. Intended usage is for the curious people but also the everyday walks, creating a interesting listeting alternative to MP3s. It also has interesting other perspectives, not only sonification of video for blind, but also harmonization and timbre based on colour, for colour blind people.

Current features

How it works

The program, as it stands, works as a multiplatform application. It is written in Unity3D to allow easy access to the camera/webcam and to utilize previous work in the blunt library. The program consists of three parts: the analysis and composition engine, the actual composition, and the interface. As discussed in the paper, semantic mappings differ between people. It has therefore been important to make a distinction between analysis and meaning, to allow other composers to create their own interpretation of the data. That means the program analyses the incoming data (video, audio, sensors) and makes it available to the composition, which is free to use it however it wants. The interface consists of a screen displaying the captured video along with a set of diagnostics that represent the sound-creating mechanisms, which I document below.
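
To illustrate the separation between analysis and composition, here is a minimal sketch of what that boundary could look like. The names and members are hypothetical and do not reflect the actual project or blunt library API; they only mirror the analysis data described below.

```csharp
// Hypothetical sketch of the analysis/composition split; not the project's actual API.
public interface IAnalysisFrame
{
    float[] ColumnRed { get; }        // per-pixel red intensities of the scanned column
    float[] ColumnGreen { get; }      // per-pixel green intensities
    float[] ColumnBlue { get; }       // per-pixel blue intensities
    int DominantFrequencyBin { get; } // peak bin of the horizontal FFT (pattern detection)
    float RedBias { get; }            // colour-distribution deviations from grey
    float GreenBias { get; }
    float BlueBias { get; }
}

public interface IComposition
{
    // Called once per video frame with fresh analysis data; the composition
    // alone decides what (if anything) the data means musically.
    void OnAnalysisFrame(IAnalysisFrame frame);
}
```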


Resynthesis of columns. As discussed in the paper, one method of mapping two-dimensional data to a lower-dimensional domain is 'scanning' over it, effectively mapping one axis to time. In the image to the right, we see the vertical column from the main picture in isolation. This column represents the area that gets resynthesized. It is not two-dimensional, though; the width merely represents an average. On each frame, the program analyses the RGB pixel values for each y-value (pixel) in the column. Each RGB triplet is mapped to a harmonic in the stereo wavetable synthesizer; that is, the Nth pixel controls the volume of the Nth harmonic of the base note in the additive wavetable synthesizer. Additionally, red modulates only left-channel harmonics, while green modulates only right-channel harmonics. Blue merely transposes the harmonic, in steps.
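
The following is a rough, self-contained sketch of this mapping, not the project's actual code: the base note, the number of harmonics, the column position and the blue-to-transposition quantisation are assumptions for illustration only.

```csharp
using UnityEngine;

// Sketch: each pixel in a vertical column controls one harmonic of an additive synth.
// Red feeds the left channel, green the right, and blue transposes the harmonic in steps.
[RequireComponent(typeof(AudioSource))]
public class ColumnResynthSketch : MonoBehaviour
{
    const int Harmonics = 64;          // also the number of pixels sampled from the column
    const float BaseFrequency = 110f;  // assumed base note (A2)

    WebCamTexture cam;
    readonly float[] leftGain = new float[Harmonics];
    readonly float[] rightGain = new float[Harmonics];
    readonly float[] transpose = new float[Harmonics];
    readonly double[] phase = new double[Harmonics];
    int sampleRate;

    void Start()
    {
        sampleRate = AudioSettings.outputSampleRate;
        cam = new WebCamTexture();
        cam.Play();
    }

    void Update()
    {
        if (!cam.didUpdateThisFrame)
            return;

        // Sample one vertical column (here: the centre of the frame) on the main thread.
        int x = cam.width / 2;
        for (int i = 0; i < Harmonics; i++)
        {
            int y = i * cam.height / Harmonics;
            Color p = cam.GetPixel(x, y);
            leftGain[i] = p.r;                    // red -> left-channel harmonic level
            rightGain[i] = p.g;                   // green -> right-channel harmonic level
            transpose[i] = Mathf.Floor(p.b * 4f); // blue -> transposition in semitone steps (assumed quantisation)
        }
    }

    // Runs on the audio thread; synthesizes the spectrum set up in Update().
    // No locking for brevity.
    void OnAudioFilterRead(float[] data, int channels)
    {
        if (sampleRate == 0 || channels < 2) return;
        for (int n = 0; n < data.Length; n += channels)
        {
            float left = 0f, right = 0f;
            for (int h = 0; h < Harmonics; h++)
            {
                double freq = BaseFrequency * (h + 1) * System.Math.Pow(2.0, transpose[h] / 12.0);
                phase[h] += 2.0 * System.Math.PI * freq / sampleRate;
                if (phase[h] > 2.0 * System.Math.PI) phase[h] -= 2.0 * System.Math.PI;
                float s = (float)System.Math.Sin(phase[h]) / Harmonics;
                left += s * leftGain[h];
                right += s * rightGain[h];
            }
            data[n] = left;
            data[n + 1] = right;
        }
    }
}
```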

This is demonstrated in the video at 0:45. The "freeze animation" button refers to this column, which normally translates across the screen following a sine curve; this can be disabled. The three RGB graphs shown in the main picture display the intensity of the pixel values in the column, so they show the distribution of colour and sketch the timbre of the additive synth.

Rhythmic/pattern detection. I had hoped to utilize OpenCV for this, but I ran into integration problems while working on a deadline, so for now the system is simply a Fourier transform of the horizontal axis. The Fourier transform decomposes a signal into frequency components; this means that repeating horizontal patterns in the image are recognized, and the frequency with the highest magnitude is stored. The composition can then perform variations (like additional rhythmic subdivisions) based on this. This is demonstrated in the video at 1:14. The magnitude plot of the complex transform is displayed as a turquoise plot.
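
As a sketch of the idea, the snippet below takes the brightness of one horizontal row, computes a naive discrete Fourier transform, and returns the non-DC bin with the largest magnitude. It is written for clarity rather than performance, and the choice of row and bin range are assumptions, not the project's actual implementation.

```csharp
using UnityEngine;

public static class HorizontalPatternSketch
{
    // Returns the non-DC frequency bin with the largest magnitude for a row of
    // brightness values; a pattern repeating k times across the row peaks at bin k.
    public static int DominantFrequencyBin(float[] row)
    {
        int n = row.Length;
        int bestBin = 1;
        float bestMagnitude = 0f;

        // Only bins up to n/2 carry information for a real-valued signal.
        for (int k = 1; k <= n / 2; k++)
        {
            float re = 0f, im = 0f;
            for (int x = 0; x < n; x++)
            {
                float angle = -2f * Mathf.PI * k * x / n;
                re += row[x] * Mathf.Cos(angle);
                im += row[x] * Mathf.Sin(angle);
            }
            float magnitude = Mathf.Sqrt(re * re + im * im);
            if (magnitude > bestMagnitude)
            {
                bestMagnitude = magnitude;
                bestBin = k;
            }
        }
        return bestBin;
    }
}
```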

Colour analysis. The image is analysed horizontally for colour distributions that deviate from grey values. Thus, you can determine whether the image is comparatively more reddish, greenish or bluish. In the example song, these distributions drive the mix of some effects. The red value drives an even-overtone warmth distortion model together with a low-pass filter, to associate 'red' with warmth. The blue value drives a reverb mix, to associate 'blue' with atmospheric, cold and ethereal sounds. The green value drives a phaser, to associate 'green' with a harmonic, heavenly state. These mappings are a subjective example and are discussed in detail in the paper. The distributions are shown at the bottom of the main picture.
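
A minimal sketch of such an analysis could look like the following, where each channel's deviation above the pixel's grey level is accumulated into a red/green/blue bias. The use of the per-pixel channel average as 'grey' and the normalisation are assumptions for illustration.

```csharp
using UnityEngine;

public static class ColourBiasSketch
{
    // Accumulates how much each channel exceeds the pixel's grey value.
    // The resulting biases can drive effect mixes (distortion, reverb, phaser).
    public static Vector3 ComputeBias(Color[] pixels)
    {
        float red = 0f, green = 0f, blue = 0f;
        foreach (Color p in pixels)
        {
            float grey = (p.r + p.g + p.b) / 3f;
            red += Mathf.Max(0f, p.r - grey);
            green += Mathf.Max(0f, p.g - grey);
            blue += Mathf.Max(0f, p.b - grey);
        }
        float n = Mathf.Max(1, pixels.Length);
        return new Vector3(red / n, green / n, blue / n); // x = red, y = green, z = blue bias
    }
}
```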

Along with this, the dominant colour is selected based on a simple model that weights successive non-grey pixel values of the same colour more heavily, so that 'blobs' score higher than an even distribution. The dominant colour is displayed as a block. In the example song, this value determines the harmonization of the song: red equals major, blue equals minor, green equals neither (suspended chords). These effects are demonstrated in the video at 1:39.
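
One possible sketch of the run-weighted scoring is shown below: each non-grey pixel is classified by its strongest channel, and consecutive runs of the same colour contribute increasingly more, so contiguous blobs outweigh an even scattering. The grey threshold and the exact weighting are assumptions, not the project's tuned values.

```csharp
using UnityEngine;

public static class DominantColourSketch
{
    // Returns the dominant channel for a row of pixels:
    // 0 = red (major), 1 = green (suspended), 2 = blue (minor) in the example song.
    public static int DominantChannel(Color[] row, float greyThreshold = 0.1f)
    {
        float[] score = new float[3];
        int runChannel = -1;
        int runLength = 0;

        foreach (Color p in row)
        {
            float grey = (p.r + p.g + p.b) / 3f;
            int channel = p.r >= p.g && p.r >= p.b ? 0 : (p.g >= p.b ? 1 : 2);
            float channelValue = channel == 0 ? p.r : (channel == 1 ? p.g : p.b);

            if (channelValue - grey > greyThreshold)
            {
                runLength = channel == runChannel ? runLength + 1 : 1;
                runChannel = channel;
                score[channel] += runLength; // longer runs contribute increasingly more
            }
            else
            {
                runChannel = -1;
                runLength = 0;
            }
        }

        int best = 0;
        for (int c = 1; c < 3; c++)
            if (score[c] > score[best]) best = c;
        return best;
    }
}
```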

Note. All of these analysis parameters can be adjusted visually and/or scripted in the project. The rest of the interface includes a small diagnostic section with CPU and FPS measures, as well as the frequency division label. This value corresponds to the maximum-magnitude bin in the FFT of the horizontal direction, as explained earlier.

Lastly, the interface includes an oscilloscope of the generated sound.

Debugging and trouble-shooting

Intended prototype

As shown in the demonstration video, it probably feels rather clumsy to have to actively film footage in order to hear sound. One way around this is to take a so-called living sound picture; however, that only allows for so much modulation. Instead, I imagine a small, separate clip you can attach somewhere on your body. The clip would contain a camera and a microphone, and would possibly be wireless. This would allow continuous modulation without being an inconvenience for the user.

Development status

The source code repository is not quite up yet, but you can contact me if you're interested. As always, I'm interested in collaborating on my projects, especially this one: it needs people interested in composing and brainstorming for the platform! I don't actively develop it at the moment, except for the underlying library, but I would love to see the project blossom.