Sine-Wave Speech, The Clangers, EH Gombrich, EVP & Wartime Radio
First click on, play and listen to the recording Sequence 1 (below). What, if anything, do you think you can hear? Then click on, play and listen to Sequence 2. Sequence 2 is in fact the original recording from which the first clip you heard was derived. Now that you know what the original voice was actually saying, listen to Sequence 1 again. Once they know what these recordings are meant to say, most listeners report that much more vivid impressions of meaningful speech pop up (so to speak) out of audio sequences that most people initially find almost unintelligible.
Sequence 1 was created using a technique called Sine-Wave Speech, or Sine-Wave Synthesis. Although the “Rorschach Audio” book does mention this subject (pages 25 & 26), following Pradheep Shanmugalingam’s excellent presentations on Sine-Wave Speech for The Voice Symposium and the Resonance FM radio broadcast (see earlier posts), I wish the book had made more of it, because the phenomenon demonstrates the importance of prior knowledge in speech perception very effectively. (In fact I stumbled across a very similar phenomenon while messing around with electronics as a teenager, but unfortunately never kept the tapes; I’ll discuss that if and when I get round to reproducing the experiment.)
Sine-Wave Synthesis is a technique that was developed by the experimental psychologist Philip Rubin of Haskins Laboratories in the early 1970s, then further explored, notably by his colleague Robert Remez. The technique involves synthesising artificial, computer-generated replicas of human speech, by analysing the audio spectrum and tracking the major energy bands or formants in a given speech recording, then creating usually 2 or 3 pure sine-wave signals that follow the changing frequency and amplitude contours of those original formants. Technical details notwithstanding, as yours truly pointed out in the group chit-chat before the Resonance FM show, the effect sounds a bit like the space mice in Oliver Postgate’s charming and atmospheric BBC children’s TV show “The Clangers”. That comparison is not trivial, since the sound effects used in “The Clangers” are all the more effective for creating fairly passable illusions of plausible speech, and for several reasons “The Clangers” really is a classic piece of contemporary sound art, albeit for children.
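For readers who like to tinker, here is a rough sketch in Python of the synthesis half of the technique described above: given a few formant tracks, it sums one pure sine wave per track. This is only an illustration under stated assumptions, not the Haskins Laboratories software. The function names, the frame rate and the sample rate are all hypothetical, and the three formant tracks are invented curves, not measurements from a real voice. Tracking formants in an actual recording is a separate analysis step that this sketch leaves out.

```python
import math


def interpolate(track, position):
    """Linearly interpolate a frame-level track at a fractional frame index."""
    low = min(int(position), len(track) - 1)
    high = min(low + 1, len(track) - 1)
    fraction = position - low
    return track[low] * (1.0 - fraction) + track[high] * fraction


def sine_wave_speech(freq_tracks, amp_tracks, frame_rate=100, sample_rate=16000):
    """Sum one sine wave per formant track; returns samples scaled to [-1, 1].

    freq_tracks and amp_tracks are lists of equal-length lists, one per
    formant, holding frequency in Hz and linear amplitude for each frame.
    This layout is an assumption made for the sketch.
    """
    n_frames = len(freq_tracks[0])
    n_samples = (n_frames - 1) * sample_rate // frame_rate + 1
    signal = [0.0] * n_samples
    for freqs, amps in zip(freq_tracks, amp_tracks):
        phase = 0.0
        for n in range(n_samples):
            position = n * frame_rate / sample_rate  # fractional frame index
            # Accumulate phase rather than computing sin(2*pi*f*t) directly,
            # so the tone glides smoothly when the frequency changes.
            phase += 2.0 * math.pi * interpolate(freqs, position) / sample_rate
            signal[n] += interpolate(amps, position) * math.sin(phase)
    peak = max(abs(s) for s in signal)
    return [s / peak for s in signal] if peak > 0 else signal


# Invented formant contours (NOT taken from a real recording): about half a
# second of three tones wandering around typical vowel formant regions.
n_frames = 50
frames = [i / (n_frames - 1) for i in range(n_frames)]
freq_tracks = [
    [500 + 200 * math.sin(2 * math.pi * x) for x in frames],
    [1500 - 400 * x for x in frames],
    [2500 + 100 * math.cos(2 * math.pi * x) for x in frames],
]
amp_tracks = [[1.0] * n_frames, [0.5] * n_frames, [0.25] * n_frames]
audio = sine_wave_speech(freq_tracks, amp_tracks)
```

The resulting list of samples could be written to a WAV file for listening. With invented tracks like these it will sound like whistling rather than words, since any speech-like quality depends on the tracks being measured from a real utterance.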
In the context of Electronic Voice Phenomena demonstrations (refer to the first post on this website), listeners are almost always prompted as to how to interpret the ambiguous and often badly distorted stray communications chatter that EVP researchers present as recordings of ghosts. In the case of EVP researcher Konstantin Raudive’s “Breakthrough” record, these prompts were announced by narrator Nadia Fowler before each EVP clip. In the case of the EVP tapes recorded by Raymond Cass, introductory prompts did, as I understand it, exist on Raymond’s original cassettes. Although these prompts were edited out before his tapes were re-published on “The Ghost Orchid” CD, they still exist in the form of the printed track titles in “The Ghost Orchid” CD sleeve-notes.
As shown by Sine-Wave Speech demonstrations, if we know what an ambiguous speech recording is meant to say, then as long as the meaning attributed to that recording is reasonably plausible, it’s much more likely that we’ll interpret ambiguous material in a way that appears to corroborate the opinion of the person who told us what to expect in the first place. Sine-Wave Speech demonstrations illustrate the extent to which experiences and prior knowledge condition perceptions, so, in the context of EVP, it’s relevant to quote the art historian EH Gombrich’s full description of his work for The BBC Monitoring Service during WW2, as described in his book “Art & Illusion”. The most important passage to highlight here is where Gombrich states that if his colleagues sought a second opinion about the meaning of an ambiguous speech recording, unlike EVP researchers, they would not tell subsequent listeners what they personally thought that voice might have said…
“It so happens I had an opportunity to study this aspect of perception in a severely practical context during the war. I was employed for six years by the British Broadcasting Corporation in their Monitoring Service, or listening post, where we kept constant watch on radio transmissions from friend and foe. It was in this context that the importance of guided projection in our understanding of symbolic material was brought home to me. Some of the transmissions which interested us most were barely audible, and it became quite an art, or even a sport, to interpret the few whiffs of speech sound that were all we really had on the wax cylinders on which these broadcasts had been recorded. It was then we learned to what extent our knowledge and expectations influence our hearing. You had to know what might be said in order to hear what was said. More exactly, you selected from your knowledge of possibilities certain word combinations and tried projecting them into noises heard. The problem was a twofold one – to think of possibilities and to retain one’s critical faculty. Anyone whose imagination ran away with him, who could hear any words – as Leonardo could in the sound of bells – could not play that game. You had to keep your projection flexible, to remain willing to try out fresh alternatives, and to admit the possibility of defeat. For this was the most striking experience of all: once your expectation was firmly set and your conviction settled, you ceased to be aware of your own activity, the noises appeared to fall into place and be transformed into the expected words. So strong was this effect of suggestion that we made it a practice never to tell a colleague our own interpretation if we wanted him to test it. Expectation created illusion.” Ernst Gombrich “Art & Illusion” 1960
Incidentally, according to Wikipedia, and to an interview with Oliver Postgate conducted by Clive Banks (no relation), the voices used in The Clangers were created using swanee whistles, or piston flutes, played, much like Sine-Wave Speech, in a way that “followed the rhythm and intonation of a script in the English language” (with resultant perceptions being reinforced by a conventional spoken narration). It is also consistent with various “Rorschach Audio” type and psychoacoustic phenomena that “when the series was shown without narration to a group of overseas students, many of them felt that the Clangers were speaking their particular language”, and that a “non-worded but [musically] scored script seemed to allow the Clangers to say almost anything”, including (predictably) “swear words”. At the beginning of the third episode a door to the Clangers’ cave system fails to open, Major Clanger says “oh, sod it, the bloody thing’s stuck again”, and kicks the door to make it open, with the words being sufficiently well articulated that apparently the BBC asked Oliver Postgate to have these sounds removed (well, at least, that’s what Oliver Postgate said 36 years later anyway).
In terms of illustrating the distinction between sensation and perception, at the risk perhaps of stating the obvious, “sensation” is when (for instance) we hear, but fail to understand, raw speech sounds (raw “sense data”) which have been articulated in an unfamiliar language; whereas “perception” is when we understand words spoken in a language we already know. Likewise, Sine-Wave Speech demonstrations also allow audiences to feel the difference between sensation and perception, but, because they don’t require us to take time to bone up on any previously unfamiliar language, they demonstrate that distinction much faster and arguably more vividly (and of course, the name of the TV series “The Clangers” is in itself a reference to sound).
Here are some more examples of Sine-Wave Speech, presented by Matt Davis of The Medical Research Council’s Cognition & Brain Sciences Unit:
On a point of detail, the “Rorschach Audio” research paper published in 2001 refers to sine-wave speech by quoting researcher Celia Woolf’s description of how “some listeners perceive time varying sinusoidal tones as speech”.
Joe Banks, 23 March 2013