The Conversation: Mark Grimshaw

I first stumbled onto Mark’s work while searching for video game scholarship on “visemes,” the design term for a visual analog of a phoneme. Being somewhat familiar with Michel Chion’s work on audio-visual synchronization, I was curious what video game scholars were making of comparable (if more complex?) sound and synchronization issues in video game design. I was delighted to discover Mark’s co-written essay “Uncanny Speech,” on just this subject, in his edited volume Game Sound Technology and Player Interaction (2011), and even more pleased to find that Mark had published what appears to be the first book-length study of game sound, The Acoustic Ecology of the First-Person Shooter (2008), a text that will no doubt become indispensable to sound studies and is already making its way onto film, new media, and video game studies syllabi.

JAVIER O’NEIL-ORTIZ: With a background in sound engineering and degrees in music and music technology, what led you to work on game sound, specifically? How has your industry experience informed your approach to the study of game sound?

MARK GRIMSHAW: I’m always very interested in new areas of academic study and, at the turn of the century, I started to get interested in the possibilities for the study of games as a university course. I was working in a music department at the time with oversight of the recording and music technology degrees and, when writing a new degree in computer games (the Computer and Video Games degree that is still running at the University of Salford), I naturally sought to ensure that sound had a prominent role. As for my industry experience (as a studio recording engineer) informing my approach to the study of game sound, I don’t think it has. There are more important influences. I grew up and later lived in various countries around the world, including several in Africa, and an abiding memory is the cacophony of an African acoustic ecology. From the mingled sounds of birds early in the morning, to the musical, xylophone-like croaking of frogs in a swamp, to the crickets and other life in the night, to the rustling of snakes on the sides of paths, I grew up alive to the meaning of sound. Just why do people today shut out the acoustic world with earphones (making themselves deaf in the process)?

JAVIER: In your work you note that the studies of sound in film and in video games have taken parallel paths. You also make considerable use of theories of film sound in your study of gaming, while pointing to their limitations. Could you say more about the two disciplines’ points of convergence and divergence? How have theories of film sound proved helpful and/or inadequate, and how do you see theories of game sound shedding light (or noise) on film sound studies?

MARK: If I recall the reference correctly, in stating that studies of film and video games have taken parallel paths, I wasn’t referring to the study of sound specifically. I was referring to the fact that the study of each deals with sound only after it has started dealing with the image. This also parallels the development of the technology for each – image first, then sound and image. I found film theory (about sound, and in particular the work of Chion and Altman) a useful way to engage with what was then the new theoretical area of game sound. Useful theoretical notions from film theory and other areas include acousmatic sound, causal/semantic/reduced listening, isomorphism and caricature sounds. Where film sound theories and techniques fall down with respect to game sound is in the matter of interaction. Films are linear, with soundscapes that vary little if at all – games tend to be non-linear, with soundscapes that differ at each playing of the game because of the input of the player or players. Game sound is intimately tied to the actions of the player and so has a very different meaning for the player, one that is a verification or, indeed, reification of his/her role – indeed presence – in the gameworld. This is an experience or mode of being that film sound cannot give; in film we are almost always cast in the role of passive non-participant. With the coming of 3D film, I suspect we will shortly begin to see a renewed attempt at non-linear film and, depending upon how far that is successful and how far the technology can support it, we may well see a more active role for the film spectator and some attempts to include interactive, game-like sound. I don’t hold my breath, however. I found Heavy Rain an experience of little interaction, more like a film than a game and, for this reason, very disappointing – spending large portions of my precious game-playing time watching long FMV inserts does not interest me.
I do think, however, that some of the work currently going on in using psychophysiological data from the player to adjust properties of the sound, or even to synthesize it, could usefully be applied in cinemas to give a more personalised sound experience. Say you are watching a horror film and the cinema seat (or sensors on the 3D glasses you are wearing) senses that you are not frightened enough (or that the entire audience is not frightened enough): the film’s sound engine then ups the ‘fright’ parameter of the sound in the same way that we are attempting to do with games (an area one of my PhD students, Tom Garner, is working in).

JAVIER: In your co-written article on “Uncanny Speech,” which you define, generally, as a “mismatch” of facial animation and lip synchronization, you distinguish between two forms of the uncanny. The first, you argue, undermines immersion, believability, and player identification with empathetic characters, whereas the second artfully exploits the mismatch in vocalization, typically with zombies, human-like robots and other traditionally “uncanny” creatures. Coming from film, I am interested in this idea of a ‘productive’ acoustic uncanny (and am reminded of Chion’s claim, in The Voice in Cinema, that “The process of ‘embodying’ a voice is not a mechanistic operation, but a symbolic one.”). Could you say a little more about how, in video games, the uncanny can function as an aesthetic effect (rather than a technical error)?

MARK: I’ve discussed some of this in a 2009 paper entitled “The audio Uncanny Valley,” in which I attempted to bring sound into the Uncanny Valley equation and into debates on the uncanny in general. Since the ‘Uncanny Speech’ article, my colleague Angela Tinwell has done more empirical work on uncanniness and the relationship between speech/mouth articulation and facial animation. I have a PhD student (Tom Garner) who is working on biofeedback to enhance perceptions of fear through real-time manipulation of sound according to the player’s psychophysiology. I mention all this not (merely) to point to my colleagues’ work but to show that there is an increasing interest in the relationship between sound (and let’s not forget silence!) and the fear emotion, and also to prepare the ground for answers to your later question.

Chion, I believe, is talking about perception (as opposed to sensation) and, to grossly generalize, his comment is made in the context of the ‘ventriloquism effect’ or, as Chion terms it, ‘synchresis’. Mechanically, in the cinema, the production and reproduction of image and sound are distinct and physically separate. Perceptually, though, and assuming the technical production (e.g. dubbing) is precise, they are one and the same event. So, in the case of an actor on screen, the voice emanating from the loudspeakers is embodied by the image, with all the symbolic and contextual ephemera that implies. I wonder if he’s right, or has used the right means of expression to ‘embody’ his concept. The idea of ‘embodying’ a voice gives rise, in my mind at least, to the image of the body entering the voice, thus giving it much-needed form and substance. I see it the other way round: the voice enters the body, giving the body its form and substance. It is the body that is resonated rather than the voice embodied. In this way, the body acquires meaning and significance.

Who watches silent film without some feeling of discomfort, or without attempting to fill in the gaps with imagined sound effects or supposed speech? Silent film, in any case, was rarely if ever silent, with a whole array of devices to add a sonic element – organ, piano, orchestra, manual sound FX, prompter and lecturer. In contrast, radio (and I refer to talk radio) goes from strength to strength and, while radio can give rise to imaginings of corporeal form behind the voices and sounds, this is not a requirement and there is no feeling of discomfort or lack in listening to radio. Why do we not feel this discomfort when looking at a photograph? Because silence is the absence of event. While the eyes provide a focus, our ears provide the environment in which we live, and if that environment falls atypically silent then something is up, and probably something we should be fearful of – this is deeply embedded in our psyche from prehistory (and, I suspect, in the psychology of many other animals too). There is no movement in a photograph. With a film, we can see events occurring all the time so, when it falls silent or we are unable to resonate the bodies on screen via a process of accurate perceptual synchronization between sound and movement, we are right to be wary and fearful. If designers can use technical explanations of the Uncanny Valley effect to promote unease and fear, then this is how the uncanny can be used for aesthetic effect.

JAVIER: What video game strikes you as having an especially sophisticated sound design? Can you recall a moment in gameplay that stands out as particularly elegant and well-designed?

MARK: That would have to be Space Invaders: an early example – primitive by today’s standards, yes – of the use of sound to resonate the pixels on screen. Perhaps ‘resonate’, in the sense of re-sounding the moving image, is the wrong term in this case. Unlike the images of actors in a film, which cry out to be re-sounded, pixels generated on a screen do not have this motivation. However, as a series of events unfolding in front of our focussed gaze, they still require resonation. The sound in Space Invaders is perfectly matched to the events – in particular, the accelerando – such that it is the sound that drives the player’s rising emotion. So much of my pocket-money as a child went into those machines.

JAVIER: Where do you see your work going from here? Is there a second project in the works or forming on the horizon? Has your study of sound in the first-person shooter pointed the way to a new problem or question to explore?

MARK: There are a number of areas I’m involved in, ranging from games to uses of sound in other environments. As mentioned, I’m involved in work on using biofeedback from players to enhance the fear-promoting aspects of sound in games. But I’m also interested in other uses of sound for persuasive purposes, uses that have arisen from my study of sound in FPS games. I’m involved in a project that makes use of sound to cut CO2 emissions (strange but true). It involves the use of ‘persuasive’ sound to encourage car drivers to get out of their cars and into buses, or to use bicycles or even walk. I’m interested in how synthesized speech can be ‘emotionalized’ and thus used with relational agents in all sorts of applications, from persuasion to learning. I’ve also recently become interested in how we function creatively. In my other life, I design and program a Virtual Research Environment (if I’m allowed to advertise, it’s called WIKINDX). It’s basically an online collection of bibliographic resources, and my next task is to start sonifying that collection (i.e. the concepts contained within it) in an attempt to use sound to aid in the creation of new ideas and concepts. Images and graphics have long been used to visualize, and thus simplify, complex concepts or otherwise ungraspable static data sets, yet sound is rarely used because of its temporal aspect (conversely, because of this temporal aspect, sonification does have great advantages over visualization when it comes to representing data unfolding over time and any change in that data). How, though, can sound be used to represent an academic or philosophical concept? And, if you can achieve this, how can such sonification be used effectively to facilitate creative processes?

Mark Grimshaw qualified with a BMus (Hons) from the University of Natal, South Africa, an MSc in Music Technology from York University, UK, and a PhD on the acoustic ecology of the First-Person Shooter from the University of Waikato, New Zealand. After working as a sound engineer in Italy, he moved into academia, where he is currently Reader in Creative Technologies at the University of Bolton, UK. He writes extensively on sound in computer games and also writes free, open-source software for virtual research environments (WIKINDX). Mark is the author of The Acoustic Ecology of the First-Person Shooter (2008), the first book-length study of game sound, and he recently edited the anthology of computer game audio, Game Sound Technology and Player Interaction (2011). He is currently editing the Oxford Handbook of Virtuality for Oxford University Press.