Trajectory Capture in Frontal Plane Geometry For Visually Impaired

Speaker: Martin Talbot

Users who are blind, or whose visual attention is otherwise occupied, can benefit from an auditory representation of their immediate environment. To create it, a video camera senses the environment, and the captured scene is converted into synthetic audio streams that represent the objects in it. What aspects of the audio signal best encode this information? This paper compares four encodings, focusing on the difficult task of perceiving the simultaneous motion of several objects.

The evaluation is experimental: subjects hear trajectories of objects moving in a virtual 2D plane, encoded as simultaneous audio streams with complex frequency spectra, and identify the represented motions. The first encoding uses panning for horizontal motion and pitch for vertical motion (the Pratt effect). The second uses best-fit head-related transfer functions (HRTFs) to localize stream positions. The third combines the first two, using pitch to redundantly code elevation in an HRTF presentation. Finally, the fourth enhances the third, using best-fit HRTFs to 'vertically pan' each audio stream to a constant but unique elevation, for superior audio segregation.
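As a rough illustration of the first encoding only, the sketch below maps a position in the 2D plane to stereo gains (horizontal position) and a fundamental frequency (vertical position). Everything specific in it is an assumption made for illustration and is not taken from the paper: the function names pan_pitch_encode and encode_trajectory, the constant-power pan law, the 220 Hz base frequency, and the two-octave range. The streams used in the experiment have complex spectra; this sketch computes only a single fundamental per position.

```python
import math


def pan_pitch_encode(x, y, f_low=220.0, octaves=2.0):
    """Map a point in the unit square to (left_gain, right_gain, frequency_hz).

    x: 0.0 = far left, 1.0 = far right.
    y: 0.0 = bottom, 1.0 = top.
    f_low and octaves are illustrative defaults, not values from the paper.
    """
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError("x and y must lie in [0, 1]")
    # Assumed constant-power pan law: the gains trace a quarter circle,
    # so left_gain**2 + right_gain**2 stays equal to 1 across the pan range.
    angle = x * math.pi / 2.0
    left_gain = math.cos(angle)
    right_gain = math.sin(angle)
    # Assumed exponential pitch mapping: equal steps in y give equal
    # musical intervals, with higher positions mapped to higher pitch.
    frequency_hz = f_low * 2.0 ** (octaves * y)
    return left_gain, right_gain, frequency_hz


def encode_trajectory(points):
    """Encode a sequence of (x, y) positions, one tuple per time step."""
    return [pan_pitch_encode(x, y) for x, y in points]
```

Calling encode_trajectory([(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]) would describe an object moving from bottom left to top right as a sound that drifts from the left channel to the right while rising in pitch.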

The fourth method outperforms the other three on two measures: the accuracy of subjects' perceptions and the number of replays needed to achieve them. With this method, subjects can perceive up to three different simultaneously presented motions after minimal practice. The results show, first, that the Pratt effect is more robust than HRTFs for representing vertical motion and, second, that vertical panning using an HRTF, when combined with the Pratt effect, improves motion perception.