This project came out of an interest in developing bird monitoring devices that could be set out in the field much like trail cameras, but recording sound instead of images. Devices like this already exist physically, but at the time we started there was nothing commercially available that could accurately analyze the recorded data algorithmically. Ultimately we were interested in classifying such recordings in two ways: what (in particular, which birds) is making sounds, and where the sounds are being made. This part of the project is an attempt at the latter classification.
The sound data I am working with was created by a friend. He set up two microphones one meter apart, then moved along a grid, playing a recorded robin call at each grid point. I do not yet have the true location of each call, but knowing the general setup, I have been trying to see whether I can find techniques that produce plausible locations.
So far I have spent most of my time on this project trying to develop a sense of how the signals change as the location of the calls varies. My first attempts were based on the idea that the sound reaches the two microphones at different times: if I compute the correlation between the signals while varying the offset of one channel, the correlation should peak at the offset corresponding to the difference in distances from the sound source to each microphone.
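The offset-and-correlate idea can be sketched in a few lines. This is a hypothetical, pure-Python illustration, not the code used for the charts below; the function and variable names (`best_offset`, `correlation_at_lag`) and the synthetic test signal are my own, standing in for the real recordings:

```python
import random

def correlation_at_lag(left, right, lag):
    """Correlation of the two channels with the right channel assumed
    delayed by `lag` samples: sum of left[i] * right[i + lag]."""
    n = len(left)
    total = 0.0
    for i in range(n):
        j = i + lag
        if 0 <= j < n:
            total += left[i] * right[j]
    return total

def best_offset(left, right, max_lag):
    """Search lags in [-max_lag, max_lag] and return the one where
    the two channels line up best (highest correlation)."""
    return max(range(-max_lag, max_lag + 1),
               key=lambda lag: correlation_at_lag(left, right, lag))

# Synthetic check: a noise signal whose right channel is delayed 7 samples.
random.seed(1)
sig = [random.gauss(0.0, 1.0) for _ in range(400)]
left = sig
right = [0.0] * 7 + sig[:-7]          # right channel lags by 7 samples
print(best_offset(left, right, max_lag=20))  # → 7
```

In practice one would use `numpy.correlate` or `scipy.signal.correlate` rather than this O(n·lags) loop, but the logic is the same: the winning lag is an estimate of the arrival-time difference in samples.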
Below are charts showing the results of this process on sounds from several different grid points. I varied the offsets through a range that should encompass the maximum possible offset given that the microphones were 1 meter apart. I am not yet sure how this data should be interpreted. I believe repeating the process with a new set of recordings should help show which patterns may be useful. For instance: Is the shape of how the correlation changes with offset useful? Why does the correlation vary with that frequency? Does the peak correlation correspond accurately to the actual distance offset of the sound with respect to the microphones?
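The geometry bounds that search range: the arrival-time difference can never exceed the microphone spacing divided by the speed of sound, whatever the source position. A small sketch of the conversion to samples, assuming a 44.1 kHz sample rate (the actual sample rate of the recordings is not stated here):

```python
import math

def max_lag_samples(mic_spacing_m, sample_rate_hz, speed_of_sound_mps=343.0):
    """Largest possible time-difference-of-arrival, in samples, for a
    source anywhere in the field: spacing / speed of sound, rounded up."""
    return math.ceil(mic_spacing_m / speed_of_sound_mps * sample_rate_hz)

# Mics 1 m apart, 44.1 kHz audio: the peak must lie within ±129 samples.
print(max_lag_samples(1.0, 44100))  # → 129
```

Any correlation peak outside this window cannot correspond to a real arrival-time difference, which gives one sanity check on the charts.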
While trying to determine whether this approach would be useful, I was curious to see how shifting the offsets would affect how I experienced the sounds. While listening to the unprocessed recording, I was able to hear the location of the sound change as it moved through the different grid points. What would happen if I offset the left and right channels programmatically? It turns out that after shifting the offsets and replaying, I was unable to detect much of a difference. However, when I repeated the experiment but adjusted the volume of the channels without adjusting the offset, I did perceive a difference in where the sound seemed to originate. This suggests that comparing the relative volume of the channels may be a good alternative strategy for locating the sound position.
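The two manipulations in that listening experiment are simple to express in code. This is a hypothetical sketch with channels as plain lists of samples; the helper names (`shift_channel`, `pan_by_level`) are mine, not from the original experiment:

```python
def shift_channel(samples, offset):
    """Delay one channel by `offset` samples, zero-padding the start
    (the offset manipulation, which I could not hear)."""
    return [0.0] * offset + samples[:len(samples) - offset]

def pan_by_level(left, right, left_gain, right_gain):
    """Scale each channel independently; the louder channel pulls the
    perceived source toward that side (the volume manipulation,
    which I could hear)."""
    return ([s * left_gain for s in left],
            [s * right_gain for s in right])

# Example: delay the right channel 2 samples, then pan the pair leftward.
right = shift_channel([1.0, 2.0, 3.0, 4.0], 2)
print(right)  # → [0.0, 0.0, 1.0, 2.0]
left_out, right_out = pan_by_level([1.0, 1.0], right, 1.0, 0.4)
```

The converse of the volume manipulation would be the localization strategy itself: estimating position from the measured level ratio of the two channels rather than from an arrival-time difference.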