On the Role of Audio Frontends in Bird Species Recognition

Publication type: 
Houtan Ghaffari & Paul Devos

Ghaffari, H. and Devos, P. (2024), On the role of audio frontends in bird species recognition, Ecological Informatics, https://doi.org/10.1016/j.ecoinf.2024.102573.


Automatic acoustic monitoring of bird populations and their diversity is in demand for conservation planning. This requirement, together with recent advances in deep learning, has inspired sophisticated species recognizers. However, open challenges remain in building reliable monitoring systems for natural habitats. One open question is whether widely used audio features such as mel-filterbanks are appropriate for this analysis: their design follows human perception of sound, so they may discard fine details of other animals' vocalizations. Although research shows that different audio features work better for particular tasks and datasets, it is hard to attribute all of the advantages to the input features because the experimental setups vary. A general solution is to design a learnable audio frontend that extracts task-relevant features from the raw waveform, which contains all the information present in other audio features. This paper thoroughly analyzes the role of such frontends in bird species recognition, which in turn helps evaluate how well traditional time-frequency representations (static frontends) capture the relevant information in bird vocalizations. In particular, this work shows that the main performance gain of learnable audio frontends comes from the normalization and compression operations rather than from the data-driven frequency selectivity and functional form of the filters. We observed no significant discrepancy between the frequency bands of the learned and static frontends for bird vocalizations. Although the learnable frontends performed much better, we show that adequate normalization and compression improve the accuracy of traditional frontends by more than 16%, bringing them to comparable results for bird species recognition.
Ablation studies of the frontends under different configurations and detailed analysis of noise robustness provide evidence for the conclusions, validate the use of mel-filterbanks and similar features in prior works, and provide guidelines for designing future species recognizers. The code is available at https://github.com/houtan-ghaffari/bird-frontends.
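
To make the role of normalization and compression more concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not name the specific operations studied, so this example assumes two common choices: plain log compression and per-channel energy normalization (PCEN). The function names, parameter values, and the random stand-in for filterbank energies are all illustrative assumptions.

```python
import numpy as np


def log_compress(energy, eps=1e-6):
    """Static log compression of non-negative filterbank energies."""
    return np.log(np.asarray(energy, dtype=np.float64) + eps)


def pcen(energy, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """PCEN-style normalization and compression (assumed example).

    energy: array of shape (channels, frames) with non-negative values.
    Each channel is divided by a smoothed version of itself (adaptive
    gain control) and then passed through root compression.
    """
    energy = np.asarray(energy, dtype=np.float64)
    smooth = np.empty_like(energy)
    smooth[:, 0] = energy[:, 0]
    # First-order IIR smoother along time, applied per channel.
    for t in range(1, energy.shape[1]):
        smooth[:, t] = (1.0 - s) * smooth[:, t - 1] + s * energy[:, t]
    normalized = energy / (eps + smooth) ** alpha
    return (normalized + delta) ** r - delta ** r


# Hypothetical stand-in for mel-filterbank energies: 40 channels, 100 frames.
rng = np.random.default_rng(0)
energy = rng.random((40, 100))
loud = 1000.0 * energy  # same "recording" at a much higher gain

log_shift = np.abs(log_compress(loud) - log_compress(energy))
pcen_shift = np.abs(pcen(loud) - pcen(energy))
print("mean shift under log compression:", log_shift.mean())
print("max shift under PCEN:", pcen_shift.max())
```

In this toy setting the PCEN output changes far less than the log-compressed output when the input gain changes, which illustrates why the choice of normalization and compression can matter independently of the filterbank itself.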

Year of publication: 2024
Journal published in: Ecological Informatics