Abstract
In soundscape ecology, the use of acoustic features is well established and provides important baselines for ecological analyses. However, the problem is often difficult due to high class overlap in time-frequency characteristics, as well as the presence of noise. Deep neural networks have become the state of the art for feature learning in many multi-class applications, but they often over-fit or achieve unbalanced performance across classes, which can hamper their deployment in realistic scenarios. In the context of counting the number of classes in observations, the quantification task is attracting attention and has been shown to be effective in other applications. This paper investigates combining a quantification loss with a classification loss to train a convolutional neural network to classify species of birds and anurans. Results indicate that quantification has advantages over both acoustic features alone and regular classification networks, in particular in terms of generalization and class recall, making it a suitable choice for segregation tasks in soundscape ecology. Moreover, we show that a more compact network can outperform a deeper one in fine-grained scenarios involving bird and anuran species.
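The combined objective described in the abstract could be sketched as follows. This is a minimal illustration, not the paper's exact formulation: it assumes a standard cross-entropy classification term, a simple quantification term that matches the batch-level predicted class prevalence to the empirical class proportions, and a hypothetical mixing weight `alpha`.

```python
import numpy as np

def combined_loss(probs, labels, alpha=0.5):
    """Sketch of a classification + quantification objective.

    probs  : (n, k) array of predicted class probabilities per sample.
    labels : (n,) array of integer class labels.
    alpha  : hypothetical weight trading off the two terms.
    """
    n, k = probs.shape
    # Classification term: mean negative log-likelihood of the true class.
    ce = -np.mean(np.log(probs[np.arange(n), labels] + 1e-12))
    # Quantification term: L1 gap between the estimated class prevalence
    # (mean predicted probability per class) and the true batch prevalence.
    pred_prev = probs.mean(axis=0)
    true_prev = np.bincount(labels, minlength=k) / n
    quant = np.abs(pred_prev - true_prev).sum()
    return alpha * ce + (1 - alpha) * quant
```

With perfect one-hot predictions both terms vanish, while confident wrong predictions are penalized mainly through the cross-entropy term; the prevalence term additionally pushes the network toward balanced per-class behavior at the batch level.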
| Original language | English |
|---|---|
| Pages (from-to) | 1923-1937 |
| Number of pages | 15 |
| Journal | Neural Computing and Applications |
| Volume | 34 |
| Issue number | 3 |
| DOIs | |
| Publication status | Published - Feb 2022 |
Keywords
- Bioacoustics
- Ecoacoustics
- Mel-spectrogram
- Sound detection