Abstract
Research on habitat monitoring via passive acoustics has generated vast audio resources for soundscape ecology, calling for automated methods to aid data analysis. While Deep Neural Networks excel in classification tasks, applying them to audio collected in the wild presents several challenges compared to other audio sources. Nature recordings contain ambient noise, sparse target events, multiple vocalizations attributed to the same species, and fine-grained sound variance. Beyond these difficulties in sound characterization, annotated datasets large enough to train networks to detect and identify animal species accurately are lacking. To get the most out of these models, this work investigates different audio input representations, particularly spectrogram-based representations and acoustic indices, which are pre-processed features extracted from audio sources. We evaluate the impact of combining both input categories, which are often treated separately, in various architectures, employing quantification in the training process as well as transfer learning. Based on this evaluation, we propose guidelines for using neural networks to classify species by their sound patterns, even with a small dataset. We evaluated these guidelines on a dataset collected in Brazil under different environmental conditions and on a dataset for detecting and classifying acoustic scenes and events. The empirical results confirm that the pre-trained network learns better (accuracy up to 0.91); that using acoustic features can improve the results marginally (a difference of up to 13 percentage points), depending on the time-frequency input and main architecture; and that combining spectrogram representations with acoustic features yields the best results (accuracy up to 0.91).
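To make the idea of combining the two input categories concrete, the following is a minimal sketch and not the authors' pipeline. It assumes a naive DFT spectrogram, two simplified acoustic indices (a normalized spectral entropy and an ACI-style complexity index), and a time-averaged log spectrogram standing in for a learned network embedding. All function names and parameters (`spectrogram`, `spectral_entropy`, `acoustic_complexity`, `fuse`, `frame_len`, `hop`) are hypothetical.

```python
import cmath
import math


def spectrogram(signal, frame_len=64, hop=32):
    """Magnitude spectrogram (frames x bins) from a naive DFT with a Hann window."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(n_bins):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


def spectral_entropy(spec, eps=1e-12):
    """Shannon entropy of the time-averaged spectrum, normalized by log(n_bins)."""
    n_bins = len(spec[0])
    mean_spec = [sum(f[k] for f in spec) / len(spec) for k in range(n_bins)]
    total = sum(mean_spec) + eps
    probs = [m / total for m in mean_spec if m > 0]
    return -sum(p * math.log(p) for p in probs) / math.log(n_bins)


def acoustic_complexity(spec, eps=1e-12):
    """Simplified ACI-style index: per-bin frame-to-frame change over total intensity."""
    n_bins = len(spec[0])
    aci = 0.0
    for k in range(n_bins):
        change = sum(abs(spec[t][k] - spec[t + 1][k]) for t in range(len(spec) - 1))
        aci += change / (sum(f[k] for f in spec) + eps)
    return aci


def fuse(signal):
    """Concatenate a pooled log spectrogram with the acoustic indices (late fusion)."""
    spec = spectrogram(signal)
    n_bins = len(spec[0])
    # Assumption: time-averaged log magnitudes stand in for a network embedding.
    pooled = [sum(math.log1p(f[k]) for f in spec) / len(spec) for k in range(n_bins)]
    indices = [spectral_entropy(spec), acoustic_complexity(spec)]
    return pooled + indices


# Synthetic pure tone used only for illustration.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(512)]
features = fuse(tone)
```

In a real system the pooled vector would be replaced by the output of a (possibly pre-trained) convolutional network, with the indices concatenated before the classification layers; the concatenation step itself stays the same.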
| Original language | English |
|---|---|
| Article number | 103232 |
| Journal | Ecological Informatics |
| Volume | 90 |
| Publication status | Published - Dec 2025 |
Keywords
- Convolutional neural networks
- Feature fusion
- Sound event detection
Fingerprint
Dive into the research topics of 'Enhancing sound-based classification of birds and anurans with spectrogram representations and acoustic indices in neural network architectures'. Together they form a unique fingerprint.