TY - CHAP
T1 - Detection of Wake Word Jamming
AU - Sagi, Prathyusha
AU - Sankar, Arun
AU - Roedig, Utz
N1 - Publisher Copyright: © 2024 Owner/Author.
PY - 2024/11/22
Y1 - 2024/11/22
N2 - Personal Voice Assistants (PVAs) such as Apple's Siri, Amazon's Alexa and Google Home continuously monitor the acoustic environment for a wake word to start interaction with the user. However, wake word detection is susceptible to disruption by acoustic interference. Interference may be caused by noise (e.g. background music, chatter, engine sounds) or by a targeted jamming signal designed to disrupt PVA operations. As PVAs are increasingly used for critical applications such as medicine and the military, it is necessary to identify an attack. In this work, we re-design the wake word detection algorithm such that it is not only robust against an attack but is also able to identify an attack. Appropriate countermeasures, i.e. removing the attacker, can be employed only if an ongoing attack can be identified. We modify the wake word detection model to function as a three-class classifier that accurately differentiates between clean wake words, wake words mixed with jamming signals and non-wake words. We further improve the classification results by examining the Direction of Arrival (DOA) and the Short Time Energy (STE) of the audio signal. DOA and STE information is usually available on off-the-shelf PVAs, which enables implementation of the proposed methods on existing devices.
AB - Personal Voice Assistants (PVAs) such as Apple's Siri, Amazon's Alexa and Google Home continuously monitor the acoustic environment for a wake word to start interaction with the user. However, wake word detection is susceptible to disruption by acoustic interference. Interference may be caused by noise (e.g. background music, chatter, engine sounds) or by a targeted jamming signal designed to disrupt PVA operations. As PVAs are increasingly used for critical applications such as medicine and the military, it is necessary to identify an attack. In this work, we re-design the wake word detection algorithm such that it is not only robust against an attack but is also able to identify an attack. Appropriate countermeasures, i.e. removing the attacker, can be employed only if an ongoing attack can be identified. We modify the wake word detection model to function as a three-class classifier that accurately differentiates between clean wake words, wake words mixed with jamming signals and non-wake words. We further improve the classification results by examining the Direction of Arrival (DOA) and the Short Time Energy (STE) of the audio signal. DOA and STE information is usually available on off-the-shelf PVAs, which enables implementation of the proposed methods on existing devices.
KW - acoustic denial of service (DoS)
KW - acoustic jamming
KW - adversarial training
KW - automatic speech recognition (ASR)
KW - direction of arrival (DOA)
KW - personal voice assistant (PVA)
KW - short time energy (STE)
KW - wake word recognition
UR - https://www.scopus.com/pages/publications/85215520992
U2 - 10.1145/3690134.3694825
DO - 10.1145/3690134.3694825
M3 - Chapter
AN - SCOPUS:85215520992
T3 - CPSIoTSec 2024 - Proceedings of the 6th Workshop on CPS and IoT Security and Privacy, Co-Located with: CCS 2024
SP - 134
EP - 141
BT - CPSIoTSec 2024 - Proceedings of the 6th Workshop on CPS and IoT Security and Privacy, Co-Located with: CCS 2024
PB - Association for Computing Machinery, Inc
T2 - 6th Workshop on CPS and IoT Security and Privacy, CPSIoTSec 2024
Y2 - 14 October 2024 through 18 October 2024
ER -