TY - JOUR
T1 - Dynamic Recognition of Speakers for Consent Management by Contrastive Embedding Replay
AU - Shahmansoori, Arash
AU - Roedig, Utz
N1 - Publisher Copyright: © 2012 IEEE.
PY - 2024
Y1 - 2024
AB - Voice assistants overhear conversations, so a consent management mechanism is required. Consent management can be implemented using speaker recognition. Users who do not give consent enroll their voice, and all their further recordings are discarded. Building speaker recognition-based consent management is challenging, as dynamic registration, removal, and reregistration of speakers must be handled efficiently. This work proposes a consent management system addressing these challenges. Contrastive training is applied to learn the underlying speaker equivariance inductive bias. The contrastive features for buckets of speakers are trained a few steps into each iteration and act as replay buffers. These features are progressively selected using a multi-strided random sampler for classification. Moreover, new methods are proposed for dynamic registration using a portion of old utterances, and for removal and reregistration of speakers. The results verify the memory efficiency and dynamic capabilities of the proposed methods, which outperform existing approaches from the literature in terms of convergence rate and number of required parameters.
KW - Consent management
KW - contrastive embedding replay
KW - dynamic learning
KW - multi-strided sampling
KW - voice assistant systems
UR - https://www.scopus.com/pages/publications/85174853269
U2 - 10.1109/TNNLS.2023.3317493
DO - 10.1109/TNNLS.2023.3317493
M3 - Article
C2 - 37788192
AN - SCOPUS:85174853269
SN - 2162-237X
VL - 35
SP - 18538
EP - 18552
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
IS - 12
ER -