TY - CHAP
T1 - Active learning with visualization for text data
AU - Huang, Lulu
AU - Matwin, Stan
AU - De Carvalho, Eder J.
AU - Minghim, Rosane
PY - 2017/3/13
Y1 - 2017/3/13
AB - Labeled datasets are always limited, and oftentimes the quantity of labeled data is a bottleneck for data analytics. This especially affects supervised machine learning methods, which require labels for models to learn from the labeled data. Active learning algorithms have been proposed to help achieve good analytic models with limited labeling efforts, by determining which additional instance labels will be most beneficial for learning for a given model. Active learning is consistent with interactive analytics as it proceeds in a cycle in which the unlabeled data is automatically explored. However, in active learning users have no control of the instances to be labeled, and for text data, the annotation interface is usually document only. Both of these constraints seem to affect the performance of an active learning model. We hypothesize that visualization techniques, particularly interactive ones, will help to address these constraints. In this paper, we implement a pilot study of visualization in active learning for text classification, with an interactive labeling interface. We compare the results of three experiments. Early results indicate that visualization improves high-performance machine learning model building with an active learning algorithm. Copyright is held by the owner/author(s). Publication rights licensed to ACM.
KW - Active learning
KW - Text classification
KW - Visualization
UR - https://www.scopus.com/pages/publications/85016940229
U2 - 10.1145/3038462.3038469
DO - 10.1145/3038462.3038469
M3 - Chapter
AN - SCOPUS:85016940229
T3 - ESIDA 2017 - Proceedings of the 2017 ACM Workshop on Exploratory Search and Interactive Data Analytics, co-located with IUI 2017
SP - 69
EP - 74
BT - ESIDA 2017 - Proceedings of the 2017 ACM Workshop on Exploratory Search and Interactive Data Analytics, co-located with IUI 2017
PB - Association for Computing Machinery, Inc
T2 - ACM Workshop on Exploratory Search and Interactive Data Analytics, ESIDA 2017
Y2 - 13 March 2017
ER -