TY - GEN
T1 - Representative itemset mining
AU - Huang, Hong
AU - O'Sullivan, Barry
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2017/1/11
Y1 - 2017/1/11
AB - Frequent itemset mining is one of the most common data mining tasks. In its simplest form, one is given a table of data in which the columns represent attributes and each row specifies a value for each attribute, each attribute-value pair being referred to as an item. The task is to find sets of these items that occur frequently in the data, where frequency is specified as a minimum occurrence threshold. Such frequent sets of items are referred to as "frequent itemsets". Many efficient techniques have been developed for finding all frequent itemsets. However, a practical problem is that the result sets can be exponentially large in the number of items. In this paper we propose representative frequent itemset mining, in which the set of itemsets returned provides examples of the space of all possible frequent itemsets. Specifically, every item that appears in a frequent itemset at least once is shown in at least one representative itemset. If there are frequent itemsets without a particular item, one such example will be presented. One can generalise our framework to seek representative sets in which pairs, triples, etc. of frequent itemsets are presented. One can see the representative frequent itemset framework as a generalisation of traditional frequent itemset mining that provides an additional parameter for controlling the size of the result set. Specifically, one has access to the traditional frequency threshold, but also the maximum arity of the tuples of itemsets being exemplified. We propose a dedicated algorithm that significantly outperforms using a state-of-the-art itemset miner in generating representative itemsets.
UR - https://www.scopus.com/pages/publications/85013649867
U2 - 10.1109/ICTAI.2016.28
DO - 10.1109/ICTAI.2016.28
M3 - Conference proceeding
AN - SCOPUS:85013649867
T3 - Proceedings - 2016 IEEE 28th International Conference on Tools with Artificial Intelligence, ICTAI 2016
SP - 142
EP - 148
BT - Proceedings - 2016 IEEE 28th International Conference on Tools with Artificial Intelligence, ICTAI 2016
A2 - Esposito, Anna
A2 - Alamaniotis, Miltos
A2 - Mali, Amol
A2 - Bourbakis, Nikolaos
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 28th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2016
Y2 - 6 November 2016 through 8 November 2016
ER -