A number of vertical mining algorithms have recently been proposed for association mining; they have been shown to be very effective and usually outperform horizontal approaches. The main advantage of the vertical format is its support for fast frequency counting via intersection operations on transaction ids (tids) and automatic pruning of irrelevant data. The main problem with these approaches arises when intermediate results of vertical tid lists become too large for memory, which affects the scalability of the algorithm.
In this paper we present a novel vertical data representation called Diffset, which only keeps track of differences in the tids of a candidate pattern from its generating frequent patterns. We show that diffsets drastically cut down the size of memory required to store intermediate results. We show how diffsets, when incorporated into previous vertical mining methods, increase the performance significantly.
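To make the contrast concrete, the following minimal Python sketch compares support counting with tid lists against support counting with tid differences. The toy database, the item names, and the helper names (`tidset_join`, `diffset_from_tidsets`) are hypothetical and chosen only for illustration; the sketch relies on the set identity support(XY) = support(X) - |t(X) \ t(Y)| and is not meant to reproduce any particular implementation.

```python
# Hypothetical toy database in vertical format: item -> set of transaction ids.
tidsets = {
    "A": {1, 3, 4, 5},
    "B": {1, 2, 3, 4, 5, 6},
    "C": {2, 3, 4, 5, 6},
}


def tidset_join(t_x, t_y):
    """Tid-list approach: intersect tidsets; support is the result's size."""
    t_xy = t_x & t_y
    return t_xy, len(t_xy)


def diffset_from_tidsets(t_x, t_y):
    """Difference approach: keep only the tids of X that are missing from Y."""
    return t_x - t_y


# Tid-list approach: the full tidset of AB and AC must be stored.
t_ab, sup_ab = tidset_join(tidsets["A"], tidsets["B"])
t_ac, sup_ac = tidset_join(tidsets["A"], tidsets["C"])

# Difference approach: store d(AB) = t(A) - t(B) and d(AC) = t(A) - t(C).
d_ab = diffset_from_tidsets(tidsets["A"], tidsets["B"])
d_ac = diffset_from_tidsets(tidsets["A"], tidsets["C"])
sup_ab_diff = len(tidsets["A"]) - len(d_ab)
sup_ac_diff = len(tidsets["A"]) - len(d_ac)

# One level deeper, the differences alone suffice:
# d(ABC) = d(AC) - d(AB), and support(ABC) = support(AB) - |d(ABC)|.
d_abc = d_ac - d_ab
sup_abc_diff = sup_ab_diff - len(d_abc)

# Reference value computed directly from the tidsets.
sup_abc = len(tidsets["A"] & tidsets["B"] & tidsets["C"])
```

In this toy example the stored differences are smaller than the corresponding tid lists because the items co-occur in most transactions; on data where items rarely co-occur, the differences can be the larger of the two.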