Authors
Dragi Kocev,
Saso Dzeroski,
Jan Struyf,
Dunja Mladenic
Publication date
2006
Description
We investigate how inductive databases (IDBs) can support global models, such as decision trees. We focus on predictive clustering trees (PCTs). PCTs generalize decision trees and can be used for prediction and clustering, two of the most common data mining tasks. Regular PCT induction builds PCTs top-down, using a greedy algorithm similar to that of C4.5. We propose a new induction algorithm for PCTs based on beam search. This has three advantages over the regular method: (1) it returns a set of PCTs satisfying the user constraints instead of just one PCT; (2) it makes it easier to push user constraints into the induction algorithm; and (3) it is less susceptible to myopia. In addition, we propose similarity constraints for PCTs, which improve the diversity of the resulting PCT set.
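The contrast between greedy and beam-search induction can be illustrated with a minimal, generic sketch. It is not the authors' algorithm: the names `beam_search`, `refine`, `score`, and `beam_width` are hypothetical, the candidates are toy item sets rather than PCTs, and similarity constraints are not modeled. In a PCT setting one would assume that `refine` expands a leaf with a new test and that `score` is the induction heuristic.

```python
from typing import Callable, Hashable, Iterable, List, TypeVar

T = TypeVar("T", bound=Hashable)


def beam_search(
    initial: T,
    refine: Callable[[T], Iterable[T]],
    score: Callable[[T], float],
    beam_width: int,
    max_steps: int,
) -> List[T]:
    """Generic beam search: keep the `beam_width` best candidates per step.

    With beam_width == 1 this degenerates into a greedy search, which
    commits to a single refinement at every step.
    """
    beam = [initial]
    for _ in range(max_steps):
        # Current candidates compete with all of their refinements.
        candidates = list(beam)
        for model in beam:
            candidates.extend(refine(model))
        # Remove duplicates while preserving first-seen order.
        unique = list(dict.fromkeys(candidates))
        new_beam = sorted(unique, key=score, reverse=True)[:beam_width]
        if new_beam == beam:  # no refinement improved the beam
            break
        beam = new_beam
    return beam


# Toy stand-in for models (an assumption for illustration only):
# a candidate is a set of at most two items, scored by total weight.
WEIGHTS = {"a": 3, "b": 2, "c": 1}


def toy_refine(model: frozenset) -> List[frozenset]:
    if len(model) >= 2:
        return []
    return [model | {item} for item in sorted(WEIGHTS) if item not in model]


def toy_score(model: frozenset) -> float:
    return float(sum(WEIGHTS[item] for item in model))


result = beam_search(frozenset(), toy_refine, toy_score, beam_width=2, max_steps=5)
```

The search returns a list of candidates rather than a single one, which mirrors advantage (1) in the abstract. User constraints could be handled in such a sketch by filtering `candidates` before the beam is truncated, though how the paper does this is not stated here.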