
Affiliation(s)

Munehiro Nakamura, Ph.D., Department of Information Science, Kanazawa Institute of Technology.
Atsushi Otsuka, B.D., Department of Natural Science and Technology, Kanazawa University.
Haruhiko Kimura, Ph.D., Department of Natural Science and Technology, Kanazawa University.

ABSTRACT

With the arrival of the big-data era, methods for classifying real-world problems have attracted much attention from researchers and developers in various fields. In recent years, much effort has been devoted to improving the performance of classification algorithms by adding functions or remedying their weaknesses. However, since a large variety of classification algorithms is available, it is difficult for non-experts to find one that achieves good results on a given data set. A system that automatically selects the best classification algorithm for a given data set would therefore offer non-experts various benefits, such as saving time and effort. This paper presents a system that predicts the best possible classification algorithm for a given data set with respect to accuracy. To the best of our knowledge, this is the first approach focused on predicting the single best algorithm. The main target users of the proposed system are non-experts who have no knowledge of or experience in data mining. The proposed system utilizes useful meta-features, selected from existing meta-features, to increase prediction performance. The feature selection is conducted by a wrapper approach with a genetic search algorithm. In the proposed system, the K-nearest neighbor algorithm learns the selected meta-features and builds a classification model for predicting future data. Experiments using 58 real-world data sets show that, with 60.34% accuracy, the proposed system predicted a classification algorithm ranked within the top five of 30 classification algorithms.
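The approach described above — wrapper feature selection over data set meta-features via a genetic search, evaluated with a K-nearest-neighbor meta-learner — can be sketched roughly as follows. This is an illustrative toy only: the meta-feature values, the algorithm labels, and the GA parameters are invented for the example and are not taken from the paper, which used 58 real data sets and 30 candidate classifiers.

```python
import random
from math import dist

# Toy "meta-data set": each row is a vector of meta-features describing one
# data set; the label is the (hypothetical) best classifier for that data set.
META_X = [
    [0.9, 0.1, 5.0, 0.2],
    [0.8, 0.2, 4.5, 0.1],
    [0.1, 0.9, 1.0, 0.8],
    [0.2, 0.8, 1.5, 0.9],
    [0.85, 0.15, 4.8, 0.25],
    [0.15, 0.85, 1.2, 0.85],
]
META_Y = ["RandomForest", "RandomForest", "NaiveBayes",
          "NaiveBayes", "RandomForest", "NaiveBayes"]

def knn_predict(train_X, train_y, x, k=1):
    """Plain k-nearest-neighbor majority vote using Euclidean distance."""
    ranked = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))
    votes = [train_y[i] for i in ranked[:k]]
    return max(set(votes), key=votes.count)

def loo_accuracy(mask):
    """Wrapper fitness: leave-one-out 1-NN accuracy on the masked features."""
    if not any(mask):
        return 0.0
    proj = [[v for v, keep in zip(row, mask) if keep] for row in META_X]
    hits = 0
    for i in range(len(proj)):
        train_X = proj[:i] + proj[i + 1:]
        train_y = META_Y[:i] + META_Y[i + 1:]
        hits += knn_predict(train_X, train_y, proj[i]) == META_Y[i]
    return hits / len(proj)

def genetic_search(n_features, pop=8, gens=10, seed=0):
    """Tiny genetic search over feature bitmasks, scored by the wrapper."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=loo_accuracy, reverse=True)
        parents = population[:pop // 2]          # elitist selection
        children = []
        while len(children) < pop - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)   # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:               # bit-flip mutation
                j = rng.randrange(n_features)
                child[j] ^= 1
            children.append(child)
        population = parents + children
    return max(population, key=loo_accuracy)

best_mask = genetic_search(len(META_X[0]))
print("selected meta-features:", best_mask)
print("leave-one-out accuracy:", loo_accuracy(best_mask))
```

Once a good mask is found, a new data set's meta-features would be projected onto the selected features and classified with `knn_predict` to name a recommended algorithm; in the paper the equivalent role is played by the KNN model trained on the selected meta-features of the 58 data sets.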

KEYWORDS

feature selection, wrapper method, meta-feature, classifier, k-nearest neighbor


References


Kalousis, A., & Hilario, M. (2001). Feature selection for meta-learning. Lecture Notes in Computer Science, 2035, 222-233.

Bache, K., & Lichman, M. (2013). UCI machine learning repository. Retrieved from http://archive.ics.uci.edu/ml/

Goldberg, D. E. (1989). Genetic algorithms in search, optimization and machine learning. Boston, MA: Addison-Wesley Longman Publishing Co., Inc.

Wolpert, D. H. (1996). The lack of a priori distinctions between learning algorithms. Neural Computation, 8(7), 1341-1390.

Shafait, F., Reif, M., Kofler, C., & Breuel, T. (2010). Pattern recognition engineering. Proceedings from RapidMiner Community Meeting and Conference (RCOMM-10). Dortmund, Germany.

Holmes, G., Pfahringer, B., Kirkby, R., Frank, E., & Hall, M. (2002). Multiclass alternating decision trees. Proceedings from ECML02: The 13th European Conference on Machine Learning (pp. 161-172). London, UK.

Bensusan, H., & Kalousis, A. (2001). Estimating the predictive accuracy of a classifier. Lecture Notes in Computer Science, 2167, 25-36.

Gama, J. (2004). Functional trees. Machine Learning, 55(3), 219-250.

Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.

Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., & Witten, I. H. (2009). The WEKA data mining software: An update. SIGKDD Explorations, 11(1), 10-18.

Reif, M., Shafait, F., Goldstein, M., Breuel, T., & Dengel, A. (2014). Automatic classifier selection for non-experts. Pattern Analysis and Applications, 17(1), 83-96.

Landwehr, N., Hall, M., & Frank, E. (2005). Logistic model trees. Machine Learning, 59(1-2), 161-205.

Brazdil, P. B., Soares, C., & da Costa, J. P. (2003). Ranking learning algorithms: Using IBL and meta-learning on accuracy and time results. Machine Learning, 50(3), 251-277.

Abdelmessih, S. D., Shafait, F., Reif, M., & Goldstein, M. (2010). Landmarking for meta-learning using RapidMiner. Proceedings from RapidMiner Community Meeting and Conference (RCOMM-10). Dortmund, Germany.

Ali, S., & Smith, K. A. (2006). On learning algorithm selection for classification. Applied Soft Computing, 6(2), 119-138.

Peng, Y., Flach, P. A., Soares, C., & Brazdil, P. (2002). Improved dataset characterisation for meta-learning. Lecture Notes in Computer Science, 2534, 193-208.
