
Article
Affiliation(s)

Tairiku Ogihara, B.E. degree, Department of Industrial and Management Science, Faculty of Science and Engineering, Waseda University.
Kenta Mikawa, M.E. degree, Department of Industrial and Management Science, Faculty of Science and Engineering, Waseda University.
Masayuki Goto, Dr.E. degree, Department of Industrial and Management Science, Faculty of Science and Engineering, Waseda University.
Gou Hosoya, Dr.E. degree, Department of Management Science, Faculty of Engineering, Tokyo University of Science.

ABSTRACT

With the development of computer networks, large volumes of documents are handled in many fields. The number of digital documents stored in databases is so large that analysts cannot read and classify them all by hand, so automatic document classification by computer has become necessary. To meet this need, many classifiers with good performance have been proposed, such as the Support Vector Machine (SVM) and the Relevance Vector Machine (RVM), which are known as effective binary classifiers. For multi-valued document classification problems, it is known that a multi-valued classifier built by combining several binary classifiers performs well. This study focuses on constructing an efficient combination of binary classifiers by improving the Generalized Bradley-Terry (GBT) model, which offers high extensibility. The GBT model is an extension of the Bradley-Terry (BT) model: whereas the BT model restricts the combinations of classes that can be compared, the GBT model admits any binary classifier that separates two arbitrary subsets of the class set. In general, when several binary classifiers are trained on the same dataset, their accuracies differ, because some categories are harder to separate than others. However, the conventional method of multi-valued classification with GBT binary classifiers does not take the accuracy of each classifier into account. To address this problem, a new multi-valued classification method that considers each classifier's accuracy is proposed. The purpose of this study is to construct a good multi-valued classifier by estimating the accuracy of each binary classifier and using it as a weight. To verify the effectiveness of the proposed method, simulation experiments using newspaper articles are conducted.
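The estimation step behind this combination can be sketched concretely. In the GBT model of Huang, Weng, and Lin (2006), each binary classifier k separates a subset of classes I_k+ from a disjoint subset I_k-, and its output r_k is treated as an estimate of the probability that the true class lies in I_k+ given that it lies in I_k+ or I_k-; the class probabilities are then obtained by a fixed-point iteration. The sketch below (not the authors' implementation) illustrates that iteration in Python and adds an optional per-classifier weight as a stand-in for the accuracy-based weighting described above; the exact weighting used in the paper is not given in this abstract, so the simple multiplicative form here is an assumption for illustration only.

import numpy as np

def estimate_gbt_probabilities(r, pos_sets, neg_sets, n_classes,
                               weights=None, n_iter=200, tol=1e-8):
    """Estimate class probabilities pi from K binary classifiers via the
    generalized Bradley-Terry fixed-point iteration (Huang, Weng, & Lin, 2006).

    r[k]        -- output of classifier k: estimated probability that the true
                   class lies in pos_sets[k] rather than neg_sets[k]
    pos_sets[k] -- class indices on the positive side of classifier k
    neg_sets[k] -- class indices on the negative side of classifier k
    weights[k]  -- hypothetical weight for classifier k (e.g., its validation
                   accuracy); uniform weights recover the standard GBT estimate
    """
    K = len(r)
    w = np.ones(K) if weights is None else np.asarray(weights, dtype=float)
    pi = np.full(n_classes, 1.0 / n_classes)

    for _ in range(n_iter):
        num = np.zeros(n_classes)
        den = np.zeros(n_classes)
        for k in range(K):
            pos = np.asarray(pos_sets[k])
            neg = np.asarray(neg_sets[k])
            q_pos, q_neg = pi[pos].sum(), pi[neg].sum()
            q = q_pos + q_neg
            # positive-side classes are credited in proportion to r[k] ...
            num[pos] += w[k] * r[k] / q_pos
            # ... and negative-side classes in proportion to 1 - r[k]
            num[neg] += w[k] * (1.0 - r[k]) / q_neg
            # shared normalizing term over all classes classifier k involves
            den[pos] += w[k] / q
            den[neg] += w[k] / q
        new_pi = pi * num / den
        new_pi /= new_pi.sum()
        if np.abs(new_pi - pi).max() < tol:
            return new_pi
        pi = new_pi
    return pi

if __name__ == "__main__":
    # Toy example: three classes with one-vs-rest classifiers whose outputs
    # favour class 0; the third, less reliable classifier is down-weighted.
    r = [0.8, 0.3, 0.4]
    pos_sets = [[0], [1], [2]]
    neg_sets = [[1, 2], [0, 2], [0, 1]]
    print(estimate_gbt_probabilities(r, pos_sets, neg_sets, n_classes=3,
                                     weights=[1.0, 1.0, 0.5]))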

KEYWORDS

Generalized Bradley-Terry (GBT) model, multi-valued classification, Relevance Vector Machine (RVM), document classification, classifier accuracy, combination of binary classifiers


References
Allwein, E. L., Schapire, R. E., & Singer, Y. (2000). Reducing multiclass to binary: A unifying approach for margin classifiers. Journal of Machine Learning Research, 1, 113-141.
Bishop, C. M. (2006). Pattern recognition and machine learning (pp. 56-67). New York: Springer-Verlag.
Bradley, R. A., & Terry, M. E. (1952). Rank analysis of incomplete block designs: The method of paired comparisons. Biometrika, 39, 324-345.
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20, 273-297.
Crammer, K., & Singer, Y. (2002). On the learnability and design of output codes for multiclass problems. Machine Learning, 47, 201-233.
Dietterich, T. G., & Bakiri, G. (1995). Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, 2, 263-286.
Hastie, T., & Tibshirani, R. (1998). Classification by pairwise coupling. The Annals of Statistics, 26, 451-471.
Huang, T. K., Weng, R. C., & Lin, C. J. (2006). Generalized Bradley-Terry models and multi-class probability estimates. Journal of Machine Learning Research, 7, 85-115.
Ikeda, S. (2010). Combining binary machines for multiclass: Statistical model and parameter estimation. Proceedings of the Institute of Statistical Mathematics, 58, 157-166.
Lee, Y., Lin, Y., & Wahba, G. (2001). Multicategory support vector machines. Technical Report 1040, Department of Statistics, University of Wisconsin-Madison.
Quinlan, J. R. (1993). C4.5: Programs for machine learning. San Mateo, CA: Morgan Kaufmann.
Rumelhart, D. E., & McClelland, J. L. (1986). Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, MA: MIT Press.
Silva, C., & Ribeiro, B. (2006). Scaling text classification with relevance vector machines. Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics (SMC 2006), 4186-4191.
Tipping, M. E. (2001). Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research, 1, 211-244.
