Parallel implementation of a gradient boosted trees algorithm

Authors

  • P.N. Druzhkov Lobachevsky State University of Nizhni Novgorod

Keywords:

decision tree, gradient boosting, parallel computation, MPI, distributed memory PDF (in Russian) (211KB) PDF. zip (in Russian) (177KB)

Abstract

The software implementation of a parallel gradient boosted trees algorithm that requires a distributed data storage and is intended mostly to large machine learning tasks is described. Computational experimental results are given to show an advantage in the performance and scalability of the proposed implementation over some other open-source implementations while using large datasets. Experimental quality evaluations are given also to show the competitiveness of the proposed implementation. The paper was recommended for publication by the Program Committee of the HPC-2012 Forum (http://agora.guru.ru/hpc2012).

Author Biography

P.N. Druzhkov

References

  1. Panda B., Herbach J.S., Basu S., Bayardo R.J. PLANET: massively parallel learning of tree ensembles with MapReduce // Proc. of the 35th International Conference on Very Large Data Base (VLDB). New York: ACM Press, 2009. 1426-1437.
  2. Dollar P., Wojek C., Schiele B., Perona P. Pedestrian detection: a benchmark // Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPRТ09). New York: IEEE Press, 2009. 304-311.
  3. Friedman J. Greedy function approximation: the gradient boosting machine // Annals of Statistics. 2001. 29, N 5. 1189-1232.
  4. Hastie T., Tibshirani R., Friedman J. The elements of statistical learning: data mining, inference, and prediction. New York: Springer, 2009.
  5. Breiman L., Friedman J.H., Olshen R.A., Stone C.J. Classification and regression trees. Belmont: Wadsworth, 1984.
  6. Tyree S., Weinberger K.Q., Agrawal K., Paykin J. Parallel boosted regression trees for web search ranking // Proc. of the 20th Int. Conf. on World Wide Web (WWW). New York: ACM Press, 2011. 387-396.
  7. MPIGBT. Параллельная реализация алгоритма обучения градиентного бустинга деревьев решений на распределенную память (http://ml.vmk.unn.ru/index.php/ru/resources-ru).
  8. pGBRT: Parallel Gradient Boosted Regression Trees (http://machinelearning.wustl.edu/pmwiki.php/Main/Pgbrt).
  9. Druzhkov P.N., Eruhimov V.L., Kozinov E.A., Kustikova V.D., Meyerov I.B., Polovinkin A.N., Zolotykh N.Yu. On some new object detection features in OpenCV Library // Pattern Recognition and Image Analysis. 2011. 21, N 2. 377-379.
  10. Дружков П.Н., Золотых Н.Ю., Половинкин А.Н. Программная реализация алгоритма градиентного бустинга деревьев решений // Вестн. Нижегородского гос. ун-та им. Н.И. Лобачевского. 2011. № 1. 193-200.
  11. Дружков П.Н., Золотых Н.Ю., Половинкин А.Н. Реализация параллельного алгоритма предсказания в методе градиентного бустинга деревьев решений // Вестн. Южно-Уральского гос. ун-та. Серия: Математическое моделирование и программирование. 2011. № 37. 82-89.
  12. Chapelle O., Chang Y. Yahoo! learning to rank challenge overview // J. of Machine Learning Research. 2011. N 14. 1-24.
  13. Microsoft Learning to Rank Datasets (http://research.microsoft.com/en-us/projects/mslr).
  14. Воеводин Вл.В., Жуматий С.А., Соболев С.И., Антонов А.С., Брызгалов П.А., Никитенко Д.А., Стефанов К.С., Воеводин Вад.В. Практика суперкомпьютера «Ломоносов» // Открытые системы. 2012. № 7. 36-39.
  15. Busa-Fekete R., Szarvas Gy., Elteto T., Kegl B. An apple-to-apple comparison of learning-to-rank algorithms in terms of normalized discounted cumulative gain // Proc. of ECAI-12 Workshop on Preference Learning: Problems and Applications in AI. 2012

Published

20-11-2013

How to Cite

Дружков П. Parallel Implementation of a Gradient Boosted Trees Algorithm // Numerical Methods and Programming (Vychislitel’nye Metody i Programmirovanie). 2013. 14. 109-114

Issue

Section

Section 2. Programming