Workload balancing in GPU implementation of breadth-first search

Authors

  • M.A. Chernoskutov N.N. Krasovskii Institute of Mathematics and Mechanics of UB RAS (IMM UB RAS)
  • D.G. Ermakov N.N. Krasovskii Institute of Mathematics and Mechanics of UB RAS (IMM UB RAS)

Keywords:

breadth-first search, parallel algorithm, graphics processing units PDF (in Russian) (315KB) PDF. zip (in Russian) (275KB)

Abstract

Parallel processing of unstructured data пшмут in a graph-like form can be a severe computational challenge because of significant overheads caused by the irregular nature of graph algorithms and the hardware latency of intensive data access. The GPU implementation of the load balancing method that allows one to dramatically improve the parallel breadth-first search algorithm compared to its sequential analog on CPU is considered. This work was partially supported by the Russian Foundation for Basic Research (project 14–07–00435) and by UB RAS (projects 12–P–1–1029 and RCP–13–P18). Numerical experiments were performed using the «Uran» supercomputer installed at IMM UB RAS. This paper was recommended for publication by the Program Committee of the International Scientific Conference «Scientific Services and Internet: all bounds of parallelism» ((http://agora.guru.ru/abrau2013)).

Author Biographies

M.A. Chernoskutov

D.G. Ermakov

References

  1. Cormen T.H., Leiserson C.E., Rivest R.L., Stein C. Introduction to algorithms. Cambridge: MIT Press, 2001.
  2. Merril D., Garland M., Grimshaw A. High performance and scalable GPU graph traversal. Technical Report CS-2011-05. Charlottesville: University of Virginia, 2011.
  3. Luo L., Wong M., Hwu W. An effective GPU implementation of breadth-first search // Proc. of the 47th Design Automation Conference. New York: ACM Press, 2010. 52-55.
  4. Harish P., Narayanan P.J. Accelerating large graph algorithms on the GPU using CUDA // Proc. of the 14th International Conference on High Performance Computing. Berlin: Springer, 2007. 197-208.
  5. Hong S., Kim S.K., Oguntebi T., Olukotun K. Accelerating CUDA graph algorithms at maximum warp // Proc. of the 16th ACM Symposium on Principles and Practice of Parallel Programming. New York: ACM Press, 2011. 267-276.
  6. 10th DIMACS Implementation Challenge (http://www.cc.gatech.edu/dimacs10/index.shtml).
  7. Leskovec J., Chakrabarti D., Kleinberg J.M., Faloutsos C. Realistic, mathematically tractable graph generation and evolution using Kronecker multiplication // Proc. of the Conference on Principles and Practice of Knowledge Discovery in Databases. Porto, 2005. 133-145.
  8. Murphy R., Wheeler K., Barrett B., Ang J. Introducing the Graph 500. Cray User’s Group (CUG). Albuquerque: Sandia National Laboratory, 2010.

Published

17-09-2013

How to Cite

Черноскутов М., Ермаков Д. Workload Balancing in GPU Implementation of Breadth-First Search // Numerical Methods and Programming (Vychislitel’nye Metody i Programmirovanie). 2013. 14. 54-62

Issue

Section

Section 2. Programming