Performance evaluation of breadth-first search on Intel Xeon Phi

Authors

  • E.A. Golovina Research Center for Electronic Computing
  • A.S. Semenov Research Center for Electronic Computing
  • A.S. Frolov Research Center for Electronic Computing

Keywords:

Breadth-First Search, graph algorithms, Intel Xeon Phi

Abstract

Breadth-First Search (BFS) is one of the most important kernels in graph computing. It is the main kernel of the Graph500 rating that evaluates performance of large supercomputers and multiprocessor nodes in terms of traversed edges per second (TEPS). In this paper we present the results of BFS performance evaluation on a recently released high-performance Intel Xeon Phi coprocessor. We examine the previously proposed Queue-based and Read-based approaches to BFS implementation. We also apply several optimization techniques, such as manual loop unrolling and prefetching, that significantly improve performance on Intel Xeon Phi. On a representative graph set, Intel Xeon Phi 7120P demonstrates 78% maximum and 37% average speedup as compared to the Intel Xeon E5-2660 processor. We achieve 4366 MTEPS on Intel Xeon Phi 7120P for a graph with scale 25, and have 89th place on the November 2013 Graph500 list. This is the fourth place among research teams in the class of single node x86-based systems. The authors would like to thank the Svet Computers company for the provided IntellectDigital SciPhi 470 desktop supercomputer with Intel Xeon Phi 7120P coprocessor.

Author Biographies

E.A. Golovina

A.S. Semenov

A.S. Frolov

References

  1. Graph500 benchmark (http://www.graph500.org/).
  2. Головина Е.А., Семенов А.С., Фролов А.С. Исследование производительности задачи поиска вширь в графе на сопроцессоре Intel Xeon Phi // Тр. Межд. суперкомпьютерной конференции «Научный сервис в сети Интернет: все грани параллелизма», 23-28 сентября 2013 г., г. Новороссийск. М.: Изд-во Моск. гос. ун-та, 2013. 135-141.
  3. Agarwal V., Petrini F., Pasetto D., Bader D. Scalable graph exploration on multicore processors // Proc. 2010 ACM/IEEE International Conference on High Performance Computing, Networking, Storage, and Analysis (SC’10). New York: IEEE Press, 2010. 1-11.
  4. Saule E., Catalyeurek U. An early evaluation of the scalability of graph algorithms on the Intel Phi architecture // IEEE 26th International Parallel and Distributed Processing Symposium Workshops &; PhD Forum (IPDPSW’12). New York: IEEE Press, 2012. 1629-1639.
  5. Hong S., Oguntebi T., Olukotun K. Efficient parallel graph exploration on multi-core CPU and GPU // Proc. Int. Conf. on Parallel Architectures and Compilation Techniques (PACT’11). New York: IEEE Press, 2011. 78-88.
  6. Beamer S., Asanoviacutec K., Patterson D. Direction-optimizing breadth-first search // Proc. Int. Conf. on High Performance Computing, Networking, Storage and Analysis (SC’12). New York: IEEE Press, 2012. 1-10.
  7. Saule E., Kaya K., Catalyurek U. Performance evaluation of sparse matrix multiplication kernels on Intel Xeon Phi // arXiv:1302.1078, 5 Feb 2013 (http://arxiv.org/pdf/1302.1078.pdf).
  8. Frolov A., Gilmendinov M. DISBench: benchmark for memory performance evaluation of multicore multiprocessor // Lecture Notes in Computer Science. Vol. 7979. Heidelberg: Springer, 2013. 197-207.
  9. Cuthill E., McKee J. Reducing the bandwidth of sparse symmetric matrices // Proc. 24th National Conference (ACM’69). New York: ACM Press, 1969. 157-172.
  10. Chakrabarti D., Zhan Y., Faloutsos C. R-MAT: a recursive model for graph mining // SIAM International Conference on Data Mining. Vol. 6. Philadelphia: SIAM, 2004. 442-446.
  11. Bader D., Madduri K. Design and implementation of the HPCS graph analysis benchmark on symmetric multiprocessor // Lecture Notes in Computer Science. Vol. 3769. Heidelberg: Springer, 2005. 465-476.
  12. Bader D., Madduri K. SNAP: Small-world network analysis and partitioning: an open-source parallel graph framework for the exploration of large-scale networks // Proc. Int. Parallel and Distributed Processing Symposium (IPDPS). New York: IEEE Press, 2008. 1-12.
  13. Bader D., Madduri K. GTGraph: a synthetic graph generator suite. 2006 // (http://www.cse.psu.edu/ madduri/software/GTgraph).
  14. Компания «Cвет Компьютерс» (http://svetcorp.net).

Published

29-01-2014

How to Cite

Головина Е., Семенов А., Фролов А. Performance Evaluation of Breadth-First Search on Intel Xeon Phi // Numerical Methods and Programming (Vychislitel’nye Metody i Programmirovanie). 2014. 15. 49-58

Issue

Section

Section 1. Numerical methods and applications