Ramanujan Bipartite Graph Products for Efficient Block Sparse Neural Networks

Dharma Teja Vooturi, Girish Varma, Kishore Kothapalli

Concurrency & Computation: Practice & Experience, 2021

April, 2021

arXiv pdf

Bibtex

@article{Vooturi21, author = {Vooturi, Dharma Teja and Varma, Girish and Kothapalli, Kishore}, title = {Ramanujan bipartite graph products for efficient block sparse neural networks}, journal = {Concurrency and Computation: Practice and Experience}, volume = {n/a}, number = {n/a}, pages = {e6363}, keywords = {block sparsity, graph product, Ramanujan bipartite graph, sparse neural networks}, doi = {https://doi.org/10.1002/cpe.6363}, url = {https://onlinelibrary.wiley.com/doi/abs/10.1002/cpe.6363}, eprint = {https://onlinelibrary.wiley.com/doi/pdf/10.1002/cpe.6363} }

Abstract

Sparse neural networks are shown to give accurate predictions competitive to denser versions, while also minimizing the number of arithmetic operations performed. However current GPU hardware can only exploit structured sparsity patterns for better efficiency. We propose a framework for generating structured multi-level block sparse neural networks by using the theory of graph products. Our Ramanujan Bipartite Graph Product (RBGP) framework uses products of Ramanujan graphs to obtain the best connectivity for a given level of sparsity. This essentially ensures that the i.) the networks has the structured block sparsity for which runtime efficient algorithms exists ii.) the model gives high prediction accuracy, due to the better expressive power derived from the connectivity of the graph iii.) the graph data structure has a succinct representation that can be stored efficiently in memory. We use our framework to design a specific connectivity pattern called RBGP4 which makes efficient use of the memory hierarchy available on GPU. We benchmark our approach on image classification and machine translation tasks with an edge (Jetson Nano 2GB) as well as server (V100) GPUs. When compared to commonly used sparsity patterns like unstructured and block, we obtain significant speedups while achieving the same level of accuracy.