Index Dependent Nested Loops Parallelization with an Even Distributed Number of Steps
DOI:
https://doi.org/10.31449/inf.v45i4.3130Abstract
Parallel processing of algorithms is an effective way to achieve higher performance on multiprocessor systems rather. During parallelization, it is critical to minimize the difference between the processing time for threads. It is necessary to choose a method that can efficiently distribute the workload evenly across the threads. This paper deals with a special kind of nested loops where the internal loop iterator depends on the outer loop iterator. In such cases, the process can be represented as an upper (or lower) triangular matrix. This paper introduces a method for partitioning the outer loop according to the indices in an almost optimal manner, so that the partial loops in each thread will take nearly the same number of steps. In addition, we examine the potential of a perfect partition and try to determine the maximum (but still meaningful) partition size.References
T. Schmidt, J. Stoye, Quadratic time algorithms for finding common intervals in two and more sequences, Proc of CPM 2004 LNCS 3109(2004) 347–358.doi:10.1007/978-3-540-27801-6_26.
C. Lengauer, Loop parallelization in the polytope model, CONCUR’93 (1993) 398–416.
K. Bondalapati, Parallelizing dsp nested loops on reconfigurable architectures using data context switching, Proceedings of the 38th DesignAutomation Conference (IEEE Cat. No.01CH37232) (2001).doi:10.1145/378239.378483.
J. Ramanujam, Optimal software pipelining of nested loops, in: Proceedings of 8th International Parallel Processing Symposium, 1994, pp.335–342.doi:10.1109/IPPS.1994.288280.
S. Krishnamoorthy, M. Baskaran, U. Bondhugula, J. Ramanujam, A. Rountev, P. Sadayappan, Effective automatic parallelization of stencilcomputations, Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation (2007) 235–244doi:10.1145/1250734.1250761.
B. J. Gaska, N. Jothi, M. S. Mohammadi, K. Volk, Handling nested parallelism and extreme load imbalance in an orbital analysis code,arXiv:1707.09668 (2017).
B. Bylina, J. Bylina, Strategies of parallelizing nested loops on the multicore architectures on the example of the wz factorization for thedense matrices, Proceedings of the Federated Conference on Computer Science and Information Systems (2015) 629–639doi:10.15439/2015F354.
A. Kejariwal, P. D’Alberto, A. Nicolau, C. D. Polychronopoulos, A geometric approach for partitioning n-dimensional non-rectangular iterationspaces, in: Languages and Compilers for High Performance Computing, Springer Berlin Heidelberg, Berlin, Heidelberg, 2005, pp. 102–116.
V. Beletskyy, M. P. Parallelizing, Perfectly nested loops with non-uniform dependences (2002).
J. Liu, J. Wickerson, G. A. Constantinides, Loop splitting for efficient pipelining in high-level synthesis, Proceedings of the Federated Con-ference on Computer Science and Information Systems (2016).doi:10.1109/FCCM.2016.27.
S. D. Polychronopoulos, Loop coalescing: a compiler transformation for parallel machines, University of Illinois at Urbana-Champaign, Centerfor Supercomputing Research and Development, Technical Report CSRD-635 (1987).
M. K. Philippe Clauss, Ervin Altıntas, Automatic collapsing of non-rectangular loops, Parallel and Distributed Processing Symposium(IPDPS), Orlando, United States (2017) 778–787.
N. Kafri, J. A. Sbeih, Simple optimal partitioning approch to perfect tringular iteration space, Proceedings of the 2008 High Performance,Computing & Simulation Conference©ECMS, Waleed W. Smari (Ed.), ISBN: 978-0-9553018-7-2 / ISBN: 978-0-9553018-6-5 (CD) (2008)124–131.
R. Sakellariou, A compile-time partitioning strategy for non-rectangular loop nests, Proceedings of the 11th International Parallel ProcessingSymposium, IPPS’97 (Geneva), IEEE Computer Society Press (1997) 633–637.
A. Jackson, O. Agathokleous, Dynamic loop parallelisation, arXiv:1205.2367 (2012).
T. H. Tzen, L. M. Ni, Dependence uniformization: a loop parallelization technique, IEEE Transactions on Parallel and Distributed Systems4 (5) (1993) 547–558.doi:10.1109/71.224217.
B. Bylina, J. Bylina, Parallelizing nested loops on the intel xeon phi on the example of the dense wz factorization, Proceedings of the FederatedConference on Computer Science and Information Systems 8 (2016) 655–664.doi:10.15439/2016F436.
M. Wolf, M. Lam, A loop transformation theory and an algorithm to maximize parallelism, IEEE Transactions on Parallel and DistributedSystems 2 (4) (1991) 452–471.doi:10.1109/71.97902.
H. Blinn, How to solve a quadratic equation?, IEEE Computer Graphics and Applications (Volume: 25 , Issue: 6) (2005)
Downloads
Published
How to Cite
Issue
Section
License
I assign to Informatica, An International Journal of Computing and Informatics ("Journal") the copyright in the manuscript identified above and any additional material (figures, tables, illustrations, software or other information intended for publication) submitted as part of or as a supplement to the manuscript ("Paper") in all forms and media throughout the world, in all languages, for the full term of copyright, effective when and if the article is accepted for publication. This transfer includes the right to reproduce and/or to distribute the Paper to other journals or digital libraries in electronic and online forms and systems.
I understand that I retain the rights to use the pre-prints, off-prints, accepted manuscript and published journal Paper for personal use, scholarly purposes and internal institutional use.
In certain cases, I can ask for retaining the publishing rights of the Paper. The Journal can permit or deny the request for publishing rights, to which I fully agree.
I declare that the submitted Paper is original, has been written by the stated authors and has not been published elsewhere nor is currently being considered for publication by any other journal and will not be submitted for such review while under review by this Journal. The Paper contains no material that violates proprietary rights of any other person or entity. I have obtained written permission from copyright owners for any excerpts from copyrighted works that are included and have credited the sources in my article. I have informed the co-author(s) of the terms of this publishing agreement.
Copyright © Slovenian Society Informatika