GPU hardware and CUDA architecture provide a powerful platform to develop parallel algorithms. Implementation of heuristic and metaheuristic algorithms on GPUs are limited in literature. Nowadays developing parallel algorithms on GPU becomes very important. In this paper, NP-Hard Quadratic Assignment Problem (QAP) that is one of the combinatorial optimization problems is discussed. Parallel Multistart Simulated Annealing (PMSA) method is developed with CUDA architecture to solve QAP. An efficient method is developed by providing multistart technique and cooperation between threads. The cooperation is occurred with threads in both the same and different blocks. This paper focuses on both acceleration and quality of solutions. Computational experiments conducted on many Quadratic Assignment Problem Library (QAPLIB) instances. The experimental results show that PMSA runs up to 29x faster than a single-core CPU and acquires best known solution in a short time in many benchmark datasets. (C) 2018 Karabuk University. Publishing services by Elsevier B.V.