In recent years, parallel processing has been widely adopted across the computer industry, and software developers must now target parallel computing platforms and technologies to deliver novel and rich experiences. We present a novel algorithm for solving dense linear systems using the Compute Unified Device Architecture (CUDA). High-level linear algebra operations are computationally intensive, so in this study we implement a GPU-accelerated LU decomposition routine. An LU decomposition is a factorization of the form A = LU, where A is a square matrix, L is lower triangular (zeros above the diagonal), and U is upper triangular (zeros below the diagonal). The main idea of the LU decomposition is to record the steps of Gaussian elimination on A in the positions where zeros are produced. We improve performance through suitable data representation and by reducing row operations on the GPU. Because GPUs offer high arithmetic throughput, high memory bandwidth, and more floating-point units than CPUs, initial experimental results promised a bright future for GPU computing, and GPUs have proven useful for scientific computation. We evaluated our implementation on several systems with different GPUs and CPUs, and on linear systems of various sizes. Comparing the results from both platforms, the GPU computation ran approximately 3 times faster than the CPU computation. This significant performance improvement makes our implementation readily applicable to solving dense linear systems. (C) 2011 Published by Elsevier Ltd.
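The elimination scheme the abstract describes, recording each Gaussian-elimination multiplier in the position where a zero is produced, can be sketched as a minimal sequential Python routine (this is an illustrative CPU sketch without pivoting, not the paper's GPU implementation; the function name `lu_decompose` is our own):

```python
def lu_decompose(A):
    """Doolittle LU decomposition without pivoting.

    Runs Gaussian elimination on a copy of the square matrix A,
    storing each elimination multiplier in the slot where a zero is
    produced. Returns (L, U) with L unit lower triangular and U upper
    triangular, so that A = L * U.
    """
    n = len(A)
    M = [row[:] for row in A]  # work on a copy of A
    for k in range(n):
        for i in range(k + 1, n):
            m = M[i][k] / M[k][k]  # elimination multiplier
            M[i][k] = m            # record it where the zero appears
            for j in range(k + 1, n):
                M[i][j] -= m * M[k][j]
    # Split the combined workspace into L (unit diagonal) and U.
    L = [[M[i][j] if j < i else (1.0 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    U = [[M[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return L, U
```

On the GPU, the inner row-update loops are the natural candidates for parallelization, since each row update at a given elimination step is independent of the others.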