Principal Component Analysis (PCA) is an algorithm that relies on computationally heavy matrix operations. The data extracted from face images are usually very large, and processing them is time consuming. Parallel programming techniques can be used to reduce the execution time of these operations. CUDA is a general-purpose parallel computing architecture supported by NVIDIA graphics cards. In this study, we implemented the PCA algorithm using both the classical (sequential) programming approach and a CUDA-based approach under different configurations. The algorithm is subdivided into its constituent calculation steps, and each step is evaluated for the benefit it gains from parallelization. In this way, the parts of the algorithm that cannot be improved by parallelization are identified. It is also shown that the CUDA-based approach makes dramatic improvements in the overall performance of the algorithm possible.
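As an illustration of what subdividing PCA into calculation steps and timing each one can look like, the following is a minimal sequential sketch in plain Python. It is not the implementation used in this study: the step boundaries (mean, centering, covariance, dominant eigenpair), the use of power iteration for the eigen step, and all function names are assumptions chosen for brevity.

```python
import time


def mean_vector(X):
    """Column-wise mean of a list of equally sized sample rows."""
    n = len(X)
    return [sum(row[j] for row in X) / n for j in range(len(X[0]))]


def center(X, mu):
    """Subtract the mean vector from every sample."""
    return [[x - m for x, m in zip(row, mu)] for row in X]


def covariance(A):
    """Sample covariance (divides by n - 1) of already centered data."""
    n, d = len(A), len(A[0])
    return [[sum(A[k][i] * A[k][j] for k in range(n)) / (n - 1)
             for j in range(d)] for i in range(d)]


def dominant_eigenpair(C, iters=200):
    """Power iteration: an assumed stand-in for a full eigen-decomposition.

    Starts from the all-ones direction, so it presumes that vector is not
    orthogonal to the dominant eigenvector.
    """
    d = len(C)
    v = [1.0 / d ** 0.5] * d
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the eigenvalue estimate.
    lam = sum(v[i] * sum(C[i][j] * v[j] for j in range(d)) for i in range(d))
    return lam, v


def timed_pca(X):
    """Run the steps in order and record the wall-clock time of each."""
    timings = {}

    def run(name, fn, *args):
        start = time.perf_counter()
        result = fn(*args)
        timings[name] = time.perf_counter() - start
        return result

    mu = run("mean", mean_vector, X)
    A = run("center", center, X, mu)
    C = run("covariance", covariance, A)
    lam, v = run("eigen", dominant_eigenpair, C)
    return lam, v, timings
```

Comparing the per-step entries in `timings` between a sequential and a parallel version is one way to see which steps benefit from parallelization and which do not.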