Resource Summary
This project demonstrated optimization techniques for NVIDIA’s Tesla 10 and Fermi devices, written in CUDA. The techniques used were:
1. Use of shared memory and the L1 and L2 caches
2. Avoiding divergent branches
3. Memory coalescing
4. Data prefetching
5. Conflict-free shared memory access
The optimizations were demonstrated on two applications, breadth-first search (BFS) and matrix multiplication, and reduced execution time for both. The speedup comes from cutting the time spent accessing the GPU's global memory.
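As an illustration of how shared memory and coalesced loads reduce global memory traffic in matrix multiplication, here is a minimal sketch. It is not the project's code: the names tiledMatMul, tiledMatMulMaxError and TILE are hypothetical, the matrices are assumed to be square, row-major, and sized to a multiple of TILE, and error checking on the CUDA calls is omitted for brevity. Prefetching and divergence avoidance are not shown.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cmath>
#include <vector>

#define TILE 16  // hypothetical tile width; n must be a multiple of TILE here

// Computes C = A * B for square n x n row-major matrices.
// Each block stages one TILE x TILE tile of A and of B in shared memory,
// so each global element is loaded n / TILE times instead of n times.
__global__ void tiledMatMul(const float* A, const float* B, float* C, int n) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < n / TILE; ++t) {
        // Coalesced loads: threads with consecutive threadIdx.x read
        // consecutive global addresses in both A and B.
        As[threadIdx.y][threadIdx.x] = A[row * n + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * n + col];
        __syncthreads();  // tile fully loaded before anyone reads it

        // As[ty][k] is the same address for threads sharing ty (a broadcast)
        // and Bs[k][tx] is consecutive in tx, so these shared-memory reads
        // are not expected to cause bank conflicts.
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();  // finish reading before the next tile overwrites
    }
    C[row * n + col] = acc;
}

// Hypothetical host helper: runs the kernel on small integer-valued inputs
// and returns the largest absolute difference from a CPU reference.
float tiledMatMulMaxError(int n) {
    size_t bytes = (size_t)n * n * sizeof(float);
    std::vector<float> hA(n * n), hB(n * n), hC(n * n, 0.0f);
    for (int i = 0; i < n * n; ++i) {
        hA[i] = (float)(i % 7);
        hB[i] = (float)(i % 5);
    }

    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid(n / TILE, n / TILE);
    tiledMatMul<<<grid, block>>>(dA, dB, dC, n);
    cudaMemcpy(hC.data(), dC, bytes, cudaMemcpyDeviceToHost);  // also waits for the kernel
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);

    float maxErr = 0.0f;
    for (int r = 0; r < n; ++r)
        for (int c = 0; c < n; ++c) {
            float ref = 0.0f;
            for (int k = 0; k < n; ++k) ref += hA[r * n + k] * hB[k * n + c];
            maxErr = std::fmax(maxErr, std::fabs(ref - hC[r * n + c]));
        }
    return maxErr;
}
```

Compiled with nvcc and called from a main function on a machine with a CUDA-capable GPU, tiledMatMulMaxError(64) compares the kernel against a plain CPU loop. The tile width of 16 is only a starting point; the best value depends on the device's shared memory size and occupancy limits.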