资 源 简 介
We are now living in the SIMD world. The cores from NVIDIA GPU is scalar core, while there are several processing elements in AMD GPU, labeled as vector/streaming core. The new released multi-core CPU from Intel features AVX, with the register file increasing from 128 bits to 256 bits. To distinguish these SIMD from the large-scale SIMD in vector machine and SIMT from NVIDIA GPUs, we name them as small-scale SIMD or short SIMD, i.e. SSIMD.
When mapping OpenCL programs to many-core processors, programmers typically map one work-item onto one core (scalar core or stream core). Although work-items in terms of work-groups are mapped to one core on multi-core processors, they will be scheduled to the responding core in the one-by-one pattern. If we code in the traditional CUDA-style (without any vectorization), only a small proportion of the computational potential, in theory, is tapped for these machines with SSIMD. Therefore, if we do not want