Overview
cuQU is a simple shared-memory queue for messaging between the CPU and the GPU, implemented in CUDA.
While working on a proof-of-concept test program, I ended up writing a cuqu::queue template class, backed by a pinned host-memory area used as a circular buffer, where:
1. The CPU is able to push() a chunk of data if space is available.
2. The GPU is able to retrieve a chunk of data with a blocking, globally synchronizing pop() primitive.
The main limitation is the GPU-side global synchronization[1], which caps the grid size at the number of thread blocks that can be resident on the device at once; launching more would deadlock, because the NVIDIA scheduler does not preempt running thread blocks.
I thought it could be useful to a broader audience, so I'm committing it here under the Apache License 2.0, to be compatible with Thrust and the other wonderful libraries available in the CUDA ecosystem.
I have a numbe