CUDA processing is efficient for long kernels that are processed in the frequency domain (FREQD in Nebula). Programs with such kernels (reverbs, for example) take a lot of CPU power, so you can move some of that load to the GPU. Just experiment.
You can check the kernel page to see how long the kernels are and how they are processed (time domain or frequency domain).
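To give a rough idea why kernel length matters, here is a small sketch in plain Python. It is not Nebula's code and it does not use CUDA. It only compares the two ways of convolving on the CPU, and the function names (`direct_convolve`, `fft_convolve`) are made up for this illustration. Time-domain convolution needs one multiply-add per sample for every kernel tap, so the cost grows with the kernel length. The frequency-domain version does a few FFTs and a pointwise multiply instead, which pays off for long kernels and is the kind of work that parallelizes well on a GPU.

```python
import cmath


def direct_convolve(signal, kernel):
    """Time-domain convolution: one multiply-add per (sample, tap) pair."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def fft(values, inverse=False):
    """Recursive radix-2 FFT (unnormalized); length must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft_convolve(signal, kernel):
    """Frequency-domain convolution: FFT both inputs, multiply, inverse FFT."""
    out_len = len(signal) + len(kernel) - 1
    n = 1
    while n < out_len:  # zero-pad to the next power of two
        n *= 2
    a = fft([complex(x) for x in signal] + [0j] * (n - len(signal)))
    b = fft([complex(x) for x in kernel] + [0j] * (n - len(kernel)))
    product = [x * y for x, y in zip(a, b)]
    result = fft(product, inverse=True)
    # The inverse transform above is unnormalized, so divide by n here.
    return [v.real / n for v in result[:out_len]]


if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0]
    kernel = [0.0, 1.0, 0.5]
    print(direct_convolve(signal, kernel))
    print(fft_convolve(signal, kernel))
```

Both functions give the same result up to rounding error. With a short kernel the direct version is cheaper, so this only illustrates why long frequency-domain kernels are the ones worth trying on the GPU.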
System 1: Windows 8 32-bit - Samplitude Pro X/X3, Tracktion 6/7, Reaper
System 2: Mac OS X Yosemite - Reaper (32+64-bit), Tracktion 6/7 (32+64-bit)
Both systems on: MacBook Pro (late 2009), Core 2 Duo 3.06 GHz, 4 GB RAM, graphics: NVIDIA GeForce 9600M GT 512 MB