I tried flicking through my AITB EAR Rooms collection, doing 20-second renders; the render speed depends on the program loaded. Short tails are faster than long ones, of course.
I have a Core i7-860 @ 3.5 GHz and a GTX 760. Overall, if I render with a 'Nebula Reverb' instance (4096 DSP buffer) and then with my 'Cuda' instance (same 4096 DSP buffer, just with 'OPT FREQD' set to '15CUDA ac 2Mono'), the Cuda instance is consistently about 1x realtime faster.
So if my Nebula Reverb renders at 2.4x realtime, my Cuda instance does 3.4x realtime. If the CPU does 0.7x realtime, the Cuda instance does 1.6x-1.7x realtime.
Does that sound OK, or did I screw something up somewhere? The TestCuda64.exe program said the device opened OK, and then just sits there.
On realtime playback (using Timp's Marsh Spring MED 6 kernels spring reverb and another reverb as well), the CUDA one actually has more CPU usage at the start of playback. Both Nebula DSP buffers are 8192. Once stable, the CPU usage is the same (both take 0.02%).
The offline render with CUDA is faster in a more noticeable way.
Since it only works (kinda) well on reverb programs, and I mostly use Nebula for console/preamp/tape/EQ, I can't really use it.
And the gains I get with reverbs are not enough to keep working on it. The difference between 2.4x realtime and 3.4x realtime is an improvement, don't get me wrong. And if it's the difference between the project running realtime or not, I'm happy to have it. But it's not the _WOW_ factor some people hope it to be.
I also seem to have a problem where I can only put one Cuda instance in a project. If I load a second instance (on the same track or another track), playback starts stuttering. A second regular 'Nebula Reverb' instance loads fine.
Mind you, this is with a regular nvidia GeForce card. I can't try with Quadros or Teslas, the real compute-power cards.
Do you really need CUDA for IR reverbs? It's nice to know how much GPU power and memory it costs, but how much CPU does it actually save you? Keeping the CUDA card fed with work requires CPU power too, and I think IR convolution is so computationally inexpensive these days that it isn't a problem.
I have a Reaper test running here: 44.1 kHz, 256 ASIO buffer. I'm running 64 tracks, each playing a simple mono wave file.
Playing those 64 tracks without any effects costs around 25-26% CPU, with my Core i7 laptop CPU clocking up to 2.5 GHz.
Now, if I put a single _stereo_ IR reverb with a 4.91 s tail on one track, then add it to the 2nd track, the 3rd track, and so on until my CPU is full, I get to track _38_. That's 38 instances of a _long_ stereo reverb tail, with a total of 64 tracks playing... on an (admittedly powerful) laptop CPU.
I'm thinking CUDA would just add more CPU overhead to IR convolution than the extra instances it would buy you.
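A back-of-envelope estimate points the same way. This is a sketch with assumed constants (the ~5 N log2 N FFT cost and the 4-real-ops-per-bin spectral MAC are rough textbook ballparks, not measurements): even 38 stereo instances of the ~5 s IR from the test above land at a couple dozen GFLOP/s with uniformly partitioned FFT convolution, which a modern multi-core CPU can sustain.

```python
import math

# Crude cost model for uniformly partitioned overlap-save FFT convolution.
# The constants (5*N*log2(N) real ops per FFT, 4 real ops per spectral
# multiply-accumulate bin) are rough assumptions, not measurements.
def conv_flops_per_second(ir_seconds, sample_rate=44100, block=256):
    n_ir = int(ir_seconds * sample_rate)       # IR length in samples
    parts = math.ceil(n_ir / block)            # number of IR partitions
    fft_n = 2 * block                          # FFT size for overlap-save
    per_block = (
        2 * 5 * fft_n * math.log2(fft_n)       # forward + inverse FFT
        + parts * 4 * fft_n                    # spectral multiply-accumulate
    )
    return per_block * (sample_rate / block)   # blocks processed per second

# The Reaper test above: 38 stereo instances of a 4.91 s IR at 44.1 kHz.
total = 38 * 2 * conv_flops_per_second(4.91)
print(f"~{total / 1e9:.0f} GFLOP/s")
```

Under these assumptions the total comes out in the low tens of GFLOP/s, so the track count (disk streaming, mixing, summing) plausibly costs as much as the convolution itself.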
Kinda the same with short-tail Nebula programs. With very short kernels (100 ms or less), it takes more effort to upload the data to the card, do the calculation, and transfer the result back than it does to just calculate it on the CPU.
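That crossover can be sketched with a toy latency model. Every number here is a made-up placeholder (PCIe bandwidth, launch overhead, GPU/CPU throughput), not a measurement of any real card; the point is only the shape of the tradeoff: fixed per-block overhead dominates for short kernels, compute dominates for long ones.

```python
import math

# Toy model: fixed GPU overhead (launch + PCIe round trip) vs. work saved.
# All constants are hypothetical ballpark figures for illustration only.
PCIE_BPS  = 6e9     # assumed host<->device bandwidth, bytes/s
LAUNCH_S  = 20e-6   # assumed kernel-launch/sync overhead per audio block
GPU_FLOPS = 500e9   # assumed GPU throughput
CPU_FLOPS = 20e9    # assumed CPU throughput

def work_flops(block, ir_len):
    # crude partitioned-FFT convolution cost per processed audio block
    parts = max(1, ir_len // block)
    return 2 * 5 * (2 * block) * math.log2(2 * block) + parts * 8 * block

def gpu_time(block, ir_len):
    xfer = 2 * block * 4 / PCIE_BPS            # float32 in and back out
    return LAUNCH_S + xfer + work_flops(block, ir_len) / GPU_FLOPS

def cpu_time(block, ir_len):
    return work_flops(block, ir_len) / CPU_FLOPS

# 100 ms kernel (4410 samples): fixed overhead dominates, the CPU wins.
# 5 s kernel (220500 samples): compute dominates, the GPU wins.
for ir_len in (4410, 220500):
    g, c = gpu_time(256, ir_len), cpu_time(256, ir_len)
    print(f"{ir_len:>6} samples: gpu {g * 1e6:.1f} us, cpu {c * 1e6:.1f} us")
```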
Some long IR reverbs are pretty CPU-heavy. The IR reverb I use doesn't use CUDA; it uses OpenCL, which is more effective.
GPU Impulse Reverb VST is an effect plugin that calculates convolution reverbs by using your graphics card as DSP for realtime reverb calculation with a CPU usage of near 0%.
- Low latency, only one ASIO block size
- Supports Stereo & True Stereo processing (quad-channel impulse responses)
- Supports 16, 24 and 32 bit responses
- Supports as many instances as your GPU can handle
- 2-Band EQ
- Adjustable Attack/Release & Length Envelope
We weren't comparing Nebula to (GPU) IR. We just got a bit off-topic.
That graph you sent says _nothing_ about the general performance of OpenCL vs CUDA. Trust me, they are about the same. The algorithm you're running, and the hardware you run it on, have far more to do with performance than the SDK you use. AMD cards are optimized for OpenCL, and they simply have stronger compute performance these days (it was the other way around in the GTX 5xx and HD 6xxx era). On retail GeForce cards, nvidia traded compute performance for game performance for a while, which makes perfect sense since that's what they're used for. If you want compute performance from an nvidia card, get a Quadro or Tesla.
Anyway, I'm still surprised that (you guys think that) OpenCL- (or CUDA-) accelerated convolution is worthwhile these days. Nebula I can understand. Like Giancarlo says, Nebula is constantly executing hundreds of kernels in a single instance, blending between them, layering them... and more.
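That distinction fits the overhead picture from earlier: if one instance batches hundreds of small kernels into a single GPU dispatch, the fixed launch/transfer cost is paid once instead of per kernel. A toy sketch with hypothetical timing constants (none of these numbers are measured):

```python
# Amortizing fixed dispatch overhead over many small kernels.
# All three constants are made-up illustrative figures.
LAUNCH_S          = 20e-6   # assumed per-dispatch overhead
PER_KERNEL_GPU_S  = 0.5e-6  # assumed GPU compute time per small kernel
PER_KERNEL_CPU_S  = 5e-6    # assumed CPU compute time per small kernel

def gpu_batched(n):  # one dispatch covering all n kernels
    return LAUNCH_S + n * PER_KERNEL_GPU_S

def gpu_naive(n):    # one dispatch per kernel
    return n * (LAUNCH_S + PER_KERNEL_GPU_S)

def cpu(n):
    return n * PER_KERNEL_CPU_S

# One short kernel: the dispatch overhead swamps the GPU's speed advantage.
# 100 batched kernels (Nebula-style): the GPU pulls well ahead of the CPU,
# while dispatching them one by one would still lose.
print(gpu_batched(1), cpu(1))
print(gpu_batched(100), gpu_naive(100), cpu(100))
```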
Are there free or demo OpenCL/CUDA IR reverbs I can try out? If my laptop can run 38 of them with a _LONG_ stereo tail, to the point where the track count has more effect than the IR convolution itself, I don't think I'd gain a lot by switching to GPU acceleration.