ker2x
Fractal Molossus
 
Posts: 795
« Reply #30 on: July 16, 2010, 07:45:48 PM »
I just got a new laptop (Asus EeePC 1201PN, with an Ion2 GPU) and a legal version of Visual Studio 2008 and 2010. So I'll be able to use the OpenCL/CUDA profiler and ... tadammmm: http://developer.nvidia.com/object/nsight.html \o/ \o/
So the development of the buddhabrot is on hold while I learn to use nSight and redevelop the app in C/C++ (eek!)

cbuchner1
« Reply #31 on: July 16, 2010, 07:51:26 PM »
Congratulations!
nSight support for Visual Studio 2010 is not official, although there is a thread in the nVidia forum on how to do a manual setup.
You're staying with OpenCL, right? Now that you have an nVidia GPU you might also want to look into CUDA, maybe.

ker2x
« Reply #32 on: July 16, 2010, 08:03:52 PM »
Quote from: cbuchner1
Congratulations!
nSight support for 2010 is not official, although there is a thread in the nVidia forum how to do a manual setup.
You're staying with OpenCL, right? Now that you have an nVidia GPU you might also want to look into the direction of CUDA, maybe.

I already had an NVidia card (8800GTX), and the NVidia book I bought (Programming Massively Parallel Processors) focuses on CUDA, with just a few pages about OpenCL. But I'll try my best to focus on OpenCL. I may do some CUDA for early development, if I find it better than OpenCL, but I'll always rewrite the kernel in OpenCL.
(Edit: I'll probably buy the pro version, the price is fair and it can profile OpenCL)
(Edit2: I hope it will work on the Ion2)
« Last Edit: July 16, 2010, 08:09:23 PM by ker2x »

cbuchner1
« Reply #33 on: July 16, 2010, 10:14:29 PM »
Last time I checked, nVidia's nSight required the GPU to be debugged to be separate from the display device. Either by connecting two PCs remotely, or by having a second GPU in the PC. Apparently the second option won't apply to your netbook.

ker2x
« Reply #34 on: July 16, 2010, 10:19:27 PM »
I have a bigger problem ... http://developer.nvidia.com/forums/index.php?showtopic=4899
Can't install the dev drivers for some unknown reason.

Quote from: cbuchner1
Last time I checked, nVidia's nSight required the GPU to be debugged to be separate from the display device. Either by connecting two PCs remotely, or by having a second GPU in the PC. Apparently the second option won't apply to your netbook.

Yes, it's in the knowledge base... mmm... I'll see... for now I can't do anything at all.
« Last Edit: July 16, 2010, 10:21:01 PM by ker2x »

cbuchner1
« Reply #35 on: July 16, 2010, 10:40:29 PM »
Sometimes editing the nvdisp.inf file (or whatever it's called for the mobile drivers) can help. Also, there is a site, laptopvideo2go.com, which specializes in providing pre-modified inf files for most nVidia driver releases. These usually get posted in a forum thread.
The forums on forums.nvidia.com seem to be more active than those on developer.nvidia.com. You're more likely to find help there.
« Last Edit: July 16, 2010, 10:49:01 PM by cbuchner1 »

ker2x
« Reply #36 on: July 17, 2010, 12:14:01 AM »
Quote from: cbuchner1
sometimes editing the nvdisp.inf file (or whatever it's called for the mobile drivers) can help. Also there is a site laptopvideo2go.com which specializes in providing pre-modified inf files for most nVidia driver releases. These are usually getting posted in a forum thread. the forums on forums.nvidia.com seem to be more active than those on developer.nvidia.com. You're more likely to find help there.

Thank you thank you thank you \o/ It works! It's not the developer driver but, at least, my gfx card is seen as an OpenCL-capable GPU by Geeks3D GPU Caps Viewer and the OpenCL demo works \o/
I have now successfully installed the CUDA SDK (which includes the OpenCL SDK) *hugs*

ker2x
« Reply #37 on: July 17, 2010, 12:32:05 AM »
I ran some tests:

8800GTX bandwidth (PCIe x16):
  host -> device: 1600 MB/s
  device -> host: 1250 MB/s
  device -> device: 10 GB/s

Ion (PCIe x1?):
  host -> device: 128 MB/s
  device -> host: 160 MB/s
  device -> device: 7 GB/s

Hum... this is going to be fun.
« Last Edit: July 17, 2010, 12:33:48 AM by ker2x »

cbuchner1
« Reply #38 on: July 17, 2010, 11:54:05 AM »
Quote from: ker2x
I ran some tests : 8800GTX Bandwidth (PCIE 16x) : host -> device : 1600MB/s device -> host : 1250MB/s device -> device : 10GB/s

Huh, device<->device bandwidth on an 8800 GTX should be higher. I'd expect something in the 70 GB/s range (86.4 GB/s theoretical peak).
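
For reference, the 86.4 GB/s peak can be reproduced from the card's memory configuration. This is only a sketch in C, assuming the commonly published 8800 GTX figures (384-bit memory bus, 900 MHz GDDR3, two transfers per clock), which are not stated in the posts above. The function name is made up for illustration.

```c
/* Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes).
   bus_width_bits:      width of the memory bus in bits
   mem_clock_mhz:       memory clock in MHz
   transfers_per_clock: 2 for double-data-rate memory such as GDDR3 */
static double peak_bandwidth_gb_s(int bus_width_bits, double mem_clock_mhz,
                                  int transfers_per_clock)
{
    double bytes_per_transfer = bus_width_bits / 8.0;
    /* bytes * MHz gives MB/s; divide by 1000 to get GB/s */
    return bytes_per_transfer * mem_clock_mhz * transfers_per_clock / 1000.0;
}
```

Calling peak_bandwidth_gb_s(384, 900.0, 2) gives 48 bytes x 900 MHz x 2 = 86.4 GB/s. A measured device-to-device copy normally lands somewhat below that, which is where the "70 GB/s range" expectation comes from.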
« Last Edit: July 17, 2010, 12:18:56 PM by cbuchner1 »

ker2x
« Reply #39 on: July 17, 2010, 09:04:50 PM »
I built a simple, separate console app to test the xorshift random number generator. It works.

__kernel void xorshift(
    uint s1, uint s2, uint s3, uint s4,
    const int bufferSize,
    __global uint* outputUInt,
    __global float* outputFloat)
{
    uint st;

    // A single work item fills the whole buffer sequentially.
    for (int i = 0; i < bufferSize; i++) {
        st = s1 ^ (s1 << 11);
        s1 = s2;
        s2 = s3;
        s3 = s4;
        // Note: Marsaglia's xor128 uses (st >> 8) here; 18 is kept as posted.
        s4 = s4 ^ (s4 >> 19) ^ (st ^ (st >> 18));
        outputUInt[i] = s4;
        outputFloat[i] = s4 / 4294967295.0;  // scale to [0, 1]
    }
}
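
To cross-check what the GPU writes, a host-side reference in plain C can help. The sketch below is not the author's code: it follows Marsaglia's published xor128 generator, which shifts by 8 in the last step where the posted kernel shifts by 18, so its output will only match the kernel if the kernel uses the same constant. The type and function names are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* State of the four-word xorshift generator (Marsaglia's "xor128"). */
typedef struct {
    uint32_t s1, s2, s3, s4;
} xorshift128_state;

/* Advance the generator by one step and return the new 32-bit value.
   The shift constants 11, 19 and 8 are the ones from Marsaglia's paper. */
static uint32_t xorshift128_next(xorshift128_state *st)
{
    uint32_t t = st->s1 ^ (st->s1 << 11);
    st->s1 = st->s2;
    st->s2 = st->s3;
    st->s3 = st->s4;
    st->s4 = st->s4 ^ (st->s4 >> 19) ^ (t ^ (t >> 8));
    return st->s4;
}

/* Fill both buffers the way the kernel does: the raw integers, plus
   the same values scaled to [0, 1]. */
static void xorshift128_fill(xorshift128_state *st, size_t n,
                             uint32_t *out_uint, float *out_float)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t v = xorshift128_next(st);
        out_uint[i] = v;
        out_float[i] = (float)(v / 4294967295.0);
    }
}
```

With Marsaglia's default seeds (123456789, 362436069, 521288629, 88675123) the first value produced is 3701687786, which makes a handy sanity check before comparing whole buffers against the device output.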

ker2x
« Reply #41 on: July 18, 2010, 04:05:45 PM »
It's becoming interesting. 4 million samples/second at 1000 iterations on my slow Ion2 GPU. Time to add colors.
Edit: I tested this http://www.microsoft.com/whdc/device/display/wddm_timeout.mspx to disable the driver timeout when the GPU code takes too much time. It worked on my Win7 (I had to create the key).
Edit2: I added a samples/s counter. It's more like 2.5 million/s instead of 4 million.
Edit3: 15 million/s on my 8800 GTX \o/
Edit4: 18 million/s on a GTX260
« Last Edit: July 18, 2010, 09:48:25 PM by ker2x »

ker2x
« Reply #42 on: July 18, 2010, 08:22:49 PM »
A not-so-deep zoom:

minIter = 100;
maxIter = 10000;
realMin = -1.2f;
realMax = -0.8f;
imaginaryMin = -0.4f;
imaginaryMax = -0.1f;

(Yes, for some reason, the positive y axis is the negative imaginary axis. I'll fix it.)
Considering the impressive computation speed as long as the point is off screen (so no read/write in global memory), I do not see an obvious way to improve the deep zoom speed. (Statistical methods like the Metropolis-Hastings thingy involve a lot of memory access, afaik, so they're probably not a good fit.)
I need to do some rewriting and learning to implement colors, so colors will wait a few days.
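
The off-screen test and the y-axis orientation can both live in one small mapping function. Here is a rough sketch in C of how that might look, assuming a top-left image origin and a row-major image. The struct, the function name and the 400x300 size used below are illustrative and do not come from the actual app.

```c
#include <stdbool.h>

/* View rectangle in the complex plane plus the output image size. */
typedef struct {
    double real_min, real_max;
    double imag_min, imag_max;
    int width, height;
} view_t;

/* Map the complex point (re, im) to the pixel (*px, *py).
   Returns false, leaving the outputs untouched, when the point is off
   screen. Row 0 is the top of the image and corresponds to imag_max,
   so the positive imaginary axis points up. */
static bool point_to_pixel(const view_t *v, double re, double im,
                           int *px, int *py)
{
    if (re < v->real_min || re >= v->real_max ||
        im <= v->imag_min || im > v->imag_max) {
        return false;  /* off screen: no global memory access needed */
    }
    double fx = (re - v->real_min) / (v->real_max - v->real_min);
    double fy = (v->imag_max - im) / (v->imag_max - v->imag_min);
    int x = (int)(fx * v->width);
    int y = (int)(fy * v->height);
    /* Guard against rounding pushing a coordinate onto the far edge. */
    if (x >= v->width)  x = v->width - 1;
    if (y >= v->height) y = v->height - 1;
    *px = x;
    *py = y;
    return true;
}
```

With the zoom window above and a 400x300 image, the point -1.0007 - 0.2507i maps to pixel (199, 150), while 0 + 0i is rejected. Measuring fy from imag_max rather than imag_min is what keeps the image from coming out upside down.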

cbuchner1
« Reply #43 on: July 18, 2010, 09:30:34 PM »
Quote from: ker2x
Considering the impressive computation speed as long as the point is out of the screen (so no read-write in the global memory) i do not see an obvious way to improve the deep zoom speed.

When most of your pixels are off screen (i.e. deep zoom) you're bound only by the floating point throughput of the card. You may want to make sure that all threads are busy at all times (e.g. by making a thread which ran out of work pick up a new random number and continue from there).

ker2x
« Reply #44 on: July 18, 2010, 10:00:24 PM »
Quote from: ker2x
Considering the impressive computation speed as long as the point is out of the screen (so no read-write in the global memory) i do not see an obvious way to improve the deep zoom speed.

Quote from: cbuchner1
When most of your pixels are out of the screen (i.e. deep zoom) you're bounded only by the floating point throughput of the card. You may want to make sure that all threads are busy at all times (e.g. by making a thread which ran out of work pick up a new random number and continue from there).

I run many more threads than available cores (usually 1 million per frame, many more if I have a low (<1000) maxIter). When a thread completes, a new thread picks a new pair of random numbers (according to its unique global_id) from the 2 randomBuffers. The 2 randomBuffers are filled by another kernel, so I never need to transfer randoms from host to device.
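
The per-thread logic described here can be sketched on the host in plain C. This is only a guess at the structure, not the real kernel: it assumes the two random buffers hold floats in [0, 1] that get scaled to the square [-2, 2] x [-2, 2], and the function names are hypothetical. gid stands in for OpenCL's get_global_id(0).

```c
#include <stddef.h>

/* Count iterations of z -> z*z + c (starting at z = 0) until |z| > 2.
   Returns max_iter when the orbit has not escaped by then. */
static int escape_iterations(float c_re, float c_im, int max_iter)
{
    float z_re = 0.0f, z_im = 0.0f;
    for (int i = 0; i < max_iter; i++) {
        float t = z_re * z_re - z_im * z_im + c_re;
        z_im = 2.0f * z_re * z_im + c_im;
        z_re = t;
        if (z_re * z_re + z_im * z_im > 4.0f)
            return i + 1;
    }
    return max_iter;
}

/* What one work item might do: read its own pair of random numbers
   from the two device-side buffers, turn them into a sample point c,
   and report the escape time. Returns 0 when the orbit should not be
   plotted (it escaped before min_iter or never escaped). */
static int work_item(size_t gid, const float *random_re,
                     const float *random_im, int min_iter, int max_iter)
{
    float c_re = -2.0f + 4.0f * random_re[gid];  /* assumed sampling range */
    float c_im = -2.0f + 4.0f * random_im[gid];
    int n = escape_iterations(c_re, c_im, max_iter);
    return (n >= min_iter && n < max_iter) ? n : 0;
}
```

A work item that gets a non-zero result would then replay the orbit and increment the histogram for every on-screen point. Since each work item indexes the buffers by its own id, no two of them share a sample and nothing has to come from the host.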