Patryk Kizny
|
|
« Reply #30 on: November 23, 2015, 04:44:26 PM » |
|
BTW, can you guys recommend any options for less precise but much faster trigonometry? Since sin/cos is that slow and is used very often for stuff that is not critical, I was wondering whether it could be approximated with faster functions.
Any ideas?
I also thought about using a texture to encode, say, 2M values and then interpolating (even just linearly) between them. Would a texture lookup plus interpolation be faster?
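To make the question concrete, here is a minimal sketch of the kind of approximation meant: a single parabola fitted to the sine curve. The name fastSinApprox is hypothetical and the technique is a generic one, not something taken from Fragmentarium. The worst-case absolute error is roughly 0.06, and whether it actually beats the built-in sin() on a given GPU is something that would have to be measured.

```glsl
// Sketch only: parabolic approximation of sin(x); not code from this thread.
const float PI = 3.14159265358979;

// Hypothetical helper name. Worst-case absolute error is roughly 0.06.
float fastSinApprox(float x) {
    // Wrap the argument into [-PI, PI).
    x = mod(x + PI, 2.0 * PI) - PI;
    // Odd parabola through (-PI, 0), (0, 0) and (PI, 0), peaking at +/-1.
    return (4.0 / PI) * x - (4.0 / (PI * PI)) * x * abs(x);
}
```

A cosine could be obtained the same way by calling fastSinApprox(x + 0.5 * PI).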
Visual Artist, Director & Cinematographer specialized in emerging imaging techniques.
|
|
|
Syntopia
|
|
« Reply #31 on: November 23, 2015, 05:14:19 PM » |
|
Trigonometric functions are not slow on the GPU. See e.g. http://docs.nvidia.com/cuda/cuda-c-programming-guide/#arithmetic-instructions, i.e. 4x the cost of an add and 2x the cost of a bit shift (if you are on Nvidia architecture).
Also notice (Fermi architecture): "Special Function Units (SFUs) execute transcendental instructions such as sin, cosine, reciprocal, and square root. Each SFU executes one instruction per thread, per clock; a warp executes over eight clocks. The SFU pipeline is decoupled from the dispatch unit, allowing the dispatch unit to issue to other execution units while the SFU is occupied."
Given the design of a GPU (a massive number of threads, and a free context switch per cycle), that means that in many cases you get trigonometric operations for *free*, because other parts of your code can carry on, at least if the SFU is not the bottleneck.
Speculating about GPU efficiency is hard, and the results are often counter-intuitive. Don't assume that branching is always costly (it only is if the warps diverge, and sometimes the Nvidia compiler will use predicated instructions), and don't assume trigonometric functions are costly. The only sane thing is to try out your ideas and measure the impact. Even then, other GPUs may behave completely differently.
|
hobold
Fractal Bachius
Posts: 573
|
|
« Reply #32 on: November 23, 2015, 05:42:12 PM » |
|
And BTW, GPUs may sacrifice some accuracy for speed, especially with trigonometry. For example, sin() may have low absolute error but much worse relative error on the GPU. In other words, the popular approximation sin(x) == x for very small magnitudes of x may fail on GPUs, despite the fact that sin() works beautifully for rotation matrices and the like.
This depends somewhat on the programming environment. For example, OpenCL specifies guaranteed high accuracy for math functions like sin(), but older versions of CUDA don't. I don't know about GLSL.
(Of course higher accuracy comes at the cost of speed, so OpenCL also allows use of the sloppy faster versions, if the programmer explicitly asks for them. I'd expect that modern CUDA does the same.)
|
Syntopia
|
|
« Reply #33 on: November 23, 2015, 05:53:54 PM » |
|
The numbers (4x slower than add) given above were for the fast CUDA functions (__sinf instead of sinf, or when using --use_fast_math). I assume GLSL always uses the fast path, since it is meant for computer graphics and not scientific computation.
|
3dickulus
|
|
« Reply #34 on: November 24, 2015, 03:09:06 AM » |
|
@Syntopia I recall, from another thread, you mentioned that you had tried an fp64 (double) version of shaders in Fragmentarium but it was very slow. My question is: was it worth it vis-a-vis image quality? As an artist I don't care how long it takes to render that "perfect shot", but as an animator I do want max speed.
|
3dickulus
|
|
« Reply #36 on: December 01, 2015, 03:34:39 AM » |
|
Quote:
OKey,
Here's what I figured out:
- The problem is definitely present only in Synthclipse (although I had experienced it in the past with Fragmentarium at some point and ignored it).
- The problem is caused by the mipmap thing, i.e. when I set the Fragmentarium map using the code:
uniform sampler2D myTexture; file[test.png]
#TexParameter myTexture; GL_TEXTURE_MIN_FILTER GL_LINEAR_MIPMAP_LINEAR
it behaves exactly like the problem I am experiencing with Synthclipse. Unfortunately Synthclipse won't let me enter #TexParameter.

Just a heads up on mipmaps: setting just one parameter, as you indicated, is not correct; it needs all the right stuff. This is the proper way afaik...
/// in a frag this should create a mipmapped texture with up to 1000 levels
uniform sampler2D myTexture; file[test.png]
#TexParameter myTexture GL_TEXTURE_MAX_LEVEL 1000
#TexParameter myTexture GL_TEXTURE_WRAP_S GL_REPEAT
#TexParameter myTexture GL_TEXTURE_WRAP_T GL_REPEAT
#TexParameter myTexture GL_TEXTURE_MAG_FILTER GL_LINEAR
#TexParameter myTexture GL_TEXTURE_MIN_FILTER GL_LINEAR_MIPMAP_LINEAR
|
Patryk Kizny
|
|
« Reply #37 on: December 15, 2015, 07:42:09 PM » |
|
Thanks!
|
Crist-JRoger
|
|
« Reply #38 on: March 23, 2016, 09:02:01 PM » |
|
Hi! I'm trying to run GI by Eiffie on ATI. It's a noisy renderer and it uses rand() noise for dithering. So the question is: how do I rewrite this code to use the wang_hash_fp function?

//random seed and generator
vec2 randv2=fract(cos((gl_FragCoord.xy+gl_FragCoord.yx*vec2(1000.0,1000.0))+vec2(time)*10.0+vec2(iRay,iRay))*10000.0);
vec2 rand2(){
    // implementation derived from one found at: lumina.sourceforge.net/Tutorials/Noise.html
    randv2+=vec2(1.0,1.0);
    return vec2(fract(sin(dot(randv2.xy ,vec2(12.9898,78.233))) * 43758.5453),
                fract(cos(dot(randv2.xy ,vec2(4.898,7.23))) * 23421.631));
}

upd.: found another one in Sky-Pathtracer. It looks better, but the wang_hash question is still important.

vec2 seed = viewCoord*(float(subframe)+1.0);
vec2 rand2() {
    seed+=vec2(-1,1);
    return rand2(seed);
}
|
|
« Last Edit: March 23, 2016, 09:58:24 PM by Crist-JRoger »
|
3dickulus
|
|
« Reply #39 on: March 24, 2016, 02:04:34 AM » |
|
from http://www.fractalforums.com/index.php?topic=22721.msg88910#msg88910 (ty Syntopia)

#extension GL_ARB_shader_bit_encoding : enable
#extension GL_EXT_gpu_shader4 : enable
#extension GL_ARB_gpu_shader5 : enable

uint wang_hash(uint seed) {
    seed = (seed ^ 61u) ^ (seed >> 16u);
    seed *= 9u;
    seed = seed ^ (seed >> 4u);
    seed *= 0x27d4eb2du;
    seed = seed ^ (seed >> 15u);
    return seed;
}

// Wrapper for getting from float to ints. This certainly loses precision. I imagine we could do better here.
float wang_hash_fp(float v) {
    uint ix = floatBitsToUint(v);
    return float(wang_hash(ix)) / 4294967296.0;
}

float rand(vec2 co){
    // implementation found at: lumina.sourceforge.net/Tutorials/Noise.html
    // modified for seeding with wang hash function provided by Syntopia
    return fract(sin(dot(co.xy ,vec2(wang_hash_fp(co.x),wang_hash_fp(co.y)))) * 43758.5453);
}

float rand(vec3 co){
    // implementation found at: lumina.sourceforge.net/Tutorials/Noise.html
    // modified for seeding with wang hash function provided by Syntopia
    return fract(sin(dot(co,vec3(wang_hash_fp(co.x),wang_hash_fp(co.y),wang_hash_fp(co.z)))) * 43758.5453);
}

You might be able to just use the above routines, or with Eiffie's code it might look like...

//random seed and generator
vec2 randv2=fract(cos((gl_FragCoord.xy+gl_FragCoord.yx*vec2(1000.0,1000.0))+vec2(time)*10.0+vec2(iRay,iRay))*10000.0);
vec2 rand2(){
    // implementation derived from one found at: lumina.sourceforge.net/Tutorials/Noise.html
    randv2+=vec2(1.0,1.0);
    return vec2(fract(sin(dot(randv2.xy ,vec2(wang_hash_fp(randv2.x),wang_hash_fp(randv2.y)))) * 43758.5453),
                fract(cos(dot(randv2.xy ,vec2(wang_hash_fp(randv2.x),wang_hash_fp(randv2.y)))) * 23421.631));
}

or...

vec2 seed = viewCoord*(float(subframe)+1.0);
vec2 rand2() {
    seed+=vec2(-1,1);
    return vec2(wang_hash_fp(seed.x),wang_hash_fp(seed.y));
}

Just guessing, so some experimenting will be required. If you find an optimal configuration, I would like to add these routines to MathUtils.frag.
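On the "we could do better here" comment in the wrapper: a 32-bit float has a 24-bit significand, so converting the full 32-bit hash to float rounds away the low bits, and the very largest hash values can round up so that the result is exactly 1.0. A minimal sketch of one possible alternative follows. It assumes the wang_hash() from this post and the same extensions; the name wang_hash_fp24 is hypothetical, and this has not been tried in Fragmentarium.

```glsl
// Sketch only; assumes uint support (GLSL 1.30+ or GL_EXT_gpu_shader4)
// and floatBitsToUint (GLSL 3.30+ or GL_ARB_shader_bit_encoding).
uint wang_hash(uint seed);  // prototype; the definition is the one quoted in this post

// Hypothetical variant of wang_hash_fp that keeps only the top 24 hash bits.
float wang_hash_fp24(float v) {
    uint h = wang_hash(floatBitsToUint(v));
    // h >> 8u is below 2^24, so it converts to float exactly
    // and the scaled result always stays in [0, 1).
    return float(h >> 8u) * (1.0 / 16777216.0);
}
```

It would be a drop-in replacement for wang_hash_fp in the rand() and rand2() variants above, so the two can be compared on the same scene.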
|
Crist-JRoger
|
|
« Reply #40 on: March 24, 2016, 07:42:21 PM » |
|
Something is wrong with this code:

vec2 seed = viewCoord*(float(subframe)+1.0);
vec2 rand2() {
    seed+=vec2(-1,1);
    return vec2(wang_hash_fp(seed.x),wang_hash_fp(seed.y));
}

because the result is:
[images: 200 subframes, 2000 subframes]

This looks much better:

vec2 seed = viewCoord*(float(subframe)+1.0);
vec2 rand2() {
    seed+=vec2(-1,1);
    return rand2(seed);
}

[images: 200 subframes, 2000 subframes]

The code below is horrible; it looks only a little better than the original. I didn't render it:

//random seed and generator
vec2 randv2=fract(cos((gl_FragCoord.xy+gl_FragCoord.yx*vec2(1000.0,1000.0))+vec2(time)*10.0+vec2(iRay,iRay))*10000.0);
vec2 rand2(){
    // implementation derived from one found at: lumina.sourceforge.net/Tutorials/Noise.html
    randv2+=vec2(1.0,1.0);
    return vec2(fract(sin(dot(randv2.xy ,vec2(wang_hash_fp(randv2.x),wang_hash_fp(randv2.y)))) * 43758.5453),
                fract(cos(dot(randv2.xy ,vec2(wang_hash_fp(randv2.x),wang_hash_fp(randv2.y)))) * 23421.631));
}

Any ideas on how best to apply the Wang hash in vec2 rand2()?
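One direction that might be worth trying, sketched below under stated assumptions: keep the generator state as an integer instead of hashing the bit pattern of a float seed. The Wang hash is used once per fragment to build the seed from the pixel position and subframe, and a plain 32-bit LCG then steps the state. The names rngState, initRng and nextFloat are hypothetical, subframe is assumed to be the same uniform used in the snippets above, and this is untested against these scenes.

```glsl
// Sketch only; assumes uint support (GLSL 1.30+ or GL_EXT_gpu_shader4).
uint wang_hash(uint seed);  // prototype; the definition is the one from Reply #39
uniform int subframe;       // assumed: the same uniform used in the snippets above

uint rngState;              // hypothetical per-fragment integer state

// Call once at the start of main(), before the first rand2().
void initRng() {
    uvec2 p = uvec2(gl_FragCoord.xy);
    // Nest the hash so that x, y and subframe each scramble the whole seed.
    rngState = wang_hash(p.x + wang_hash(p.y + wang_hash(uint(subframe))));
}

// One step of a 32-bit LCG (Numerical Recipes constants), as a float in [0, 1).
float nextFloat() {
    rngState = 1664525u * rngState + 1013904223u;
    // Use only the top 24 bits: the low LCG bits are weak,
    // and a 24-bit integer converts to float exactly.
    return float(rngState >> 8u) * (1.0 / 16777216.0);
}

vec2 rand2() {
    // Separate statements keep the order of the two draws well defined.
    float a = nextFloat();
    float b = nextFloat();
    return vec2(a, b);
}
```

Because the seed no longer depends on float coordinates scaled by the subframe count, neighbouring pixels and successive subframes start from unrelated states, which may or may not remove the patterns seen above.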
|
3dickulus
|
|
« Reply #41 on: March 25, 2016, 04:55:40 AM » |
|
Here are 2 images I just rendered @ 200 subframes:
1. default, as distributed in Examples/eiffieGI2/eiffieGI.frag
2. using wang hash as described above, like...

//random seed and generator
vec2 randv2=fract(cos((gl_FragCoord.xy+gl_FragCoord.yx*vec2(1000.0,1000.0))+vec2(time)*10.0+vec2(iRay,iRay))*10000.0);
vec2 rand2(){
    // implementation derived from one found at: lumina.sourceforge.net/Tutorials/Noise.html
    randv2+=vec2(1.0,1.0);
    return vec2(fract(sin(dot(randv2.xy ,vec2(wang_hash_fp(randv2.x),wang_hash_fp(randv2.y)))) * 43758.5453),
                fract(cos(dot(randv2.xy ,vec2(wang_hash_fp(randv2.x),wang_hash_fp(randv2.y)))) * 23421.631));
}

Using an nVidia GeForce 760. Just going to try Sky-Pathtracer now...
|
3dickulus
|
|
« Reply #42 on: March 25, 2016, 05:15:41 AM » |
|
Yes, the SkyPathtracer looks like cloth...

vec2 seed = viewCoord*(float(subframe)+1.0);
vec2 rand2() {
    seed+=vec2(-1,1);
    return vec2(wang_hash_fp(seed.x),wang_hash_fp(seed.y));
}

but this does not...

vec2 seed = viewCoord*(float(subframe)+1.0);
vec2 rand2n() {
    seed+=vec2(wang_hash_fp(seed.x),wang_hash_fp(seed.y));
    return rand2(seed);
}

1. default as distributed in Examples/
2. modified wang hash 2

Although SkyPathtracer looks nice, it is very slow compared to DE-Kn2.
|
3dickulus
|
|
« Reply #43 on: March 25, 2016, 06:26:51 AM » |
|
As I said, just guessing, so some experimenting will be required.
The effect of using wang hash does fix the IQ clouds blockiness bug.
|
quaz0r
Fractal Molossus
Posts: 652
|
|
« Reply #44 on: March 25, 2016, 06:58:57 AM » |
|