Author Topic: compiling Kalles Fraktaler with mingw  (Read 53913 times)
Description: success report
PieMan597
« Reply #45 on: May 04, 2017, 02:05:08 PM »

Is it at all possible for this to support OpenCL acceleration?

claude
« Reply #46 on: May 04, 2017, 03:20:28 PM »

Quote
Is it at all possible for this to support OpenCL acceleration?

Not without quite a lot of work.

I did start refactoring the formulas into separate files, using the C preprocessor to avoid code duplication; this is not finished or published yet.  But it allows a formula to be written concisely in 2 short files:

formula/Mandelbrot2.h
Code:
#ifndef MANDELBROT2_H
#define MANDELBROT2_H
#include "formula0.h"
#define TYPE 0
#define POWER 2
#include "formula.h"
#endif

formula/Mandelbrot2.cpp
Code:
#include "Mandelbrot2.h"
#define GLITCH 0.0000001
#define REFERENCE(T) \
Xrn = Xr2 - Xi2 + Cr; \
Xin = (Xr + Xi).Square() - Xr2 - Xi2 + Ci;
#define PERTURBATION(T) \
xrn = (2 * Xr + xr) * xr - (2 * Xi + xi) * xi + cr; \
xin = 2 * ((Xr + xr) * xi + Xi * xr) + ci;
#include "formula.inc"

Plus 6 calls like F(type,power) in the relevant places, and 1 line to add the formula to the GUI.
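
For readers unfamiliar with the two macros, here is a minimal standalone sketch of what they compute, using plain doubles instead of KF's number types. The struct and function names are made up for illustration and are not part of the KF source. REFERENCE is one step of Z -> Z^2 + C; PERTURBATION is one step of the offset z of a nearby pixel, obtained by expanding (Z+z)^2 + (C+c) and subtracting Z^2 + C, which leaves 2Zz + z^2 + c.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Complex { double re, im; };  // hypothetical helper type, not from KF

// One reference step, written like the REFERENCE macro:
// the imaginary part uses (Xr + Xi)^2 - Xr^2 - Xi^2 = 2 * Xr * Xi.
Complex reference_step(Complex X, Complex C) {
  const double Xr2 = X.re * X.re, Xi2 = X.im * X.im;
  const double s = X.re + X.im;
  return { Xr2 - Xi2 + C.re, s * s - Xr2 - Xi2 + C.im };
}

// One perturbation step, written like the PERTURBATION macro.
// X is the reference orbit value, x the pixel's offset from it,
// c the pixel's offset from the reference point C.
Complex perturbation_step(Complex X, Complex x, Complex c) {
  return { (2 * X.re + x.re) * x.re - (2 * X.im + x.im) * x.im + c.re,
           2 * ((X.re + x.re) * x.im + X.im * x.re) + c.im };
}

// Iterate the pixel C + c both directly and as reference + offset,
// and return the largest distance between the two results.
double max_deviation(Complex C, Complex c, int n) {
  assert(n > 0);
  Complex X{0, 0}, x{0, 0}, Y{0, 0};
  const Complex P{C.re + c.re, C.im + c.im};
  double worst = 0;
  for (int i = 0; i < n; ++i) {
    const Complex xn = perturbation_step(X, x, c);  // needs the old X
    X = reference_step(X, C);
    x = xn;
    Y = reference_step(Y, P);
    worst = std::max(worst, std::hypot(X.re + x.re - Y.re, X.im + x.im - Y.im));
  }
  return worst;
}
```

At shallow zooms like this both routes agree to rounding error; the point of perturbation is that only the reference needs high precision once the offsets become too small for direct iteration in doubles.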

Possibly this simplified formula code could be used for OpenCL too - or at least make it easier to add support.  But I don't know how to use OpenCL on Windows, much less how to get it to work correctly in WINE.

Another change I made is to remove the threaded reference calculation - the non-threaded version sped it up by a large factor (about 3x faster by elapsed wall-clock time for Dinkydau's Hyperbolic Tiling Pistil location) - either there is too much synchronisation overhead, or WINE is terrible at it.  I published a new build with this change (but not the formula refactoring); it would be great to have some benchmarks from real Windows users, as maybe it is a WINE issue: https://mathr.co.uk/kf/kf.html
« Last Edit: May 04, 2017, 11:31:11 PM by claude, Reason: new build »

Kalles Fraktaler
« Reply #47 on: May 05, 2017, 01:03:11 PM »

Quote
Another change I made is to remove the threaded reference calculation - the non-threaded version sped it up by a large factor (about 3x faster by elapsed wall-clock time for Dinkydau's Hyperbolic Tiling Pistil location) - either there is too much synchronisation overhead, or WINE is terrible at it.  I published a new build with this change (but not the formula refactoring); it would be great to have some benchmarks from real Windows users, as maybe it is a WINE issue: https://mathr.co.uk/kf/kf.html

Cool, I downloaded the latest and made comparisons.
First I tested Magnum Opus: the reference is calculated in about 5 seconds on the new version, compared to about 7 with the old, so that is an improvement.
The second test was with a very deep location, 5.55E3198, with more than a million iterations calculated.
It took 1m21s on the new version and only 45 seconds on the old, so the new version unfortunately takes almost twice as long.

I think the trick is to choose a good point where to use multi-threaded or not.

Addition: the "Hyperbolic tiling with golden ratio tiling pistil" location is also faster with multithreaded calculation on my Windows machine, though the difference is only about 30%.

* newton e3198.kfr (9.31 KB - downloaded 58 times.)
« Last Edit: May 05, 2017, 03:01:55 PM by Kalles Fraktaler »

claude
« Reply #48 on: May 07, 2017, 04:57:05 PM »

Quote
Cool, I downloaded the latest and made comparisons.
First I tested Magnum Opus: the reference is calculated in about 5 seconds on the new version, compared to about 7 with the old, so that is an improvement.
The second test was with a very deep location, 5.55E3198, with more than a million iterations calculated.
It took 1m21s on the new version and only 45 seconds on the old, so the new version unfortunately takes almost twice as long.

Here (WINE) it takes over twice as long with the old version.  So I think the conclusion is that it is a WINE issue:

version / total time as reported by kf / refs / notes
20170504 / 19m51s289 / ref 12 / without multithreaded reference calculations
20170406 / 46m02s592 / ref 12 / wineserver uses 60% CPU when computing reference

Quote
I think the trick is to choose a good point where to use multi-threaded or not.

Yes absolutely, at least make it available as an option in the menus (so that WINE users can disable it), if automagic config isn't an option.  I don't know much about Windows GUI programming but I'll try to hack it in this week.

EDIT: as of now Windows users should use the 20170406 release, WINE users should use the 20170504 release, for best performance...

EDIT: confirmed on IRC as a WINE issue:
17:19 < slackner> ClaudiusMaximus: without having reviewed the whole code, my theory would be that multithreading is done in such a way that a lot of time is wasted with synchronization
17:19 < slackner> ClaudiusMaximus: event functions always use the wineserver atm, so this would explain the performance issues
17:19 < ClaudiusMaximus> yes that's definitely the case, i'm just suprised at the huge difference between Windows and WINE (around 4x)
17:19 < ClaudiusMaximus> ok, and wineserver is single-threaded?
17:20 < nsivov> it is
17:20 < slackner> ClaudiusMaximus: yes. in most cases that shouldn't matter though, usually wineserver calls are rare and can be executed very quickly
17:21 < ClaudiusMaximus> yes this code is a bit pathological i guess

EDIT:  I investigated other synchronisation methods (alternatives to Win32 events, i.e. SetEvent() plus a wait call such as WaitForSingleObject()); here are some rough results:
Code:
first test comparing barrier methods: time to complete newton 5e3198 with 12 refs

version / wall-clock time / cpu load for ref / barrier method / barrier submethod
yieldin2 / 21m12s288 / 245% / centralized barrier / yielding spinlocks
barrier2 / 25m01s879 / 270% / centralized barrier / non-yielding spinlocks
yielding / 22m29s161 / 220% / semaphore barrier / yielding spinlocks
barriers / 31m17s617 / 250% / semaphore barrier / non-yielding spinlocks
20170504 / 19m51s289 / 100% / without multithreaded reference calculations
20170406 / 46m02s592 / 160% (of which 60% wineserver)                                   

second test investigating cpu thread affinity set in various ways:   

test using htilepistil.kfr 1e2293 ldbl
isbest / method / time for first pixels / cpu load for ref / notes
* / affinity /  3m31s / 200% / 3 threads on 2 cores (any thread / any core)
  / affinity /  4m35s / 290% / 3 threads on 3 cores (one thread / one core)
  / taskset4 /  4m00s / 230% / wine uses 4 cores
  / taskset3 /  3m50s / 220% / wine uses 3 cores
  / taskset2 /  3m30s / 200% / wine uses 2 cores
  / taskset1 /  4m29s / 100% / wine uses 1 core
  / yieldin2 /  3m45s / 230%
  / 20170504 /  3m34s / 100%
  / 20170406 / 18m47s / 175% (of which 65% wineserver) 

test using e10000.kfr 1e10002 floatexp
isbest / method / time for first ref 100% / cpu load for ref / notes
  / affinity / 3m00s / 200% / 3 threads on 2 cores (any thread / any core)
* / affinity / 2m06s / 290% / 3 threads on 3 cores (one thread / one core)
  / yieldin2 / 2m21s / 260%
  / 20170504 / 4m18s / 100%   
  / 20170406 / 6m25s / 150% (of which 40% wineserver)

test using newton 5e3198 ldbl
isbest / method / time for first pixels / cpu load for ref / notes
* / affinity / 0m48s / 200% / 3 threads on 2 cores (any thread / any core)
* / affinity / 0m48s / 290% / 3 threads on 3 cores (one thread / one core)
  / yieldin2 / 0m52s / 245%
  / barrier2 / 1m08s / 270%
  / 20170504 / 0m55s / 100%
  / 20170406 / 2m58s / 165% (of which 65% wineserver)

To conclude I'll use SetThreadAffinityMask(3) for ldbl and SetThreadAffinityMask(1<<threadid) for floatexp, which gives the best performance in these tests in WINE 64bit / Debian Stretch / AMD64 quad core processor.  I'll post again with a new build soon.
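
The two masks above are shorthand: the Win32 call is SetThreadAffinityMask(thread handle, mask), where each set bit in the mask allows one logical core. A small sketch of how the mask could be chosen per worker thread follows; the enum and function names are hypothetical, and only the two mask values come from the conclusion above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names for the two number types mentioned in the tests.
enum class NumberType { LongDouble, FloatExp };

// Affinity mask for reference worker `thread_id` (0-based).
// ldbl:     mask 3 (binary 11), every worker may run on core 0 or core 1.
// floatexp: mask 1 << thread_id, each worker is pinned to its own core.
std::uint64_t reference_affinity_mask(NumberType type, unsigned thread_id) {
  assert(thread_id < 64);
  if (type == NumberType::LongDouble) return 3;
  return std::uint64_t{1} << thread_id;
}

// On Windows the result would be applied from inside the worker with
// something like SetThreadAffinityMask(GetCurrentThread(), mask);
// that call is left out here so the sketch compiles anywhere.
```

With three workers this gives masks 3, 3, 3 for ldbl and 1, 2, 4 for floatexp, matching the "3 threads on 2 cores" and "3 threads on 3 cores" rows in the tables.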
« Last Edit: May 08, 2017, 06:38:48 AM by claude, Reason: barriers »

claude
« Reply #49 on: May 08, 2017, 02:54:14 PM »

New build up at https://mathr.co.uk/kf/kf.html with barrier() semantics for synchronisation of the threaded reference calculations; this performs well on WINE, hopefully on real Windows too.  Let me know how it works for you.
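
To give an idea of what barrier semantics means here, below is a minimal portable sketch of a centralized barrier with yielding spin waits, written with standard C++ atomics. It is an assumption about the general technique, not the code in the build; the class and function names are invented for the example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Centralized barrier: the last thread to arrive starts a new generation,
// the others spin (yielding the CPU) until they see the generation change.
class SpinBarrier {
public:
  explicit SpinBarrier(int count) : count_(count), waiting_(0), generation_(0) {
    assert(count > 0);
  }
  void wait() {
    const unsigned gen = generation_.load();
    if (waiting_.fetch_add(1) + 1 == count_) {
      waiting_.store(0);            // reset before releasing the others
      generation_.store(gen + 1);   // release everyone waiting on `gen`
    } else {
      while (generation_.load() == gen)
        std::this_thread::yield();  // yielding spin, no kernel event objects
    }
  }
private:
  const int count_;
  std::atomic<int> waiting_;
  std::atomic<unsigned> generation_;
};

// Run `threads` workers in lockstep for `steps` iterations and report
// whether every worker always saw all peers at the same step.
bool lockstep_ok(int threads, int steps) {
  assert(threads > 0 && steps >= 0);
  SpinBarrier barrier(threads);
  std::vector<int> progress(threads, 0);
  std::atomic<bool> ok{true};
  std::vector<std::thread> pool;
  for (int t = 0; t < threads; ++t)
    pool.emplace_back([&, t] {
      for (int s = 1; s <= steps; ++s) {
        progress[t] = s;   // stand-in for this thread's share of iteration s
        barrier.wait();    // all shares of iteration s are done
        for (int p : progress)
          if (p != s) ok = false;
        barrier.wait();    // all checks done, safe to overwrite progress
      }
    });
  for (auto &th : pool) th.join();
  return ok.load();
}
```

Because waiting threads only touch shared memory and call yield, nothing in the wait path needs a round trip to a separate server process, which is the cost the IRC discussion attributed to event objects under WINE.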

Adam Majewski
« Reply #50 on: May 08, 2017, 06:06:13 PM »

Quote
New build up at https://mathr.co.uk/kf/kf.html with barrier() semantics for synchronisation of the threaded reference calculations; this performs well on WINE, hopefully on real Windows too.  Let me know how it works for you.

wine kf.exe
fixme:heap:HeapSetInformation 0x240000 0 0x23fe10 4
fixme:heap:HeapSetInformation (nil) 1 (nil) 0
fixme:advapi:RegisterEventSourceW ((null),L"Bonjour Service"): stub
fixme:advapi:ReportEventA (0xcafe4242,0x0004,0x0000,0x00000064,(nil),0x0001,0x00000000,0x54dfc0,(nil)): stub
fixme:advapi:ReportEventW (0xcafe4242,0x0004,0x0000,0x00000064,(nil),0x0001,0x00000000,0x6f9c0,(nil)): stub
fixme:winsock:WSAIoctl WS_SIO_UDP_CONNRESET stub
fixme:winsock:WSAIoctl -> SIO_ADDRESS_LIST_CHANGE request: stub
fixme:iphlpapi:DeleteIpForwardEntry (pRoute 0x54e350): stub
fixme:iphlpapi:CreateIpForwardEntry (pRoute 0x54e318): stub
fixme:advapi:ReportEventA (0xcafe4242,0x0004,0x0000,0x00000064,(nil),0x0001,0x00000000,0x54dfc0,(nil)): stub
fixme:advapi:ReportEventW (0xcafe4242,0x0004,0x0000,0x00000064,(nil),0x0001,0x00000000,0x70240,(nil)): stub
fixme:service:EnumServicesStatusW resume handle not supported
fixme:service:EnumServicesStatusW resume handle not supported
fixme:advapi:ReportEventA (0xcafe4242,0x0004,0x0000,0x00000064,(nil),0x0001,0x00000000,0x54dfc0,(nil)): stub
fixme:advapi:ReportEventW (0xcafe4242,0x0004,0x0000,0x00000064,(nil),0x0001,0x00000000,0x70240,(nil)): stub
fixme:netapi32:NetGetJoinInformation Semi-stub (null) 0x54e038 0x54e030
fixme:winsock:WSAIoctl WS_SIO_UDP_CONNRESET stub
fixme:advapi:ReportEventA (0xcafe4242,0x0001,0x0000,0x00000064,(nil),0x0001,0x00000000,0x54d4d0,(nil)): stub
fixme:advapi:ReportEventW (0xcafe4242,0x0001,0x0000,0x00000064,(nil),0x0001,0x00000000,0x70240,(nil)): stub
err:eventlog:ReportEventW L"mDNSCoreReceiveResponse: Received from 192.168.1.8:5353   16 zelman.local. AAAA FE80:0000:0000:0000:889C:3EA7:6CEA:BDC9"
fixme:advapi:ReportEventA (0xcafe4242,0x0001,0x0000,0x00000064,(nil),0x0001,0x00000000,0x54d4d0,(nil)): stub
fixme:advapi:ReportEventW (0xcafe4242,0x0001,0x0000,0x00000064,(nil),0x0001,0x00000000,0x70240,(nil)): stub
err:eventlog:ReportEventW L"mDNSCoreReceiveResponse: ProbeCount 2; will deregister    4 zelman.local. Addr 192.168.1.8"
fixme:advapi:ReportEventA (0xcafe4242,0x0001,0x0000,0x00000064,(nil),0x0001,0x00000000,0x54d3e0,(nil)): stub
fixme:advapi:ReportEventW (0xcafe4242,0x0001,0x0000,0x00000064,(nil),0x0001,0x00000000,0x70240,(nil)): stub
err:eventlog:ReportEventW L"Local Hostname zelman.local already in use; will try zelman-2.local instead"
fixme:advapi:ReportEventA (0xcafe4242,0x0001,0x0000,0x00000064,(nil),0x0001,0x00000000,0x54d4d0,(nil)): stub
fixme:advapi:ReportEventW (0xcafe4242,0x0001,0x0000,0x00000064,(nil),0x0001,0x00000000,0x70240,(nil)): stub
err:eventlog:ReportEventW L"mDNSCoreReceiveResponse: Received from 192.168.1.8:5353   14 8.1.168.192.in-addr.arpa. PTR zelman.local."
fixme:advapi:ReportEventA (0xcafe4242,0x0001,0x0000,0x00000064,(nil),0x0001,0x00000000,0x54d4d0,(nil)): stub
fixme:advapi:ReportEventW (0xcafe4242,0x0001,0x0000,0x00000064,(nil),0x0001,0x00000000,0x70240,(nil)): stub
err:eventlog:ReportEventW L"mDNSCoreReceiveResponse: Unexpected conflict discarding   16 8.1.168.192.in-addr.arpa. PTR zelman-2.local."

claude
« Reply #51 on: May 08, 2017, 06:29:02 PM »

Quote
wine kf.exe
...snip...

does wine work with other programs?

which wine version do you have?

looks like network autodiscovery stuff, but kf.exe doesn't use any networking, so I guess this is a wine issue...

Adam Majewski
« Reply #52 on: May 08, 2017, 07:01:06 PM »

Quote
does wine work with other programs?

which wine version do you have?

looks like network autodiscovery stuff, but kf.exe doesn't use any networking, so I guess this is a wine issue...

The KF program works for me (maybe that was not clear from the output)
wine --version
wine-1.6.2

claude
« Reply #53 on: May 08, 2017, 07:21:00 PM »

Quote
The KF program works for me (maybe that was not clear from the output)
wine --version
wine-1.6.2

Oh, if it works then no worries :)

For reference, I'm using
$ wine --version  # most of the time
wine-1.8.7 (Debian 1.8.7-2)
$ wine-development --version  # rare tests
wine-2.0 (Debian 2.0-3)

Dinkydau
« Reply #54 on: May 24, 2017, 03:32:25 PM »

The scientific notation bug appears to be back since 20170330.1
Edit: I downloaded 20170330, the version in which the bug was supposed to be fixed, but it shows scientific notation too.

claude
« Reply #55 on: May 24, 2017, 06:18:20 PM »

Quote
The scientific notation bug appears to be back since 20170330.1
Edit: I downloaded 20170330, the version in which the bug was supposed to be fixed, but it shows scientific notation too.

Ok, I think it's because I switched from scientific to default, which is a mix of non-scientific and scientific depending on size - I'll switch it to explicitly fixed next time I touch the code.  Thanks for testing.
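
To illustrate the difference between the three notations, here is a small sketch using standard C++ stream manipulators on a double. It assumes iostream-style formatting; the actual KF code path (and its number types) is not shown in this thread, and the names below are made up.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

enum class FloatStyle { Scientific, Default, Fixed };  // hypothetical names

// Format `v` with the given notation and precision.
// Default picks fixed or scientific depending on the size of the value,
// which is why small numbers can still come out as 1.234e-06.
std::string format_value(double v, FloatStyle style, int precision) {
  assert(precision >= 0);
  std::ostringstream os;
  os << std::setprecision(precision);
  if (style == FloatStyle::Scientific) os << std::scientific;
  else if (style == FloatStyle::Fixed) os << std::fixed;
  os << v;
  return os.str();
}
```

For example, 0.5 prints as 0.5 in the default style, but 0.000001234 prints as 1.234e-06 unless Fixed is requested explicitly (0.000001234 with precision 9).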

claude
« Reply #56 on: July 03, 2017, 09:20:42 AM »

new build at https://mathr.co.uk/kf/kf.html

Quote
kf-2.11.1+gmp.20170703 XML formulas
   
    - formulas now generated at compile time from
      formula definition XML using XSL stylesheet
   
    - used fixed format floats instead of scientific
   
    - try to hide command prompt window on Windows

The XML formula definition file is here:  https://code.mathr.co.uk/kalles-fraktaler-2/blob/refs/heads/formulas:/formula/formula.xml
As you can see it strips out all the repetitive code.

It should be possible to generate OpenCL code from these formula definitions, at least for double precision (no long double in CL, and not sure if there is operator overloading for floatexp).
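
As background on why operator overloading matters for floatexp, here is a rough sketch of such a type: a double mantissa with a separate wide exponent, so values far below the double range stay representable. The field layout and names are assumptions for illustration and may differ from KF's real floatexp class.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// value = mantissa * 2^exponent, with |mantissa| in [0.5, 1) or zero.
struct FloatExp {
  double mantissa;
  std::int64_t exponent;

  FloatExp(double m = 0.0, std::int64_t e = 0) {
    assert(std::isfinite(m));
    int shift = 0;
    mantissa = std::frexp(m, &shift);       // normalise into [0.5, 1)
    exponent = (m == 0.0) ? 0 : e + shift;  // keep zero canonical
  }
};

// C++ form: formulas can be written as a * b.
inline FloatExp operator*(FloatExp a, FloatExp b) {
  return FloatExp(a.mantissa * b.mantissa, a.exponent + b.exponent);
}

// Function-call form: what the same formula has to look like in a
// language without operator overloading, e.g. fe_mul(a, b).
inline FloatExp fe_mul(FloatExp a, FloatExp b) {
  return FloatExp(a.mantissa * b.mantissa, a.exponent + b.exponent);
}
```

Squaring 1e-200 underflows to zero as a plain double, but as a FloatExp it keeps a non-zero mantissa with exponent -1328. A formula written with operators would need every a * b rewritten as fe_mul(a, b) for an OpenCL 1.2 kernel.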

claude
« Reply #57 on: July 06, 2017, 04:02:48 PM »

some very preliminary (mostly broken) OpenCL fiddling here:

https://code.mathr.co.uk/kalles-fraktaler-2/log/refs/heads/formulas

to get it to build needs a Haskell development setup (ghc and cabal), as well as xsltproc - this is for a custom preprocessor to work around OpenCL 1.2's lack of operator overloading.

you need to git clone https://github.com/martijnberger/clew next to the kalles-fraktaler-2 directory also, as I couldn't figure out how to link OpenCL otherwise

OpenCL seems to work for very shallow zooms, even with force floatexp enabled, but as soon as going past 1e3 it flakes out - disabling series approximation and glitch correction helps slightly, but it's mostly made of fail

won't have time to hack on this code for a few weeks, so leaving it here in case someone manages to get it working in the meantime...

EDIT here's a binary: https://mathr.co.uk/kf/kf-with-opencl.zip

« Last Edit: July 06, 2017, 04:09:07 PM by claude, Reason: binary »

Dinkydau
« Reply #58 on: July 08, 2017, 04:38:47 PM »

The latest version crashes upon loading KFR files with a great depth such as E8383.

Also something else:
I'm interested in running Kalles Fraktaler in a virtual machine. I want to do that because it would allow me to save the state of the operating system while a heavy render is going on. If something causes the computer to stop working, such as power loss, I can resume the render from the last saved state.

I compared some render times. There are 4 relevant types of calculations:
1. full precision reference pixels
2. series approximation
3. other pixels
4. Newton-raphson zooming, which could be split up into 4.1: finding period, 4.2: Newton-Raphson steps, 4.3: determining size of the minibrot.

First I tried windows 10 because it's easy to get a legal (trial) ISO of it.
Total render time is a little slower. Reference computation seems to be significantly slower.
Newton-Raphson zooming is about 5 times slower. (100 seconds versus 20 in a test). Especially finding the period is a lot slower. The other parts are slower as well.
This holds for several versions.
Mandel Machine and Fractal eXtreme perform pretty much exactly like how they do in my not-virtual windows 7, so the windows 10 virtual machine will be useful to render with Mandel Machine but not with Kalles fraktaler.

Then I installed Debian stretch with wine because it's what Claude uses, so I assumed it must be the most suitable to run his Kalles fraktaler fork.
Newton-Raphson zooming is still slower but only 2.5 times as slow. Also here, especially finding the period is a lot slower, but not as slow as in windows 10.
Comparing virtual Debian and non-virtual Windows 7, the total render time is equal when using the latest version of KF+gmp and a KFR file with depth E1886. Relative to that baseline:
Version 20170406 is about 8% faster in windows 7 (not virtual) but A LOT slower in Debian.
Version 20170508 is about 8% faster in windows 7 (not virtual) and 7% slower in Debian.

20170406 is the best version on Windows 7, seemingly equal to 20170508. 20170508 is the best version in Debian because it's the fastest one that doesn't hang upon loading great depths.
In the end 20170508 in Debian seems to be about 12% slower than 20170406 in not virtual Windows 7. I would accept that percentage if it means I won't have to be afraid to lose renders anymore but more tests with greater depths are required to be sure if it's worth doing. If the latest version didn't hang, it would be better.

Newton-Raphson zooming is so much faster on windows 7 (not virtual) that it's worth taking the risk of losing the progress right now.
I wondered how this could be and it came to my mind that Fractal eXtreme and Mandel Machine have all the important parts (that take a lot of time to run) written in assembly language. Could that have something to do with the fact that those programs don't seem to have any performance hits?

I will see if I can find my windows 7 installation DVD and make a virtual machine with that too to compare.

claude
« Reply #59 on: July 09, 2017, 08:29:55 PM »

Quote
The latest version crashes upon loading KFR files with a great depth such as E8383.

At least here (Debian 9 Stretch WINE) it looks like it hangs, but watching the system process monitor one can see that it calculates the reference (with threads) and then proceeds to calculate the pixels (with lots of threads).  But the main thread seems to be blocked, so no GUI updates occur.  Testing with the e10000.kfr location.  Not sure what will happen when the pixels finish calculating...

Quote
Also something else:
I'm interested in running kalles fraktaler in a virtual machine. I want to do that because it would allow me to save the state of the operating system while a heavy render is going on. If there's something that causes the computer to stop working such as power loss I can resume the render from the last saved state.

Yes that's a good way to do it without having to add such support to the program (which would be a big task afaict).

I suspect the slower parts have to do with syscalls (context switches to the kernel of the virtual machine), that might come from things that update the GUI.  Perhaps some throttling/speed limiting could benefit here, with the Newton-Raphson zooming period finding especially.  I'll put it on the todo list...
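
A sketch of what such throttling could look like: a small helper that lets a progress update through only if enough time has passed since the previous one. This is an illustration of the idea only, with invented names, and says nothing about how KF's GUI code is structured.

```cpp
#include <cassert>
#include <chrono>

// Allows at most one GUI update per `min_interval`.
class UpdateThrottle {
public:
  using Clock = std::chrono::steady_clock;

  explicit UpdateThrottle(Clock::duration min_interval)
      : min_interval_(min_interval), has_last_(false) {
    assert(min_interval >= Clock::duration::zero());
  }

  // Returns true if an update should be performed at time `now`.
  // The first call always returns true.
  bool should_update(Clock::time_point now) {
    if (has_last_ && now - last_ < min_interval_) return false;
    last_ = now;
    has_last_ = true;
    return true;
  }

private:
  Clock::duration min_interval_;
  Clock::time_point last_;
  bool has_last_;
};
```

A period-finding loop could then call should_update(UpdateThrottle::Clock::now()) on every iteration and only touch the window when it returns true, so the number of GUI-related syscalls no longer scales with the iteration count.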

Quote
If the latest version didn't hang, it would be better.

Of course.  Thanks for the bug report, indeed I hadn't tested the deep zoom levels.