1)
Message boards :
Number crunching :
Incorrectly reported CPU time
(Message 6230)
Posted 7 Oct 2014 by valterc Post: I recently had at least two workunits that reported very long execution times but did not actually run that long. See, for example, these two results:
http://moowrap.net/result.php?resultid=30220457
http://moowrap.net/result.php?resultid=30226010
In the first case you can see Run time 12 hours 26 min 3 sec, but the task started at 12:24:59 CEST (10:24:59 UTC) and ended at 12:59:14 CEST (10:59:14 UTC), so it ran for a normal length of time (about 34 minutes, typical for the HD6950). Something must be wrong in the date/time calculations, maybe a time zone issue.
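A quick check of the arithmetic above, as a Python sketch. It uses timezone-aware datetimes so the CEST and UTC readings are compared as the same instants. The calendar date is a hypothetical placeholder (the post only gives clock times), and CEST is modelled as a fixed UTC+2 offset.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset zone for CEST (UTC+2). The date below is a placeholder;
# only the clock times come from the task log.
CEST = timezone(timedelta(hours=2), "CEST")
start_local = datetime(2014, 10, 6, 12, 24, 59, tzinfo=CEST)
end_local = datetime(2014, 10, 6, 12, 59, 14, tzinfo=CEST)

# The same two instants written in UTC.
start_utc = datetime(2014, 10, 6, 10, 24, 59, tzinfo=timezone.utc)
end_utc = datetime(2014, 10, 6, 10, 59, 14, tzinfo=timezone.utc)

# Aware datetimes compare and subtract correctly across zones,
# so both readings must give the same wall-clock duration.
actual = end_local - start_local
assert actual == end_utc - start_utc

reported = timedelta(hours=12, minutes=26, seconds=3)
print("actual run time:  ", actual)
print("reported run time:", reported)
print("discrepancy:      ", reported - actual)
```

Both readings give 34 min 15 s of wall time against a reported 12 h 26 min 3 s. The 11 h 51 min 48 s gap is not a whole number of hours, so a fixed time-zone offset alone may not account for it.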
2)
Message boards :
News :
New app v1.4 with OpenCL support deployed
(Message 6184)
Posted 16 Sep 2014 by valterc Post: I tested the new app on different platforms and everything went well. This new version solved a lot of annoying issues; you did a good job! Two more things would be useful:
- A way to select between OpenCL and ATI/CAL/Stream (Stream is much better for HD6xxx cards).
- A way to choose the right core for OpenCL applications in the preferences pages. For instance, the default (auto) selected core (core #1 CL 1-pipe) is slower than core #3 (according to benchmark results) on my R9 290X.
3)
Message boards :
Number crunching :
Dual+ GPUs (Problems and possible solutions)
(Message 6152)
Posted 5 Sep 2014 by valterc Post: That's great! Any improvement will help keep this project alive, and will surely attract more users. Just one thing: do you have any benchmarks of the OpenCL application running on pre-Tahiti AMD/ATI GPUs (5xxx, 6xxx)? If there is a significant drop in performance, it may be worth also keeping the ATI/CAL applications alive (the one based on dnetc518-win32-x86-stream, perhaps adding -devicenum to its runtime flags). Thank you
4)
Message boards :
Number crunching :
Dual+ GPUs (Problems and possible solutions)
(Message 6145)
Posted 2 Sep 2014 by valterc Post: Quoting the previous reply: "You might send this to the BOINC mailing list, as the dual Nvidia GPUs in a single machine problem you are seeing has been a thorn for a LONG time!! For some people at some projects it works just fine; for other people at the same project it does not work at all. You fill up the cache of one GPU and the other GPU sits idle. The ONLY solution so far is to put the two GPUs on different projects; then they both work fine."
Well, this issue is not related to Nvidia at all. The problem here is that the DNETC application will try, by default, to use ALL the GPUs it sees. Since it is NOT a BOINC application (it needs a wrapper in order to run under BOINC), it simply ignores any BOINC directive or setup. With BOINC v7+ the app requests 2 GPUs but will not start at all unless you specify <gpu_usage>1</gpu_usage> inside an app_config.xml. Even in this case the application will use ALL the GPUs, thus conflicting with other GPU applications BOINC may want to run. To be more specific, suppose that I attach to Milkyway and Moo! on a dual-GPU machine with BOINC v7:
- If I do nothing: BOINC downloads both Moo and MW WUs; Moo requests 2 GPUs but does not start at all (this may be a BOINC issue). Result: the Moo tasks eventually time out and both GPUs run MW.
- If I modify app_config.xml as above: Moo now requests just one GPU and the application starts, say on the first GPU, but actually uses *ALL* (two) GPUs. BOINC thinks the second GPU is free and starts another Moo or an MW task on it. So you may end up with one GPU running two different tasks at the same time...
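A toy model of the second scenario, written as a Python sketch: BOINC books each task on the one GPU it assigned, while the unmodified dnetc cruncher (labelled "moo" here) runs on every visible GPU. The app labels and the "grabs every GPU" rule are simplifications taken from the description above, not measured behaviour.

```python
from collections import Counter

def boinc_view(n_gpus, assignments):
    """Tasks per GPU as BOINC accounts for them: each task on its assigned GPU."""
    load = Counter(gpu for _app, gpu in assignments)
    return [load[g] for g in range(n_gpus)]

def actual_load(n_gpus, assignments, grabs_all=("moo",)):
    """Tasks per GPU when apps listed in grabs_all ignore BOINC and use every GPU."""
    load = Counter()
    for app, gpu in assignments:
        if app in grabs_all:
            for g in range(n_gpus):   # dnetc starts a cruncher on each GPU it sees
                load[g] += 1
        else:
            load[gpu] += 1            # a well-behaved BOINC app stays on its GPU
    return [load[g] for g in range(n_gpus)]

# Moo booked on GPU 0 (gpu_usage = 1), Milkyway booked on GPU 1.
tasks = [("moo", 0), ("milkyway", 1)]
print("BOINC thinks:", boinc_view(2, tasks))
print("actually:    ", actual_load(2, tasks))
```

BOINC sees one task per GPU, but GPU 1 really carries both the Moo cruncher and the MW task, which is the double-booking described above.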
5)
Message boards :
Number crunching :
Dual+ GPUs (Problems and possible solutions)
(Message 6141)
Posted 1 Sep 2014 by valterc Post: Hi all. There are at least three annoying problems when running Moo on a dual+ GPU setup. I have experienced them on a box with two ATI/AMD 6900 series (Cayman) cards:
- With BOINC v7+ the application won't start unless you use an app_config.xml (something normal users should not have to do).
- The dnetc application will try to use all the GPUs, simply ignoring any BOINC setup, thus creating conflicts with other GPU applications.
- If the application uses all the GPUs, there are 'waiting for other thread' problems when the input packets cannot be split evenly among the GPUs (e.g. an odd number of packets on two GPUs) or when the GPUs crunch at different speeds.
What I propose is a very simple change: just get rid of the multi-GPU capability of the dnetc application altogether. The application itself uses a configuration file (dnetc-gpu-1.3.ini) which is copied into the slots directory before running it. The content of the current one is:
[buffers]
If we add a "max-threads=1" line to the [processor-usage] section, we tell the application to use *just* one thread (which means one GPU). However, with this addition it will *always* use device 0. But if it is also started with the -devicenum <n> command-line argument (run on device <n> only), all the problems related to this issue will be solved. This should be just a minor change in the wrapper code.
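The proposed wrapper change could look roughly like the Python sketch below: add max-threads=1 under [processor-usage] and append -devicenum <n> to the command line. The other flags are copied from the wrapper log line quoted in message 2188; how the wrapper learns which device BOINC assigned is not shown and is left as an input (assumption). Whether dnetc accepts -devicenum together with -multiok=1 is also untested.

```python
import configparser
import io
from typing import List

def single_gpu_ini(ini_text: str) -> str:
    """Return the ini text with max-threads=1 set in [processor-usage]."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.optionxform = str              # keep key spelling exactly as written
    cfg.read_string(ini_text)
    if not cfg.has_section("processor-usage"):
        cfg.add_section("processor-usage")
    cfg.set("processor-usage", "max-threads", "1")   # one thread = one GPU
    out = io.StringIO()
    cfg.write(out, space_around_delimiters=False)    # write "key=value"
    return out.getvalue()

def dnetc_command(exe: str, ini_name: str, device: int) -> List[str]:
    """Command line pinning the cruncher to the GPU BOINC assigned (device)."""
    return [exe, "-ini", ini_name, "-runoffline", "-multiok=1",
            "-devicenum", str(device)]

print(single_gpu_ini("[buffers]\n"))
print(dnetc_command("dnetc518-win32-x86-stream.exe", "dnetc.ini", 1))
```

Using configparser rather than appending a line keeps an existing [processor-usage] section intact instead of creating a duplicate one.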
6)
Message boards :
Number crunching :
Dual GPU setup with the latest BOINC client
(Message 5736)
Posted 17 Dec 2013 by valterc Post: Thanks Mikey for your hints. I will try this later and report back, even though I have the bad feeling that the Moo tasks will use both GPUs regardless of the BOINC configuration... we'll see. Anyway, do you agree with me that this kind of micromanagement is far too complicated for the casual user?
7)
Message boards :
Number crunching :
Dual GPU setup with the latest BOINC client
(Message 5730)
Posted 16 Dec 2013 by valterc Post: As many of you already know, running Moo on dual+ GPU hardware with the latest BOINC client is really complicated. We have the following possible scenarios:
1- Revert to BOINC 6.xxx, thus losing any OpenCL capability.
2- Do nothing: tasks are downloaded but never start.
3- app_config with <gpu_usage>1.00</gpu_usage> (I tested it on a couple of HD6950s and noticed a *huge* performance drop).
4- Same as above but also adding <max_concurrent>1</max_concurrent> (this works, but *only* if Moo is the only GPU project you run).
The problem is that, regardless of the configuration you use (3 or 4), a Moo workunit will always notice that you have a dual-GPU setup and try to start two "crunchers" (in the third scenario the result is 4 threads fighting over just 2 GPUs). This behaviour comes by design from the original distributed.net dnetc client. I don't think the Moo administrators can do anything here to solve this... But with the source code of the client and a little knowledge and time (which I don't have right now...), it shouldn't be too difficult to insert a couple of lines that completely disable the client's multithreading capability, thus solving a lot of problems here. Running Moo out of the box is getting more and more complicated... The lack of proper support for the latest AMD Tahiti (and newer) GPUs and problems like the one I just described are continuously pushing users away from this project...
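Scenarios 3 and 4 differ only in the contents of app_config.xml. A Python sketch that writes both variants with the standard library; the app name "dnetc" is an assumption (check the short name your client uses for Moo!), and the optional <cpu_usage> element is included only because BOINC's app_config format accepts it next to <gpu_usage>.

```python
import xml.etree.ElementTree as ET
from typing import Optional

def app_config_xml(app_name: str, gpu_usage: float = 1.0,
                   cpu_usage: Optional[float] = None,
                   max_concurrent: Optional[int] = None) -> str:
    """Build an app_config.xml body for a single app."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    if max_concurrent is not None:
        # scenario 4: never run more than this many tasks of the app at once
        ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    gpu = ET.SubElement(app, "gpu_versions")
    # scenario 3: tell BOINC each task needs this fraction of one GPU
    ET.SubElement(gpu, "gpu_usage").text = f"{gpu_usage:.2f}"
    if cpu_usage is not None:
        ET.SubElement(gpu, "cpu_usage").text = f"{cpu_usage:.2f}"
    return ET.tostring(root, encoding="unicode")

print(app_config_xml("dnetc"))                    # scenario 3
print(app_config_xml("dnetc", max_concurrent=1))  # scenario 4
```

As the post explains, neither file stops each running task from starting a cruncher on every GPU; these settings only change how BOINC books the tasks.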
8)
Message boards :
News :
Transitioner problem solved
(Message 4970)
Posted 28 May 2013 by valterc Post: I have one (just one) WU that has been in the 'pending validation' state since 23 May. This one:
http://moowrap.net/results.php?userid=1662&offset=0&show_names=1&state=2&appid=
Maybe there are others around...
9)
Message boards :
Number crunching :
132k sec. to finish a wu?
(Message 2761)
Posted 24 Feb 2012 by valterc Post: More instructions about the script:
a) Install Perl (a scripting language similar to PHP). Get it from the ActiveState site; it's free: download the free Community Edition for your architecture, 32 or 64 bit.
b) Paste the code into Notepad and save it somewhere with a name like cpu_watch.pl.
c) Using Notepad, modify some of the script's variables if needed: check the BOINC location and the limit (currently 5 minutes of CPU time).
d) You can try it by double-clicking it...
e) Schedule it to run periodically (e.g. every 10 minutes) using the Windows Task Scheduler or the Windows at command from a cmd shell.
If you run the script every 10 minutes with the 5-minute limit, you will lose at worst ~15 minutes of GPU crunching. In the same folder where you put the script you will find the log file (cpu_watch.log). If the script suspends a WU, you can later decide whether to abort or resume it. Hope this helps; tell me if you find problems.
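Where the ~15 minute figure comes from, as a rough bound: with check interval T and CPU-time limit L, and assuming a runaway task accumulates CPU time at about one CPU-second per wall-clock second (a single busy thread), the task needs about L of wall time to cross the limit, and in the worst case it crosses just after a check and runs up to T more before the next one suspends it.

```latex
t_{\text{lost}} \;\le\; L + T \;=\; 5\,\text{min} + 10\,\text{min} \;=\; 15\,\text{min}
```

If the task burns CPU time faster than wall time (several busy threads), it reaches L sooner and the bound shrinks.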
10)
Message boards :
Number crunching :
132k sec. to finish a wu?
(Message 2749)
Posted 23 Feb 2012 by valterc Post: I also got a few of them, maybe 2 or 3 in the last two months. It seems that, for as yet unknown reasons, the computation is directed to the CPU instead of the GPU. I don't know how to solve this. Meanwhile, I wrote a little script that simply suspends a task if its CPU time is greater than a user-defined limit. Here it is:

    use strict;
    use warnings;
    use POSIX qw(strftime);
    use FindBin;   # folder containing this script (the log file is written there)

    my $path  = "C:\\Program Files\\boinc";  # BOINC folder containing boinccmd.exe
    my $url   = 'http://moowrap.net/';
    my $task  = 'dnetc';                     # substring identifying Moo! tasks
    my $limit = 5*60;                        # CPU-time limit in seconds

    my @tasks = `"$path\\boinccmd" --get_results`;
    for my $i (0 .. $#tasks) {
        # each task block starts with a header line like "1) -----------"
        next unless $tasks[$i] =~ /^(\d+)\)/;
        my $n = $i + 1;                      # the "name:" line follows the header
        next unless defined $tasks[$n] && $tasks[$n] =~ /$task/;
        my ($name) = $tasks[$n] =~ /name: (\S+)/;
        # the +12/+16 offsets depend on the boinccmd output layout of your version
        my ($state) = ($tasks[$n + 12] // '') =~ /state: (\d+)/;
        next unless defined $state && $state == 1;   # check if running
        my ($cputime) = ($tasks[$n + 16] // '') =~ /time: ([\d.]+)/;
        next unless defined $cputime;
        print $name, " ", $cputime;
        if ($cputime > $limit) {
            print " (suspending)\n";
            `"$path\\boinccmd" --result $url $name suspend`;
            open(my $log, '>>', "$FindBin::Bin/cpu_watch.log") or die "cannot open log: $!";
            my $now = strftime("%Y-%m-%d %H:%M:%S - ", localtime);
            print $log $now, $name, "\n";    # one line per suspended task
            close($log);
        } else {
            print "\n";
        }
    }

It's written in Perl (get it from ActiveState); schedule it to run every (say) 10 minutes... It should be self-explanatory; just drop me a note if you need help.
Edit: argh, the code tag doesn't preserve indentation...
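The Perl script reads the state and CPU time at fixed line offsets (+12 and +16), which can shift between BOINC versions. A Python sketch of the detection part that looks fields up by their label instead; it assumes boinccmd --get_results starts each task with a line like "1) -----------" and prints "name:" and, for active tasks, "current CPU time:" lines (check this against your own client's output). Suspending is left to the same boinccmd call the Perl script already uses.

```python
import re

# A task block header looks like "1) -----------" (assumed layout).
BLOCK_START = re.compile(r"^\s*\d+\)\s*-+")

def parse_results(text):
    """Split boinccmd --get_results output into one dict per task."""
    tasks, current = [], None
    for line in text.splitlines():
        if BLOCK_START.match(line):
            current = {}
            tasks.append(current)
        elif current is not None and ":" in line:
            # split on the first colon only, so URLs and times stay intact
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return tasks

def over_limit(tasks, app_substring="dnetc", limit_s=5 * 60):
    """Names of matching tasks whose current CPU time exceeds limit_s."""
    hits = []
    for t in tasks:
        if app_substring not in t.get("name", ""):
            continue
        cpu = t.get("current CPU time")   # assumed present only for active tasks
        if cpu is not None and float(cpu) > limit_s:
            hits.append(t["name"])
    return hits
```

Feed it the captured output, e.g. over_limit(parse_results(output)), and suspend each returned name as in the Perl version.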
11)
Message boards :
Number crunching :
ATI 12.1 - Preview
(Message 2383)
Posted 23 Jan 2012 by valterc Post: (Quoting my earlier message: "Thank you all for your suggestions, I will try to do something about this next Monday and will let you know the results.") This note is just to let everyone know that after the recent application change (1.3), every problem I had simply disappeared... Everything seems to be running smoothly.
12)
Message boards :
Number crunching :
ATI 12.1 - Preview
(Message 2222)
Posted 13 Jan 2012 by valterc Post: Thank you all for your suggestions, I will try to do something about this next Monday and will let you know the results. However, with my setup I don't have CPU utilization problems with 12.1p, but I get errors like this from time to time:
06:11:12 (3760): premature exit detected, app exit status: 0xc0000005
To summarize:
- Catalyst 10.12: no errors, high CPU usage
- Catalyst 12.1p: random errors (but still getting some credit even when marked invalid), low CPU usage
Thank you all again
13)
Message boards :
Number crunching :
ATI 12.1 - Preview
(Message 2216)
Posted 13 Jan 2012 by valterc Post: For Zydor: so, are you suggesting that I revert to Catalyst 10.12 and use an app_info.xml to limit the CPU usage? I can give it a try... Is this one a good starting point? http://moowrap.net/download/app_info-win32-ati14-example.xml
14)
Message boards :
Number crunching :
ATI 12.1 - Preview
(Message 2210)
Posted 13 Jan 2012 by valterc Post: The HD6950 is a good card (similar to the HD5870), and I have problems only here at Moo... At first I had just one card and everything worked well (no errors, very low CPU usage). When I added a second one, the Moo exe started to fully load every CPU core I had left free (while still crunching normally on both cards). This happened with the 10.12 driver (1.4.900). That's why I switched to the 12.1 preview driver (low CPU usage, but errors from time to time...). (I also see that Zydor has similar problems with this driver.) Maybe there is another driver I can try... any suggestions?
15)
Message boards :
Number crunching :
ATI 12.1 - Preview
(Message 2188)
Posted 11 Jan 2012 by valterc Post: Hi all, I recently installed the 12.1 preview on this host, http://moowrap.net/results.php?hostid=674 : 2x HD6950, not in CrossFire (using a VGA dummy plug), on an i7-930 system (1 core free out of 8 hyperthreaded), BOINC 6.10.60. This is how things are going: Moo crunches happily most of the time with very limited CPU usage, but sometimes I get errors like this:
06:11:12 (3760): input buffer 10 packets (1792 bytes), checkpoint file 2 packets (384 bytes), output buffer 0 packets (0 bytes)
06:11:12 (3760): premature exit detected, app exit status: 0xc0000005
06:11:12 (3760): wrapper: running dnetc518-win32-x86-stream.exe (-ini dnetc.ini -runoffline -multiok=1) - attempt 2/10
A Windows pop-up tells me that the dnetc task has stopped working. All those tasks get a "validation inconclusive" status. After the wingman completes theirs, the status becomes "invalid" with some credit granted... (is that right?) Please give me advice on how to solve this...
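Windows exit statuses such as 0xc0000005 are NTSTATUS values, and 0xC0000005 is STATUS_ACCESS_VIOLATION: the dnetc process touched memory it was not allowed to, which by itself does not say whether the fault lies in the cruncher or in the driver it calls. A small Python sketch that pulls the code out of the wrapper log line above and splits it into its NTSTATUS bit fields; only this one code is mapped to a name.

```python
import re

KNOWN_CODES = {0xC0000005: "STATUS_ACCESS_VIOLATION"}
SEVERITIES = ("success", "informational", "warning", "error")

def exit_status_from_log(line):
    """Extract the hex exit status from a wrapper 'premature exit' line."""
    m = re.search(r"exit status: (0x[0-9a-fA-F]+)", line)
    return int(m.group(1), 16) if m else None

def decode_ntstatus(code):
    """Split a 32-bit NTSTATUS value into its bit fields."""
    return {
        "severity": SEVERITIES[(code >> 30) & 0x3],  # bits 30-31
        "customer": bool(code & 0x20000000),         # bit 29
        "facility": (code >> 16) & 0x0FFF,           # bits 16-27
        "code": code & 0xFFFF,                       # bits 0-15
        "name": KNOWN_CODES.get(code, "unknown"),
    }

LOG_LINE = "06:11:12 (3760): premature exit detected, app exit status: 0xc0000005"
status = exit_status_from_log(LOG_LINE)
print(hex(status), decode_ntstatus(status))
```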
16)
Message boards :
Number crunching :
strange complete times being seen
(Message 2102)
Posted 5 Jan 2012 by valterc Post: It seems that you're getting shorter workunits on the new box...
17)
Message boards :
Number crunching :
Faulty WUs
(Message 1698)
Posted 13 Dec 2011 by valterc Post: Something has changed. The log file (stderr) of the new work units is longer than it was some time ago (compare a new one with an old one and see for yourself). See http://moowrap.net/result.php?resultid=4876874 and http://moowrap.net/result.php?resultid=5451946 : the new units appear to step 1 packet at a time instead of 64. Also, the GPU usage pattern I see in Afterburner is different (less linear).