OK -- that's twice now
Author | Message |
---|---|
Joined: 20 Jun 11 Posts: 34 Credit: 6,451,570,275 RAC: 6,656,931 |
Any information regarding the two outages? |
Joined: 22 Jun 11 Posts: 2080 Credit: 1,844,407,912 RAC: 3,236 |
Any information regarding the two outages? I have not seen anything either, but I have moved my MilkyWay-capable machines over there and a couple of other machines here and there. I STILL have some here, but not nearly as many as I used to have. I brought them ALL back after the first outage, but may wait a bit now before bringing them back. It is a pain to find machines not crunching! |
Joined: 20 Jun 11 Posts: 34 Credit: 6,451,570,275 RAC: 6,656,931 |
Happened again this morning -- for a much shorter time. I'm simply letting my clients complete existing work and move over to MW, Collatz, POEM and World Grid. Perhaps we will get some information regarding the outages. Moo has been very reliable, which somewhat offsets the lack of information made available to users. |
Joined: 20 Jun 11 Posts: 34 Credit: 6,451,570,275 RAC: 6,656,931 |
Still no explanation of the troika of outages this past week. That isn't, shall we say, confidence inspiring.... |
Joined: 22 Jun 11 Posts: 2080 Credit: 1,844,407,912 RAC: 3,236 |
Still no explanation of the troika of outages this past week. That isn't, shall we say, confidence inspiring.... I am beginning to agree. One would think they would have had time to post SOMETHING by now!!! |
Joined: 20 Apr 11 Posts: 388 Credit: 822,356,221 RAC: 0 |
Hi, First, apologies for taking so long to post. There were indeed two incidents, first on the weekend of 24th Nov and then again on the night of 29th Nov. In both cases the server was unresponsive (even to my remote access attempts) and I had to request a reboot, which I did as soon as I realized the situation. While I don't know exactly why the server froze, these kinds of cases have usually been due to the server running out of memory, so that even the remote shell service fails to respond. I updated the basic services on the server after these incidents in the hope that it'll prevent any known problems. I also need to figure out why I didn't get any notification from our monitoring system for these outages. At least the first time, it took way too long for me to realize there was a problem (the server went down on the weekend and I didn't check until Monday morning before I left for work). -w |
Joined: 20 Jun 11 Posts: 34 Credit: 6,451,570,275 RAC: 6,656,931 |
Teemu, thanks for the report back. A few BOINC projects have had some problems of late -- glad this one is resolved (for now). |
Joined: 22 Jun 11 Posts: 2080 Credit: 1,844,407,912 RAC: 3,236 |
Teemu, thanks for the report back. A few BOINC projects have had some problems of late -- glad this one is resolved (for now). AGREED...THANKS TEEMU!!!! |
Joined: 27 Jul 11 Posts: 342 Credit: 252,653,488 RAC: 0 |
Many thanks for the heads up, Teemu |
Joined: 14 Oct 11 Posts: 5 Credit: 58,058,899 RAC: 0 |
I had to reinstall Windows and forgot my app_info file. I tried to use another one, but the message I get from Moo! is: file referenced in app_info does not exist: dnetc_1.03_windows_intelx86__ati14.exe, and I get another: file referenced in app_info does not exist: dnetc518-win32-x86-stream.exe. Is there a valid app_info that works? I am using a 5850 on Win7 64. |
Joined: 22 Jun 11 Posts: 2080 Credit: 1,844,407,912 RAC: 3,236 |
i had to reinstall windows and forgot my app_info file. i tried to use another one but the message i get from moo! is: file referenced in app_info does not exist: dnetc_1.03_windows_intelx86__ati14.exe

This one did when I got it:

    <app_info>
        <app>
            <name>dnetc</name>
            <user_friendly_name>Distributed.net Client</user_friendly_name>
        </app>
        <file_info>
            <name>dnetc_wrapper_1.3_windows_intelx86__ati14.exe</name>
            <executable/>
        </file_info>
        <file_info>
            <name>dnetc518-win32-x86-stream.exe</name>
            <executable/>
        </file_info>
        <file_info>
            <name>dnetc-gpu-1.3.ini</name>
        </file_info>
        <file_info>
            <name>job-ati14-1.00.xml</name>
        </file_info>
        <app_version>
            <app_name>dnetc</app_name>
            <version_num>102</version_num>
            <platform>windows_intelx86</platform>
            <avg_ncpus>0.050000</avg_ncpus>
            <max_ncpus>0.895864</max_ncpus>
            <plan_class>ati14</plan_class>
            <flops>1157115231469.729200</flops>
            <api_version>7.0.8</api_version>
            <file_ref>
                <file_name>dnetc_wrapper_1.3_windows_intelx86__ati14.exe</file_name>
                <main_program/>
            </file_ref>
            <file_ref>
                <file_name>dnetc518-win32-x86-stream.exe</file_name>
                <copy_file/>
            </file_ref>
            <file_ref>
                <file_name>dnetc-gpu-1.3.ini</file_name>
                <open_name>dnetc.ini</open_name>
                <copy_file/>
            </file_ref>
            <file_ref>
                <file_name>job-ati14-1.00.xml</file_name>
                <open_name>job.xml</open_name>
                <copy_file/>
            </file_ref>
            <coproc>
                <type>ATI</type>
                <count>1.000</count>
            </coproc>
            <gpu_ram>262144000.000000</gpu_ram>
        </app_version>
    </app_info>

There IS a new way of running more than one unit at a time IF you are using Boinc version 7.0.42 or higher. It is the new 'app_config.xml' file, and here is an example of one:

    <app_config>
        <app>
            <name>milkyway</name>
            <gpu_versions>
                <gpu_usage>0.5</gpu_usage>
                <cpu_usage>0.05</cpu_usage>
            </gpu_versions>
        </app>
    </app_config>

Save the text file as app_config.xml in your Milkyway directory (….\BOINC\projects\milkyway.cs.rpi_milkyway). Essentially it tells the pc to run two gpu units at once (each unit claims 0.5 of a gpu) and to budget only 0.05 of a cpu core, i.e. 5%, to feed each unit.
I THINK you can just change the word milkyway to moo and it should work. |
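For anyone trying the suggestion above on Moo!, here is a minimal sketch of what the adapted file might look like. It assumes that the `<name>` element has to match the application's short name rather than the project name; in the app_info.xml posted above that name is `dnetc`, so `dnetc` is used here instead of `moo`. The usage values are simply copied from the MilkyWay example and are untested on this project.

```xml
<!-- Sketch only: app_config.xml for Moo!, assuming the app name is "dnetc"
     as shown in the <app><name> element of the app_info.xml above. -->
<app_config>
    <app>
        <name>dnetc</name>
        <gpu_versions>
            <!-- 0.5 of a GPU per task, so two tasks share one GPU -->
            <gpu_usage>0.5</gpu_usage>
            <!-- 0.05 of a CPU core (5%) budgeted per task -->
            <cpu_usage>0.05</cpu_usage>
        </gpu_versions>
    </app>
</app_config>
```

The file would go in the Moo! project directory under BOINC\projects, in the same way as described for MilkyWay. If the name does not match an application the client knows about, the event log should say so when the client reads its config files.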
Joined: 20 Jun 11 Posts: 34 Credit: 6,451,570,275 RAC: 6,656,931 |
Those periodic unexplained outages are back. |
Joined: 22 Jun 11 Posts: 2080 Credit: 1,844,407,912 RAC: 3,236 |
Those periodic unexplained outages are back. Sorry, but no blaming me this time; I have moved my GPUs elsewhere. |
Joined: 20 Jun 11 Posts: 34 Credit: 6,451,570,275 RAC: 6,656,931 |
Yeah, I get it -- I realize that two months ago there was server code updating going on -- but no feedback since then. As we had another 'event' today, I suppose I'll set up for No New Work and then go into watch/wait mode here. Those periodic unexplained outages are back. |
Joined: 20 Jun 11 Posts: 34 Credit: 6,451,570,275 RAC: 6,656,931 |
As to redirecting to other GPU projects -- perhaps our quiet admin agrees. That would explain the current lack of new work units. Then again, this is speculation as there is limited news about status here generally. |
Joined: 22 Jun 11 Posts: 2080 Credit: 1,844,407,912 RAC: 3,236 |
As to redirecting to other GPU projects -- perhaps our quiet admin agrees. That would explain the current lack of new work units. Actually there are PLENTY of units. Tasks ready to send: 2,690 total (Tiny 903, Small 467, Normal 478, Huge 842). If you aren't getting any there could be local or Moo reasons, but lack of units doesn't seem to be it. |
Joined: 20 Jun 11 Posts: 34 Credit: 6,451,570,275 RAC: 6,656,931 |
Looks like the effort to get more activity and users has had the unintended consequence of overloading the software or the hardware. Reporting work is problematic due to a "can't open database" error. |
Joined: 20 Jun 11 Posts: 34 Credit: 6,451,570,275 RAC: 6,656,931 |
It isn't that the volume has pushed the servers offline -- it appears to be more like traffic congestion. My reports eventually went through on the fourth or fifth try. I figure to back off of Moo for a bit until the peak volume abates. |