OK -- that's twice now

Message boards : Number crunching : OK -- that's twice now

BarryAZ
Joined: 20 Jun 11
Posts: 34
Credit: 6,432,496,909
RAC: 6,585,410
Message 4206 - Posted: 29 Nov 2012, 5:24:42 UTC

Any information regarding the two outages?
mikey
Joined: 22 Jun 11
Posts: 2080
Credit: 1,844,401,576
RAC: 3,150
Message 4208 - Posted: 29 Nov 2012, 12:09:55 UTC - in response to Message 4206.  

I have not seen anything either, but I have moved my MilkyWay-capable machines over there and a couple of other machines here and there. I STILL have some here, but not nearly as many as I used to. I brought them ALL back after the first outage, but may wait a bit now before bringing them back. It is a pain to find machines not crunching!
BarryAZ
Joined: 20 Jun 11
Posts: 34
Credit: 6,432,496,909
RAC: 6,585,410
Message 4211 - Posted: 29 Nov 2012, 19:40:49 UTC

Happened again this morning -- for a much shorter time. I'm simply letting my clients complete existing work and move over to MW, Collatz, POEM, and World Grid.

Perhaps we will get some information regarding the outages.

Moo has been very reliable, which somewhat offsets the lack of information made available to users.
BarryAZ
Joined: 20 Jun 11
Posts: 34
Credit: 6,432,496,909
RAC: 6,585,410
Message 4218 - Posted: 30 Nov 2012, 16:58:00 UTC - in response to Message 4211.  

Still no explanation of the troika of outages this past week. That isn't, shall we say, confidence inspiring....
mikey
Joined: 22 Jun 11
Posts: 2080
Credit: 1,844,401,576
RAC: 3,150
Message 4225 - Posted: 2 Dec 2012, 12:03:36 UTC - in response to Message 4218.  

I am beginning to agree; one would think they would have had time to post SOMETHING by now!!!
Teemu Mannermaa
Project administrator
Project developer
Project tester
Joined: 20 Apr 11
Posts: 388
Credit: 822,356,221
RAC: 0
Message 4245 - Posted: 7 Dec 2012, 16:14:21 UTC

Hi,

First, apologies for taking so long to post.

There indeed were two incidents, first on the weekend of 24th Nov and then again on the night of 29th Nov. In both cases the server was unresponsive (even to my remote access attempts) and I had to request a reboot, which I did as soon as I realized the situation. While I don't know exactly why the server froze, these kinds of cases have usually been due to the server running out of memory, so that even the remote shell service fails to respond.

I did update the basic services on the server after these incidents in the hope that it'll prevent any known problems. I also need to figure out why I didn't get any notification from our monitoring system for these outages. At least for the first one, it took way too long for me to realize there was a problem (the server went down on the weekend and I didn't check until Monday morning before I left for work).

-w
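On the missing monitoring notification: a common pattern is an external check that only alerts after several consecutive failures, so a single slow reply doesn't page anyone but a frozen server does. The sketch below is a hypothetical illustration, not Moo!'s actual monitoring setup; the URL, threshold, interval and `alert` callback are all assumptions. Because an out-of-memory freeze also takes down the remote shell, a check like this would need to run on a different machine than the server it watches.

```python
import time
import urllib.request
from typing import Callable


class OutageDetector:
    """Signal one alert after `threshold` consecutive failed checks; re-arm after a success."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.failures = 0
        self.alerted = False

    def record(self, ok: bool) -> bool:
        """Feed one check result; return True only when a new alert should be sent."""
        if ok:
            self.failures = 0
            self.alerted = False
            return False
        self.failures += 1
        if self.failures >= self.threshold and not self.alerted:
            self.alerted = True  # alert once per outage, not on every failed check
            return True
        return False


def http_ok(url: str, timeout: float = 10.0) -> bool:
    """True if `url` answers with a 2xx/3xx status before `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except Exception:
        # Timeouts, refused connections and HTTP error statuses all count as "down".
        return False


def watch(url: str, alert: Callable[[str], None], threshold: int = 3,
          interval: float = 300.0) -> None:
    """Poll `url` forever (from a separate machine), calling `alert` once per outage."""
    detector = OutageDetector(threshold)
    while True:
        if detector.record(http_ok(url)):
            alert(f"{url} failed {threshold} checks in a row")
        time.sleep(interval)
```

In a real setup `alert` would send an e-mail or text message; with the defaults, a server that stops answering is reported after roughly 15 minutes instead of the next time someone happens to look.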
BarryAZ
Joined: 20 Jun 11
Posts: 34
Credit: 6,432,496,909
RAC: 6,585,410
Message 4247 - Posted: 7 Dec 2012, 16:25:30 UTC

Teemu, thanks for the report back. A few BOINC projects have had some problems of late -- glad this one is resolved (for now).

mikey
Joined: 22 Jun 11
Posts: 2080
Credit: 1,844,401,576
RAC: 3,150
Message 4252 - Posted: 8 Dec 2012, 12:33:32 UTC - in response to Message 4247.  
Last modified: 8 Dec 2012, 12:33:42 UTC

AGREED...THANKS TEEMU!!!!
John Clark
Joined: 27 Jul 11
Posts: 342
Credit: 252,653,488
RAC: 0
Message 4261 - Posted: 10 Dec 2012, 0:41:58 UTC

Many thanks for the heads up, Teemu
Jimbo
Joined: 14 Oct 11
Posts: 5
Credit: 58,058,899
RAC: 0
Message 4463 - Posted: 21 Jan 2013, 3:36:47 UTC
Last modified: 21 Jan 2013, 3:43:22 UTC

I had to reinstall Windows and forgot my app_info file. I tried to use another one, but the message I get from Moo! is: file referenced in app_info does not exist: dnetc_1.03_windows_intelx86__ati14.exe
and I get another: file referenced in app_info does not exist: dnetc518-win32-x86-stream.exe. Is there a valid app_info that works?
I am using a 5850 on Win7 64-bit.
mikey
Joined: 22 Jun 11
Posts: 2080
Credit: 1,844,401,576
RAC: 3,150
Message 4464 - Posted: 21 Jan 2013, 12:31:46 UTC - in response to Message 4463.  
Last modified: 21 Jan 2013, 12:32:12 UTC

This one worked when I got it:
<app_info>
<app>
<name>dnetc</name>
<user_friendly_name>Distributed.net Client</user_friendly_name>
</app>
<file_info>
<name>dnetc_wrapper_1.3_windows_intelx86__ati14.exe</name>
<executable/>
</file_info>
<file_info>
<name>dnetc518-win32-x86-stream.exe</name>
<executable/>
</file_info>
<file_info>
<name>dnetc-gpu-1.3.ini</name>
</file_info>
<file_info>
<name>job-ati14-1.00.xml</name>
</file_info>
<app_version>
<app_name>dnetc</app_name>
<version_num>102</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>0.050000</avg_ncpus>
<max_ncpus>0.895864</max_ncpus>
<plan_class>ati14</plan_class>
<flops>1157115231469.729200</flops>
<api_version>7.0.8</api_version>
<file_ref>
<file_name>dnetc_wrapper_1.3_windows_intelx86__ati14.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>dnetc518-win32-x86-stream.exe</file_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>dnetc-gpu-1.3.ini</file_name>
<open_name>dnetc.ini</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>job-ati14-1.00.xml</file_name>
<open_name>job.xml</open_name>
<copy_file/>
</file_ref>
<coproc>
<type>ATI</type>
<count>1.000</count>
</coproc>
<gpu_ram>262144000.000000</gpu_ram>
</app_version>
</app_info>
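The error in the question means the client read an app_info.xml that names a file it cannot find in the project directory; note that the app_info above expects dnetc_wrapper_1.3_windows_intelx86__ati14.exe, while the error names dnetc_1.03_windows_intelx86__ati14.exe. A quick way to see which names don't match is a small script like this minimal sketch. It assumes app_info.xml sits in the Moo! project directory next to the files it names, and it only checks the <file_info> names; the function name is made up for illustration.

```python
import os
import xml.etree.ElementTree as ET
from typing import List


def missing_app_info_files(app_info_path: str) -> List[str]:
    """List <file_info><name> entries that have no matching file next to app_info.xml."""
    project_dir = os.path.dirname(os.path.abspath(app_info_path))
    root = ET.parse(app_info_path).getroot()  # the <app_info> element
    missing = []
    for name_elem in root.findall("file_info/name"):
        name = (name_elem.text or "").strip()
        if name and not os.path.isfile(os.path.join(project_dir, name)):
            missing.append(name)
    return missing
```

For example, `print(missing_app_info_files("path/to/moo/project/app_info.xml"))` prints an empty list when every named file is present; anything it lists has to be copied into that directory or renamed in app_info.xml.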

There IS a new way of running more than one unit at a time IF you are using BOINC version 7.0.42 or higher: the new 'app_config.xml' file. Here is an example:

<app_config>
<app>
<name>milkyway</name>
<gpu_versions>
<gpu_usage>0.5</gpu_usage>
<cpu_usage>0.05</cpu_usage>
</gpu_versions>
</app>
</app_config>

Save the text file as app_config.xml in your MilkyWay project directory
(….\BOINC\projects\milkyway.cs.rpi_milkyway). Essentially it tells the client to run two GPU units at once (gpu_usage 0.5) and to count only 0.05 of a CPU core, i.e. 5%, as used to feed each one. I THINK you can just change milkyway to the Moo! application name (dnetc, as in the app_info above) and save the file in the Moo! project directory instead.
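To make the numbers concrete: gpu_usage is the fraction of one GPU each task claims, so 0.5 lets the client fit 1 / 0.5 = 2 tasks per GPU, and cpu_usage 0.05 is 5% of one CPU core per task. The minimal sketch below builds such a file with the app name as a parameter; for Moo! the name would presumably be dnetc, the app name used in the app_info.xml above (an assumption, not confirmed in this thread). The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def app_config_xml(app_name: str, gpu_usage: float, cpu_usage: float) -> str:
    """Build an app_config.xml body with one <app> and its <gpu_versions> block."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name  # must match the project's app name
    gpu = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu, "gpu_usage").text = str(gpu_usage)  # fraction of one GPU per task
    ET.SubElement(gpu, "cpu_usage").text = str(cpu_usage)  # fraction of one CPU core per task
    return ET.tostring(root, encoding="unicode")


def tasks_per_gpu(gpu_usage: float) -> int:
    """How many tasks fit on one GPU when each claims `gpu_usage` of it."""
    # The tiny epsilon guards against float results like 2.9999999 for 1/x.
    return int(1 / gpu_usage + 1e-9)
```

Writing `app_config_xml("dnetc", 0.5, 0.05)` to app_config.xml in the project directory would give the same settings as the MilkyWay example, and the client applies it after it re-reads its config files or restarts.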
BarryAZ
Joined: 20 Jun 11
Posts: 34
Credit: 6,432,496,909
RAC: 6,585,410
Message 4612 - Posted: 2 Mar 2013, 20:03:11 UTC

Those periodic unexplained outages are back.
mikey
Joined: 22 Jun 11
Posts: 2080
Credit: 1,844,401,576
RAC: 3,150
Message 4615 - Posted: 3 Mar 2013, 12:01:51 UTC - in response to Message 4612.  

Sorry, but no blaming me this time; I have moved my GPUs elsewhere.
BarryAZ
Joined: 20 Jun 11
Posts: 34
Credit: 6,432,496,909
RAC: 6,585,410
Message 4624 - Posted: 5 Mar 2013, 17:34:31 UTC - in response to Message 4615.  

Yeah, I get it -- I realize that two months ago there was server code updating going on -- but there has been no feedback since then.

As we had another 'event' today, I suppose I'll set up for No New Work and then go into watch/wait mode here.
BarryAZ
Joined: 20 Jun 11
Posts: 34
Credit: 6,432,496,909
RAC: 6,585,410
Message 4669 - Posted: 18 Mar 2013, 16:30:35 UTC

As to redirecting to other GPU projects -- perhaps our quiet admin agrees. That would explain the current lack of new work units.

Then again, this is speculation as there is limited news about status here generally.
mikey
Joined: 22 Jun 11
Posts: 2080
Credit: 1,844,401,576
RAC: 3,150
Message 4671 - Posted: 18 Mar 2013, 21:16:05 UTC - in response to Message 4669.  

Actually there are PLENTY of units:
Work                  Total   Tiny   Small   Normal   Huge
Tasks ready to send   2,690    903     467      478    842

If you aren't getting any, there could be local or Moo-side reasons, but a lack of units doesn't seem to be it.
BarryAZ
Joined: 20 Jun 11
Posts: 34
Credit: 6,432,496,909
RAC: 6,585,410
Message 4881 - Posted: 14 May 2013, 15:37:04 UTC - in response to Message 4206.  

Looks like the effort to get more activity and users has had the unintended consequence of overloading the software or the hardware. Reporting work is problematic due to a "can't open database" error.
BarryAZ
Joined: 20 Jun 11
Posts: 34
Credit: 6,432,496,909
RAC: 6,585,410
Message 4883 - Posted: 14 May 2013, 16:36:14 UTC - in response to Message 4881.  

It isn't that the volume has pushed the servers offline; rather, it appears to be more like traffic congestion -- my reports eventually went through on the fourth or fifth try. I figure I'll back off from Moo for a bit until the peak volume abates.
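Reports that succeed on the fourth or fifth try are typical of congestion rather than an outage. The BOINC client already defers and retries project communication on its own; the sketch below only illustrates the general technique of capped exponential backoff with random jitter, which spreads retries out so that many clients don't hit a busy server in lockstep. It is not BOINC's actual code, and `attempt` stands for a hypothetical "report work" call.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(attempt: Callable[[], T], max_tries: int = 5,
                       base_delay: float = 60.0, max_delay: float = 3600.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `attempt` until it succeeds, waiting up to base_delay * 2**n seconds after failure n."""
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    for n in range(max_tries):
        try:
            return attempt()
        except Exception:
            if n == max_tries - 1:
                raise  # out of tries: let the caller see the last error
            delay = min(max_delay, base_delay * 2 ** n)
            # "Full jitter": a random wait up to the capped delay.
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the waiting testable; with the defaults, the waits are capped at 1, 2, 4 and 8 minutes before the fifth and final try.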

Copyright © 2011-2024 Moo! Wrapper Project