Stuck all night.....8 hours wasted.

vaio [The Lone Gunman]
Joined: 3 May 11
Posts: 41
Credit: 165,019,076
RAC: 0
Message 449 - Posted: 23 May 2011, 18:10:23 UTC

I had a wu stuck on 89% for 8 hours on dual 5830's....not fun.
Have since relocated rig to MW which plays nicer with dual card configs.

Don't have the time to babysit machines.....too busy working to pay electric bill.
____________
Team Renegades
Forum

Beyond
Joined: 18 May 11
Posts: 46
Credit: 1,254,302,893
RAC: 0
Message 456 - Posted: 24 May 2011, 1:37:22 UTC - in response to Message 449.

I had a wu stuck on 89% for 8 hours on dual 5830's....not fun.
Have since relocated rig to MW which plays nicer with dual card configs.

Don't have the time to babysit machines.....too busy working to pay electric bill.

This is exactly what happened at DNETC. It's one reason why I asked if we can have an option to run 1 GPU/WU. Total throughput is higher too when each WU is run on a separate GPU.

Copycat-Digital for WCG*
Project donor
Joined: 11 May 11
Posts: 44
Credit: 291,412,341
RAC: 0
Message 458 - Posted: 24 May 2011, 7:21:18 UTC
Last modified: 24 May 2011, 8:02:25 UTC

WU 302471 was stuck on 5% for 5 hours on a single card until I aborted it.
First time this has happened since I started on Moo.

Edit

WU 303025 did the same. No GPU activity. It started running after I shut down and restarted BOINC Manager the 2nd time. The GPU runs cool at 64 °C. No other activity on the PC.

Maxwell [MM]
Joined: 2 May 11
Posts: 5
Credit: 100,847,975
RAC: 0
Message 487 - Posted: 26 May 2011, 21:01:02 UTC

And I'll add another one to the pot. Task 349548 ran for not quite four hours before I caught it. Stuck at 95% (this is on a 2x5970 machine).

Teemu Mannermaa
Project administrator
Project developer
Project tester
Joined: 20 Apr 11
Posts: 356
Credit: 749,579,969
RAC: 143,264
Message 523 - Posted: 28 May 2011, 10:48:47 UTC

Hi,

I understand that wasting crunch time is not fun, and my aim is to minimize that as much as possible. That said, there are some known problems even with v1.2 that I'm planning to work on for the next version.

Task 349548 is something that checkpoint interval detection is supposed to catch. The next version will hopefully calculate these intervals more accurately. However, the interval was set to 2h and the stall was apparently not caught within 4h, so maybe there's a bug there too; I need to take a look. :( How odd that there's nothing in the log to indicate problems.

Tasks 303025 and 302471 are examples of CUDA errors that the client is not recovering from. I'm going to try to add faster detection of these to the wrapper because a simple restart of the client usually solves the problem.

-w
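
The thread does not show how the wrapper's checkpoint interval detection actually works, so the following is only a rough sketch of the general idea: treat a task as stalled when its progress has not changed for longer than an allowed interval. The class name `StallWatchdog`, its fields, and the sample data are hypothetical; only the 2-hour interval comes from the post above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StallWatchdog:
    """Flags a task as stalled when its progress stops changing for too long.

    Hypothetical illustration only; not the Moo! Wrapper implementation.
    """
    max_idle_seconds: float                   # allowed time without progress
    last_progress: Optional[float] = None     # last progress value seen
    last_change_time: Optional[float] = None  # when progress last changed

    def update(self, progress: float, now: float) -> bool:
        """Record a progress sample; return True if the task looks stalled."""
        if self.last_progress is None or progress != self.last_progress:
            # First sample, or progress moved: restart the idle timer.
            self.last_progress = progress
            self.last_change_time = now
            return False
        # Progress unchanged: stalled once the idle time reaches the limit.
        return (now - self.last_change_time) >= self.max_idle_seconds


# Assumed samples as (seconds since start, fraction done), stuck at 89%.
watchdog = StallWatchdog(max_idle_seconds=2 * 3600)
samples = [(0, 0.80), (1800, 0.89), (5400, 0.89), (9000, 0.89)]
flags = [watchdog.update(progress, now) for now, progress in samples]
print(flags)
```

With these samples the last check is the first one where progress has been unchanged for the full two hours, so only that check reports a stall. A real wrapper would then have to decide what to do, for example restart the client, which the post above says usually clears the problem.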

vaio [The Lone Gunman]
Joined: 3 May 11
Posts: 41
Credit: 165,019,076
RAC: 0
Message 609 - Posted: 8 Jun 2011, 6:41:45 UTC
Last modified: 8 Jun 2011, 6:45:55 UTC

So far this week I have had two wu's stuck for 12 hours at a time on dual 5830's.
Instead of 300-400k for the 24 hour period I got about 10k....assuming the wu's in question even validate.

That was a waste of expensive energy!

http://moowrap.net/results.php?hostid=462&offset=0&show_names=0&state=2&appid=

The last one that I just found simply disappeared when I suspended it, then resumed it to try to get it to finish.
____________
Team Renegades
Forum

Beyond
Joined: 18 May 11
Posts: 46
Credit: 1,254,302,893
RAC: 0
Message 615 - Posted: 9 Jun 2011, 14:52:56 UTC

Once again, this seems to be only a problem with multiple GPUs/WU. The WUs also perform less efficiently when running on more than a single GPU. Any chance Dnet can be prevailed upon to provide a switch in their client to allow crunching 1 WU/GPU?

vaio [The Lone Gunman]
Joined: 3 May 11
Posts: 41
Credit: 165,019,076
RAC: 0
Message 632 - Posted: 12 Jun 2011, 19:54:49 UTC

11 hours on a 5970.....how cool is that.
Get to my 100 mil and call it a day, sigh.
____________
Team Renegades
Forum

scottishwebcamslive.com
Joined: 2 May 11
Posts: 21
Credit: 173,527,396
RAC: 0
Message 633 - Posted: 13 Jun 2011, 14:37:41 UTC

Hello,

I only have one box on this project, but it does have twin 5970's, so 4 GPUs.
I used to have all sorts of trouble on DNETC: they errored out every third or fourth work unit, that is if the cards did not just freeze from time to time.

Here, however, has been a sea change. I've had no problems from the box at all.
I have had one or two errors, but I caused them myself while fiddling with card timing.

So I just thought I'd come in and let Teemu know that not everybody with multiple cards is having these problems, and I'm hoping you don't change something that's going to make this project as unstable as DNETC for my box :(

best regards
Ian
____________
----> Please Join team Scotland HERE

vaio [The Lone Gunman]
Joined: 3 May 11
Posts: 41
Credit: 165,019,076
RAC: 0
Message 716 - Posted: 27 Jun 2011, 8:06:49 UTC

...But some of us still are.

dual 5830's.....stuck 12 hours.
5k instead of 250k.

Lesson learned.....keep my dual gpu box elsewhere.
____________
Team Renegades
Forum

Zydor
Project donor
Joined: 5 May 11
Posts: 233
Credit: 351,414,150
RAC: 0
Message 718 - Posted: 27 Jun 2011, 11:07:10 UTC - in response to Message 716.

I have found - with DNETC in particular - that pushing the card(s) too hard can cause problems with the core DNET application (it's the same core application for both DNETC and here). I had a number of hangs with DNETC - to say the least, I got two a day over there - but only one here since 5 May. After that one, I tweaked the voltage up a little and it seemed happy; I have not had a hang since. No idea if it was "the cause" or a one-off hassle.

It may not be the only problem with the core application, and may not even be a permanent problem frankly, so I am not putting it forward as a magical silver bullet. However, Moo in comparison to DNETC is the Rock of Gibraltar, so at the end of the day, personally I am well happy. It may be worth bringing down the GPU 5 or 10 to see how that goes.

Regards
Zy

Fire$torm [BlackOps]
Joined: 2 May 11
Posts: 4
Credit: 830,725,613
RAC: 307,797
Message 721 - Posted: 27 Jun 2011, 22:11:53 UTC

Here is my 2 cents worth....

My System
Core i7 920 @ 2.7 GHz
Sapphire HD 5830 w/Zalman VF300: Core clk 950 MHz, Mem clk 900 MHz
Windows 7 Home Premium x64
AMD Catalyst v11.5
MSI Afterburner v2.0.0
BM v6.10.58

Started crunching Moo again, and after the first 24 hours not a single issue. GPU temp hovers between 55 and 59 °C.

I think Zydor is correct. If you are overclocking your GPU(s) and having issues with Moo, try resetting all clocks to stock and run for a day. Then try raising the clocks by 10~20 MHz and run for another day. Keep doing this until you encounter problems or hit your GPU's overclocking limit.

GPU temperature can also be a source of problems. The hotter a processor runs, the more likely it is to generate errors.
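
The day-by-day procedure above can be written down as a simple schedule. The sketch below is only an illustration under stated assumptions: the function names `clock_steps` and `highest_stable` are hypothetical, the 800 MHz stock clock is an assumed value for the example, and `is_stable` stands in for "ran Moo for a day without a hang", which in practice is a manual check rather than something a script can measure.

```python
from typing import Callable, List, Optional


def clock_steps(stock_mhz: int, limit_mhz: int, step_mhz: int = 10) -> List[int]:
    """Return the core clocks to try, one per day, from stock up to the limit."""
    if step_mhz <= 0:
        raise ValueError("step_mhz must be positive")
    if limit_mhz < stock_mhz:
        raise ValueError("limit_mhz must not be below stock_mhz")
    steps = list(range(stock_mhz, limit_mhz + 1, step_mhz))
    if steps[-1] != limit_mhz:
        steps.append(limit_mhz)  # always finish exactly at the limit
    return steps


def highest_stable(steps: List[int],
                   is_stable: Callable[[int], bool]) -> Optional[int]:
    """Walk the steps in order; return the last clock that passed, or None."""
    stable = None
    for mhz in steps:
        if not is_stable(mhz):
            break  # first failure ends the search
        stable = mhz
    return stable


# Assumed example: 800 MHz stock, 950 MHz target, 20 MHz per day.
schedule = clock_steps(800, 950, 20)
# Hypothetical outcome: the card starts hanging above 900 MHz.
best = highest_stable(schedule, lambda mhz: mhz <= 900)
print(schedule, best)
```

Stopping at the first failure matches the advice in the post: the last clock that ran a full day cleanly is the one to keep.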

Copyright © 2011-2017 Moo! Wrapper Project