Stuck all night.....8 hours wasted.
Author | Message |
---|---|
Send message Joined: 3 May 11 Posts: 41 Credit: 165,019,076 RAC: 0 |
I had a wu stuck on 89% for 8 hours on dual 5830's....not fun. Have since relocated rig to MW which plays nicer with dual card configs. Don't have the time to babysit machines.....too busy working to pay electric bill. Team Renegades Forum |
Send message Joined: 18 May 11 Posts: 46 Credit: 1,254,302,893 RAC: 0 |
"I had a wu stuck on 89% for 8 hours on dual 5830's....not fun." This is exactly what happened at DNETC. It's one reason why I asked if we can have an option to run 1 GPU/WU. Total throughput is also higher when each WU is run on a separate GPU. |
Send message Joined: 11 May 11 Posts: 44 Credit: 291,412,341 RAC: 0 |
WU 302471 was stuck on 5% for 5 hours on a single card until I aborted it. First time this has happened since I started on Moo. Edit: WU 303025 did the same. No GPU activity. It started running after I shut down and restarted BOINC Manager the second time. GPU runs cool at 64 deg. C. No other activity on the PC. |
Send message Joined: 2 May 11 Posts: 5 Credit: 100,847,975 RAC: 0 |
And I'll add another one to the pot. Task 349548 ran for not quite four hours before I caught it. Stuck at 95% (this is on a 2x5970 machine). |
Send message Joined: 20 Apr 11 Posts: 388 Credit: 822,356,221 RAC: 0 |
Hi, I understand that wasting crunch time is not fun, and my aim is to minimize that as much as possible. That said, there are some known problems even with v1.2 that I'm planning to work on for the next version. Task 349548 is something that checkpoint interval detection is supposed to catch. The next version will hopefully calculate these intervals more accurately. However, the interval was set to 2h and it seems the stall was not caught even after 4h. Maybe there's a bug there too; I need to take a look. :( How odd that there's nothing in the log to indicate problems. Tasks 303025 and 302471 are examples of CUDA errors that the client is not recovering from. I'm going to try to add faster detection of these to the wrapper, because a simple restart of the client usually solves the problem. -w |
Send message Joined: 3 May 11 Posts: 41 Credit: 165,019,076 RAC: 0 |
So far this week I have had two wu's stuck for 12 hours at a time on dual 5830's. Instead of 300-400k for the 24 hour period I got about 10k....assuming the wu's in question even validate. That was a waste of expensive energy! http://moowrap.net/results.php?hostid=462&offset=0&show_names=0&state=2&appid= The last one that I just found simply disappeared when I suspended it, then resumed it to try to get it to finish. |
Send message Joined: 18 May 11 Posts: 46 Credit: 1,254,302,893 RAC: 0 |
Once again, this seems to be only a problem with multiple GPUs/WU. The WUs also perform less efficiently when running on more than a single GPU. Any chance Dnet can be prevailed upon to provide a switch in their client to allow crunching 1 WU/GPU? |
Send message Joined: 3 May 11 Posts: 41 Credit: 165,019,076 RAC: 0 |
11 hours on a 5970.....how cool is that. Get to my 100 mil and call it a day, sigh. |
Send message Joined: 2 May 11 Posts: 21 Credit: 173,527,396 RAC: 0 |
Hello, I only have one box on this project, but it does have twin 5970's, so 4 GPUs. I used to have all sorts of trouble on DNETC: they errored out every third or fourth work unit, that is if the cards did not just freeze from time to time. Here, however, there has been a sea change. I've had no problems from the box at all. I have had one or two errors, but I caused them myself while fiddling with card timing. So I just thought I'd come in and let Teemu know that not everybody with multiple cards is having these problems, and I'm hoping you don't change something that's going to make this project as unstable as DNETC was for my box :( Best regards, Ian ----> Please Join team Scotland HERE |
Send message Joined: 3 May 11 Posts: 41 Credit: 165,019,076 RAC: 0 |
...But some of us still are. Dual 5830's.....stuck 12 hours. 5k instead of 250k. Lesson learned.....keep my dual gpu box elsewhere. |
Send message Joined: 5 May 11 Posts: 233 Credit: 351,414,150 RAC: 0 |
I have found - with DNETC in particular - that pushing the card(s) too hard can cause problems with the core DNET application (it's the same core application for both DNETC and here). I had a number of hangs with DNETC - to say the least, I got two a day over there - but only one here since 5 May. After that one, I tweaked the voltage up a little and it seemed happy; I have not had a hang since. No idea if it was "the cause" or a one-off hassle. It may not be the only problem with the core application, and may not even be a permanent problem frankly, so I am not putting it forward as a magical silver bullet. However, Moo in comparison to DNETC is the Rock of Gibraltar, so at the end of the day, personally I am well happy. Maybe worth bringing the GPU down 5 or 10 to see how that goes. Regards Zy |
Send message Joined: 2 May 11 Posts: 4 Credit: 1,084,865,946 RAC: 0 |
Here is my 2 cents worth.... My system: Core i7 920 @ 2.7 GHz; Sapphire HD 5830 w/ Zalman VF300 (core clk 950 MHz, mem clk 900 MHz); Windows 7 Home Premium x64; AMD Catalyst v11.5; MSI Afterburner v2.0.0; BM v6.10.58. Started crunching Moo again and after the first 24 hours not a single issue. GPU temp hovers between 55~59C. I think Zydor is correct. If you are overclocking your GPU(s) and having issues with Moo, try resetting all clks to stock and run for a day. Then try upping clks by 10~20 MHz and run for another day. Keep doing this until you encounter problems or hit your GPU OC limit. GPU temp(s) can also be a source of problems. The hotter a processor runs, the more likely it is to generate errors. |
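
For anyone curious what "detecting a stuck task" could look like, here is a rough Python sketch of a stall watchdog: it flags a task when the reported progress has not changed for a set time (2h in the usage below, matching the interval mentioned earlier in the thread). This is only an illustration and not the actual wrapper code; the class name `StallWatchdog`, its fields, and the idea of feeding it progress samples with a timestamp are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StallWatchdog:
    """Hypothetical helper: flags a task whose progress stops changing."""
    timeout_s: float                        # how long progress may stay unchanged
    last_progress: Optional[float] = None   # last progress value seen (0.0 to 1.0)
    last_change_at: Optional[float] = None  # timestamp of the last change, in seconds

    def update(self, progress: float, now: float) -> bool:
        """Record a progress sample; return True if the task looks stalled."""
        if self.last_progress is None or progress != self.last_progress:
            # First sample, or progress moved: restart the stall timer.
            self.last_progress = progress
            self.last_change_at = now
            return False
        # Progress is unchanged: stalled once the timeout has elapsed.
        return (now - self.last_change_at) >= self.timeout_s


# Usage: a task sitting at 95% is flagged once 2 hours pass without change.
wd = StallWatchdog(timeout_s=2 * 3600)
wd.update(0.95, now=0.0)                  # first sample, not stalled
stalled = wd.update(0.95, now=4 * 3600.0) # same progress 4 hours later
```

The caller passes the timestamp in (for example from `time.monotonic()`), which keeps the check easy to test. What to do once a stall is flagged, such as restarting the client, is left out on purpose.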