Big WU´s
Author | Message |
---|---|
Joined: 26 May 11 Posts: 568 Credit: 121,524,886 RAC: 0 |
Does anybody have any idea why we have been receiving these small WUs for such a long period? I think it is a bit of a waste of GPU capacity. Maybe Teemu has an idea about this? |
Joined: 20 Apr 11 Posts: 388 Credit: 822,356,221 RAC: 0 |
Does anybody have any idea why we have been receiving these small WUs for such a long period? The scheduler seems to think you are slower these days, which is most likely due to changes I've made. I recently changed the GFLOPS limit for getting Huge WUs from 500 to 200, and I'm seeing a lot more people getting bigger work now. I do agree that fast cards should get bigger units, and I intend to optimize these sizes further sometime in the future. Hopefully this quick fix helps in the meantime (including with server load, since the clients don't need to fetch work so often; not that we are having big problems load-wise, but still). -w |
Joined: 22 Jun 11 Posts: 2080 Credit: 1,844,401,768 RAC: 3,163 |
Does anybody have any idea why we have been receiving these small WUs for such a long period? I was thinking about the problem of the software not figuring out which unit to send to which GPU. Could you set up a small list, so that when work is requested the scheduler checks whether YOUR GPU is on it and sends work based on that? After the initial small-unit break-in period, say 7 days, you would have one list, and whether YOUR individual GPU is on it or not decides which size of workunit you get, i.e. on it you get a big one, not on it you get the smaller ones. I was thinking that you as an admin could then move GPUs on or off the list based on what you see on your end. |
Joined: 26 May 11 Posts: 568 Credit: 121,524,886 RAC: 0 |
Does anybody have any idea why we have been receiving these small WUs for such a long period? It is strange: just after I made the post, I found out that I was getting bigger WUs. Maybe you waved your magic wand, Teemu. Anyhow, it is OK now. Thanks |
Joined: 3 May 11 Posts: 5 Credit: 143,962,508 RAC: 0 |
??? 19 Mar 2012 | 8:52:16 UTC Aborted by user 49,056.50 24.20 --- Distributed.net Client v1.03 (ati14) |
Joined: 12 Jul 11 Posts: 112 Credit: 229,191,777 RAC: 0 |
Are fragmented WUs rearing their ugly heads again?? Rig 9291, my fastest rig, is getting hammered by them. It is not too bad on the others, but they are still getting them, as I am sure everyone is. This has been going on for a while with sporadic WUs, but only recently have I been seeing more of them. THIS IS NO APRIL FOOLS JOKE...OR IS IT... |
Joined: 3 May 11 Posts: 5 Credit: 143,962,508 RAC: 0 |
??? 1 Apr 2012 | 23:55:14 UTC Error while computing 51,922.05 20.30 --- Distributed.net Client v1.03 (ati14) |
Joined: 22 Jun 11 Posts: 2080 Credit: 1,844,401,768 RAC: 3,163 |
??? I don't understand. |
Joined: 12 Jul 11 Posts: 112 Credit: 229,191,777 RAC: 0 |
Something was going on?? GPU reset? Every second? That's just a little bit of the log; it went on for a while. I don't know.. That 58xx is screaming though! 1520 sec avg. He had 2 WUs like that and aborted quite a few. <stderr_txt> after a pause... [Apr 01 23:41:51 UTC] Thread 0 was terminated in unexpected way. Restarting after a pause... [Apr 01 23:41:52 UTC] Thread 0 was terminated in unexpected way. Restarting after a pause... [Apr 01 23:41:53 UTC] Thread 0 was terminated in unexpected way. Restarting after a pause... [Apr 01 23:41:54 UTC] Thread 0 was terminated in unexpected way. Restarting after a pause... [Apr 01 23:41:55 UTC] Thread 0 was terminated in unexpected way. Restarting after a pause... [Apr 01 23:41:56 UTC] Thread 0 was terminated in unexpected way. Restarting after a pause... [Apr 01 23:41:57 UTC] Thread 0 was terminated in unexpected way. Restarting after a pause... [Apr 01 23:41:58 UTC] Thread 0 was terminated in unexpected way. Restarting after a pause... [Apr 01 23:41:59 UTC] Thread 0 was terminated in unexpected way. Restarting after a pause... [Apr 01 23:42:00 UTC] Thread 0 was terminated in unexpected way. Restarting after a pause... [Apr 01 23:42:01 UTC] Thread 0 was terminated in unexpected way. Restarting after a pause... |
Joined: 22 Jun 11 Posts: 2080 Credit: 1,844,401,768 RAC: 3,163 |
Something was going on?? GPU reset? Every second? That's just a little bit of the log; it went on for a while. I don't know.. I wish my 5870s would do that!! I just put 2 in the same machine and my RAC is going DOWN on that machine!!! It used to have 1 5870 in it, now it has 2! I am hoping it is short term!! |
Joined: 2 May 11 Posts: 54 Credit: 117,821,513 RAC: 0 |
Are fragmented WUs rearing their ugly heads again?? Too bad we can't give the small WUs to people who just joined. Then after 3 months or so, they pass the "initiation phase". |
Joined: 12 Jul 11 Posts: 112 Credit: 229,191,777 RAC: 0 |
The WUs with (triple-digit) XXX_XXX no longer have the effect we were seeing when they first came out. They still finish in approx. 28-30 min, and GPU work still drops around 88%. It's all good. On a side note, summer is on here in North Georgia, and there is no way I can crunch and run the A/C at the same time, so for right now I'll crunch on 2 rigs at night, 14 hours on and 10 off, and try leaving my fastest rig up 24/7. That should make the old energy bill look good. |
Joined: 3 May 11 Posts: 5 Credit: 143,962,508 RAC: 0 |
??? Here is more info: <core_client_version>7.0.20</core_client_version> <![CDATA[ <message> Maximum disk usage exceeded </message> <stderr_txt> after a pause... [Apr 01 23:41:51 UTC] Thread 0 was terminated in unexpected way. Restarting after a pause... [Apr 01 23:41:52 UTC] Thread 0 was terminated in unexpected way. Restarting after a pause... ... [Apr 01 23:53:12 UTC] Thread 0 was terminated in unexpected way. Restarting after a pause... [Apr 01 23:53:13 UTC] *Break* (found exit flag file) [Apr 01 23:53:13 UTC] RC5-72: Saved CE:E2A3B000:00000000:64*2^32 (36.50% done) 0.03:23:30.73 - [8,125,583 keys/s] [Apr 01 23:53:13 UTC] RC5-72: 1 packet (64.00 stats units) is in in.r72 [Apr 01 23:53:13 UTC] RC5-72: 12 packets (731.00 stats units) are in out.r72 [Apr 01 23:53:13 UTC] Sorry, I don't speak English. |
Joined: 22 Jun 11 Posts: 2080 Credit: 1,844,401,768 RAC: 3,163 |
??? Your English is a WHOLE lot better than my German!!!!!!!!!!! One of the error messages is "Maximum disk usage exceeded". To me that means you should increase the amount of disk space available to BOINC. To do that, open the BOINC Manager (down by the clock) and click on the 'disk and memory usage' tab. The 'use at most', 'leave at least' and 'use at most' boxes are where the changes should be made. |
Joined: 3 May 11 Posts: 5 Credit: 143,962,508 RAC: 0 |
Thanks for this information |
Joined: 22 Jun 11 Posts: 2080 Credit: 1,844,401,768 RAC: 3,163 |
Thanks for this information I just hope it works!! |
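
For anyone curious how the two size-selection ideas from this thread could fit together (the GFLOPS cutoff for Huge WUs, and an admin-maintained GPU list that applies after a break-in period), here is a rough sketch. It is not the project's scheduler code. Only the 200 and 500 GFLOPS limits and the "say 7 days" break-in come from the posts above; the `Host` fields, the `pick_wu_size` function and the way the list overrides the speed estimate are assumptions made up for illustration.

```python
from dataclasses import dataclass

# Values mentioned in the thread; everything else here is hypothetical.
HUGE_WU_GFLOPS_LIMIT = 200.0   # lowered from 500 according to the admin's post
BREAK_IN_DAYS = 7              # "say 7 days" of small units for new hosts


@dataclass
class Host:
    gpu_model: str            # hypothetical identifier for the card
    estimated_gflops: float   # speed estimate the scheduler holds for the host
    days_active: int          # days since the host first requested work


def pick_wu_size(host, big_wu_gpus, gflops_limit=HUGE_WU_GFLOPS_LIMIT):
    """Return "small" or "huge" for a work request from this host."""
    # New hosts stay on small units until the break-in period is over.
    if host.days_active < BREAK_IN_DAYS:
        return "small"
    # A GPU placed on the admin's list gets big work regardless of the estimate.
    if host.gpu_model in big_wu_gpus:
        return "huge"
    # Otherwise fall back to the speed estimate against the cutoff.
    return "huge" if host.estimated_gflops >= gflops_limit else "small"


# A host estimated at 300 GFLOPS is classified differently under the two limits.
mid_range = Host(gpu_model="example-gpu", estimated_gflops=300.0, days_active=30)
size_with_old_limit = pick_wu_size(mid_range, set(), gflops_limit=500.0)
size_with_new_limit = pick_wu_size(mid_range, set())
```

In this sketch the list acts as an override on top of the cutoff, so an admin could move a card on or off it without touching the speed estimate. The original suggestion used the list alone, which would mean dropping the final fallback line.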