Too high CPU core estimation


Profile Beyond
Joined: 18 May 11
Posts: 46
Credit: 1,254,302,893
RAC: 0
Message 638 - Posted: 14 Jun 2011, 23:49:27 UTC

Some of my combination NVidia / ATI GPU machines alternate between Moo!, Collatz & MW. When Moo! is swapped in, one of the CPU tasks is often suspended. I believe this is because Moo! is set to reserve .84 CPU even though it uses very little. When it pairs with the .17 CPU reserved by the PrimeGrid app that is always running on the NVidia card, it adds up to over 1 CPU and suspends a CPU process. There's no reason it should be calling for .84 CPU. Can this estimation be lowered, or do I have to go to the bother of an app_info.xml?
ID: 638 · Rating: 0
Profile Teemu Mannermaa
Project administrator
Project developer
Project tester

Joined: 20 Apr 11
Posts: 388
Credit: 822,356,221
RAC: 0
Message 644 - Posted: 17 Jun 2011, 12:09:29 UTC - in response to Message 638.  

There's no reason it should be calling for .84 CPU. Can this estimation be lowered, or do I have to go to the bother of an app_info.xml?


The code that calculates that value is the default server code, and I tried to figure out why it sets it to such a high number. It calculates the value by assuming 1% of the task is run on the CPU, which also scales up if you have more than one GPU. Unfortunately CPUs are so underpowered compared to GPUs (especially ATI cards) that the value usually ends up somewhere between .8 and .95 on most power crunching rigs. :(

I'll change that value on the server side. I'll probably set it to a static 0.05, which is popular on other GPU projects.

-w
ID: 644 · Rating: 0
Profile Beyond
Joined: 18 May 11
Posts: 46
Credit: 1,254,302,893
RAC: 0
Message 647 - Posted: 17 Jun 2011, 16:38:50 UTC - in response to Message 644.  

There's no reason it should be calling for .84 CPU. Can this estimation be lowered, or do I have to go to the bother of an app_info.xml?

The code that calculates that value is the default server code, and I tried to figure out why it sets it to such a high number. It calculates the value by assuming 1% of the task is run on the CPU, which also scales up if you have more than one GPU. Unfortunately CPUs are so underpowered compared to GPUs (especially ATI cards) that the value usually ends up somewhere between .8 and .95 on most power crunching rigs. :(

I'll change that value on the server side. I'll probably set it to a static 0.05, which is popular on other GPU projects.

-w

Thanks Teemu. Interesting that the CPU reservation doesn't really affect how much CPU is being used, just that when it's set too high it keeps BOINC CPU tasks from running.
ID: 647 · Rating: 0
frankhagen

Joined: 2 May 11
Posts: 27
Credit: 1,151,788
RAC: 0
Message 648 - Posted: 17 Jun 2011, 18:39:56 UTC - in response to Message 647.  

Interesting that the CPU reservation doesn't really affect how much CPU is being used, just that when it's set too high it keeps BOINC CPU tasks from running.


Actually BOINC does NOT monitor CPU usage, it just relies on what the project tells it. OMFG!
ID: 648 · Rating: 0
scottishwebcamslive.com
Joined: 2 May 11
Posts: 21
Credit: 173,527,396
RAC: 0
Message 650 - Posted: 18 Jun 2011, 8:54:46 UTC

Hello,

My CPU usage sits at 0.96 or even 0.97 at times.
Maybe I'm not understanding this, but won't capping this at 0.05 starve the GPU of the resources it's using just now, so that less work gets done?

best regards
Ian
----> Please Join team Scotland HERE
ID: 650 · Rating: 0
Profile Zydor
Joined: 5 May 11
Posts: 233
Credit: 351,414,150
RAC: 0
Message 651 - Posted: 18 Jun 2011, 12:41:55 UTC - in response to Message 650.  
Last modified: 18 Jun 2011, 12:49:59 UTC

.... My CPU usage sits at 0.96 or even 0.97 at times.
Maybe I'm not understanding this, but won't capping this at 0.05 starve the GPU of the resources it's using just now, so that less work gets done?


It shouldn't, if the code assumption is 1%. In fact 0.01 would probably do it, but that's sailing a little close to the wind, so 0.05 is a very safe value to use. The difference between 0.05 and 0.97 does not in reality affect how much CPU is used. That depends on the requests the GPU makes of the CPU, which is outside BOINC's control and outside the scope of this value change. It's unpredictable from the BOINC code's perspective, as it varies from project to project. In effect the correct value has to be hard coded; there's no other way without a huge recoding, which would be a little silly when all that's needed is the ability to set it manually.

It's the default value, before adjustment by a project, that is too high. It only affects the reservation placed on the CPU for the application, which still uses a very low value of circa 1%. The default value should always be generous, as the BOINC Wrapper cannot know what a project's needs will be; however, it looks like the assumptions made in calculating the default value are far too generous.

That would explain why many projects reduce the value once it's clear what the application's needs are. Those needs are impossible for the BOINC Wrapper to predict in advance, and in reality a nightmare for it to pick up automatically, when the languages of the core applications inside the BOINC Wrapper vary from ancient COBOL through to .... pick your language out of potentially dozens :)

Regards
Zy
ID: 651 · Rating: 0
scottishwebcamslive.com
Joined: 2 May 11
Posts: 21
Credit: 173,527,396
RAC: 0
Message 654 - Posted: 18 Jun 2011, 18:17:23 UTC
Last modified: 18 Jun 2011, 18:24:45 UTC

Hi,

As you know, my field is not really coding or the software in use but more the hardware.
What I don't understand is: if this figure is set way too high, far higher than it needs to be, then why is Moo using 4 of my multithreaded cores 100% of the time to run this project?
Running other CPU projects slows everything way down, so these 4 virtual cores must be doing something that's needed all the time?
DNETC does not use any of the cores and has a setting of 0.05 for the CPUs,
but as anybody knows, running DNETC with anything more than one GPU per host makes it completely unstable.
I'm just wondering if it's the extra usage of the CPUs (whatever they're working on)
that stops this project being so unstable and makes it perfect for people like myself with more than one GPU in their machine.

may have this all wrong but you live and learn lol

best regards
Ian

P.S. Congrats on getting your twin 5970s to a RAC of 1 million.
See you there soon with my pair lol
----> Please Join team Scotland HERE
ID: 654 · Rating: 0
Profile Zydor
Joined: 5 May 11
Posts: 233
Credit: 351,414,150
RAC: 0
Message 655 - Posted: 18 Jun 2011, 21:38:40 UTC - in response to Message 654.  
Last modified: 18 Jun 2011, 21:40:20 UTC

What I don't understand is: if this figure is set way too high, far higher than it needs to be, then why is Moo using 4 of my multithreaded cores 100% of the time to run this project?


BOINC Wrapper is not Moo. The default setting of 0.97 comes from a calculation inside the BOINC Wrapper - it's that calculation that's far too pessimistic as a default value. As a result of that calculation, "space" is reserved (by BOINC Wrapper, not Moo) on the CPU to enable Moo to work. However, in reality that much space need not be reserved; the default setting by BOINC Wrapper is far too cautious.

By intervening at project level on the server side, using a parameter the BOINC server provides for that purpose, the correct value can be inserted by the project. As the default assumption is 1%, a setting of 0.05 is arguably too much, but the difference is so small that 0.05 is a sensible, safe setting.

To give a horrible parallel, it's a bit like a group booking of 5 tickets for a baseball match: on arrival you find 97 seats were reserved for your group, not 5. You shrug your shoulders and watch the game anyway, as you only paid for five. However, the remaining 92 seats were reserved by the booking agent for you, so no one else can use them .... Teemu is about to "kick the booking agent" and make sure you only get 5 seats in future :)

Regards
Zy
ID: 655 · Rating: 0
scottishwebcamslive.com
Joined: 2 May 11
Posts: 21
Credit: 173,527,396
RAC: 0
Message 656 - Posted: 18 Jun 2011, 22:51:02 UTC

hi,

So the 5 seats you used left 92 seats doing nothing?
OK then, if we use that, I take it that whether we book 5 seats, or book 97 and use 5 of them, my CPU will use the same 4 virtual cores 100% of the time as it does just now?

sorry about extending the parallel :)

Ian
----> Please Join team Scotland HERE
ID: 656 · Rating: 0
Profile Zydor
Joined: 5 May 11
Posts: 233
Credit: 351,414,150
RAC: 0
Message 657 - Posted: 19 Jun 2011, 0:08:44 UTC - in response to Message 656.  
Last modified: 19 Jun 2011, 0:09:02 UTC

Sorry for the delay in replying - had hassles doing a driver upgrade, usual story .... :)

No, once Teemu "kicks the agent", you'll get a reservation of "5 seats" (0.05).

What happened in the past is that 0.97 (97 seats) of a CPU was reserved per GPU. No matter what happened after that, no matter what other program asked for space, BOINC worked on the basis that 0.97 was reserved for Moo. In reality only a minor part was actually being used by Moo, but because 0.97 was reserved, it appeared as if Moo was using 0.97. It wasn't; it used what it needed. As in the analogy, you only used a small number of seats.

What will happen now is that Teemu will reset the value to 0.05 (it would seem from what he is saying, no doubt after a test check). At that point, only 0.05 of a CPU is reserved. If you look closely at his post, it looks like only around 1% (or 0.01) will be needed. End result: 0.05 reserved, 0.01 used (it used to be 0.97 reserved, 0.01 used). It's always best to err on the side of caution, hence 0.05, and the "over reservation" of 0.04 is inconsequential in the scheme of things.

That will then mean that the application gets what it always actually used, and the over reservation is freed up for other tasks, e.g. other projects' CPU applications (Aqua, PG, PRPNet - whatever), without impinging drastically on the Moo GPU application.

Regards
Zy
ID: 657 · Rating: 0
scottishwebcamslive.com
Joined: 2 May 11
Posts: 21
Credit: 173,527,396
RAC: 0
Message 658 - Posted: 19 Jun 2011, 1:10:30 UTC

Hi,

oh no, a dreaded drivers fiasco lol
I'd rather work with hardware than drivers and the like any day of the week :)

I sort of get the idea now of how this works (in its own strange way lol)

One thing still puzzles me though: what is actually going on with the high CPU usage? It must be doing something, as it can't be using all that electrical power just to do nothing?

I thought each of these threads (4 of 8) was being used at 100% by one of the GPUs to drive it continuously, since if you look at the task manager, 4 of the 8 CPU usage monitors read 100% constantly.

If you try to use the other 4, or run any CPU work at all, the whole thing slows down or even just stops.
I tried playing with the core selection in my preferences, but anything other than 3 means the WUs take twice as long or just don't crunch at all,

so 3 seems to be the ticket.
I'll just wait and see whether lowering the CPU setting to 0.05 makes any difference to my crunch times or the rig's stability.

Thanks for the explanations; hopefully others have had some questions answered by all the above too.

best regards
Ian

----> Please Join team Scotland HERE
ID: 658 · Rating: 0
Profile Zydor
Joined: 5 May 11
Posts: 233
Credit: 351,414,150
RAC: 0
Message 659 - Posted: 19 Jun 2011, 1:27:47 UTC - in response to Message 658.  
Last modified: 19 Jun 2011, 1:28:57 UTC

..... oh no a dreaded drivers fiasco lol
rather work with hardware than drivers and the like any day of the week :)


I was loading up 11.6. It went straight in on my 5850, but I had a problem with the 5970s. It would not go in with a simple control panel delete of the old driver set; I had to go the whole hog and clean it all out with Driver Cleaner. It could well have been fingers and thumbs on my part, I don't know; but fair play, it did go in first time once the old set was properly cleaned out.

It appears to be running fine. 5850 timings are back to normal now, and the 5970s are getting there, so far all looks good. I'll see what it looks like once it's run overnight. The slight hiccup on loading was probably me somehow; it doesn't look like it's the driver.

It's not a rerun of the infamous 11.2 saga .... that's one AMD probably want to forget all about :)

Regards
Zy
ID: 659 · Rating: 0
Profile Teemu Mannermaa
Project administrator
Project developer
Project tester

Joined: 20 Apr 11
Posts: 388
Credit: 822,356,221
RAC: 0
Message 662 - Posted: 19 Jun 2011, 13:48:53 UTC

Hello,

I'll try to explain scheduling a bit better so that people don't get too confused with the terms I used. :)

Clarifications to some things said in this thread
First, BOINC Wrapper measures CPU usage as reported by the OS for the child application (Distributed.net Client in our case), and that value gets reported back with the task. BOINC Client (or even the project server code) doesn't actually use that measurement to decide about scheduling or resource reservations. For those, BOINC Client uses a value (the CPU resources we claim to need) that our project server calculates and sends with each task.

Secondly, the 1% I mentioned is only used to calculate the amount of CPU resources we think we need (the high 0.95 values on some multi-GPU machines). It's actually not comparable to a static request of 0.05 of CPU resources (even though the latter can be thought of as needing 5% of one CPU core).
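
To illustrate the first clarification: a wrapper-style parent process can ask the OS how much CPU time its child consumed and compare that with wall-clock time. The sketch below is only an illustration of that idea on a Unix-like system, not the BOINC Wrapper's actual code; the function name `measure_child_cpu_fraction` and the sleeping child command are made up for the example.

```python
import os
import subprocess
import sys
import time


def measure_child_cpu_fraction(cmd):
    """Run cmd as a child process and return (cpu_seconds, elapsed_seconds, fraction).

    CPU time is taken from the OS accounting for finished child processes
    (os.times() children_* fields, which are only meaningful on Unix-like systems).
    """
    before = os.times()
    start = time.monotonic()
    subprocess.run(cmd, check=True)  # waits for the child to exit
    elapsed = time.monotonic() - start
    after = os.times()

    cpu = (after.children_user - before.children_user) + (
        after.children_system - before.children_system
    )
    fraction = cpu / elapsed if elapsed > 0 else 0.0
    return cpu, elapsed, fraction


if __name__ == "__main__":
    # Hypothetical child that mostly waits, like a GPU app idling on the CPU.
    child = [sys.executable, "-c", "import time; time.sleep(0.2)"]
    cpu, elapsed, fraction = measure_child_cpu_fraction(child)
    print(f"cpu={cpu:.3f}s elapsed={elapsed:.3f}s fraction={fraction:.2f}")
```

The measured fraction is what gets reported back with a task; as described above, it is not what the client uses when deciding how many CPU cores to reserve.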

BOINC Client task scheduling
Normally BOINC Client scheduling is simple. Let's say a host has 4 CPU cores to use for BOINC computing. There are CPU-only projects that all need the default 1 CPU resource per work unit. BOINC Client can simply start 4 tasks before the host runs out of CPU resources. It now has 4 CPU resources in use.

Now we add GPU resources into the scheduling mix. Let's say this host now also has 4 ATI cards available for crunching. There are still the CPU-only projects that need 1 CPU. But now there's also the Moo! Wrapper project that says "I need 0.95 CPU and 4 GPU resources to run one task". So BOINC Client ends up starting the same 4 CPU-only tasks but also one Moo task. Now the total resources in use are 4.95 CPU and 4 GPU. Yes, this does oversaturate the CPUs with work, but the thinking of the BOINC devs is (I believe) that this is fine and the OS thread scheduler can handle it (not to mention that most apps don't use 100% CPU all the time due to I/O waits and things like that).

Now, if we instead have PrimeGrid GPU tasks that request (this is just an example) "0.25 CPU and 1 GPU", what will BOINC Client schedule? It'll actually now schedule only 3 CPU-only tasks and 4 PrimeGrid GPU tasks. Resources in use (according to BOINC Client) are now 4 CPU and 4 GPU, so no oversaturation. The reason is that 4 PrimeGrid tasks claim to use 1 full CPU core (0.25 times four is 1), so BOINC Client reserves one core for GPU tasks.
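
The scheduling arithmetic in these examples can be sketched as a toy model. This is a simplification inferred from the numbers in this thread, not the BOINC Client's real scheduler: it assumes the client holds back one CPU core for every whole CPU claimed by running GPU tasks, and the function name `cpu_tasks_started` is hypothetical.

```python
import math


def cpu_tasks_started(ncpus, gpu_task_cpu_requests, cpu_per_cpu_task=1.0):
    """Toy model: how many CPU-only tasks start alongside the given GPU tasks.

    gpu_task_cpu_requests holds the CPU fraction each running GPU task claims.
    Assumption (inferred from the thread's examples): only whole cores claimed
    by GPU tasks are taken away from CPU-only work.
    """
    claimed = sum(gpu_task_cpu_requests)
    cores_held_back = math.floor(claimed + 1e-9)  # epsilon guards float rounding
    free_cores = max(ncpus - cores_held_back, 0)
    return int(free_cores // cpu_per_cpu_task)


if __name__ == "__main__":
    # One Moo task claiming 0.95 CPU: all 4 CPU tasks still run (4.95 CPU in use).
    print(cpu_tasks_started(4, [0.95]))
    # Four PrimeGrid tasks at 0.25 CPU each: one core is held back.
    print(cpu_tasks_started(4, [0.25] * 4))
    # The original report: 0.84 (Moo) + 0.17 (PrimeGrid) goes just over 1 CPU.
    print(cpu_tasks_started(4, [0.84, 0.17]))
    # After the change: 0.05 (Moo) + 0.17 (PrimeGrid) stays well under 1 CPU.
    print(cpu_tasks_started(4, [0.05, 0.17]))
```

Under this model the threshold effect is visible: a claim of 0.95 costs nothing on its own, but as soon as the claims add up to a whole core, a CPU-only task loses its slot.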

What needs to change and why
It's exactly this last scheduling scenario I need to avoid by changing what CPU resources our tasks request. By claiming that we need almost one CPU but then ending up using way less, we leave CPU cores that in reality are idle. Obviously, we don't want that. :)

I hope this helps (more than it confuses). Also, I'll try to get this change deployed today.

-w

P.S. I probably shouldn't have mentioned that 1%, as it's more a coding detail of the default server code. It probably confused more than it gave substance. BTW, does anybody still want a similarly detailed explanation of what that 1% is actually used for in the server code? :)
ID: 662 · Rating: 0
Profile Zydor
Joined: 5 May 11
Posts: 233
Credit: 351,414,150
RAC: 0
Message 664 - Posted: 19 Jun 2011, 15:36:04 UTC - in response to Message 662.  

.... P.S. I probably shouldn't have mentioned that 1%, as it's more a coding detail of the default server code. It probably confused more than it gave substance. BTW, does anybody still want a similarly detailed explanation of what that 1% is actually used for in the server code? :)


Go for it - I'm all for being Edjamucated :)

Regards
Zy
ID: 664 · Rating: 0
scottishwebcamslive.com
Joined: 2 May 11
Posts: 21
Credit: 173,527,396
RAC: 0
Message 667 - Posted: 19 Jun 2011, 19:36:39 UTC

hi,

Thanks for that, Teemu.

I'm now only curious about what Moo is using 4 of the 8 threads on my CPU at 100% for?

I get that each GPU is being allocated one of its own, but if it's the GPU doing the work, why is my task manager showing 4 of the 8 CPU graphs fully used all the time except at the beginning and end of each WU?

best regards
Ian
----> Please Join team Scotland HERE
ID: 667 · Rating: 0
Profile Teemu Mannermaa
Project administrator
Project developer
Project tester

Joined: 20 Apr 11
Posts: 388
Credit: 822,356,221
RAC: 0
Message 673 - Posted: 20 Jun 2011, 17:48:53 UTC - in response to Message 667.  

I'm only now curious about what moo is using 4 of the 8 multithreads on my CPU at 100% for ?


All GPU tasks need to use some amount of CPU to keep the GPU happy and crunching. Usually this shouldn't be a lot, but it can be, for various normal or strange reasons.

For this project these reasons include the core used, but also the card driver version or even some strange driver/card installation issues. Enabling CrossFire can also cause strange behavior. I've also seen high CPU usage when the CPU would otherwise be idle. (So the Dnet Client uses the CPU if nobody else wants it, but yields it back if some task with higher priority wants it.)

In fact, this last issue is what I'm seeing right now on my own host, which only has one CPU task running in addition to the Moo! Wrapper task. (I'm looking into why only one CPU task is run on my system, but at the moment it looks like some kind of funky scheduling thing due to MT and high-priority tasks.)

I looked at your host and you are using core 3, which usually is right for ATI cards. But it can be that your 4-card system would actually benefit from using some other core (you can try experimenting or running a benchmark). Otherwise it can be a driver issue, or maybe you have CrossFire enabled?

Change deployed
FYI, I've just deployed a change to the requested average CPU resources. From now on the server should always request 0.05 for ATI tasks and 0.20 for nVidia tasks. These numbers were purely made up by me, so tweaking them is an option (maybe by getting the average CPU used with some clever DB queries). I'm not sure how long it'll take for these to change on clients. It could very well apply only to new tasks handed out from now on.
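
The tuning idea mentioned here (deriving the request from the average CPU actually used) could look roughly like the sketch below. It assumes each finished task row provides a CPU time and an elapsed wall-clock time in seconds; the sample numbers and the function name `average_cpu_fraction` are invented for illustration, and no real database query or project data is shown.

```python
from statistics import fmean


def average_cpu_fraction(results):
    """Average the per-task CPU fraction (cpu_time / elapsed_time).

    results is an iterable of (cpu_time_seconds, elapsed_time_seconds) pairs,
    e.g. rows fetched from a results table. Rows without a positive elapsed
    time are skipped. Returns None if no usable rows remain.
    """
    fractions = [cpu / elapsed for cpu, elapsed in results if elapsed > 0]
    return fmean(fractions) if fractions else None


if __name__ == "__main__":
    # Made-up sample rows: (cpu_time, elapsed_time) in seconds.
    sample_rows = [(5.0, 100.0), (15.0, 100.0), (0.0, 0.0)]
    print(average_cpu_fraction(sample_rows))
```

A project could compare such an average per application version (for example ATI versus nVidia) against the static 0.05 and 0.20 requests before adjusting them.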

I left the current calculated value for the max CPU resources requested field as a reference to see what it would have been. (This field is not used for anything by BOINC Client or the server so it should be fine.)

-w
ID: 673 · Rating: 0
scottishwebcamslive.com
Joined: 2 May 11
Posts: 21
Credit: 173,527,396
RAC: 0
Message 674 - Posted: 20 Jun 2011, 19:47:05 UTC

Hello,

Thank you for that.
I do CrossFire the cards for ease of use, something that's not very easy to do on DNETC.

I use core 3 because it was suggested somewhere else on these threads.

I tried 1, 2, 3, 4 and 7; all the others cut the CPU usage by half and doubled the time taken to complete a work unit, so I stuck with 3.

I don't know about the benchmarking thing you're speaking about; I just changed the numbers and looked at the new times each change produced.

It will be a couple of days until I get today's 0.05 work units, as I keep a 3-day bank of work units in case the server goes down over the weekend, as sometimes happens on other projects.

I will let you know of any changes to timings and stability once my machine starts on the new work units.

best regards
Ian
----> Please Join team Scotland HERE
ID: 674 · Rating: 0
Profile Beyond
Joined: 18 May 11
Posts: 46
Credit: 1,254,302,893
RAC: 0
Message 679 - Posted: 21 Jun 2011, 19:10:10 UTC
Last modified: 21 Jun 2011, 19:10:31 UTC

Thanks Teemu for posting the information on BOINC task scheduling and changing the CPU reservation. The WUs are running great at .05 CPU on all my machines and it allowed me to increase my Moo! participation. Thanks again for the change.
ID: 679 · Rating: 0
Profile Zydor
Joined: 5 May 11
Posts: 233
Credit: 351,414,150
RAC: 0
Message 680 - Posted: 21 Jun 2011, 20:47:54 UTC

Same thanks from me .... working fine, and for me it gave me space to run some additional PG PRPNet 5oB WUs for a long standing project closure task without impinging on Moo timings.

Many Thanks :)

Regards
Zy
ID: 680 · Rating: 0
scottishwebcamslive.com
Joined: 2 May 11
Posts: 21
Credit: 173,527,396
RAC: 0
Message 681 - Posted: 22 Jun 2011, 11:40:16 UTC
Last modified: 22 Jun 2011, 11:44:40 UTC

Hello,

Something's gone wrong here.

My manager has now put my work units into high priority for no reason and is picking whichever it feels like instead of going from top to bottom (by completion date).

How do I get rid of the priority mode so the manager can get back to crunching the work with the earliest completion date?
The first of these is due to be crunched and sent by the 25th, so there's no reason for it to think it's running out of time.

I don't know if this has anything to do with the new 0.05 CPU usage, which seems to have kicked in overnight, but the 4 full cores of CPU usage have not changed.


best regards
Ian
----> Please Join team Scotland HERE
ID: 681 · Rating: 0
Copyright © 2011-2024 Moo! Wrapper Project