Forcing work on one GPU only


Message boards : Number crunching : Forcing work on one GPU only

mikey
Joined: 22 Jun 11
Posts: 2080
Credit: 1,826,667,680
RAC: 32,419
Message 2826 - Posted: 5 Mar 2012, 12:47:36 UTC - in response to Message 2820.  

Thank you for your effort. I will try a few aspects of your proposed solution after task backlog gets cleared a bit - right now I excluded second GPU from any BOINC projects, giving all GPU power to Moo! (BOINC doesn't intend to use it but DNETC uses it). The workaround certainly helps to reduce congestion between two Moo! tasks trying to consume same GPU resources (as I mentioned before, if two dnetc applications are running concurrently, they both use excessive amount of CPU).

There are a few problems with this work-around ...
    1. it doesn't make DNETC application behave BOINC-friendly: it will still grab all GPUs it sees, thus starving other projects. One might not want to dedicate all the resources to distributed computing (in general)
    2. it doesn't prevent DNETC from wasting resources due to non-synchronous processing of work batches. BTW, the same problem does arise also with CPU application when one has many CPUs in machine (think dual hexa core HT server, like dual Intel X5670 or similar), where it hits even harder



BTW, I don't think it's necessary to exclude the second GPU from other BOINC projects. They mostly (if not all) only require single GPU and play well alongside each other. One might want to do it only if they want to see Moo! running all the time.

Your workaround might also work around the BOINC CC bug which prevents multi-GPU tasks from running ... but I will have to test and see.



The exclude option should tell Moo to NOT use any gpu that is excluded, so something is wrong if that is not happening!

Second yes you DO need the exclude option for other projects or they will run more than one unit at a time using all gpu's available to it, just putting one workunit on each gpu.

Thirdly if you ARE running cpu units AND gpu units for Moo ON THE SAME PC you should stop or make sure you are using the absolute newest version of Boinc! Many, MANY versions of Boinc have 'issues' trying to separate the cache for the cpu and the gpu and it gets to be a mess! Normally the cpu units run longer than the gpu units so trying to keep a consistent cache level for each is tough at best! 2 days for the cpu is a much different number of units than 2 days for a gpu! That is why I NEVER EVER run the same project on both the cpu and the gpu in the same machine!! I have 14 machines crunching right now, so it is easy to shuffle things around.
Metod, S56RKO

Joined: 1 Mar 12
Posts: 8
Credit: 7,361,960
RAC: 0
Message 2831 - Posted: 5 Mar 2012, 20:32:09 UTC - in response to Message 2826.  
Last modified: 5 Mar 2012, 20:33:47 UTC

The exclude option means that BOINC CC will not assign a Moo! task to that particular GPU. If the task does as instructed, then that's the end of the story. Not so with the DNETC application, which is greedy: it uses all of the resources of the same kind (CPU or GPU) and doesn't care what BOINC CC instructs it to do. The application doesn't know that some particular GPU is off-limits because nobody told it so ... and, as it is right now, it doesn't care either.

Here's the process list from my Linux box:

boinc   19783  0.0  0.0  11284  1136 ?        SNl  18:02   0:04 ../../projects/moowrap.net/dnetc_wrapper_1.3_x86_64-pc-linux-gnu__cuda31 --device 0
boinc   19785 10.0  0.0 79853508 63552 ?      SNl  18:02  11:28 dnetc -ini dnetc.ini -runoffline -multiok=1 -e email@somewhere.net


The first line is the Moo! wrapper, to which BOINC CC passes the instruction to use only device 0 (NVIDIA GPU). The second line is the actual DNETC application, which is not BOINC-aware (that's why we need the wrapper). Note that the information about which GPU to use gets lost between the two.
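The comparison of the two command lines can be checked mechanically. The following is only a minimal sketch in Python: it assumes the listing is in the usual `ps aux` layout (ten columns before the command), and the helper names `command_of` and `device_arg` are made up for this illustration.

```python
import shlex

# The two lines from the process list above, copied verbatim.
PS_LINES = [
    "boinc   19783  0.0  0.0  11284  1136 ?        SNl  18:02   0:04 "
    "../../projects/moowrap.net/dnetc_wrapper_1.3_x86_64-pc-linux-gnu__cuda31 --device 0",
    "boinc   19785 10.0  0.0 79853508 63552 ?      SNl  18:02  11:28 "
    "dnetc -ini dnetc.ini -runoffline -multiok=1 -e email@somewhere.net",
]


def command_of(ps_line):
    """Split off the command line (assumes 10 `ps aux` columns precede it)."""
    return shlex.split(ps_line.split(None, 10)[10])


def device_arg(argv):
    """Return the value that follows --device, or None if the flag is absent."""
    if "--device" in argv:
        i = argv.index("--device")
        if i + 1 < len(argv):
            return argv[i + 1]
    return None


wrapper_argv = command_of(PS_LINES[0])
client_argv = command_of(PS_LINES[1])

print("wrapper was told to use device:", device_arg(wrapper_argv))
print("dnetc was told to use device:  ", device_arg(client_argv))
```

The wrapper's arguments carry a device number while the dnetc arguments carry none, which is the loss of information described above.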

BTW, the actual DNETC application is instructed not to top up the buffer (option -runoffline), which it would otherwise do in the DNETC way. The application will terminate after it empties the work buffer, which the Moo! server prepared as a task.

Now a part of output of the DNETC application:

18:02:54 (19783): wrapper: starting v1.3.9.7
18:02:54 (19783): device: 2 x GeForce GT 430 (driver version unknown, CUDA version 4.20, compute capability 2.1, 1024MB, 1001MB available, 280 GFLOPS peak)
18:02:54 (19783): checkpoint interval: 25 min (task 414750 GFLOPS, 12 min)
18:02:54 (19783): wrapper: running dnetc517-linux-amd64-cuda31 (-ini dnetc.ini -runoffline -multiok=1) - attempt 1/10

dnetc v2.9108-517-CTR-10070313 for CUDA 3.1 on Linux (Linux 2.6.36.1).
Using email address (distributed.net ID) 'email@somewhere.net'

[Mar 05 18:02:54 UTC] Automatic processor detection found 2 processors.
[Mar 05 18:02:54 UTC] Loading crunchers with work...
[Mar 05 18:02:54 UTC] Automatic processor type detection found
                      a GeForce GT 430 (64 SPs) processor.
[Mar 05 18:02:54 UTC] RC5-72: using core #0 (CUDA 1-pipe 64-thd).
[Mar 05 18:02:54 UTC] RC5-72 #a: Loaded CD:B80D28C8:00000000:48*2^32
[Mar 05 18:02:54 UTC] RC5-72 #b: Loaded CD:B80D4442:00000000:64*2^32


At the top there's some output from the wrapper application (it already omits the info about the assigned GPU device). Then DNETC starts: it detects 2 GPUs and says that it started two threads. To add insult to injury, thread b (which is executed by the off-limits NVIDIA device 1) got a larger work batch (64 vs. 48), and that thread is competing for GPU cycles with another application (from a different BOINC project) which actually behaves and only runs on its designated GPU device. This means that while DNETC crunching will run at full speed on GPU 0, it will crawl on GPU 1. My estimate is that GPU 0 will sit idle at least 50% of the time because the DNETC application will wait for thread b to finish.
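To make the idle-time estimate concrete, here is a rough sketch in Python that reads the batch sizes out of the log lines above and works out how long GPU 0 would wait. It rests on two assumptions that the log does not state: both GPUs are equally fast when left alone, and DNETC only gets a fixed share (`slow_share`, here one half) of the contested GPU 1. The function names are hypothetical.

```python
import re

# Relevant lines from the dnetc output above, copied verbatim.
LOG = """\
[Mar 05 18:02:54 UTC] Automatic processor detection found 2 processors.
[Mar 05 18:02:54 UTC] RC5-72 #a: Loaded CD:B80D28C8:00000000:48*2^32
[Mar 05 18:02:54 UTC] RC5-72 #b: Loaded CD:B80D4442:00000000:64*2^32
"""

# Captures the cruncher letter and the batch size (in units of 2^32 keys).
LOADED = re.compile(r"RC5-72 #(\w): Loaded \S+:(\d+)\*2\^32")


def loaded_batches(log):
    """Map cruncher letter -> batch size for every 'Loaded' line."""
    return {m.group(1): int(m.group(2)) for m in LOADED.finditer(log)}


def idle_fraction(fast_batch, slow_batch, slow_share):
    """Fraction of the run during which the uncontested GPU has nothing to do.

    Assumption: the contested GPU delivers only `slow_share` of its speed
    to dnetc, and the run ends when the slower cruncher finishes.
    """
    time_fast = fast_batch / 1.0
    time_slow = slow_batch / slow_share
    return max(0.0, 1.0 - time_fast / time_slow)


batches = loaded_batches(LOG)
print("crunchers started:", sorted(batches))
print("GPU 0 idle fraction:", idle_fraction(batches["a"], batches["b"], 0.5))
```

With an even split of GPU 1 the sketch gives an idle fraction of 0.625 for GPU 0, in line with the "at least 50%" estimate; a different share on GPU 1 changes the number, so treat it as an illustration rather than a measurement.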

This is where the DNETC project should take a step towards the BOINC community. Unless there's some option for the DNETC application that would instruct it to use only a particular device (CPU or GPU), Moo! won't be able to tie the two distributed computing worlds together.

Second yes you DO need the exclude option for other projects or they will run more than one unit at a time using all gpu's available to it, just putting one workunit on each gpu.


BOINC CC takes care not to overcommit resources. If there's one Moo! task running (assumed to run on one GPU, which is not the case as illustrated above), then it won't assign another task to the same resource. And that works just fine for the rest of the GPU projects I participate in.

And, BTW, you got it mixed up a bit in your third argument. A BOINC task as served by the project (Moo!) server consists of a (set of) executable(s) as well as a data file. Unless forced otherwise, BOINC will only process the data file using the designated executable. In Moo!'s case the data file contains the work cache, and it won't be processed by any application other than the one defined when the task was served (either CPU or GPU). This means it's perfectly safe to mix CPU and GPU tasks from the same project on the same machine, as BOINC will not confuse them (it had better not: there are projects that serve completely different applications and data files, and you can imagine the disaster if BOINC CC mixed them at will).

No, I'm not running CPU tasks from Moo! ... and I'm quite sure I won't be running GPU tasks on this dual-GPU machine either until this issue gets resolved.
Metod ...
mikey
Joined: 22 Jun 11
Posts: 2080
Credit: 1,826,667,680
RAC: 32,419
Message 2832 - Posted: 5 Mar 2012, 23:50:20 UTC - in response to Message 2831.  

Wooo I am sorry...I did not realize you were using Linux as the OS. I am guessing the problem is either the directory where you put the cc_config.xml file or the version of Boinc; you ARE using ver 7, right? But since I am NOT a Linux guy I have no clue how to help you. In Windows it does work, so I have no more clues for you, sorry!
Metod, S56RKO

Joined: 1 Mar 12
Posts: 8
Credit: 7,361,960
RAC: 0
Message 2836 - Posted: 6 Mar 2012, 6:55:01 UTC - in response to Message 2832.  

I'm using BOINC 7.0.18, and cc_config.xml works just fine if placed in the BOINC data directory. I'm using some debug options and, as far as I have observed, BOINC respects the exclusion of Moo! from GPU number 1. Again: it's not up to the project application to read the cc_config.xml file; it's up to BOINC to pass the appropriate settings, and it's up to the project application just to obey them. For this part, the choice of OS doesn't matter at all. After all, how is a project application supposed to know which GPU to run on unless it gets this information from BOINC CC?
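For readers who want to see what such an exclusion might look like, here is a small Python sketch that writes out a cc_config.xml fragment. The element names (`exclude_gpu`, `url`, `device_num`, `type`) are how the BOINC client configuration is generally documented, but check them against the documentation for your client version. The project URL is an assumption inferred from the `projects/moowrap.net` directory in the process list, and `build_cc_config` is a made-up helper name.

```python
import xml.etree.ElementTree as ET


def build_cc_config(project_url, device_num, gpu_type):
    """Build a cc_config.xml document with a single GPU exclusion.

    Assumption: the client accepts <exclude_gpu> inside <options>, with
    <url>, <device_num> and <type> children.
    """
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    excl = ET.SubElement(options, "exclude_gpu")
    ET.SubElement(excl, "url").text = project_url
    ET.SubElement(excl, "device_num").text = str(device_num)
    ET.SubElement(excl, "type").text = gpu_type
    return ET.tostring(root, encoding="unicode")


# Assumed values: Moo! project URL, NVIDIA device 1 excluded (as in this thread).
xml_text = build_cc_config("http://moowrap.net/", 1, "NVIDIA")
print(xml_text)
```

Note that a file like this only tells BOINC CC where it may schedule Moo! tasks. As discussed in this thread, it does nothing to stop the DNETC client itself from grabbing every GPU it detects.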

I don't know how DNETC behaves in Windows and how you can actually verify proper operation (with regard to exclusion), however I proved in my previous posts that it doesn't work as it should in Linux.

Nevertheless, it's not up to us volunteers to fix the application, it's up to Moo! project and DNETC project.
Metod ...
mikey
Joined: 22 Jun 11
Posts: 2080
Credit: 1,826,667,680
RAC: 32,419
Message 2839 - Posted: 6 Mar 2012, 12:02:21 UTC - in response to Message 2836.  

I don't know how DNETC behaves in Windows and how you can actually verify proper operation (with regard to exclusion), however I proved in my previous posts that it doesn't work as it should in Linux.


By having 2 gpu's in one machine and each crunching for a different project, one being Moo.

Nevertheless, it's not up to us volunteers to fix the application, it's up to Moo! project and DNETC project.


I agree.
Metod, S56RKO

Joined: 1 Mar 12
Posts: 8
Credit: 7,361,960
RAC: 0
Message 2842 - Posted: 6 Mar 2012, 14:09:32 UTC - in response to Message 2839.  

I don't know how DNETC behaves in Windows and how you can actually verify proper operation (with regard to exclusion), however I proved in my previous posts that it doesn't work as it should in Linux.


By having 2 gpu's in one machine and each crunching for a different project, one being Moo.


Nope. If you check the output of a task from your dual-GPU machine (e.g. this one), you will see that the DNETC application started two worker threads (#a and #b) and assigned work to both. Compare this to the output of a task run on any other of your machines (such as this one), which only starts one worker thread.

Which means that DNETC application is as greedy on Windows as on Linux.

You should not trust BOINC manager when checking if project application behaves ... BOINC manager has no idea about that.

Metod ...
mikey
Joined: 22 Jun 11
Posts: 2080
Credit: 1,826,667,680
RAC: 32,419
Message 2844 - Posted: 6 Mar 2012, 22:22:32 UTC - in response to Message 2842.  

On my only dual gpu machine I AM running both on Moo right now, but I used to run one on Collatz. I move things around all the time, I am a tinkerer, it is only a hobby, it is only a hobby!! In fact in a few days I will be moving things around AGAIN as my new multi pci-e slot machines are not using dual, or more, pci-e cards, except for the one you see. My problem is I have cards that are unused and would like to pair them up as much as possible, so several gpu moves are in my future. I want to end up with all 4 5870's in 2 machines, and then double up on the rest of my gpu's as much as possible. I have 4 5870's, 6 5770's, 1 6850 and 1 Nvidia 560 ti. I also have some older AMD and Nvidia cards that I would like to put into machines but may not have enough pci-e slots to do that.
Metod, S56RKO

Joined: 1 Mar 12
Posts: 8
Credit: 7,361,960
RAC: 0
Message 2849 - Posted: 7 Mar 2012, 13:49:38 UTC - in response to Message 2844.  
Last modified: 7 Mar 2012, 13:50:13 UTC

On my only dual gpu machine I AM running both on Moo right now, but I used to run one on Collatz.


OK, so you have some homework to do: when you split your GPU power between two projects again, check the output of one task - not the first one of course, but rather one that gets downloaded after you do the split. I'd be quite interested in that output.
Metod ...
mikey
Joined: 22 Jun 11
Posts: 2080
Credit: 1,826,667,680
RAC: 32,419
Message 2854 - Posted: 8 Mar 2012, 11:35:41 UTC - in response to Message 2849.  

The wife leaves tomorrow on a short trip so I will have time while she is gone.


 
Copyright © 2011-2024 Moo! Wrapper Project