Nvidia GPU workunits won't start

Message boards : Number crunching : Nvidia GPU workunits won't start

Daniel
Joined: 2 May 13
Posts: 13
Credit: 100,928,847
RAC: 0
Message 6604 - Posted: 22 Jun 2015, 6:59:59 UTC
Last modified: 22 Jun 2015, 7:05:24 UTC

None of my Moo! Wrapper Nvidia GPU workunits will start running. They all say "Ready to Start", but even when there are no other GPU workunits running from other projects, the Moo! Wrapper workunits won't start running.

I have tried suspending any other projects (CPU & GPU ones) but still the Moo! Wrapper workunits won't start.

I have tried them on 2 different PCs (one with Windows 7 and one with Windows 8) and I have the same problem.

I am using v7.4.42 of BOINC Manager. I have tried upgrading my Nvidia drivers to the latest version, but this made no difference.

GPU workunits from other projects work fine.

When I last crunched GPU workunits for this project (about a month ago) they also worked fine, but not anymore.

Any help would be appreciated.

Daniel
Joined: 2 May 13
Posts: 13
Credit: 100,928,847
RAC: 0
Message 6605 - Posted: 22 Jun 2015, 9:08:49 UTC - in response to Message 6604.  

After reading some older threads in this forum, it appears that other people have had the same problem and that it might be due to the fact that both my PCs have dual graphics cards.

It's strange that it was working about a month ago, but doesn't work now. I don't recall changing anything.

I've tried an app_info.xml file mentioned in another thread but the problem persists.

I'll keep trying.

mikey
Joined: 22 Jun 11
Posts: 2080
Credit: 1,826,336,240
RAC: 414
Message 6606 - Posted: 22 Jun 2015, 10:17:41 UTC - in response to Message 6605.  

After reading some older threads in this forum, it appears that other people have had the same problem and that it might be due to the fact that both my PCs have dual graphics cards.

It's strange that it was working about a month ago, but doesn't work now. I don't recall changing anything.

I've tried an app_info.xml file mentioned in another thread but the problem persists.

I'll keep trying.


Go into your account's computing preferences and check whether the GPU is set NOT to run while the PC is in use. The setting is:
Suspend GPU computing when computer is in use

Daniel
Joined: 2 May 13
Posts: 13
Credit: 100,928,847
RAC: 0
Message 6608 - Posted: 22 Jun 2015, 11:13:34 UTC - in response to Message 6606.  

Hi Mikey,

"Suspend when computer is in use" - is unticked
"Suspend GPU computing when computer is in use" - is unticked

I wonder if it has anything to do with the new v1.04 (opencl_nvidia_101) application that was released around the end of May. Although maybe not, since my PCs aren't actually getting the new application (I'm still receiving the v1.03 (cuda31) application).

Daniel.

Pooh Bear 27
Joined: 5 May 11
Posts: 12
Credit: 74,488,285
RAC: 0
Message 6609 - Posted: 22 Jun 2015, 17:11:10 UTC

The units you completed in May show "driver version 340.52", while your systems now show "driver: 353.06". I would look here first. Roll back the driver and see if it starts processing work.

Daniel
Joined: 2 May 13
Posts: 13
Credit: 100,928,847
RAC: 0
Message 6611 - Posted: 23 Jun 2015, 3:35:03 UTC - in response to Message 6609.  

I was having the problem with the old driver as well. I installed the new driver a few days ago to see if it would fix my problem, but it didn't.

mikey
Joined: 22 Jun 11
Posts: 2080
Credit: 1,826,336,240
RAC: 414
Message 6612 - Posted: 23 Jun 2015, 10:05:15 UTC - in response to Message 6608.  

Hi Mikey,

"Suspend when computer is in use" - is unticked
"Suspend GPU computing when computer is in use" - is unticked

I wonder if it has anything to do with the new v1.04 (opencl_nvidia_101) application that was released around the end of May. Although maybe not, since my PCs aren't actually getting the new application (I'm still receiving the v1.03 (cuda31) application).

Daniel.


How about this setting?

Suspend when non-BOINC CPU usage is above --- %

Daniel
Joined: 2 May 13
Posts: 13
Credit: 100,928,847
RAC: 0
Message 6614 - Posted: 23 Jun 2015, 10:28:48 UTC - in response to Message 6612.  

"Suspend when non-BOINC CPU usage is above" - was set to 25%. I've now unticked it so that it won't suspend, but my problem still persists.

I've noticed that in the News section of the main Moo! Wrapper page it says ...

The old app version is still available for systems running an older BOINC Client. However, if your system has multiple GPUs, please consider updating its BOINC Client to a supported version. There are known issues running the old app on systems with multiple GPUs. New apps fix these by running one cruncher per device.

I'm still getting the v1.03 (cuda31) application (which was working OK previously).

Do you know if there is a way to get the new v1.04 (opencl_nvidia_101) application ?

Maybe the new application will fix my problem.

mikey
Joined: 22 Jun 11
Posts: 2080
Credit: 1,826,336,240
RAC: 414
Message 6631 - Posted: 6 Jul 2015, 10:39:59 UTC - in response to Message 6614.  

"Suspend when non-BOINC CPU usage is above" - was set to 25%. I've now unticked it so that it won't suspend, but my problem still persists.

I've noticed that in the News section of the main Moo! Wrapper page it says ...

The old app version is still available for systems running an older BOINC Client. However, if your system has multiple GPUs, please consider updating its BOINC Client to a supported version. There are known issues running the old app on systems with multiple GPUs. New apps fix these by running one cruncher per device.

I'm still getting the v1.03 (cuda31) application (which was working OK previously).

Do you know if there is a way to get the new v1.04 (opencl_nvidia_101) application ?

Maybe the new application will fix my problem.


No, I do not know how to force the change.

Teemu Mannermaa
Project administrator
Project developer
Project tester
Joined: 20 Apr 11
Posts: 388
Credit: 822,356,221
RAC: 0
Message 6632 - Posted: 6 Jul 2015, 12:27:21 UTC - in response to Message 6604.  

None of my Moo! Wrapper Nvidia GPU workunits will start running. They all say "Ready to Start", but even when there are no other GPU workunits running from other projects, the Moo! Wrapper workunits won't start running.


I noticed this too on my test nVidia system that has dual GPUs. BOINC Client has major problems when trying to schedule work units that require multiple GPUs (as all our old app workunits do). I've tested all major versions since 7.2.33 and all seem to be affected, at least on my test system.

I did disable sending old app versions to any multi-GPU system using BOINC Client v7.2.x or later, but it seems some users have managed to get our old app working, so this prevented them from getting any work. I have no idea why, or how to detect those systems on the server side. So for now only multi-GPU systems with BOINC Client v7.2.33 won't get old apps (this is the version that's available for CentOS 6 in their repo).

There was a fix committed to BOINC Client for this problem recently (see https://github.com/BOINC/boinc/commit/8c7aef5b997c028e007dc158d76eab3a5502e3c4) but I don't think there's a release version of it yet. I've tested it with a test build and it does seem to fix the scheduling problem.

However, that won't help users of old BOINC Clients. So the best course is probably to have everybody use our new app. For that we need a Dnet Client new enough to support selecting the GPU to be used, and there are no CUDA builds available for that. I've gotten one compiled for Linux and thus was able to release cuda60/cuda70 builds with the new app. I'll try to get Windows ones built too.

In the meantime, nVidia OpenCL apps are the thing to get. I've fixed some problems in the BOINC Server code that could have prevented people from getting that app. (There have been various problems with sending apps when there's more than one possible app for a platform.) Additionally, the more stats the server has for an app version, the better it knows its true speed relative to other apps.

So, could you test whether you can get OpenCL apps now? If not, tell me the host # that's affected and I'll try to debug the app sending problem further. Thanks!

-w

Daniel
Joined: 2 May 13
Posts: 13
Credit: 100,928,847
RAC: 0
Message 6635 - Posted: 8 Jul 2015, 3:20:48 UTC - in response to Message 6632.  
Last modified: 8 Jul 2015, 3:21:18 UTC

Thanks for your reply Teemu.

I've just downloaded some more workunits, but I'm still having the same problem.

The workunits are v1.03 cuda31 and they won't start running.

Here is a link to my computer ...
http://moowrap.net/show_host_detail.php?hostid=185032

mikey
Joined: 22 Jun 11
Posts: 2080
Credit: 1,826,336,240
RAC: 414
Message 6636 - Posted: 9 Jul 2015, 11:19:50 UTC - in response to Message 6635.  

Thanks for your reply Teemu.

I've just downloaded some more workunits, but I'm still having the same problem.

The workunits are v1.03 cuda31 and they won't start running.

Here is a link to my computer ...
http://moowrap.net/show_host_detail.php?hostid=185032


Is this your only PC, or do you have others on your other projects? Because you seem to be doing several GPU projects at the same time, could it be that your cache is too big, you have too many units overall, and Moo is suffering from a lack of time to run them? Your units say 'abandoned' or, in the case of one unit, 'didn't run by the deadline'.

Daniel
Joined: 2 May 13
Posts: 13
Credit: 100,928,847
RAC: 0
Message 6637 - Posted: 9 Jul 2015, 12:52:38 UTC - in response to Message 6636.  
Last modified: 9 Jul 2015, 12:55:22 UTC

Yes I have other computers.
Two of them have Nvidia GPUs and both of them are having the same problem with this project.
I do run other GPU projects, but even when I have no other GPU workunits or when I suspend the other projects, workunits from this project still won't start.

mikey
Joined: 22 Jun 11
Posts: 2080
Credit: 1,826,336,240
RAC: 414
Message 6638 - Posted: 10 Jul 2015, 10:34:08 UTC - in response to Message 6637.  

Yes I have other computers.
Two of them have Nvidia GPUs and both of them are having the same problem with this project.
I do run other GPU projects, but even when I have no other GPU workunits or when I suspend the other projects, workunits from this project still won't start.


I am out of ideas then, sorry.

Teemu Mannermaa
Project administrator
Project developer
Project tester
Joined: 20 Apr 11
Posts: 388
Credit: 822,356,221
RAC: 0
Message 6639 - Posted: 10 Jul 2015, 11:21:13 UTC - in response to Message 6635.  

Here is a link to my computer ...
http://moowrap.net/show_host_detail.php?hostid=185032


Right, thanks for the host #. I've blacklisted the nVidia CUDA for that host for now so you should get the nVidia OpenCL work units now. Hope they work better. Please reset the project and fetch new work to test, thanks!

The problem with getting nVidia OpenCL work in the first place is that BOINC Server thinks the OpenCL app is much slower than the CUDA one for your host. It obviously doesn't take into account that CUDA doesn't start at all now. :(

Specifically, the scheduler says: "Comparing AV#38 (43.01 GFLOPS) against AV#26 (178.36 GFLOPS)", where AV#38 is the OpenCL app and AV#26 is the CUDA app. The CUDA speed comes from the measured time of successfully returned work, and the OpenCL figure is a guess by the server. I've been trying to make it guess better but have not fully succeeded yet.

-w
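The comparison quoted from the scheduler can be illustrated with a small sketch. This is not the BOINC scheduler's code; it only assumes that the server prefers the app version with the highest projected speed among those not blacklisted for the host. `AppVersion`, `pick_app_version`, and the `blacklisted` set are hypothetical names, the plan class strings are taken from the app names mentioned in this thread, and the GFLOPS figures are the ones in the scheduler message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AppVersion:
    av_id: int        # app version number as shown in the scheduler message
    plan_class: str   # assumed label, taken from the app names in the thread
    gflops: float     # projected speed: measured, or guessed by the server


def pick_app_version(candidates, blacklisted=frozenset()) -> Optional[AppVersion]:
    """Return the fastest-looking app version that is not blacklisted."""
    usable = [av for av in candidates if av.av_id not in blacklisted]
    return max(usable, key=lambda av: av.gflops, default=None)


candidates = [
    AppVersion(26, "cuda31", 178.36),            # measured from returned work
    AppVersion(38, "opencl_nvidia_101", 43.01),  # server-side guess
]

# Without a blacklist the CUDA version wins on projected speed,
# even though it never starts on this host.
print(pick_app_version(candidates).plan_class)

# Blacklisting AV#26 for the host leaves only the OpenCL version.
print(pick_app_version(candidates, blacklisted={26}).plan_class)
```

Under these assumptions, a speed-only comparison keeps choosing the CUDA version until it is excluded for the host, which matches what happened here.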

Daniel
Joined: 2 May 13
Posts: 13
Credit: 100,928,847
RAC: 0
Message 6640 - Posted: 10 Jul 2015, 14:09:09 UTC - in response to Message 6639.  
Last modified: 10 Jul 2015, 14:09:50 UTC

Thanks Teemu !

You've fixed my problem.

I'm now getting the OpenCL workunits and they start running straight away.

Perhaps you could add an option in our account preferences that lets us choose which application to run.

Thanks also to Mikey & Pooh Bear 27 for your suggestions.

Daniel
Joined: 2 May 13
Posts: 13
Credit: 100,928,847
RAC: 0
Message 6641 - Posted: 10 Jul 2015, 14:39:28 UTC - in response to Message 6640.  

Looks like I spoke too soon. :(

While the tasks do now start running straight away, the GPU load reported by GPU-Z is always 0%.

After a few seconds of running the task progress bar stops at 17.567% and no further progress is made.

I tried rebooting the computer and resetting the project, but this made no difference.

Teemu Mannermaa
Project administrator
Project developer
Project tester
Joined: 20 Apr 11
Posts: 388
Credit: 822,356,221
RAC: 0
Message 6643 - Posted: 10 Jul 2015, 18:48:16 UTC - in response to Message 6641.  

After a few seconds of running the task progress bar stops at 17.567% and no further progress is made.


Oh no. :( Can you find the stderr.log file in the slot directory from a task that's stuck and post it either here or PM it to me? (Note that the log file might contain your Distributed.net ID which you might not want to disclose publicly.)

-w
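For anyone collecting the log requested above, here is a minimal sketch that lists stderr files in the slot directories and masks email-like strings before the text is posted publicly. It assumes the slots live under `<data_dir>/slots/<n>/`, and that the Distributed.net ID looks like an email address; `find_stderr_logs` and `redact_emails` are hypothetical helper names, and the data directory path is something you must supply for your own install.

```python
import re
from pathlib import Path

# Loose pattern for email-like strings; an assumption about what the ID looks like.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_stderr_logs(data_dir):
    """List stderr.* files under <data_dir>/slots/<n>/, sorted by path."""
    return sorted(Path(data_dir).glob("slots/*/stderr.*"))


def redact_emails(text):
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL_RE.sub("<redacted>", text)


def dump_redacted_logs(data_dir):
    """Print every stderr file found, with email-like strings masked."""
    for log in find_stderr_logs(data_dir):
        print(f"--- {log} ---")
        print(redact_emails(log.read_text(errors="replace")))
```

Calling `dump_redacted_logs` with the BOINC data directory prints each log with addresses replaced by `<redacted>`. The pattern `stderr.*` is used so that both `stderr.log` and `stderr.txt` are picked up.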

Daniel
Joined: 2 May 13
Posts: 13
Credit: 100,928,847
RAC: 0
Message 6644 - Posted: 11 Jul 2015, 0:44:50 UTC - in response to Message 6643.  

I think I know what the problem is.

The application is trying to use my Intel GPU instead of NVidia GPUs.

When I alternately suspend & resume the Moo!Wrapper project my NVidia GPU usage stays at 0%, but my Intel GPU usage goes from 0 to 100%.

A similar problem was reported by anson1998 in this thread http://moowrap.net/forum_thread.php?id=406

Here are the contents of my stderr.txt ...

10:32:13 (1612): wrapper v1.4 build 18 for nVidia OpenCL starting (BOINC Wrapper v7.5.26011)
10:32:13 (1612): device: OpenCL: NVIDIA GPU 0: GeForce GTX 760 (driver version 353.06, device version OpenCL 1.2 CUDA, 2048MB, 1958MB available, 2650 GFLOPS peak)
10:32:13 (1612): device: OpenCL: NVIDIA GPU 1 (not used): GeForce GTX 760 (driver version 353.06, device version OpenCL 1.2 CUDA, 2048MB, 1958MB available, 2650 GFLOPS peak)
10:32:13 (1612): checkpoint interval: 0h15m00s00 (task 268800 GFLOPS, 0h01m41s44 per packet)
10:32:13 (1612): wrapper: running dnetc520-win32-x86-opencl.exe (-ini dnetc.ini -runoffline -multiok=1) - attempt 1/10

dnetc v2.9111-520-CTR-12082118 for OpenCL on Win32 (WindowsNT 6.2).

* ==========================================================================
* The client is not configured with your email address (distributed.net ID)
* Work done cannot be credited until it is set. Please run 'dnetc -config'
* ==========================================================================

[Jul 11 00:32:14 UTC] Automatic processor type detection did not
recognize the processor (tag: "Intel(R) HD Graphics 4000 ")
[Jul 11 00:32:34 UTC] RC5-72: using core #1 (CL 1-pipe).
[Jul 11 00:32:34 UTC] RC5-72: Loaded D3:CB6FEE00:00000000:64*2^32
Packet was from a different user/core/client cpu/os/build.
[Jul 11 00:32:34 UTC] RC5-72: 2 packets (128.00 stats units) remain in
in.r72
Projected ideal time to completion: 0.04:07:28.00
[Jul 11 00:32:34 UTC] RC5-72: 0 packets are in out.r72
[Jul 11 00:32:34 UTC] 1 cruncher has been started.
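The mismatch visible in this log (the wrapper assigns a GeForce GTX 760, while dnetc reports an Intel processor tag) can be checked mechanically. The sketch below is only an illustration: the regular expressions are written against the two line formats that appear in this particular log and are not a documented format, and `check_device_mismatch` is a hypothetical helper.

```python
import re

# Matches the wrapper line for the device that is actually assigned
# (the "(not used)" device line does not match because of the text after the index).
WRAPPER_DEVICE_RE = re.compile(
    r"device: OpenCL: (?P<vendor>\w+) GPU (?P<index>\d+): (?P<name>[^(]+?) \("
)
# Matches the processor tag dnetc prints when auto-detection fails.
DNETC_TAG_RE = re.compile(r'\(tag: "(?P<tag>[^"]*)"\)')


def check_device_mismatch(log_text):
    """Compare the device the wrapper assigned with the tag dnetc reported.

    Returns None when either line is missing from the log.
    """
    dev = WRAPPER_DEVICE_RE.search(log_text)
    tag = DNETC_TAG_RE.search(log_text)
    if dev is None or tag is None:
        return None
    assigned = dev.group("name").strip()
    seen = tag.group("tag").strip()
    return {
        "assigned": assigned,
        "seen": seen,
        "mismatch": assigned.lower() not in seen.lower(),
    }


SAMPLE = (
    "10:32:13 (1612): device: OpenCL: NVIDIA GPU 0: GeForce GTX 760 "
    "(driver version 353.06, device version OpenCL 1.2 CUDA)\n"
    "[Jul 11 00:32:14 UTC] Automatic processor type detection did not\n"
    'recognize the processor (tag: "Intel(R) HD Graphics 4000 ")\n'
)

print(check_device_mismatch(SAMPLE))
```

Note that dnetc only prints the tag when it fails to recognize the processor, so a missing tag line says nothing either way.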

Teemu Mannermaa
Project administrator
Project developer
Project tester
Joined: 20 Apr 11
Posts: 388
Credit: 822,356,221
RAC: 0
Message 6645 - Posted: 11 Jul 2015, 2:20:22 UTC - in response to Message 6644.  

The application is trying to use my Intel GPU instead of NVidia GPUs.


Aha! Thanks for this information, at least now we know what the problem is. Although, the fix for this might be a bit involved. Meanwhile, I'll probably try to get newer CUDA builds with our new app for Windows built and deployed, as those will use 1 device at a time and won't suffer from the scheduling problem.

This stems from the two different views of the OpenCL devices on the system. BOINC only sees the nVidia devices, while the D.net Client sees all the OpenCL-capable devices, and thus device 0 is different. :(

-w
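The two views of the device list can be illustrated with a short sketch. It is not the project's fix; it only shows how an index from the nVidia-only list could be translated into an index in the full OpenCL device list by matching device names. `translate_device_index` is a hypothetical function, and the order of the full list (Intel first) is an assumption chosen to match the behaviour reported in this thread. Real enumeration order depends on the OpenCL platforms installed.

```python
def translate_device_index(boinc_index, boinc_devices, all_devices):
    """Map an index in BOINC's device list to an index in the full device list.

    boinc_devices: device names as BOINC enumerates them (nVidia only).
    all_devices:   device names as seen by an application that enumerates
                   every OpenCL device.
    Identical names (e.g. two cards of the same model) are matched in order.
    """
    wanted = boinc_devices[boinc_index]
    # How many devices with the same name precede it in BOINC's list.
    occurrence = boinc_devices[:boinc_index].count(wanted)
    matches = [i for i, name in enumerate(all_devices) if name == wanted]
    if occurrence >= len(matches):
        raise LookupError(f"{wanted!r} not found in the full device list")
    return matches[occurrence]


boinc_view = ["GeForce GTX 760", "GeForce GTX 760"]
# Assumed order: the Intel GPU is enumerated before the two nVidia cards.
dnetc_view = ["Intel(R) HD Graphics 4000", "GeForce GTX 760", "GeForce GTX 760"]

# BOINC's device 0 is the first GTX 760, but index 0 in the full list is the Intel GPU.
print(translate_device_index(0, boinc_view, dnetc_view))
print(translate_device_index(1, boinc_view, dnetc_view))
```

Matching by name is fragile if the two enumerations report names differently, so a real fix would more likely rely on a stable device identifier.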
 
Copyright © 2011-2024 Moo! Wrapper Project