BOINC Scheduler changes for multiple app version case

Teemu Mannermaa
Project administrator
Project developer
Project tester

Joined: 20 Apr 11
Posts: 363
Credit: 807,087,510
RAC: 60,757
Message 7981 - Posted: 2 May 2018, 12:00:34 UTC

The BOINC scheduler has had problems sending different app versions to clients when there are multiple possible versions for a platform. For example, this happens when both OpenCL and Stream/CUDA app versions, or both 32-bit and 64-bit CPU app versions, are available. To hopefully fix this, our scheduler has been changed to send each app version until it has enough host-specific speed samples. The only exception is when that version has been failing.
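As a rough illustration of the rule described above, here is a minimal Python sketch. It is not the BOINC scheduler's actual code: the `AppVersion` class, the `pick_version` function, the sample threshold, and the speed figures are all hypothetical. The failure limit of 10 is the figure mentioned later in this thread.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_SAMPLES = 10   # hypothetical: speed samples needed before a version counts as measured
MAX_FAILURES = 10  # failure limit mentioned later in this thread


@dataclass
class AppVersion:
    name: str
    samples: int                   # host-specific speed samples collected so far
    avg_gflops: float              # measured speed; only meaningful with enough samples
    consecutive_failures: int = 0


def pick_version(versions: List[AppVersion],
                 min_samples: int = MIN_SAMPLES,
                 max_failures: int = MAX_FAILURES) -> Optional[AppVersion]:
    """Pick the app version to send next for one host (illustrative only)."""
    # The exception: never send a version that has been failing.
    usable = [v for v in versions if v.consecutive_failures < max_failures]
    if not usable:
        return None
    # Keep sending under-sampled versions until each has enough speed samples.
    under_sampled = [v for v in usable if v.samples < min_samples]
    if under_sampled:
        return min(under_sampled, key=lambda v: v.samples)
    # Once every usable version is measured, prefer the fastest one.
    return max(usable, key=lambda v: v.avg_gflops)


# Hypothetical host: cuda31 is well measured, the OpenCL version is new.
versions = [
    AppVersion("cuda31", samples=50, avg_gflops=9000.0),
    AppVersion("opencl_nvidia_101", samples=0, avg_gflops=0.0),
]
print(pick_version(versions).name)
```

Under these assumptions a well-measured host starts receiving the new version as soon as it appears, which matches the behaviour reported in the replies below.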

Please report in our forums any problems with getting work or with tasks failing more often. Thank you and happy crunching!

Woodles
Joined: 6 May 11
Posts: 4
Credit: 890,049,823
RAC: 829,231
Message 7982 - Posted: 2 May 2018, 16:30:55 UTC - in response to Message 7981.  

I have a host with two GTX 1080s (http://moowrap.net/show_host_detail.php?hostid=798990), and up until a few hours ago they were both crunching the same "Distributed.net Client v1.03 (cuda31) windows_intelx86" tasks quite happily.

Now I'm only getting "Distributed.net Client v1.04 (opencl_nvidia_101) windows_intelx86" tasks, which all fail after about 20 seconds with "computation error".

The project attempts to run one task per GPU but GPU-Z shows no load and no power used.

It may have something to do with the message "[May 02 03:32:57 UTC] Automatic processor type detection did not recognize the processor (tag: "Intel(R) HD Graphics 530 ")" in the stderr file.

No changes at my end. Another host with the same GPUs is also getting the OpenCL tasks and is running them, although they claim to need over six hours (and increasing) to finish!

I don't use an app_config.

Woodles
Joined: 6 May 11
Posts: 4
Credit: 890,049,823
RAC: 829,231
Message 7983 - Posted: 2 May 2018, 17:43:42 UTC

I've let it run again and tasks now state:
17:45:15 (7524): wrapper v1.4 build 18 for nVidia OpenCL starting (BOINC Wrapper v7.5.26011)
17:45:15 (7524): device: OpenCL: NVIDIA GPU 0 (not used): GeForce GTX 1080 (driver version 378.49, device version OpenCL 1.2 CUDA, 8192MB, 3042MB available, 9395 GFLOPS peak)
17:45:15 (7524): device: OpenCL: NVIDIA GPU 1: GeForce GTX 1080 (driver version 378.49, device version OpenCL 1.2 CUDA, 8192MB, 3042MB available, 9395 GFLOPS peak)
17:45:15 (7524): checkpoint interval: 0h15m00s00 (task 1075200 GFLOPS, 0h01m54s44 per packet)
17:45:15 (7524): wrapper: running dnetc520-win32-x86-opencl.exe (-ini dnetc.ini -runoffline -multiok=1) - attempt 1/10

dnetc v2.9112-521-CTR-16020314 for Win64 (WindowsNT 6.1).
Using email address (distributed.net ID) 'xxxxxxxx@xxxxxxxx'

[May 02 16:45:22 UTC] Automatic processor detection found 8 processors.
[May 02 16:45:22 UTC] Loading crunchers with work...
[May 02 16:45:22 UTC] Automatic processor type detection found
an Intel Core iX-6xxx (Skylake) processor.
[May 02 16:45:22 UTC] OGR-NG: using core #2 (cj-asm-sse2).
[May 02 16:45:22 UTC] RC5-72: using core #4 (YK AVX2).


So apparently these "Distributed.net Client v1.04 (opencl_nvidia_101) windows_intelx86" tasks are running on the CPU?!

CJ Xuereb
Joined: 2 Jun 17
Posts: 1
Credit: 14,018,194
RAC: 60,401
Message 7984 - Posted: 3 May 2018, 1:28:30 UTC - in response to Message 7982.  

At least for people who use Norton Antivirus (as I do), Norton treats the file as a virus and blocks it, which results in a computation error when the task is run.


Filename: dnetc520-win32-x86-opencl.exe
Threat name: Heur.AdvML.C
Full Path: d:\programdata\boinc\projects\moowrap.net\dnetc520-win32-x86-opencl.exe
____________________________

On computers as of: 03/05/2018 at 09:20:11
Last Used: 03/05/2018 at 09:20:11
Startup Item: No
Launched: No
Threat type: Heuristic Virus. Detection of a threat based on malware heuristics.
____________________________

Few Users: Hundreds of users in the Norton Community have used this file.
Mature: This file was released 5 years 2 months ago.
High: This file risk is high.
____________________________

Downloaded file from moowrap.net: http://moowrap.net/download/dnetc520-win32-x86-opencl.exe
Source: External Media
____________________________

File Actions
File: d:\programdata\boinc\projects\moowrap.net\dnetc520-win32-x86-opencl.exe Blocked
____________________________

File Thumbprint - SHA: Not available
File Thumbprint - MD5: Not available

MC_IVBE
Joined: 19 Dec 16
Posts: 1
Credit: 22,392,704
RAC: 115
Message 7985 - Posted: 3 May 2018, 8:35:12 UTC - in response to Message 7984.  

Same problem here. I have Norton and it's spitting out a message that one of the downloaded files for the new version is a virus.

dnetc520-win32-x86-opencl.exe

I don't know whether this is a false positive by Symantec.

mikey
Joined: 22 Jun 11
Posts: 2063
Credit: 1,000,866,048
RAC: 0
Message 7986 - Posted: 3 May 2018, 11:40:20 UTC - in response to Message 7985.  

Set Norton to ignore the BOINC directories. Any 'false positive' will then be ignored, and any real virus that tries to get out of the BOINC directories will still be caught. 'False positives' are a problem for BOINC because some projects send and receive data fairly often, which mimics a virus collecting data.

Woodles
Joined: 6 May 11
Posts: 4
Credit: 890,049,823
RAC: 829,231
Message 7987 - Posted: 4 May 2018, 14:49:21 UTC
Last modified: 4 May 2018, 14:52:42 UTC

It's happened before, but it's not my issue: I don't run any antivirus on my crunching machines.

I have several hosts running Moo, and the only ones having problems are the ones with GTX 1080 GPUs, and only with OpenCL tasks. Everything else runs without problems.

Probably a driver problem but I'm not going to mess around with things at the moment. No matter, I'll move them to a different project.

Teemu Mannermaa
Project administrator
Project developer
Project tester
Joined: 20 Apr 11
Posts: 363
Credit: 807,087,510
RAC: 60,757
Message 7988 - Posted: 4 May 2018, 16:29:55 UTC - in response to Message 7982.  

It may have something to do with the message "[May 02 03:32:57 UTC] Automatic processor type detection did not recognize the processor (tag: "Intel(R) HD Graphics 530 ")" in the stderr file.


Right, you seem to have an internal Intel GPU as well. The Dnet OpenCL app detects it and tries to use it, but fails. There are two problems here: first, the Intel GPU fails to run the app; second, because the Dnet app detects it, the GPU numbering is out of sync with the BOINC client.
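To make the numbering problem concrete, here is a small Python sketch. It does not show how the Dnet app or the BOINC client actually enumerate devices: the device list, its order, and the helper `vendor_to_global_index` are hypothetical. It only illustrates how a per-vendor index ("NVIDIA GPU 0") can point at the wrong device when it is applied to a list that also contains the Intel GPU.

```python
from typing import List, Tuple

Device = Tuple[str, str]  # (vendor, model)


def vendor_to_global_index(devices: List[Device], vendor: str, vendor_index: int) -> int:
    """Translate a per-vendor device index into a position in the full device list."""
    matches = [i for i, (dev_vendor, _model) in enumerate(devices) if dev_vendor == vendor]
    return matches[vendor_index]  # raises IndexError if there is no such device


# Hypothetical enumeration as an app might see it: the Intel GPU comes first.
all_devices = [
    ("Intel", "HD Graphics 530"),
    ("NVIDIA", "GeForce GTX 1080"),
    ("NVIDIA", "GeForce GTX 1080"),
]

boinc_index = 0  # "NVIDIA GPU 0" in the client's own numbering

# Using the client's index directly on the full list selects the Intel GPU.
naive = all_devices[boinc_index]

# Translating the index first selects the intended NVIDIA card.
translated = all_devices[vendor_to_global_index(all_devices, "NVIDIA", boinc_index)]

print(naive[1])
print(translated[1])
```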

The BOINC scheduler should have given up on the OpenCL app after 10 failures, but it doesn't seem to be doing that. I'll try to figure out whether this can be fixed (you should only be getting the cuda31 app until the problems with Intel GPUs can be fixed).

-w

xii5ku
Joined: 16 Mar 17
Posts: 1
Credit: 127,814,201
RAC: 465
Message 7989 - Posted: 10 May 2018, 8:03:16 UTC
Last modified: 10 May 2018, 8:26:26 UTC

On a GTX 1070, Windows 7 64 bit, driver 384.94, which runs various CUDA and OpenCL applications just fine, all "Distributed.net Client v1.04 (opencl_nvidia_101) windows_intelx86" tasks fail with "No OpenCL platforms available!".

- edit -
Never mind. Something must have shut down uncleanly before this. I rebooted the computer, and now it's running just fine.

ncoded.com
Joined: 5 Dec 17
Posts: 1
Credit: 33,954,322
RAC: 5
Message 7990 - Posted: 10 May 2018, 12:27:36 UTC
Last modified: 10 May 2018, 13:07:18 UTC

Same problem as Woodles: i7-based GPU racks (with 1080/1080 Ti cards) are producing computation errors.

I have only tried one rack so far, with 3x 1080, but every WU failed.

Our GPU racks use only the 1080/1080 Ti cards for crunching, never the Intel iGPUs.

This is the host:

https://moowrap.net/show_host_detail.php?hostid=984556

I just tried running Moo on Xeon boxes and they run fine with the same NVIDIA driver. So the problem does seem to be related to the i7-based racks.

Woodles
Joined: 6 May 11
Posts: 4
Credit: 890,049,823
RAC: 829,231
Message 7991 - Posted: 10 May 2018, 12:30:40 UTC - in response to Message 7988.  

I've never used the Intel GPU for crunching, so it probably has none of the correct drivers.

The task does indeed give up after ten attempts at using the Intel GPU ... and then moves on to the next task and repeats.

I tried to download work again last night and got 191 "Distributed.net Client v1.04 (opencl_nvidia_101) windows_intelx86" tasks.

Some CUDA ones would be nice, as the external GPU has no issues with them.

trebotuet
Joined: 14 Nov 17
Posts: 5
Credit: 80,064,037
RAC: 414,758
Message 7997 - Posted: 18 May 2018, 12:26:02 UTC - in response to Message 7991.  

I don't know if I have the same problem, but I've created a new thread because I haven't seen this before.

https://moowrap.net/forum_user_posts.php?userid=254811

Copyright © 2011-2018 Moo! Wrapper Project