OpenCL versus Stream/CAL on distributed.net / Moo client.

Message boards : Number crunching : OpenCL versus Stream/CAL on distributed.net / Moo client.

[AF>FAH-Addict.net]toTOW

Joined: 4 May 11
Posts: 27
Credit: 112,091,698
RAC: 0
Message 3988 - Posted: 14 Oct 2012, 22:14:07 UTC

An OpenCL client for distributed.net is in the pipeline to support HD7k GPUs. I tested it on my HD6950 (800 MHz, but with the shaders reactivated as on an HD6970) with the 12.6 drivers, and I have good news and bad news.

The good news: performance is essentially identical between the Stream/CAL client and the OpenCL one :)

Here are the numbers with the Stream/CAL client:
[Oct 14 21:36:18 UTC] RC5-72: Summary: 6 packets (332.00 stats units)
0.00:12:32.06 - [1,825.43 Mkeys/s]


And then with the OpenCL one:
[Oct 14 21:52:08 UTC] RC5-72: Summary: 6 packets (351.00 stats units)
0.00:13:34.24 - [1,818.48 Mkeys/s]



The bad news: like every other OpenCL application I know of, it needs one free CPU core to feed the GPU, or performance is pathetic :/

With the OpenCL client, BOINC client shut down:
[Oct 14 21:52:22 UTC] RC5-72: using core #0 (CL ANSI 1-pipe).
[Oct 14 21:52:36 UTC] RC5-72: Benchmark for core #0 (CL ANSI 1-pipe)
0.00:00:11.09 [389,851,265 keys/sec]
[Oct 14 21:52:36 UTC] RC5-72: using core #1 (CL 1-pipe).
[Oct 14 21:52:45 UTC] RC5-72: Benchmark for core #1 (CL 1-pipe)
0.00:00:07.02 [624,650,764 keys/sec]
[Oct 14 21:52:45 UTC] RC5-72: using core #2 (CL 2-pipe).
[Oct 14 21:52:51 UTC] RC5-72: Benchmark for core #2 (CL 2-pipe)
0.00:00:03.47 [1,259,728,448 keys/sec]
[Oct 14 21:52:51 UTC] RC5-72: using core #3 (CL 4-pipe).
[Oct 14 21:52:56 UTC] RC5-72: Benchmark for core #3 (CL 4-pipe)
0.00:00:02.41 [1,814,054,483 keys/sec]
[Oct 14 21:52:56 UTC] RC5-72 benchmark summary :
Default core : #3 (CL 4-pipe) 1,814,054,483 keys/sec
Fastest core : #3 (CL 4-pipe) 1,814,054,483 keys/sec


And with the OpenCL client again, this time with the BOINC client running on the CPU (8 SIMAP cores):
[Oct 14 21:54:04 UTC] RC5-72: using core #0 (CL ANSI 1-pipe).
[Oct 14 21:54:23 UTC] RC5-72: Benchmark for core #0 (CL ANSI 1-pipe)
0.00:00:16.98 [232,066,050 keys/sec]
[Oct 14 21:54:23 UTC] RC5-72: using core #1 (CL 1-pipe).
[Oct 14 21:54:42 UTC] RC5-72: Benchmark for core #1 (CL 1-pipe)
0.00:00:16.36 [174,500,672 keys/sec]
[Oct 14 21:54:42 UTC] RC5-72: using core #2 (CL 2-pipe).
[Oct 14 21:54:56 UTC] RC5-72: Benchmark for core #2 (CL 2-pipe)
0.00:00:12.15 [356,654,108 keys/sec]
[Oct 14 21:54:56 UTC] RC5-72: using core #3 (CL 4-pipe).
[Oct 14 21:55:05 UTC] RC5-72: Benchmark for core #3 (CL 4-pipe)
0.00:00:07.05 [628,047,103 keys/sec]
[Oct 14 21:55:05 UTC] RC5-72 benchmark summary :
Default core : #3 (CL 4-pipe) 628,047,103 keys/sec
Fastest core : #3 (CL 4-pipe) 628,047,103 keys/sec


This time with the Stream/CAL client, BOINC client shut down:
[Oct 14 21:58:01 UTC] RC5-72: using core #0 (IL 4-pipe c).
[Oct 14 21:58:05 UTC] RC5-72: Benchmark for core #0 (IL 4-pipe c)
0.00:00:02.41 [1,824,758,914 keys/sec]
[Oct 14 21:58:05 UTC] RC5-72: using core #1 (IL 4-pipe c alt).
[Oct 14 21:58:12 UTC] RC5-72: Benchmark for core #1 (IL 4-pipe c alt)
0.00:00:04.25 [1,041,056,719 keys/sec]
[Oct 14 21:58:12 UTC] RC5-72: using core #2 (IL 4-pipe 2 threads).
[Oct 14 21:58:17 UTC] RC5-72: Benchmark for core #2 (IL 4-pipe 2 threads)
0.00:00:03.05 [1,421,427,937 keys/sec]
[Oct 14 21:58:17 UTC] RC5-72: using core #3 (IL 4-pipe cs-1).
[Oct 14 21:58:22 UTC] RC5-72: Benchmark for core #3 (IL 4-pipe cs-1)
0.00:00:02.35 [1,878,393,210 keys/sec]
[Oct 14 21:58:22 UTC] RC5-72 benchmark summary :
Default core : #0 (IL 4-pipe c) 1,824,758,914 keys/sec
Fastest core : #3 (IL 4-pipe cs-1) 1,878,393,210 keys/sec


And finally, with the Stream/CAL client again, BOINC client running (8 SIMAP cores):
[Oct 14 21:58:46 UTC] RC5-72: using core #0 (IL 4-pipe c).
[Oct 14 21:58:51 UTC] RC5-72: Benchmark for core #0 (IL 4-pipe c)
0.00:00:02.41 [1,828,166,073 keys/sec]
[Oct 14 21:58:51 UTC] RC5-72: using core #1 (IL 4-pipe c alt).
[Oct 14 21:58:58 UTC] RC5-72: Benchmark for core #1 (IL 4-pipe c alt)
0.00:00:04.25 [1,023,539,257 keys/sec]
[Oct 14 21:58:58 UTC] RC5-72: using core #2 (IL 4-pipe 2 threads).
[Oct 14 21:59:03 UTC] RC5-72: Benchmark for core #2 (IL 4-pipe 2 threads)
0.00:00:03.01 [1,462,924,800 keys/sec]
[Oct 14 21:59:03 UTC] RC5-72: using core #3 (IL 4-pipe cs-1).
[Oct 14 21:59:08 UTC] RC5-72: Benchmark for core #3 (IL 4-pipe cs-1)
0.00:00:02.35 [1,857,392,644 keys/sec]
[Oct 14 21:59:08 UTC] RC5-72 benchmark summary :
Default core : #0 (IL 4-pipe c) 1,828,166,073 keys/sec
Fastest core : #3 (IL 4-pipe cs-1) 1,857,392,644 keys/sec
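
Put as ratios of the default-core figures logged above (plain arithmetic on the quoted keys/sec values, rounded; nothing here is measured separately):

```latex
\begin{align*}
\text{OpenCL, BOINC running vs. shut down:} \quad
  & \frac{628{,}047{,}103}{1{,}814{,}054{,}483} \approx 0.346 \\
\text{Stream/CAL, BOINC running vs. shut down:} \quad
  & \frac{1{,}828{,}166{,}073}{1{,}824{,}758{,}914} \approx 1.002
\end{align*}
```

In these runs the OpenCL client keeps roughly a third of its throughput when every CPU core is busy, while the Stream/CAL client is essentially unchanged.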


I guess you'll have to set up the OpenCL application in Moo! to run with 1 CPU + x GPUs ...
ID: 3988 · Rating: 0
STE\/E

Joined: 2 May 11
Posts: 57
Credit: 250,035,598
RAC: 0
Message 4016 - Posted: 20 Oct 2012, 22:03:30 UTC

How did you get it to work with the BOINC client? I couldn't get it to work. Thanks
ID: 4016 · Rating: 0
[AF>FAH-Addict.net]toTOW

Joined: 4 May 11
Posts: 27
Credit: 112,091,698
RAC: 0
Message 4053 - Posted: 25 Oct 2012, 23:52:00 UTC

I didn't run it under BOINC. I did my tests with the distributed.net clients.
ID: 4053 · Rating: 0
Szopler

Joined: 3 May 11
Posts: 8
Credit: 15,002,506
RAC: 0
Message 4630 - Posted: 7 Mar 2013, 22:38:09 UTC

Give it to us! Please!

POEM has the same problem, or worse: it needs a few CPU cores to feed a 7770, so one core for Moo! won't be a problem! :)
ID: 4630 · Rating: 0
mikey
Joined: 22 Jun 11
Posts: 2080
Credit: 1,844,066,440
RAC: 94,284
Message 4631 - Posted: 8 Mar 2013, 12:04:01 UTC - in response to Message 4630.  

Give it to us! Please!

POEM has the same problem, or worse: it needs a few CPU cores to feed a 7770, so one core for Moo! won't be a problem! :)


DistRTgen too: one CPU core for each GPU. Both AMD and Nvidia need a free CPU core to feed the GPU. I only run a single unit on each GPU, so I can't say whether running more than one unit at a time would need another CPU core.
ID: 4631 · Rating: 0
Matthias Lehmkuhl

Joined: 22 Oct 11
Posts: 4
Credit: 3,145,103
RAC: 42
Message 4652 - Posted: 13 Mar 2013, 21:01:11 UTC
Last modified: 13 Mar 2013, 21:31:41 UTC

One additional point: the CPU usage for a result is 100%.

In the BOINC client, the settings for
Distributed.net Client v1.03 (cuda31) are 0.2C and 1NV.
Could you please set this to 1C and 1NV?
POEM does this, and it keeps other results from running into timing problems because they cannot use the CPU while a Distributed.net Client v1.03 (cuda31) result is running.

Edit:
Extract from client_state.xml for POEM:
<app_version>
<app_name>poemcl</app_name>
<version_num>105</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>1.000000</avg_ncpus>
<max_ncpus>1.000000</max_ncpus>

<flops>955157838839.538820</flops>
<plan_class>opencl_nvidia_100</plan_class>
<api_version>7.1.0</api_version>
<file_ref>
<file_name>poemcl_1.5_windows_intelx86__opencl_nvidia_100</file_name>
<main_program/>
</file_ref>
<coproc>
<type>NVIDIA</type>
<count>1.000000</count>
</coproc>
<gpu_ram>268435456.000000</gpu_ram>
</app_version>

And here are the settings I found for Moo! Wrapper:
<app_version>
<app_name>dnetc</app_name>
<version_num>103</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>0.200000</avg_ncpus>
<max_ncpus>0.200000</max_ncpus>

<flops>117464525973.684710</flops>
<plan_class>cuda31</plan_class>
<api_version>6.13.12</api_version>
<file_ref>
<file_name>dnetc_wrapper_1.3_windows_intelx86__cuda31.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>dnetc518-win32-x86-cuda31.exe</file_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>dnetc-gpu-1.3.ini</file_name>
<open_name>dnetc.ini</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>job-cuda31-1.00.xml</file_name>
<open_name>job.xml</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>cudart32_31_9.dll</file_name>
<copy_file/>
</file_ref>
<coproc>
<type>NVIDIA</type>
<count>1.000000</count>
</coproc>
<gpu_ram>33554432.000000</gpu_ram>
</app_version>
Matthias
ID: 4652 · Rating: 0
mikey
Joined: 22 Jun 11
Posts: 2080
Credit: 1,844,066,440
RAC: 94,284
Message 4653 - Posted: 13 Mar 2013, 21:16:26 UTC - in response to Message 4652.  

One additional point: the CPU usage for a result is 100%.

In the BOINC client, the settings for
Distributed.net Client v1.03 (cuda31) are 0.2C and 1NV.
Could you please set this to 1C and 1NV?
POEM does this, and it keeps other results from running into timing problems because they cannot use the CPU while a Distributed.net Client v1.03 (cuda31) result is running.


BOINC is a funky bird sometimes. Although they all use the same software, most projects do not talk among themselves about how they write their own versions of the applications they use for crunching. BOINC is like Office: lots of people use it, but most use different formulas. I am NOT saying projects never talk to each other, but most don't on a regular basis.
ID: 4653 · Rating: 0
Matthias Lehmkuhl

Joined: 22 Oct 11
Posts: 4
Credit: 3,145,103
RAC: 42
Message 4679 - Posted: 21 Mar 2013, 9:02:21 UTC - in response to Message 4653.  

One additional point: the CPU usage for a result is 100%.

In the BOINC client, the settings for
Distributed.net Client v1.03 (cuda31) are 0.2C and 1NV.
Could you please set this to 1C and 1NV?
...


BOINC is a funky bird sometimes. Although they all use the same software, most projects do not talk among themselves about how they write their own versions of the applications they use for crunching. BOINC is like Office: lots of people use it, but most use different formulas. I am NOT saying projects never talk to each other, but most don't on a regular basis.


That's right ;-)

I was able to change the settings locally with app_config.xml.
It should take effect once PrimeGrid has finished running in "high priority".
I'll see.
Matthias
ID: 4679 · Rating: 0
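
For anyone wanting to try the same local override, here is a minimal sketch of an app_config.xml. It assumes the application name dnetc shown in the client_state.xml extract earlier in the thread and a target of 1 CPU + 1 GPU per task. The values are illustrative, not the file the poster actually used. The file belongs in the Moo! Wrapper project directory under the BOINC data directory.

```xml
<app_config>
  <!-- Illustrative sketch only; not the poster's actual file. -->
  <app>
    <!-- Application name as it appears in <app_name> in client_state.xml -->
    <name>dnetc</name>
    <gpu_versions>
      <!-- Assumed: each task uses one whole GPU -->
      <gpu_usage>1.0</gpu_usage>
      <!-- Assumed: budget one full CPU core for each GPU task -->
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

The client has to re-read its config files (or be restarted) before the change applies. Note that cpu_usage only changes how BOINC schedules other work around the GPU task; it does not change how much CPU the application itself consumes.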
Matthias Lehmkuhl

Joined: 22 Oct 11
Posts: 4
Credit: 3,145,103
RAC: 42
Message 4684 - Posted: 21 Mar 2013, 21:34:12 UTC - in response to Message 4679.  

It's working as expected,
using BOINC 7.0.52.
Matthias
ID: 4684 · Rating: 0
nanoprobe
Joined: 31 Mar 13
Posts: 3
Credit: 326,936
RAC: 0
Message 4726 - Posted: 31 Mar 2013, 23:10:13 UTC

ATI
<app_version>
<app_name>dnetc</app_name>
<version_num>103</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>0.050000</avg_ncpus>
<max_ncpus>0.050000</max_ncpus>

<flops>19368306992.516346</flops>
<plan_class>ati14</plan_class>
<api_version>6.13.12</api_version>

Nvidia
<app_version>
<app_name>dnetc</app_name>
<version_num>103</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>0.200000</avg_ncpus>
<max_ncpus>0.200000</max_ncpus>

<flops>117464525973.684710</flops>
<plan_class>cuda31</plan_class>
<api_version>6.13.12</api_version>

If you compare the client_state.xml entries, it will probably be less CPU intensive on ATI cards than on Nvidia cards. JMHO
ID: 4726 · Rating: 0

Copyright © 2011-2024 Moo! Wrapper Project