OpenCL versus Stream/CAL on distributed.net / Moo client.

[AF>FAH-Addict.net]toTOW · Joined: 4 May 11 · Posts: 27 · Credit: 116,370,371 · RAC: 7
Message 3988 - Posted: 14 Oct 2012, 22:14:07 UTC
There is a distributed.net OpenCL client in the pipeline to support HD7k GPUs. I tested it on my HD6950 (800 MHz, but with reactivated shaders like on an HD6970) with 12.6 drivers, and I have good news and bad news.
The good news: performance is practically identical between the Stream/CAL client and the OpenCL one :)
Here are the numbers with the Stream/CAL client:
    [Oct 14 21:36:18 UTC] RC5-72: Summary: 6 packets (332.00 stats units) 0.00:12:32.06 - [1,825.43 Mkeys/s]
And then with the OpenCL one:
    [Oct 14 21:52:08 UTC] RC5-72: Summary: 6 packets (351.00 stats units) 0.00:13:34.24 - [1,818.48 Mkeys/s]
The bad news: like every other OpenCL application I know of, it requires one free CPU core to feed the GPU, or you'll get pathetic performance :/
With the OpenCL client, BOINC client shut down:
    [Oct 14 21:52:22 UTC] RC5-72: using core #0 (CL ANSI 1-pipe).
    [Oct 14 21:52:36 UTC] RC5-72: Benchmark for core #0 (CL ANSI 1-pipe) 0.00:00:11.09 [389,851,265 keys/sec]
    [Oct 14 21:52:36 UTC] RC5-72: using core #1 (CL 1-pipe).
    [Oct 14 21:52:45 UTC] RC5-72: Benchmark for core #1 (CL 1-pipe) 0.00:00:07.02 [624,650,764 keys/sec]
    [Oct 14 21:52:45 UTC] RC5-72: using core #2 (CL 2-pipe).
    [Oct 14 21:52:51 UTC] RC5-72: Benchmark for core #2 (CL 2-pipe) 0.00:00:03.47 [1,259,728,448 keys/sec]
    [Oct 14 21:52:51 UTC] RC5-72: using core #3 (CL 4-pipe).
    [Oct 14 21:52:56 UTC] RC5-72: Benchmark for core #3 (CL 4-pipe) 0.00:00:02.41 [1,814,054,483 keys/sec]
    [Oct 14 21:52:56 UTC] RC5-72 benchmark summary :
    Default core : #3 (CL 4-pipe) 1,814,054,483 keys/sec
    Fastest core : #3 (CL 4-pipe) 1,814,054,483 keys/sec
With the OpenCL client again, but with the BOINC client running on the CPU (8 SIMAP cores):
    [Oct 14 21:54:04 UTC] RC5-72: using core #0 (CL ANSI 1-pipe).
    [Oct 14 21:54:23 UTC] RC5-72: Benchmark for core #0 (CL ANSI 1-pipe) 0.00:00:16.98 [232,066,050 keys/sec]
    [Oct 14 21:54:23 UTC] RC5-72: using core #1 (CL 1-pipe).
    [Oct 14 21:54:42 UTC] RC5-72: Benchmark for core #1 (CL 1-pipe) 0.00:00:16.36 [174,500,672 keys/sec]
    [Oct 14 21:54:42 UTC] RC5-72: using core #2 (CL 2-pipe).
    [Oct 14 21:54:56 UTC] RC5-72: Benchmark for core #2 (CL 2-pipe) 0.00:00:12.15 [356,654,108 keys/sec]
    [Oct 14 21:54:56 UTC] RC5-72: using core #3 (CL 4-pipe).
    [Oct 14 21:55:05 UTC] RC5-72: Benchmark for core #3 (CL 4-pipe) 0.00:00:07.05 [628,047,103 keys/sec]
    [Oct 14 21:55:05 UTC] RC5-72 benchmark summary :
    Default core : #3 (CL 4-pipe) 628,047,103 keys/sec
    Fastest core : #3 (CL 4-pipe) 628,047,103 keys/sec
This time with the Stream/CAL client, BOINC client shut down:
    [Oct 14 21:58:01 UTC] RC5-72: using core #0 (IL 4-pipe c).
    [Oct 14 21:58:05 UTC] RC5-72: Benchmark for core #0 (IL 4-pipe c) 0.00:00:02.41 [1,824,758,914 keys/sec]
    [Oct 14 21:58:05 UTC] RC5-72: using core #1 (IL 4-pipe c alt).
    [Oct 14 21:58:12 UTC] RC5-72: Benchmark for core #1 (IL 4-pipe c alt) 0.00:00:04.25 [1,041,056,719 keys/sec]
    [Oct 14 21:58:12 UTC] RC5-72: using core #2 (IL 4-pipe 2 threads).
    [Oct 14 21:58:17 UTC] RC5-72: Benchmark for core #2 (IL 4-pipe 2 threads) 0.00:00:03.05 [1,421,427,937 keys/sec]
    [Oct 14 21:58:17 UTC] RC5-72: using core #3 (IL 4-pipe cs-1).
    [Oct 14 21:58:22 UTC] RC5-72: Benchmark for core #3 (IL 4-pipe cs-1) 0.00:00:02.35 [1,878,393,210 keys/sec]
    [Oct 14 21:58:22 UTC] RC5-72 benchmark summary :
    Default core : #0 (IL 4-pipe c) 1,824,758,914 keys/sec
    Fastest core : #3 (IL 4-pipe cs-1) 1,878,393,210 keys/sec
And finally, with the Stream/CAL client again and the BOINC client running (8 SIMAP cores):
    [Oct 14 21:58:46 UTC] RC5-72: using core #0 (IL 4-pipe c).
    [Oct 14 21:58:51 UTC] RC5-72: Benchmark for core #0 (IL 4-pipe c) 0.00:00:02.41 [1,828,166,073 keys/sec]
    [Oct 14 21:58:51 UTC] RC5-72: using core #1 (IL 4-pipe c alt).
    [Oct 14 21:58:58 UTC] RC5-72: Benchmark for core #1 (IL 4-pipe c alt) 0.00:00:04.25 [1,023,539,257 keys/sec]
    [Oct 14 21:58:58 UTC] RC5-72: using core #2 (IL 4-pipe 2 threads).
    [Oct 14 21:59:03 UTC] RC5-72: Benchmark for core #2 (IL 4-pipe 2 threads) 0.00:00:03.01 [1,462,924,800 keys/sec]
    [Oct 14 21:59:03 UTC] RC5-72: using core #3 (IL 4-pipe cs-1).
    [Oct 14 21:59:08 UTC] RC5-72: Benchmark for core #3 (IL 4-pipe cs-1) 0.00:00:02.35 [1,857,392,644 keys/sec]
    [Oct 14 21:59:08 UTC] RC5-72 benchmark summary :
    Default core : #0 (IL 4-pipe c) 1,828,166,073 keys/sec
    Fastest core : #3 (IL 4-pipe cs-1) 1,857,392,644 keys/sec
I guess you'll have to set up the OpenCL application in Moo to run with 1 CPU + x GPUs ...

STE\/E · Joined: 2 May 11 · Posts: 57 · Credit: 250,035,598 · RAC: 0
Message 4016 - Posted: 20 Oct 2012, 22:03:30 UTC
How did you get it to work with the BOINC client? I couldn't get it to work. Thanks

[AF>FAH-Addict.net]toTOW · Joined: 4 May 11 · Posts: 27 · Credit: 116,370,371 · RAC: 7
Message 4053 - Posted: 25 Oct 2012, 23:52:00 UTC
I didn't run it under BOINC. I did my tests with the distributed.net clients.

Szopler · Joined: 3 May 11 · Posts: 8 · Credit: 15,002,506 · RAC: 0
Message 4630 - Posted: 7 Mar 2013, 22:38:09 UTC
Give it to us! Please! POEM has the same problem, or worse: a few CPU cores are needed to feed a 7770, so one core in Moo! won't be a problem! :)

mikey · Joined: 22 Jun 11 · Posts: 2080 · Credit: 1,854,430,696 · RAC: 0
Message 4631 - Posted: 8 Mar 2013, 12:04:01 UTC - in response to Message 4630.
    "Give it to us! Please! In POEM there is the same problem or worse - few cpu cores needed to feed 7770 so one core in Moo! won't be a problem! :)"
DistRTgen too: one CPU core for each GPU. Both AMD and Nvidia need a free CPU core to feed the GPU. I only run a single unit on each GPU, so I can't say whether another CPU core would be needed if you ran more than one unit at a time.

Matthias Lehmkuhl · Joined: 22 Oct 11 · Posts: 4 · Credit: 4,795,463 · RAC: 2,215
Message 4652 - Posted: 13 Mar 2013, 21:01:11 UTC Last modified: 13 Mar 2013, 21:31:41 UTC
One additional point: the CPU usage for a result is 100%. In the BOINC client, the settings for Distributed.net Client v1.03 (cuda31) are 0.2C and 1NV. Could you please set this to 1C and 1NV? This is done on POEM, and it would keep other results from running into deadline problems because they cannot use the CPU while a Distributed.net Client v1.03 (cuda31) result is running.
Edit: extract for POEM from client_state.xml:
    <app_version>
        <app_name>poemcl</app_name>
        <version_num>105</version_num>
        <platform>windows_intelx86</platform>
        <avg_ncpus>1.000000</avg_ncpus>
        <max_ncpus>1.000000</max_ncpus>
        <flops>955157838839.538820</flops>
        <plan_class>opencl_nvidia_100</plan_class>
        <api_version>7.1.0</api_version>
        <file_ref>
            <file_name>poemcl_1.5_windows_intelx86__opencl_nvidia_100</file_name>
            <main_program/>
        </file_ref>
        <coproc>
            <type>NVIDIA</type>
            <count>1.000000</count>
        </coproc>
        <gpu_ram>268435456.000000</gpu_ram>
    </app_version>
And here are the settings I found for Moo! Wrapper:
    <app_version>
        <app_name>dnetc</app_name>
        <version_num>103</version_num>
        <platform>windows_intelx86</platform>
        <avg_ncpus>0.200000</avg_ncpus>
        <max_ncpus>0.200000</max_ncpus>
        <flops>117464525973.684710</flops>
        <plan_class>cuda31</plan_class>
        <api_version>6.13.12</api_version>
        <file_ref>
            <file_name>dnetc_wrapper_1.3_windows_intelx86__cuda31.exe</file_name>
            <main_program/>
        </file_ref>
        <file_ref>
            <file_name>dnetc518-win32-x86-cuda31.exe</file_name>
            <copy_file/>
        </file_ref>
        <file_ref>
            <file_name>dnetc-gpu-1.3.ini</file_name>
            <open_name>dnetc.ini</open_name>
            <copy_file/>
        </file_ref>
        <file_ref>
            <file_name>job-cuda31-1.00.xml</file_name>
            <open_name>job.xml</open_name>
            <copy_file/>
        </file_ref>
        <file_ref>
            <file_name>cudart32_31_9.dll</file_name>
            <copy_file/>
        </file_ref>
        <coproc>
            <type>NVIDIA</type>
            <count>1.000000</count>
        </coproc>
        <gpu_ram>33554432.000000</gpu_ram>
    </app_version>
Matthias

mikey · Joined: 22 Jun 11 · Posts: 2080 · Credit: 1,854,430,696 · RAC: 0
Message 4653 - Posted: 13 Mar 2013, 21:16:26 UTC - in response to Message 4652.
    "one additional point, the CPU usage for a result is 100% in boincclient the settings for Distributed.net Client v1.03 (cuda31) are set to 0,2C and 1NV can you set this please to 1C and 1NV This is done on POEM, and it will prevent other results getting problems with the time line while not able to use the CPU when a Distributed.net Client v1.03 (cuda31) result is running."
BOINC is a funky bird sometimes. Although the projects all use the same software, most do not talk among themselves about how they write their own versions of the applications they use for crunching. BOINC is like Office: lots of people use it, but most use different formulas. I am NOT saying they don't talk, but most don't on a regular basis.

Matthias Lehmkuhl · Joined: 22 Oct 11 · Posts: 4 · Credit: 4,795,463 · RAC: 2,215
Message 4679 - Posted: 21 Mar 2013, 9:02:21 UTC - in response to Message 4653.
    "one additional point, the CPU usage for a result is 100% in boincclient the settings for Distributed.net Client v1.03 (cuda31) are set to 0,2C and 1NV can you set this please to 1C and 1NV ..."
    "Boinc is a funky bird sometimes, although they use the same software most projects do not talk among themselves about how they write their own version of the software they use for crunching. Boinc would like Office, lots of people use it but most use different formulas. I am NOT saying they don't, but most don't on a regular basis."
That's right ;-) I could change the settings locally with app_config.xml. It should work once PrimeGrid has finished its "high priority" work. I'll see.
Matthias
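For anyone wanting to try the same local override, here is a minimal sketch of what such a file might look like. It assumes the standard BOINC app_config.xml layout, placed in the Moo! Wrapper project directory, and reuses the app name dnetc from the client_state.xml extract posted earlier in this thread. The values are illustrative only; they are not the ones actually used in the post above.

```xml
<!-- app_config.xml: illustrative sketch, values are assumptions -->
<app_config>
    <app>
        <!-- app name as it appears in client_state.xml -->
        <name>dnetc</name>
        <gpu_versions>
            <!-- one task per GPU -->
            <gpu_usage>1.0</gpu_usage>
            <!-- budget a full CPU core per GPU task instead of 0.2 -->
            <cpu_usage>1.0</cpu_usage>
        </gpu_versions>
    </app>
</app_config>
```

As far as I know, cpu_usage only changes how the BOINC client schedules other work around the GPU task; it does not change how much CPU the application itself consumes. The client should pick the file up after a restart or after re-reading its config files.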

Matthias Lehmkuhl · Joined: 22 Oct 11 · Posts: 4 · Credit: 4,795,463 · RAC: 2,215
Message 4684 - Posted: 21 Mar 2013, 21:34:12 UTC - in response to Message 4679.
It's working as expected, using BOINC 7.0.52.
Matthias

nanoprobe · Joined: 31 Mar 13 · Posts: 3 · Credit: 326,936 · RAC: 0
Message 4726 - Posted: 31 Mar 2013, 23:10:13 UTC
ATI:
    <app_version>
        <app_name>dnetc</app_name>
        <version_num>103</version_num>
        <platform>windows_intelx86</platform>
        <avg_ncpus>0.050000</avg_ncpus>
        <max_ncpus>0.050000</max_ncpus>
        <flops>19368306992.516346</flops>
        <plan_class>ati14</plan_class>
        <api_version>6.13.12</api_version>
Nvidia:
    <app_version>
        <app_name>dnetc</app_name>
        <version_num>103</version_num>
        <platform>windows_intelx86</platform>
        <avg_ncpus>0.200000</avg_ncpus>
        <max_ncpus>0.200000</max_ncpus>
        <flops>117464525973.684710</flops>
        <plan_class>cuda31</plan_class>
        <api_version>6.13.12</api_version>
If you compare the client_state.xml entries, it will probably be less CPU-intensive on ATI cards than on Nvidia cards. JMHO