Client core values
Joined: 2 May 11 Posts: 47 Credit: 319,540,306 RAC: 0
Can someone please define what each of the values means? From reading the various threads, it looks like the range is -1 through 10, integer values only. Is this correct? I know -1 means auto-select, but what do all the rest of the values mean? Can someone please post the definitions?
-1 = auto-select
0 = ?
1 = ?
2 = ?
3 = ?
4 = ?
5 = ?
6 = ?
7 = ?
8 = ?
9 = ?
10 = ?
Reno, NV Team SETI.USA
Joined: 2 May 11 Posts: 65 Credit: 242,754,987 RAC: 0
The different cores are simply different algorithms the d.net client uses to maximize computation on your current hardware. Maximizing computation means crunching more keys/sec. If you look at your task, you'll see how fast it completed; the higher the speed, the better, because it takes less time to finish a work unit. Example: http://moowrap.net/result.php?resultid=75750 shows:
[May 10 08:49:57 UTC] RC5-72: Summary: 7 packets (448.00 stats units) 0.00:11:09.51 - [2,873.96 Mkeys/s]
which means that task ran at about 2,873.96 million keys/sec. By default, -1 should already pick the best core, but some users have noticed that changing it to another value improved their throughput. The names aren't really important.
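From a summary line like the one above you can work out the total keys crunched; a quick sketch of the arithmetic (0.00:11:09.51 is 669.51 seconds):

```shell
# Total keys = rate * elapsed time, using the numbers from the summary
# line above (2,873.96 Mkeys/s over 11 min 9.51 s).
total=$(awk 'BEGIN { secs = 11*60 + 9.51; printf "%.2f", 2873.96e6 * secs / 1e12 }')
echo "$total trillion keys"   # prints 1.92 trillion keys
```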
Joined: 2 May 11 Posts: 47 Credit: 319,540,306 RAC: 0
Thanks for the answer, but I am not sure what to do with it: -1 should be best, but maybe not. Yes, I can look at the performance of a particular task, but that does not tell me how "good" that performance is, or how it would have compared to the exact same task with a different value. How do I know this?
Reno, NV Team SETI.USA
Joined: 2 May 11 Posts: 65 Credit: 242,754,987 RAC: 0
By looking at the task name, you'll notice something like: dnetc_r72_1304995456_8_477_0. The interesting part is _477_: it means the work unit counts for 477 stats units on the distributed.net stats. Normally it's around 448. If you change the core, you'll see a different speed between two such tasks; "good" simply means faster. Given two tasks worth 477 stats units each, the faster core will take less time to complete one. It would be great if the BOINC benchmark (in the BOINC manager) did a benchmark for GPUs too.
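The stats-units field can be pulled out of a task name mechanically; a small sketch using the example name above (the field position is inferred from that one example):

```shell
# Task names look like dnetc_r72_<timestamp>_<n>_<stats units>_<m>;
# the 5th underscore-separated field is the stats-units value.
name="dnetc_r72_1304995456_8_477_0"
stats=$(echo "$name" | cut -d_ -f5)
echo "$stats"   # prints 477
```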
Joined: 2 May 11 Posts: 47 Credit: 319,540,306 RAC: 0
So the answer is "trial and error"? Mess around until one value rises to the top?
Reno, NV Team SETI.USA
Joined: 2 May 11 Posts: 65 Credit: 242,754,987 RAC: 0
I would say yes, you have to go with trial and error. The goal of auto-detect (-1) is to avoid that trial-and-error method. It works fine on some systems, but sadly it doesn't work well on ALL systems. If you find something better than auto-detect, feel free to publish it for other users with the same setup as you.
Joined: 2 May 11 Posts: 47 Credit: 319,540,306 RAC: 0
Heh. I am not good at the math or the programming. I am just trying to understand how things work. So can the different values/formulas be posted? Also, do they end at 10? Or more? What is the range?
Reno, NV Team SETI.USA
Joined: 2 May 11 Posts: 27 Credit: 1,151,788 RAC: 0
Can someone please define what each of the values means? From reading the various threads, it looks like the range is -1 through 10, integer values only. Is this correct? I know -1 means auto-select, but what do all the rest of the values mean? Can someone please post the definitions?
You can look yourself by starting the app from a command prompt with "--config". Select 3 and then 1. For CUDA it's:
RC5-72:
-1) Auto select
0) CUDA 1-pipe 64-thd
1) CUDA 1-pipe 128-thd
2) CUDA 1-pipe 256-thd
3) CUDA 2-pipe 64-thd
4) CUDA 2-pipe 128-thd
5) CUDA 2-pipe 256-thd
6) CUDA 4-pipe 64-thd
7) CUDA 4-pipe 128-thd
8) CUDA 4-pipe 256-thd
9) CUDA 1-pipe 64-thd busy wait
10) CUDA 1-pipe 64-thd sleep 100us
11) CUDA 1-pipe 64-thd sleep dynamic
Joined: 4 May 11 Posts: 7 Credit: 224,776,119 RAC: 0
So I did the bench test and found core 3 was significantly faster. As such I configured the app to use core 3. From the results since I made that change, the core setting seems to be ignored and the app continues to use core 0.
Joined: 2 May 11 Posts: 47 Credit: 319,540,306 RAC: 0
So I did the bench test and found core 3 was significantly faster. As such I configured the app to use core 3. From the results since I made that change, the core setting seems to be ignored and the app continues to use core 0.
You have to configure it on your project preferences page, here at the project. If you configure it from the command line, it gets overwritten by the project preferences with the next task.
Reno, NV Team SETI.USA
Joined: 8 May 11 Posts: 11 Credit: 1,075,941 RAC: 0
So I did the bench test and found core 3 was significantly faster. As such I configured the app to use core 3. From the results since I made that change, the core setting seems to be ignored and the app continues to use core 0.
I can only speak for myself, but selecting any core other than the default -1 didn't work with the Windows 1.01 wrapper/app (the app exits and does nothing; after ten retries it's calculation-error time). The Linux 1.01 wrapper/app happily ignores the setting (core #7 is the fastest here). Note: the Linux BOINC client just downloaded the new 1.02 wrapper/app. I have aborted all 1.01 tasks. 1.02 works as expected.
Joined: 8 May 11 Posts: 11 Credit: 1,075,941 RAC: 0
So the answer is "trial and error"? Mess around until one value rises to the top?
Copy the executable (from DNETC) to another directory and start it with --bench (pause BOINC in the meantime). You can log the output with -l somefilename. Here is an example run (Linux/GTX 470), with some useless info removed:
[May 18 14:04:31 UTC] RC5-72: using core #0 (CUDA 1-pipe 64-thd).
[May 18 14:04:42 UTC] RC5-72: Benchmark for core #0 (CUDA 1-pipe 64-thd) 0.00:00:08.79 [491,117,547 keys/sec]
[May 18 14:04:42 UTC] RC5-72: using core #1 (CUDA 1-pipe 128-thd).
[May 18 14:04:52 UTC] RC5-72: Benchmark for core #1 (CUDA 1-pipe 128-thd) 0.00:00:08.51 [507,606,843 keys/sec]
[May 18 14:04:52 UTC] RC5-72: using core #2 (CUDA 1-pipe 256-thd).
[May 18 14:05:03 UTC] RC5-72: Benchmark for core #2 (CUDA 1-pipe 256-thd) 0.00:00:08.59 [502,783,808 keys/sec]
[May 18 14:05:03 UTC] RC5-72: using core #3 (CUDA 2-pipe 64-thd).
[May 18 14:05:14 UTC] RC5-72: Benchmark for core #3 (CUDA 2-pipe 64-thd) 0.00:00:08.65 [498,891,716 keys/sec]
[May 18 14:05:14 UTC] RC5-72: using core #4 (CUDA 2-pipe 128-thd).
[May 18 14:05:25 UTC] RC5-72: Benchmark for core #4 (CUDA 2-pipe 128-thd) 0.00:00:08.40 [514,268,438 keys/sec]
[May 18 14:05:25 UTC] RC5-72: using core #5 (CUDA 2-pipe 256-thd).
[May 18 14:05:35 UTC] RC5-72: Benchmark for core #5 (CUDA 2-pipe 256-thd) 0.00:00:08.48 [509,478,872 keys/sec]
[May 18 14:05:35 UTC] RC5-72: using core #6 (CUDA 4-pipe 64-thd).
[May 18 14:05:46 UTC] RC5-72: Benchmark for core #6 (CUDA 4-pipe 64-thd) 0.00:00:08.59 [502,289,441 keys/sec]
[May 18 14:05:46 UTC] RC5-72: using core #7 (CUDA 4-pipe 128-thd).
[May 18 14:05:56 UTC] RC5-72: Benchmark for core #7 (CUDA 4-pipe 128-thd) 0.00:00:08.36 [517,607,292 keys/sec]
[May 18 14:05:56 UTC] RC5-72: using core #8 (CUDA 4-pipe 256-thd).
[May 18 14:06:07 UTC] RC5-72: Benchmark for core #8 (CUDA 4-pipe 256-thd) 0.00:00:08.43 [512,468,703 keys/sec]
[May 18 14:06:07 UTC] RC5-72: using core #9 (CUDA 1-pipe 64-thd busy wait).
[May 18 14:06:19 UTC] RC5-72: Benchmark for core #9 (CUDA 1-pipe 64-thd busy wait) 0.00:00:08.78 [491,695,519 keys/sec]
[May 18 14:06:19 UTC] RC5-72: using core #10 (CUDA 1-pipe 64-thd sleep 100us).
[May 18 14:06:30 UTC] RC5-72: Benchmark for core #10 (CUDA 1-pipe 64-thd sleep 100us) 0.00:00:08.82 [489,091,617 keys/sec]
[May 18 14:06:30 UTC] RC5-72: using core #11 (CUDA 1-pipe 64-thd sleep dynamic).
[May 18 14:06:41 UTC] RC5-72: Benchmark for core #11 (CUDA 1-pipe 64-thd sleep dynamic) 0.00:00:08.78 [491,651,933 keys/sec]
[May 18 14:06:41 UTC] RC5-72 benchmark summary :
Default core : #0 (CUDA 1-pipe 64-thd)
Fastest core : #7 (CUDA 4-pipe 128-thd)
[May 18 14:06:41 UTC] Core #7 is significantly faster than the default core. The GPU core selection has been made as a tradeoff between core speed and responsiveness of the graphical desktop. Please file a bug report along with the output of -cpuinfo only if the the faster core selection does not degrade graphics performance.
The key info is:
Fastest core : #7 (CUDA 4-pipe 128-thd)
Your mileage may vary. This info is also important:
[May 18 14:06:41 UTC] Core #7 is significantly faster than the default core. The GPU core selection has been made as a tradeoff between core speed and responsiveness of the graphical desktop. Please file a bug report along with the output of -cpuinfo only if the the faster core selection does not degrade graphics performance.
If the selected core causes problems (laggy screen updates, too much CPU load, GPUs glowing red), you can try another core. Cores 9, 10 and 11 use the same code as core 0 but coordinate the CPU and the GPU by different methods. The other cores are also variations of core 0; they differ in how they divide the work sent to the multiprocessing units of the GPU.
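Reading the fastest core out of a saved --bench log can be scripted; a minimal sketch (the log filename is an assumption, and the pattern is based on the two summary lines shown in the run above):

```shell
# Sample of the benchmark summary lines, copied from the run above.
cat > bench.log <<'EOF'
Default core : #0 (CUDA 1-pipe 64-thd)
Fastest core : #7 (CUDA 4-pipe 128-thd)
EOF

# Extract just the core number following "Fastest core : #".
best=$(sed -n 's/.*Fastest core : #\([0-9]*\).*/\1/p' bench.log)
echo "$best"   # prints 7
```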
Joined: 4 May 11 Posts: 7 Credit: 224,776,119 RAC: 0
So I did the bench test and found core 3 was significantly faster. As such I configured the app to use core 3. From the results since I made that change, the core setting seems to be ignored and the app continues to use core 0.
So that doesn't make any sense. What if you have more than one system running different cards, and core 1 is best on one card, core 2 on the other, etc.? Shouldn't that be set per card, not at the project level?
Joined: 2 May 11 Posts: 47 Credit: 319,540,306 RAC: 0
What if you have more than one system running different cards, and core 1 is best on one card, core 2 on the other, etc.? Shouldn't that be set per card, not at the project level?
That is what the different locations are for. For example, you can have Home (with "3") for the 5870s and Work (with "1") for the 4870s, then assign the machines to the locations accordingly.
Reno, NV Team SETI.USA
Joined: 4 May 11 Posts: 7 Credit: 224,776,119 RAC: 0
You can't do that if the cards that require the different cores are in the same machine.
Joined: 2 May 11 Posts: 65 Credit: 242,754,987 RAC: 0
Mr. Hankey, what types of cards do you have in that machine?
Joined: 19 May 11 Posts: 7 Credit: 12,852,260 RAC: 0
@Clod, I suspect something like ATI 5770 + ATI 5850: the 5770 good with 0, the 5850 good with 3. Or e.g. NVIDIA 8800GTX + ATI 5850. I have one cruncher configured like that, but not in this project. Maybe the wrapper could read a best-plan file (provided through the project) and set the right core instead. I'm sure the community has already worked out the best core for each GPU. Add something like "Use best-core plan" to the preferences page. I sorely miss this future "best-plan" feature. :)
Joined: 2 May 11 Posts: 65 Credit: 242,754,987 RAC: 0
I'm curious about your "best-plan" feature. How would you implement it on the technical side?
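The proposed lookup could be as simple as a table mapping GPU models to core numbers. A hypothetical sketch: the file format, model names, and core values below are invented for illustration, not anything the wrapper actually ships.

```shell
# Hypothetical community-maintained plan file: "GPU model|best core".
cat > best-plan.txt <<'EOF'
# GPU model|core
ATI 5770|0
ATI 5850|3
GTX 470|7
EOF

# The wrapper would detect the GPU model (hard-coded here for the sketch)
# and look up its best core in the plan file.
gpu="ATI 5850"
core=$(awk -F'|' -v g="$gpu" '$1 == g { print $2 }' best-plan.txt)
echo "core=$core"   # prints core=3
```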
Joined: 18 May 11 Posts: 46 Credit: 1,254,302,893 RAC: 0
Based on the information in this thread I tried core 3 on both an HD5850 and an HD5770. Core 0 (auto-detected by selecting -1) was faster here on both: much faster than 3 on the HD5770, and a bit faster than 3 on the HD5850. This was determined by running several complete WUs and comparing both the keys/sec and the completion time for the same credit values. YMMV.