Big issue with HD5870 !!


Message boards : Number crunching : Big issue with HD5870 !!
PAULIN Laurent

Send message
Joined: 7 Nov 11
Posts: 7
Credit: 57,800
RAC: 0
Message 1393 - Posted: 12 Nov 2011, 9:06:20 UTC

Hello !! Sorry for my English, I'm French !!
I have two HD5870 cards, not in CrossFire. I tried this project, but the result tasks are marked invalid: http://moowrap.net/result.php?resultid=4164872
They also take a very long time !!


Win7 Ultimate 64-bit, Catalyst 11.7, i7 860

It would be nice if someone could help me.

Thanks
ID: 1393 · Rating: 0 · rate: Rate + / Rate - Report as offensive
Profile Bernt
Avatar

Send message
Joined: 26 May 11
Posts: 568
Credit: 121,524,886
RAC: 0
Message 1397 - Posted: 12 Nov 2011, 14:22:19 UTC - in response to Message 1393.  


Can anyone help Laurent understand what is going on? There is a remark in the WU about ABANDONED. I have never seen this before.

Anyone?
ID: 1397 · Rating: 0 · rate: Rate + / Rate - Report as offensive
Profile SLAYER OF DEATH

Send message
Joined: 12 Jul 11
Posts: 112
Credit: 229,191,777
RAC: 0
Message 1399 - Posted: 12 Nov 2011, 18:45:41 UTC

First thing I would have said is that CCC 11.7 is junk, and so is 11.8. CCC 11.6 seems the best. Good one Bernt: I looked, but since I really don't know what or where to look, I learned something. Thanks Bernt. Let's see what others have to say.
ID: 1399 · Rating: 0 · rate: Rate + / Rate - Report as offensive
Profile Bernt
Avatar

Send message
Joined: 26 May 11
Posts: 568
Credit: 121,524,886
RAC: 0
Message 1401 - Posted: 12 Nov 2011, 23:08:19 UTC - in response to Message 1399.  
Last modified: 12 Nov 2011, 23:12:03 UTC


There are two WUs crunched with invalid status, 4164872 and 4036220. They look very strange to me, and they have a very long processing time of almost two hours.

Laurent, maybe you should go back from 11.7 to 11.5 or 11.6. You must completely uninstall the current version to get rid of everything before you install a new one. Zydor has made a thread about this somewhere on the message board.

Bernt
ID: 1401 · Rating: 0 · rate: Rate + / Rate - Report as offensive
PAULIN Laurent

Send message
Joined: 7 Nov 11
Posts: 7
Credit: 57,800
RAC: 0
Message 1402 - Posted: 13 Nov 2011, 10:50:25 UTC

Hello !!

Thank you everyone for your answers !!

I've installed CCC 11.6. Nothing has changed.
When I start a WU, the remaining time starts at 13 min and then increases.
For example: at 8.823%, 00:17:00 elapsed and 00:43:42 remaining; both are increasing together.

I stopped that WU before the end, because the crunch takes too long.
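
As a rough cross-check of those figures: a simple linear extrapolation says a WU that is 8.823% done after 17 minutes would need a little over three hours in total, far more than the 43 minutes shown as remaining. This is only a sketch in Python, assuming progress is steady; the function names are made up for illustration and are not part of BOINC.

```python
def projected_total_seconds(elapsed_s, fraction_done):
    """Linear extrapolation: total = elapsed / fraction done (assumes steady progress)."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_s / fraction_done


def fmt_hms(seconds):
    """Format a number of seconds as HH:MM:SS."""
    seconds = int(round(seconds))
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# Figures quoted in the post: 8.823% done after 00:17:00 elapsed.
elapsed = 17 * 60
total = projected_total_seconds(elapsed, 0.08823)
remaining = total - elapsed

print("projected total:    ", fmt_hms(total))
print("projected remaining:", fmt_hms(remaining))
```

If the projection keeps growing from one check to the next, the WU is slowing down rather than just being badly estimated.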

I'm using Boinc 6.12.34 X64.

Any other ideas ?

Thanks
ID: 1402 · Rating: 0 · rate: Rate + / Rate - Report as offensive
Profile Zydor
Avatar

Send message
Joined: 5 May 11
Posts: 233
Credit: 351,414,150
RAC: 0
Message 1403 - Posted: 13 Nov 2011, 13:44:42 UTC - in response to Message 1402.  
Last modified: 13 Nov 2011, 13:49:59 UTC

Until the problem is resolved, set your preferences to a cache of zero. That way, when you need a WU you will get one, but only one. At present you are trashing hundreds of WUs by downloading a full cache on top of the driver changes. Keeping the cache at zero lets you test without that problem and avoids excess WU trashing.
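
The cache is normally set in the computing preferences (on the project web site or in BOINC Manager). As a hedged illustration of what "cache of zero" means, the Python sketch below builds a local override fragment. It assumes the BOINC client reads a `global_prefs_override.xml` file in its data directory and that the work buffer is controlled by `work_buf_min_days` and `work_buf_additional_days`; check both against the BOINC documentation for your client version. The script only prints the XML and writes nothing to disk.

```python
import xml.etree.ElementTree as ET


def zero_cache_override():
    """Build an override fragment asking for no extra cached work.

    Element names are assumptions based on BOINC's preference override
    format; verify them for your client version before using them.
    """
    root = ET.Element("global_preferences")
    ET.SubElement(root, "work_buf_min_days").text = "0"
    ET.SubElement(root, "work_buf_additional_days").text = "0"
    return ET.tostring(root, encoding="unicode")


override_xml = zero_cache_override()
print(override_xml)
```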

Looking at the stderr file of the two invalids, your two cards are not being picked up properly, and the client heartbeat is not being picked up either. By now several driver changes have occurred, and there are likely to be lots of other errors lurking, to the extent that it's difficult to tell what's going on. You need to start from square one and make sure clean files are loaded.

First try one last thing before going the long route: download the latest BOINC application, go to Control Panel and uninstall BOINC, run the BOINC installer you downloaded, then fire up BOINC and see what happens. If it runs OK, great for now. If not, you'll have to go the long route, which takes time but is not complex. Just take it steady, stage by stage, and you'll be fine - the steps below will take about an hour.

1. Detach and reattach from Moo (not just a reset), via the BOINCStats Host screen

That will ensure you have the latest files from Moo, and good clean copies.

2. Go to http://www.guru3d.com/category/driversweeper/ and download Driver Sweeper, install it.

3. Reboot the PC and go into Safe Mode, then run Driver Sweeper by check marking ONLY the AMD box, click the clean button to get rid of all AMD files.

4. Reboot the PC, and when it has settled (likely to be a strange 640x480 screen - don't worry), re-run Driver Sweeper. If more bits are found, delete them, then reboot.

5. If you have a Registry Cleaner, run it when the PC settles and delete whatever entries it suggests (likely to be a large number of them). Reboot the PC.

6. You now have a clean PC that just needs the driver loading. Download 11.10 from the AMD site and install it. If it does not install first time, re-run the install over the top of the last attempt. When completed, reboot the PC.

7. When the PC settles with the driver loaded, re-run the Registry Cleaner (twice), then reboot again.

8. Download the latest BOINC, uninstall the old one from Control Panel, and reinstall from the downloaded file.

You should now have a clean install, run BOINC and report back what happened.

Regards
Zy
ID: 1403 · Rating: 0 · rate: Rate + / Rate - Report as offensive
PAULIN Laurent

Send message
Joined: 7 Nov 11
Posts: 7
Credit: 57,800
RAC: 0
Message 1412 - Posted: 14 Nov 2011, 13:02:48 UTC

Thank you for these instructions, Zydor !!

I did all that you wrote.

Nothing has changed !!

Collatz and Milkyway are still running fine, but not Moo.

It's really very strange.

Any other ideas ?
ID: 1412 · Rating: 0 · rate: Rate + / Rate - Report as offensive
Profile Zydor
Avatar

Send message
Joined: 5 May 11
Posts: 233
Credit: 351,414,150
RAC: 0
Message 1414 - Posted: 14 Nov 2011, 16:01:19 UTC - in response to Message 1412.  

Which driver did you load up when you went through the above?

What are the card settings you use for memory and GPU for Collatz, Milkyway & Moo ?

Regards
Zy
ID: 1414 · Rating: 0 · rate: Rate + / Rate - Report as offensive
Profile Zydor
Avatar

Send message
Joined: 5 May 11
Posts: 233
Credit: 351,414,150
RAC: 0
Message 1415 - Posted: 14 Nov 2011, 16:30:32 UTC - in response to Message 1414.  

Another question ....

Are you running any CPU WUs at the same time? If so which ones and how many ?
ID: 1415 · Rating: 0 · rate: Rate + / Rate - Report as offensive
PAULIN Laurent

Send message
Joined: 7 Nov 11
Posts: 7
Credit: 57,800
RAC: 0
Message 1416 - Posted: 14 Nov 2011, 18:05:13 UTC - in response to Message 1415.  

Hello.

I use CCC 11.6
Graphics card settings: GPU 900 MHz, memory 1300 MHz : fine for Collatz and Milkyway

I run Enigma and WCG on all 8 threads of the CPU. It's overclocked to 3.07 GHz instead of the original 2.8 GHz. That's fine for these two CPU projects.

Regards.
ID: 1416 · Rating: 0 · rate: Rate + / Rate - Report as offensive
Profile Zydor
Avatar

Send message
Joined: 5 May 11
Posts: 233
Credit: 351,414,150
RAC: 0
Message 1417 - Posted: 14 Nov 2011, 19:34:15 UTC - in response to Message 1416.  

I suspect it's a mixture of a number of things - principally the CPU WUs and the card settings. I am typing up a longer reply with more detail, but for now, set the cards to 850 GPU and as low a memory setting as you can get, even down to 175 on the memory (don't go lower than 175 on memory), and reboot the PC. Then suspend all CPU WUs and try a Moo WU with no CPU WU running; it should be OK.

I will explain more in about half an hour, once I can get a more detailed reply sorted out - I have a couple of non-BOINC things to do first. If I have not replied by then, let the GPU WU run full length, and the ones following it as well. I will reply in about 30 mins with a full explanation.

Regards
Zy
ID: 1417 · Rating: 0 · rate: Rate + / Rate - Report as offensive
Profile Zydor
Avatar

Send message
Joined: 5 May 11
Posts: 233
Credit: 351,414,150
RAC: 0
Message 1419 - Posted: 14 Nov 2011, 20:29:11 UTC
Last modified: 14 Nov 2011, 20:37:58 UTC

I am 99% convinced the problem is a mixture of your card settings and the effects of running CPU WUs alongside Moo. The detail will not fit in a short reply, so please bear with me. I know it's long, but I suspect I need to nail and refute a couple of myths that are getting in your way :)

APPLICATIONS
There are (as such) two types of application for any GPU Project. One is written in the card's native language, and the other is written in OpenCL. That's true for both NVidia and AMD, although the vast majority of NVidia applications are written in its native language, CUDA. For AMD there is an increasing mix, as AMD is actively switching to supporting only the industry standard language - OpenCL. There are problems with the OpenCL implementations, as it's still relatively new (as such), but for better or for worse (I believe the better in the long run) AMD are going full ahead on OpenCL.

Moo is an OpenCL application. Collatz and Milkyway are written in the old ATI native language or NVidia CUDA, and are inherently faster because of that. The difference is there to see: of the three, only Moo is OpenCL, and OpenCL has a dreaded bug that is affecting both NVidia cards and AMD cards. At present, machines running more than one GPU are being hammered by OpenCL grabbing one complete core per GPU to run the application. It definitely affects all dual GPU cards (5970 & 6990); single GPU machines are entirely unaffected. Those running two 5870 cards have been affected, but for some reason less so; nonetheless it needs watching out for.

Machines affected will have one core per GPU grabbed by the OpenCL application. In your case (if you are affected by the bug), you will lose two cores to run the applications on the 2 GPUs. At present you are running 8 CPU WUs on the machine by using hyper-threading. What will happen is that the CPU WUs and GPU WUs fight for CPU time, as the CPU is overcommitted (8 CPU WUs plus 2 GPU WUs want 10 cores, and you only have 8 available). That will slow down the GPU applications a lot, as the CPU applications get priority time allocated by the operating system - such overcommitment can also crash it in some cases.

The solution is to run fewer (or no) CPU WUs - a maximum of six. That way your cores are not overcommitted.

Milkyway and Collatz are OK with all cores running CPU WUs (although they are slowed a little), and with fewer CPU WUs running, Moo does not get hammered by a clash between its application and the CPU WUs.
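
The core budget above can be written down as a small Python sketch. It assumes, as described in this post, that each GPU WU hit by the OpenCL bug occupies one whole logical core; the function names and the `cores_per_gpu_task` parameter are made up for illustration.

```python
def max_cpu_tasks(logical_cpus, gpu_tasks, cores_per_gpu_task=1):
    """How many CPU WUs fit once each GPU WU has taken its core(s)."""
    reserved = gpu_tasks * cores_per_gpu_task
    return max(0, logical_cpus - reserved)


def is_overcommitted(logical_cpus, cpu_tasks, gpu_tasks, cores_per_gpu_task=1):
    """True when CPU WUs plus GPU WUs want more cores than the machine has."""
    return cpu_tasks + gpu_tasks * cores_per_gpu_task > logical_cpus


# The setup in this thread: 8 logical cores, 2 GPUs, 8 CPU WUs running.
print(is_overcommitted(8, cpu_tasks=8, gpu_tasks=2))  # True: 10 wanted, 8 available
print(max_cpu_tasks(8, gpu_tasks=2))                  # 6
```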

GPU & MEMORY SETTINGS

Memory
First let's nail the memory setting myth once and for all, as it causes chaos .....
The memory setting is NOT a measure of speed. It's a measure of bandwidth, meaning how much data can pass at any one time through the memory "pipe". The lower the memory setting, the less data can pass at any one time, in the same way as a bigger water pipe carries more water at any one time than a smaller water pipe. The memory setting does NOT increase the speed at which data passes through the memory pipe. Of course, if the pipe is bigger, the end effect is to pass more data, but that's only because the pipe is wider; it's NOT because each data bit travels faster.

This is very important to understand when choosing the optimum settings for GPU applications. Collatz has big datasets travelling back and forth between the GPU and CPU. The more that passes at any one time the better, and therefore Collatz benefits hugely from a high memory bandwidth - which you had (900/1300), and that's fine.

Milkyway and Moo have very small datasets travelling around and don't need high memory bandwidth. In fact you can set it very low and it's still fine, as the dataset passes through in one pass even on low memory (I run my 5970, which is 2x5870 GPUs on the same card, at 175). Increasing that does NOT speed up the Moo or Milkyway WUs; all it does is generate more heat and waste more power for no purpose.

Therefore, when running Collatz, set the memory high. You had 1300, and you could probably get that higher, to around 1375. Just be careful about heat: keep an eye on the card temperature, particularly the memory voltage regulator temperatures (use GPU-Z to monitor those). For Moo & Milkyway, set the memory to 175. You will have to edit the ini file directly to get it that low; post if you don't know how to do that.

GPU Setting
All three applications will benefit from a high GPU setting, but a setting can be too high for one and still run OK on the others. Collatz and Milkyway will go the highest before falling over. It's likely, however, that the Moo application will crash at the same very high setting as Collatz and Milkyway. Moo is less tolerant of high GPU settings: it doesn't need them, and its limit is lower than that of the other two. So you can get a situation where Collatz and Milkyway run fine on a high GPU setting, but it's too high for Moo and needs to come down a bit - experiment to find the setting that's right for you. As always when overclocking GPUs, be VERY careful about overheating: keep the GPU below 90 degrees peak temperature, with normal running at 82-86 on a 5870.

Apologies for the long reply, but it can't be avoided with this many aspects all mixed in together. It sounds like a lot, but it's not really, once you get your settings and WU numbers tuned for each application. All will then be fine.
ID: 1419 · Rating: 0 · rate: Rate + / Rate - Report as offensive
Chris S
Avatar

Send message
Joined: 2 Oct 11
Posts: 238
Credit: 386,592,310
RAC: 11,607
Message 1420 - Posted: 14 Nov 2011, 20:38:05 UTC

Thanks for that post Zydor, an excellent treatise for anybody to read.
I iz also got icons!



ID: 1420 · Rating: 0 · rate: Rate + / Rate - Report as offensive
Profile Bernt
Avatar

Send message
Joined: 26 May 11
Posts: 568
Credit: 121,524,886
RAC: 0
Message 1421 - Posted: 14 Nov 2011, 21:48:51 UTC - in response to Message 1419.  

Zydor,

Lucky we have you!!!!!
ID: 1421 · Rating: 0 · rate: Rate + / Rate - Report as offensive
Profile Zydor
Avatar

Send message
Joined: 5 May 11
Posts: 233
Credit: 351,414,150
RAC: 0
Message 1422 - Posted: 14 Nov 2011, 21:55:12 UTC
Last modified: 14 Nov 2011, 22:19:29 UTC

.... Moo is an OpenCL application. Collatz and Milkyway are written in the old ATI native language or NVidia CUDA, and inherently faster because of that. The difference is there to see, of the three, only Moo is OpenCL, ...


On re-reading that - a slight clarification to avoid confusion.

Moo runs inside the BOINC Wrapper. The Wrapper is an application written for BOINC that allows a core application to run using ANY language. For Moo, the core application is made by DNet (not sure what it uses, probably C++ at a guess). However, Moo uses OpenCL to talk to the Wrapper, and it's the Wrapper that the BOINC Client "sees". It's the fact that Moo uses OpenCL in talking to the Wrapper which makes Moo vulnerable to the OpenCL bug.

AMD are slowly getting at the OpenCL bug. At present AMD are testing a fix for Linux machines; no news yet on the Windows fix.

Regards
Zy
ID: 1422 · Rating: 0 · rate: Rate + / Rate - Report as offensive
John Clark

Send message
Joined: 27 Jul 11
Posts: 342
Credit: 252,653,488
RAC: 0
Message 1423 - Posted: 14 Nov 2011, 21:55:34 UTC
Last modified: 14 Nov 2011, 21:58:26 UTC

Agreed, Chris.

But that has been Zydor over several projects ... very helpful in suggestions for cracking problems.

I presume I have not run into the multiple GPU bug because I am running 11.3 as the driver, and I intend to stay with it when I upgrade the HD3850 (after some work on the server). That is more complex, as I will install 11.1 with the AGP hotfix.
ID: 1423 · Rating: 0 · rate: Rate + / Rate - Report as offensive
Profile Zydor
Avatar

Send message
Joined: 5 May 11
Posts: 233
Credit: 351,414,150
RAC: 0
Message 1424 - Posted: 14 Nov 2011, 22:09:06 UTC

Correct - it seems the OpenCL bug creeps into the AMD equation when running AMD drivers above 11.4, so running the versions you stated, you will be unaffected by the OpenCL bug.

Regards
Zy
ID: 1424 · Rating: 0 · rate: Rate + / Rate - Report as offensive
PAULIN Laurent

Send message
Joined: 7 Nov 11
Posts: 7
Credit: 57,800
RAC: 0
Message 1426 - Posted: 15 Nov 2011, 15:22:52 UTC - in response to Message 1419.  

Hello Zydor !!

Thank you for this post.

Here are my latest settings: CCC 11.6, GPU clock=850, Memory clock=900 (the minimum via CCC)
I don't know how to set the memory clock to 175. Could you explain how to do that ?

Since this morning, I'm running Moo and only Moo (without WCG, Enigma, Collatz and Milkyway).

Please have a look at the results. Some are validated, but the elapsed time is very long !!

You said in your last post that there is no problem with OpenCL on CCC 11.4 and below. Do you think I could try CCC 11.4 ? I know that this CCC is very good for Collatz and Milkyway, but what is the situation for Moo ?

Regards.

ID: 1426 · Rating: 0 · rate: Rate + / Rate - Report as offensive
John Clark

Send message
Joined: 27 Jul 11
Posts: 342
Credit: 252,653,488
RAC: 0
Message 1428 - Posted: 15 Nov 2011, 20:12:36 UTC

You can alter the core clock and memory clocks when running MSI Afterburner 2.10.

I am running my HD5850 at stock - core 755 and memory underclocked by 19% to 913

ID: 1428 · Rating: 0 · rate: Rate + / Rate - Report as offensive
Profile Zydor
Avatar

Send message
Joined: 5 May 11
Posts: 233
Credit: 351,414,150
RAC: 0
Message 1431 - Posted: 16 Nov 2011, 11:30:23 UTC

Here are my last settings: CCC 11.6, GPU clock=850, Memory clock=900 (the minimum via CCC)
I don't know how to set memory clock at 175. Maybe, you may explain to me how to do ?


You asked a detailed question - it's not going to be a short post I'm afraid :)

Utilities such as Afterburner and CCC do not directly change the card; they do it via the card ini or config files. The utilities are just a frontend to allow the user to change various things without causing permanent drama (as such). They also provide a mechanism to allow developers to keep users under control in various states of enthusiasm - ie stop 'em going nuts :) By limiting what the frontend application changes (CCC and Afterburner for example), the developer can limit potential damage to the card from silly changes to its configuration. The outline below changes the config file directly, without the protection of the frontend applications. Whilst the procedure is perfectly safe, you must be VERY careful not to change ANYTHING other than the intended values. The insertion of just one space in the wrong place can crash the config file.

The good news is that just rebooting will reset the card to default values and recover mistakes - provided the mistake did not insert a damn silly value that melted it before rebooting. As always, the cautionary note above sounds worse than it is in reality. Have no concerns doing the changes, just be careful: slow and easy is way better than fast and stupid. There are no bonus points for speedy changes, only melted cards as a reward for ego driven stupidity. Harsh statement I know; I just want to drive home that it's a safe procedure, just be sensible.
You will see other values in the config file besides those described below - leave them alone for now - do NOT get carried away, just one thing at a time ..... please :)

The AMD card configurations are driven from: Users/YOUR USER NAME/AppData/Local/ATI/ACE

In that directory you will see (usually) four files. The key config file is profiles.xml. That file provides the default config for the card. Do not attempt to change it under ANY circumstances: the card's internal software maintains it and will reverse any changes you try to make anyway. Just be aware it's there, and what it does.

The file you will change is in: Users/YOUR USER NAME/AppData/Local/ATI/ACE/Profiles

Initially that directory probably does not even exist, if you have never set any Presets inside the CCC since installing the current card driver. Open up CCC and save a Preset called "850-175". (Those reading this with other cards - be careful .... the value 850 is his current GPU speed setting in CCC; the right setting differs from card to card and from user to user.) Now look back at /.../ACE/profiles, and in there you will now see a file called "850-175.xml" - that is the preset you just saved. You can call it what the hell you like; I just use the card GPU and memory settings as the file name, as it's easier to find the correct preset in CCC.

Open up Notepad (don't mess around with other editors, USE Notepad - Notepad will not insert hidden characters in the file and crash it, other editors will - so use Notepad whether or not you hate it!), navigate to /..../ACE/Profiles and open up the file 850-175.xml. Inside you will see a section very similar to the one below (which is part of my current 790-175.xml). In my setup there are four sections such as this, as it's for a dual 5970, the other three sections being CoreClockTarget_1, CoreClockTarget_2, and CoreClockTarget_3.

For each GPU there is one such section. Look for all of them; don't just change one section. Do not get clever and try setting one GPU to a different value from the others - just keep it simple, and change all parts to the same values I describe next.

Go to the three values (shown below) inside the MemoryClockTarget_ section(s), and change whatever is there to 17500. Be careful to insert the correct number - for example, inserting 175000 would be very painful, as it will set memory to 1,750 MHz !! Not the best idea on the block :) If you make a mistake and are unsure whether or not you have corrected it, or for whatever reason you are not comfortable with a change, just exit without saving, no harm done, then go back and reopen it.

DO NOT set the memory lower than the card's idle setting for memory (I strongly suggest that the lowest you go for your actual memory setting is 17500). On mine the idle speed is shown as 15700 on this line:

<Property name="Want_0" value="15700" />

Once you have changed the values (all three in each section to 17500), just save the file - don't mess with stating a file extension, just save it as is, and Notepad will save it as an xml file. If you find it has been saved as a .txt file, just go back, open it in Notepad and save it again with the explicit extension, ie: 850-175.xml

<Feature name="CoreClockTarget_0">
<Property name="Want_0" value="15700" />
<Property name="Want_1" value="55000" />
<Property name="Want_2" value="79000" />
</Feature>
<Feature name="PowerControl_0">
<Property name="Want" value="0" />
</Feature>
<Feature name="MemoryClockTarget_0">
<Property name="Want_0" value="17500" />
<Property name="Want_1" value="17500" />
<Property name="Want_2" value="17500" />
</Feature>
<Feature name="CoreVoltageTarget_0">
<Property name="Want_0" value="950" />
<Property name="Want_1" value="1038" />
<Property name="Want_2" value="1050" />
</Feature>
<Feature name="MemoryVoltageTarget_0">
<Property name="Want_0" value="0" />
<Property name="Want_1" value="0" />
<Property name="Want_2" value="0" />
</Feature>
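
For anyone who wants to double check an edited preset before selecting it in CCC, here is a minimal read-only sketch in Python. It assumes the preset contains `Feature` and `Property` elements shaped like the fragment above and that the stored value is the clock in MHz times 100 (17500 for 175 MHz, as described earlier in this post). The wrapping `profile` element and the function names are made up for illustration, since the full layout of the real file is not shown here. It does not modify anything; the edit itself should still be done in Notepad as described.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the fragment above, used as sample input.
FRAGMENT = """
<Feature name="CoreClockTarget_0">
<Property name="Want_0" value="15700" />
<Property name="Want_1" value="55000" />
<Property name="Want_2" value="79000" />
</Feature>
<Feature name="MemoryClockTarget_0">
<Property name="Want_0" value="17500" />
<Property name="Want_1" value="17500" />
<Property name="Want_2" value="17500" />
</Feature>
"""


def memory_clock_targets(fragment):
    """Return {feature name: {property name: value}} for MemoryClockTarget_ sections."""
    # Hypothetical root element so the multi-section fragment parses as one document.
    root = ET.fromstring(f"<profile>{fragment}</profile>")
    targets = {}
    for feature in root.iter("Feature"):
        name = feature.get("name", "")
        if name.startswith("MemoryClockTarget_"):
            targets[name] = {
                prop.get("name"): int(prop.get("value"))
                for prop in feature.findall("Property")
            }
    return targets


def find_problems(targets, expected=17500):
    """List every memory value that differs from the expected setting."""
    issues = []
    for feature, props in targets.items():
        for prop, value in props.items():
            if value != expected:
                issues.append(
                    f"{feature} {prop}: {value} ({value / 100:g} MHz), expected {expected}"
                )
    return issues


problems = find_problems(memory_clock_targets(FRAGMENT))
print(problems if problems else "all memory targets are 17500 (175 MHz)")
```

A stray extra zero (175000) would show up in the list as 1750 MHz, which is exactly the mistake warned about above.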

You now have a config xml file for your card(s) that will drive the card(s) at 850 GPU and 175 memory. Open CCC, and you will see in the preset section the name 850-175 that you just created. Click it, and it will reset the card(s) to the values 850/175. Double check it changed; you may have to click the preset a second time. If you have more than one GPU, go to the drop down box for the GPU(s) inside CCC and check that all of them changed OK.

At that point you will find the memory figure for the GPUs in CCC is displayed with 175 as the minimum value .... think about that .... it now means you can safely set other memory values above 175 from inside CCC, and not have to do it manually via the xml file. The reason is that one of the three values you set inside 850-175 changes the minimum value you can see in CCC for card memory.

I would suggest you now set a preset for 725-175. That can be there for a very quick downclock to low values if you ever need it - card suddenly overheats, or whatever. It's not essential, just a nice safe precaution.

Be aware that if you completely uninstall the card driver (via Driver Sweeper for example) you will lose any profiles you made and will need to redo them. If you uninstall via Control Panel, the custom profiles will survive.

Remember this is not a magic bullet. All that has happened is that you now have the ability to lower the card memory setting for each GPU; it does not cure your ills. It will however enable you to run the card with lower memory and save power and heat - thus allowing (possibly) a higher GPU setting that (maybe) was previously not possible because the card was overheating.

Thats it - you are done :)
ID: 1431 · Rating: 0 · rate: Rate + / Rate - Report as offensive
 
Copyright © 2011-2024 Moo! Wrapper Project