[hard drive] New HD failing Page 2
koitsu @ 31st Oct 04:02AM:
Re: [hard drive] New HD failing

smartmontools for Windows can't talk to USB devices, because it lacks the code to pass raw SMART requests through over USB. I believe there are some Windows programs out there which can get SMART data from a USB-connected drive, but they likely don't provide all of the information smartmontools does.

With regard to the drive connected via SATA or PATA -- please re-run the command I asked you to using the "-a" flag (lowercase a, not capital A). I need to see the SMART error log.
--
Making life hard for others since 1977.
I speak for myself and not my employer/affiliates of my employer.

Chip @ 31st Oct 07:53AM:
Re: [hard drive] New HD failing

said by Oleg :

1T for $25 where did you find it so cheap?
Well i let it warm up and it stays at 43C.
Max operating temp for that drive is 60C. You're OK.
--
The three great strategies for obscuring an issue are to introduce irrelevancies, to arouse prejudice, and to excite ridicule--Bergen Evans

koitsu @ 1st Nov 08:32PM:
Re: [hard drive] New HD failing

Oleg sent me a private message with the full SMART output for his C: drive:

»www.filedropper.com/drivec

Rather than respond privately, I felt disclosing how to read the stats publicly would benefit others. So, here's the analysis:


This tells me the disk is SATA, and that the drive model/revision *is not* in smartmontools' internal database. The fact that it's not in the database means some Attributes may not be decoded correctly; I'll talk more about this at the end of my post.

Let's check the overall "global status" for SMART:


Okay, good.

Let's look for signs of bad sectors on the disk:


Attribute 5 (Reallocated_Sector_Ct) tells you how many times the drive has detected a bad sector and transparently (and successfully) remapped it to a spare. Your OS wouldn't know of this remapping, but the OS may have seen an I/O error or timeout during the remapping phase (it depends -- too many cases to discuss here). Either way, RAW_VALUE is 0, so that indicates this hasn't happened.

Attribute 196 (Reallocated_Event_Count) tells you how many times the drive detected a bad sector, regardless of remapping being successful or failing. Again, RAW_VALUE is 0, so things look good.

Attribute 197 (Current_Pending_Sector) tells you how many sectors on the drive are pending remapping, or at least that's how I understand this attribute to function. :-) Again, RAW_VALUE is 0 -- good.

Attribute 198 (Offline_Uncorrectable) tells you how many times the drive has failed to remap sectors it thought were bad. The important thing to note about this Attribute is that its UPDATED column is labelled Offline. This indicates that the value will only be updated if you run an offline test. Don't let the word "offline" scare you -- you can actually run a test while the disk is in use (I/O operations take priority over SMART tests); you can run such a test by doing "smartctl -t short {disk}" or if you want a more thorough test, use "smartctl -t long {disk}". "short" is sufficient to update this attribute most of the time.
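
For anyone who wants to automate the check on these four attributes, here is a minimal Python sketch. It assumes you have already pulled the RAW_VALUEs out of the smartctl output yourself and that they decoded to plain integers; the function name bad_sector_warnings is made up for this example and is not part of smartmontools.

```python
# Attribute IDs and names as discussed above.
BAD_SECTOR_ATTRS = {
    5: "Reallocated_Sector_Ct",
    196: "Reallocated_Event_Count",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}

def bad_sector_warnings(raw_values):
    """Return the names of bad-sector attributes whose RAW_VALUE is non-zero.

    raw_values maps attribute ID -> RAW_VALUE (already decoded to an int).
    Attributes missing from raw_values are not flagged.
    """
    return [
        name
        for attr_id, name in BAD_SECTOR_ATTRS.items()
        if raw_values.get(attr_id, 0) != 0
    ]

# RAW_VALUEs reported for the drive analysed in this post (all zero).
this_drive = {5: 0, 196: 0, 197: 0, 198: 0}
print(bad_sector_warnings(this_drive))  # []
```

An empty list means none of the four counters has moved. Keep in mind that 198 only changes after an offline test, so a zero there is only as fresh as the last test.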

Let's check thermals:


Attribute 194 (Temperature_Celsius) indicates your drive is presently running at 31C. The drive, during its lifetime, has seen a minimum of 10C and maximum of 49C. It's important to note that the Lifetime Min/Max values shown are sometimes not cleared when a drive leaves the factory, and other times aren't decoded correctly (when the drive isn't in smartmontools' internal database; again, more on that later). Either way, 31C is excellent (cold).

Let's check drive spin-up/spin-down and power-cycle counts, and overall drive lifetime, since you state this is a brand new disk:


Attribute 9 (Power_On_Hours) indicates the drive has been powered up during its lifetime for a total of 3442 hours (about 143 days of continuous, 24-hour operation). This is very surprising for a drive that you state is brand new.
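
The conversion is simple arithmetic, sketched here in Python. It assumes the RAW_VALUE really is a count of hours, which is not guaranteed for a drive that isn't in the smartmontools database; hours_to_days is just an illustrative helper name.

```python
def hours_to_days(power_on_hours):
    """Convert a Power_On_Hours reading to days of continuous operation."""
    return power_on_hours / 24

days = hours_to_days(3442)        # RAW_VALUE of Attribute 9 in this post
print(f"{days:.1f} days")         # 143.4 days
print(f"{days / 30:.1f} months")  # 4.8 months, assuming 30-day months
```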

Attribute 4 (Start_Stop_Count) tells you how many times the drive has been told to spin down or spin up, including when the system has been powered on or shut off. RAW_VALUE says 1626 times, which is again surprising for a brand new drive.

Attribute 12 (Power_Cycle_Count) tells you how many times the drive has been power cycled (e.g. how many times the drive has been powered on, such as when you turn your PC on but not necessarily when you hit Reset). RAW_VALUE says the drive has been powered on 1610 times -- same thing as before (surprising).

I do not know how to read Attributes 192 and 193. :-)

First question: is this machine being powered off regularly, or going into suspend/sleep mode (not the monitor -- I mean the entire machine), or do you have your Power Policy Setting in Windows set up to power down the disks after N minutes of being idle? If so -- okay, those values are normal then.

Second question: when did you purchase this disk, and how long has it been in use? You stated "New HD" in your post, that's why I ask. If this drive was purchased as new within the past couple of weeks, then whoever sold it to you sold you a used disk.

Now for the bad news...


Attribute 199 (UDMA_CRC_Error_Count) tells you how many times the drive has detected a CRC error on data transferred between the drive and the controller. RAW_VALUE says 86 CRC errors have been witnessed during the lifetime of this drive. Because we don't know if this drive is TRULY brand new, some or all of them may have happened in the past (e.g. when it wasn't in your machine).

These errors are correctable -- that is to say, if the drive witnesses a CRC error, it tells the SATA controller "hey that last command/data you sent me was corrupt, try it again". CRC (checksum) errors can happen for all kinds of reasons, but the most common reason is bad cables or cables which are too long. They could also be the sign of a SATA controller going bad, or crazy-insane interference inside of your case (I've never heard of this happening, but I suppose it's possible).
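
To make the "detect, then retry" idea concrete, here is a toy Python sketch. It is only an illustration of the principle: a real SATA link does its CRC checking per frame in hardware, and zlib.crc32 is merely a stand-in checksum. The names send_with_retry and flaky_cable are hypothetical.

```python
import zlib

def send_with_retry(payload, channel, max_attempts=3):
    """Send payload over `channel`, retrying when the CRC check fails.

    `channel` is a hypothetical callable that takes bytes and returns the
    bytes that arrived at the other end (possibly corrupted).
    Returns (received_bytes, crc_error_count).
    """
    expected = zlib.crc32(payload)
    crc_errors = 0
    for _ in range(max_attempts):
        received = channel(payload)
        if zlib.crc32(received) == expected:
            return received, crc_errors
        crc_errors += 1  # the kind of event a counter like Attribute 199 tracks
    raise IOError(f"transfer failed after {max_attempts} attempts")

def flaky_cable():
    """Build a channel that flips one bit on the first transfer only."""
    state = {"calls": 0}

    def channel(data):
        state["calls"] += 1
        if state["calls"] == 1:
            return bytes([data[0] ^ 0x01]) + data[1:]
        return data

    return channel

data, errors = send_with_retry(b"sector data", flaky_cable())
print(data, errors)  # b'sector data' 1
```

The data arrives intact, but the error counter goes up by one. That is why a non-zero CRC count by itself doesn't mean anything was lost.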

Next you'll understand why I asked you to use "-a" to view the SMART error log:


This is an indication of some form of internal SMART error witnessed on the drive. Sadly, smartmontools does not know how to decode the SMART error log commands for this particular drive. Usually there are READ_DMA or WRITE_DMA indications in the log, but in this case we have "VENDOR SPECIFIC".

Don't let the fact that there are 5 lines shown here scare you; there are numerous commands that get sent to/from the drive during an I/O transfer. The SMART error log helps tell you what happened shortly prior to the error (e.g. "I was told to do this, I did this, then I was told to do this, and the error happened").

To decode these, I would have to get a full technical specifications document from Hitachi and decode the hexadecimal values shown on the left side. Drive vendors do not usually release this for consumer (PATA/SATA) drives, but do provide it for SCSI drives (and the error reporting mechanism in SCSI is entirely different anyway).

What's of importance is that there is a single error seen, and it happened at the 2057 lifetime hours mark. Remember that Attribute 9 (Power_On_Hours) was at 3442 when you ran smartctl; so, the error happened in the past.
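
Put as arithmetic (a small Python sketch, assuming both figures are counted in the same power-on hours; hours_since_error is an illustrative name):

```python
def hours_since_error(current_power_on_hours, error_power_on_hours):
    """Power-on hours elapsed between a logged error and now."""
    return current_power_on_hours - error_power_on_hours

age = hours_since_error(3442, 2057)  # both figures are from this post
print(age, round(age / 24, 1))       # 1385 57.7
```

So the drive has run for roughly 1385 more powered-on hours since that error without logging another one.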

Finally, discussing the data stored in SMART and how to read data:

Relying on RAW_VALUE is not always possible or wise. There is no "standard" method for reading this data, but smartmontools tries to do its best (by having a database of drive models/revisions and knowing how to decode the data for each drive), which is more than I can say for other SMART software.

You see, the data stored in RAW_VALUE can be in any format -- there is no industry standard. There's a recommended format for them, but vendors often store the data in a proprietary format. For example, some Seagate disks will show a very high RAW_VALUE for Attribute 1 (Raw_Read_Error_Rate), which people see and "freak out" over when actually it's 100% normal -- just that Bruce Allen (author of smartmontools) hasn't managed to figure out the encoding format for the data yet.
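
As an illustration of why the encoding matters, here is a Python sketch of one packed layout. The layout is an ASSUMPTION for the sake of the example (three 16-bit fields inside the raw value: current temperature, lifetime minimum, lifetime maximum), not something taken from a vendor specification, and decode_temp_min_max is a made-up helper name. The numbers are the temperature figures quoted earlier in this post.

```python
def decode_temp_min_max(raw):
    """Split a raw value into (current, lifetime_min, lifetime_max).

    Assumed layout (hypothetical): bits 0-15 current temperature,
    bits 16-31 lifetime minimum, bits 32-47 lifetime maximum.
    """
    current = raw & 0xFFFF
    lifetime_min = (raw >> 16) & 0xFFFF
    lifetime_max = (raw >> 32) & 0xFFFF
    return current, lifetime_min, lifetime_max

# Pack 31C current, 10C minimum, 49C maximum using the assumed layout.
raw = (49 << 32) | (10 << 16) | 31

print(decode_temp_min_max(raw))  # (31, 10, 49)
print(raw > 1_000_000)           # True: read as one plain number, it looks absurd
```

Software that doesn't know the layout would print that one huge integer as the "temperature", which is exactly the kind of value people freak out over.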

In this particular case, it's best to look at the fields labelled VALUE, WORST, and THRESH. These are what are called "adjusted values", indicating a formula is applied to the RAW_VALUE data (since the drive firmware itself knows how to decode the data) and is then turned into something that's considered a "health status" value.

VALUE is what the current adjusted value is, WORST is the worst value ever seen, and THRESH is the threshold where if reached the overall-health self-assessment test result will go from PASSED to something like FAILED. I can show you what a failure looks like if you want.

The first thing you'll notice about the THRESH values is that they're set absurdly low. For example, your drive has 86 CRC errors, yet VALUE is 200, WORST is 200, and THRESH is 000.

What does this mean? It means drive manufacturers set their SMART health thresholds to absolutely absurd values; a failing drive will often show PASSED for its overall-health self-assessment. Isn't it great? Hitachi, Seagate, Western Digital, Fujitsu, blah blah blah... they all are like this. The thresholds are set way too low for them to be effective.
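
Here is the comparison as a small Python sketch. It reflects my understanding of how VALUE/WORST/THRESH are compared (an attribute is considered failed when the adjusted value drops to or below the threshold); treat the exact rules and the returned strings as assumptions rather than smartmontools' actual code. attribute_state is a made-up name.

```python
def attribute_state(value, worst, thresh):
    """Classify an attribute from its adjusted VALUE, WORST and THRESH."""
    if thresh == 0:
        # An adjusted value can never drop to or below a threshold of 0,
        # so this attribute can never trip the health status.
        return "cannot fail (threshold is 0)"
    if value <= thresh:
        return "failing now"
    if worst <= thresh:
        return "failed in the past"
    return "ok"

# Attribute 199 on this drive: 86 raw CRC errors, VALUE 200, WORST 200, THRESH 000.
print(attribute_state(200, 200, 0))  # cannot fail (threshold is 0)
```

With THRESH at 000, no number of CRC errors will ever flip the overall result away from PASSED.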

So what's the status of your C: drive?

It honestly looks fine, although I don't know how "new" it is. I'd say this drive has been in use for quite some time. You might want to replace your SATA cables "just in case", but otherwise your drive shows no sign of problems.

I have a feeling a ton of people are going to thumbs-up this post. *laugh* I spend a lot of time with disks since I'm a UNIX SA, and "messing with disks" is a kind of forte of mine.

--
Making life hard for others since 1977.
I speak for myself and not my employer/affiliates of my employer.

Vamp @ 1st Nov 08:38PM:
Re: [hard drive] New HD failing

The only data in the SMART data you need to be concerned about is the raw data, the thresholds and limits are just BS values IMO.

As long as reallocated sector count and read/write error counts are at 0, you have nothing to worry about. Once you start seeing any raw data for realloc, recover whatever data you can and then toss the drive, because it's failing.


--
20/20 FIOS || MSN Msgr: scott001^gmail_com

Oleg @ 1st Nov 08:51PM:
Re: [hard drive] New HD failing

Thanks so much, koitsu, for your answers. This test was for the old HD, because I was unable to run the test on the new drive connected using a USB cable.

koitsu @ 2nd Nov 12:47AM:
Re: [hard drive] New HD failing

said by Vamp :

The only data in the SMART data you need to be concerned about is the raw data, the thresholds and limits are just BS values IMO.
This is simply untrue. When the RAW_VALUE portion looks absurd or unreasonable (due to what I described -- see the bottom portion of my long explanation), the only thing you have to go off of are VALUE/WORST/THRESH. That's simply reality.
--
Making life hard for others since 1977.
I speak for myself and not my employer/affiliates of my employer.

