LinuxSA Mailing list archives

  From: Damien Uern <carrigan_2606@optusnet.com.au>
  To  : Rick Harris <rickharris@mightylegends.zapto.org>, Linux SA <linuxsa@linuxsa.org.au>
  Date: Tue, 15 Jul 2003 22:07:18 +0930

Re: Argh! Hard Drive Dying?

Hey,

Argh!!! Not another dud! I downloaded smartmontools and had a look at the 
SMART information reported by the drive. It appears the drive itself knows 
about these errors, so it is almost certainly not a Linux or chipset problem. 
Here are some snippets from the output (formatting may be wonky):

Device Model:     ST380021A
Serial Number:    3HV4A2P3
Firmware Version: 3.75
ATA Version is:   5
ATA Standard is:  Unrecognized. Minor revision code: 0x00
SMART support is: Enabled

SMART overall-health self-assessment test result: PASSED

Vendor Specific SMART Attributes with Thresholds:
Revision Number: 10
Attribute                    Flag     Value Worst Threshold Raw Value
(  1)Raw Read Error Rate     0x000f   073   065   034       239826243
(  3)Spin Up Time            0x0003   082   082   000       0
(  4)Start Stop Count        0x0032   100   100   020       187
(  5)Reallocated Sector Ct   0x0033   100   100   036       7
(  7)Seek Error Rate         0x000f   072   060   030       18985135
(  9)Power On Hours          0x0032   099   099   000       951
( 10)Spin Retry Count        0x0013   100   100   097       0
( 12)Power Cycle Count       0x0032   100   100   020       195
(194)Temperature             0x0022   032   052   000       32
(195)Hardware ECC Recovered  0x001a   073   065   000       239826243
(197)Current Pending Sector  0x0012   100   100   000       3
(198)Offline Uncorrectable   0x0010   100   100   000       3
(199)UDMA CRC Error Count    0x003e   200   200   000       0
(200)Unknown Attribute       0x0000   100   253   000       0
(202)Unknown Attribute       0x0032   100   253   000       0
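
As a rough cross-check of the table above, the sketch below applies the usual SMART rule (an attribute counts as failed when its normalised Value is at or below its Threshold) and separately lists the sector-related raw counts. The rows are copied from the output above. The function names are made up for illustration, and treating a non-zero raw count on attributes 5, 197 and 198 as a warning sign is a common rule of thumb, not something the drive itself reports.

```python
# Rows copied from the smartctl output above: (id, name, value, threshold, raw)
ATTRS = [
    (1, "Raw Read Error Rate", 73, 34, 239826243),
    (3, "Spin Up Time", 82, 0, 0),
    (4, "Start Stop Count", 100, 20, 187),
    (5, "Reallocated Sector Ct", 100, 36, 7),
    (7, "Seek Error Rate", 72, 30, 18985135),
    (9, "Power On Hours", 99, 0, 951),
    (10, "Spin Retry Count", 100, 97, 0),
    (12, "Power Cycle Count", 100, 20, 195),
    (194, "Temperature", 32, 0, 32),
    (195, "Hardware ECC Recovered", 73, 0, 239826243),
    (197, "Current Pending Sector", 100, 0, 3),
    (198, "Offline Uncorrectable", 100, 0, 3),
    (199, "UDMA CRC Error Count", 200, 0, 0),
    (200, "Unknown Attribute", 100, 0, 0),
    (202, "Unknown Attribute", 100, 0, 0),
]

# Assumption: these three attributes are the ones worth watching for bad sectors.
SECTOR_IDS = (5, 197, 198)


def failed_attributes(attrs):
    """Return the ids whose normalised value is at or below the threshold."""
    return [aid for aid, _name, value, thresh, _raw in attrs if value <= thresh]


def sector_counts(attrs):
    """Return {id: raw} for sector-related attributes with a non-zero raw count."""
    return {aid: raw for aid, _name, _value, _thresh, raw in attrs
            if aid in SECTOR_IDS and raw > 0}


print(failed_attributes(ATTRS))  # [] which agrees with the PASSED result
print(sector_counts(ATTRS))      # {5: 7, 197: 3, 198: 3}
```

So no attribute has crossed its threshold, which is why the overall assessment still says PASSED, yet the drive has already reallocated 7 sectors and has 3 more pending.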

ATA Error Count: 166 (only the most recent five errors are shown below)

Acronyms used below:
DCR = Device Control Register
FR  = Features Register
SC  = Sector Count Register
SN  = Sector Number Register
CL  = Cylinder Low Register
CH  = Cylinder High Register
D/H = Device/Head Register
CR  = Content written to Command Register
ER  = Error register
STA = Status register

Error Log Structure 1:
Error occurred at disk power-on lifetime: 951 hours
When the command that caused the error occurred, the device was active or 
idle.
After command completion occurred, registers were:
ER:40 SC:00 SN:cd CL:2e CH:58 D/H:e5 ST:51
Sequence of commands leading to the command that caused the error were:
DCR   FR   SC   SN   CL   CH   D/H   CR   Timestamp
 00   00   08   cb   2e   58    e5   c8     14197.851
 00   00   08   e3   08   56    e5   ca     14197.851
 00   00   08   6b   42   55    e5   ca     14197.851
 00   00   08   f3   5e   54    e5   ca     14197.850
 00   00   08   6b   a5   4f    e5   ca     14197.850


######## (there are 4 more error logs shown)
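
The registers in Error Log Structure 1 can be turned back into a sector number. This is a minimal sketch, assuming the drive was addressed in 28-bit LBA mode (bit 6 of D/H is set in e5); the helper name lba28 is made up for illustration.

```python
def lba28(sn, cl, ch, dh):
    """Combine ATA task-file registers into a 28-bit LBA.

    Bits 0-7 come from SN, 8-15 from CL, 16-23 from CH and 24-27 from
    the low nibble of D/H. Bit 6 of D/H must be set (LBA mode).
    """
    if not dh & 0x40:
        raise ValueError("D/H does not have the LBA bit set (CHS addressing)")
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Registers after the failed command: SN:cd CL:2e CH:58 D/H:e5
failed = lba28(0xCD, 0x2E, 0x58, 0xE5)
print(failed)  # 89665229

# Most recent command in the log: CR c8 (READ DMA), SC 08, SN cb CL 2e CH 58
start = lba28(0xCB, 0x2E, 0x58, 0xE5)
count = 0x08
print(start, start + count - 1)  # 89665227 89665234
```

The result, 89665229, is the same LBAsect the kernel reports in the /var/log/messages excerpt quoted further down, and it falls inside the 8-sector read that started at 89665227.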

Hmm, I'll have to see what 0x40 means in the Error register. Oh man, what a 
pain.
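
For what it's worth, both registers are plain bit fields, so they can be decoded with a small lookup. The sketch below is only an illustration: the names DriveReady, SeekComplete, Error and UncorrectableError are the ones the kernel prints in the log quoted further down, while the labels for the remaining bits are my own shorthand for the classic ATA bit assignments and should be treated as assumptions.

```python
# ATA Status register bits. Labels other than DriveReady, SeekComplete
# and Error are illustrative shorthand, not taken from the log.
STATUS_BITS = {
    0x80: "Busy",
    0x40: "DriveReady",
    0x20: "DeviceFault",
    0x10: "SeekComplete",
    0x08: "DataRequest",
    0x04: "CorrectedError",
    0x02: "Index",
    0x01: "Error",
}

# ATA Error register bits. Only UncorrectableError appears in the log;
# the other labels are illustrative shorthand.
ERROR_BITS = {
    0x80: "BadSector/InterfaceCRC",
    0x40: "UncorrectableError",
    0x20: "MediaChanged",
    0x10: "SectorIdNotFound",
    0x08: "MediaChangeRequested",
    0x04: "Aborted",
    0x02: "TrackZeroNotFound",
    0x01: "AddrMarkNotFound",
}


def decode(value, table):
    """Return the names of all bits set in value, highest bit first."""
    return [name for bit, name in sorted(table.items(), reverse=True)
            if value & bit]


print(decode(0x51, STATUS_BITS))  # ['DriveReady', 'SeekComplete', 'Error']
print(decode(0x40, ERROR_BITS))   # ['UncorrectableError']
```

In other words, ER:40 is the uncorrectable data error bit: the drive read the sector and could not recover it with ECC, which fits the pending and offline-uncorrectable counts in the attribute table.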

Thanks,

Damien

On Tue, 15 Jul 2003 09:27 pm, Rick Harris wrote:
> Hi Damien,
>
> Back-up any important, non-corrupted data right now.
> The drive is on its way out.
>
> It is possible to partition out bad HD areas, & these can be found with
> the 'badblocks' utility.
>
> However, chances are it's going to be a no-win, as more will probably
> occur.
>
> Your HD temperature looks fine. Another dud perhaps ?
>
> Your chipset is rock solid, as is reiserfs.
>
> Surprised that the Western Digital carked it, they're normally very
> good.
>
> Regards,
> Rick
>
> On Tue, 2003-07-15 at 20:34, Damien Uern wrote:
> > Hey,
> >
> > I've been getting problems reading some large (say 40MB or greater)
> > binary files occasionally on my box, I've been blaming Reiserfs for file
> > corruption, but now I'm not so sure. The latest is when trying to access
> > some part of the rpm database (which would be huge by now) I get this
> > output from the program:
> >
> > [root@thebeast rpms]# rpm -i --test vice-1.11-4mdk.i586.rpm
> > rpmdb: read: 0x400c25fc, 4096: Input/output error
> > error: db4 error(5) from dbcursor->c_get: Input/output error
> > error: error(5) getting "libncurses.so.5" records from Providename index
> > rpmdb: read: 0x400d097c, 4096: Input/output error
> > error: db4 error(5) from dbcursor->c_get: Input/output error
> > error: error(5) getting "libpng.so.3" records from Providename index
> > error: failed dependencies:
> >         libncurses.so.5   is needed by vice-1.11-4mdk
> >         libpng.so.3   is needed by vice-1.11-4mdk
> >
> > I've had similar input/output error messages from other programs (e.g.
> > cp). Looking in /var/log/messages I see this:
> >
> > Jul 15 20:14:56 thebeast kernel: hda: dma_intr: status=0x51 { DriveReady
> > SeekComplete Error }
> > Jul 15 20:14:56 thebeast kernel: hda: dma_intr: error=0x40 {
> > UncorrectableError }, LBAsect=89665229, sector=665064
> > Jul 15 20:14:56 thebeast kernel: end_request: I/O error, dev 03:07 (hda),
> > sector 665064
> >
> > ... (8 more times)
> >
> > I've already had this hard drive replaced under warranty once. It used
> > to be a Western Digital Caviar 40GB 7200rpm (it made strange clicking
> > noises occasionally), but now it's a Seagate 80GB 7200rpm drive:
> >
> > hda: ST380021A, ATA DISK drive
> >
> > Could the problem also be my motherboard? (It seems strange to have 2
> > hard drive failures in 1.5 years.) This is my IDE chipset (MSI mobo - VIA
> > kt266):
> >
> > VP_IDE: VIA vt8233 (rev 00) IDE UDMA100 controller on pci00:11.1
> >
> > Could my computer perhaps not have enough cooling in it? (Temperature
> > hovers around 58-59 degrees with the case OFF and CPU idle!) Mobo temp
> > at this very moment is 41 degrees, CPU is 58 (hovers around 61-62 idle
> > when case on).
> >
> > I'm tired of having a flaky computer!! Any help would be most
> > appreciated.

-- 
LinuxSA WWW: http://www.linuxsa.org.au/ IRC: #linuxsa on irc.freenode.net
To unsubscribe from the LinuxSA list:
  mail linuxsa-request@linuxsa.org.au with "unsubscribe" as the subject

