LinuxSA Mailing list archives
From: Damien Uern <carrigan_2606@optusnet.com.au>
To: Rick Harris <rickharris@mightylegends.zapto.org>,
    Linux SA <linuxsa@linuxsa.org.au>
Date: Tue, 15 Jul 2003 22:07:18 +0930
Re: Argh! Hard Drive Dying?
Hey,
Argh!!! Not another dud! I downloaded smartmontools and had a look at the
SMART information reported by the drive. It appears the drive itself knows
about these errors, so it is most certainly not a Linux or chipset problem.
Here are some snippets from the output (formatting may be wonky):
Device Model: ST380021A
Serial Number: 3HV4A2P3
Firmware Version: 3.75
ATA Version is: 5
ATA Standard is: Unrecognized. Minor revision code: 0x00
SMART support is: Enabled
SMART overall-health self-assessment test result: PASSED
Vendor Specific SMART Attributes with Thresholds:
Revision Number: 10
Attribute                    Flag    Value Worst Threshold Raw Value
(  1)Raw Read Error Rate     0x000f  073   065   034       239826243
(  3)Spin Up Time            0x0003  082   082   000       0
(  4)Start Stop Count        0x0032  100   100   020       187
(  5)Reallocated Sector Ct   0x0033  100   100   036       7
(  7)Seek Error Rate         0x000f  072   060   030       18985135
(  9)Power On Hours          0x0032  099   099   000       951
( 10)Spin Retry Count        0x0013  100   100   097       0
( 12)Power Cycle Count       0x0032  100   100   020       195
(194)Temperature             0x0022  032   052   000       32
(195)Hardware ECC Recovered  0x001a  073   065   000       239826243
(197)Current Pending Sector  0x0012  100   100   000       3
(198)Offline Uncorrectable   0x0010  100   100   000       3
(199)UDMA CRC Error Count    0x003e  200   200   000       0
(200)Unknown Attribute       0x0000  100   253   000       0
(202)Unknown Attribute       0x0032  100   253   000       0
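A rough Python sketch of reading a table like the one above follows. It assumes the row layout shown in this message (other smartctl versions print different columns), and it treats the raw counts of attributes 5, 197 and 198 as the ones to watch, which is a common rule of thumb rather than something smartctl itself reports. For this drive no normalized value is at or below its non-zero threshold, which fits the PASSED verdict, while the non-zero reallocated, pending and offline-uncorrectable counts point at bad sectors.

```python
import re
from typing import NamedTuple

# Matches rows like "(  5)Reallocated Sector Ct 0x0033 100 100 036 7".
# Assumption: the layout of the smartctl output quoted in this message.
ROW = re.compile(
    r"\(\s*(\d+)\)(.+?)\s+0x([0-9a-fA-F]{4})\s+(\d{3})\s+(\d{3})\s+(\d{3})\s+(\d+)"
)

# Raw counters often watched for media damage (rule of thumb, not a standard).
SECTOR_COUNTERS = {5: "reallocated", 197: "pending", 198: "offline uncorrectable"}


class Attribute(NamedTuple):
    ident: int
    name: str
    flag: int
    value: int
    worst: int
    threshold: int
    raw: int


def parse_attributes(text: str) -> list[Attribute]:
    """Return one Attribute per table row found in `text`."""
    attrs = []
    for line in text.splitlines():
        m = ROW.search(line)
        if m:
            ident, name, flag, value, worst, thresh, raw = m.groups()
            attrs.append(Attribute(int(ident), name.strip(), int(flag, 16),
                                   int(value), int(worst), int(thresh), int(raw)))
    return attrs


def warnings(attrs: list[Attribute]) -> list[str]:
    """Collect human-readable warnings for a list of Attributes."""
    out = []
    for a in attrs:
        # A normalized value at or below a non-zero threshold counts as failing.
        if a.threshold and a.value <= a.threshold:
            out.append(f"{a.name}: value {a.value} <= threshold {a.threshold}")
        # Non-zero raw counts of remapped or unreadable sectors suggest media damage.
        if a.ident in SECTOR_COUNTERS and a.raw > 0:
            out.append(f"{a.name}: {a.raw} {SECTOR_COUNTERS[a.ident]} sector(s)")
    return out


sample = """\
(  1)Raw Read Error Rate 0x000f 073 065 034 239826243
(  5)Reallocated Sector Ct 0x0033 100 100 036 7
(194)Temperature 0x0022 032 052 000 32
(197)Current Pending Sector 0x0012 100 100 000 3
(198)Offline Uncorrectable 0x0010 100 100 000 3
"""
for w in warnings(parse_attributes(sample)):
    print(w)
```

With these sample rows it prints three warnings: 7 reallocated, 3 pending and 3 offline-uncorrectable sectors.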
ATA Error Count: 166 (only the most recent five errors are shown below)
Acronyms used below:
DCR = Device Control Register
FR = Features Register
SC = Sector Count Register
SN = Sector Number Register
CL = Cylinder Low Register
CH = Cylinder High Register
D/H = Device/Head Register
CR = Content written to Command Register
ER = Error register
STA = Status register
Error Log Structure 1:
Error occurred at disk power-on lifetime: 951 hours
When the command that caused the error occurred, the device was active or
idle.
After command completion occurred, registers were:
ER:40 SC:00 SN:cd CL:2e CH:58 D/H:e5 ST:51
Sequence of commands leading to the command that caused the error were:
DCR  FR  SC  SN  CL  CH  D/H  CR  Timestamp
 00  00  08  cb  2e  58   e5  c8  14197.851
 00  00  08  e3  08  56   e5  ca  14197.851
 00  00  08  6b  42  55   e5  ca  14197.851
 00  00  08  f3  5e  54   e5  ca  14197.850
 00  00  08  6b  a5  4f   e5  ca  14197.850
######## (there are 4 more error logs shown)
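The register values in an entry like this can be turned back into a sector address. Below is a minimal Python sketch, assuming 28-bit LBA addressing (bit 6 of D/H is set in 0xe5): the low nibble of D/H, then CH, CL and SN, supply LBA bits 27-24, 23-16, 15-8 and 7-0. The helper name lba28 is made up for this example. With the registers from Error Log Structure 1 it yields 89665229, the same LBAsect the kernel reported in the original message, and the command byte c8 is the ATA READ DMA opcode (ca is WRITE DMA), so the error hit an 8-sector read that started two sectors earlier.

```python
# ATA opcodes that appear in the CR column of the error log.
ATA_COMMANDS = {0xC8: "READ DMA", 0xCA: "WRITE DMA"}


def lba28(sn: int, cl: int, ch: int, dh: int) -> int:
    """Rebuild a 28-bit LBA from the SN, CL, CH and D/H register bytes."""
    if not dh & 0x40:
        # Bit 6 of D/H clear means CHS addressing, which this sketch does not handle.
        raise ValueError("D/H bit 6 is clear: CHS addressing, not LBA")
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Registers after the error: SN:cd CL:2e CH:58 D/H:e5
failed = lba28(0xCD, 0x2E, 0x58, 0xE5)
print("failing sector:", failed)  # failing sector: 89665229

# Command that caused it: SC:08 SN:cb CL:2e CH:58 D/H:e5 CR:c8
start, count = lba28(0xCB, 0x2E, 0x58, 0xE5), 0x08
print(ATA_COMMANDS[0xC8], count, "sectors from", start)  # READ DMA 8 sectors from 89665227
```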
Hmm, I'll have to see what 0x40 means in the Error register. Oh man, what a
pain.
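For reference, 0x40 in the Error register is the UNC (uncorrectable data error) bit: the drive could not read the sector back even with its ECC, which is what the kernel's { UncorrectableError } message in this thread says. A small Python sketch for decoding either register follows. DriveReady, SeekComplete, Error and UncorrectableError are the names in the kernel log quoted in this thread; the remaining labels are assumed to follow the same driver's naming and are there for illustration, and decode is a hypothetical helper name.

```python
# Bit labels for the ATA Status register, most significant bit first.
STATUS_BITS = {
    0x80: "Busy", 0x40: "DriveReady", 0x20: "DeviceFault", 0x10: "SeekComplete",
    0x08: "DataRequest", 0x04: "CorrectedError", 0x02: "Index", 0x01: "Error",
}

# Bit labels for the ATA Error register. 0x80 is the interface CRC bit on
# UDMA drives (older drives used it to flag a bad block).
ERROR_BITS = {
    0x80: "BadCRC", 0x40: "UncorrectableError", 0x20: "MediaChanged",
    0x10: "SectorIdNotFound", 0x08: "MediaChangeRequested",
    0x04: "DriveStatusError", 0x02: "TrackZeroNotFound", 0x01: "AddrMarkNotFound",
}


def decode(value: int, names: dict[int, str]) -> list[str]:
    """Return the labels of the bits set in `value`, highest bit first."""
    return [label for bit, label in names.items() if value & bit]


print(decode(0x51, STATUS_BITS))  # ['DriveReady', 'SeekComplete', 'Error']
print(decode(0x40, ERROR_BITS))   # ['UncorrectableError']
```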
Thanks,
Damien
On Tue, 15 Jul 2003 09:27 pm, Rick Harris wrote:
> Hi Damien,
>
> Back-up any important, non-corrupted data right now.
> The drive is on its way out.
>
> It is possible to partition out bad HD areas, & these can be found with
> the 'badblocks' utility.
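In its default mode badblocks runs a read-only test: it reads every block and lists the ones that fail. The Python below is only a minimal sketch of that idea, not a replacement for badblocks; scan_readable is a hypothetical name, and the 1024-byte default block size mirrors badblocks' own default. On a real disk it would have to be run as root against a device node such as /dev/hda, and the block numbers it returns are in units of block_size.

```python
import os


def scan_readable(path: str, block_size: int = 1024) -> list[int]:
    """Read `path` one block at a time and return the numbers of blocks
    that raise an I/O error. Read-only: nothing is written."""
    bad = []
    # buffering=0 gives an unbuffered raw file, so each read goes to the OS
    # rather than a Python-level buffer (the kernel page cache still applies).
    with open(path, "rb", buffering=0) as dev:
        size = dev.seek(0, os.SEEK_END)  # file or block-device size in bytes
        for block in range((size + block_size - 1) // block_size):
            dev.seek(block * block_size)
            try:
                dev.read(block_size)
            except OSError:  # e.g. EIO, the "Input/output error" seen in this thread
                bad.append(block)
    return bad
```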
>
> However, chances are it's going to be a no-win, as more will probably
> occur.
>
> Your HD temperature looks fine. Another dud perhaps?
>
> Your chipset is rock solid, as is reiserfs.
>
> Surprised that the Western Digital carked it; they're normally very
> good.
>
> Regards,
> Rick
>
> On Tue, 2003-07-15 at 20:34, Damien Uern wrote:
> > Hey,
> >
> > I've occasionally been having problems reading some large (say 40MB or
> > greater) binary files on my box. I've been blaming ReiserFS for file
> > corruption, but now I'm not so sure. The latest is when trying to access
> > some part of the rpm database (which would be huge by now); I get this
> > output from the program:
> >
> > [root@thebeast rpms]# rpm -i --test vice-1.11-4mdk.i586.rpm
> > rpmdb: read: 0x400c25fc, 4096: Input/output error
> > error: db4 error(5) from dbcursor->c_get: Input/output error
> > error: error(5) getting "libncurses.so.5" records from Providename index
> > rpmdb: read: 0x400d097c, 4096: Input/output error
> > error: db4 error(5) from dbcursor->c_get: Input/output error
> > error: error(5) getting "libpng.so.3" records from Providename index
> > error: failed dependencies:
> > libncurses.so.5 is needed by vice-1.11-4mdk
> > libpng.so.3 is needed by vice-1.11-4mdk
> >
> > I've had similar input/output error messages from other programs (e.g.
> > cp). Looking in /var/log/messages I see this:
> >
> > Jul 15 20:14:56 thebeast kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> > Jul 15 20:14:56 thebeast kernel: hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=89665229, sector=665064
> > Jul 15 20:14:56 thebeast kernel: end_request: I/O error, dev 03:07 (hda), sector 665064
> >
> > ... (8 more times)
> >
> > I got this hard drive replaced under warranty once before. It used to be
> > a Western Digital Caviar 40GB 7200rpm (it made strange clicking noises
> > occasionally), but now it's a Seagate 80GB 7200rpm drive:
> >
> > hda: ST380021A, ATA DISK drive
> >
> > Could the problem also be my motherboard? It seems strange to have 2 hard
> > drive failures in 1.5 years. This is my IDE chipset (MSI mobo, VIA
> > KT266):
> >
> > VP_IDE: VIA vt8233 (rev 00) IDE UDMA100 controller on pci00:11.1
> >
> > Could my computer perhaps not have enough cooling in it? The temperature
> > hovers around 58-59 degrees with the case OFF and the CPU idle! Mobo temp
> > at this very moment is 41 degrees, CPU is 58 (it hovers around 61-62 at
> > idle with the case on).
> >
> > I'm tired of having a flaky computer!! Any help would be most
> > appreciated.
--
LinuxSA WWW: http://www.linuxsa.org.au/ IRC: #linuxsa on irc.freenode.net
To unsubscribe from the LinuxSA list:
mail linuxsa-request@linuxsa.org.au with "unsubscribe" as the subject