Symptoms of a dying graphics card

It all started in December last year. I saw a red and a green pixel on my monitor. Moving the window containing the wrong-colored pixels corrected them. The wrong-colored pixels reappeared from time to time, in increasing numbers. Then a second symptom appeared: the screen went black and came back a second later. This was triggered by moving windows or scrolling. I wasn't sure what the cause was. Either it was a bug somewhere in the X stack or some piece of hardware was dying. dmesg showed several errors from the radeon driver:

[283808.667454] radeon 0000:01:00.0: ffff88021f815c00 unpin not necessary
[283808.667820] radeon 0000:01:00.0: GPU softreset
[283808.667823] radeon 0000:01:00.0:   R_008010_GRBM_STATUS=0xE57024A4
[283808.667825] radeon 0000:01:00.0:   R_008014_GRBM_STATUS2=0x00330302
[283808.667826] radeon 0000:01:00.0:   R_000E50_SRBM_STATUS=0x200000C0
[283808.667832] radeon 0000:01:00.0:   R_008020_GRBM_SOFT_RESET=0x00007FEE
[283808.682844] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00000001
[283808.698840] radeon 0000:01:00.0:   R_008010_GRBM_STATUS=0x00003028
[283808.698843] radeon 0000:01:00.0:   R_008014_GRBM_STATUS2=0x00000002
[283808.698845] radeon 0000:01:00.0:   R_000E50_SRBM_STATUS=0x200000C0
[283808.699845] radeon 0000:01:00.0: GPU reset succeed
[283808.717570] [drm] Clocks initialized !
[283808.765829] [drm] ring test succeeded in 0 usecs
[283808.765838] [drm] ib test succeeded in 1 usecs
[283808.765840] [drm] Enabling audio support
[283812.521265] radeon 0000:01:00.0: GPU lockup CP stall for more than 1000msec
[283812.521269] ------------[ cut here ]------------
[283812.521294] WARNING: at /build/buildd/linux-2.6.35/drivers/gpu/drm/radeon/radeon_fence.c:235 radeon_fence_wait+0x365/0x3d0 [radeon]()
[283812.521297] Hardware name:
[283812.521299] GPU lockup (waiting for 0x00AE670A last fence id 0x00AE6705)
[283812.521301] Modules linked in: btrfs zlib_deflate crc32c libcrc32c ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs exportfs reiserfs nls_utf8 udf ip6table_filter ip6_tables binfmt_misc ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp kvm_intel kvm parport_pc ppdev snd_emu10k1_synth snd_emux_synth snd_seq_virmidi snd_seq_midi_emul snd_emu10k1 snd_ac97_codec ac97_bus snd_pcm snd_page_alloc snd_util_mem snd_hwdep snd_seq_midi radeon snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device ttm snd pl2303 drm_kms_helper coretemp usbserial joydev psmouse soundcore drm serio_raw intel_agp i2c_algo_bit lp parport hid_cherry usbhid hid firewire_ohci firewire_core usb_storage crc_itu_t e1000e ahci libahci pata_marvell
[283812.521360] Pid: 2194, comm: compiz Tainted: G        W   2.6.35-24-generic #42-Ubuntu
[283812.521362] Call Trace:
[283812.521370]  [<ffffffff8106089f>] warn_slowpath_common+0x7f/0xc0
[283812.521374]  [<ffffffff81060996>] warn_slowpath_fmt+0x46/0x50
[283812.521390]  [<ffffffffa01bd775>] radeon_fence_wait+0x365/0x3d0 [radeon]
[283812.521394]  [<ffffffff8107f730>] ? autoremove_wake_function+0x0/0x40
[283812.521410]  [<ffffffffa01bdf71>] radeon_sync_obj_wait+0x11/0x20 [radeon]
[283812.521418]  [<ffffffffa01751a3>] ttm_bo_wait+0x103/0x1c0 [ttm]
[283812.521435]  [<ffffffffa01d4e1a>] radeon_gem_wait_idle_ioctl+0x9a/0x150 [radeon]
[283812.521447]  [<ffffffffa010f433>] drm_ioctl+0x463/0x520 [drm]
[283812.521465]  [<ffffffffa01d4d80>] ? radeon_gem_wait_idle_ioctl+0x0/0x150 [radeon]
[283812.521470]  [<ffffffff81162f0d>] vfs_ioctl+0x3d/0xd0
[283812.521473]  [<ffffffff811637e1>] do_vfs_ioctl+0x81/0x340
[283812.521477]  [<ffffffff811535f1>] ? vfs_read+0x181/0x1a0
[283812.521480]  [<ffffffff81163b21>] sys_ioctl+0x81/0xa0
[283812.521484]  [<ffffffff8100a0f2>] system_call_fastpath+0x16/0x1b
[283812.521487] ---[ end trace 6d5e03bab743abfa ]---
[283812.521493] [drm] Disabling audio support
[283812.525575] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(10).
[283812.525579] [drm:radeon_cs_ioctl] *ERROR* Faild to schedule IB !
[283812.527021] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(11).
[283812.527024] [drm:radeon_cs_ioctl] *ERROR* Faild to schedule IB !
[283812.527921] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(12).
[283812.527923] [drm:radeon_cs_ioctl] *ERROR* Faild to schedule IB !

The interval between blackouts got shorter each time, and in the end the system was no longer usable. After confirming that the screen also went blank under other systems, such as an Ubuntu live CD and a proprietary operating system not worth naming, it was clear that a hardware component was dying. Replacing the graphics card confirmed my first suspicion: my Radeon HD 4670 had died. That was three weeks ago, about 25 months after I bought the card. So I didn't have to worry about whether replacing the fan would void the warranty, because the warranty only lasted 24 months.
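Whether the blackouts really come closer together can be checked from the kernel log, since every lockup leaves a timestamped line. Below is a minimal Python sketch of that idea. It assumes the driver prints the same "GPU lockup CP stall" message seen in the log above (other kernel versions may word it differently), and the function names lockup_times and intervals are hypothetical, made up for this example.

```python
import re

# Assumption: matches the radeon message seen in the log above, e.g.
# "[283812.521265] radeon 0000:01:00.0: GPU lockup CP stall for more than 1000msec".
# The follow-up "GPU lockup (waiting for ...)" line of the same event is
# deliberately not matched, so each lockup is counted once.
LOCKUP_RE = re.compile(r"^\[\s*(\d+\.\d+)\].*GPU lockup CP stall")


def lockup_times(lines):
    """Return the dmesg timestamps (seconds since boot) of GPU lockup messages."""
    times = []
    for line in lines:
        match = LOCKUP_RE.match(line)
        if match:
            times.append(float(match.group(1)))
    return times


def intervals(times):
    """Return the gaps in seconds between consecutive lockups."""
    return [later - earlier for earlier, later in zip(times, times[1:])]
```

For example, save the output of dmesg to a file and call intervals(lockup_times(open("dmesg.txt"))); a list of shrinking gaps matches the pattern described above. Because dmesg timestamps count seconds since boot, this only compares lockups within a single boot.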

Conclusion:

  1. Not every error is a software bug.
  2. In my experience, the hardware component with the highest failure rate is the graphics card, followed by the motherboard. Four graphics cards died last year (in four different systems owned by four different people, in three different households).
  3. Not every error is a software bug.

8 thoughts on “Symptoms of a dying graphics card”

  1. I experienced a graphics card failure too, with roughly similar symptoms, but over a much shorter time, just a few hours: first a corrupted display, then no display at all.

  2. I've had a bad drive cable and a dying hard drive, in two separate incidents, and both times I initially thought it was a software bug. Now I tend to suspect the hardware before submitting a bug report.

  3. Funny conclusion you reached. I guess our usage patterns are very different. Thinking back over the ~25 years I have had a computer, I have seen (in approximate order of frequency) dead keyboards, floppy drives, hard disks, modems, network cards, motherboards, memory, and various ports or I/O cards… But so far, I've never seen a graphics card die.

  4. You must be running kernel mode setting (KMS). I get errors similar to “[283812.525575] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(10)” only under KMS, not under user mode setting (UMS). It makes me wonder whether UMS is actually more robust against hardware errors than KMS: my uptime under KMS is measured in weeks, while under UMS it reaches months.
