Thursday, December 16, 2010

Read-only filesystem caused ORA-01034: ORACLE not available

Introduction:
Usually, ORA-01034: ORACLE not available means either that the database instance is not up, or that ORACLE_HOME or ORACLE_SID is not set correctly in the environment from which you are trying to connect to the instance.

For details, see http://www.freelists.org/post/oracle-l/fixing-a-bad-oracle-install,1 
Oracle uses a proprietary algorithm that combines the ORACLE_HOME and ORACLE_SID to come up with a shared memory key, which is used at shared memory segment creation time, i.e., when the SGA is allocated. After that, further bequeath connections must have the same ORACLE_HOME and ORACLE_SID defined, so that they can derive the same key value and use it to attach to that existing SGA. If the ORACLE_HOME and/or ORACLE_SID is set incorrectly, the key value will be calculated incorrectly, and the server process will not be able to attach to the SGA shared memory segments.
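This can be verified from the shell before blaming the instance itself. The sketch below uses hypothetical paths and a hypothetical SID; substitute your own. `ipcs -m` lists the shared memory segments, and Oracle's `sysresv` utility (shipped in `$ORACLE_HOME/bin`) reports the key and segment IDs derived from the current ORACLE_HOME/ORACLE_SID pair.

```shell
# Hypothetical ORACLE_HOME and SID -- substitute your own values.
export ORACLE_HOME=/u01/app/oracle/product/10.2.0/db_1
export ORACLE_SID=QADB

# List all shared memory segments on the box; a running SGA shows up here.
ipcs -m

# With the instance up, Oracle's sysresv prints the shared memory key and
# segment IDs derived from the ORACLE_HOME/ORACLE_SID currently set:
#   $ORACLE_HOME/bin/sysresv
```

If `sysresv` reports no segments while the instance is clearly running, the environment you are connecting from is deriving a different key than the one the SGA was created with.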


Problem:
Recently we hit the same error in our QA environment:
ORA-01034: ORACLE not available
ORA-27101: shared memory realm does not exist
Linux-x86_64 Error: 2: No such file or directory


Root Cause:
This time the error was caused by neither of the two reasons above; the root cause was the filesystem being remounted read-only.
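Whether the kernel has flipped a mount to read-only can be checked directly. Here is a minimal sketch: it scans /proc/mounts for read-only mounts and probes a directory with a throwaway write (the helper name `check_writable` is my own, not from the original post).

```shell
#!/bin/sh
# Print "writable" or "read-only" for a given directory, using a
# throwaway write probe.
check_writable() {
    dir="$1"
    if touch "$dir/.rw_test" 2>/dev/null; then
        rm -f "$dir/.rw_test"
        echo "writable"
    else
        echo "read-only"
    fi
}

# List mount points whose options start with (or contain) "ro" --
# these include any filesystems the kernel has remounted read-only.
awk '$4 ~ /(^|,)ro(,|$)/ {print $2}' /proc/mounts
```

On the affected server, the Oracle mount point showed up in that list, matching the "Remounting filesystem read-only" line in /var/log/messages below.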


[root@mvnowdb03 ~]# cat /var/log/messages
Dec 15 10:03:14 mvnowdb03 kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Dec 15 10:03:14 mvnowdb03 kernel: ata1.00: BMDMA stat 0x25
Dec 15 10:03:14 mvnowdb03 kernel: ata1.00: cmd 35/00:20:f8:18:3d/00:00:2d:00:00/e0 tag 0 dma 16384 out
Dec 15 10:03:14 mvnowdb03 kernel:          res 51/10:20:f8:18:3d/10:00:2d:00:00/e0 Emask 0x81 (invalid argument)
Dec 15 10:03:14 mvnowdb03 kernel: ata1.00: status: { DRDY ERR }
Dec 15 10:03:14 mvnowdb03 kernel: ata1.00: error: { IDNF }
Dec 15 10:03:15 mvnowdb03 kernel: ata1.00: configured for UDMA/133
Dec 15 10:03:15 mvnowdb03 kernel: sd 0:0:0:0: SCSI error: return code = 0x08000002
Dec 15 10:03:15 mvnowdb03 kernel: sda: Current [descriptor]: sense key: Aborted Command
Dec 15 10:03:15 mvnowdb03 kernel:     Add. Sense: Recorded entity not found
Dec 15 10:03:15 mvnowdb03 kernel:
Dec 15 10:03:15 mvnowdb03 kernel: Descriptor sense data with sense descriptors (in hex):
Dec 15 10:03:15 mvnowdb03 kernel:         72 0b 14 00 00 00 00 0c 00 0a 80 00 00 00 00 00
Dec 15 10:03:16 mvnowdb03 kernel:         2d 3d 18 f8
Dec 15 10:03:16 mvnowdb03 kernel: end_request: I/O error, dev sda, sector 758978808
Dec 15 10:03:16 mvnowdb03 kernel: Buffer I/O error on device sda9, logical block 35429835
Dec 15 10:03:16 mvnowdb03 kernel: lost page write due to I/O error on sda9
Dec 15 10:03:16 mvnowdb03 kernel: Buffer I/O error on device sda9, logical block 35429836
Dec 15 10:03:16 mvnowdb03 kernel: lost page write due to I/O error on sda9
Dec 15 10:03:16 mvnowdb03 kernel: Buffer I/O error on device sda9, logical block 35429837
Dec 15 10:03:16 mvnowdb03 kernel: lost page write due to I/O error on sda9
Dec 15 10:03:16 mvnowdb03 kernel: Buffer I/O error on device sda9, logical block 35429838
Dec 15 10:03:16 mvnowdb03 kernel: lost page write due to I/O error on sda9
Dec 15 10:03:16 mvnowdb03 kernel: ata1: EH complete
Dec 15 10:03:16 mvnowdb03 kernel: SCSI device sda: 976773168 512-byte hdwr sectors (500108 MB)
Dec 15 10:03:16 mvnowdb03 kernel: sda: Write Protect is off
Dec 15 10:03:16 mvnowdb03 kernel: SCSI device sda: drive cache: write back
Dec 15 10:03:16 mvnowdb03 kernel: Aborting journal on device sda9.
Dec 15 10:03:16 mvnowdb03 kernel: ext3_abort called.
Dec 15 10:03:16 mvnowdb03 kernel: EXT3-fs error (device sda9): ext3_journal_start_sb: Detected aborted journal
Dec 15 10:03:16 mvnowdb03 kernel: Remounting filesystem read-only
Dec 15 10:03:16 mvnowdb03 kernel: __journal_remove_journal_head: freeing b_committed_data

Solution:
  1. One suggestion is to unmount the affected filesystem, run fsck on it, and remount it. After that, the DB instances and app servers must be restarted to rebuild their connection pools.
  2. Replace the dying (bad) disk.
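Option 1 can be sketched as a short runbook. The device and mount point below are assumptions taken from the kernel log (/dev/sda9); adjust them to your system. With DRY_RUN=1 the script only prints the commands, which is how it should be run first.

```shell
#!/bin/sh
# Recovery sketch for option 1. FS_DEV and FS_MNT are assumptions
# (taken from the kernel log above) -- adjust to the filesystem that
# went read-only. DRY_RUN=1 prints each command instead of running it.
FS_DEV=/dev/sda9
FS_MNT=/u01
DRY_RUN=1

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

run umount "$FS_MNT"     # 1. take the damaged filesystem offline
run fsck -y "$FS_DEV"    # 2. repair the ext3 journal and metadata
run mount "$FS_MNT"      # 3. remount it read-write
# 4. restart the DB instance (SQL*Plus: shutdown abort; startup) and
#    the app servers so their connection pools are rebuilt
```

Set DRY_RUN=0 to execute for real, and only after confirming the device name against /var/log/messages; an fsck on a disk that is physically failing is a stopgap until the disk is replaced.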
