Oracle Database Appliance Disk State FAILED / DiskRemoved but ASM Healthy: Troubleshooting Guide

17 May 2026 01:03


A strange disk issue on Oracle Database Appliance X10 HA:

While working on an Oracle Database Appliance X10 HA system, I encountered an interesting inconsistency between the ODA management layer and the actual storage stack.

One disk was reported as:

			
[root@myoda1 ~]# odaadmcli show disk
NAME PATH TYPE STATE STATE_DETAILS
e0_pd_00 /dev/sdf SSD ONLINE Good
e0_pd_01 /dev/sda SSD ONLINE Good
e0_pd_02 /dev/sdd SSD ONLINE Good
e0_pd_03 /dev/sdb SSD ONLINE Good
e0_pd_04 /dev/sde SSD FAILED DiskRemoved
e0_pd_05 /dev/sdi SSD ONLINE Good

		

At first glance, this looked like a serious disk failure. However, deeper investigation showed that the disk was still fully operational.
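
On an appliance with many disks, it can help to filter the listing down to the entries that are not ONLINE. The following is a minimal sketch, assuming the five-column layout shown above (NAME PATH TYPE STATE STATE_DETAILS); the function name list_unhealthy_disks is made up for illustration.

```shell
# Print NAME and STATE_DETAILS for every disk whose STATE is not ONLINE.
# Reads `odaadmcli show disk` style output on stdin.
# Assumes the column order NAME PATH TYPE STATE STATE_DETAILS.
list_unhealthy_disks() {
    awk 'NF >= 5 && $1 != "NAME" && $4 != "ONLINE" { print $1, $5 }'
}
```

Typical use would be: odaadmcli show disk | list_unhealthy_disks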

Symptoms

Although odaadmcli reported the disk as removed and failed:

Oracle ASM was actively using the disk
Linux multipath showed healthy paths
Database operations continued normally
No ASM rebalance or disk drop activity existed

The environment was:

Oracle Database Appliance X10 HA
ASM-based storage
Multipath enabled
No visible database impact

Verification Steps

1. Verify Linux Multipath Status

First, I checked the multipath layer. The affected disk still showed "active ready running" for all paths, which confirmed that the operating system still had healthy access to the device.
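
The path check can be turned into a number that is easy to alert on. This is only a sketch, assuming the input comes from multipath -ll and that path lines can be recognized by their host:channel:target:lun address; count_bad_paths is a hypothetical name.

```shell
# Count multipath path lines that are NOT "active ready running".
# Reads `multipath -ll` output on stdin. A path line is identified by its
# host:channel:target:lun address followed by a space (for example "1:0:0:4 ").
count_bad_paths() {
    awk '/[0-9]+:[0-9]+:[0-9]+:[0-9]+ / && !/active ready running/ { bad++ }
         END { print bad + 0 }'
}
```

Typical use would be: multipath -ll | count_bad_paths, where a result of 0 means every listed path is healthy.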

2. Verify ASM Disk State

Next, I checked the ASM layer.

			
SQL> SELECT name,
       path,
       header_status,
       mode_status,
       state,
       mount_status
FROM   v$asm_disk
WHERE  path LIKE '%SSD_E0_S04%'
ORDER BY name;

		

The disk was visible and healthy:

HEADER_STATUS : MEMBER
MODE_STATUS   : ONLINE
STATE         : NORMAL

This is the most important validation step.

If ASM reports the disk as NORMAL and ONLINE, then the database is still safely using the disk.
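
If this check ends up in a script, the three column values can be compared against the healthy combination shown above. A minimal sketch follows; asm_disk_is_healthy is a hypothetical helper, and how the values are fetched from V$ASM_DISK is left out.

```shell
# Succeed only when the three V$ASM_DISK values match the healthy state
# reported above: HEADER_STATUS=MEMBER, MODE_STATUS=ONLINE, STATE=NORMAL.
# Arguments: header_status mode_status state
asm_disk_is_healthy() {
    [ "$1" = "MEMBER" ] && [ "$2" = "ONLINE" ] && [ "$3" = "NORMAL" ]
}
```

For example, asm_disk_is_healthy MEMBER ONLINE NORMAL succeeds, while any other combination returns a non-zero status.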

3. Smartctl Health Check

smartctl is a command-line utility used to monitor and manage the health of storage devices such as HDDs, SSDs, NVMe drives, and SAS/SATA disks. It is part of the open-source smartmontools package and works with the disk’s S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology) system.

			
[root@myoda1 ~]# smartctl --all /dev/sde
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.17-2136.327.2.el8uek.x86_64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Vendor:               SAMSUNG
Product:              MS9AC2DD2SUN7.6T
Revision:             RXA0
Compliance:           SPC-5
User Capacity:        7,681,501,126,656 bytes [7.68 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
Formatted with type 1 protection
8 bytes of protection information per logical block
LU is resource provisioned, LBPRZ=1
Rotation Rate:        Solid State Device
Form Factor:          2.5 inches
Logical Unit id:      YYYYYYYYY
Serial number:        XXXXXXXXX
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Tue May  1 12:49:44 2026 +03
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled
=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
Percentage used endurance indicator: 0%
Current Drive Temperature:     25 C
Drive Trip Temperature:        74 C
Accumulated power on time, hours:minutes 16203:26
Manufactured in week 32 of year 2023
Accumulated start-stop cycles:  8
Specified load-unload count over device lifetime:  0
Accumulated load-unload cycles:  0
Elements in grown defect list: 0
Error counter log:
          Errors Corrected by           Total   Correction     Gigabytes    Total
              ECC          rereads/    errors   algorithm      processed    uncorrected
          fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0    3320205.863           0
write:         0        0         0         0          0      18114.128           0
Non-medium error count:        1
Pending defect count:0 Pending Defects
No Self-tests have been logged
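
The full report is long, so a short summary of the fields that matter most can be useful. The sketch below assumes the SAS-style field names seen in the output above; smart_summary is a hypothetical name, and other drive types may label these fields differently.

```shell
# Reduce `smartctl --all /dev/sdX` output (read on stdin) to a few key lines:
# overall health, grown defects, non-medium errors, and the last column
# (total uncorrected errors) of the read/write rows in the error counter log.
smart_summary() {
    awk '
        /^SMART Health Status:/           { print }
        /^Elements in grown defect list:/ { print }
        /^Non-medium error count:/        { print }
        /^(read|write):/                  { print $1, "total uncorrected errors:", $NF }
    '
}
```

Typical use would be: smartctl --all /dev/sde | smart_summary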

		

4. Compare ODA Layer

Then I checked ODA storage inventory.

odaadmcli show storage
odaadmcli show disk
odaadmcli show disk e0_pd_04

Only the ODA management layer believed the disk was removed. The detailed output for the disk also shows when the state changed:

[root@myoda1 ~]# odaadmcli show disk e0_pd_04
…
State : Failed
StateChangeTs : 1777672380
StateDetails : DiskRemoved

			
date -d @1777672380
Fri May  1 19:13:00 UTC 2026

According to /var/log/messages, there was a temporary network outage on 01 May 2026 at 22:13 local time (UTC+3), which matches the state change timestamp. Although connectivity recovered within seconds, odaadmcli continued to report the disk as FAILED.
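
To find the matching log lines, the StateChangeTs value can be converted into the timestamp prefix used in the log. This is a sketch under two assumptions: GNU date is available (for the -d @SECONDS syntax), and /var/log/messages uses the traditional syslog "Mon dd HH:MM:SS" prefix in local time. The name syslog_minute is hypothetical.

```shell
# Convert an epoch value (such as StateChangeTs) into the "Mon dd HH:MM"
# prefix of a traditional syslog timestamp, in the current time zone.
# Assumes GNU date; LC_ALL=C keeps the month abbreviation in English.
syslog_minute() {
    LC_ALL=C date -d "@$1" '+%b %e %H:%M'
}
```

Typical use would be: grep "$(syslog_minute "$ts")" /var/log/messages, where ts holds the StateChangeTs value.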

Looking for a way to dig deeper, I found the odaadmcli stordiag command in the X10 Deployment and User’s Guide for Linux x86-64. It collects detailed diagnostic information about a disk, and it helped me troubleshoot and resolve the problem.

[root@myoda1 ~]# odaadmcli stordiag e0_pd_04

It produces long output with many sections. The relevant one was section 9:

9 : asmappl.config and multipath.conf consistency check
[INFO]: /opt/oracle/extapi/asmappl.config file is not in sync between nodes, differences are following
> disk AFD:SSD_E0_S04_XXXXXXXP1 0 05 1
> disk AFD:SSD_E0_S04_XXXXXXXP10 0 05 10
> disk AFD:SSD_E0_S04_XXXXXXXP2 0 05 2
> disk AFD:SSD_E0_S04_XXXXXXXP3 0 05 3
> disk AFD:SSD_E0_S04_XXXXXXXP4 0 05 4
> disk AFD:SSD_E0_S04_XXXXXXXP5 0 05 5
> disk AFD:SSD_E0_S04_XXXXXXXP6 0 05 6
> disk AFD:SSD_E0_S04_XXXXXXXP7 0 05 7
> disk AFD:SSD_E0_S04_XXXXXXXP8 0 05 8
> disk AFD:SSD_E0_S04_XXXXXXXP9 0 05 9
/etc/multipath.conf file is in sync between nodes

It appears that /opt/oracle/extapi/asmappl.config was not synchronized between the nodes. When I compared the file on both nodes (myoda1 and myoda2), I noticed that some lines were missing on myoda1.
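
The same comparison can be done by hand with diff. The sketch below assumes root SSH access between the nodes; diff_against_stdin is a hypothetical helper name.

```shell
# Compare a local file with content arriving on stdin (for example the same
# file read from the peer node). Lines starting with ">" exist only in the
# stdin copy, lines starting with "<" only in the local file.
# Exit status follows diff: 0 = identical, 1 = differences found.
diff_against_stdin() {
    diff "$1" -
}
```

Run on myoda1, typical use would be: ssh myoda2 cat /opt/oracle/extapi/asmappl.config | diff_against_stdin /opt/oracle/extapi/asmappl.config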

At this point, I questioned whether it was a good idea to manually modify this file. To stay safe, I took a backup of the existing file and copied the version from the healthy node to the affected node using scp.
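
The backup step is worth scripting so it cannot be skipped. A minimal sketch follows, with a hypothetical helper named backup_file; the backup naming scheme is my own choice, not an ODA convention.

```shell
# Copy FILE to FILE.bak.<timestamp>, preserving mode, ownership and times,
# and print the backup path. Returns non-zero if FILE does not exist.
backup_file() {
    src=$1
    [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
    dst="$src.bak.$(date +%Y%m%d%H%M%S)"
    cp -p "$src" "$dst" && printf '%s\n' "$dst"
}
```

Run on myoda1, typical use would be: backup_file /opt/oracle/extapi/asmappl.config && scp myoda2:/opt/oracle/extapi/asmappl.config /opt/oracle/extapi/asmappl.config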

However, after this change, odaadmcli show disk still reported the disk as FAILED.

As a next step, I restarted the OAK daemon. I was initially hesitant because I was not sure whether restarting OAK would impact running databases. However, according to Oracle support note KB572658 – “How to Replace an ODA (Oracle Database Appliance) Online Shared Storage Disk”, it is safe to restart oakd in such disk problem scenarios.

[root@myoda1 ~]# odaadmcli restart oak

After OAK restarted, the disk status was re-evaluated. Within a couple of minutes, odaadmcli show disk reported the disk as ONLINE with state details Good, and the issue was resolved.
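
Instead of re-running the command by hand, the wait can be expressed as a small polling loop. This is a sketch only: wait_for_state and e0_pd_04_state are hypothetical helpers, and the STATE column position is taken from the odaadmcli show disk listing earlier in this post.

```shell
# Run CMD repeatedly until it prints WANT, up to TRIES attempts,
# sleeping DELAY seconds between attempts. Returns 0 on success, 1 on timeout.
wait_for_state() {
    cmd=$1; want=$2; tries=$3; delay=$4
    i=0
    while [ "$i" -lt "$tries" ]; do
        if [ "$($cmd)" = "$want" ]; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Hypothetical helper: print the STATE column (4th field) for disk e0_pd_04.
e0_pd_04_state() {
    odaadmcli show disk | awk '$1 == "e0_pd_04" { print $4 }'
}
```

Typical use would be: wait_for_state e0_pd_04_state ONLINE 20 15, which checks every 15 seconds for up to 5 minutes.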

Additional info

You may wonder whether the “Oak Table” from Oracle’s famous performance tuning community has anything to do with the “OAK daemon” you see on Oracle Database Appliance, especially since the names sound so closely related. In reality, there is no connection at all between them. The Oak Table refers to a group of Oracle experts focused on deep performance internals and tuning philosophy, while OAK in ODA simply stands for Oracle Appliance Kit, the internal framework that manages hardware and storage services on the appliance. The similarity is purely coincidental: two completely different worlds that just happen to share the same word.

Hope it helps.

Discover More from Osman DİNÇ