ODA Disk Failed

Oracle Database Appliance Disk State FAILED / DiskRemoved but ASM Healthy: Troubleshooting Guide

While working on an Oracle Database Appliance X10 HA system, I encountered an interesting inconsistency between the ODA management layer and the actual storage stack.

One disk was reported as:

[root@myoda1 ~]# odaadmcli show disk
NAME      PATH      TYPE  STATE   STATE_DETAILS
e0_pd_00  /dev/sdf  SSD   ONLINE  Good
e0_pd_01  /dev/sda  SSD   ONLINE  Good
e0_pd_02  /dev/sdd  SSD   ONLINE  Good
e0_pd_03  /dev/sdb  SSD   ONLINE  Good
e0_pd_04  /dev/sde  SSD   ONLINE  Good
e0_pd_05  /dev/sdi  SSD   FAILED  DiskRemoved

At first glance, this looked like a serious disk failure. However, deeper investigation showed that the disk was still fully operational.
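On an appliance with many disks, it can help to filter the listing down to the entries that are not ONLINE. The following is only a minimal sketch: non_online_disks is a hypothetical helper name, and it assumes the five-column layout (NAME PATH TYPE STATE STATE_DETAILS) seen in the output above.

```shell
#!/bin/sh
# Hypothetical helper: reads `odaadmcli show disk` output on stdin and prints
# NAME, PATH, STATE and STATE_DETAILS for every disk whose STATE is not ONLINE.
# Assumes whitespace-separated columns: NAME PATH TYPE STATE STATE_DETAILS.
non_online_disks() {
    awk 'NF >= 4 && $1 != "NAME" && $4 != "ONLINE" { print $1, $2, $4, $5 }'
}

# Usage on the appliance (as root):
#   odaadmcli show disk | non_online_disks
```

An empty result means every listed disk is ONLINE from the ODA management layer's point of view.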

Although odaadmcli reported the disk as removed and failed:

  • Oracle ASM was actively using the disk
  • Linux multipath showed healthy paths
  • Database operations continued normally
  • No ASM rebalance or disk drop activity existed

The environment was:

  • Oracle Database Appliance X10 HA
  • ASM-based storage
  • Multipath enabled
  • No visible database impact

First, I checked the multipath layer. All paths to the affected disk still showed “active ready running”, which confirmed that the operating system still had healthy access to the device.
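The multipath check can be scripted roughly as follows. This is a sketch under assumptions: unhealthy_path_count is a hypothetical helper, the input is expected to come from multipath -ll, and path lines are recognised by their H:C:T:L address (for example 7:0:5:0).

```shell
#!/bin/sh
# Hypothetical helper: counts path lines in `multipath -ll` output (stdin)
# that are NOT in the "active ready running" state.
# A path line is assumed to contain an H:C:T:L address such as 7:0:5:0.
unhealthy_path_count() {
    awk '/[0-9]+:[0-9]+:[0-9]+:[0-9]+/ && !/active ready running/ { n++ }
         END { print n + 0 }'
}

# Usage on the appliance (as root):
#   multipath -ll | unhealthy_path_count
```

A count of 0 corresponds to the situation described here, where every path is still healthy.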

Next, I checked the ASM layer.

SQL> SELECT name,
            path,
            header_status,
            mode_status,
            state,
            mount_status
       FROM v$asm_disk
      WHERE path LIKE '%SSD_E0_S05%'
      ORDER BY name;

The disk was visible and healthy:

HEADER_STATUS : MEMBER
MODE_STATUS : ONLINE
STATE : NORMAL

This is the most important validation step.

If ASM reports the disk as NORMAL and ONLINE, then the database is still safely using the disk.
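If you want to apply the same rule to many ASM disks at once, a small filter can flag anything that is not MEMBER / ONLINE / NORMAL. This is a sketch only: unhealthy_asm_disks is a hypothetical helper, and it assumes you have already produced pipe-separated lines (NAME|HEADER_STATUS|MODE_STATUS|STATE), for example by concatenating the columns in the query.

```shell
#!/bin/sh
# Hypothetical helper: reads lines of the form
#   NAME|HEADER_STATUS|MODE_STATUS|STATE
# on stdin and prints the NAME of every disk that is not
# MEMBER / ONLINE / NORMAL. Surrounding whitespace in fields is ignored.
unhealthy_asm_disks() {
    awk -F'|' '
        NF >= 4 {
            for (i = 1; i <= 4; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
            if ($2 != "MEMBER" || $3 != "ONLINE" || $4 != "NORMAL") print $1
        }'
}

# One possible way to produce the input (assumption, run as the grid owner):
#   sqlplus -s / as sysasm <<'EOF' | unhealthy_asm_disks
#   set pagesize 0 feedback off heading off
#   SELECT name || '|' || header_status || '|' || mode_status || '|' || state
#     FROM v$asm_disk;
#   EOF
```

No output means every disk in the input passed the check.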

To rule out a hardware problem, I also checked the drive with smartctl. smartctl is a command-line utility used to monitor and manage the health of storage devices such as HDDs, SSDs, NVMe drives, and SAS/SATA disks. It is part of the open-source smartmontools package and works with the disk’s S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology) system.

[root@myoda1 ~]# smartctl --all /dev/sdi
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.17-2136.327.2.el8uek.x86_64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Vendor: SAMSUNG
Product: MS9AC2DD2SUN7.6T
Revision: RXA0
Compliance: SPC-5
User Capacity: 7,681,501,126,656 bytes [7.68 TB]
Logical block size: 512 bytes
Physical block size: 4096 bytes
Formatted with type 1 protection
8 bytes of protection information per logical block
LU is resource provisioned, LBPRZ=1
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
Logical Unit id: YYYYYYYYY
Serial number: XXXXXXXXX
Device type: disk
Transport protocol: SAS (SPL-3)
Local Time is: Tue May 1 12:49:44 2026 +03
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Temperature Warning: Enabled
=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
Percentage used endurance indicator: 0%
Current Drive Temperature: 25 C
Drive Trip Temperature: 74 C
Accumulated power on time, hours:minutes 16203:26
Manufactured in week 32 of year 2023
Accumulated start-stop cycles: 8
Specified load-unload count over device lifetime: 0
Accumulated load-unload cycles: 0
Elements in grown defect list: 0
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0    3320205.863           0
write:         0        0         0         0          0      18114.128           0
Non-medium error count: 1
Pending defect count:0 Pending Defects
No Self-tests have been logged
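The smartctl output is long, so a short summary of the headline indicators can be handy. The sketch below is illustrative only: smart_summary is a hypothetical helper, and it assumes the SAS/SCSI-style field labels that appear in the output above.

```shell
#!/bin/sh
# Hypothetical helper: extracts three headline indicators from
# `smartctl --all` output (stdin) for a SAS/SCSI device.
# Field labels are taken from the output shown above.
smart_summary() {
    awk -F': *' '
        /^SMART Health Status:/           { health = $2 }
        /^Elements in grown defect list:/ { defects = $2 }
        /^Non-medium error count:/        { nonmedium = $2 }
        END {
            printf "health=%s grown_defects=%s non_medium_errors=%s\n",
                   health, defects, nonmedium
        }'
}

# Usage on the appliance (as root):
#   smartctl --all /dev/sdi | smart_summary
```

For the disk in this post, the values were OK, 0, and 1 respectively, which gives no sign of a failing drive.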

Then I checked ODA storage inventory.

odaadmcli show storage
odaadmcli show disk
odaadmcli show disk e0_pd_05

Only the ODA management layer believed the disk was removed:

[root@myoda1 ~]# odaadmcli show disk e0_pd_05

State : Failed
StateChangeTs : 1777672380
StateDetails : DiskRemoved

StateChangeTs is a Unix epoch timestamp, so I converted it to a readable date:

date -d @1777672380
Fri May 1 19:13:00 UTC 2026
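Pulling the timestamp out of the detail output and converting it can be combined in one step. This is a minimal sketch: disk_field is a hypothetical helper, and it assumes the "Key : Value" layout shown above.

```shell
#!/bin/sh
# Hypothetical helper: prints the value of one field from the
# "Key : Value" output of `odaadmcli show disk <name>` (stdin).
# The key must match exactly, so "State" does not match "StateChangeTs".
disk_field() {
    awk -v key="$1" '
        {
            i = index($0, ":")
            if (i == 0) next
            k = substr($0, 1, i - 1)
            v = substr($0, i + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (k == key) { print v; exit }
        }'
}

# Usage on the appliance (as root, GNU date assumed):
#   ts=$(odaadmcli show disk e0_pd_05 | disk_field StateChangeTs)
#   date -d "@$ts"       # local time zone
#   date -u -d "@$ts"    # UTC
```

Knowing the exact time of the state change makes it much easier to find the matching entries in /var/log/messages.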

On 01 May 2026 at 22:13 local time (19:13 UTC), /var/log/messages showed a temporary network outage. Although connectivity recovered within seconds, odaadmcli continued to report the disk as FAILED.

While looking for a command that could give more diagnostic detail, I found odaadmcli stordiag in the X10 Deployment and User’s Guide for Linux x86-64. It collects detailed information about a disk, and it helped me troubleshoot and resolve the problem.

[root@myoda1 ~]# odaadmcli stordiag e0_pd_05

It produces long output with many sections.

Section 9 was the interesting one:

  9  : asmappl.config and multipath.conf consistency check                                                                                               
         [INFO]: /opt/oracle/extapi/asmappl.config file is not in sync between nodes, differences are following
> disk  AFD:SSD_E0_S05_XXXXXXXP1   0  05  1
> disk  AFD:SSD_E0_S05_XXXXXXXP10  0  05  10
> disk  AFD:SSD_E0_S05_XXXXXXXP2   0  05  2
> disk  AFD:SSD_E0_S05_XXXXXXXP3   0  05  3
> disk  AFD:SSD_E0_S05_XXXXXXXP4   0  05  4
> disk  AFD:SSD_E0_S05_XXXXXXXP5   0  05  5
> disk  AFD:SSD_E0_S05_XXXXXXXP6   0  05  6
> disk  AFD:SSD_E0_S05_XXXXXXXP7   0  05  7
> disk  AFD:SSD_E0_S05_XXXXXXXP8   0  05  8
> disk  AFD:SSD_E0_S05_XXXXXXXP9   0  05  9
         /etc/multipath.conf file is in sync between nodes

It appears that /opt/oracle/extapi/asmappl.config was not synchronized between the nodes. When I compared the file on both nodes (myoda1 and myoda2), I noticed that some lines were missing on myoda1.
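The comparison between the two nodes can be reproduced by hand. The sketch below is one possible approach, not the method stordiag itself uses: missing_lines is a hypothetical helper, and the /tmp path in the usage comment is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: prints lines that exist in the reference copy ($2)
# but are missing from the local file ($1). Lines are compared as whole,
# fixed strings, so a whitespace difference counts as a different line.
missing_lines() {
    grep -Fxv -f "$1" "$2" || true
}

# Usage on myoda1 (as root; the /tmp path is an assumption):
#   scp myoda2:/opt/oracle/extapi/asmappl.config /tmp/asmappl.config.myoda2
#   missing_lines /opt/oracle/extapi/asmappl.config /tmp/asmappl.config.myoda2
```

In this case the output would be expected to contain the SSD_E0_S05 entries that stordiag flagged.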

At this point, I questioned whether it was a good idea to manually modify this file. To stay safe, I took a backup of the existing file and copied the version from the healthy node to the affected node using scp.
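The backup-then-copy step could look roughly like this. It is only a sketch: backup_file is a hypothetical helper and the backup naming scheme is my assumption, not an Oracle convention.

```shell
#!/bin/sh
# Hypothetical helper: copies a file to <file>.bak.<timestamp>, preserving
# mode and timestamps, and prints the path of the backup on success.
backup_file() {
    src=$1
    dst="$src.bak.$(date +%Y%m%d%H%M%S)"
    cp -p "$src" "$dst" && printf '%s\n' "$dst"
}

# Usage on the affected node myoda1 (as root):
#   backup_file /opt/oracle/extapi/asmappl.config
#   scp myoda2:/opt/oracle/extapi/asmappl.config /opt/oracle/extapi/asmappl.config
```

Keeping the backup next to the original makes it easy to roll back if the copied file causes problems.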

However, after this change, odaadmcli show disk still reported the disk as FAILED.

As a next step, I restarted the OAK daemon. I was initially hesitant because I was not sure whether restarting OAK would impact running databases. However, according to Oracle support note KB572658 – “How to Replace an ODA (Oracle Database Appliance) Online Shared Storage Disk”, restarting oakd is safe in this kind of disk-state scenario.

[root@myoda1 ~]# odaadmcli restart oak

After the restart, OAK re-evaluated the disk status. Within a couple of minutes, odaadmcli show disk reported the disk as ONLINE with state details Good, and the issue was resolved.
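Since the status took a couple of minutes to change, polling is more convenient than rerunning the command by hand. A minimal sketch, assuming the column layout of odaadmcli show disk seen earlier; wait_for_online is a hypothetical helper and the attempt count and delay are arbitrary.

```shell
#!/bin/sh
# Hypothetical helper: runs a status command repeatedly until the named disk
# shows ONLINE in the fourth (STATE) column, or gives up.
# Arguments: <disk name> <max attempts> <delay seconds> <command...>
wait_for_online() {
    disk=$1; attempts=$2; delay=$3
    shift 3
    i=0
    while [ "$i" -lt "$attempts" ]; do
        state=$("$@" | awk -v d="$disk" '$1 == d { print $4 }')
        [ "$state" = "ONLINE" ] && return 0
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Usage on the appliance (as root): up to 20 checks, 15 seconds apart.
#   wait_for_online e0_pd_05 20 15 odaadmcli show disk && echo "disk is ONLINE"
```

The function returns non-zero if the disk never reaches ONLINE within the given number of attempts, which is the signal to investigate further.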

Additional info:

You may wonder whether the “Oak Table” from Oracle’s famous performance tuning community has anything to do with the “OAK daemon” you see on Oracle Database Appliance, especially since the names sound so closely related. In reality, there is no connection at all between them. The Oak Table refers to a group of Oracle experts focused on deep performance internals and tuning philosophy, while OAK in ODA simply stands for Oracle Appliance Kit, the internal framework that manages hardware and storage services on the appliance. The similarity is purely coincidental: two completely different worlds that just happen to share the same word.

Hope it helps.


Discover More from Osman DİNÇ

