A strange disk issue on Oracle Database Appliance X10 HA:
While working on an Oracle Database Appliance X10 HA system, I encountered an interesting inconsistency between the ODA management layer and the actual storage stack.
One disk was reported as:
[root@myoda1 ~]# odaadmcli show disk
NAME        PATH        TYPE    STATE     STATE_DETAILS
e0_pd_00    /dev/sdf    SSD     ONLINE    Good
e0_pd_01    /dev/sda    SSD     ONLINE    Good
e0_pd_02    /dev/sdd    SSD     ONLINE    Good
e0_pd_03    /dev/sdb    SSD     ONLINE    Good
e0_pd_04    /dev/sde    SSD     ONLINE    Good
e0_pd_05    /dev/sdi    SSD     FAILED    DiskRemoved
At first glance, this looked like a serious disk failure. However, deeper investigation showed that the disk was still fully operational.
Symptoms
Although odaadmcli reported the disk as removed and failed:
- Oracle ASM was actively using the disk
- Linux multipath showed healthy paths
- Database operations continued normally
- No ASM rebalance or disk drop activity existed
The environment was:
- Oracle Database Appliance X10 HA
- ASM-based storage
- Multipath enabled
- No visible database impact
Verification Steps
1. Verify Linux Multipath Status
First, I checked the multipath layer. Every path to the affected disk still showed “active ready running”, which confirmed that the operating system still had healthy access to the device.
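This check can also be scripted. The sketch below is a minimal example, assuming that path lines in `multipath -ll` output carry an H:C:T:L address (such as 14:0:5:0) and end with the path state. The function name `count_unhealthy_paths` is illustrative and not part of any Oracle tooling.

```shell
#!/bin/sh
# Count multipath path lines whose state is not "active ready running".
# Reads `multipath -ll` style text on stdin and prints a single number.
count_unhealthy_paths() {
    awk '
        # Assumption: only path lines contain an H:C:T:L address like 14:0:5:0.
        /[0-9]+:[0-9]+:[0-9]+:[0-9]+/ {
            if ($0 !~ /active ready running[ \t]*$/) bad++
        }
        END { print bad + 0 }
    '
}

# Typical use on a live system (requires root):
#   multipath -ll | count_unhealthy_paths
```

A result of 0 means every listed path is healthy; anything else deserves a closer look at the full `multipath -ll` output.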
2. Verify ASM Disk State
Next, I checked the ASM layer.
SQL> SELECT name, path, header_status, mode_status, state, mount_status
     FROM v$asm_disk
     WHERE path LIKE '%SSD_E0_S05%'
     ORDER BY name;
The disk was visible and healthy:
HEADER_STATUS : MEMBER
MODE_STATUS : ONLINE
STATE : NORMAL
This is the most important validation step.
If ASM reports the disk as NORMAL and ONLINE, then the database is still safely using the disk.
3. Smartctl Health Check
smartctl is a command-line utility used to monitor and manage the health of storage devices such as HDDs, SSDs, NVMe drives, and SAS/SATA disks. It is part of the open-source smartmontools package and works with the disk’s S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology) system.
[root@myoda1 ~]# smartctl --all /dev/sdi
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.17-2136.327.2.el8uek.x86_64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor: SAMSUNG
Product: MS9AC2DD2SUN7.6T
Revision: RXA0
Compliance: SPC-5
User Capacity: 7,681,501,126,656 bytes [7.68 TB]
Logical block size: 512 bytes
Physical block size: 4096 bytes
Formatted with type 1 protection
8 bytes of protection information per logical block
LU is resource provisioned, LBPRZ=1
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
Logical Unit id: YYYYYYYYY
Serial number: XXXXXXXXX
Device type: disk
Transport protocol: SAS (SPL-3)
Local Time is: Tue May 1 12:49:44 2026 +03
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Temperature Warning: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
Percentage used endurance indicator: 0%
Current Drive Temperature: 25 C
Drive Trip Temperature: 74 C
Accumulated power on time, hours:minutes 16203:26
Manufactured in week 32 of year 2023
Accumulated start-stop cycles: 8
Specified load-unload count over device lifetime: 0
Accumulated load-unload cycles: 0
Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0    3320205.863           0
write:         0        0         0         0          0      18114.128           0

Non-medium error count: 1
Pending defect count:0 Pending Defects
No Self-tests have been logged
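Since the full report is long, it can help to filter out just the few fields that matter for a quick verdict. This is a minimal sketch, assuming the SAS/SCSI-style field labels that appear in the output above (ATA and NVMe devices use different labels). The function name `smart_summary` is illustrative.

```shell
#!/bin/sh
# Print only the key health fields from `smartctl --all` output on stdin.
# Assumption: SCSI/SAS-style labels, as reported for the disk in this post.
smart_summary() {
    grep -E '^(SMART Health Status|Percentage used endurance indicator|Elements in grown defect list|Non-medium error count):'
}

# Typical use (requires root):
#   smartctl --all /dev/sdi | smart_summary
```

For the disk in this post, the fields of interest were a health status of OK, 0% endurance used, and an empty grown defect list.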
4. Compare ODA Layer
Then I checked the ODA storage inventory with the following commands:
odaadmcli show storage
odaadmcli show disk
odaadmcli show disk e0_pd_05
Only the ODA management layer believed the disk was removed:
[root@myoda1 ~]# odaadmcli show disk e0_pd_05
…
State : Failed
StateChangeTs : 1777672380
StateDetails : DiskRemoved
date -d @1777672380
Fri May 1 19:13:00 UTC 2026
According to /var/log/messages, there was a brief network outage on 01 May 2026 at 22:13 local time (UTC+3), which matches the state change timestamp above. Although connectivity recovered within seconds, odaadmcli kept reporting the disk as FAILED.
While looking for a way to dig deeper, I found the odaadmcli stordiag command in the X10 Deployment and User’s Guide for Linux x86-64. It collects detailed diagnostic information about a disk, and it helped me troubleshoot and resolve the problem.
[root@myoda1 ~]# odaadmcli stordiag e0_pd_05
It produces long output with many sections. Section 9 was the relevant one:
9 : asmappl.config and multipath.conf consistency check
[INFO]: /opt/oracle/extapi/asmappl.config file is not in sync between nodes, differences are following
> disk AFD:SSD_E0_S05_XXXXXXXP1 0 05 1
> disk AFD:SSD_E0_S05_XXXXXXXP10 0 05 10
> disk AFD:SSD_E0_S05_XXXXXXXP2 0 05 2
> disk AFD:SSD_E0_S05_XXXXXXXP3 0 05 3
> disk AFD:SSD_E0_S05_XXXXXXXP4 0 05 4
> disk AFD:SSD_E0_S05_XXXXXXXP5 0 05 5
> disk AFD:SSD_E0_S05_XXXXXXXP6 0 05 6
> disk AFD:SSD_E0_S05_XXXXXXXP7 0 05 7
> disk AFD:SSD_E0_S05_XXXXXXXP8 0 05 8
> disk AFD:SSD_E0_S05_XXXXXXXP9 0 05 9
/etc/multipath.conf file is in sync between nodes
It appears that /opt/oracle/extapi/asmappl.config was not synchronized between the nodes. When I compared the file on both nodes (myoda1 and myoda2), I noticed that some lines were missing on myoda1.
At this point, I questioned whether it was a good idea to manually modify this file. To stay safe, I took a backup of the existing file and copied the version from the healthy node to the affected node using scp.
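The comparison and backup steps can be sketched as follows. This is a minimal example under stated assumptions: `missing_lines` is a hypothetical helper, root SSH between the two nodes is assumed to work, and the /tmp path and .bak suffix are illustrative choices rather than the exact names I used.

```shell
#!/bin/sh
# Print lines that exist in the peer copy but are missing from the local copy.
# Usage: missing_lines LOCAL_FILE PEER_FILE
missing_lines() {
    # -F fixed strings, -x whole-line match, -v invert:
    # keep only peer lines that match no line of the local file.
    grep -F -x -v -f "$1" "$2"
}

# Possible use on the affected node (myoda1), assuming root SSH to myoda2:
#   cfg=/opt/oracle/extapi/asmappl.config
#   scp myoda2:"$cfg" /tmp/asmappl.config.peer
#   missing_lines "$cfg" /tmp/asmappl.config.peer   # review the differences
#   cp -p "$cfg" "$cfg.bak"                         # back up before changing anything
#   scp myoda2:"$cfg" "$cfg"
```

Reviewing the missing lines before overwriting anything makes it easier to confirm that the peer copy really is the healthy one.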
However, after this change, odaadmcli show disk still reported the disk as FAILED.
As a next step, I restarted the OAK daemon. I was initially hesitant because I was not sure whether restarting OAK would impact running databases. However, according to Oracle support note KB572658 – “How to Replace an ODA (Oracle Database Appliance) Online Shared Storage Disk”, it is safe to restart oakd in such disk problem scenarios.
[root@myoda1 ~]# odaadmcli restart oak
After restarting OAK, the disk status was re-evaluated. Within a couple of minutes, odaadmcli show disk reported the disk as ONLINE and Good again, and the issue was resolved.
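To confirm the result without reading the whole table, the listing can be filtered for any disk that is not ONLINE. This is a minimal sketch, assuming the five-column layout of odaadmcli show disk shown at the top of this post and disk names that contain "_pd_". The function name `list_bad_disks` is illustrative.

```shell
#!/bin/sh
# List disks whose STATE column is not ONLINE.
# Reads `odaadmcli show disk` style text on stdin.
list_bad_disks() {
    # Assumption: disk rows start with a name such as e0_pd_05, which
    # skips the header line and any shell prompt captured in the text.
    awk '$1 ~ /_pd_/ && $4 != "ONLINE" { print $1, $4, $5 }'
}

# Typical use (requires root); no output means every disk is ONLINE:
#   odaadmcli show disk | list_bad_disks
```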
Additional info
You may wonder whether the “Oak Table” from Oracle’s famous performance tuning community has anything to do with the “OAK daemon” on Oracle Database Appliance, since the names sound so closely related. In reality, there is no connection between them. The Oak Table is a group of Oracle experts focused on deep performance internals and tuning philosophy, while OAK in ODA simply stands for Oracle Appliance Kit, the internal framework that manages hardware and storage services on the appliance. The similarity is purely coincidental: two completely different worlds that just happen to share the same word.
Hope it helps.

