Oracle RAC Global Cache Block Loss: Network and Performance Optimization
Introduction
In the Enterprise Manager Cloud Control’s Overview of Incidents and Problems section, we’ve observed incidents showing ‘Metrics Global Cache Blocks Lost at XXX’ for a database that has been live for only a few days. Initially, this suggests a potential network issue, most likely related to private network interfaces. In this guide, I’ll outline the steps our team took to diagnose and address the issue.
Relevant Documentation
My Oracle Support (MOS) has several documents that are crucial for addressing this issue:
- Troubleshooting gc block lost and Poor Network Performance in a RAC Environment (Doc ID 563566.1)
- WAITEVENT: “gc current/cr block lost” Reference Note (Doc ID 2296681.1)
Overview
Oracle Clusterware and Oracle RAC instances depend heavily on network performance, both for node membership and for overall throughput. Each node in an Oracle RAC environment has its own local cache for fast access to data blocks. If a node needs a data block that isn’t in its local cache, it must fetch it from another node’s cache; this sharing of data blocks between nodes is known as remote caching. The Global Cache Service (GCS) coordinates this sharing: when a node requests a block that another node holds, GCS ensures the block is transferred smoothly to the requesting node.
The gc current/cr block lost wait event indicates a private network issue or inefficiencies in packet processing. If a requested block isn’t received by the requesting instance within 500ms, it’s considered lost. Such losses can lead to GCS communication instability, node evictions, and performance issues. Ideally, the number of blocks lost should be zero or very low.
Current vs CR block lost
- Current: A current block contains changes for all committed and uncommitted transactions. Relates to Data Manipulation Language (DML) operations mostly.
- CR (Consistent Read): A consistent read (CR) version of a block represents a consistent snapshot of the data at a previous point in time. Relates to SELECT operations. CR requires a block with a specific SCN, unlike current mode, which fetches the block with the current SCN.
According to Doc ID 563566.1, these are the probable causes:
| Id | Probable causes |
| --- | --- |
| 1 | Faulty or poorly seated cables/cards/Switches |
| 2 | Poorly sized UDP receive (rx) buffer sizes / UDP buffer socket overflows |
| 3 | Poor interconnect performance and high cpu utilization. `netstat -s` reports packet reassembly failures |
| 4 | Network packet corruption resulting from UDP checksum errors and/or send (tx) / receive (rx) transmission errors |
| 5 | Mismatched MTU sizes in the communication path |
| 6 | Interconnect LAN non-dedicated |
| 7 | Lack of Server/Switch Adjacency |
| 8 | IPFILTER configured |
| 9 | Outdated Network driver or NIC firmware |
| 10 | Proprietary interconnect link transport and network protocol |
| 11 | Misconfigured bonding/link aggregation |
| 12 | Misconfigured Jumbo Frames |
| 13 | NIC force full duplex and duplex mode mismatch |
| 14 | Flow-control mismatch in the interconnect communication path |
| 15 | Packet drop at the OS, NIC or switch layer |
| 16 | NIC Driver/Firmware Configuration |
| 17 | NIC send (tx) and receive (rx) queue lengths |
| 18 | Limited capacity and over-saturated bandwidth |
| 19 | Over subscribed CPU and scheduling latencies |
| 20 | Switch related packet processing problems |
| 21 | QoS which negatively impacts the interconnect packet processing |
| 22 | Spanning tree brownouts during reconvergence. |
| 23 | sq_max_size inadequate for STREAMS queuing |
| 24 | For AIX platform only, VIPA and DGD setting incorrect |
| 25 | For Solaris + Veritas LLT environment, misconfigured switch |
| 26 | For 12.1.0.2, Bug 20922010 FALSE ‘GC BLOCKS LOST’ REPORTED ON 12.1 AFTER UPGRADING FROM 11.2.0.3 |
Summary of Potential Issues
- Network Interconnect Hardware issues
- Misconfiguration in Network parameters and settings
- Network Saturation Due to Load
- Firewall Enabled Over Private Interconnect
- High CPU Consumption Causing Delay in Network Packet Processing
- Known Bug
ENOUGH TALK, LET’S FIGHT…
Initial Diagnosis
SYS@bltdb1> select inst_id,name, value from gv$sysstat where name like 'gc blocks lost' order by 1;
INST_ID NAME VALUE
---------- ------------------------------ ----------
1 gc blocks lost 11096
2 gc blocks lost 21658
[oracle@blt05 ~]$ netstat -s | grep 'outgoing packets'
5 outgoing packets dropped
[root@blt05 ~]# netstat -s
IP:
...
35 fragments dropped after timeout
81123961 reassemblies required
6388586 packets reassembled ok
45 packet reassembles failed
...
[root@blt06 ~]# netstat -s
IP:
...
3 fragments dropped after timeout
94816433 reassemblies required
9931919 packets reassembled ok
4 packet reassembles failed
...
[root@blt05 ~]# ifconfig
...
ens5f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.6.1 netmask 255.255.255.0 broadcast 192.168.6.255
inet6 fe80::5eb9:1ff:fe8f:130e prefixlen 64 scopeid 0x20<link>
ether 5c:b9:01:8f:13:0e txqueuelen 1000 (Ethernet)
RX packets 207573153 bytes 136678989847 (127.2 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 225931151 bytes 95493005913 (88.9 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device memory 0xc9600000-c96fffff
...
ens5f1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.7.1 netmask 255.255.255.0 broadcast 192.168.7.255
inet6 fe80::5eb9:1ff:fe8f:130f prefixlen 64 scopeid 0x20<link>
ether 5c:b9:01:8f:13:0f txqueuelen 1000 (Ethernet)
RX packets 140375260 bytes 90551123621 (84.3 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 106505429 bytes 110881262027 (103.2 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device memory 0xc9500000-c95fffff
Reassemblies were required millions of times.
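Whether those counters are alarming depends on the failure rate, not just the raw counts. A small sketch that derives the rate from `netstat -s` output; the sample counters below are the blt05 figures shown above (in practice, pipe `netstat -s` in directly):

```shell
# Compute the IP reassembly failure rate from `netstat -s` IP counters.
# Sample counters are inlined for illustration; normally: netstat -s | awk ...
netstat_ip_stats='
    35 fragments dropped after timeout
    81123961 reassemblies required
    6388586 packets reassembled ok
    45 packet reassembles failed
'
echo "$netstat_ip_stats" | awk '
  /reassemblies required/     { required = $1 }
  /packet reassembles failed/ { failed = $1 }
  END {
    if (required > 0)
      printf "failure rate: %.6f%% (%d of %d)\n", 100 * failed / required, failed, required
  }'
# → failure rate: 0.000055% (45 of 81123961)
```

A rate this low is not a crisis by itself; the reassembly volume is the real signal, since it shows that almost every interconnect datagram is being fragmented, which is exactly what jumbo frames address.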
On some platforms, the extended statistics for the interconnect are available from the internal view X$KSXPIF. This view contains the interface statistics, such as send_errors, receive_errors, packets_dropped, and frame_errors at the interface level. You can use this information in analyzing network errors.
Enable Jumbo Frames
The first step is to enable jumbo frames for private network communication. In Exadata environments, the default configuration uses InfiniBand switches with a 64K (65520-byte) MTU. Since our databases are deployed on traditional physical machines, we will configure the network interfaces with a 9000-byte MTU at the OS level and coordinate with the network team, who will adjust the relevant switch ports using the “mtu 9216” command. This should be the primary action.
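The payoff is easy to estimate. An 8 KB block image sent as a single UDP datagram is split into IP fragments sized to fit the MTU, and losing any one fragment loses the whole datagram. A back-of-the-envelope sketch, assuming roughly 28 bytes of IP/UDP header overhead per packet (a simplification):

```shell
# Fragments needed to carry one 8 KB block image at different MTU sizes.
# The ~28-byte per-packet header overhead is a rough simplification.
block=8192
for mtu in 1500 9000; do
  payload=$((mtu - 28))
  frags=$(( (block + payload - 1) / payload ))   # ceiling division
  echo "MTU $mtu: $frags fragment(s) per ${block}-byte block"
done
# → MTU 1500: 6 fragment(s) per 8192-byte block
# → MTU 9000: 1 fragment(s) per 8192-byte block
```

With six fragments per block, a given fragment loss rate costs roughly six times as many lost blocks; with jumbo frames there is no reassembly left to fail.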
We added an MTU=9000 line to the configuration of every private interconnect network interface; there are two interfaces on each server for interconnect communication. According to the Highly Available IP (HAIP) FAQ for Release 11.2 (Doc ID 1664291.1) and Private IP Interface Configuration Requirements (oracle.com), Oracle’s best practice is to use unbonded, unteamed NICs, without any additional layers of virtualization, for private interconnect communication with multiple interfaces.
[root@blt05 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ens5f0
...
MTU=9000
...
[root@blt05 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ens5f1
...
MTU=9000
...
[root@blt06 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ens5f0
...
MTU=9000
...
[root@blt06 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ens5f1
...
MTU=9000
...
In most cases, if the database isn’t heavily loaded, this setting should resolve the “gc block losses” issue.
In our case, after implementing jumbo frames for interconnect communication, we saw a significant reduction in “gc block lost” events, though they didn’t disappear completely. Given my tendency towards what Gaja Krishna Vaidyanatha calls “Compulsive Tuning Disorder” (a term introduced in the book Oracle Insights: Tales of the Oak Table, Apress, 2004), I wasn’t fully satisfied and aimed to further enhance network throughput with additional tweaks across the OSI model layers.
By the way, compulsive tuning disorder is not a good habit. Christian Antognini describes this illness as follows in “Troubleshooting Oracle Performance”.
The signs of this illness were the excessive checking of many performance-related statistics, most of them ratio-based, and the inability to focus on what was really important. They simply thought that by applying some “simple” rules, it was possible to tune their databases. History teaches us that results were not always as good as expected. Why was this the case? Well, all the rules used to check whether a given ratio (or value) was acceptable were defined independently of the user experience. In other words, false negatives or positives were the rule and not the exception. Even worse, an enormous amount of time was spent on these tasks.
For example, from time to time a database administrator will ask me a question like “On one of our databases I noticed that we have a large amount of waits on latch X. What can I do to reduce or, even better, get rid of such waits?” My typical answer is “Do your users complain because they are waiting on this specific latch? Of course not. So, do not worry about it. Instead, ask them what problems they are facing with the application. Then, by analyzing those problems, you will find out whether the waits on latch X are related to them or not.” I elaborate on this in the next section. Even though I have never worked as a database administrator, I must admit I suffered from compulsive tuning disorder as well. Today, I have, like most other people, gotten over this disease. Unfortunately, as with any bad illness, it takes a very long time to completely vanish. Some people are simply not aware of being infected. Others are aware, but after many years of addiction, it is always difficult to recognize such a big mistake and break the habit.
Enhance network throughput through additional tweaks
On Data Link Layer (Layer 2) of the OSI model – Tune NIC Buffers
Ring buffers are used to manage and optimize the flow of packets between the network interface card (NIC) and the system memory. By adjusting ring buffer sizes, we can influence how efficiently the NIC processes incoming and outgoing network traffic, which in turn impacts the performance and reliability of the data transmission handled at this layer.
Specifically, ethtool is a utility used to query and control network device driver and hardware settings, including the ring buffer configuration. Undersized ring buffers or receive queues on a network interface are a known cause of silent packet loss, that is, packet loss that is not reported at any layer.
[root@blt05 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ens5f0
...
ETHTOOL_OPTS="-G ens5f0 rx 4096 tx 4096"
[root@blt05 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ens5f1
...
ETHTOOL_OPTS="-G ens5f1 rx 4096 tx 4096"
[root@blt06 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ens5f0
...
ETHTOOL_OPTS="-G ens5f0 rx 4096 tx 4096"
[root@blt06 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ens5f1
...
ETHTOOL_OPTS="-G ens5f1 rx 4096 tx 4096"
How to measure?
[root@blt05 ~]# ethtool -S enp8s0f0 | grep drop
dropped_smbus: 0
tx_dropped: 0
rx_queue_0_drops: 0
rx_queue_1_drops: 0
rx_queue_2_drops: 0
rx_queue_3_drops: 0
rx_queue_4_drops: 0
rx_queue_5_drops: 0
rx_queue_6_drops: 0
rx_queue_7_drops: 0
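Eyeballing every counter across nodes and interfaces gets tedious, so a small filter that prints only the non-zero drop counters helps. The sample statistics below are invented for illustration (our real interfaces showed all zeros):

```shell
# Print only the non-zero drop counters from `ethtool -S`-style output.
# Sample statistics are inlined; normally: ethtool -S ens5f0 | awk ...
ethtool_stats='
     tx_dropped: 0
     rx_queue_0_drops: 0
     rx_queue_1_drops: 12
'
echo "$ethtool_stats" | awk '/drop/ && $2 > 0 { print "non-zero:", $1, $2 }'
# → non-zero: rx_queue_1_drops: 12
```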
On Network Layer (Layer 3) of the OSI model – Adjust IP Fragmentation Parameters
This layer is responsible for logical addressing and routing of packets across networks. On this layer, I configured the net.ipv4.ipfrag_high_thresh parameter, which tells the kernel the maximum amount of memory to use for reassembling IP fragments. When the high threshold is reached, the fragment handler drops all incoming fragments until memory usage falls back to ipfrag_low_thresh, which means every fragment that arrived in the meantime has to be retransmitted.
By tweaking the ipfrag_high_thresh parameter, we aimed to improve the reassembly failure and timeout counters shown below.
`netstat -s` IP stat counters:
...
35 fragments dropped after timeout
81123961 reassemblies required
6388586 packets reassembled ok
45 packet reassembles failed
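To get a feel for what the threshold buys, you can estimate how many 8 KB datagrams can be under reassembly at once before fragments are tossed. The sketch below ignores the kernel’s per-fragment bookkeeping overhead, so treat the figures as upper bounds:

```shell
# Upper-bound count of concurrent 8 KB reassemblies per threshold value.
# Per-fragment kernel bookkeeping overhead is ignored.
block=8192
for thresh in 4194304 8388608; do
  echo "ipfrag_high_thresh=$thresh: ~$((thresh / block)) concurrent reassemblies"
done
# → ipfrag_high_thresh=4194304: ~512 concurrent reassemblies
# → ipfrag_high_thresh=8388608: ~1024 concurrent reassemblies
```

Doubling the threshold doubles the headroom for bursts of fragmented interconnect traffic before the kernel starts discarding fragments.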
The parameter is configured by editing the /etc/sysctl.conf file; adding or modifying it there also makes the change persistent across reboots.
--Current value before tweaking
[oracle@blt05 ~]$ cat /proc/sys/net/ipv4/ipfrag_low_thresh
3145728
[oracle@blt05 ~]$ cat /proc/sys/net/ipv4/ipfrag_high_thresh
4194304
[root@blt06 ~]# cat /proc/sys/net/ipv4/ipfrag_low_thresh
3145728
[root@blt06 ~]# cat /proc/sys/net/ipv4/ipfrag_high_thresh
4194304
[root@blt05 ~]# cat /proc/sys/net/ipv4/ipfrag_time
30
[root@blt06 ~]# cat /proc/sys/net/ipv4/ipfrag_time
30
We will only change ipfrag_high_thresh. You may also increase ipfrag_time if timeouts are highly visible in your netstat -s IP stat counters.
[root@blt05 ~]# vi /etc/sysctl.conf
...
net.ipv4.ipfrag_high_thresh = 8388608
[root@blt05 ~]# sysctl -p
[root@blt06 ~]# vi /etc/sysctl.conf
...
net.ipv4.ipfrag_high_thresh = 8388608
[root@blt06 ~]# sysctl -p
[root@blt06 ~]# cat /proc/sys/net/ipv4/ipfrag_high_thresh
8388608
On Transport Layer (Layer 4) of the OSI model – Configure Socket Buffers
This layer is responsible for end-to-end communication and data flow control between systems, handling the segmentation and reassembly of data into segments or packets. In TCP/IP networking, this layer includes protocols such as TCP and UDP, and UDP is the protocol in charge here, since the RAC interconnect uses it.
The rmem (socket read memory) and wmem (socket write memory) socket buffers belong to the Transport layer of the OSI model. Their configuration determines how much memory is allocated for buffering incoming and outgoing data. Oracle RAC global cache block processing is bursty in nature, so the OS may need to buffer receive (rx) packets while waiting for CPU; unavailable buffer space can lead to silent packet loss and global cache block loss.
Although the socket buffer parameters are already set by the database preinstall package (oracle-database-preinstall-19c), we will increase all of them (wmem_default, wmem_max, rmem_default, and rmem_max) to 4 MB.
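One way to reason about the 4 MB figure is to convert buffer bytes into time: how long a line-rate burst can the receive buffer absorb while the CPU is busy elsewhere? The 10 Gb/s link speed below is an assumption for illustration, not a figure from our environment:

```shell
# Microseconds of line-rate burst a UDP receive buffer can absorb before
# overflowing. Assumes a 10 Gb/s link, i.e. 1250000000 bytes per second.
rate=1250000000
for buf in 262144 4194304; do   # preinstall default vs. our new value
  us=$(( buf * 1000000 / rate ))
  echo "rmem=$buf bytes: ~${us} us of line-rate burst"
done
# → rmem=262144 bytes: ~209 us of line-rate burst
# → rmem=4194304 bytes: ~3355 us of line-rate burst
```

The 16x larger buffer gives the scheduler several extra milliseconds of slack before receive-side drops begin, which matters precisely during the CPU spikes that accompany bursty global cache traffic.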
To determine whether you are experiencing UDP socket buffer overflow and packet loss, on most UNIX platforms execute netstat -s or netstat -su and look for “udpInOverflows”, “packet receive errors”, “fragments dropped”, or “outgoing packets dropped”, depending on the platform.
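That check is easy to script. The sample output below is hypothetical (the counter wording varies by platform):

```shell
# Scan `netstat -su`-style output for UDP overflow / drop indicators.
# Sample output is inlined and hypothetical; normally: netstat -su | grep -Ei ...
netstat_udp='
Udp:
    94816433 packets received
    17 packet receive errors
    106505429 packets sent
    3 receive buffer errors
'
echo "$netstat_udp" | grep -Ei 'udpinoverflows|receive errors|buffer errors|fragments dropped|outgoing packet'
```

If any of these counters grow while the workload runs, the receive path is shedding packets and the socket buffer sizes (or CPU headroom) deserve attention.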
[root@blt05 ~]# vi /etc/sysctl.conf
# oracle-database-preinstall-19c setting for net.core.rmem_default is 262144
net.core.rmem_default = 4194304
# oracle-database-preinstall-19c setting for net.core.rmem_max is 4194304
net.core.rmem_max = 4194304
# oracle-database-preinstall-19c setting for net.core.wmem_default is 262144
net.core.wmem_default = 4194304
# oracle-database-preinstall-19c setting for net.core.wmem_max is 1048576
net.core.wmem_max = 4194304
[root@blt05 ~]# sysctl -p
[root@blt06 ~]# vi /etc/sysctl.conf
# oracle-database-preinstall-19c setting for net.core.rmem_default is 262144
net.core.rmem_default = 4194304
# oracle-database-preinstall-19c setting for net.core.rmem_max is 4194304
net.core.rmem_max = 4194304
# oracle-database-preinstall-19c setting for net.core.wmem_default is 262144
net.core.wmem_default = 4194304
# oracle-database-preinstall-19c setting for net.core.wmem_max is 1048576
net.core.wmem_max = 4194304
[root@blt06 ~]# sysctl -p
On Application Layer (Layer 7) of the OSI model – Database layer
This layer is entirely within our control. So, what more can we do to improve the situation?
Reducing the number of global cache block requests can directly decrease block losses. Fewer requests result in fewer losses. To address this, we have optimized some long-running queries, thereby minimizing the number of such requests.
Conclusion
After implementing these changes, we observed a significant reduction in ‘gc block lost’ events, down to almost zero. The tuning effort contributed to a notable improvement in network performance. Remember: while fine-tuning can improve performance, excessive tuning without considering real user impact yields diminishing returns.
Hope it helps.

