Oracle RAC Global Cache Block Loss: Network and Performance Optimization
Introduction
In the Enterprise Manager Cloud Control’s Overview of Incidents and Problems section, we’ve observed incidents showing ‘Metrics Global Cache Blocks Lost at XXX’ for a database that has been live for only a few days. Initially, this suggests a potential network issue, most likely related to private network interfaces. In this guide, I’ll outline the steps our team took to diagnose and address the issue.
Relevant Documentation
My Oracle Support (MOS) has several documents that are crucial for addressing this issue:
- Troubleshooting gc block lost and Poor Network Performance in a RAC Environment (Doc ID 563566.1)
- WAITEVENT: “gc current/cr block lost” Reference Note (Doc ID 2296681.1)
Overview
Oracle Clusterware and Oracle RAC instances depend heavily on network performance, both for node membership and for overall throughput. Each node in an Oracle RAC environment has its own local cache for fast access to data blocks. If a node needs a data block that isn’t in its local cache, it must fetch it from another node’s cache; this sharing of data blocks between nodes is known as remote caching. The Global Cache Service (GCS) coordinates this sharing: when a node requests a block that another node holds, GCS ensures the block is transferred smoothly to the requesting node.
The gc current/cr block lost wait event indicates a private network issue or inefficiencies in packet processing. If a requested block isn’t received by the requesting instance within 500ms, it’s considered lost. Such losses can lead to GCS communication instability, node evictions, and performance issues. Ideally, the number of blocks lost should be zero or very low.
Current vs CR block lost
- Current: A current block contains changes for all committed and uncommitted transactions. Relates to Data Manipulation Language (DML) operations mostly.
- CR (Consistent Read): A consistent read (CR) version of a block represents a consistent snapshot of the data at a previous point in time. Relates to SELECT operations. CR requires a block with a specific SCN, unlike current mode, which fetches the block with the current SCN.
According to Doc ID 563566.1, these are the probable causes:
| Id | Probable causes |
| --- | --- |
| 1 | Faulty or poorly seated cables/cards/Switches |
| 2 | Poorly sized UDP receive (rx) buffer sizes / UDP buffer socket overflows |
| 3 | Poor interconnect performance and high cpu utilization. `netstat -s` reports packet reassembly failures |
| 4 | Network packet corruption resulting from UDP checksum errors and/or send (tx) / receive (rx) transmission errors |
| 5 | Mismatched MTU sizes in the communication path |
| 6 | Interconnect LAN non-dedicated |
| 7 | Lack of Server/Switch Adjacency |
| 8 | IPFILTER configured |
| 9 | Outdated Network driver or NIC firmware |
| 10 | Proprietary interconnect link transport and network protocol |
| 11 | Misconfigured bonding/link aggregation |
| 12 | Misconfigured Jumbo Frames |
| 13 | NIC force full duplex and duplex mode mismatch |
| 14 | Flow-control mismatch in the interconnect communication path |
| 15 | Packet drop at the OS, NIC or switch layer |
| 16 | NIC Driver/Firmware Configuration |
| 17 | NIC send (tx) and receive (rx) queue lengths |
| 18 | Limited capacity and over-saturated bandwidth |
| 19 | Over subscribed CPU and scheduling latencies |
| 20 | Switch related packet processing problems |
| 21 | QoS which negatively impacts the interconnect packet processing |
| 22 | Spanning tree brownouts during reconvergence. |
| 23 | sq_max_size inadequate for STREAMS queuing |
| 24 | For AIX platform only, VIPA and DGD setting incorrect |
| 25 | For Solaris + Veritas LLT environment, misconfigured switch |
| 26 | For 12.1.0.2, Bug 20922010 FALSE ‘GC BLOCKS LOST’ REPORTED ON 12.1 AFTER UPGRADING FROM 11.2.0.3 |
Summary of Potential Issues
- Network Interconnect Hardware issues
- Misconfiguration in Network parameters and settings
- Network Saturation Due to Load
- Firewall Enabled Over Private Interconnect
- High CPU Consumption Causing Delay in Network Packet Processing
- Known Bug
ENOUGH TALK, LET’S FIGHT…
Initial Diagnosis
SYS@bltdb1> select inst_id,name, value from gv$sysstat where name like 'gc blocks lost' order by 1;
INST_ID NAME VALUE
---------- ------------------------------ ----------
1 gc blocks lost 11096
2 gc blocks lost 21658
[oracle@blt05 ~]$ netstat -s | grep 'outgoing packets'
5 outgoing packets dropped
[root@blt05 ~]# netstat -s
IP:
...
35 fragments dropped after timeout
81123961 reassemblies required
6388586 packets reassembled ok
45 packet reassembles failed
...
[root@blt06 ~]# netstat -s
IP:
...
3 fragments dropped after timeout
94816433 reassemblies required
9931919 packets reassembled ok
4 packet reassembles failed
...
[root@blt05 ~]# ifconfig
...
ens5f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.6.1 netmask 255.255.255.0 broadcast 192.168.6.255
inet6 fe80::5eb9:1ff:fe8f:130e prefixlen 64 scopeid 0x20<link>
ether 5c:b9:01:8f:13:0e txqueuelen 1000 (Ethernet)
RX packets 207573153 bytes 136678989847 (127.2 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 225931151 bytes 95493005913 (88.9 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device memory 0xc9600000-c96fffff
...
ens5f1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.7.1 netmask 255.255.255.0 broadcast 192.168.7.255
inet6 fe80::5eb9:1ff:fe8f:130f prefixlen 64 scopeid 0x20<link>
ether 5c:b9:01:8f:13:0f txqueuelen 1000 (Ethernet)
RX packets 140375260 bytes 90551123621 (84.3 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 106505429 bytes 110881262027 (103.2 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device memory 0xc9500000-c95fffff
Reassemblies were required millions of times.
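Whether those counters are alarming depends on the failure rate, not just the raw counts. A small sketch that derives the rate from `netstat -s` output; the sample counters below are the blt05 figures shown above (in practice, pipe `netstat -s` in directly):

```shell
# Compute the IP reassembly failure rate from `netstat -s` IP counters.
# Sample counters are inlined for illustration; normally: netstat -s | awk ...
netstat_ip_stats='
    35 fragments dropped after timeout
    81123961 reassemblies required
    6388586 packets reassembled ok
    45 packet reassembles failed
'
echo "$netstat_ip_stats" | awk '
  /reassemblies required/     { required = $1 }
  /packet reassembles failed/ { failed = $1 }
  END {
    if (required > 0)
      printf "failure rate: %.6f%% (%d of %d)\n", 100 * failed / required, failed, required
  }'
# → failure rate: 0.000055% (45 of 81123961)
```

A rate this low is not a crisis by itself; the reassembly volume is the real signal, since it shows that almost every interconnect datagram is being fragmented, which is exactly what jumbo frames address.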
On some platforms, the extended statistics for the interconnect are available from the internal view X$KSXPIF. This view contains the interface statistics, such as send_errors, receive_errors, packets_dropped, and frame_errors at the interface level. You can use this information in analyzing network errors.
Enable Jumbo Frames
The first step is to enable jumbo frames for private network communication. In Exadata environments, the default configuration uses InfiniBand switches with a 64K (65520-byte) MTU. Since our databases are deployed on traditional physical machines, we will configure the network interfaces with a 9000-byte MTU at the OS level and coordinate with the network team, who will adjust the relevant switch ports using the “mtu 9216” command. This should be the primary action.
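The payoff is easy to estimate. An 8 KB block image sent as a single UDP datagram is split into IP fragments sized to fit the MTU, and losing any one fragment loses the whole datagram. A back-of-the-envelope sketch, assuming roughly 28 bytes of IP/UDP header overhead per packet (a simplification):

```shell
# Fragments needed to carry one 8 KB block image at different MTU sizes.
# The ~28-byte per-packet header overhead is a rough simplification.
block=8192
for mtu in 1500 9000; do
  payload=$((mtu - 28))
  frags=$(( (block + payload - 1) / payload ))   # ceiling division
  echo "MTU $mtu: $frags fragment(s) per ${block}-byte block"
done
# → MTU 1500: 6 fragment(s) per 8192-byte block
# → MTU 9000: 1 fragment(s) per 8192-byte block
```

With six fragments per block, a given fragment loss rate costs roughly six times as many lost blocks; with jumbo frames there is no reassembly left to fail.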
We added an MTU=9000 line to the configuration of every private interconnect network interface; there are two interfaces on each server for interconnect communication. According to the Highly Available IP (HAIP) FAQ for Release 11.2 (Doc ID 1664291.1) and Private IP Interface Configuration Requirements (oracle.com), Oracle’s best practice is to use unbonded, unteamed NICs, without any additional layers of virtualization, for private interconnect communication with multiple interfaces.
[root@blt05 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ens5f0
...
MTU=9000
...
[root@blt05 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ens5f1
...
MTU=9000
...
[root@blt06 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ens5f0
...
MTU=9000
...
[root@blt06 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ens5f1
...
MTU=9000
...
In most cases, if the database isn’t heavily loaded, this setting should resolve the “gc block losses” issue.
In our case, after implementing jumbo frames for interconnect communication, we saw a significant reduction in “gc block lost” events, though they didn’t disappear completely. Given my tendency towards what Gaja Krishna Vaidyanatha calls “Compulsive Tuning Disorder” (a term introduced in the book Oracle Insights: Tales of the Oak Table, Apress, 2004), I wasn’t fully satisfied and aimed to further enhance network throughput with additional tweaks across the OSI model layers.
By the way, compulsive tuning disorder is not a good habit. Christian Antognini describes this illness as follows in “Troubleshooting Oracle Performance”.
The signs of this illness were the excessive checking of many performance-related statistics, most of them ratio-based, and the inability to focus on what was really important. They simply thought that by applying some “simple” rules, it was possible to tune their databases. History teaches us that results were not always as good as expected. Why was this the case? Well, all the rules used to check whether a given ratio (or value) was acceptable were defined independently of the user experience. In other words, false negatives or positives were the rule and not the exception. Even worse, an enormous amount of time was spent on these tasks.
For example, from time to time a database administrator will ask me a question like “On one of our databases I noticed that we have a large amount of waits on latch X. What can I do to reduce or, even better, get rid of such waits?” My typical answer is “Do your users complain because they are waiting on this specific latch? Of course not. So, do not worry about it. Instead, ask them what problems they are facing with the application. Then, by analyzing those problems, you will find out whether the waits on latch X are related to them or not.” I elaborate on this in the next section. Even though I have never worked as a database administrator, I must admit I suffered from compulsive tuning disorder as well. Today, I have, like most other people, gotten over this disease. Unfortunately, as with any bad illness, it takes a very long time to completely vanish. Some people are simply not aware of being infected. Others are aware, but after many years of addiction, it is always difficult to recognize such a big mistake and break the habit.
Enhance network throughput through additional tweaks
On Data Link Layer (Layer 2) of the OSI model – Tune NIC Buffers
Ring buffers are used to manage and optimize the flow of packets between the network interface card (NIC) and the system memory. By adjusting ring buffer sizes, we can influence how efficiently the NIC processes incoming and outgoing network traffic, which in turn impacts the performance and reliability of the data transmission handled at this layer.
Specifically, ethtool is a utility used to query and control network device driver and hardware settings, including the ring buffer configuration. Undersized ring buffers or receive queues on a network interface are a known cause of silent packet loss, that is, packet loss that is not reported at any layer.
[root@blt05 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ens5f0
...
ETHTOOL_OPTS="-G ens5f0 rx 4096 tx 4096"
[root@blt05 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ens5f1
...
ETHTOOL_OPTS="-G ens5f1 rx 4096 tx 4096"
[root@blt06 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ens5f0
...
ETHTOOL_OPTS="-G ens5f0 rx 4096 tx 4096"
[root@blt06 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ens5f1
...
ETHTOOL_OPTS="-G ens5f1 rx 4096 tx 4096"
How to measure?
[root@blt05 ~]# ethtool -S enp8s0f0 | grep drop
dropped_smbus: 0
tx_dropped: 0
rx_queue_0_drops: 0
rx_queue_1_drops: 0
rx_queue_2_drops: 0
rx_queue_3_drops: 0
rx_queue_4_drops: 0
rx_queue_5_drops: 0
rx_queue_6_drops: 0
rx_queue_7_drops: 0
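Eyeballing every counter across nodes and interfaces gets tedious, so a small filter that prints only the non-zero drop counters helps. The sample statistics below are invented for illustration (our real interfaces showed all zeros):

```shell
# Print only the non-zero drop counters from `ethtool -S`-style output.
# Sample statistics are inlined; normally: ethtool -S ens5f0 | awk ...
ethtool_stats='
     tx_dropped: 0
     rx_queue_0_drops: 0
     rx_queue_1_drops: 12
'
echo "$ethtool_stats" | awk '/drop/ && $2 > 0 { print "non-zero:", $1, $2 }'
# → non-zero: rx_queue_1_drops: 12
```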
On Network Layer (Layer 3) of the OSI model – Adjust IP Fragmentation Parameters
This layer is responsible for logical addressing and routing of packets across networks. On this layer, I configured the net.ipv4.ipfrag_high_thresh parameter, which tells the kernel the maximum amount of memory to use for reassembling IP fragments. When the high threshold is reached, the fragment handler drops all incoming fragments until memory usage falls back to ipfrag_low_thresh, which means every fragment that arrived in the meantime has to be retransmitted.
By tweaking the ipfrag_high_thresh parameter, we aimed to improve the reassembly failure and timeout counters shown below.
`netstat -s` IP stat counters:
...
35 fragments dropped after timeout
81123961 reassemblies required
6388586 packets reassembled ok
45 packet reassembles failed
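To get a feel for what the threshold buys, you can estimate how many 8 KB datagrams can be under reassembly at once before fragments are tossed. The sketch below ignores the kernel’s per-fragment bookkeeping overhead, so treat the figures as upper bounds:

```shell
# Upper-bound count of concurrent 8 KB reassemblies per threshold value.
# Per-fragment kernel bookkeeping overhead is ignored.
block=8192
for thresh in 4194304 8388608; do
  echo "ipfrag_high_thresh=$thresh: ~$((thresh / block)) concurrent reassemblies"
done
# → ipfrag_high_thresh=4194304: ~512 concurrent reassemblies
# → ipfrag_high_thresh=8388608: ~1024 concurrent reassemblies
```

Doubling the threshold doubles the headroom for bursts of fragmented interconnect traffic before the kernel starts discarding fragments.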
The parameter is configured by editing the /etc/sysctl.conf file; adding or modifying it there also makes the change persistent across reboots.
--Current value before tweaking
[oracle@blt05 ~]$ cat /proc/sys/net/ipv4/ipfrag_low_thresh
3145728
[oracle@blt05 ~]$ cat /proc/sys/net/ipv4/ipfrag_high_thresh
4194304
[root@blt06 ~]# cat /proc/sys/net/ipv4/ipfrag_low_thresh
3145728
[root@blt06 ~]# cat /proc/sys/net/ipv4/ipfrag_high_thresh
4194304
[root@blt05 ~]# cat /proc/sys/net/ipv4/ipfrag_time
30
[root@blt06 ~]# cat /proc/sys/net/ipv4/ipfrag_time
30
We will only change ipfrag_high_thresh. You may also increase ipfrag_time if timeouts are highly visible in your netstat -s IP stat counters.
[root@blt05 ~]# vi /etc/sysctl.conf
...
net.ipv4.ipfrag_high_thresh = 8388608
[root@blt05 ~]# sysctl -p
[root@blt06 ~]# vi /etc/sysctl.conf
...
net.ipv4.ipfrag_high_thresh = 8388608
[root@blt06 ~]# sysctl -p
[root@blt06 ~]# cat /proc/sys/net/ipv4/ipfrag_high_thresh
8388608
On Transport Layer (Layer 4) of the OSI model – Configure Socket Buffers
This layer is responsible for end-to-end communication and data flow control between systems, handling the segmentation and reassembly of data into segments or packets. In TCP/IP networking, this layer includes protocols such as TCP and UDP, and UDP is the protocol in charge here, since the RAC interconnect uses it.
The rmem (socket read memory) and wmem (socket write memory) socket buffers belong to the Transport layer of the OSI model. Their configuration determines how much memory is allocated for buffering incoming and outgoing data. Oracle RAC global cache block processing is bursty in nature, so the OS may need to buffer receive (rx) packets while waiting for CPU; unavailable buffer space can lead to silent packet loss and global cache block loss.
Although the socket buffer parameters are already set by the database preinstall package (oracle-database-preinstall-19c), we will increase all of them (wmem_default, wmem_max, rmem_default, and rmem_max) to 4 MB.
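One way to reason about the 4 MB figure is to convert buffer bytes into time: how long a line-rate burst can the receive buffer absorb while the CPU is busy elsewhere? The 10 Gb/s link speed below is an assumption for illustration, not a figure from our environment:

```shell
# Microseconds of line-rate burst a UDP receive buffer can absorb before
# overflowing. Assumes a 10 Gb/s link, i.e. 1250000000 bytes per second.
rate=1250000000
for buf in 262144 4194304; do   # preinstall default vs. our new value
  us=$(( buf * 1000000 / rate ))
  echo "rmem=$buf bytes: ~${us} us of line-rate burst"
done
# → rmem=262144 bytes: ~209 us of line-rate burst
# → rmem=4194304 bytes: ~3355 us of line-rate burst
```

The 16x larger buffer gives the scheduler several extra milliseconds of slack before receive-side drops begin, which matters precisely during the CPU spikes that accompany bursty global cache traffic.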
To determine whether you are experiencing UDP socket buffer overflow and packet loss, on most UNIX platforms execute netstat -s or netstat -su and look for “udpInOverflows”, “packet receive errors”, “fragments dropped”, or “outgoing packets dropped”, depending on the platform.
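That check is easy to script. The sample output below is hypothetical (the counter wording varies by platform):

```shell
# Scan `netstat -su`-style output for UDP overflow / drop indicators.
# Sample output is inlined and hypothetical; normally: netstat -su | grep -Ei ...
netstat_udp='
Udp:
    94816433 packets received
    17 packet receive errors
    106505429 packets sent
    3 receive buffer errors
'
echo "$netstat_udp" | grep -Ei 'udpinoverflows|receive errors|buffer errors|fragments dropped|outgoing packet'
```

If any of these counters grow while the workload runs, the receive path is shedding packets and the socket buffer sizes (or CPU headroom) deserve attention.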
[root@blt05 ~]# vi /etc/sysctl.conf
# oracle-database-preinstall-19c setting for net.core.rmem_default is 262144
net.core.rmem_default = 4194304
# oracle-database-preinstall-19c setting for net.core.rmem_max is 4194304
net.core.rmem_max = 4194304
# oracle-database-preinstall-19c setting for net.core.wmem_default is 262144
net.core.wmem_default = 4194304
# oracle-database-preinstall-19c setting for net.core.wmem_max is 1048576
net.core.wmem_max = 4194304
[root@blt05 ~]# sysctl -p
[root@blt06 ~]# vi /etc/sysctl.conf
# oracle-database-preinstall-19c setting for net.core.rmem_default is 262144
net.core.rmem_default = 4194304
# oracle-database-preinstall-19c setting for net.core.rmem_max is 4194304
net.core.rmem_max = 4194304
# oracle-database-preinstall-19c setting for net.core.wmem_default is 262144
net.core.wmem_default = 4194304
# oracle-database-preinstall-19c setting for net.core.wmem_max is 1048576
net.core.wmem_max = 4194304
[root@blt06 ~]# sysctl -p
On Application Layer (Layer 7) of the OSI model – Database layer
This layer is entirely within our control. So, what more can we do to improve the situation?
Reducing the number of global cache block requests can directly decrease block losses. Fewer requests result in fewer losses. To address this, we have optimized some long-running queries, thereby minimizing the number of such requests.
Conclusion
After implementing these changes, we observed a significant reduction in ‘gc block lost’ events, down to almost zero. The tuning effort contributed to a notable improvement in network performance. Remember: while fine-tuning can improve performance, excessive tuning without considering real user impact yields diminishing returns.
Hope it helps.

