Improve RAC Network Performance

Troubleshooting gc cr/current block lost events and Improving Oracle RAC Network Performance

In the Enterprise Manager Cloud Control’s Overview of Incidents and Problems section, we’ve observed incidents showing ‘Metrics Global Cache Blocks Lost at XXX’ for a database that has been live for only a few days. Initially, this suggests a potential network issue, most likely related to private network interfaces. In this guide, I’ll outline the steps our team took to diagnose and address the issue.

Relevant Documentation

My Oracle Support (MOS) has several documents that are crucial for addressing this issue:

Overview

Oracle Clusterware and Oracle RAC instances depend greatly on network performance, both for node membership and for overall performance. Each node in an Oracle RAC environment has its own local cache to quickly access and manage data blocks. If a node needs a data block that isn’t in its local cache, it has to fetch it from another node’s cache. This sharing of data blocks between nodes is known as the remote cache. The Global Cache Service (GCS) handles this sharing: when a node requests a block that another node holds, GCS ensures that the block is transferred smoothly to the requesting node.

The gc current/cr block lost wait event indicates a private network issue or inefficiencies in packet processing. If a requested block isn’t received by the requesting instance within 500ms, it’s considered lost. Such losses can lead to GCS communication instability, node evictions, and performance issues. Ideally, the number of blocks lost should be zero or very low.
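To gauge the scale of the problem before and after any change, the lost-block counters can be compared against the received-block counters across all instances. Below is a sketch assuming OS authentication as SYSDBA; the statistic names are the standard RAC statistics:

```shell
sqlplus -s / as sysdba <<'EOF'
-- Cumulative counters since instance startup; a high ratio of
-- "gc blocks lost" to blocks received indicates interconnect trouble
SELECT inst_id, name, value
FROM   gv$sysstat
WHERE  name IN ('gc blocks lost',
                'gc cr blocks received',
                'gc current blocks received')
ORDER  BY inst_id, name;
EOF
```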

Current vs CR block lost

  • Current: A current block contains the changes for all committed and uncommitted transactions. It relates mostly to Data Manipulation Language (DML) operations.
  • CR (Consistent Read): A consistent read (CR) version of a block represents a consistent snapshot of the data at a previous point in time. It relates to SELECT operations. A CR request asks for the block as of a specific SCN, unlike current mode, which fetches the block at the current SCN.

According to Doc ID 563566.1, these are the probable causes:

 1. Faulty or poorly seated cables/cards/switches
 2. Poorly sized UDP receive (rx) buffer sizes / UDP buffer socket overflows
 3. Poor interconnect performance and high CPU utilization; `netstat -s` reports packet reassembly failures
 4. Network packet corruption resulting from UDP checksum errors and/or send (tx) / receive (rx) transmission errors
 5. Mismatched MTU sizes in the communication path
 6. Non-dedicated interconnect LAN
 7. Lack of server/switch adjacency
 8. IPFILTER configured
 9. Outdated network driver or NIC firmware
 10. Proprietary interconnect link transport and network protocol
 11. Misconfigured bonding/link aggregation
 12. Misconfigured jumbo frames
 13. NIC forced to full duplex and duplex mode mismatch
 14. Flow-control mismatch in the interconnect communication path
 15. Packet drops at the OS, NIC, or switch layer
 16. NIC driver/firmware configuration
 17. NIC send (tx) and receive (rx) queue lengths
 18. Limited capacity and over-saturated bandwidth
 19. Oversubscribed CPU and scheduling latencies
 20. Switch-related packet processing problems
 21. QoS that negatively impacts interconnect packet processing
 22. Spanning tree brownouts during reconvergence
 23. sq_max_size inadequate for STREAMS queuing
 24. For the AIX platform only: incorrect VIPA and DGD settings
 25. For Solaris + Veritas LLT environments: misconfigured switch
 26. For 12.1.0.2: Bug 20922010, “FALSE ‘GC BLOCKS LOST’ REPORTED ON 12.1 AFTER UPGRADING FROM 11.2.0.3”

Summary of Potential Issues

  •  Network Interconnect Hardware issues
  •  Misconfiguration in Network parameters and settings
  •  Network Saturation Due to Load
  •  Firewall Enabled Over Private Interconnect
  •  High CPU Consumption Causing Delay in Network Packet Processing
  •  Known Bug

Initial Diagnosis

Our initial `netstat -s` output showed that IP packet reassembly had been required millions of times.
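A quick way to check these counters on each cluster node (a diagnostic sketch; rerun it after a minute or two to see whether the counters are still growing):

```shell
# IP fragmentation/reassembly counters; watch for "reassemblies required",
# "packet reassemblies failed", and fragment timeout lines
netstat -s | grep -iE 'reassembl|fragment'
```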

On some platforms, the extended statistics for the interconnect are available from the internal view X$KSXPIF. This view contains the interface statistics, such as send_errors, receive_errors, packets_dropped, and frame_errors at the interface level. You can use this information in analyzing network errors.
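Where available, the view can be queried from SQL*Plus. Since X$ views require SYSDBA and their column sets vary by platform and version, selecting everything is the safest sketch:

```shell
sqlplus -s / as sysdba <<'EOF'
-- Interface-level interconnect statistics; availability and columns
-- (send/receive errors, drops, frame errors) are platform-dependent
SELECT * FROM x$ksxpif;
EOF
```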

Enable Jumbo Frames

The first step is to enable jumbo frames for private network communication. In Exadata environments, the default configuration uses InfiniBand switches with a 64K (65520-byte) MTU size. Since our databases are deployed on traditional physical machines, however, we will configure the network interfaces with a 9000-byte MTU at the OS level and coordinate the change with the network team, who will adjust the corresponding switch port configurations using the “mtu 9216” command. This should be the primary action.

We added an MTU=9000 line to the configuration of all private interconnect network interfaces; each server has two interfaces dedicated to interconnect communication. According to the Highly Available IP (HAIP) FAQ for Release 11.2 (Doc ID 1664291.1) and Private IP Interface Configuration Requirements (oracle.com), Oracle’s best practice for multiple private interfaces is to use unbonded, unteamed NICs, without any additional layers of virtualization.
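On RHEL/OL-style systems, the change and its end-to-end verification look roughly like this. The interface name and peer IP are placeholders for your environment; the privileged commands are shown as comments:

```shell
# Persist the MTU in the interface config (e.g. /etc/sysconfig/network-scripts/ifcfg-eth2):
#   MTU=9000
# Apply without a reboot (requires root; eth2 is a placeholder):
#   ip link set dev eth2 mtu 9000

# Verify the whole path honors jumbo frames: the largest unfragmented
# ICMP payload is the MTU minus the 20-byte IP and 8-byte ICMP headers
payload=$((9000 - 28))    # 8972
# ping -c 3 -M do -s "$payload" 192.168.10.2   # -M do forbids fragmentation; peer IP is a placeholder
echo "$payload"
```

If the ping fails with “message too long”, some hop in the private path is still running a smaller MTU.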

In most cases, if the database isn’t heavily loaded, this setting should resolve the “gc block losses” issue.

In our case, after implementing jumbo frames for interconnect communication, we saw a significant reduction in “gc block lost” events, though they didn’t disappear completely. Given my tendency towards what Gaja Krishna Vaidyanatha calls “Compulsive Tuning Disorder” (a term introduced in the book Oracle Insights: Tales of the Oak Table, Apress, 2004), I wasn’t fully satisfied and aimed to further improve network throughput with additional tweaks across the OSI model layers.

By the way, compulsive tuning disorder is not a good habit. Christian Antognini describes this affliction as follows in “Troubleshooting Oracle Performance”:

The signs of this illness were the excessive checking of many performance-related statistics, most of them ratio-based, and the inability to focus on what was really important. They simply thought that by applying some “simple” rules, it was possible to tune their databases. History teaches us that results were not always as good as expected. Why was this the case? Well, all the rules used to check whether a given ratio (or value) was acceptable were defined independently of the user experience. In other words, false negatives or positives were the rule and not the exception. Even worse, an enormous amount of time was spent on these tasks.

For example, from time to time a database administrator will ask me a question like “On one of our databases I noticed that we have a large amount of waits on latch X. What can I do to reduce or, even better, get rid of such waits?” My typical answer is “Do your users complain because they are waiting on this specific latch? Of course not. So, do not worry about it. Instead, ask them what problems they are facing with the application. Then, by analyzing those problems, you will find out whether the waits on latch X are related to them or not.” I elaborate on this in the next section. Even though I have never worked as a database administrator, I must admit I suffered from compulsive tuning disorder as well. Today, I have, like most other people, gotten over this disease. Unfortunately, as with any bad illness, it takes a very long time to completely vanish. Some people are simply not aware of being infected. Others are aware, but after many years of addiction, it is always difficult to recognize such a big mistake and break the habit.

Enhance network throughput through additional tweaks

On Data Link Layer (Layer 2) of the OSI model – Tune NIC Buffers

Ring buffers are used to manage and optimize the flow of packets between the network interface card (NIC) and the system memory. By adjusting ring buffer sizes, we can influence how efficiently the NIC processes incoming and outgoing network traffic, which in turn impacts the performance and reliability of the data transmission handled at this layer.

Specifically, ethtool is a utility for querying and controlling network device driver and hardware settings, including the configuration of ring buffers. Undersized ring buffers or receive queues on a network interface are a known cause of silent packet loss, i.e., packet loss that is not reported at any layer.


How to measure?
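A sketch of measuring and adjusting the ring buffers with `ethtool` (the interface name is a placeholder, and the usable maximum depends on the NIC hardware):

```shell
# Show current vs. hardware-maximum ring buffer sizes
ethtool -g eth2

# Look for per-NIC drop/overrun counters that suggest undersized rings
ethtool -S eth2 | grep -iE 'drop|fifo|miss'

# Grow the receive ring toward the reported hardware maximum (4096 is illustrative)
ethtool -G eth2 rx 4096
```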

On Network Layer (Layer 3) of the OSI model – Adjust IP Fragmentation Parameters

This layer is responsible for logical addressing and routing of packets across networks. Here, I configured the net.ipv4.ipfrag_high_thresh parameter. ipfrag_high_thresh tells the kernel the maximum amount of memory to use for reassembling IP fragments. When the high threshold is reached, the fragment handler drops all incoming fragments until memory usage falls back to ipfrag_low_thresh, which means every fragment that arrived during that window has to be retransmitted.

By tweaking the ipfrag_high_thresh parameter, we aimed to reduce the reassembly failure and timeout counters reported by `netstat -s`.

To make the change persistent across reboots, add or modify the parameter in the /etc/sysctl.conf file.

We will only change ipfrag_high_thresh. You may also increase ipfrag_time if timeouts are prominent in your `netstat -s` IP counters.
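A sketch of the change; the values below (16 MB high / 15 MB low, versus the common 4 MB default) are illustrative, not a universal recommendation, and applying them requires root:

```shell
# Persist the fragment-reassembly memory thresholds (bytes)
cat >> /etc/sysctl.conf <<'EOF'
net.ipv4.ipfrag_high_thresh = 16777216
net.ipv4.ipfrag_low_thresh = 15728640
EOF

# Load the new values without a reboot
sysctl -p
```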

On Transport Layer (Layer 4) of the OSI model – Configure Socket Buffers

This layer is responsible for end-to-end communication and data flow control between systems. It handles the segmentation and reassembly of data into segments or packets. In the context of TCP/IP networking, this layer includes protocols such as TCP and UDP; UDP is the one that carries the RAC interconnect traffic.

The rmem (socket read memory) and wmem (socket write memory) buffers belong to the Transport layer of the OSI model. Their configuration determines how much memory is allocated for buffering incoming and outgoing data. Oracle RAC global cache block processing is bursty in nature, so the OS may need to buffer receive (rx) packets while waiting for CPU; insufficient buffer space can lead to silent packet loss and global cache block loss.

Although the socket buffer parameters are already set by the database software preinstall package (oracle-database-preinstall-19c), we will increase all of them (wmem_default, wmem_max, rmem_default, and rmem_max) to 4 MB.
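A sketch of the sysctl change (4 MB = 4194304 bytes; applying it requires root):

```shell
# Raise default and maximum socket buffer sizes to 4 MB
cat >> /etc/sysctl.conf <<'EOF'
net.core.rmem_default = 4194304
net.core.rmem_max = 4194304
net.core.wmem_default = 4194304
net.core.wmem_max = 4194304
EOF

# Load the new values without a reboot
sysctl -p
```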

To determine whether you are experiencing UDP socket buffer overflows and packet loss, on most UNIX platforms examine the UDP error counters reported by netstat.
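On Linux, this means looking at the Udp: section of the statistics output (a diagnostic sketch):

```shell
# Nonzero and growing "packet receive errors" / "receive buffer errors"
# in the Udp: section point to socket buffer overflows
netstat -su
```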

On Application Layer (Layer 7) of the OSI model – Database Layer

This layer is entirely within our control. So, what more can we do to improve the situation?

Reducing the number of global cache block requests can directly decrease block losses. Fewer requests result in fewer losses. To address this, we have optimized some long-running queries, thereby minimizing the number of such requests.

After implementing these changes, we observed a significant reduction in ‘gc block lost’ events, down to almost zero. The tuning efforts contributed to a notable improvement in network performance. Remember: while fine-tuning can improve performance, excessive tuning without considering real user impact may bring unexpected side effects.

Hope it helps.

