The TCP Time-out Issue affecting Streams Downstream Capture

Problem

In a two-node RAC environment where Streams downstream capture is running between two clusters, log shipping can hang when the public network is disconnected on one of the target nodes. The Virtual IP (VIP) moves to the surviving node, but connectivity does not immediately fail over as expected. This is largely due to a TCP time-out issue.

When the Network Interface Card (NIC) dies or the network cable is unplugged on the server, making the TCP/IP network unavailable, the client connection ultimately times out with TNS-12170, TNS-12560, TNS-12535 and TNS-00505, as seen in the source database alert log below:

**************************************************************

Fatal NI connect error 12170.

  VERSION INFORMATION:
     TNS for Linux: Version 11.1.0.7.0 - Production
     Unix Domain Socket IPC NT Protocol Adaptor for Linux: Version 11.1.0.7.0 - Production
     TCP/IP NT Protocol Adapter for Linux: Version 11.1.0.7.0 - Production
  Time: 14-MAY-2010 11:54:32
  Tracing not turned on.
  Tns error struct:
    ns main err code: 12535

TNS-12535: TNS:operation timed out
    ns secondary err code: 12560
    nt main err code: 505

TNS-00505: Operation timed out
    nt secondary err code: 110
    nt OS err code: 0
 

The following process explains the individual steps the client goes through to try and resolve the connection error:

1. The client tries to talk to a service on a host that does not exist, i.e. there is no system operational on the IP address the client is trying to connect to. Therefore nothing can respond on that IP address.

2. As per the connection model, the client initiates a TCP/IP three-way handshake, but there is no response.

3. The client waits for a specified interval (OS configurable), for example 200 ms.

4. It sends the SYN packet again, but still gets no response, so it waits 400 ms and tries again. With no response each time, it doubles the wait to 800 ms, then 1600 ms, then 3200 ms.

5. The client keeps retrying in this fashion until a predefined time-out is reached, at which point it gives up.
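The doubling pattern in the steps above can be sketched in a few lines of Python. This is only an illustration using the example figures from the text (a 200 ms starting wait and five waits). The function name is hypothetical, and real kernels use their own starting interval and retry limits.

```python
def backoff_schedule(initial_ms, attempts):
    """Return the wait before each retry when the wait doubles after every failure."""
    return [initial_ms * 2 ** i for i in range(attempts)]


# Example figures taken from the steps above, not actual kernel defaults.
waits = backoff_schedule(200, 5)
total_ms = sum(waits)

print(waits)     # [200, 400, 800, 1600, 3200]
print(total_ms)  # 6200
```

Because each wait is double the previous one, the last wait always accounts for roughly half of the total time spent, so each additional retry approximately doubles the overall time-out.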

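On Linux, the kernel parameter that governs the TCP time-out for an established connection is net.ipv4.tcp_retries2. It is a retransmission count rather than a duration; the default of 15 corresponds to roughly 13 to 30 minutes.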

In Oracle 10g and above, SQLNet can time out within a desired period instead of waiting for the TCP time-out to occur.

The following settings can be used in the sqlnet.ora file on the client or server:

sqlnet.inbound_connect_timeout (server)
sqlnet.send_timeout (client and/or server)
sqlnet.recv_timeout (client and/or server)

However, these are not for connect-time failover, but rather for TAF operations.
In other words, the SQLNet settings will not correct any shortcomings at the TCP layer.
Oracle is heavily reliant on the TCP layer. The timeout values will only work when the TCP/IP address is alive and available.

Solution

The following Linux kernel parameters address the TCP time-out issue:

/proc/sys/net/ipv4/tcp_keepalive_time

• How often TCP sends out keepalive messages when keepalive is enabled. Default: 7200 secs (2 hours)

/proc/sys/net/ipv4/tcp_retries2

• How many times to retry before killing an alive TCP connection. Default: 15, which corresponds to 13-30 min

/proc/sys/net/ipv4/tcp_syn_retries

• Number of SYN packets the kernel will send before giving up on the new connection. Default: 5
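To get a feel for what these retry counts mean in seconds, the sketch below sums the doubling waits for a given number of retransmissions. It rests on stated assumptions rather than measurements: a 0.2 s minimum retransmission time-out and a 120 s cap for established connections, and a 1 s initial time-out for SYN packets (some older kernels used 3 s). The function name is hypothetical, and the real figure depends on the measured round-trip time, so treat the results as lower-bound estimates.

```python
def retransmit_timeout(retries, initial_rto, max_rto=120.0):
    """Sum the waits after the first send and after each of `retries` resends.

    The wait doubles after every unanswered packet and is capped at max_rto.
    """
    return sum(min(initial_rto * 2 ** i, max_rto) for i in range(retries + 1))


# Established connections (tcp_retries2), assuming a 0.2 s minimum RTO.
default_retries2 = retransmit_timeout(15, 0.2)
reduced_retries2 = retransmit_timeout(5, 0.2)

# New connections (tcp_syn_retries), assuming a 1 s initial RTO.
default_syn = retransmit_timeout(5, 1.0)
reduced_syn = retransmit_timeout(1, 1.0)

print(round(default_retries2, 1), round(reduced_retries2, 1))
print(default_syn, reduced_syn)
```

Under these assumptions, 15 retries on an established connection come to about 924.6 seconds (a little over 15 minutes), while 5 retries come to about 12.6 seconds. For new connections, 5 SYN retries come to 63 seconds and a single retry to 3 seconds.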

Follow the steps below to dynamically reconfigure the Linux kernel parameters from their default settings:

1. As root user on the client node, add the following lines to /etc/sysctl.conf

net.ipv4.tcp_keepalive_time=3000
net.ipv4.tcp_retries2=5
net.ipv4.tcp_syn_retries=1

2. Dynamically update the Linux kernel with the new settings

sysctl -p

All parameters are displayed as /etc/sysctl.conf is reloaded.
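After the reload, it can be useful to confirm that the running kernel actually holds the new values. The following Python sketch parses sysctl.conf-style lines and compares them with the files under /proc/sys. The helper names are hypothetical, and the mapping from a dotted key to a path is a simplification that is adequate for the three parameters used here.

```python
from pathlib import Path


def parse_sysctl(text):
    """Parse 'key=value' lines, skipping blank lines and '#' or ';' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def live_value(key, root="/proc/sys"):
    """Read the running kernel's value for a dotted sysctl key."""
    return Path(root, *key.split(".")).read_text().strip()


def mismatches(expected, root="/proc/sys"):
    """Return {key: (expected, live)} for every setting that differs."""
    result = {}
    for key, wanted in expected.items():
        actual = live_value(key, root)
        if actual != wanted:
            result[key] = (wanted, actual)
    return result


# The three settings added to /etc/sysctl.conf in step 1.
EXPECTED = parse_sysctl("""
net.ipv4.tcp_keepalive_time=3000
net.ipv4.tcp_retries2=5
net.ipv4.tcp_syn_retries=1
""")
```

On the client node, calling mismatches(EXPECTED) should return an empty dictionary once sysctl -p has applied the settings; any entry it returns shows the expected value next to the value the kernel is currently using.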

Conclusion

The above solution fixes the TCP time-out issue. However, the LNS process will not resume log shipping to the surviving node, nor will Streams continue to mine and apply the logs, until a logfile switch occurs at the source database.

Should the load (tps) be low at the source, it may be prudent to set the archive_lag_target parameter on the source database to force a logfile switch every n seconds.

Furthermore, when the public interface is disconnected from a node, the DB listener stops. This is expected behaviour because the VIP is relocated to the other node. However, it can take up to 10 minutes for CRS to automatically start the listener after the network is restored. This is the default setting for the racgimon process.
