The TCP Time-out Issue affecting Streams Downstream Capture
Problem
In a two-node RAC environment where Streams downstream capture runs between two clusters, log shipping can hang when the public network is disconnected on one of the target nodes. The Virtual IP (VIP) moves to the surviving node, but connections do not fail over immediately as expected. This is largely due to a TCP time-out issue.
When the Network Interface Card (NIC) fails or the network cable is unplugged on the server, the TCP/IP network becomes unavailable and the client connection eventually times out with TNS-12170, TNS-12560, TNS-12535 and TNS-00505, as seen in the source database alert log below:
**************************************************************
Fatal NI connect error 12170.
  VERSION INFORMATION:
    TNS for Linux: Version 11.1.0.7.0 - Production
    Unix Domain Socket IPC NT Protocol Adaptor for Linux: Version 11.1.0.7.0 - Production
    TCP/IP NT Protocol Adapter for Linux: Version 11.1.0.7.0 - Production
  Time: 14-MAY-2010 11:54:32
  Tracing not turned on.
  Tns error struct:
    ns main err code: 12535
    TNS-12535: TNS:operation timed out
    ns secondary err code: 12560
    nt main err code: 505
    TNS-00505: Operation timed out
    nt secondary err code: 110
    nt OS err code: 0
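Because a block like this is written to the alert log each time a connection attempt times out, counting the codes can be a quick way to spot the problem. The following is a minimal Python sketch, assuming the alert log text has already been read into a string; summarize_tns_errors is a hypothetical helper, and it only recognises the literal TNS-nnnnn codes and the "Fatal NI connect error" header shown above (secondary codes such as 12560 appear only as bare numbers and are not counted).

```python
import re
from collections import Counter

# Matches explicit codes such as "TNS-12535: TNS:operation timed out".
TNS_PATTERN = re.compile(r"\b(TNS-\d{5})\b")

# Matches the header line "Fatal NI connect error 12170."
FATAL_PATTERN = re.compile(r"Fatal NI connect error (\d+)")


def summarize_tns_errors(log_text: str) -> Counter:
    """Count TNS error codes (including the fatal NI header) in alert log text."""
    counts = Counter(TNS_PATTERN.findall(log_text))
    # Report the fatal NI connect error number in the same TNS-nnnnn form.
    for code in FATAL_PATTERN.findall(log_text):
        counts[f"TNS-{int(code):05d}"] += 1
    return counts
```

Called on the block above, it reports one occurrence each of TNS-12170, TNS-12535 and TNS-00505.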
The following steps describe what the client goes through while trying to establish the connection:
1. The client tries to reach a service on a host that does not exist, i.e. there is no system operational at the IP address the client is trying to connect to. Therefore nothing can respond at that IP address.
2. Following the TCP connection model, the client initiates the three-way handshake by sending a SYN packet, but there is no response.
3. The client waits a specified (OS-configurable) amount of time, for example 200ms.
4. It sends the SYN packet again, but still gets no response. So it waits 400ms and tries again. Still no response, so it waits 800ms and tries again. Again no response, so it waits 1600ms and tries again, then 3200ms, and so on.
5. The wait keeps doubling with each retransmission until the retry limit (on Linux, net.ipv4.tcp_syn_retries) is reached, at which point the client abandons the connection attempt.
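The doubling waits in steps 3 to 5 are ordinary exponential backoff. A minimal Python sketch, assuming the 200ms starting interval used in the example above (the real initial value depends on the OS and kernel version); syn_backoff_schedule is a hypothetical name:

```python
from __future__ import annotations


def syn_backoff_schedule(initial_wait_ms: int, retransmissions: int) -> list[int]:
    """Return the wait (ms) after each SYN, doubling every time.

    The first entry is the wait after the original SYN; one more entry is
    added per retransmitted SYN. After the last wait the attempt is abandoned.
    """
    return [initial_wait_ms * (2 ** i) for i in range(retransmissions + 1)]


if __name__ == "__main__":
    waits = syn_backoff_schedule(200, 4)
    print(waits)       # [200, 400, 800, 1600, 3200]
    print(sum(waits))  # 6200 (ms of waiting in total)
```

Fewer retransmissions shorten the list, and with it the total wait, which is what lowering tcp_syn_retries achieves.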
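On Linux, the kernel parameter that governs how long an established TCP connection keeps retransmitting unacknowledged data before it is dropped is net.ipv4.tcp_retries2. Its default of 15 retries corresponds to roughly 13 to 30 minutes, depending on the retransmission timeout.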
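The Linux tcp(7) man page describes the default of 15 as about 13 to 30 minutes, and the kernel's ip-sysctl documentation gives a hypothetical lower bound of 924.6 seconds, assuming the retransmission timeout (RTO) starts at a 200ms minimum and doubles up to a 120s cap. A minimal Python sketch of that arithmetic under those assumed constants (retries2_timeout_ms is a hypothetical helper):

```python
from __future__ import annotations

TCP_RTO_MIN_MS = 200      # assumed starting (minimum) retransmission timeout
TCP_RTO_MAX_MS = 120_000  # assumed cap on the retransmission timeout


def retries2_timeout_ms(retries2: int,
                        rto_base_ms: int = TCP_RTO_MIN_MS,
                        rto_max_ms: int = TCP_RTO_MAX_MS) -> int:
    """Estimate how long a connection retransmits before being killed.

    The connection retransmits retries2 times and is killed when the
    (retries2 + 1)th RTO expires; the RTO doubles each time up to rto_max_ms.
    """
    total = 0
    rto = rto_base_ms
    for _ in range(retries2 + 1):
        total += rto
        rto = min(rto * 2, rto_max_ms)
    return total
```

retries2_timeout_ms(15) gives 924600ms (924.6s) and retries2_timeout_ms(5) gives 12600ms. On a real network the RTO is derived from the measured round-trip time, so actual timeouts are longer.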
In Oracle 10g and above, SQL*Net can time out within a desired period instead of waiting for the TCP timeout to occur.
The following settings can be used in the sqlnet.ora file on the client or server:
• sqlnet.inbound_connect_timeout (server)
• sqlnet.send_timeout (client and/or server)
• sqlnet.recv_timeout (client and/or server)
However, these are not for connect-time failover, but rather for TAF operations.
In other words, the SQL*Net settings will not correct shortcomings at the TCP layer.
Oracle relies heavily on the TCP layer, and these timeout values only take effect when the target IP address is alive and reachable.
Solution
The following Linux kernel parameters address the TCP time-out issue:
/proc/sys/net/ipv4/tcp_keepalive_time
• Number of seconds a connection must be idle before TCP starts sending keepalive probes (applies only when keepalive is enabled). Default: 7200 seconds (2 hours)
/proc/sys/net/ipv4/tcp_retries2
• How many times to retry before killing an alive TCP connection. Default: 15, which corresponds to roughly 13 to 30 minutes depending on the retransmission timeout
/proc/sys/net/ipv4/tcp_syn_retries
• Number of SYN packets the kernel will send before giving up on the new connection. Default: 5
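tcp_keepalive_time only sets when probing starts. How long it takes to declare an idle peer dead also depends on two related parameters not listed above, tcp_keepalive_intvl and tcp_keepalive_probes (defaults 75 seconds and 9 probes), and keepalive only applies to sockets that enable it (SO_KEEPALIVE). A minimal Python sketch of the worst-case detection time, assuming those defaults; keepalive_detection_s is a hypothetical helper:

```python
def keepalive_detection_s(keepalive_time_s: int,
                          keepalive_intvl_s: int = 75,
                          keepalive_probes: int = 9) -> int:
    """Approximate seconds for keepalive to declare an idle peer dead.

    The first probe is sent after keepalive_time_s of idle time; the
    connection is dropped after keepalive_probes unanswered probes sent
    keepalive_intvl_s apart.
    """
    return keepalive_time_s + keepalive_probes * keepalive_intvl_s
```

With the defaults this is 7200 + 9 * 75 = 7875 seconds; with tcp_keepalive_time=3000 it drops to 3675 seconds.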
Follow the steps below to dynamically reconfigure the Linux kernel parameters from their default settings:
1. As the root user on the client node, add the following lines to /etc/sysctl.conf:
net.ipv4.tcp_keepalive_time=3000
net.ipv4.tcp_retries2=5
net.ipv4.tcp_syn_retries=1
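If /etc/sysctl.conf already sets any of these keys, appending the lines leaves duplicates, and the later entry wins when the file is applied in order. A minimal Python sketch that rewrites existing entries in place and appends only the missing keys; merge_sysctl is a hypothetical helper that works on the file contents as a string, so reading the file and writing the result back (as root) is left to the caller:

```python
from __future__ import annotations

SETTINGS = {
    "net.ipv4.tcp_keepalive_time": "3000",
    "net.ipv4.tcp_retries2": "5",
    "net.ipv4.tcp_syn_retries": "1",
}


def merge_sysctl(conf_text: str, settings: dict[str, str]) -> str:
    """Return conf_text with each key in settings set exactly once.

    Existing "key = value" lines for a managed key are rewritten in place,
    later duplicates are dropped, and keys not yet present are appended.
    Comment lines (starting with # or ;) and other keys are left alone.
    """
    pending = dict(settings)
    out = []
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(("#", ";")) and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            if key in settings:
                if key in pending:
                    out.append(f"{key}={pending.pop(key)}")
                # Skip the original line (and any later duplicate of it).
                continue
        out.append(line)
    out.extend(f"{key}={value}" for key, value in pending.items())
    return "\n".join(out) + "\n"
```

Running it a second time on its own output returns the same text, so it is safe to re-run.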
2. Load the new settings into the running kernel:
sysctl -p
sysctl -p displays every parameter from /etc/sysctl.conf as it reloads the file.
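To confirm the values are live rather than just present in the file, they can be read back from /proc/sys/net/ipv4. A minimal Python sketch; check_tcp_params is a hypothetical helper, and the base directory is a parameter only so it can be pointed at a test directory:

```python
from __future__ import annotations

from pathlib import Path

EXPECTED = {
    "tcp_keepalive_time": 3000,
    "tcp_retries2": 5,
    "tcp_syn_retries": 1,
}


def check_tcp_params(expected: dict[str, int],
                     base: Path = Path("/proc/sys/net/ipv4")) -> dict[str, int]:
    """Return {name: live_value} for each parameter that differs from expected."""
    mismatches = {}
    for name, want in expected.items():
        # Each /proc/sys entry holds the current value as text.
        live = int((base / name).read_text().strip())
        if live != want:
            mismatches[name] = live
    return mismatches
```

check_tcp_params(EXPECTED) returns an empty dict when all three settings are in effect; otherwise it maps each parameter name to its live value.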
Conclusion
These settings resolve the TCP time-out issue. However, the LNS process will not resume log shipping to the surviving node, nor will Streams continue to mine and apply the logs, until a logfile switch occurs at the source database.
If the load (tps) at the source is low, it may be prudent to set the archive_lag_target parameter on the source database to force a logfile switch every n seconds.
Furthermore, when the public interface is disconnected from a node, the database listener stops. This is expected behaviour because the VIP is relocated to the other node. However, once the network is restored, it can take up to 10 minutes for CRS to restart the listener automatically; this is the default behaviour of the racgimon process.