While applying the R12.2 upgrade driver, we faced the issue of WFXLoad.class failing in adworker log but showing up as running on adctrl
Control
Worker Code Context Filename Status
------ -------- ----------------- -------------------------- --------------
1 Run AutoPatch R120 pl WFXLoad.class Running
2 Run AutoPatch R120 pl WFXLoad.class Running
3 Run AutoPatch R120 pl WFXLoad.class Running
4 Run AutoPatch R120 pl WFXLoad.class Running
5 Run AutoPatch R120 pl WFXLoad.class Running
6 Run AutoPatch R120 pl Wait
7 Run AutoPatch R120 pl WFXLoad.class Running
8 Run AutoPatch R120 pl WFXLoad.class Running
9 Run AutoPatch R120 pl WFXLoad.class Running
10 Run AutoPatch R120 pl Wait
adworker log shows:
Exception in thread "main" java.sql.SQLRecoverableException: IO Error: Socket read timed out
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:482)
at oracle.jdbc.driver.PhysicalConnection.(PhysicalConnection.java:678)
at oracle.jdbc.driver.T4CConnection.(T4CConnection.java:238)
at oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:34)
at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:567)
at java.sql.DriverManager.getConnection(DriverManager.java:571)
at java.sql.DriverManager.getConnection(DriverManager.java:215)
at oracle.apps.ad.worker.AdJavaWorker.getAppsConnection(AdJavaWorker.java:1041)
at oracle.apps.ad.worker.AdJavaWorker.main(AdJavaWorker.java:276)
Caused by: oracle.net.ns.NetException: Socket read timed out
at oracle.net.ns.Packet.receive(Packet.java:341)
at oracle.net.ns.NSProtocol.connect(NSProtocol.java:308)
at oracle.jdbc.driver.T4CConnection.connect(T4CConnection.java:1222)
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:330)
... 8 more
This was happening again and again. The DBAs were suspecting network issue, cluster issue, server issue and all the usual suspects. In Database alert log we saw these errors coming every few seconds:
Fatal NI connect error 12537, connecting to:
(LOCAL=NO)
VERSION INFORMATION:
TNS for Linux: Version 11.2.0.3.0 - Production
Oracle Bequeath NT Protocol Adapter for Linux: Version 11.2.0.3.0 - Production
TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.3.0 - Production
Time: 11-NOV-2014 19:58:19
Tracing not turned on.
Tns error struct:
ns main err code: 12537
TNS-12537: TNS:connection closed
ns secondary err code: 12560
nt main err code: 0
nt secondary err code: 0
nt OS err code: 0
opiodr aborting process unknown ospid (26388) as a result of ORA-609
We tried changing the parameters in sqlnet.ora and listener.ora as instructed in the article:
Troubleshooting Guide for ORA-12537 / TNS-12537 TNS:Connection Closed (Doc ID 555609.1)
Sqlnet.ora: SQLNET.INBOUND_CONNECT_TIMEOUT=180
Listener.ora: INBOUND_CONNECT_TIMEOUT_listener_name=120
However, the errors continued. To rule out any issues in network, I also restarted the network service on Linux:
service network restart
One thing which I noticed was the extra amount of time that the connect was taking 4 seconds:
21:17:38 SQL> conn apps/apps
Connected.
21:17:42 SQL>
Checked from remote app tier and it was same 4 seconds.
Stopped listener and checked on DB server that uses bequeath protocol:
21:18:47 SQL> conn / as sysdba
Connected.
21:18:51 SQL> conn / as sysdba
Connected.
Again it took 4 seconds.
A few days back, I had seen that connect time had increased after turning setting the DB initialization parameter pre_page_sga to true in a different instance. On a hunch, I checked this and indeed pre_page_sga was set to true. I set this back to false:
alter system set pre_page_sga=false scope=spfile;
shutdown immediate;
exit
sqlplus /nolog
conn / as sysdba
startup
SQL> set time on
22:09:46 SQL> conn / as sysdba
Connected.
22:09:49 SQL>
The connections were happening instantly. So I went ahead and resumed the patch after setting:
update fnd_install_processes
set control_code='W', status='W';
commit;
I restarted the patch and all the workers completed successfully. And the patch was running significantly faster. So I did a search on support.oracle.com to substantiate my solution with official Oracle documentation. I found the following articles:
Slow Connection or ORA-12170 During Connect when PRE_PAGE_SGA init.ora Parameter is Set (Doc ID 289585.1)
Health Check Alert: Consider setting PRE_PAGE_SGA to FALSE (Doc ID 957525.1)
The first article (289585.1) says:
PRE_PAGE_SGA can increase the process startup duration, because every process that starts must access every page in the SGA. This approach can be useful with some applications, but not with all applications. Overhead can be significant if your system frequently creates and destroys processes by, for example, continually logging on and logging off. The advantage that PRE_PAGE_SGA can afford depends on page size.
The second article (957525.1) says:
Having the PRE_PAGE_SGA initialization parameter set to TRUE can significantly increase the time required to establish database connections.
The golden words here are "Overhead can be significant if your system frequently creates and destroys processes by, for example, continually logging on and logging off.". That is exactly what happens when you do adpatch or adop.
Keep this in mind, whenever you do adpatch or adop, make sure that pre_page_sga is set to false. It is possible that you may get the error "java.sql.SQLRecoverableException: IO Error: Socket read timed out" if you don't. Also the patch will run significantly slower if pre_page_sga is set to true. So set it to false and avoid these issues.