Analysis of a Severe System Slowdown Caused by Too Many Oracle Connections


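1 Problem Description
On January 19, 2010, external clients could no longer connect to the customer's database. The host was extremely slow at the time: even the operating system command ls produced no output. After the processes were killed and both the middleware application server and the database were restarted, the database returned to normal.

The analysis of this failure is summarized below.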

2 Database Environment
2.1 Database System
Version   Environment   Database name   Instance name   IP address   Host
9.2.0.8   Standalone    ora9            ora9            xxxxx        P570a


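3 Failure Analysis
A colleague checked the connection count during the incident and found that the number of LOCAL=NO processes had reached 990, against a normal level of about 100 to 150. The alert.log contains the following errors for January 19: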

Tue Jan 19 10:50:49 2010

skgpspawn failed:category = 27142, depinfo = 17, p = fork, loc = skgpspawn5

skgpspawn failed:category = 27142, depinfo = 11, p = fork, loc = skgpspawn3

Tue Jan 19 10:52:49 2010

skgpspawn failed:category = 27142, depinfo = 11, p = fork, loc = skgpspawn5

skgpspawn failed:category = 27142, depinfo = 11, p = fork, loc = skgpspawn3

skgpspawn failed:category = 27142, depinfo = 11, p = fork, loc = skgpspawn3

skgpspawn failed:category = 27142, depinfo = 11, p = fork, loc = skgpspawn3

skgpspawn failed:category = 27142, depinfo = 11, p = fork, loc = skgpspawn3

skgpspawn failed:category = 27142, depinfo = 11, p = fork, loc = skgpspawn3

Tue Jan 19 10:57:04 2010

skgpspawn failed:category = 27142, depinfo = 11, p = fork, loc = skgpspawn5

skgpspawn failed:category = 27142, depinfo = 11, p = fork, loc = skgpspawn3

skgpspawn failed:category = 27142, depinfo = 11, p = fork, loc = skgpspawn3

Tue Jan 19 11:12:04 2010

skgpspawn failed:category = 27142, depinfo = 11, p = fork, loc = skgpspawn3

skgpspawn failed:category = 27142, depinfo = 11, p = fork, loc = skgpspawn3

skgpspawn failed:category = 27142, depinfo = 11, p = fork, loc = skgpspawn3

skgpspawn failed:category = 27142, depinfo = 11, p = fork, loc = skgpspawn3

skgpspawn failed:category = 27142, depinfo = 11, p = fork, loc = skgpspawn3

skgpspawn failed:category = 27142, depinfo = 11, p = fork, loc = skgpspawn3

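According to the documentation, these messages mean that a process could not be started or created:

depinfo = 11: is the o/s errno. [EACCES] error may indicate the requested file is not available, which may be an effect that the process did not start and hence its proc entries were not created.

Note that on most Unix systems, including AIX and Linux, errno 11 is EAGAIN (resource temporarily unavailable), which fork returns when a process or resource limit has been reached; EACCES is errno 13. Either way, the messages show that new server processes could not be forked.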
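To see how the fork failures were distributed over time, the alert.log entries can be tallied per timestamp and depinfo value. The following is a minimal Python sketch, assuming the alert.log layout shown above (a timestamp line followed by the messages logged at that time). The function name `summarize_spawn_failures` and the embedded sample are illustrative only.

```python
import re
from collections import Counter

# Timestamp lines look like "Tue Jan 19 10:50:49 2010".
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}$")
# Captures the depinfo value (the OS errno) of each skgpspawn failure.
SPAWN_FAIL = re.compile(r"skgpspawn failed:.*?depinfo = (\d+)")


def summarize_spawn_failures(alert_text: str) -> Counter:
    """Count skgpspawn failures, keyed by (timestamp, depinfo)."""
    counts = Counter()
    current = None  # most recent timestamp line seen
    for raw in alert_text.splitlines():
        line = raw.strip()
        if TIMESTAMP.match(line):
            current = line
            continue
        match = SPAWN_FAIL.search(line)
        if match:
            counts[(current, int(match.group(1)))] += 1
    return counts


# Short excerpt in the same format as the alert.log lines above.
sample_alert = """\
Tue Jan 19 10:50:49 2010
skgpspawn failed:category = 27142, depinfo = 17, p = fork, loc = skgpspawn5
skgpspawn failed:category = 27142, depinfo = 11, p = fork, loc = skgpspawn3
Tue Jan 19 10:52:49 2010
skgpspawn failed:category = 27142, depinfo = 11, p = fork, loc = skgpspawn5
skgpspawn failed:category = 27142, depinfo = 11, p = fork, loc = skgpspawn3
"""

for (stamp, errno_value), n in summarize_spawn_failures(sample_alert).items():
    print(stamp, "depinfo", errno_value, "count", n)
```

On a real system the text would come from reading the alert.log file rather than from an embedded string.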

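From 14:43 on 2010-01-19, listener.log starts reporting TNS-12540 errors (internal limit restriction exceeded, interpreted here as a memory limit being hit), and external clients could no longer connect.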

19-JAN-2010 14:43:36 * (CONNECT_DATA=(SID=ora9jsy)(CID=(PROGRAM=oracle)(HOST=wfzyk)(USER=))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.40.30.236)(PORT=48365)) * establish * ora9jsy * 12500

TNS-12500: TNS:listener failed to start a dedicated server process

TNS-12540: TNS:internal limit restriction exceeded

TNS-12560: TNS:protocol adapter error

TNS-00510: Internal limit restriction exceeded
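The moment at which the listener began refusing connections can be located by scanning listener.log for the first `establish` entry with a nonzero return code. This is a rough Python sketch, assuming the ` * `-separated entry layout shown above, with the return code as the last field. The function name `first_failed_connect` is hypothetical.

```python
from typing import Optional, Tuple


def first_failed_connect(listener_text: str) -> Optional[Tuple[str, int]]:
    """Return (timestamp, return_code) of the first failed 'establish' entry, if any."""
    for line in listener_text.splitlines():
        fields = [field.strip() for field in line.split(" * ")]
        # Connection entries have six fields:
        # timestamp, CONNECT_DATA, ADDRESS, event, SID, return code.
        if len(fields) < 6 or fields[-3] != "establish":
            continue
        if not fields[-1].isdigit():
            continue
        code = int(fields[-1])
        if code != 0:
            return fields[0], code
    return None


# Two entries copied from the listener.log excerpts in this article.
sample_listener = """\
19-JAN-2010 10:32:38 * (CONNECT_DATA=(SID=ora9jsy)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.2.1)(PORT=1074)) * establish * ora9jsy * 0
19-JAN-2010 14:43:36 * (CONNECT_DATA=(SID=ora9jsy)(CID=(PROGRAM=oracle)(HOST=wfzyk)(USER=))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.40.30.236)(PORT=48365)) * establish * ora9jsy * 12500
TNS-12500: TNS:listener failed to start a dedicated server process
"""

print(first_failed_connect(sample_listener))
```

Lines that are not connection entries, such as the TNS error stack, are skipped because they do not split into the expected fields.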



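The listener.log also shows that between 00:00 and 14:43 on 2010-01-19 there were periods in which the IP address 192.168.2.1 opened a new connection every few seconds, for a total of more than 300 connections. Normally this address holds only one or two connections, so the surge in connections was evidently caused by repeated connections from this address.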

19-JAN-2010 10:32:38 * (CONNECT_DATA=(SID=ora9jsy)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.2.1)(PORT=1074)) * establish * ora9jsy * 0

19-JAN-2010 10:32:38 * (CONNECT_DATA=(SID=ora9jsy)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.2.1)(PORT=1076)) * establish * ora9jsy * 0

19-JAN-2010 10:32:39 * (CONNECT_DATA=(SID=ora9jsy)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.2.1)(PORT=1077)) * establish * ora9jsy * 0

19-JAN-2010 10:33:33 * (CONNECT_DATA=(SID=ora9jsy)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.2.1)(PORT=1124)) * establish * ora9jsy * 0

19-JAN-2010 10:33:34 * (CONNECT_DATA=(SID=ora9jsy)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.2.1)(PORT=1125)) * establish * ora9jsy * 0

19-JAN-2010 10:33:42 * (CONNECT_DATA=(SID=ora9jsy)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.2.1)(PORT=1135)) * establish * ora9jsy * 0
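A per-client tally of `establish` entries makes a source like 192.168.2.1 stand out quickly. The sketch below is a simplified Python illustration, assuming TCP entries in the format shown above. The function name `connections_per_client` is hypothetical, and the sample is a small excerpt rather than the full log.

```python
import re
from collections import Counter

# Match the client address in the ADDRESS field, not the HOST inside CONNECT_DATA.
HOST_IN_ADDRESS = re.compile(r"\(ADDRESS=\(PROTOCOL=tcp\)\(HOST=([^)]+)\)")


def connections_per_client(listener_text: str) -> Counter:
    """Count 'establish' entries per client address."""
    counts = Counter()
    for line in listener_text.splitlines():
        if " * establish * " not in line:
            continue
        match = HOST_IN_ADDRESS.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Entries copied from the listener.log excerpts in this article.
sample_log = """\
19-JAN-2010 10:32:38 * (CONNECT_DATA=(SID=ora9jsy)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.2.1)(PORT=1074)) * establish * ora9jsy * 0
19-JAN-2010 10:32:38 * (CONNECT_DATA=(SID=ora9jsy)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.2.1)(PORT=1076)) * establish * ora9jsy * 0
19-JAN-2010 10:32:39 * (CONNECT_DATA=(SID=ora9jsy)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.2.1)(PORT=1077)) * establish * ora9jsy * 0
19-JAN-2010 14:43:36 * (CONNECT_DATA=(SID=ora9jsy)(CID=(PROGRAM=oracle)(HOST=wfzyk)(USER=))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.40.30.236)(PORT=48365)) * establish * ora9jsy * 12500
"""

for client, n in connections_per_client(sample_log).most_common():
    print(client, n)
```

The tally counts every connection attempt, including failed ones. Filtering on the return code would restrict it to successful connections.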

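4 Summary and Recommendations
The customer should check the middleware application server at 192.168.2.1 for whatever anomaly caused the sudden increase in connections. The current load is roughly 100 to 150 user connections, for which the physical memory allocated to the database is sufficient. If business growth later pushes the connection count above 1000, more physical memory can be allocated to the database.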
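One way to notice such a surge earlier is to compare the number of LOCAL=NO processes with the normal range of 100 to 150. The Python sketch below is illustrative only and is not part of the original analysis. It assumes `ps -ef` output in which each listener-spawned server process appears on one line containing `(LOCAL=NO)`. The function names, the default threshold of 150 and the sample text are all assumptions.

```python
def count_remote_sessions(ps_output: str) -> int:
    """Count process lines for listener-spawned (remote) Oracle server processes."""
    return sum(1 for line in ps_output.splitlines() if "(LOCAL=NO)" in line)


def exceeds_baseline(ps_output: str, baseline_max: int = 150) -> bool:
    """Return True when the remote session count is above the assumed normal maximum."""
    return count_remote_sessions(ps_output) > baseline_max


# Hypothetical `ps -ef` excerpt, for illustration only.
sample = """\
oracle  1234     1  0 10:01:02 ?  0:00 oracleora9 (LOCAL=NO)
oracle  1240     1  0 10:01:05 ?  0:00 oracleora9 (LOCAL=NO)
oracle  1300     1  0 09:00:00 ?  0:03 ora_pmon_ora9
oracle  1400  1399  0 10:02:00 ?  0:00 oracleora9 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
"""

print(count_remote_sessions(sample), exceeds_baseline(sample))
```

On a live host the input could be captured with `subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout` and the check run periodically.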



