Friday, June 3, 2011

Here is a nasty problem in 11.1 environments

Filed on Oracle Metalink as "ASM and Database Instance hang when exceeding around 1800 sessions", doc id 858279.1.

This can occur in both an 11.1 RAC environment and an 11.1 single-instance, non-clustered environment.

If you are running an 11.1 single-instance, non-clustered system and using ASM, then you probably (or may) have just one Oracle home. You installed the Oracle software, and when you used dbca to create a database and selected ASM, you were prompted to run a local config as root. This puts in the part of the clusterware component that is (for some reason) necessary to work with an ASM instance.

You can patch the 11.1 system using the database patchset updates, but then you don't receive a critical fix that (for some reason ... I am trying to get Oracle support to get this fixed) is only in the CRS patchset updates. (For example, I am now running 11.1.0.7.7.)

The doc id is 858279.1 ... it says RAC but no ... not ONLY RAC ... not necessarily.

It can happen, depending on your configuration, at a lot fewer than 1800 sessions ... it depends on which sessions read into the buffer cache (I think that is the crucial part) and need to talk to/through ASM.

A really bad hang can occur when a database session has to read in something that is not in the buffer cache and the number of such sessions gets above a certain limit (the cssd bin executable hits its max number of open files).
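
To see how close a process is to its open-file limit, something like the following rough sketch may help. It assumes Linux with /proc mounted and enough privileges to read the target process's entries; the helper names `parse_max_open_files` and `fd_usage` are made up for this illustration and are not part of any Oracle tooling.

```python
import os


def parse_max_open_files(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits.

    A value of None means the limit is reported as 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Row layout: Max open files <soft> <hard> files
            soft, hard = line.split()[3:5]
            convert = lambda v: None if v == "unlimited" else int(v)
            return convert(soft), convert(hard)
    raise ValueError("no 'Max open files' line found")


def fd_usage(pid):
    """Return (open_fd_count, soft_limit) for a PID, read from Linux /proc."""
    # Each entry under /proc/<pid>/fd is one open file descriptor.
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as f:
        soft, _hard = parse_max_open_files(f.read())
    return open_fds, soft
```

Calling `fd_usage(pid)` with the PID of the cssd process (found with ps, for instance) gives the current descriptor count next to the soft limit, so you can watch whether the count climbs toward the limit as sessions connect.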

Many people running RAC will have the CRS patchset updates to handle this problem, but other people running single instance and using ASM (how many of those are there? hard to guess) may be open to this.

Even worse, the wait interface does not point to a problem communicating with ASM. Too much of the Oracle code in this area is apparently still uninstrumented at 11.1 ...

A pstack trace of a hung process may look like this ... (the call was an insert).

# pstack 29318
#1 0x00000000032ed6d7 in ntevpque ()
#2 0x00000000032e9e30 in ntevqone ()
#3 0x0000000003281c10 in nsevwait ()
#4 0x00002af99df1758d in clsc_nswait () from /u01/app/oracle/product/11.1.0/dbo
#5 0x00002af99df13daa in clsc_select_ext () from /u01/app/oracle/product/11.1.o
#6 0x00002af99df124e6 in clsc_receive_wait () from /u01/app/oracle/product/11.o
#7 0x00002af99df120a3 in clscreceive () from /u01/app/oracle/product/11.1.0/dbo
#8 0x00002af99df10c18 in clscconnect () from /u01/app/oracle/product/11.1.0/dbo
#9 0x00002af99defa9e0 in clsssInitNative () from /u01/app/oracle/product/11.1.o
#10 0x00002af99defbad3 in clsssinit () from /u01/app/oracle/product/11.1.0/db_1o
#11 0x0000000006eaf951 in kgxgncin ()
#12 0x0000000003aff6c2 in kfmsInit ()
#13 0x0000000003b00f07 in kfmsSlvReg ()
#14 0x0000000003ae342c in kfmdSlvOpPriv ()
#15 0x0000000003adcf9f in kfmEnslave ()
#16 0x0000000003a0f19b in kfddsGet ()
#17 0x0000000005c7fb57 in kfioTranslateIO () <-- this is the ASM file <-> device address translation function (Tanel)
#18 0x0000000005c81f3a in kfioRqSetPrepare ()
#19 0x0000000005c7e5e9 in kfioSubmitIO ()
#20 0x0000000005c7b9ad in kfioRequestPriv ()
#21 0x0000000005c7b366 in kfioRequest ()
#22 0x0000000005c4c1db in ksfd_kfioRequest ()
#23 0x0000000005c48574 in ksfd_osmio ()
#24 0x0000000007c3f837 in ksfd_io ()
#25 0x0000000007c3df1d in ksfdread1 ()
#26 0x0000000001a01286 in kcfrbd ()
#27 0x0000000000e01407 in kcbzib ()
#28 0x0000000007a7451b in kcbgcur ()
#29 0x0000000000d130de in ktbgcur ()
#30 0x0000000007a15b5f in ktspfpblk ()
#31 0x0000000007a1458d in ktspfsrch ()
#32 0x0000000007a13f01 in ktspscan_bmb ()
#33 0x0000000007a1351f in ktspgsp_main ()
#34 0x0000000001427ec4 in kdisnew ()
#35 0x00000000014258cc in kdisnewle ()
#36 0x000000000140ca16 in kdisle ()
#37 0x00000000013c6899 in kdiins0 ()
#38 0x00000000013d7d6c in kdiinsp ()
#39 0x0000000007aa8516 in kauxsin ()
#40 0x0000000007ca0881 in qesltcLoadIndexList ()
#41 0x0000000007ca04f9 in qesltcLoadIndexes ()
#42 0x0000000007c806fc in __PGOSF606_qerltcNoKdtBufferedInsRowCBK ()
#43 0x0000000007c7e753 in qerltcSingleRowLoad ()
#44 0x0000000007c7d5e7 in qerltcFetch ()
#45 0x0000000007bc742d in insexe ()
#46 0x0000000007c8f4c7 in opiexe ()
#47 0x0000000007c987e0 in opipls ()
#48 0x0000000007b40d0c in opiodr ()
#49 0x0000000007c0520b in __PGOSF150_rpidrus ()
#50 0x0000000007d72400 in skgmstack ()
#51 0x0000000007c055d9 in rpidru ()
#52 0x0000000007c04762 in rpiswu2 ()
#53 0x0000000007c03d67 in rpidrv ()
#54 0x0000000007bede07 in psddr0 ()
#55 0x0000000007bedac0 in psdnal ()
#56 0x0000000007dd27d4 in pevm_EXECC ()
#57 0x0000000007dc6d8c in pfrinstr_EXECC ()
#58 0x0000000007dc5a2f in pfrrun_no_tool ()
#59 0x00000000029e35bf in pfrrun ()
#60 0x00000000029f4b55 in plsql_run ()
#61 0x00000000029daa9c in peicnt ()
#62 0x00000000029da8e5 in peiet_execute_trigger ()
#63 0x00000000023b39b2 in kkxtexe ()
#64 0x000000000190b904 in kxtExecuteTriggerRecursive ()
#65 0x0000000007c04762 in rpiswu2 ()
#66 0x000000000190b08e in kxtExecuteTriggerReal ()
#67 0x000000000190ae67 in kxtexe ()
#68 0x00000000056b762d in kxtifir ()
#69 0x00000000056b7bfe in kxtiiex ()
#70 0x000000000535a035 in insIntr ()
#71 0x0000000007bc792a in insExecStmtExecIniEngine ()
#72 0x0000000007c7c631 in qerltcStart ()
#73 0x0000000007bc74ba in insexe ()
#74 0x0000000007c8f4c7 in opiexe ()
#75 0x0000000001c90969 in kpoal8 ()
#76 0x0000000007b40d0c in opiodr ()
#77 0x0000000007d04cdb in ttcpip ()
#78 0x000000000105c4d9 in opitsk ()
#79 0x000000000105eef6 in opiino ()
#80 0x0000000007b40d0c in opiodr ()
#81 0x0000000001058258 in opidrv ()
#82 0x0000000001762cea in sou2o ()
#83 0x0000000000975483 in opimai_real ()
#84 0x00000000017682a1 in ssthrdmain ()
#85 0x00000000009753af in main ()
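
Since the wait interface gives no hint, one way to spot this pattern across many hung processes is to scan saved pstack output for the frames seen above. This is only a heuristic sketch based on the single trace in this post (a CSS connect call sitting above the ASM translation frame `kfioTranslateIO`); it is not an Oracle-supplied diagnostic, and the function names `frame_names` and `looks_like_css_connect_hang` are invented for the example.

```python
import re

# Matches lines such as: "#8 0x00002af99df10c18 in clscconnect () from ..."
FRAME_RE = re.compile(r"^#(\d+)\s+0x[0-9a-fA-F]+\s+in\s+(\S+)\s*\(")


def frame_names(pstack_text):
    """Return function names from pstack output, innermost frame first."""
    names = []
    for line in pstack_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            names.append(match.group(2))
    return names


def looks_like_css_connect_hang(pstack_text):
    """True if the stack is inside a CSS connect reached via ASM I/O translation.

    Assumption: the signature is clscconnect and clsssinit appearing in
    frames more recent than kfioTranslateIO, as in the trace above.
    """
    names = frame_names(pstack_text)
    if "kfioTranslateIO" not in names:
        return False
    # pstack prints the most recent call first, so frames listed before
    # kfioTranslateIO were called (directly or indirectly) by it.
    inner = names[: names.index("kfioTranslateIO")]
    return "clscconnect" in inner and "clsssinit" in inner
```

Feeding it the text of each saved pstack file gives a quick yes/no per process, which can help separate sessions stuck in this path from sessions that are merely busy.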

1 comment:

  1. I am also working on this version but so far have not faced this issue. It's really good to know in advance about this error that may hang the database sessions. I will try to find a solution to this problem online.