Sunday, November 16, 2008

Conditional Restart DB2 DSNJU003

This is a very intresting problem we had some times back.
In Test sub-system, suddenly DB2 crashed, when we tried to make it UP, it was tried to read the archive logs indefintely.
After investigation, we found that one of the user has inserted 100 of millions of rows in a very huge table causing 100s of logs filled up leading to DB2 crash.

After another round of investigation we found that DB2 was trying to recover to a RBA which is not recorded in BSDS. We knew that DB2 will definitely abend after trying to read many archive logs.

Then we think of conditional restart for this by modifying the BSDS.
To do this we need to shutdown the DB2.

Before modifying the BSDS. Create a personal VSAM BSDS to copy one of the BSDS into it as a backup. Here is the BSDS VSAM defination.

//TSG10BDS JOB (ACCT),'CFTUTIL',CLASS=O

//DSNTIC PROC

//* ***************************************************************** */

//* DIRECTORY/CATALOG AMS INVOCATION INSTREAM JCL PROCEDURE */

//* ***************************************************************** */

//DSNTIC EXEC PGM=IDCAMS,COND=(2,LT)

//SYSPRINT DD SYSOUT=*

//SYSUDUMP DD SYSOUT=*

//DSNTIC PEND

//DSNTDBL EXEC DSNTIC

//SYSIN DD *

DEFINE CLUSTER -

( NAME(DSN3.BSDS01.COPY) -

VOLUMES(DSN366) -

REUSE -

SHAREOPTIONS(2 3) ) -

DATA -

( NAME(DSN3.BSDS01.COPY.DATA) -

RECORDS(180 20) -

RECORDSIZE(4089 4089) -

CONTROLINTERVALSIZE(4096) -

FREESPACE(0 20) -

KEYS(4 0) ) -

INDEX -

( NAME(DSN3.BSDS01.COPY.INDEX) -

RECORDS(5 5) -

CONTROLINTERVALSIZE(1024) ) CATALOG(DSN3)

====



Then we did run the following JCL after finding a suitable RBA (this should be multiple of 4K, otherwise DSNJ003 will not update the BSDS and give you the error:

//TSGDSN3A JOB ,
// CLASS=L,MSGCLASS=O,NOTIFY=TSGDSN3,MSGLEVEL=(1,1)
//CONREST EXEC PGM=DSNJU003
//SYSPRINT DD SYSOUT=*
//SYSUT1 DD DSN=DSN3.BSDS01,DISP=SHR
//SYSIN DD *
CRESTART CREATE,ENDRBA=006A7F46D000,FORWARD=NO,BACKOUT=NO
/*

As we have 2 BSDS in place we have copied the same info to other BSDS also. we tried to update the info in BSDS02. but while starting DB2 it gave us synchronisation error. so always update one BSDS and Repro the same into other BSDS.


//TSG10 JOB 1111,'BSDS REC',CLASS=A,MSGCLASS=O
//*
//* ****************************************************
//* *
//* * ***** DB2 MUST BE DOWN WHEN THIS JOB IS RUN. *****
//* *
//* REPRO BSDS01 TO BSDS02
//*
//STEP02 EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=O
//IFILE1 DD DSN=DSN3.BSDS01,
// DISP=SHR
//OFILE1 DD DSN=DSN3.BSDS02,
// DISP=SHR
//SYSIN DD *
REPRO INFILE (IFILE1) -
OUTFILE(OFILE1) -
REUSE
/*

To see the content of DSBS we printed it by:
//TSG10PRN JOB 1111,'BSDS REC',CLASS=L,MSGCLASS=O
//*
//* **************************************************
//* *
//* * ***** DB2 MUST BE DOWN WHEN THIS JOB IS RUN. ***
//* *
//* **************************************************
//*
//* PRINT THE UPDATED BSDS
//*
//PRNTBSDS EXEC PGM=DSNJU004
//SYSPRINT DD SYSOUT=*
//SYSUT1 DD DSN=DSN3.BSDS01,DISP=SHR
//*

Then when we started the subsystem, it started successfully pretty quick:
by giving below message:
STC03816 DSNR014I ! EXCLUDED RURE TABLE 056 T CON-ID CORR-ID AUTHID PLAN S URID DAY TIME
- -------- ------------ -------- -------- - ------------ --- --------
B BATCH DVF26CV DVF26 DSNTEP2 A 006762C6D8C1 221 09:15:50 STC03816 *15 DSNJ245I ! CONDITIONAL RESTART RECORD INDICATES TRUNCATION AT RBA 006A7F46D000.
REPLY Y TO CONTINUE, N TO CANCEL
DSN3MSTR STC03816

Then everything seems right for this test subsystem...
Cheers...
Prakash
IBM Certified DB2 DBA

1 comment:

vijji said...

Hello Prakash,

Its very useful post to all members. My Question is how do you find a suitable RBA to update BSDS vsam file, in this example "006A7F46D000" and how do you know "006A7F46D000" is mulitiple of 4K value.

Appreciate your help!!!