Thursday, October 26, 2023

RMAN-06054: media recovery requesting unknown archived log

I wish every DBA took proper care of their backups. If they did, there would be far less damage to people's nerves 😁 (easier said than done).

Nevertheless, there was a 100 GB (?) gap of lost archived redo log files, along with a three-week-old backup. I had to restore and open that Oracle database at any cost. Here is what I did:

1) I restored the database (control file, spfile, datafiles, etc.) from the level 0 backup and recovered it as far as possible using the level 1 backups. Needless to say, at that point the database was inconsistent (to put it mildly);

2) I created a pfile, set the following parameters in it, and brought the database back to the mount state (right before opening it with the resetlogs option):

"_allow_resetlogs_corruption"    = TRUE
"_allow_error_simulation"        = true
undo_management                  = 'MANUAL'

3) alter database open resetlogs ;

In the end the database opened successfully (unexpectedly 😀). I created a new undo tablespace and extracted the data I needed.

That's it ! Keep an eye on your backups ! 

PS1. You may need to recreate the control file during the recovery. A SQL dump of it taken right after the restore may be useful, particularly when you need only part of the database rather than the whole.

PS2. The gap between the SCNs in the restored datafiles can be huge, and errors like ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], [] are possible. To overcome this, you may need to set the following event before opening the database with resetlogs:

# SCN surge, used instead of _minimum_giga_scn (from past versions).
# Level 1 increases the SCN by 1 mln, level 3 increases it by 3 mln.
# Look at v$datafile_header for the datafiles and compare it
# with checkpoint_change# from v$database.

event="21307096 trace name context forever, level 2048"
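
The comparison between v$datafile_header and v$database mentioned above can be scripted. Here is a minimal sketch, assuming a mounted instance and OS authentication as SYSDBA. The function name scn_check_sql is made up for illustration. It only prints the query, so the SQL can be reviewed before it is piped into sqlplus.

```shell
#!/bin/sh
# Hypothetical helper: prints a query that shows each datafile header SCN
# next to the checkpoint SCN recorded in v$database, plus the difference.
scn_check_sql() {
  # Quoted heredoc delimiter keeps the shell from expanding v$... names.
  cat <<'SQL'
set linesize 200 pagesize 100
select h.file#,
       h.fuzzy,
       h.checkpoint_change#                        as header_scn,
       d.checkpoint_change#                        as database_scn,
       d.checkpoint_change# - h.checkpoint_change# as scn_gap
from   v$datafile_header h cross join v$database d
order  by h.file#;
SQL
}

# Possible usage on the database host (not executed here):
#   scn_check_sql | sqlplus -s "/ as sysdba"
```

A large scn_gap on some files is the kind of situation in which the event above may be needed.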


Monday, October 9, 2023

CLSRSC-318: Failed to start Oracle OHASD service. Died at crsinstall.pm line 3114. Oracle Linux 9 (OL9)

Caught this error during the upgrade of Oracle Restart (SIHA) from 19.19 to 21.11. Here is the log :

Performing root user operation.

The following environment variables are set as:
   ORACLE_OWNER= grid
   ORACLE_HOME=  /u01/siha_2111

Enter the full pathname of the local bin directory: [/usr/local/bin]: The contents of "dbhome" have not changed. No need to overwrite.
The file "oraenv" already exists in /usr/local/bin.  Overwrite it? (y/n)  
[n]:    Copying oraenv to /usr/local/bin ...
The file "coraenv" already exists in /usr/local/bin.  Overwrite it? (y/n)  
[n]:    Copying coraenv to /usr/local/bin ...

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/siha_2111/crs/install/crsconfig_params
The log of current session can be found at:
 /u01/app/grid/crsdata/db-04/crsconfig/roothas_2023-10-09_11-36-24AM.log
2023/10/09 11:36:25 CLSRSC-595: Executing upgrade step 1 of 12: 'UpgPrechecks'.
2023/10/09 11:36:29 CLSRSC-595: Executing upgrade step 2 of 12: 'GetOldConfig'.
2023/10/09 11:36:31 CLSRSC-595: Executing upgrade step 3 of 12: 'GenSiteGUIDs'.
2023/10/09 11:36:31 CLSRSC-595: Executing upgrade step 4 of 12: 'SetupOSD'.
2023/10/09 11:36:31 CLSRSC-595: Executing upgrade step 5 of 12: 'PreUpgrade'.
2023/10/09 11:37:30 CLSRSC-595: Executing upgrade step 6 of 12: 'UpgradeAFD'.
2023/10/09 11:37:31 CLSRSC-595: Executing upgrade step 7 of 12: 'UpgradeOLR'.
clscfg: EXISTING configuration version 0 detected.
Creating OCR keys for user 'grid', privgrp 'oinstall'..
Operation successful.
2023/10/09 11:37:34 CLSRSC-595: Executing upgrade step 8 of 12: 'UpgradeOCR'.
LOCAL ONLY MODE  
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
CRS-4664: Node db-04 successfully pinned.
2023/10/09 11:37:36 CLSRSC-595: Executing upgrade step 9 of 12: 'CreateOHASD'.
2023/10/09 11:37:37 CLSRSC-595: Executing upgrade step 10 of 12: 'ConfigOHASD'.
2023/10/09 11:37:37 CLSRSC-329: Replacing Clusterware entries in file 'oracle-ohasd.service'
2023/10/09 11:39:56 CLSRSC-214: Failed to start the resource 'ohasd' 

Died at /u01/siha_2111/crs/install/crsinstall.pm line 3114.

/var/log/messages contained lots of the following :

db-04 clsecho: /etc/init.d/init.ohasd: Waiting for ohasd.bin PID 12851 to move. CGROUP

The cause is Linux control groups: cgroups v2 is the default in OL9. The solution is to revert to the previous state (if possible), enable legacy (v1) cgroups on the kernel command line, and rerun the upgrade. To keep the setting across reboots, add systemd.unified_cgroup_hierarchy=0 systemd.legacy_systemd_cgroup_controller to GRUB_CMDLINE_LINUX in /etc/default/grub and regenerate the grub2 configuration.

# cat /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="rhgb numa=off transparent_hugepage=never crashkernel=1G-64G:448M,64G-:512M systemd.unified_cgroup_hierarchy=0 systemd.legacy_systemd_cgroup_controller"
GRUB_DISABLE_RECOVERY="true"
GRUB_ENABLE_BLSCFG=true

# grub2-mkconfig -o /boot/grub2/grub.cfg
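
After the reboot it is worth confirming that the host really came up on cgroups v1 before rerunning the upgrade. Here is a minimal sketch, assuming the usual convention that `stat -fc %T /sys/fs/cgroup` reports cgroup2fs on a v2 (unified) host and tmpfs on a v1 host. Both function names are made up for illustration.

```shell
#!/bin/sh
# Map the filesystem type of /sys/fs/cgroup to a cgroup version.
cgroup_version() {
  case "$1" in
    cgroup2fs) echo v2 ;;
    tmpfs)     echo v1 ;;
    *)         echo unknown ;;
  esac
}

# Succeed if the given kernel command line disables the unified hierarchy.
has_legacy_cgroup_flag() {
  case " $1 " in
    *" systemd.unified_cgroup_hierarchy=0 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Read-only checks to run on the host after the reboot:
#   cgroup_version "$(stat -fc %T /sys/fs/cgroup)"
#   has_legacy_cgroup_flag "$(cat /proc/cmdline)" && echo "flag present"
```

If the flag is missing from /proc/cmdline, the boot entry was probably not updated and the upgrade would likely fail in the same way.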

That's it !


Friday, October 6, 2023

Error in invoking target 'irman ioracle' when installing Oracle 19c SIHA on Oracle Linux 9 (OL9)

Although Oracle Corp. hasn't yet certified Oracle 19c software on OL9, I decided to try (at least) installing Oracle Restart, a.k.a. "Single Instance High Availability (SIHA)".

The expected message about the unsupported OS was bypassed by setting CV_ASSUME_DISTID in the shell before invoking ./gridSetup.sh:

$ export CV_ASSUME_DISTID=OL8.8

Then, after getting the error from the title, I simply scp'ed /usr/lib64/libc_nonshared.a over from a server running OL8 and retried the linking. The installation went on pretty well, with one exception at the end, when Oracle CVU threw this error:

INFO:  [Oct 6, 2023 3:08:33 PM] RPM Package Manager database ...FAILED (PRVG-13702)
INFO:  [Oct 6, 2023 3:08:33 PM] Post-check for Oracle Restart configuration was unsuccessful.  
INFO:  [Oct 6, 2023 3:08:33 PM] Failures were encountered during execution of CVU verification request "stage -post hacfg".
INFO:  [Oct 6, 2023 3:08:33 PM] RPM Package Manager database ...FAILED
INFO:  [Oct 6, 2023 3:08:33 PM] PRVG-13702 : RPM Package Manager database files are corrupt on nodes "aaa".

The culprits were rpm packages already on the system whose signatures use the obsolete SHA-1 hash algorithm. It is worth mentioning that this server had been upgraded gradually from OL6 to OL9 over its lifetime, so some rpms were still signed with SHA-1, which is no longer supported in OL9. The solution was to temporarily enable support for the old (unsupported) signatures:

# update-crypto-policies --set LEGACY

and to check:

# update-crypto-policies --show
LEGACY

The next CVU run finally succeeded, and the installation (including the root scripts for the upgrade part) finished without any errors.

Afterwards I ran:

# update-crypto-policies --set DEFAULT

to revert the change.
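
To find the offending packages up front, the signature of each installed rpm can be listed and filtered. Here is a minimal sketch, assuming that `rpm -qa --qf '%{NVRA} %{SIGPGP:pgpsig}\n'` prints one line per package with the algorithm pair (for example RSA/SHA1,) as the second field. The tag that actually holds the signature may differ between packages, and the function name is made up for illustration.

```shell
#!/bin/sh
# Read "package signature-description" lines on stdin and print the package
# name (first field) whenever the second field ends in a SHA1 digest.
filter_sha1_signed() {
  awk '$2 ~ /\/SHA1,?$/ { print $1 }'
}

# Possible usage on the host (read-only query, not executed here):
#   rpm -qa --qf '%{NVRA} %{SIGPGP:pgpsig}\n' | filter_sha1_signed
```

Reinstalling or removing the listed packages might make the LEGACY detour unnecessary, but I haven't tried that here.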