Friday, April 25, 2025

Exadata cellcli throws 'OutOfMemoryError' exception

After installing the QFSDP for Exadata, I found that I couldn't run any cellcli command in interactive mode on one particular storage cell. The '-n' switch (non-interactive mode) of cellcli worked fine though. 

The culprit was a cellcli history file :

/root/.jline3-oracle.ossmgmt.ms.cli.CellCLI.history 

which contained almost 1M lines. As a result, the CPU of the storage server was overloaded by Java trying to build the command history array in memory, and after a couple of minutes Java threw an exception like this :

CellCLI> help
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOfRange(Arrays.java:3664)
        at java.lang.String.<init>(String.java:207)
        at java.lang.String.substring(String.java:1933)
        at org.jline.reader.impl.history.DefaultHistory.lambda$trimHistory$2(DefaultHistory.java:255)
        at org.jline.reader.impl.history.DefaultHistory$$Lambda$187/1144648478.accept(Unknown Source)
        at java.util.Iterator.forEachRemaining(Iterator.java:116)
        at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
        at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
        at org.jline.reader.impl.history.DefaultHistory.trimHistory(DefaultHistory.java:252)
        at org.jline.reader.impl.history.DefaultHistory.internalWrite(DefaultHistory.java:241)
        at org.jline.reader.impl.history.DefaultHistory.save(DefaultHistory.java:219)
        at org.jline.reader.impl.history.DefaultHistory.add(DefaultHistory.java:384)
        at oracle.ossmgmt.common.util.MyLineReader$MyHistory.add(MyLineReader.java:329)
        at org.jline.reader.impl.LineReaderImpl.finish(LineReaderImpl.java:1140)
        at org.jline.reader.impl.LineReaderImpl.finishBuffer(LineReaderImpl.java:1109)
        at org.jline.reader.impl.LineReaderImpl.readLine(LineReaderImpl.java:689)
        at org.jline.reader.impl.LineReaderImpl.readLine(LineReaderImpl.java:468)
        at oracle.ossmgmt.common.util.MyLineReader.readLine(MyLineReader.java:197)
        at oracle.ossmgmt.common.util.MyLineReader.readLineCliImpl(MyLineReader.java:173)
        at oracle.ossmgmt.ms.cli.CellCLI.readAndProcessLine(CellCLI.java:695)
        at oracle.ossmgmt.common.cli.CLIImpl.processCommands(CLIImpl.java:595)
        at oracle.ossmgmt.ms.cli.CellCLI.processCommands(CellCLI.java:639)
        at oracle.ossmgmt.ms.cli.CellCLI.main(CellCLI.java:355)

The solution is to rename (or simply move) the cellcli history file.
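
For example (the new name is arbitrary; cellcli should simply start a fresh history file on the next run) :

# mv /root/.jline3-oracle.ossmgmt.ms.cli.CellCLI.history /root/.jline3-oracle.ossmgmt.ms.cli.CellCLI.history.bak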

Wednesday, January 29, 2025

ssh: Bad server host key: Invalid key length

Add the following option to the ssh command :

-o RequiredRSASize=1024
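
For example (the user and host names below are just placeholders) :

$ ssh -o RequiredRSASize=1024 oracle@legacy-host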

Monday, June 24, 2024

How to export and import a large table with BasicFiles LOB(s) quicker

1. Divide the source table into chunks during the export. The number of chunks depends on the capabilities of the hardware (you might want to start with 10). This gives you the opportunity to run number_of_chunks parallel processes during the import.

Below is an example export script (it splits the rows by the remainder of dividing the ROWID block number by the fixed number of chunks) :

#!/bin/bash

case $1 in
start)

 chunks=$2 ; [ -z "$chunks" ] && exit 1

 
 for i in $(eval echo {00..$((chunks-1))}) ; do   # mod() returns values 0..chunks-1, so loop over exactly that many chunk numbers
   expdp user_id/pass job_name=expdp_table_name_${i} tables=owner.table_name query=table_name:\"where mod\(dbms_rowid.rowid_block_number\(rowid\), ${chunks}\) = ${i}\" directory=directory_for_dumpfile dumpfile=table_name_chunk_${i}.dmp logfile=directory_for_logfile:expdp_table_name_chunk_${i}.log &
   echo $i
 done

;;
esac
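
Assuming the script is saved as, say, expdp_chunks.sh (the file name is arbitrary), it can be started like this to produce 10 dump files in parallel :

$ ./expdp_chunks.sh start 10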


2. Transfer the dump files to the recipient site and run the import.


#!/bin/bash

case $1 in
start)

 # enter the same number of chunks as was used during the export
 chunks=$2 ; [ -z "$chunks" ] && exit 1

 for i in $(eval echo {00..$((chunks-1))}) ; do   # same chunk numbers as during the export
   impdp user_id/pass job_name=impdp_table_name_${i} directory=directory_for_dumpfile dumpfile=table_name_chunk_${i}.dmp logfile=directory_for_logfile:impdp_table_name_chunk_${i}.log \
     remap_table=table_name:table_name_temp remap_schema=old_schema:new_schema content=data_only data_options=disable_append_hint &
   echo $i
 done

;;
esac
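
Again, assuming the script is saved as impdp_chunks.sh (hypothetical name), run it with the same chunk count as the export :

$ ./impdp_chunks.sh start 10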

The content= and data_options= parameters, together with running all of the impdp processes in the background, do the whole job.

P.S. You don't have to perform the first step at all; it's possible to use a direct import over a database link (the network_link parameter of impdp).
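
A minimal sketch of that variant, assuming a database link named source_db_link already exists on the recipient side (the link name and log file name are placeholders) :

impdp user_id/pass network_link=source_db_link tables=owner.table_name remap_table=table_name:table_name_temp remap_schema=old_schema:new_schema content=data_only data_options=disable_append_hint logfile=directory_for_logfile:impdp_table_name_nl.log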



Wednesday, January 3, 2024

How to perform out-of-place patching of Oracle Restart (SIHA) 19c

1. Create an empty directory for the future Oracle Restart home and unpack the base grid installation, along with the required OPatch, into it.

# mkdir -p /u01/app/19.21/grid

# chown grid:oinstall /u01/app/19.21/grid

# su - grid

$ cd /u01/app/19.21/grid

$ unzip -q /u01/app/oracle/install/19/SOLARIS.SPARC64_193000_grid_home.zip -d .

$ unzip -q /u01/app/oracle/install/19/p6880880_210000_SOLARIS64.zip -d .

2. Run the installation in 'software only' mode. It's possible to perform a silent installation using the response file from the previous installation; in this case set oracle.install.option=HA_SWONLY:

$ ./gridSetup.sh -silent -responseFile /u01/app/19.21/grid/gi_install.rsp -applyRU /u01/app/oracle/install/19/1921/35742441/35642822

Follow all the post-install steps (root.sh).
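
In practice this typically means running root.sh from the new home as root when the installer asks for it :

# /u01/app/19.21/grid/root.sh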

3. Run roothas.sh with the -prepatch option.

If it complains about a missing clsecho file, copy the file from the previous Oracle Restart installation and edit the ORACLE_HOME variable at the beginning of it.

# /u01/app/19.21/grid/crs/install/roothas.sh -verbose -prepatch

At this step the whole Oracle Restart stack is stopped. 

4. Run roothas.sh with the -postpatch option.

# /u01/app/19.21/grid/crs/install/roothas.sh -verbose -postpatch -dstcrshome /u01/app/19.21/grid

At this step the whole Oracle Restart stack is started from the new Oracle Restart home.

5. Set CRS=TRUE for the new ORACLE_HOME and CRS=FALSE for the previous one.

$ /u01/app/19.19/grid/oui/bin/runInstaller -updateNodeList ORACLE_HOME=/u01/app/19.19/grid CRS=FALSE
 

$ /u01/app/19.21/grid/oui/bin/runInstaller -updateNodeList ORACLE_HOME=/u01/app/19.21/grid CRS=TRUE

$ cat /u01/app/oraInventory/ContentsXML/inventory.xml
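
If everything went well, only the new home carries the CRS="true" attribute. The relevant HOME entries might look roughly like this (the home names and index numbers here are illustrative) :

<HOME NAME="OraGI19Home1" LOC="/u01/app/19.19/grid" TYPE="O" IDX="1"/>
<HOME NAME="OraGI19Home2" LOC="/u01/app/19.21/grid" TYPE="O" IDX="2" CRS="true"/>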

P.S. Check the owner of the new ORACLE_HOME directory (it must be root). If it isn't, run roothas.sh with the -lock option.

# /u01/app/19.21/grid/crs/install/roothas.sh -verbose -lock

Thursday, October 26, 2023

RMAN-06054: media recovery requesting unknown archived log

I wish every DBA took proper care of their backups. If they did, there would be far less damage to people's nervous systems 😁 (easier said than done). 

Nevertheless, there was a 100GB gap of lost (?) archived redo log files, along with a backup that was 3 weeks old. I had to restore and open that Oracle database at any cost. Here is what I did :

1) I restored the database (controlfile, spfile, datafiles etc.) from the level 0 backup and recovered it as far as possible using the level 1 backups. Needless to say, at that point the database was inconsistent (to put it mildly);

2) I created a pfile, set the following parameters in it, and brought the database back to the mount state (right before opening it with the resetlogs option) :

"_allow_resetlogs_corruption"    = TRUE
"_allow_error_simulation"        = true
undo_management                  = 'MANUAL'

3) alter database open resetlogs ;

In the end the database opened successfully (unexpectedly 😀); I created a new undo tablespace and extracted the data I needed.

That's it ! Keep an eye on your backups ! 

PS1. You may need to recreate the controlfile during the recovery. A SQL dump of it taken right after the restoration may be useful, particularly when you need only a part of the database, not the whole of it.
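
One way to get such a dump while the database is mounted (the target file name is arbitrary) :

SQL> alter database backup controlfile to trace as '/tmp/create_controlfile.sql';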

PS2. The gap between SCNs in the restored datafiles can be huge, and errors like ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], [] are possible. To overcome this, you may need to set the following parameter before opening the database with the resetlogs option :

# SCN surge, used instead of _minimum_giga_scn (in past versions). Level 1 increases
# the SCN by 1 mln, level 3 by 3 mln. Look at the checkpoint SCNs in v$datafile_header
# and compare them with checkpoint_change# from v$database.

event="21307096 trace name context forever, level 2048"


Monday, October 9, 2023

CLSRSC-318: Failed to start Oracle OHASD service. Died at crsinstall.pm line 3114. Oracle Linux 9 (OL9)

Caught this error during the upgrade of Oracle Restart (SIHA) from 19.19 to 21.11. Here is the log :

Performing root user operation.

The following environment variables are set as:
   ORACLE_OWNER= grid
   ORACLE_HOME=  /u01/siha_2111

Enter the full pathname of the local bin directory: [/usr/local/bin]: The contents of "dbhome" have not changed. No need to overwrite.
The file "oraenv" already exists in /usr/local/bin.  Overwrite it? (y/n)  
[n]:    Copying oraenv to /usr/local/bin ...
The file "coraenv" already exists in /usr/local/bin.  Overwrite it? (y/n)  
[n]:    Copying coraenv to /usr/local/bin ...

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/siha_2111/crs/install/crsconfig_params
The log of current session can be found at:
 /u01/app/grid/crsdata/db-04/crsconfig/roothas_2023-10-09_11-36-24AM.log
2023/10/09 11:36:25 CLSRSC-595: Executing upgrade step 1 of 12: 'UpgPrechecks'.
2023/10/09 11:36:29 CLSRSC-595: Executing upgrade step 2 of 12: 'GetOldConfig'.
2023/10/09 11:36:31 CLSRSC-595: Executing upgrade step 3 of 12: 'GenSiteGUIDs'.
2023/10/09 11:36:31 CLSRSC-595: Executing upgrade step 4 of 12: 'SetupOSD'.
2023/10/09 11:36:31 CLSRSC-595: Executing upgrade step 5 of 12: 'PreUpgrade'.
2023/10/09 11:37:30 CLSRSC-595: Executing upgrade step 6 of 12: 'UpgradeAFD'.
2023/10/09 11:37:31 CLSRSC-595: Executing upgrade step 7 of 12: 'UpgradeOLR'.
clscfg: EXISTING configuration version 0 detected.
Creating OCR keys for user 'grid', privgrp 'oinstall'..
Operation successful.
2023/10/09 11:37:34 CLSRSC-595: Executing upgrade step 8 of 12: 'UpgradeOCR'.
LOCAL ONLY MODE  
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
CRS-4664: Node db-04 successfully pinned.
2023/10/09 11:37:36 CLSRSC-595: Executing upgrade step 9 of 12: 'CreateOHASD'.
2023/10/09 11:37:37 CLSRSC-595: Executing upgrade step 10 of 12: 'ConfigOHASD'.
2023/10/09 11:37:37 CLSRSC-329: Replacing Clusterware entries in file 'oracle-ohasd.service'
2023/10/09 11:39:56 CLSRSC-214: Failed to start the resource 'ohasd' 

Died at /u01/siha_2111/crs/install/crsinstall.pm line 3114.

/var/log/messages contained lots of the following :

db-04 clsecho: /etc/init.d/init.ohasd: Waiting for ohasd.bin PID 12851 to move. CGROUP

The cause is Linux control groups (cgroups v2, which is the default in OL9). The solution is to revert to the previous state (if possible), enable legacy (v1) cgroups on the kernel command line, and rerun the upgrade. Add systemd.unified_cgroup_hierarchy=0 systemd.legacy_systemd_cgroup_controller to the GRUB_CMDLINE_LINUX line in the /etc/default/grub file and regenerate the grub2 menu if you'd like the setting to persist after a reboot.

# cat /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="rhgb numa=off transparent_hugepage=never crashkernel=1G-64G:448M,64G-:512M systemd.unified_cgroup_hierarchy=0 systemd.legacy_systemd_cgroup_controller"
GRUB_DISABLE_RECOVERY="true"
GRUB_ENABLE_BLSCFG=true

# grub2-mkconfig -o /boot/grub2/grub.cfg
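
After the reboot you can check which cgroup hierarchy is active by looking at the filesystem type mounted on /sys/fs/cgroup (a generic Linux check, not Oracle-specific): cgroup2fs means v2 is still in use, tmpfs means the legacy v1 layout is back :

# stat -fc %T /sys/fs/cgroup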

That's it !