Blog about Oracle RDBMS on Linux/Solaris and around it: 2021

Monday, December 20, 2021

Recover Oracle 12.1 physical standby database until time - another "method"

For newer versions of Oracle database (more then 12.1) it's easy to use 'alter database recover ... until time' to rollup physical standby database a little bit. For 12.1, the standby database manual recover using the same command was possible using only 'until change' clause.

However, using 'recover' command of sqlplus, it's possible to rollup the standby database until time, take a look :

$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Mon Dec 20 14:14:34 2021

Copyright (c) 1982, 2014, Oracle. All rights reserved.

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Advanced Analytics
and Real Application Testing options

> recover automatic standby database until time '2021-12-19:08:00:00'
Media recovery complete.

Good Luck !

Timeout parameters from Oracle Net - extraction from docs

sqlnet.ora	tnsnames.ora	Def. value	server side	Description
SQLNET.EXPIRE_TIMEOUT			YES	TCP keepalive setting (in minutes)
SQLNET.INBOUND_CONNECT_TIMEOUT		60 s	YES	To specify the time, in seconds, for a client to connect with the database server and provide the necessary authentication information. INBOUND_CONNECT_TIMEOUT_lsnr_name from listener.ora should be less than parameter from sqlnet.ora
SQLNET.OUTBOUND_CONNECT_TIMEOUT	CONNECT_TIMEOUT (overrides parameter SQLNET.OUTBOUND_CONNECT_TIMEOUT from sqlnet.ora)			Superset of the TCP connect timeout interval. Without this parameter, a client connection request to the database server may block for the default TCP connect timeout duration (60 seconds) when the database server host system is unreachable.
TCP.CONNECT_TIMEOUT	TRANSPORT_CONNECT_TIMEOUT (overrides TCP.CONNECT_TIMEOUT from sqlnet.ora)	60 s		To specify the time, in seconds, for a client (per address) to establish a TCP connection
SQLNET.SEND_TIMEOUT				Specify the time, in seconds, for a database server to complete a send operation to clients, or for clients to complete a send operation after connection establishment.
SQLNET.RECV_TIMEOUT				Specify the time, in seconds, for a database server to wait for client data after connection establishment. A client must send some data within the specified time interval.

There are several parameter In the upper table which defines the behavior of server (client) processes regarding when to cut off a client connection during establishing link between client and server processes.

Generally the connection establishment goes through the following 3 phases (when TCP protocol is used) and time of connection establishment is consist of sum of times : initial connection establishment using TCP transport, time spent on authentication operations and time spent on establishing the connection with requested service from database.

It worse to notice that these parameters and dependencies between them works when a real Oracle client is trying to establish the connection, not just, let's say, a tnsping utility, which is trying to reach out the listener on the other side of connection (and SQLNET.OUTBOUND_CONNECT_TIMEOUT will not be relevant then).

In so called transport phase data are sent over the network using TCP transport. Here TCP.CONNECT_TIMEOUT (TRANSPORT_CONNECT_TIMEOUT) parameters define the timeout of connecting to every trying address from connection description. If not set then 60 seconds its default value. You may get the error TNS-12543 TNS:destination host unreachable sooner then 60 seconds, when network and routing configuration allows to figure out (using broadcast or multicast packages) that requested address is not available in existing network. You get an answer TNS-12535: TNS:operation timed out after 60 seconds when, for example, there is no allowed direct network access to requested listener port and broadcast packages are forbidden between networks.

After that the authentication services are entering into the game and parameters
SQLNET.OUTBOUND_CONNECT_TIMEOUT (CONNECT_TIMEOUT from tnsnames.ora) on the client side and
SQLNET.INBOUND_CONNECT_TIMEOUT on the server side define when to close the connection with client. INBOUND_CONNECT_TIMEOUT defines the time during which the client must provide all the information necessary for authentication and it's solely server parameter.
OUTBOUND_CONNECT_TIMEOUT is the client parameter which specifies the time in seconds during which the client must establish the connection with database.

Next, after successful connection establishing, SQLNET.SEND_TIMEOUT and SQLNET.RECV_TIMEOUT are taking into account if they're set. They oblige the server or client process to send data during specified time (in seconds). If they're not, the connection is disconnected. Oracle do not recommend to set SQLNET.RECV_TIMEOUT on the server side.

As regarding the connection failover, it's another story :)

Good Luck !

Saturday, December 18, 2021

(ur=a) meaning in the connection description

Hi, in this small so-called post I would like to thank Christian Gohmann from the Oracle Community for the answer :

UR=A stands for "User restricted = accepted". This means, that user are still able to connect to a database in restricted mode. Without this setting you will encounter the following error.

ORA-12526: TNS:listener: all appropriate instances are in restricted mode

Of course your user needs the RESTRICTED SESSION privilege.

Thank you Ghristian for saving my time !

Monday, December 13, 2021

ORA-01017: invalid username/password; logon denied - another silly case

It took some time to resolve it, although the cause was actually simple and silly. It occurred only on 12.1 CDB.

Suddenly I was unable to logon as a common user locally into the root container without specifying the password in the command line. It was a test system, on production it worked well though. The connection over the network was OK :

> sqlplus c##ddi

SQL*Plus: Release 12.1.0.2.0 Production on Mon Dec 13 17:42:16 2021

Copyright (c) 1982, 2014, Oracle. All rights reserved.

Enter password:
ERROR:
ORA-01017: invalid username/password; logon denied

Enter user-name: c##ddi
Enter password:
ERROR:
ORA-01017: invalid username/password; logon denied

Enter user-name: ^C
> sqlplus c##ddi@\"ip_address/cdbm02_dev\"

SQL*Plus: Release 12.1.0.2.0 Production on Mon Dec 13 17:41:28 2021

Copyright (c) 1982, 2014, Oracle. All rights reserved.

Enter password:
Last Successful login time: Mon Dec 13 2021 17:22:17 +03:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Advanced Analytics
and Real Application Testing options

C##DDI@ip_address/cdbm02_dev@CDB$ROOT hostname 13.12.21 17:41:31 > Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Advanced Analytics
and Real Application Testing options
>

The solution was to add a slash (/) after an username (:-D) :

> sqlplus c##ddi/

SQL*Plus: Release 12.1.0.2.0 Production on Mon Dec 13 17:44:30 2021

Copyright (c) 1982, 2014, Oracle. All rights reserved.

Enter password:
Last Successful login time: Mon Dec 13 2021 17:41:30 +03:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Advanced Analytics
and Real Application Testing options

C##DDI@cdbm02_dev@CDB$ROOT hostname 13.12.21 17:44:37 > Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Advanced Analytics
and Real Application Testing options

But the real cause of this was operating system PATH, which led to the place of installed Oracle Instant Client, version 18.3. Starting sqlplus from ORACLE_HOME resolved the issue :

> $ORACLE_HOME/bin/sqlplus c##ddi

SQL*Plus: Release 12.1.0.2.0 Production on Mon Dec 13 17:48:31 2021

Copyright (c) 1982, 2014, Oracle. All rights reserved.

Enter password:
Last Successful login time: Mon Dec 13 2021 17:44:37 +03:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Advanced Analytics
and Real Application Testing options

C##DDI@cdbm02_dev@CDB$ROOT hostname 13.12.21 17:48:37 >

Good Luck !

Monday, December 6, 2021

PERCENT_SPACE_RECLAIMABLE is 0 for ARCHIVED LOG in v$flash_recovery_area_usage

Recently, after creating new physical standby from duplicated database I found that PERCENT_SPACE_RECLAIMABLE on standby is always 0, while on primary it is equal to PERCENT_SPACE_USED. Archivelog deletion policy was the same on both databases :

CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON ALL STANDBY;

So in fresh, newly created physical standby I would be in trouble when recovery area free space will come to the end.

The solution was to open that standby database in 'read only' mode, after that PERCENT_SPACE_RECLAIMABLE had been refreshed to right value. After bouncing the standby into MOUNT state and after running it in normal mode the free space in the flash recovery area was shown correctly.

Good Luck !

Tuesday, November 30, 2021

ORA-10458 when opening Oracle physical standby read only after duplicating (restoring from another physical standby)

You might get the following errors stack when trying to open physical standby database read only. In my case it was duplicated (restored from service from different active physical standby), and I got the errors :

SQL> alter database open read only
*
ERROR at line 1:
ORA-10458: standby database requires recovery
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '+DATAC8/BARS12_TST/DATAFILE/system.266.1089980259'

Actually, all files was consistent, all checkpoints was equal etc., but some datafiles was fuzzy (fhsta=64) :

SQL> select count(*) ,fhsta from x$kcvfh group by fhsta;
COUNT(*)| FHSTA
===============|===============

            28|              0
           103|             64
             1|           8256

What did not help :

- register/catalog next not applied archived log and run 'recover database' from rman ;

- register/catalog next not applied archived log and run 'recover standby database' from sqlplus.

What really helped :

- register next not applied archived log and run 'alter database recover managed standby database disconnect' from sqlplus (equals to run 'recover managed standby database disconnect' sqlplus command); next wait while last archived log is implemented, then cancel media recovery and finally open standby database read only without issues.

Good Luck !

Monday, November 15, 2021

ERROR: The home is not clean. This home cannot be used since there was a failed OPatch execution in this home. Use a different home to proceed.

Everyone knows how to install the software for Oracle 19c database or grid infrastructure - Oracle ships preinstalled OH as a base release and a customer is performing runInstaller which is actually applying appropriate Release Update and one-off patches first and then linking libraries, executables and perform another stuff.

First you need to unpack the base release. Then, after running ./runInstaller script, something can potentially go wrong during applying the patches, for example, you might forget to renew opatch beforehand. To resolve the issue consult opatch logfiles located in ORACLE_HOME/cfgtoollogs directory. If the error is correctable, you fix it and rerun the installation, but you may get the following error :

ERROR: The home is not clean. This home cannot be used since there was a failed OPatch execution in this home. Use a different home to proceed.

You can go on by removing the new OH first, unzip a base home and rerun theinstallation, and it's actually the way the Oracle advises to follow on. In my case the opatch was older then required to install RU; it was not so much convenient to unpack or to copy over the network the installation files again.

So running truss (it was Oracle Solaris) I figured out the empty file is created when opatch starting its job during installation and this file is removed after successful all opatch operations. The file is :

$ORACLE_HOME/install/patch

I just removed it and rerun installation without unzipping base home again. All patches was applied successfully and the software was configured OK as well. That's it ! :)

P.S. You might with to clone ORACLE_HOME to avoid long installation and applying patches time.

Good Luck !

Tuesday, November 9, 2021

Another way to create VIP resource in Grid Infrastructure

1. Create additional type based on 'cluster_resource' type (starting with cluster)

# crsctl add type app.appviptypex2.type -basetype ora.cluster_resource.type -file /u01/app/19.13/grid/crs/template/appvipx.type

2. Create VIP resource :

# crsctl add resource vip-resource-name -type app.appviptypex2.type -attr "USR_ORA_VIP=vip_ip_address,NETWORK_RESOURCE=,START_DEPENDENCIES=hard(ora.net1.network) pullup(ora.net1.network),STOP_DEPENDENCIES=hard(intermediate:ora.net1.network),ACL='owner:root:rwx,pgrp:root:r-x,other::r--,user:root:r-x',APPSVIP_FAILBACK="

Carefully check newly created resource parameters and change it if needed :

# crsctl stat res vip-resource-name -p

3. Use it ! :-)

Monday, November 8, 2021

Enabling automatic mounting of nfs shares on Oracle Solaris

1. Edit /etc/vfstab. Example :

nfs_server:/nfs_share - /mnt/nfs_share_mount_point nfs - yes rw,bg,hard,nointr,rsize=1048576,wsize=1048576,vers=3,proto=tcp,forcedirectio,nocto

Here 'yes' means to mount during operating system start.

2. Enable Solaris nfs services (if disabled, check with 'svcs' command) :

# svcadm enable svc:/network/nfs/client:default svc:/network/nfs/status:default svc:/network/nfs/nlockmgr:default

3. Check it during reboot (if possible). Diagnose failed services with 'svcs | grep -v online' and 'svcs -x' commands.

Good Luck !

Thursday, November 4, 2021

ORA-15040 when duplicating or creating database on Exadata (db_unique_name is the same)

I got stuck into the situation when I needed to move the database between RACs. I decided to do it via RMAN : duplicate first and then remove database on the source. After some preparations, when I started to check how the restoration over the network works, I got the error :

Recovery Manager: Release 12.1.0.2.0 - Production on Thu Nov 4 17:03:05 2021

Copyright (c) 1982, 2014, Oracle and/or its affiliates. All rights reserved.

connected to target database: CDBM02 (not mounted)

RMAN> restore controlfile from service "hostname/service_dev" ;

Starting restore at 04.11.2021 17:03:30
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=292 device type=DISK

channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: using network backup set from service hostname/service_dev
channel ORA_DISK_1: restoring control file
dbms_backup_restore.restoreCancel() failed
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 11/04/2021 17:03:44
ORA-19660: some files in the backup set could not be verified
ORA-19661: datafile 0 could not be verified
ORA-19849: error while reading backup piece from service hostname/service_dev
ORA-19504: failed to create file "+RECOC9"
ORA-17502: ksfdcre:4 Failed to create file +RECOC9
ORA-15001: diskgroup "RECOC9" does not exist or is not mounted
ORA-15040: diskgroup is incomplete

RMAN>

I started to check compatibility attributes for asm diskgroup (it was set to compatible.rdbms=12.1 and compatible.asm=19.0), permissions of $ORACLE_HOME/bin/oracle binary, alert logs (ASM and database), trace files etc. The trace files shown me the following for the every exadata grid disk :

WARNING: disk locally closed resulting in I/O error [0xf4]

I started to suspect that the cause was inside the storage cells. So consulting with alert log files on the cells gave me the clue :

Warning: Two databases from different clusters have the same DB_UNIQUE_NAME parameter value of "SERVICE_DEV".
Rejecting open request from host ...

Changing db_unique_name parameter on the source resolved the issue.

P.S. Later I found MOS Note "Exadata: Warning Cellsrv Rejecting Open Request from Host Due to DB Name Conflict (Doc ID 2060753.1)" describing the same issue and giving some additional advice regarding it.

Monday, October 25, 2021

"wait for unread message on broadcast channel" event and DG broker configuration

I had an issue with Data Guard the other day.

The configuration was in maximum availability mode. The network failure between primary and standby database caused the failure in transmitting the changes from primary with a lot of timeouts.

The restoration of communication didn't change anything. There were no abnormally not-yet-opened-completely or not-yet-closed-completely network sockets from the perspective of operating systems on the servers. Any try from Data Guard Broker to view the status of physical standby or to change its property (logxptmode, for example) lead to unresponsive broker interface and event 'wait for unread message on broadcast channel' on the server where that change was attempted to apply. The 'disable configuration' or 'disable database' commands didn't work as well with the same result.

The options were to continue to work resolving gaps manually OR to try to do something with broker.

Fortunately, simply re-enabling the broker configuration at primary at first and then on physical standby helped to resolve the issue. Suppose the simple killing of DG Broker processes like NSV0 and INSV could help also, but I didn't try it. The cause was in hanged DG broker processes on all servers involved in configuration.

Good Luck !

Thursday, October 7, 2021

The adventure to enable back root ssh login on Solaris

There are some situations where root login must be enabled (permitted) over ssh. Sometimes it is a tricky task ))

First of all I enabled the

PermitRootLogin yes

in /etc/ssh/sshd_config.

Then I got an error

Received disconnect from server_ip port xx:2: Too many authentication failures

To overcome this, I've commented out the line

#MaxAuthTries 3

It equals 6 by default.

But it wasn't the last step. The ssh daemon continue to ask the root password after entering it after the first attempt. To see what happened I ran ssh daemon in the console :

# svcadm disable ssh

# /usr/lib/ssh/sshd -D -d -f /etc/ssh/sshd_config

Around a lot of debug messages I saw the message

PAM: User account has expired for root from client_ip

It was strange. The root account looked fine, not expired etc. The rambling over the server led me to /etc/user_attr file where root was presented like :

root::::type=role

According to man pages :

          type

              Can be assigned one of these strings: normal, indicating that
              this account is for a normal user, one who logs in; or role,
              indicating that this account is for a role. Roles can only be
              assumed by a normal user after the user has logged in.

So I decided to change the type to normal:

# rolemod -K type=normal root

It was enough )). The root ssh login started to work after that.

Good Luck !

Wednesday, September 29, 2021

Solaris 11.4 and ASM and SAN - conceptual steps

1. Create necessary FC zoning infrastructure (create zones, plug transceivers etc.) if needed. Verify using your favorite method, for example :

# fcinfo hba-port

# luxadm -e dump_map /dev/cfg/c<number_of_attachment_point>

# cfgadm -al -o show_FCP_dev

2. Create virtual volumes and provide vluns (virtual luns) to the Solaris host.

The volumes must be of equal size for one disk group.

For data disk group I would prefer thin provisioning volumes, for fast recovery area and for redo - full provisioning volumes, but it is up to you.

The amount of volumes per disk group depends on the many factors - number of active controllers, number of CPUs inside controllers, number of paths etc.; I would recommend to follow the best practices of the storage vendor at least.

3. On Solaris 11.4, the multipath I/O software is enabled by stmsboot -e command. It requires reboot when multipath support has not being enabled. If multipath was enabled before, to refresh the number of disk devices do the following :

# devfsadm -c disk

Verify that disks were created in the system using

# echo | format

Collate information using virtual volume WWN on OS and on storage area.

You may have checked the file /etc/driver/drv/fp.conf for proper multipath functionality working as well.

4. Create label of newly created disks. If disk size is less then 2TB then Solaris creates SMI Label by default. I would like to suggest to use EFI label (gpt analog in Linux), specifically when you will enlarge diskgroup and the size of each underlying disk will be more then 2TB. To do so, use -e option for format :

# format -e

and choose appropriate label type :

format> label
[0] SMI Label
[1] EFI Label
Specify Label type[1]:

The new disk device (name ends with d0) will be appear in dsk/rdsk directory

5. Configure Solaris zones (if used and needed) to see new devices. For example, to see all the devices (SAN disks) inside the zone, configure the zone as creation of device with matching :

# zonecfg -z $(hostname)-cde info
...
brand: solaris
autoboot: true
...
device:
match: /dev/rdsk/*
#

6. Make permission on new devices to grid home user and asmadmin group.

7. Create/alter diskgroup using multipath generated paths of disks like '/dev/rdsk/....s0'

That's it :)

Wednesday, September 22, 2021

RMAN hangs with SQL*Net Break/reset To Client

Once I tried to recover the standby database over the network using RMAN, the operation started and lasted infinitely in the first file :

RMAN> recover database noredo from service "connection_name" ;

Starting recover at 22-SEP-21
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=10 device type=DISK
channel ORA_DISK_1: starting incremental datafile backup set restore
channel ORA_DISK_1: using network backup set from service dg-bris122-m8f
destination for restore of datafile 00001: +DATA_RCF_CDE/DB_SA_RCF/DATAFILE/system.265.1049997735

Quick analyze shown than rman session on the remote database was INACTIVE with even "SQL*Net Break/reset To Client". The Database Reference says about this event - "The server sends a break or reset message to the client. The session running on the server waits for a reply from the client.". It worth to mention that operating system was Oracle Solaris 11.4, and the both databases situated inside dedicated database zones with dedicated (non-shared) network interfaces, separated by the firewall though. It looked like the server (remote database) send to the client (rman) break message (out-of-bands break or OOB) and the client host didn't received it because of the firewall or the client host hadn't processed it (or unable to).

So I decided to disable OOB in the sqlnet.ora on the rman side via setting the following in sqlnet.ora.

disable_oob = on

It worked (fortunately or not) :)

Wednesday, September 8, 2021

What does awk script do ?

Sometimes it's difficult to understand what the awk program has done. For better understanding, it's useful to debug the program (-D option). But there is another brief way to produce the outline of what program did - to produce execution trace of the program with help of '--profile' option. Let's consider an example.

Suppose that we have some input file which actually can contain anything. In my case it was the file contained disassembled function. The fragment of input file is the following :

0x000000000cd2e32a <+442>:   test   %eax,%eax
  0x000000000cd2e32c <+444>:   je     0xcd2e33a <ksfd_io+458>
  0x000000000cd2e32e <+446>:   callq 0xcdc40f0 <sltrgftime64>
  0x000000000cd2e333 <+451>:   mov    %rax,-0x98(%rbp)
  0x000000000cd2e33a <+458>:   mov    -0x60(%rbp),%rax
  0x000000000cd2e33e <+462>:   mov    0x88(%rax),%r12d

I would like to display all the callq's between two calls kslwtbctx and kgecrs.

The short onliner can be like this :

gawk '!/callq/{next}/kslwtbctx/,/kgecrs/' input

The result is looking like this :

> awk '!/callq/{next}/kslwtbctx/,/kgecrs/' input
  0x000000000cd2e3ac <+572>:   callq 0xc9a3e00 <kslwtbctx>
  0x000000000cd2e3c7 <+599>:   callq 0xce573b0 <skgfrgsz>
  0x000000000cd2e3db <+619>:   callq 0xce418b0 <kghstack_alloc>
  0x000000000cd2e449 <+729>:   callq 0xcd32c80 <ksfd_osdrqfil>
  0x000000000cd2e4a2 <+818>:   callq 0xcd334f0 <ksfd_skgfqio>
  0x000000000cd2e4b4 <+836>:   callq 0xce42580 <kghstack_free>
  0x000000000cd2e4ce <+862>:   callq 0xce4e090 <kgecrs>

What does this short onliner do ? To clarify a behavior, use the profiler :

gawk --profile '!/callq/{next}/kslwtbctx/,/kgecrs/' input

The file awkprof.out is generated (after the execution of awk program) and contains the following :

> cat awkprof.out
       # gawk profile, created Wed Sep 8 16:47:20 2021

       # Rule(s)

4085 ! /callq/ { # 3879
3879          next
       }

  206 /kslwtbctx/, /kgecrs/ { # 7
    7          print $0
       }

Here we can see that program is consisted in two steps. The first one is checking the every input line on the regexp pattern /callq/. If it does not contain callq part of the line, the next line from the input is read.

The next pattern block is a range pattern which filters all the lines between line contained kslwtbctx and kgecrs, including first and last matched lines. But the output contains only callq instructions because of the first block checking /callq/ pattern before. Every input line is checked by the first block and then by second anyway.

Hope if was useful !

Good Luck !

Wednesday, September 1, 2021

RMAN output like in sqlplus (command, then its output)

There are several ways to organize output of rman script.

I prefer 'set echo on' way, because it affords to see the output of each command consequently, although the output of block { } goes only after the entire block, and output line from the previous command can potentially interleave with the output of next command.

Let suppose we've got the following script to test :

set echo on

connect target 'c##ddi/password@orcl2 as sysbackup'

select user from dual ;

exit

You might notice 'set echo on' here, it does the main job to obtain easy to read output. According the documentation it can be useful when the opportunity exists to manipulate standard input and standard output (inside of Unix like operating systems, for example).

You can use several ways of getting the stuff to work (it is not a full list of possibilities) :

$ cat conn2.rman | rman > conn2.rman.out

$ rman @ $(pwd)/conn2.rman > conn2.rman.out

$ rman < conn2.rman > conn2.rman.out

$ rman @ $(pwd)/conn2.rman | tee conn2.rman.out

The output with 'set echo on' but without 'tee' looks like this:

Recovery Manager: Release 12.1.0.2.0 - Production on Wed Sep 1 17:25:54 2021

Copyright (c) 1982, 2014, Oracle and/or its affiliates. All rights reserved.

RMAN> ;
echo set on
connect target *

connected to target database: ORCL2 (DBID=1043166856)

RMAN>

RMAN> select user from dual ;
using target database control file instead of recovery catalog
USER
------------------------------
SYSBACKUP

RMAN>

RMAN> exit

Without 'set echo on' the output looks like 'list of all the commands --> output of all the commands' (with help of 'tee'):

Recovery Manager: Release 12.1.0.2.0 - Production on Wed Sep 1 23:08:47 2021

Copyright (c) 1982, 2014, Oracle and/or its affiliates. All rights reserved.

RMAN> set echo on
2>
3> connect target *
4>
5> select user from dual ;
6>
7> exit
echo set on

connected to target database: ORCL2 (DBID=1043166856)

using target database control file instead of recovery catalog
USER
------------------------------
SYSBACKUP

Recovery Manager complete.

Another output without 'set echo on' and without 'tee' looks like this:

Recovery Manager: Release 12.1.0.2.0 - Production on Wed Sep 1 17:26:50 2021

Copyright (c) 1982, 2014, Oracle and/or its affiliates. All rights reserved.

RMAN>
connected to target database: ORCL2 (DBID=1043166856)

RMAN>
RMAN>
using target database control file instead of recovery catalog
USER
------------------------------
SYSBACKUP

RMAN>
RMAN>

Recovery Manager complete.

P.S. Sometimes we can see the output from connect command trapped into output of select statement.

Good Luck !

rman ORA-01031 RMAN-00554 from command line after connecting AS SYSBACKUP

The way of connecting to database by rman from command line and from the script is a little bit different (the difference is in connection string).

For example, for operating system command line use '"..."' form :

$ rman target '"c##ddi@orcl2 as sysbackup"'

For the script use '...' form

RMAN> connect target 'c##ddi@orcl2 as sysbackup'

P.S. For channels configuration use '"..."' form.

Saturday, August 28, 2021

Unable to start CRS after ASM group mounted temporary on different ASM instance

There was a situation when I had to mount disk group of one ASM instance on another to simplify moving of 15 TB database. What I did :

1. Prepared new CRS and new ASM instance with single disk group (NORMAL redundancy). CRS places OCR and voting files on this group.

2. Stopped newly created CRS.

3. On ASM instance contained the database I changed asm_diskstring parameter to get access to newly asm disk group.

4. Mounted newly created ASM disk group on ASM instance with database.

5. Backed up database via RMAN to new ASM disk group.

6. Dismounted newly created ASM disk group on ASM instance with database and changed asm_diskstring parameter back to the value it had before.

7. Started CRS by crsctl start crs

And nothing happened... :)

The process list looked like this one :

# ps -u oracle
PID TTY         TIME CMD
62224 ?           0:07 diskmon
61638 ?           0:04 oraagent.bin
61687 ?           0:04 evmd.bin
61955 ?           0:08 gipcd.bin
61924 ?           0:03 evmlogger.bin
64258 pts/9       0:00 less
64665 pts/9       0:00 bash
46926 ?           0:04 gpnpd.bin
11263 ?           0:04 ocssd.bin

The ocssd.trc log file said 'no voting files found' :

2021-08-28 23:29:45.985 :    CSSD:29: [     INFO] clssnmvDiskVerify: Successful discovery of 0 disks
2021-08-28 23:29:45.985 :    CSSD:29: [     INFO] clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
2021-08-28 23:29:45.985 :    CSSD:29: [     INFO] clssnmvFindInitialConfigs: No voting files found
2021-08-28 23:29:45.986 :    CSSD:29: [     INFO] (:CSSNM00070:)clssnmCompleteInitVFDiscovery: Voting file not found. Retrying discovery in 15 seconds
2021-08-28 23:29:46.877 :    CSSD:11: clsssc_CLSFAInit_CB: System not ready for CLSFA initialization

It turned out that voting files was removed by foreign ASM instance :)

2021-08-28T19:47:39.007588+03:00
SUCCESS: alter diskgroup datac6 mount
2021-08-28T19:47:41.312845+03:00
NOTE: Attempting voting file refresh on diskgroup DATAC6
NOTE: Refresh completed on diskgroup DATAC6. No voting file found.
NOTE: Voting file relocation is required in diskgroup DATAC6
NOTE: Attempting voting file relocation on diskgroup DATAC6
NOTE: voting file deletion (replicated) on grp 3 disk AAA
NOTE: voting file deletion on grp 3 disk AAA
NOTE: voting file deletion (replicated) on grp 3 disk BBB
NOTE: voting file deletion on grp 3 disk BBB
NOTE: voting file deletion (replicated) on grp 3 disk CCC
NOTE: voting file deletion on grp 3 disk CCC
NOTE: No voting file found on diskgroup DATAC6

To resolve this, I needed to create new voting files.

1. Stop CRS

# crsctl stop crs -f

If command failed or run too long without response, kill ocssd.bin and gpnpd.bin processes by hand. Repeat to confirm the stop

# crsctl stop crs -f
CRS-4639: Could not contact Oracle High Availability Services
CRS-4000: Command Stop failed, or completed with errors.

Also check the process list too.

2. Start CRS in init mode :

# crsctl start crs -excl -nocrs

3. Create new voting files

$ crsctl query css votedisk
Located 0 voting disk(s).
$ crsctl replace votedisk +datac6
Successful addition of voting disk 93480b19b7e64f90bf2d5a70e2fdcbfd.
Successful addition of voting disk eca9bf5347fd4f67bf06a2e7c830e342.
Successful addition of voting disk 1f468b939fbe4f91bfc83375c532a6da.
Successfully replaced voting disk group with +datac6.
CRS-4266: Voting file(s) successfully replaced

4. Stop and CRS

# crsctl stop crs

# crsctl start crs -wait

After these steps the CRS stack worked in the normal mode.

Good Luck !

Thursday, August 19, 2021

Yum on Oracle Linux - work through proxy - how ?

Just modify /etc/sysconfig/rhn/up2date file. Set

enableProxy=1
httpProxy=http://ip_address_of_proxy:port

Use man up2date for details.

How to resize logical volume group (LVM) (example on Oracle Linux VM VirtualBox)

If possible, just add another physical disk into LVM configuration and enlarge logical volume using lvresize command. It's much better and lesser error pruning behavior. The following scratched steps can be used when there are no possibilities to have another physical disk or there is a decision to use existing disk.

0. Identify LV and underlying physical volume to extend (use pvdisplay command)

1. Shutdown VM

2. Increase size of hard disk (CLI or VirtualBox virtual media manager)

3. Start the VM (might be in single user mode)

4. Modify partition table using parted. In other words, remove and recreate new partition with desirable size.

Steps depend on what kind of partition (primary or extended you need to extend). It also depend on are there any partitions behind being resized one. You may drop this partition (which you need to resize) and all the other partitions behind or recreate expended partition and logical partitions completely.

5. Reread modified partition table - reboot or blockdev --rereadpt etc.

6. Extend underlying LVM physical volume (correspond to increased partition) using pvresize command, for example :

# pvresize /dev/sda2

7. Increase size of logical volume

# lvresize --verbose --extents +100%FREE /dev/vg_db01/lv_root

# lvdisplay

8. Increase filesystem size. In my case it was ext4 (possible to increase online) :

# resize2fs /dev/mapper/vg_db01-lv_root

That's it ! Good Luck !

Tuesday, August 10, 2021

How to extend ASM SAN disk group in Solaris zone

1. First of all, identify disk's paths to work with. Use sqlplus, kfod or OS utilities to get the information.

Example using sqlplus (GI home) :

% sqlplus / as sysdba

SYS@+ASM> col path for a50

SYS@+ASM> select path,os_mb,total_mb,free_mb from v$asm_disk ;

Example using kfod (GI home) :

% kfod disks=all

2. Next, increase the space on LUNs using storage system software (on HPE 3PAR, it was made by growvv command)

3. Next, on the global zone, use format utility to extend the space on OS side:

For example :

# format /dev/rdsk/c0t60002AC00000000002003CDE000095DDd0

selecting c0t60002AC00000000002003CDE000095DDd0 <3PARdata-VV-3226-2.49TB>
[disk formatted]
Note: capacity in disk label is less than the disk capacity.
Select <partition> <expand> to adjust the label capacity.

FORMAT MENU:
       disk       - select a disk
       type       - select (define) a disk type
       partition - select (define) a partition table
       current    - describe the current disk
       format     - format and analyze the disk
       repair     - repair a defective sector
       label      - write label to the disk
       analyze    - surface analysis
       defect     - defect list management
       backup     - search for backup labels
       verify     - read and display labels
       inquiry    - show disk ID
       volname    - set 8-character volume name
       !<cmd>     - execute <cmd>, then return
       quit
format> p

PARTITION MENU:
       0      - change `0' partition
       1      - change `1' partition
       2      - change `2' partition
       3      - change `3' partition
       4      - change `4' partition
       5      - change `5' partition
       6      - change `6' partition
       expand - expand label to use the maximum allowed space
       select - select a predefined table
       modify - modify a predefined partition table
       name   - name the current table
       print - display the current table
       label - write partition map and label to the disk
       !<cmd> - execute <cmd>, then return
       quit
partition> ex
The expanded capacity is added to the unallocated space.
partition> p

Current partition table (original):
Total disk sectors available: 5343526845 + 16384 (reserved sectors)

Part      Tag    Flag     First Sector          Size          Last Sector
0        usr    wm                64         2.00TB           4294950847
1 unassigned    wm                 0            0                0
2 unassigned    wm                 0            0                0
3 unassigned    wm                 0            0                0
4 unassigned    wm                 0            0                0
5 unassigned    wm                 0            0                0
6 unassigned    wm                 0            0                0
8   reserved    wm        5343526879         8.00MB           5343543262

partition> 0
Part      Tag    Flag     First Sector          Size          Last Sector
0        usr    wm                64         2.00TB           4294950847

Enter partition id tag[usr]:
Enter partition permission flags[wm]:
Enter new starting sector[64]:
Enter partition size[4294950784b, 4294950847e, 2097143mb, 2047gb, 1tb]: $
partition> p
Current partition table (unnamed):
Total disk sectors available: 5343526845 + 16384 (reserved sectors)

Part      Tag    Flag     First Sector          Size          Last Sector
0        usr    wm                64         2.49TB           5343526877
1 unassigned    wm                 0            0                0
2 unassigned    wm                 0            0                0
3 unassigned    wm                 0            0                0
4 unassigned    wm                 0            0                0
5 unassigned    wm                 0            0                0
6 unassigned    wm                 0            0                0
8   reserved    wm        5343526879         8.00MB           5343543262

partition> label
Ready to label disk, continue? y

partition> q

FORMAT MENU:
       disk       - select a disk
       type       - select (define) a disk type
       partition - select (define) a partition table
       current    - describe the current disk
       format     - format and analyze the disk
       repair     - repair a defective sector
       label      - write label to the disk
       analyze    - surface analysis
       defect     - defect list management
       backup     - search for backup labels
       verify     - read and display labels
       inquiry    - show disk ID
       volname    - set 8-character volume name
       !<cmd>     - execute <cmd>, then return
       quit
format> ver

Volume name = <        >
ascii name = <3PARdata-VV-3226-2.49TB>
bytes/sector    = 512
sectors = 5343543295
accessible sectors = 5343543262
Part      Tag    Flag     First Sector          Size          Last Sector
0        usr    wm                64         2.49TB           5343526877
1 unassigned    wm                 0            0                0
2 unassigned    wm                 0            0                0
3 unassigned    wm                 0            0                0
4 unassigned    wm                 0            0                0
5 unassigned    wm                 0            0                0
6 unassigned    wm                 0            0                0
8   reserved    wm        5343526879         8.00MB           5343543262

format> q

4. Repeat previous step for the rest of LUNs

5. Resize ASM diskgroup

SYS@+ASM> alter diskgroup disk_group_name resize all ;

You might encounter the error

ORA-15099, 00000, "disk '%s' is larger than maximum size of %s MBs"

In my case it was low compatibility.rdbms value (10.1) for the altered diskgroup.

Good Luck !

Thursday, July 1, 2021

Column width in the sqlplus output

The queries might include char typed columns in the output. The output of these columns can be extended by sqlplus when multi-byte character set was set in the environment (NLS_LANG variable). Of course choosing NLS_LANG value should not be determined by the cosmetic effects of the output only; there are more important things like character set of input files, operating system output handling etc. All that I wanted to say - just know that "detail".

There are examples :

1. NLS_LANG=.al32utf8

SQL> r
1* select 'status' from dual

'STATUS'
------------------
status

2. NLS_LANG was unset (then us7ascii character set is used) or set to single-byte character set

SQL> select 'status' from dual ;

'STATU
------
status

Good Luck !

Thursday, May 6, 2021

orabase returns "XPointer evaluation failed: no locset" ; oraenv returns the same and "-bash: [: too many arguments"

The situation specific to old and tight number of Oracle Products used Oracle Universal Installer for lifetime maintenance.

orabase binary executable is used by Oracle products to set ORACLE_BASE environmental variable in the shell session. Contemporary Oracle products have a file $ORACLE_HOME/install/orabasetab which is read by orabase executable to set ORACLE_BASE for current ORACLE_HOME. Some older orabase (32-bit) executables read the file $ORACLE_HOME/inventory/ContentsXML/oraclehomeproperties.xml in order to set ORACLE_BASE.

The issue can be a result of an accident when sometimes (and somehow) the Oracle installer (via installation of the patches or performing a deinstallation (even being operating on another ORACLE_HOME)) corrupts oraclehomeproperties.xml file (located in $ORACLE_HOME/inventory/ContentsXML) cutting out property ORACLE_BASE from it. The fix is to restore it from backup or modify manually.

The following example is an example of corrupted xml file :

<?xml version="1.0" encoding="UTF-8"?>


<ORACLEHOME_INFO>
   <GUID>113393302.#259383652</GUID>
   <HOME/>
   <ARU_PLATFORM_INFO>
       <ARU_ID>226</ARU_ID>
       <ARU_ID_DESCRIPTION>Linux x86-64</ARU_ID_DESCRIPTION>
   </ARU_PLATFORM_INFO>
</ORACLEHOME_INFO>

The following example is an example of restored xml file :

After modification you should relogin to the server because the contents of corrupted xml for the current session is mapped into the memory (by mmap syscall).

Good Luck !