Wednesday, September 29, 2021

Solaris 11.4 and ASM and SAN - conceptual steps

1. Create the necessary FC zoning infrastructure (create zones, plug in transceivers, etc.) if needed. Verify using your favorite method, for example:

# fcinfo hba-port

# luxadm -e dump_map /dev/cfg/c<number_of_attachment_point>

# cfgadm -al -o show_FCP_dev

2. Create virtual volumes and provide vluns (virtual luns) to the Solaris host. 

The volumes must be of equal size within one disk group.

For the data disk group I prefer thin-provisioned volumes, and for the fast recovery area and redo, fully provisioned volumes, but it is up to you.

The number of volumes per disk group depends on many factors: the number of active controllers, the number of CPUs inside the controllers, the number of paths, etc. I recommend following at least the storage vendor's best practices.

3. On Solaris 11.4, the multipath I/O software is enabled with the stmsboot -e command. It requires a reboot if multipath support has not been enabled yet. If multipath was already enabled, refresh the list of disk devices as follows:

# devfsadm -c disk

Verify that disks were created in the system using 

# echo | format

Match the disks seen by the OS with the volumes on the storage array by comparing the virtual volume WWNs.

You may also want to check the file /etc/driver/drv/fp.conf to make sure multipathing is configured properly.
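
The WWN comparison can be partly scripted. The sketch below assumes that, with multipathing enabled, the disk name under /dev/rdsk embeds the LUN identifier between the "t" and the "d0" parts (this is typical, but check it on your system). The function names and the sample device name used for testing are hypothetical.

```shell
#!/bin/sh
# Extract the identifier embedded in a multipath device name,
# e.g. c0t600144F0ABCD0001d0 (hypothetical name) -> 600144f0abcd0001.
# Assumption: the identifier is written in uppercase hex between "t" and "d<n>".
wwn_from_dev() {
    basename "$1" |
        sed -n 's/^c[0-9]*t\([0-9A-F]*\)d[0-9]*.*$/\1/p' |
        tr '[:upper:]' '[:lower:]'
}

# Print "identifier  device" for every disk node that exists on this host.
list_disk_wwns() {
    for dev in /dev/rdsk/c*t*d0; do
        [ -e "$dev" ] || continue      # glob did not match anything
        printf '%s  %s\n' "$(wwn_from_dev "$dev")" "$dev"
    done
}

list_disk_wwns
```

The printed identifiers can then be compared with the WWNs reported by the storage array for each virtual volume.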

4. Label the newly created disks. If the disk size is less than 2TB, Solaris creates an SMI label by default. I suggest using an EFI label (the analog of GPT in Linux), especially if you will later enlarge the disk group and each underlying disk will grow beyond 2TB. To do so, use the -e option of format:

# format -e

and choose the appropriate label type:

format> label
[0] SMI Label
[1] EFI Label
Specify Label type[1]:

The new disk device (its name ends with d0) will appear in the /dev/dsk and /dev/rdsk directories.

5. Configure Solaris zones (if used and needed) to see the new devices. For example, to make all the devices (SAN disks) visible inside the zone, configure the zone with a device resource that has a matching pattern:

# zonecfg -z $(hostname)-cde info
...
brand: solaris
autoboot: true
...
device:
       match: /dev/rdsk/*
#
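
A device resource like the one in the listing above could be added with a zonecfg command along the following lines. This is only a sketch: it prints the command instead of running it, the zone name defaults to the "<hostname>-cde" pattern from the listing, and the helper name zone_device_cmd is hypothetical. Review the printed command and run it in the global zone yourself.

```shell
#!/bin/sh
# Defaults taken from the listing above; override via environment if needed.
ZONE="${ZONE:-$(hostname)-cde}"
MATCH="${MATCH:-/dev/rdsk/*}"

# Print the zonecfg invocation that adds a device resource.
# $1 = zone name, $2 = device match pattern
zone_device_cmd() {
    printf "zonecfg -z %s 'add device; set match=%s; end'\n" "$1" "$2"
}

# Dry run only: nothing is changed on the system.
zone_device_cmd "$ZONE" "$MATCH"
```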

6. Give ownership of the new devices to the Grid home owner and the asmadmin group.
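
A minimal sketch of the ownership step, assuming the Grid home owner is called "grid" and that mode 660 is wanted (both are assumptions, adjust them to your installation). By default it only prints the commands; set DRY_RUN=0 to actually run them. The device name is a placeholder.

```shell
#!/bin/sh
# Hypothetical defaults: owner "grid", group "asmadmin".
ASM_OWNER="${ASM_OWNER:-grid}"
ASM_GROUP="${ASM_GROUP:-asmadmin}"

# Print (DRY_RUN=1, the default) or run chown/chmod for each device given.
set_asm_perms() {
    for dev in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "chown ${ASM_OWNER}:${ASM_GROUP} ${dev}"
            echo "chmod 660 ${dev}"
        else
            chown "${ASM_OWNER}:${ASM_GROUP}" "$dev" && chmod 660 "$dev"
        fi
    done
}

# Placeholder device name; replace it with the real slice paths.
set_asm_perms /dev/rdsk/c0t600144F0ABCD0001d0s0
```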

7. Create or alter the disk group using the multipath-generated disk paths, like '/dev/rdsk/....s0'.

That's it :)


Wednesday, September 22, 2021

RMAN hangs with SQL*Net Break/reset To Client

Once, when I tried to recover a standby database over the network using RMAN, the operation started and then hung indefinitely on the first file:

RMAN> recover database noredo from service "connection_name" ;

Starting recover at 22-SEP-21
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=10 device type=DISK
channel ORA_DISK_1: starting incremental datafile backup set restore
channel ORA_DISK_1: using network backup set from service dg-bris122-m8f
destination for restore of datafile 00001: +DATA_RCF_CDE/DB_SA_RCF/DATAFILE/system.265.1049997735

A quick analysis showed that the rman session on the remote database was INACTIVE, waiting on the event "SQL*Net Break/reset To Client". The Database Reference says about this event: "The server sends a break or reset message to the client. The session running on the server waits for a reply from the client." It is worth mentioning that the operating system was Oracle Solaris 11.4, and both databases were located inside dedicated database zones with dedicated (non-shared) network interfaces, separated by a firewall. It looked like the server (the remote database) had sent a break message (out-of-band break, or OOB) to the client (rman), and the client host either did not receive it because of the firewall or did not (or could not) process it.

So I decided to disable OOB on the rman side by setting the following in sqlnet.ora:

disable_oob = on

It worked (fortunately or not) :)
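
If the setting has to be applied on several hosts, it can be added idempotently. The sketch below assumes that sqlnet.ora lives in $TNS_ADMIN or, failing that, in $ORACLE_HOME/network/admin; the function name ensure_disable_oob is hypothetical. It does nothing when neither directory exists.

```shell
#!/bin/sh
# Append DISABLE_OOB = on to the given sqlnet.ora unless the parameter
# is already present (in any letter case).
ensure_disable_oob() {
    file="$1"
    [ -f "$file" ] || : > "$file"
    if grep -qi '^[[:space:]]*disable_oob[[:space:]]*=' "$file"; then
        echo "DISABLE_OOB already set in $file"
    else
        # Leading newline protects against a last line without a newline.
        printf '\nDISABLE_OOB = on\n' >> "$file"
        echo "DISABLE_OOB added to $file"
    fi
}

# Assumption: standard location of the client-side sqlnet.ora.
SQLNET_DIR="${TNS_ADMIN:-${ORACLE_HOME:-}/network/admin}"
if [ -d "$SQLNET_DIR" ]; then
    ensure_disable_oob "$SQLNET_DIR/sqlnet.ora"
else
    echo "no sqlnet.ora directory found at $SQLNET_DIR; nothing changed"
fi
```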


Wednesday, September 8, 2021

What does an awk script do?

Sometimes it's difficult to understand what an awk program has done. For a better understanding, it's useful to debug the program (the -D option). But there is another, quicker way to get an outline of what the program did: produce an execution profile of the program with the '--profile' option. Let's consider an example.

Suppose that we have an input file, which can actually contain anything. In my case it was a file containing a disassembled function. A fragment of the input file follows:

  0x000000000cd2e32a <+442>:   test   %eax,%eax
  0x000000000cd2e32c <+444>:   je     0xcd2e33a <ksfd_io+458>
  0x000000000cd2e32e <+446>:   callq  0xcdc40f0 <sltrgftime64>
  0x000000000cd2e333 <+451>:   mov    %rax,-0x98(%rbp)
  0x000000000cd2e33a <+458>:   mov    -0x60(%rbp),%rax
  0x000000000cd2e33e <+462>:   mov    0x88(%rax),%r12d

I would like to display all the callq instructions between the two calls to kslwtbctx and kgecrs.

A short one-liner can look like this:

gawk '!/callq/{next}/kslwtbctx/,/kgecrs/' input

The result looks like this:

 > awk '!/callq/{next}/kslwtbctx/,/kgecrs/' input   
  0x000000000cd2e3ac <+572>:   callq  0xc9a3e00 <kslwtbctx>
  0x000000000cd2e3c7 <+599>:   callq  0xce573b0 <skgfrgsz>
  0x000000000cd2e3db <+619>:   callq  0xce418b0 <kghstack_alloc>
  0x000000000cd2e449 <+729>:   callq  0xcd32c80 <ksfd_osdrqfil>
  0x000000000cd2e4a2 <+818>:   callq  0xcd334f0 <ksfd_skgfqio>
  0x000000000cd2e4b4 <+836>:   callq  0xce42580 <kghstack_free>
  0x000000000cd2e4ce <+862>:   callq  0xce4e090 <kgecrs>

What does this short one-liner do? To clarify its behavior, use the profiler:

gawk --profile '!/callq/{next}/kslwtbctx/,/kgecrs/' input

The file awkprof.out is generated (after the awk program finishes) and contains the following:

> cat awkprof.out
       # gawk profile, created Wed Sep  8 16:47:20 2021

       # Rule(s)

 4085  ! /callq/ { # 3879
 3879          next
       }

  206  /kslwtbctx/, /kgecrs/ { # 7
    7          print $0
       }

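Here we can see that the program consists of two rules. The first one checks every input line against the regexp pattern /callq/. If the line does not contain callq, the rest of the program is skipped and the next input line is read. In the profile, the number to the left of a rule shows how many times its pattern was tested, and the number in the comment after the brace shows how many times its action ran: 4085 lines were tested and 3879 of them were skipped.

The second rule is a range pattern that selects all the lines between the line containing kslwtbctx and the line containing kgecrs, including the first and last matched lines. The output contains only callq instructions because the first rule has already discarded every other line, so only the remaining 206 lines ever reach the range pattern, and 7 of them are printed. A line is checked by the second rule only if the first rule has not skipped it.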
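
The same behavior can be reproduced on a tiny input. The sample below is made up for illustration (the addresses and offsets are not from the real disassembly), and the function names are hypothetical. The one-liner is written out in its expanded form, with the implicit print made explicit.

```shell
#!/bin/sh
# Hypothetical, shortened input: seven lines, five of them callq.
sample_input() {
    cat <<'EOF'
  0x01 <+0>:   callq  0xaaa <sltrgftime64>
  0x02 <+5>:   mov    %rax,-0x98(%rbp)
  0x03 <+9>:   callq  0xbbb <kslwtbctx>
  0x04 <+14>:  test   %eax,%eax
  0x05 <+16>:  callq  0xccc <ksfd_skgfqio>
  0x06 <+21>:  callq  0xddd <kgecrs>
  0x07 <+26>:  callq  0xeee <kslwtectx>
EOF
}

# Rule 1 drops every line without callq; rule 2 prints the range
# from the kslwtbctx line to the kgecrs line among the survivors.
calls_between() {
    awk '!/callq/ { next }
         /kslwtbctx/, /kgecrs/ { print $0 }'
}

sample_input | calls_between
```

Only the kslwtbctx, ksfd_skgfqio and kgecrs lines are printed: the mov and test lines are removed by the first rule, and the callq lines outside the range are ignored by the second.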

Hope it was useful!

Good Luck !


Wednesday, September 1, 2021

RMAN output like in sqlplus (command, then its output)

There are several ways to organize the output of an rman script.

I prefer the 'set echo on' way, because it lets you see each command followed by its output, although the output of a { } block appears only after the entire block, and an output line from the previous command can potentially interleave with the output of the next command.

Let's suppose we have the following script to test:

set echo on

connect target 'c##ddi/password@orcl2 as sysbackup'

select user from dual ;

exit

Notice 'set echo on' here: it does the main job of producing easy-to-read output. According to the documentation, it can be useful when standard input and standard output can be redirected (in Unix-like operating systems, for example).

You can get this to work in several ways (this is not a complete list):

$ cat conn2.rman | rman > conn2.rman.out

$ rman @ $(pwd)/conn2.rman  > conn2.rman.out

$ rman < conn2.rman > conn2.rman.out

$ rman @ $(pwd)/conn2.rman | tee conn2.rman.out
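
The invocations above can be wrapped in a small script. This is a sketch under a few assumptions: the script file is written to a temporary directory, rman is taken from $ORACLE_HOME/bin, and the names write_script, SCRIPT and LOG are hypothetical. When no Oracle home is available, it only writes the script file.

```shell
#!/bin/sh
# Where to put the rman script and its output (hypothetical defaults).
SCRIPT="${SCRIPT:-${TMPDIR:-/tmp}/conn2.rman}"
LOG="${LOG:-$SCRIPT.out}"

# Write the test script from the article to the given file.
write_script() {
    cat > "$1" <<'EOF'
set echo on

connect target 'c##ddi/password@orcl2 as sysbackup'

select user from dual ;

exit
EOF
}

write_script "$SCRIPT"

# Run the script only if an Oracle rman binary is present.
if [ -n "${ORACLE_HOME:-}" ] && [ -x "$ORACLE_HOME/bin/rman" ]; then
    "$ORACLE_HOME/bin/rman" @"$SCRIPT" | tee "$LOG"
else
    echo "rman not found under ORACLE_HOME; script written to $SCRIPT"
fi
```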

The output with 'set echo on' but without 'tee' looks like this:

Recovery Manager: Release 12.1.0.2.0 - Production on Wed Sep 1 17:25:54 2021

Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved.

RMAN> ;
echo set on
connect target *

connected to target database: ORCL2 (DBID=1043166856)

RMAN>  

RMAN> select user from dual ;
using target database control file instead of recovery catalog
USER                           
------------------------------
SYSBACKUP                      


RMAN>  

RMAN> exit

Without 'set echo on' the output looks like 'list of all the commands --> output of all the commands' (with the help of 'tee'):

Recovery Manager: Release 12.1.0.2.0 - Production on Wed Sep 1 23:08:47 2021

Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved.

RMAN> set echo on
2>  
3> connect target *
4>  
5> select user from dual ;
6>  
7> exit
echo set on

connected to target database: ORCL2 (DBID=1043166856)

using target database control file instead of recovery catalog
USER                           
------------------------------
SYSBACKUP                      

Recovery Manager complete.

Another output without 'set echo on' and without 'tee' looks like this:

Recovery Manager: Release 12.1.0.2.0 - Production on Wed Sep 1 17:26:50 2021

Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved.

RMAN>  
connected to target database: ORCL2 (DBID=1043166856)

RMAN>  
RMAN>  
using target database control file instead of recovery catalog
USER                           
------------------------------
SYSBACKUP                      

RMAN>  
RMAN>  

Recovery Manager complete.

P.S. Sometimes the output of the connect command ends up inside the output of the select statement.

Good Luck !



rman ORA-01031 RMAN-00554 from command line after connecting AS SYSBACKUP

Connecting to a database with rman from the command line and from a script differs slightly (the difference is in the quoting of the connection string).

For example, on the operating system command line use the '"..."' form:

$ rman  target '"c##ddi@orcl2 as sysbackup"' 

In a script use the '...' form:

RMAN> connect target 'c##ddi@orcl2 as sysbackup'

P.S. For channel configuration use the '"..."' form.
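
A likely explanation for the '"..."' form on the command line is that the shell removes the outer single quotes, so the double quotes survive and reach rman as part of the argument. The sketch below only demonstrates what the shell passes on; the helper show_args is hypothetical and stands in for rman.

```shell
#!/bin/sh
# Print each argument in brackets, exactly as a program would receive it.
show_args() {
    for a in "$@"; do
        printf '[%s]\n' "$a"
    done
}

# Outer single quotes are consumed by the shell; the inner double quotes
# stay inside the second argument.
show_args target '"c##ddi@orcl2 as sysbackup"'
# prints:
# [target]
# ["c##ddi@orcl2 as sysbackup"]

# With double quotes only, the shell consumes them and rman would
# receive the bare string.
show_args target "c##ddi@orcl2 as sysbackup"
# prints:
# [target]
# [c##ddi@orcl2 as sysbackup]
```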