Wednesday, September 8, 2021

What does an awk script do?

Sometimes it's difficult to understand what an awk program has done. For a better understanding, it's useful to debug the program (the -D option). But there is a quicker way to get an outline of what the program did: produce an execution profile of the program with the '--profile' option. Let's consider an example.

Suppose we have an input file, which can contain anything. In my case, the file contained a disassembled function. A fragment of the input file:

  0x000000000cd2e32a <+442>:   test   %eax,%eax
  0x000000000cd2e32c <+444>:   je     0xcd2e33a <ksfd_io+458>
  0x000000000cd2e32e <+446>:   callq  0xcdc40f0 <sltrgftime64>
  0x000000000cd2e333 <+451>:   mov    %rax,-0x98(%rbp)
  0x000000000cd2e33a <+458>:   mov    -0x60(%rbp),%rax
  0x000000000cd2e33e <+462>:   mov    0x88(%rax),%r12d

I would like to display all the callq instructions between the calls to kslwtbctx and kgecrs.

A short one-liner can look like this:

gawk '!/callq/{next}/kslwtbctx/,/kgecrs/' input

The result looks like this:

 > awk '!/callq/{next}/kslwtbctx/,/kgecrs/' input   
  0x000000000cd2e3ac <+572>:   callq  0xc9a3e00 <kslwtbctx>
  0x000000000cd2e3c7 <+599>:   callq  0xce573b0 <skgfrgsz>
  0x000000000cd2e3db <+619>:   callq  0xce418b0 <kghstack_alloc>
  0x000000000cd2e449 <+729>:   callq  0xcd32c80 <ksfd_osdrqfil>
  0x000000000cd2e4a2 <+818>:   callq  0xcd334f0 <ksfd_skgfqio>
  0x000000000cd2e4b4 <+836>:   callq  0xce42580 <kghstack_free>
  0x000000000cd2e4ce <+862>:   callq  0xce4e090 <kgecrs>

What does this short one-liner do? To clarify its behavior, use the profiler:

gawk --profile '!/callq/{next}/kslwtbctx/,/kgecrs/' input

After the awk program finishes, gawk writes the file awkprof.out, which contains the following:

> cat awkprof.out
       # gawk profile, created Wed Sep  8 16:47:20 2021

       # Rule(s)

 4085  ! /callq/ { # 3879
 3879          next
       }

  206  /kslwtbctx/, /kgecrs/ { # 7
    7          print $0
       }
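As a cross-check, similar counts can be gathered by hand with ordinary counters. This is only a minimal sketch: the sample input is made up (it just mimics the shape of the disassembly), and the variable names tested, skipped, reached and printed are hypothetical. It uses plain awk, so it does not need the profiler.

```shell
# Made-up sample input in the same shape as the disassembly.
sample='   <+1>: callq 0x1 <foo>
   <+2>: mov %rax,%rbx
   <+3>: callq 0x2 <kslwtbctx>
   <+4>: test %eax,%eax
   <+5>: callq 0x3 <skgfrgsz>
   <+6>: callq 0x4 <kgecrs>
   <+7>: callq 0x5 <bar>'

counts=$(printf '%s\n' "$sample" | awk '
    { tested++ }                           # every line reaches the first pattern
    !/callq/ { skipped++; next }           # lines dropped by the first rule
    { reached++ }                          # lines that get as far as the range pattern
    /kslwtbctx/,/kgecrs/ { printed++ }     # lines the range pattern would print
    END {
        printf "tested=%d skipped=%d reached=%d printed=%d\n",
               tested, skipped, reached, printed
    }
')
printf '%s\n' "$counts"
# prints: tested=7 skipped=2 reached=5 printed=3
```

Here tested and skipped play the role of the two numbers on the first rule in awkprof.out, and reached and printed the two numbers on the second rule.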

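Here we can see that the program consists of two steps. The first one checks every input line against the regexp pattern /callq/. If the line does not contain callq, the rest of the program is skipped and the next input line is read. The counts confirm this: the pattern was tested 4085 times and 'next' ran 3879 times, so only 206 lines reached the second rule.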

The second rule is a range pattern, which selects all the lines from the one containing kslwtbctx to the one containing kgecrs, including the first and last matched lines. It has no action of its own, so the default action, print $0, applies, as the profile shows. The output contains only callq instructions because the first rule has already discarded every other line. Every input line is checked by the first rule, and only the lines that survive it are checked by the second.
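To make the flow concrete, the same logic can be spelled out with an explicit flag instead of a range pattern. This is a sketch, not the original program: the sample input is made up and the flag name inside is hypothetical. On this sample, both forms print the same lines.

```shell
# Made-up sample input in the same shape as the disassembly.
sample='   <+1>: callq 0x1 <foo>
   <+2>: mov %rax,%rbx
   <+3>: callq 0x2 <kslwtbctx>
   <+4>: test %eax,%eax
   <+5>: callq 0x3 <skgfrgsz>
   <+6>: callq 0x4 <kgecrs>
   <+7>: callq 0x5 <bar>'

# The one-liner from the post.
short=$(printf '%s\n' "$sample" | awk '!/callq/{next}/kslwtbctx/,/kgecrs/')

# Expanded form with an explicit flag.
long=$(printf '%s\n' "$sample" | awk '
    !/callq/    { next }          # rule 1: skip every line without callq
    /kslwtbctx/ { inside = 1 }    # start of the range
    inside      { print $0 }      # what the range pattern prints by default
    /kgecrs/    { inside = 0 }    # end of the range, tested after printing
')

printf '%s\n' "$long"
```

The end of the range is tested after the print, which is why the kgecrs line itself is included in the output.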

Hope it was useful!

Good luck!
