The AWK Programming Language: An Introduction

A hands-on tutorial for learning AWK on Linux


Introduction

A Brief History and Installation

$ sudo pacman -S gawk        # Arch
$ sudo apt install gawk      # Debian and Ubuntu
$ sudo dnf install gawk      # Fedora and CentOS
# Output gawk location
$ which gawk
-| /usr/bin/gawk
# Output awk symbolic link location
$ which awk
-| /usr/bin/awk
# Confirm awk is a symbolic link
$ ls -l $(which awk)
-| lrwxrwxrwx 1 root root 4 Apr 15 2020 /usr/bin/awk -> gawk
# Output version (--version)
$ awk -V
-| GNU Awk 5.1.0 ...
# Output license (--copyright)
$ awk -C
-| Copyright (C) 1989, 1991-2020 Free Software Foundation. ...
# Output usage (--help)
$ awk -h
-| Usage: gawk [POSIX or GNU style options] -f progfile ...

Running AWK Programs

# Providing input files
awk 'program' input-file1 input-file2
# Piping the output of a command
command | awk 'program'
# Command line
awk 'program' input-file1 input-file2 ...
# Program file
awk -f program-file input-file1 input-file2 ...
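As a rough illustration of the `-f` form, the sketch below writes a tiny AWK program to a file and runs it on an input file and on piped input. The names `numbered.awk` and `fruits.txt` are hypothetical. The temporary directory only keeps the example self-contained.

```shell
# Work in a scratch directory (the file names below are hypothetical)
dir=$(mktemp -d)

# Program file: number each input line, then print a total
cat > "$dir/numbered.awk" <<'EOF'
{ print NR ": " $0 }
END { print "Lines:", NR }
EOF

# Sample input file
printf 'apple\nbanana\ncherry\n' > "$dir/fruits.txt"

# Run the program file on an input file...
from_file=$(awk -f "$dir/numbered.awk" "$dir/fruits.txt")
# ...and on piped input; the output is identical
from_pipe=$(cat "$dir/fruits.txt" | awk -f "$dir/numbered.awk")

printf '%s\n' "$from_file"
rm -r "$dir"
```

This prints `1: apple`, `2: banana`, `3: cherry` and then `Lines: 3`.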

Running Throw-Away Programs

# Output total number of fonts in /usr/share/fonts
$ fc-list | awk 'END { print "Total fonts:", NR }'
# Output all hidden files whose names start with ".bash"
$ ls -la | awk '$NF ~ /^\.bash/ { print $NF }'
# Output all filenames that contain more than 10 characters
# (not counting the file extension)
$ ls | awk '{ i = index($0, "."); s = (i > 0) ? substr($0, 1, i-1) : $0;
              if (length(s) > 10) print }'
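The `ls` output varies from machine to machine. The sketch below therefore wraps the same logic in a hypothetical shell function, `long_names`, and feeds it a fixed list of made-up file names so the result is predictable.

```shell
# Hypothetical helper: print names whose part before the first "."
# is longer than 10 characters
long_names() {
    awk '{
        i = index($0, ".")                       # position of the first dot, 0 if none
        s = (i > 0) ? substr($0, 1, i - 1) : $0  # name with the extension removed
        if (length(s) > 10) print                # keep only long base names
    }'
}

printf 'notes.txt\nquarterly-report.pdf\nREADME\nbackup-2020-04-15.tar.gz\n' | long_names
```

Only `quarterly-report.pdf` and `backup-2020-04-15.tar.gz` are printed. Everything from the first dot onward is treated as the extension, so `.tar.gz` is dropped as a whole.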
# Create a text file listing system fonts
$ fc-list > font-list
# Provide the text file and output total number of fonts
$ awk 'END { print "Total fonts:", NR}' font-list
# Count the files in a directory (NR > 1 skips the "total" line from ls -l)
$ ls -l | awk 'BEGIN { print "About to count files..." }
               NR > 1 { ++count }
               END { print "Total files:", count + 0 }'

Running AWK Scripts

$ git clone https://github.com/danebulat/awk-samples.git
# Enter samples directory
$ cd awk-samples/samples
# Add executable permissions to the scripts
$ chmod +x *.awk
# Invoke script
$ ls -la | ./filemod.awk
Running the samples/filemod.awk program.
#! /usr/bin/awk -f
$ ls -la | awk -f filemod.awk
$ ls -la | ./filemod.awk -v c=l    # lowercase conversion
$ ls -la | ./filemod.awk -v c=u # uppercase conversion
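The following sketch shows a small standalone script that reads a variable passed with `-v`. The script `case.awk` and its `c=u` / `c=l` convention are hypothetical. They are loosely modelled on the filemod.awk invocation above and are not a copy of it. To keep the example portable, the script is run with `awk -v ... -f`. With gawk installed as /usr/bin/awk, `./case.awk -v c=u` should behave the same way.

```shell
dir=$(mktemp -d)
cat > "$dir/case.awk" <<'EOF'
#!/usr/bin/awk -f
# Hypothetical script: convert the last field of each record.
# c=u gives upper case, c=l gives lower case, anything else leaves it as-is.
{
    if (c == "u")      $NF = toupper($NF)
    else if (c == "l") $NF = tolower($NF)
    print
}
EOF
chmod +x "$dir/case.awk"

# Pass the variable with -v before naming the program file
upper=$(printf 'one Two\nthree Four\n' | awk -v c=u -f "$dir/case.awk")
lower=$(printf 'one Two\nthree Four\n' | awk -v c=l -f "$dir/case.awk")
printf '%s\n%s\n' "$upper" "$lower"
rm -r "$dir"
```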

Patterns and Actions

# AWK program
pattern { actions } # Rule
pattern { actions } # Rule
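A rule can be a pattern alone, an action alone, or both. The sketch below runs all three shapes on a few made-up records so the difference is visible.

```shell
# Made-up sample records: a name and a number
sample='alice 42
bob 7
carol 19'

# Pattern only: the default action prints matching records
printf '%s\n' "$sample" | awk '$2 > 10'
# Action only: runs for every record
printf '%s\n' "$sample" | awk '{ print $1 }'
# Pattern and action: runs only for matching records
printf '%s\n' "$sample" | awk '$2 > 10 { print $1, "is over 10" }'
```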

Patterns

# Regular expression examples
$ ls -l | awk '/\.py$/'
$ ls -l | awk '/README/'
# Conditional expression examples
$ ls -l | awk '$NF == "README.md"'
$ ls -la | awk '$NF ~ /^\./ { print $NF }'
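To see these pattern types side by side without depending on the contents of a directory, the sketch below applies them to a fixed list of hypothetical file names. Each name sits on its own line, so `$0` and `$NF` are the same here.

```shell
# Hypothetical file names, one per line
names='main.py
README.md
.bashrc
notes.py.bak
.profile'

printf '%s\n' "$names" | awk '/\.py$/'             # regex: ends in ".py"
printf '%s\n' "$names" | awk '$NF == "README.md"'  # exact string comparison
printf '%s\n' "$names" | awk '$NF ~ /^\./'         # field matches a regex
printf '%s\n' "$names" | awk '$NF !~ /^\./'        # field does not match
```

The first command prints only `main.py`. `notes.py.bak` is skipped because `$` anchors the match at the end of the line.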

Actions

$ ls -l | awk '{ print "Filename:", $NF; print "\tOwner:", \
($3 == "" ? "--" : $3) }'
The samples/statements.awk script.
$ ls -l | ./statements.awk
{ print $0 }
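An action can hold several statements, including `if` and `next`. The sketch below tallies regular files, directories, and total size from a few made-up lines shaped like `ls -l` output. It is only an illustration and is not the contents of samples/statements.awk.

```shell
# Made-up lines shaped like ls -l output
listing='-rw-r--r-- 1 user user 120 Apr 15 10:00 a.txt
-rw-r--r-- 1 user user 80 Apr 15 10:01 b.txt
drwxr-xr-x 2 user user 4096 Apr 15 10:02 docs'

summary=$(printf '%s\n' "$listing" | awk '{
    if ($1 ~ /^d/) { dirs++; next }   # directory: count it, skip the rest
    files++                           # regular file: count it
    bytes += $5                       # and add its size (field 5)
}
END { printf "files=%d bytes=%d dirs=%d\n", files, bytes, dirs }')
printf '%s\n' "$summary"
```

This prints `files=2 bytes=200 dirs=1`.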

BEGIN and END Patterns

The samples/begin.awk script.
$ ls -l | ./begin.awk
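The order of execution is easy to check with a tiny inline program. The sketch below is not the contents of samples/begin.awk. It simply shows that BEGIN runs before the first record is read and END runs after the last one, while NR still holds the final record count.

```shell
# BEGIN runs before any input, END after all of it
order=$(printf 'x\ny\n' | awk '
BEGIN { print "start" }
      { print "record", NR ": " $0 }
END   { print "end after", NR, "records" }')
printf '%s\n' "$order"

# With only a BEGIN rule, awk does not read any input at all
awk 'BEGIN { print "no input needed" }'
```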

Input Records and Fields

Records

# Output input records
$ ls -l | awk '{ print "Current record:", $0 }'
# Same behavior with text files
$ ls -l > file-list
$ awk '{ print "Current record:", $0 }' file-list
# Treat each directory as an input record
$ pwd | awk 'BEGIN { RS="/" } { print "-->", $0 }'
The samples/records.awk script.
$ ./records.awk /etc/passwd /etc/group
-| Outputs first 10 records in /etc/passwd and /etc/group
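The sketch below varies RS on fixed input. A literal path stands in for `pwd` so the output is predictable. The leading "/" produces an empty first record, which the `NF` pattern skips. Setting RS to the empty string switches to paragraph mode, where blank lines separate records.

```shell
# RS = "/": every path component becomes a record
parts=$(printf '/home/user/docs' | awk 'BEGIN { RS = "/" } NF { print "part:", $1 }')
printf '%s\n' "$parts"

# RS = "": paragraph mode; newlines also separate fields
people=$(printf 'name: Ann\nage: 31\n\nname: Bo\nage: 27\n' |
         awk 'BEGIN { RS = "" } { print NR, $2, $4 }')
printf '%s\n' "$people"
```

The first command prints `part: home`, `part: user` and `part: docs`. The second prints `1 Ann 31` and `2 Bo 27`.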

Fields

$ ls -l | awk '{ print "Number of fields in record:", NF }'
-| Number of fields in record: 2
Number of fields in record: 9
...
# Output filenames and modification dates
$ ls -l | awk '{ if (NR>1) print $NF, "\tmodified on", $6, $7 }'
# Output system users and their home directory
$ awk 'BEGIN { FS=":" } { printf "%-30s%s\n", $1, $6 }' /etc/passwd
# Same output, setting FS with the -F option
$ awk -F: '{ printf "%-30s%s\n", $1, $6 }' /etc/passwd
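Beyond `$1` and `$6`, fields can be addressed relative to NF and tested in patterns. The sketch below uses two made-up entries in /etc/passwd format instead of the real file.

```shell
# Made-up entries (name:password:UID:GID:comment:home:shell)
entries='root:x:0:0:root:/root:/bin/bash
alice:x:1000:1000:Alice:/home/alice:/bin/zsh'

# NF is the field count, $NF the last field, $(NF-1) the one before it
fields=$(printf '%s\n' "$entries" | awk -F: '{ print $1, NF, $NF, $(NF-1) }')
printf '%s\n' "$fields"

# Fields in a pattern: only entries with a UID of 1000 or more
users=$(printf '%s\n' "$entries" | awk -F: '$3 >= 1000 { print $1 }')
printf '%s\n' "$users"
```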

Record and Field Splitting Summary

Possible values for the RS variable.
Possible values for the FS variable.
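The three main kinds of FS value behave differently. The sketch below contrasts them on small inputs, with the expected output of each command in its comment.

```shell
# Default FS (a single space): leading and trailing blanks are ignored
# and runs of blanks separate fields
printf '  a   b  \n' | awk '{ print NF }'          # 2
# Any other single character: every occurrence splits,
# so empty fields are kept
printf 'a,,b\n' | awk -F, '{ print NF }'           # 3
# A longer value is treated as a regular expression
printf 'a1b22c\n' | awk -F'[0-9]+' '{ print $2 }'  # b
```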

Printing Output

The print Function

# Items separated with commas
print item1, item2, ...
print(item1, item2, ...)
# Items separated without commas
print item1 item2 item3
print(item1 item2 item3)
# With commas
$ awk 'BEGIN { print "Hello", "World" }'
-| Hello World
# Without commas
$ awk 'BEGIN { print "Hello" "World" }'
-| HelloWorld
$ awk 'BEGIN { print "Hello""World" }'
-| HelloWorld
# Setting the output field separator
$ awk 'BEGIN { OFS=" ---> "; print "Hello", "AWK", "World"}'
-| Hello ---> AWK ---> World
# The default ORS is a newline character
$ awk 'BEGIN { print "Hello"; print "World" }'
-| Hello
World
# Setting ORS to a new string
$ awk 'BEGIN { ORS=" [END OF RECORD]\n"; print "Hello"; print "World" }'
-| Hello [END OF RECORD]
World [END OF RECORD]
# Default OFMT is "%.6g"
$ awk 'BEGIN { print 3.141592653, 6.283185306, 12.566370612 }'
-| 3.14159 6.28319 12.5664
# Setting OFMT to a new format
$ awk 'BEGIN { OFMT="%.3f"; print 3.141592653, 6.283185306, 12.566370612 }'
-| 3.142 6.283 12.566
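One detail about OFS is easy to miss. It is used between comma-separated print items and whenever awk rebuilds `$0`, but an unmodified record is printed exactly as it was read. The sketch below shows the common `$1 = $1` idiom that forces a rebuild.

```shell
printf 'a b c\n' | awk 'BEGIN { OFS = "," } { print }'          # a b c (record untouched)
printf 'a b c\n' | awk 'BEGIN { OFS = "," } { $1 = $1; print }' # a,b,c (record rebuilt)
printf 'a b c\n' | awk 'BEGIN { OFS = "," } { print $1, $3 }'   # a,c (items joined by OFS)
```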

The printf Function

printf format, item1, item2, ...
printf(format, item1, item2, ...)
The samples/print.awk script.
Running the samples/print.awk program.
Format specifiers available to the printf function.
Modifiers to tweak printf format specifiers.
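The following sketch exercises a few common format specifiers and modifiers. It is not the contents of samples/print.awk. Note that, unlike print, printf never appends ORS, so each format ends with an explicit `\n`.

```shell
report=$(awk 'BEGIN {
    printf "%-8s|%5d|%8.3f|\n", "disk", 42, 3.14159   # left-justify, width, precision
    printf "%s has %d%% free\n", "/home", 73            # %% prints a literal percent sign
    printf "%x %o %e\n", 255, 8, 1234.5                 # hex, octal, scientific
}')
printf '%s\n' "$report"
```

This prints `disk    |   42|   3.142|`, then `/home has 73% free`, then `ff 10 1.234500e+03`.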

Built-In Functions

Built-in functions used in samples/filemod.awk.
# Build the shell command as a string
command = sprintf("mv %s %s\n", filename, target)
# Run the command through the shell
system(command)
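When experimenting with sprintf and system, a dry run is a safer starting point. The sketch below builds `mv` commands for a few hypothetical file names and prints them instead of passing them to system(). This is only an illustration of the technique and not how filemod.awk itself is written.

```shell
# Hypothetical input names; commands are printed rather than executed
plan=$(printf 'Notes.TXT\nreadme\nREADME.md\n' | awk '{
    target = tolower($0)                         # lower-cased new name
    if (target != $0) {                          # skip names already in lower case
        command = sprintf("mv %s %s", $0, target)
        print command                            # system(command) would run it instead
    }
}')
printf '%s\n' "$plan"
```

This prints `mv Notes.TXT notes.txt` and `mv README.md readme.md`. The plain `mv %s %s` format will break on names containing spaces, so quoting would be needed before running such commands for real.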

In Conclusion

MSc. Programmer and fan of open source software.
