Gawk by examples

File persons.txt used for demonstration purposes (name, birth year, height, weight):

Piotr   1989    184 70.2
Tomasz  1994    172 60.8
Paweł   2003    104 48.4
Marcin  1980    174 91.6
Michał  1990    168 74.9
Lucjan  2000    80  44.1
Paweł   2000    124 60.2
Rafał   1980    174 91.6
Michał  1998    139 89.1
Damian  1994    170 68.2

Prints persons.txt file's content line by line: gawk '{ print }' persons.txt

Prints the content of persons.txt line by line with the fields separated by colons. The fields of each input line (separated by whitespace by default) are available as $1, $2, $3, ...: gawk '{ print $1 ":" $2 ":" $3 ":" $4 }' persons.txt

Prints the content of persons.txt line by line using the printf function, which formats a string in the same way as the C language printf function: gawk '{ printf "%s %d %d %.2f\n", $1, $2, $3, $4 }' persons.txt

Prints the content of persons.txt line by line, with every field formatted as a string of fixed width: name = 10 characters, birth year = 5 characters, height = 5 characters, weight = 5 characters. All fields are right-aligned within their columns: gawk '{ printf "%10s %5s %5s %5s\n", $1, $2, $3, $4 }' persons.txt
The same, but with the columns left-aligned: gawk '{ printf "%-10s %-5s %-5s %-5s\n", $1, $2, $3, $4 }' persons.txt

Prints only the lines that match the specified regular expression: gawk '/^P.*$/ { print }' persons.txt
or, equivalently (because the default action is print): gawk '/^P.*$/' persons.txt

Prints the name and birth year fields of those rows whose birth year contains the digit 8 (rows that satisfy the specified condition): gawk '$2 ~ /8/ {print $1, $2}' persons.txt

Prints only those lines in which the birth year is numerically equal to 2003: gawk '$2 == 2003 {print}' persons.txt

Prints only those lines in which the birth year is numerically greater than 1990 and smaller than 2015. Logical operators are used in the same way as in the C language: gawk '$2 > 1990 && $2 < 2015 {print}' persons.txt

Prints the rows from the one matching the first regular expression up to the one matching the second regular expression (both inclusive): gawk '/Paweł/, /Michał/ {print}' persons.txt
If the first expression is never matched, nothing is printed: gawk '/Paweł123/, /Michał/ {print}' persons.txt
If the second expression is never matched, all rows from the first matching row to the end of the file are printed: gawk '/Paweł/, /Michał123/ {print}' persons.txt
If the end of the file has not been reached after both matches, the search for the first expression starts again from the current position in the file.

Below is an example of running a programme stored in a file:
cat programme:

BEGIN {
    print "============================"
    printf "%-10s %-5s %-6s %-6s\n", "Name", "Year", "Height", "Weight"
    print "============================"
}
{ printf "%-10s %-5s %-6s %-6s\n", $1, $2, $3, $4 }
END {
    print "============================"
}

gawk -f programme persons.txt

Instructions in the BEGIN block are executed before any data from the input file is read and processed. Instructions in the END block are executed after all data from the input file has been processed.

The length function called without a parameter returns the length of the current row; called with a parameter, it returns the length of that parameter.
The variable $0 stores the row currently being processed.
The variable NR holds the number of the current row. In the END block, NR holds the number of the last row of the input data.
The variable NF holds the number of fields in the current row.

Prints, in parentheses before a colon, the length of the current row and, after the colon, the content of the row: gawk '{ print "(" length "): " $0 }' persons.txt

Prints, in parentheses before a colon, the length of the first field of the current row and, after the colon, the content of that field: gawk '{ print "(" length($1) "): " $1 }' persons.txt

Prints before a colon current row's number and after the colon content of the row: gawk '{ printf "%2s: %s\n", NR, $0 }' persons.txt

Prints rows starting at 7th and ending at 9th: gawk 'NR == 7 , NR == 9' persons.txt

The programme below is an example of changing a field's content while a row is being processed. The name "Marcin" in an input row is changed to "Paweł" and the name "Paweł" is changed to "Marcin". When a field is changed, gawk rebuilds the row, so every run of separators in that row is replaced by a single space character (the default output separator):
cat programme:

{
    tmp = $1
    if (tmp ~ /^Marcin$/) $1 = "Paweł"
    if (tmp ~ /^Paweł$/) $1 = "Marcin"
    print
}

gawk -f programme persons.txt

It is also possible to write a standalone script that can be run from the command line:
cat programme:

#!/usr/bin/gawk -f
{
    tmp = $1
    if (tmp ~ /^Marcin$/) $1 = "Paweł"
    if (tmp ~ /^Paweł$/) $1 = "Marcin"
    print
}


chmod 755 programme

./programme persons.txt

To change the separator character for output data, change the value of the OFS variable. The separator is applied to every row whose fields are modified and between the comma-separated arguments of print: BEGIN { OFS = ":" }

To change the separator character for input data, change the value of the FS variable: BEGIN { FS = ":" }

Redirecting output data to a file: gawk '{ print > "file.txt" }' persons.txt

Below is an example that uses an associative array to count how many times each name occurs. An element that has not been assigned yet behaves as 0 in a numeric context, so no initialization is needed: { names[$1]++ } END { for (name in names) print name, names[name] }

Splits the current row into an array of fields, using a colon ':' as the splitting character. If two or more colons follow one another, the array elements between them are set to the empty string: split($0, fields_array, ":")

Reads the next line of input data into the variable linia: getline linia