File persons.txt used for demonstration purposes (name, birth year, height, weight):
Piotr 1989 184 70.2
Tomasz 1994 172 60.8
Paweł 2003 104 48.4
Marcin 1980 174 91.6
Michał 1990 168 74.9
Lucjan 2000 80 44.1
Paweł 2000 124 60.2
Rafał 1980 174 91.6
Michał 1998 139 89.1
Damian 1994 170 68.2
Prints the contents of persons.txt line by line:
gawk '{ print }' persons.txt
Prints the contents of persons.txt line by line with the fields separated by colons. The fields of each input line
(separated by whitespace by default) are available as $1, $2, $3, ...:
gawk '{ print $1 ":" $2 ":" $3 ":" $4 }' persons.txt
Prints the contents of persons.txt line by line using the printf function, which
formats a string in the same way as the printf function of the C language:
gawk '{ printf "%s %d %d %.2f\n", $1, $2, $3, $4 }' persons.txt
Prints the contents of persons.txt line by line. Each field is formatted as a string of fixed width:
name = 10 characters, birth year = 5 characters, height = 5 characters, weight = 5 characters. All fields are
right-aligned within their columns:
gawk '{ printf "%10s %5s %5s %5s\n", $1, $2, $3, $4 }' persons.txt
The same as above, but with left-aligned columns:
gawk '{ printf "%-10s %-5s %-5s %-5s\n", $1, $2, $3, $4 }' persons.txt
Prints only lines that match a specified regular expression:
gawk '/^P.*$/ { print }' persons.txt
or equivalently (because the default action is print):
gawk '/^P.*$/' persons.txt
Prints the name and birth year fields of the rows whose birth year contains the digit 8 (rows that satisfy the specified
condition):
gawk '$2 ~ /8/ {print $1, $2}' persons.txt
Prints only those lines in which the birth year is numerically equal to 2003:
gawk '$2 == 2003 {print}' persons.txt
Prints only those lines in which the birth year is numerically greater than 1990 and less than 2015.
Logical operators are used in the same way as in the C language:
gawk '$2 > 1990 && $2 < 2015 {print}' persons.txt
Prints the lines from the first row that matches the first regular expression up to the next row that
matches the second regular expression (both rows included):
gawk '/Paweł/, /Michał/ {print}' persons.txt
If the first pattern is never matched, nothing is printed:
gawk '/Paweł123/, /Michał/ {print}' persons.txt
If the second pattern is never matched, all rows from the first matched row to the end of the file are printed:
gawk '/Paweł/, /Michał123/ {print}' persons.txt
If the end of the file has not been reached after both patterns have matched, the search for the first pattern
starts again from the current position in the file.
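A minimal sketch of this restart behaviour, assuming a hypothetical list of names piped in instead of persons.txt (plain awk is used so that the sketch also runs where gawk is not installed; gawk accepts the same programme). The range opens twice, so two groups of rows are printed:

```shell
# Hypothetical input: the range Marcin..Damian occurs twice.
out=$(printf '%s\n' Piotr Marcin Tomasz Damian Marcin Lucjan Damian Rafal |
      awk '/Marcin/, /Damian/')
printf '%s\n' "$out"
```

Piotr and Rafal lie outside both ranges and should not appear in the output.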
Below is an example of running a programme stored in a file:
cat programme:
BEGIN {
print "============================"
printf "%-10s %-5s %-6s %-6s\n", "Name", "Year", "Height", "Weight"
print "============================"
}
{ printf "%-10s %-5s %-6s %-6s\n", $1, $2, $3, $4 }
END {
print "============================"
}
gawk -f programme persons.txt
Instructions in the BEGIN block are executed before any input data is read and processed.
Instructions in the END block are executed after all input data has been processed.
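The order of execution can be sketched with a small summary programme. The three input rows and the variables sum and n are assumptions made for this illustration (plain awk is used here; the programme text is the same for gawk):

```shell
# BEGIN runs once before the first row, END once after the last row.
out=$(printf '%s\n' 'Piotr 1989 184 70.2' 'Tomasz 1994 172 60.8' 'Marcin 1980 174 91.6' |
      awk 'BEGIN { print "Average weight:" }
           { sum += $4; n++ }                          # runs for every row
           END { if (n > 0) printf "%.1f\n", sum / n } # guard against empty input
          ')
printf '%s\n' "$out"
```

The header is printed first, then the average of the fourth field over all rows.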
The length function called without a parameter returns the length of the current row; called with a parameter, it
returns the length of that parameter.
The variable $0 holds the row currently being processed.
The variable NR holds the number of the current row. In the END
block, NR holds the number of the last row of the input data.
The variable NF holds the number of fields in the current row.
Prints the length of the current row in parentheses, followed by a colon and the contents of the row:
gawk '{ print "(" length "): " $0 }' persons.txt
Prints the length of the first field of the current row in parentheses, followed by a colon and the contents of that field:
gawk '{ print "(" length($1) "): " $1 }' persons.txt
Prints the number of the current row, followed by a colon and the contents of the row:
gawk '{ printf "%2s: %s\n", NR, $0 }' persons.txt
Prints rows 7 through 9:
gawk 'NR == 7 , NR == 9' persons.txt
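NF is not used in the examples above. A minimal sketch, assuming two hypothetical rows with a different number of fields (shown with plain awk, gawk gives the same result). Since NF is the number of fields, $NF refers to the last field of the row:

```shell
# Prints the field count and the last field of each row.
out=$(printf '%s\n' 'Piotr 1989 184 70.2' 'Tomasz 1994 172' |
      awk '{ print NF ": " $NF }')
printf '%s\n' "$out"
```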
The programme below is an example of changing the contents of a field while a row is being processed. The name "Marcin" in an input row
is changed to "Paweł", and the name "Paweł" is changed to "Marcin". When a field is changed, every run of
separators in the current line is replaced by a single space character:
cat programme:
{
tmp = $1
if (tmp ~ /^Marcin$/) $1 = "Paweł"
if (tmp ~ /^Paweł$/) $1 = "Marcin"
print
}
gawk -f programme persons.txt
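The effect on the separators is easier to see with input that contains runs of spaces. A small sketch, assuming two hypothetical rows and the replacement name "Rafal" (plain awk is used; gawk behaves the same):

```shell
# Only the row whose first field is modified gets rebuilt with single spaces.
out=$(printf '%s\n' 'Marcin   1980   174' 'Piotr   1989   184' |
      awk '{ if ($1 == "Marcin") $1 = "Rafal"; print }')
printf '%s\n' "$out"
```

The second row is left untouched, so it keeps its original spacing.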
A standalone script that can be run from the command line can also be written:
cat programme:
#!/usr/bin/gawk -f
{
tmp = $1
if (tmp ~ /^Marcin$/) $1 = "Paweł"
if (tmp ~ /^Paweł$/) $1 = "Marcin"
print
}
chmod 755 programme
./programme persons.txt
To change the separator character for output data, set the OFS variable.
The separator is applied to all modified lines:
BEGIN { OFS = ":" }
To change the separator character for input data, set the FS variable:
BEGIN { FS = ":" }
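Both variables can be combined in one programme. The sketch below assumes hypothetical colon-separated input and an output separator of " | ". The assignment $1 = $1 changes nothing in the field itself; it only marks the row as modified so that it is rebuilt with OFS (plain awk is used; gawk accepts the same programme):

```shell
# Reads colon-separated fields and writes them joined by " | ".
out=$(printf '%s\n' 'Piotr:1989:184:70.2' 'Tomasz:1994:172:60.8' |
      awk 'BEGIN { FS = ":"; OFS = " | " } { $1 = $1; print }')
printf '%s\n' "$out"
```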
Redirecting output data to a file:
gawk '{ print > "file.txt" }' persons.txt
Below is an example that uses an associative array to count the occurrences of each name. An element that has not been
assigned yet evaluates to 0 in a numeric context:
{ names[$1]++ }
END { for (name in names) print name, names[name] }
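The two rules above can be run as one command. This sketch assumes three hypothetical rows; because the order in which for (name in names) visits the elements is not defined, the output is piped through sort to make it predictable (plain awk is used; gawk works the same way):

```shell
# Counts how many times each name occurs in the first field.
out=$(printf '%s\n' 'Marcin 1980' 'Damian 1994' 'Marcin 2000' |
      awk '{ names[$1]++ }
           END { for (name in names) print name, names[name] }' |
      sort)
printf '%s\n' "$out"
```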
Splits the current row into an array of fields, using a colon ':' as the separator. If two or more colons occur
consecutively, the fields between them are set to empty strings:
split($0, fields_array, ":")
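split returns the number of elements it created. A minimal sketch, assuming a hypothetical row with two consecutive colons and the helper variables n and i (plain awk is used; gawk behaves the same):

```shell
# Prints the element count, then every element in brackets.
out=$(printf '%s\n' 'Piotr::1989' |
      awk '{ n = split($0, fields_array, ":")
             printf "%d:", n
             for (i = 1; i <= n; i++) printf " [%s]", fields_array[i]
             print "" }')
printf '%s\n' "$out"
```

The empty brackets in the output correspond to the empty field between the two colons.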
Reads the next line of input data into the variable linia:
getline linia
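A small sketch of getline in a complete programme, assuming a hypothetical list of four names. Each row is joined with the row that follows it; the return value of getline is checked because it is 0 at the end of the input (plain awk is used; gawk accepts the same programme):

```shell
# $0 keeps the current row, linia receives the following one.
out=$(printf '%s\n' Piotr Tomasz Marcin Damian |
      awk '{ if ((getline linia) > 0) print $0 " + " linia }')
printf '%s\n' "$out"
```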