I/O Redirection and Pipes

In typical Unix installations, commands are entered at the keyboard and output resulting from these commands is displayed on the computer screen. Thus, input (by default) comes from the terminal and the resulting output (stream) is displayed on (or directed to) the monitor. Commands typically get their input from a source referred to as standard input (stdin) and typically display their output to a destination referred to as standard output (stdout) as pictured below:

As depicted in the diagram above, input flows (by default) as a stream of bytes from standard input along a channel, is then manipulated (or generated) by the command, and command output is then directed to the standard output. The ls command can then be described as follows: there is really no input (other than the command itself), and the ls command produces output which flows to the destination of stdout (the terminal screen), as below:

The notions of standard input and standard output are actually implemented in Unix as files (as are most things) and referenced by integer file descriptors (or channel descriptors). The file descriptor for standard input is 0 (zero) and the file descriptor for standard output is 1. These are not seen in ordinary use since these are the default values.

Input/Output Redirection

Unix provides the capability to change where standard input comes from, or where output goes using a concept called Input/Output (I/O) redirection. I/O redirection is accomplished using a redirection operator which allows the user to specify the input or output data be redirected to (or from) a file. Note that redirection always results in the data stream going to or coming from a file (the terminal is also considered a file).

The simplest case to demonstrate this is basic output redirection. The output redirection operator is the > (greater than) symbol, and the general syntax looks as follows:

	command > output_file_spec

Spaces around the redirection operator are not mandatory, but do add readability to the command. Thus in our ls example from above, we can observe the following use of output redirection:

	$ ls > my_files [Enter]
	$

Notice there is no output appearing after the command, only the return of the prompt. Why is this, you ask? This is because all output from this command was redirected to the file my_files. Observe in the following diagram, no data goes to the terminal screen, but to the file instead.

Examining the file as follows results in the contents of my_files being displayed:

	$ cat my_files [Enter]
	foo
	bar
	fred
	barney
	dino
	$

In this example, if the file my_files does not exist, the redirection operator causes its creation, and if it does exist, the contents are overwritten. Consider the example below:

	$ echo "Hello World!" > my_files [Enter]
	$ cat my_files [Enter]
	Hello World!

Notice here that the previous contents of the my_files file are gone, replaced with the string "Hello World!" Note also that when using redirection, the shell creates (or empties) the output file first, and only then executes the command to the left of the redirection operator. Observe the following:

	$ cat my_files [Enter]
	Hello World!
	$ cat my_files > my_files [Enter]
	$ cat my_files [Enter]
	$
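
Because the output file is emptied before the command runs, a filter cannot safely read from and write to the same file. The following is a minimal sketch of the problem and of the usual workaround (write to a second file, then rename it). The file names names and names.tmp and the scratch directory are assumptions made purely for illustration.

```shell
# Work in a scratch directory so no real files are touched (an assumption of this sketch).
cd "$(mktemp -d)" || exit 1

printf 'fred\nbarney\n' > names
sort names > names                  # the shell empties names before sort reads it
size_after_clobber=$(wc -c < names)
echo "bytes left in names: $size_after_clobber"

printf 'fred\nbarney\n' > names
sort names > names.tmp && mv names.tmp names   # write elsewhere, then replace the original
cat names
```

After the first sort the file is empty; after the second, names holds the sorted lines.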

Often we wish to add data to an existing file, so the shell provides us with the capability to append output to files. The append operator is the >>. Thus we can do the following:

	$ ls > my_files [Enter]
	$ echo "Hello World!" >> my_files [Enter]
	$ cat my_files [Enter]
	foo
	bar
	fred
	barney
	dino
	Hello World!

The first output redirection creates the file if it does not exist, or overwrites its contents if it does, and the second redirection appends the string "Hello World!" to the end of the file. When using the append redirection operator, if the file does not exist, >> will cause its creation and append the output (to the empty file).
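
A common use of the append operator is building up a log across several commands. The sketch below is illustrative only; the temporary log file and the message text are assumptions.

```shell
log=$(mktemp)                           # hypothetical log file for this sketch
for n in 1 2 3; do
    echo "run $n finished" >> "$log"    # each pass adds one line to the end
done
cat "$log"
```

Had the loop used > instead of >>, only the line from the last pass would remain in the file.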

The ability also exists to redirect the standard input using the input redirection operator, the < (less than) symbol. Note the point of the operator implies the direction. The general syntax of input redirection looks as follows:

	command < input_file_spec

Looking in more detail at this, we will use the wc (word count) command. The wc command counts the number of lines, words and bytes in a file. Thus if we do the following using the file created above, we see:

	$ wc my_files [Enter]
	       6       7      39   my_files

where the output indicates 6 lines, 7 words and 39 bytes, followed by the name of the file wc opened.

We can also use wc in conjunction with input redirection as follows:

	$ wc < my_files [Enter]
	       6       7      39

Note here that the numeric values are as in the example above, but with input redirection, the file name is not listed. This is because the wc command does not know the name of the file, only that it received a stream of bytes to count.

Someone will certainly ask if input redirection and output redirection can be combined, and the answer is most definitely yes. They can be combined as follows:

	$ wc < my_files > wc_output [Enter]
	$

There is no output sent to the terminal screen since all output was sent to the file wc_output. If we then looked at the contents of wc_output, it would contain the same data as above.

To this point, we have discussed the standard input stream (descriptor 0) and the standard output stream (descriptor 1). There is another output stream called standard error (stderr) which has file descriptor 2. Typically when programs return errors, they return these using the standard error channel. Both stdout and stderr direct output to the terminal by default, so distinguishing between the two may be difficult. However each of these output channels can be redirected independently. Refer to the diagram below:

The standard error redirection operator is similar to the stdout redirection operator and is the 2> (two followed by the greater than, with no spaces) symbol, and the general syntax looks as follows:

	command 2> output_file_spec

Thus to show an example, we observe the following:

	$ ls foo bar 2> error_file [Enter]
	foo
	
	$ cat error_file [Enter]
	ls: bar: No such file or directory

Note here that only the standard output appears once the standard error stream is redirected into the file named error_file, and when we display the contents of error_file, it contains what was previously displayed on the terminal. To show another example:

	$ ls foo bar > foo_file 2> error_file [Enter]
	$
	
	$ cat foo_file [Enter]
	foo

	$ cat error_file [Enter]
	ls: bar: No such file or directory

In this case both stdout and stderr were redirected to files, thus no output was sent to the terminal. The contents of each output file are what was previously displayed on the screen.
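
The same separation can be tried without depending on which files happen to exist. In the sketch below, report is a hypothetical shell function standing in for any command that writes to both streams; the notation >&2 on a single echo sends that line to standard error. The file names are likewise assumptions.

```shell
cd "$(mktemp -d)" || exit 1

# report is a hypothetical stand-in for a command that writes to both streams.
report() {
    echo "normal output"
    echo "something went wrong" >&2     # this line goes to standard error
}

report > out_file 2> error_file         # nothing reaches the terminal
cat out_file                            # normal output
cat error_file                          # something went wrong
```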

Note there are numerous ways to combine input, output and error redirection.

Another relevant topic that merits discussion here is the special file named /dev/null (sometimes referred to as the "bit bucket"). This virtual device discards all data written to it, and returns an End of File (EOF) to any process that reads from it. I informally describe this file as a "garbage can/recycle bin" like thing, except there's no bottom to it. This implies that it can never fill up, and nothing sent to it can ever be retrieved. This file is used in place of an output redirection file specification, when the redirected stream is not desired. For example, if you never care about viewing the standard output, only the standard error channel, you can do the following:

	$ ls foo bar > /dev/null [Enter]
	ls: bar: No such file or directory

In this case, successful command output will be discarded. The /dev/null file is typically used as an empty destination in such cases where there is a large volume of extraneous output, or cases where errors are handled internally so error messages are not warranted.
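
The opposite case, discarding errors while keeping normal output, and the end-of-file behavior on reading can both be sketched briefly. The path /no/such/path is assumed not to exist on the system where this is run.

```shell
# /no/such/path is assumed not to exist, so ls fails and writes only an error message.
ls /no/such/path 2> /dev/null || echo "ls failed, but its error message was discarded"

# Reading from /dev/null returns end of file immediately, so wc counts zero bytes.
wc -c < /dev/null
```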

One final miscellaneous item is the technique of combining the two output streams into a single file. This is typically done with the 2>&1 notation, as follows:

	$ command > /dev/null 2>&1 [Enter]
	$

Here the leftmost redirection operator (>) sends stdout to /dev/null and the 2>&1 indicates that channel 2 should be redirected to the same location as channel 1, thus no output is returned to the terminal.
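
The order of the two redirections matters: the shell processes them left to right, and 2>&1 copies wherever descriptor 1 points at that moment. The sketch below contrasts the two orderings; report is a hypothetical function that writes one line to each stream, and the file names are assumptions.

```shell
cd "$(mktemp -d)" || exit 1

# report is a hypothetical command that writes one line to each stream.
report() {
    echo "to stdout"
    echo "to stderr" >&2
}

report > both_file 2>&1       # stdout goes to the file, then stderr follows it there
cat both_file                 # both lines

report 2>&1 > stdout_only     # stderr copies the original stdout (the terminal) first,
                              # then stdout alone is moved to the file
cat stdout_only               # only "to stdout"
```

In the second form the "to stderr" line still appears on the terminal, which is rarely what is intended.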

Redirection Summary

Redirection Operator	Resulting Operation
command > file	stdout written to file, overwriting if file exists
command >> file	stdout written to file, appending if file exists
command < file	input read from file
command 2> file	stderr written to file, overwriting if file exists
command 2>> file	stderr written to file, appending if file exists
command > file 2>&1	stdout written to file, stderr written to the same file

Pipe Operator

A concept closely related to I/O redirection is the concept of piping and the pipe operator. The pipe operator is the | character (typically located above the enter key). This operator serves to join the standard output stream from one process to the standard input stream of another process in the following manner:

We can look at an example of pipes using the who and the wc commands. Recall that the who command will list each user logged into a machine, one per line as follows:

	$ who [Enter]
	mthomas		pts/2    Oct  1   13:07  
	fflintstone	pts/12   Oct  1   12:07  
	wflintstone	pts/4    Oct  1   13:37  
	brubble		pts/6    Oct  1   13:03

Also recall that the wc command counts characters, words and lines. Thus if we connect the standard output from the who command to the standard input of the wc (using the -l (ell) option), we can count the number of users on the system:

	$ who | wc -l [Enter]
	4

In the first part of this example, each of the four lines from the who command will be "piped" into the wc command, where the -l (ell) option will enable the wc command to count the number of lines.

While this example only uses two commands connected through a single pipe operator, many commands can be connected via multiple pipe operators.

Filters Are Our Friends

Closely related to pipes and the pipe operator is the topic of filters. Filters are commands that alter data passed through them, typically via pipes. Some filters can be used on their own (without pipes), but the true power to manipulate streams of data to the desired output comes from the combination of pipes and filters. Summarized below are some of the more useful Unix filters.

head/tail

Two straightforward commands which are often used as filters are the head and tail commands. When used with file specifications, these two commands display the first or last ten lines (by default) of a file, as follows:

	$ head /etc/passwd [Enter]
	root:x:0:0:root:/root:/bin/bash
	bin:x:1:1:bin:/bin:/sbin/nologin
	daemon:x:2:2:daemon:/sbin:/sbin/nologin
	adm:x:3:4:adm:/var/adm:/sbin/nologin
	lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
	sync:x:5:0:sync:/sbin:/bin/sync
	shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
	halt:x:7:0:halt:/sbin:/sbin/halt
	mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
	news:x:9:13:news:/var/spool/news:

In the above example, we simply see the first ten lines of the /etc/passwd file. However, if we wanted to see a listing of the ten oldest files in a directory, we could do the following:

	$ ls -tl | tail [Enter]
	-rw-r--r--    1 root     root          315 Jun 24  2001 odbcinst.ini
	-rw-r--r--    1 root     root         1913 Jun 24  2001 mtools.conf
	-rw-------    1 root     root          114 Jun 13  2001 securetty
	-rw-r--r--    1 root     root         1229 May 21  2001 bashrc
	-rw-r--r--    1 root     root           17 Jul 23  2000 host.conf
	drwxr-xr-x    2 root     root         4096 May 15  2000 opt
	-rw-r--r--    1 root     root            0 Jan 12  2000 exports
	-rw-r--r--    1 root     root          161 Jan 12  2000 hosts.allow
	-rw-r--r--    1 root     root          347 Jan 12  2000 hosts.deny
	-rw-r--r--    1 root     root            0 Jan 12  2000 motd

In this example, the ls command is used with the t and l (ell) options; the t option sorts by modification time (newest first) and the l (ell) option results in a long listing format. This output is then piped through the tail filter, which only displays the last ten lines, that is, the ten oldest files.
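
Both commands accept the -n option to select a count other than ten, and the two can be chained to pick out a single line. A small sketch, using a made-up five-line sample file:

```shell
f=$(mktemp)                                  # hypothetical sample file
printf '%s\n' one two three four five > "$f"

head -n 2 "$f"                # one, two
tail -n 2 "$f"                # four, five
head -n 4 "$f" | tail -n 1    # four (the fourth line only)
```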

cut

The cut command provides the capability to vertically slice through each line of a file based upon character or field positions. When used with the -c (to specify character) option as follows:

	cut -cstart_pos-end_pos < input_file

the cut command extracts (and keeps) characters in positions start_pos through end_pos (inclusive), discarding the rest of the line. The start_pos and end_pos are integer values ranging between 1 and the length of the line. For example, if we wish to select only the username of each user currently logged into our system, we could do the following:

	$ who | cut -c1-12 [Enter]
	mthomas
	fflintstone
	wflintstone
	brubble

This example pipes the output from the who command into the cut command, where the characters one through twelve are cut (and directed to stdout by default) while all other characters on each line are discarded.

If one wishes to cut from a starting character position to the end of the line, the end position is omitted as follows:

	cut -cstart_pos- < input_file

Users can also cut based upon "fields" of data by using the -f and (perhaps) the -d options. Refer to man pages for additional details.
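
As a brief illustration of field-based cutting, the sketch below extracts the login name and shell from lines in /etc/passwd format. The two sample lines are made up for illustration; -d sets the field delimiter (a tab if omitted) and -f lists the fields to keep.

```shell
# Two sample lines in /etc/passwd format; the account data is made up.
sample='root:x:0:0:root:/root:/bin/bash
fflintstone:x:518:531:Fred Flintstone:/home/fflintstone:/bin/ksh'

# Keep field 1 (login name) and field 7 (shell), using : as the delimiter.
printf '%s\n' "$sample" | cut -d: -f1,7
```

The selected fields are written joined by the same delimiter, giving root:/bin/bash and fflintstone:/bin/ksh.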

tr

The tr (translate) command provides the ability to translate characters coming from the standard input and directed to the standard output. General syntax for this command is:

	tr set1 set2 < input_file

where each individual character in set1 is translated to its matching positional character in set2. A common usage of the translate filter is translating a string of characters to upper (or lower) case. Examine the following example:

	$ who | cut -c1-12 | tr '[a-z]' '[A-Z]' [Enter]
	MTHOMAS
	FFLINTSTONE
	WFLINTSTONE
	BRUBBLE

In this extension of an earlier example, the who command is piped into cut, where the first 12 characters are cut from each line. These twelve characters are then piped into the translate command, where each lower case character is translated to its matching upper case counterpart. Unless redirection occurs, output is written by default to standard output.
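
Beyond one-for-one translation, tr can also squeeze repeated characters (-s) and delete characters (-d), and it accepts named character classes. A short sketch with made-up input strings:

```shell
echo "fflintstone" | tr '[:lower:]' '[:upper:]'   # FFLINTSTONE
echo "a   b    c"  | tr -s ' '                    # a b c   (runs of spaces squeezed to one)
echo "13:07"       | tr -d ':'                    # 1307    (colons deleted)
```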

sort

The sort command behaves exactly as one might expect, that is, it sorts data directed to it. Thus we can modify our example from above as follows:

	$ who | cut -c1-12 | tr '[a-z]' '[A-Z]' | sort [Enter]
	BRUBBLE
	FFLINTSTONE
	MTHOMAS
	WFLINTSTONE

The sort command has options to sort in reverse order, ignore case when sorting, sort based upon multiple keys, and a plethora of other options. A related filter, useful when combined with sort, is the uniq filter.
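
Since uniq only collapses adjacent duplicate lines, it is normally fed sorted input. The sketch below counts how often each name appears in a made-up list of logins (the file names logins and counts are assumptions); uniq -c prefixes each line with its count, and sort -rn orders those counts numerically, largest first.

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical login names, one per session; fred appears three times.
printf '%s\n' fred wilma fred barney fred wilma > logins

sort logins | uniq                          # barney, fred, wilma
sort logins | uniq -c | sort -rn > counts   # most frequent name first
cat counts
```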

sed

Another filter essential to manipulating strings is the stream editor program, sed. Perhaps the most common use of sed is to substitute one string (i.e. regular expression) with another string. Generic syntax for string/pattern substitution is:

	sed 's/original_string/new_string/' < input_file > output_file

Note that when using sed as above, the input redirection operator (<) may or may not be used; behavior in either case will be the same. Additionally, if no output redirection is specified, the output is directed by default to stdout.

A simple example of this is to substitute every occurrence of the string UNIX with the string Unix as follows (refer to preface for an explanation of why):

	$  sed 's/UNIX/Unix/' input_file > output_file [Enter]

The s directive tells sed to substitute an occurrence of the first string (UNIX) with the second string (Unix), receiving input from the file input_file and directing output to the file output_file. Note that sed, used this way, never makes changes to the input file. If the input file is to be permanently changed, one should: 1) save the output from sed to a temporary file, 2) verify the temporary file has been modified as desired, and 3) replace the original file with the temporary file (using mv), demonstrated as follows:

	$  sed 's/UNIX/Unix/' < input_file > temp_file [Enter]
	$  cat temp_file [Enter]			# verify changes are correct
	$  mv temp_file input_file [Enter]

In general, the sed command is not intended for the novice. In the above example, results might not be as desired. My use of the phrase "an occurrence" above should more accurately have read "the first occurrence on a line". To substitute every occurrence on a line, the sed global option (g) should be specified as follows:

	$  sed 's/UNIX/Unix/g' input_file > output_file [Enter]

The description of sed here is not intended to be exhaustive or in depth, and is mentioned for further reference.
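
The difference between substituting the first occurrence and every occurrence on a line is easiest to see on a line that contains the string twice. The input line in this sketch is made up for illustration.

```shell
echo "UNIX is not UNIX-like" | sed 's/UNIX/Unix/'     # Unix is not UNIX-like
echo "UNIX is not UNIX-like" | sed 's/UNIX/Unix/g'    # Unix is not Unix-like
```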

grep

And yet another very powerful and useful filter is the grep¹ command. The grep command is used to search through files and print lines matching a specified pattern to standard output. The generic (non-piped) form of grep is:

	grep pattern file(s)

Thus we can look for specific user entries in the /etc/passwd file as follows:

	$ grep flintstone /etc/passwd [Enter]
	fflintstone:2Ux9znoiuSpL:518:531:Fred Flintstone:/home/fflintstone:/bin/ksh
	wflintstone:24qza6RiyBZf:519:531:Wilma Flintstone:/home/wflintstone:/bin/ksh

Notice that two lines were displayed by grep, both lines matching the pattern "flintstone". Consider the following example:

	$ grep fred /etc/passwd [Enter]
	$

In this example, no results are displayed because fred does not match Fred, since grep is case sensitive (unless the -i, ignore case, option is used). The pattern can contain regular expression metacharacters to define the match more precisely. There is also an egrep (extended grep) command (or grep -E) which extends the pattern matching capability.

When using grep with pipes, the file argument is omitted since the input data is arriving via the pipe as follows:

	$ cat /etc/passwd | grep flintstone [Enter]
	fflintstone:2Ux9znoiuSpL:518:531:Fred Flintstone:/home/fflintstone:/bin/ksh
	wflintstone:24qza6RiyBZf:519:531:Wilma Flintstone:/home/wflintstone:/bin/ksh

While some search patterns using grep are straightforward, others, while appearing simple, are not.
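
A few commonly used grep options, together with one anchored pattern, can be tried on a small sample. The file passwd_sample below is a made-up file in /etc/passwd format, created with a here-document so the sketch is self-contained.

```shell
cd "$(mktemp -d)" || exit 1

# passwd_sample is a made-up file in /etc/passwd format.
cat > passwd_sample <<'EOF'
root:x:0:0:root:/root:/bin/bash
fflintstone:x:518:531:Fred Flintstone:/home/fflintstone:/bin/ksh
wflintstone:x:519:531:Wilma Flintstone:/home/wflintstone:/bin/ksh
EOF

grep -i fred passwd_sample          # -i ignores case, so "Fred" matches
grep -c flintstone passwd_sample    # -c prints the number of matching lines: 2
grep -v flintstone passwd_sample    # -v prints the lines that do NOT match
grep '^w' passwd_sample             # ^ anchors the pattern to the start of the line
```

The anchored pattern matters because an unanchored w would also match any line containing a w anywhere, not only login names that begin with one.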

¹ Although there are many explanations about what grep stands for, the name came from an ed (an early editor program) command, specifically g/regular-expression/p; where the g stands for global, the regular-expression was shortened to re, and the p stands for print. [Kernighan & Pike]

Command Summary

cut - remove sections from each line in a file
grep - (search for and) print lines matching a pattern
sed - command to modify a text stream
sort - sort lines of text files
tr - translate characters
uniq - remove duplicate lines from a sorted file
wc - print the number of bytes, words, and lines in files
pipe operator - character ( | ) used to link the output of one command to the input of another
redirection - changing the flow of an input, output or error stream