How large should we set the capacity of the pipe to maximize the performance of a Unix pipe? It depends on how the two processes exchange data.
Consider a simple Unix pipe:
$ a | b
We can abstract process a
as a computation-writing
cycle:
+----+ +-----+ +----+ +-----+
|COMP+-->|WRITE+-->|COMP+-->|WRITE+-->
+----+ +-----+ +----+ +-----+
Process b
can be abstracted to a reading-computation
cycle:
+----+ +----+ +----+ +----+
|READ+-->|COMP+-->|READ+-->|COMP+-->
+----+ +----+ +----+ +----+
Based on the above abstraction, there are three patterns that the two processes could exchange data in. In each pattern, the performance of the pipe is determined by the slower process. Thus we analyze the performance of the slower process in each pattern.
One-to-one means one reading phase depends on one writing phase.
In this pattern, a reading phase of b
only needs data
written by one writing phase of a
:
+----+ +-----+ +----+ +-----+
|COMP+-->|WRITE+-->|COMP+-->|WRITE+-->
+----+ +--+--+ +----+ +--+--+
| |
+--------+ +---------+
| |
v v
+----+ +----+ +----+ +----+
|READ+-->|COMP+-->|READ+-->|COMP+-->
+----+ +----+ +----+ +----+
This is the most common pattern, for example, most Unix filters process data line by line.
Theorem 1
In the one-to-one pattern, a small pipe doesn’t cause unnecessary blocking.
If b
is faster, a
will not be blocked by
the capacity. If a
is faster, b
will not be
blocked by the capacity. Thus the slower process will never be blocked
by the capacity. The performance of the pipe is its theoretical optimum
value.
One-to-many means one reading phase depends on many writing phases.
In this pattern, a reading phase of b
needs data written
by two or more writing phases of a
:
+----+ +-----+ +----+ +-----+
|COMP+-->|WRITE+-->|COMP+-->|WRITE+-->
+----+ +--+--+ +----+ +--+--+
| |
+-----------+ |
| |
| +-------------------------+
v v
+----+ +----+ +----+ +----+
|READ+-->|COMP+-->|READ+-->|COMP+-->
+----+ +----+ +----+ +----+
This pattern could happen. For example, process a
prints
texts line by line, but process b
needs at least two lines
to compute.
Theorem 2
In the one-to-many pattern, if process a
is not faster
than b
, a small pipe doesn’t cause unnecessary blocking; if
process a
is faster than b
, making the pipe
large enough to store the data a reading phase wants to read can
maximize the performance of the pipe.
In the above example, this means if a
is faster, we
could set the pipe large enough to contain two text lines to maximize
the performance.
Since if process a
is not the faster one, we could
maximize the performance by ensuring writing phases will not be blocked.
Whatever the capacity is, writing phases won’t be blocked, because the
faster process b
is always ready to read.
If process a
is faster, to maximize the performance, we
should ensure reading phases in b
don’t wait for
computation in a
. If the pipe can’t store all the data a
reading phase needs, the waiting can happen. Because when b
is in a slow computation, the writing phase in a
could
block the pipe, thus the next computation phase in a
is
blocked. When b
finished the computation and needs to read
the data, it must wait for the computation in a
.
Many-to-one means many reading phases depend on one writing phase.
In this pattern, multiple reading phases of b
need data
written by one writing phase of a
:
+----+ +-----+ +----+ +-----+
|COMP+-->|WRITE+-->|COMP+-->|WRITE+-->
+----+ +--+--+ +----+ +-----+
|
|
+--------+--------+
| |
v v
+----+ +----+ +----+ +----+
|READ+-->|COMP+-->|READ+-->|COMP+-->
+----+ +----+ +----+ +----+
Theorem 3
In the many-to-one pattern, if process a
is not slower
than b
, a small pipe doesn’t cause unnecessary blocking; if
process a
is slower than b
, making the pipe
large enough to store data which a writing phase wants to write can
maximize the performance.
Since in this pattern if a
is not the slower one, every
time b
needs data, it won’t be blocked. There are no
unnecessary blocking in the slower process.
If a
is slower in this pattern, the writing phase can be
blocked if the pipe is not large enough. If the pipe is large enough to
not block a writing phase, it won’t block any other writing phases.
Because a
is slower, when it enters the next writing phase,
b
has already popped all the data in the pipe. Thus the
slower process a
has no unnecessary blocking.