onerss: An RSS Feed Merger and a Practice of the Unix Philosophy
The onerss program is a Unix program merging multiple RSS 2.0 feeds into one. It’s also a practice of the Unix philosophy. The onerss program concentrates on its core task and is designed to be integrated with the Unix ecosystem. The result is a program providing as many functions as possible with the fewest options. This report contains a user’s guide and a discussion of the design and implementation. The code is hosted at GitHub <https://github.com/dongyx/onerss>.
The onerss program takes filenames of the source feeds and prints the merged feed to the standard output. If there are no filename arguments, onerss reads source feeds from the standard input. The following examples show how to use onerss to perform a series of merging tasks, from the simplest one to the most complex.
Merge local feeds
onerss feed1.xml feed2.xml
cat feed1.xml feed2.xml | onerss
Merge remote feeds
curl example.com/feed1.xml example.com/feed2.xml | onerss
Merge mixed feeds
( curl example.com/feed1.xml; cat feed2.xml ) | onerss
Specify the title of the merged feed
The merged feed has a default channel title. The -t option changes it.
onerss -t 'Merged News' feed1.xml feed2.xml feed3.xml
Prepend source channel titles to item titles
It’s natural that users want to distinguish between different source feeds while reading the merged feed. The -p option prepends the source channel title to item titles in the merged feed. The -c option sets the category field of items in the merged feed to the source channel title. These help users and RSS agent programs distinguish between different sources.
onerss -p feed1.xml feed2.xml
Rename source feeds before merging
It’s a common task that users would like to change source channel titles. For example, some source feeds may have very long channel titles. It becomes unreadable if we prepend them to item titles. Users may want a short alias for such a feed.
The onerss program adds no option for the renaming task. The -t option renames the merged feed. If we call onerss on a single feed with the -t option, we rename that feed. Then we could pipe the renamed feed to another onerss process for merging. Figure 1 demonstrates this approach.
( onerss -t Name1 feed1.xml; curl example.com/feed2.xml | onerss -t Name2 ) | onerss -pt 'Merged News'
Rename source feeds by group before merging
( onerss -t GroupName1 feed1a.xml feed1b.xml; onerss -t GroupName2 feed2a.xml feed2b.xml ) | onerss -pt 'Merged News'
Merge Atom feeds and RSS feeds
The onerss program doesn’t support Atom natively. However, we could pipe an Atom-to-RSS converter to onerss.
( curl example.com/atom-feed.xml | atom2rss; cat rss-feed.xml ) | onerss
Usually, feeds are hosted online instead of on the local disk. It would violate the “Do One Thing Well” rule if we made onerss support HTTP, HTTPS, FTP, etc., because we already have curl. However, it would be cumbersome for users if we required them to download feeds to temporary files first, especially because that requires naming the files.
Recent shells like Bash and Zsh provide the process substitution feature. Process substitution allows users to create inline temporary files conveniently. The following call uses onerss and process substitution to merge remote feeds.
onerss <(curl example.com/a.xml) <(curl example.com/b.xml)
However, not all shells contain this feature, and the syntax may differ between shells. In addition, files created by some process substitution implementations are not seek-able. This would limit our implementation. Most programs won’t check the type of files specified by command-line arguments. Even if we considered the difference, the programs called by our program may not. If we forward these arguments to the called program, it may fail.
Our solution is making onerss able to read feeds from the standard input. Although the standard input is usually not seek-able either, that limitation is explicit. We can’t accidentally fail a called program by forwarding arguments. Additionally, this makes onerss work on shells without process substitution.
Then we could pipe any other program to onerss to support any protocol. Besides, onerss prints to the standard output. This makes it a filter. The Unix environment is friendly to filters. That’s the essential reason why onerss could be so powerful and so simple.
However, this feature depends on the nature of the input format. It’s not hard to split multiple RSS feeds out of a single stream. For an input format whose sources can’t be split out of a single stream, this approach won’t work.
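As a rough sketch of why the splitting is feasible (a simplification; the real parsing is done properly, and real feeds may split tags across lines): every RSS document ends with a closing </rss> tag, so even a naive line filter can find the document boundaries in a concatenated stream.

```shell
# Two RSS documents concatenated into one stream; the closing
# </rss> tag marks each document boundary.
printf '%s\n' \
  '<rss version="2.0"><channel><title>A</title></channel></rss>' \
  '<rss version="2.0"><channel><title>B</title></channel></rss>' |
awk '/<\/rss>/ { n++ } END { print n, "feeds" }'
# prints: 2 feeds
```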
For example, the diff program compares two arbitrary text files. It must take its two files from the command line. If it read both files from the standard input, it couldn’t know where the boundary is. Still, diff implementations require no seek-ability of input files. We could use process substitution to compare two temporarily generated files. The following call compares two remote files in Bash.
diff <(curl example.com/a) <(curl example.com/b)
Not all programs, however, require no seek-ability, and not all shells support process substitution. These programs and shells couldn’t make use of this technique.
This problem can’t be solved by application design. It should be solved by shells. If a shell provided a more powerful process substitution feature which allowed users to decide whether the generated files are seek-able, and standardized the feature, many programs would become much more powerful without modification.
Although the Unix philosophy encourages small and simple programs, some useful functions are better provided by a central service. A collection of small programs is powerful precisely because the environment provides useful common facilities.
The onerss program provides no option to rename source feeds. Instead, it encourages users to call it recursively with the -t option. This design significantly decreases the complexity of the interface.
This approach works because onerss is a function mapping a set to its subset. Specifically, onerss reads a feed set and prints a single feed, which is itself a kind of feed set: a feed set containing only one feed. This approach could be used in other programs with the same property.
However, this approach is a kind of hack. The way it exploits the -t option violates the original purpose of the option. That’s why we explained it at length in the Introduction chapter.
A more general way suggested by the Unix philosophy is writing a new program to rename a feed. Let’s call this program rsscname. Not only would we no longer need to hack the -t option, but the -t option could even be removed from onerss. The rsscname program takes two arguments. One is the new channel title. The other is the filename. If the filename is omitted, rsscname reads the feed from the standard input. Figure 2 shows the command using rsscname to replace figure 1.
( rsscname 'NewName1' feed1.xml; curl example.com/feed2.xml | rsscname 'NewName2' ) | onerss -p | rsscname 'Merged Name'
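The rsscname program doesn’t exist; the name, behavior, and implementation below are a hypothetical sketch. It renames only the first <title> element, assuming the channel title appears before any item titles, as RSS 2.0 requires; a robust version would use a real XML parser.

```shell
# Hypothetical sketch of rsscname: replace the first <title> element
# (the channel title) with a new name.  Reads the named file, or the
# standard input if no file is given -- awk provides that for free.
rsscname() {
    new=$1; shift
    awk -v t="$new" '
        !done && sub(/<title>[^<]*<\/title>/, "<title>" t "</title>") { done = 1 }
        { print }
    ' "$@"
}
```

Usage would look like `rsscname 'Short Name' feed.xml` or `curl example.com/feed.xml | rsscname 'Short Name'`.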
Figure 2 doesn’t complicate the command much more than figure 1, but it’s more self-explanatory. In addition, this approach works on a wider range of programs.
Other options of onerss may also be separated into independent programs. However, that would produce too many special-purpose programs. The Unix philosophy encourages making simple but expressive programs. These small programs, including rsscname, are too special-purpose, not expressive.
A possible solution is developing a general RSS or XML manipulator to work with or even replace onerss. However, it’s hard to design a concise interface for such a program. Programs of this type often ship with a domain-specific language like CSS selectors or XPath.
The Unix philosophy encourages process separation: using small programs, especially fundamental ones like awk, glued together to form a larger program. The shell was born for this type of gluing, and onerss is written in shell.
However, to make this easy, the small programs and the file formats must conform to some conventions, which have been called “Good Files and Good Filters”. We only discuss the “Good Files” part, because we are able to choose what filters we use.
“Good Files” are line-based text files. Each line is an object of interest. When more information is present for each object, the file is still line-by-line, but columnated into fields separated by blanks or tabs.
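As a hypothetical illustration (the feed names and counts below are made up), a feed index kept as a “Good File” has one feed per line with tab-separated fields, and standard filters operate on it directly.

```shell
# A "Good File": one object per line, tab-separated fields.
# Hypothetical feed index: title, item count, URL.
printf 'Daily News\t42\thttp://example.com/a.xml\nTech Blog\t7\thttp://example.com/b.xml\n' |
awk -F '\t' '$2 > 10 { print $1 }'    # titles of feeds with more than 10 items
# prints: Daily News
```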
XML files are clearly not “Good Files”. Fortunately, W3C developed the html-xml-utils package, which contains programs to convert between XML files and “Good Files”. The onerss program is basically a wrapper of the following pipeline:
hxpipe | awk | hxunpipe
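To sketch why this works: hxpipe flattens XML into a line-oriented format — roughly, `(tag` opens an element, `-text` carries character data, and `)tag` closes the element — which is exactly the kind of “Good File” awk handles well. The sample input below is hand-written in that style so the sketch runs without the hx tools installed.

```shell
# Hand-written hxpipe-style rendering of
# <channel><title>News</title><item><title>First post</title></item></channel>
printf '%s\n' '(channel' '(title' '-News' ')title' \
              '(item' '(title' '-First post' ')title' ')item' ')channel' |
awk '
    $0 == "(title" { intitle = 1; next }      # entering a <title> element
    $0 == ")title" { intitle = 0; next }      # leaving it
    intitle && /^-/ { print substr($0, 2) }   # data lines start with "-"
'
# prints the two titles: News, then First post
```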
This makes the implementation very concise. At the time this report was written, it contained no more than 160 effective lines of shell, including tedious work like parsing arguments and printing the help and version information.
The onerss program, like any shell script piping multiple processes together, has poor performance. No matter how low the cost of process creation is, each process requires application-level initialization. For example, the grep program needs to compile its regular expression. Even if we saved compiled regular expressions, loading them would cost time. In our practice, though, the performance bottleneck is always curl, for it accesses remote files. The onerss program could become the bottleneck when merging numerous local feeds.
We’ve introduced the onerss program and discussed several advantages and flaws of its design and implementation. The onerss program is a simple, expressive, concisely implemented, but hacky and slow program.
Basically, most advantages are from the great facilities of the Unix system. Some of the flaws are due to the author’s mistakes. The others are due to the essential imperfection of the Unix philosophy.