The Apache HTTP Server (httpd) was once the most popular HTTP server in the world. Nginx has since taken that position. However, httpd still works better on some occasions, especially for small, internal web sites. This text discusses how to configure httpd 2.4+ and provides solutions for several common situations.
The path of the httpd configuration file depends on how httpd was installed. For the httpd that ships with macOS, it’s /etc/apache2/httpd.conf. For an apt-installed httpd, it’s /etc/apache2/apache2.conf. As the Include directive is supported, the configuration can be organized into multiple files. The apt-installed version organizes it the best in my opinion. /etc/apache2/apache2.conf is the entry point of the configuration. Other parts of the configuration live in different files and are pulled in by the entry point through Include directives. The following is a typical structure of the /etc/apache2 directory of an apt-installed httpd.
# In Ubuntu with apt-installed `httpd`
root@localhost:/etc/apache2# ll
total 88
drwxr-xr-x 8 root root 4096 Dec 28 05:19 ./
drwxr-xr-x 94 root root 4096 Dec 24 14:26 ../
-rw-r--r-- 1 root root 7224 Jul 10 08:27 apache2.conf
drwxr-xr-x 2 root root 4096 Jul 6 08:26 conf-available/
drwxr-xr-x 2 root root 4096 Jul 6 08:26 conf-enabled/
-rw-r--r-- 1 root root 1781 Jul 6 09:30 envvars
-rw-r--r-- 1 root root 31063 Oct 10 2018 magic
drwxr-xr-x 2 root root 12288 Jul 6 09:29 mods-available/
drwxr-xr-x 2 root root 4096 Aug 16 08:18 mods-enabled/
-rw-r--r-- 1 root root 451 Jul 10 04:34 ports.conf
drwxr-xr-x 2 root root 4096 Dec 23 06:28 sites-available/
drwxr-xr-x 2 root root 4096 Dec 23 06:28 sites-enabled/
The basic rules of httpd config grammar are the following.
httpd parses the configuration from top to bottom, so the order is important;
Lines which start with # are comments;
Directive names are case insensitive, but CamelCase is recommended;
If a line is too long, it can be continued on the next line by ending it with \.
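As a small illustration of these rules, here is a sketch of a fragment that uses each of them. The directives are the same ones used in the minimal configuration of this text; the module path assumes the macOS layout.

```apache
# A comment takes a whole line; it cannot follow a directive on the same line.
Listen 80

# Directive names are case insensitive, so this is parsed as ServerName.
servername default

# A trailing backslash continues the directive on the next line.
LoadModule autoindex_module \
    libexec/apache2/mod_autoindex.so
```

The backslash must be the very last character of the line, with no trailing spaces after it.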
Most Apache installations come with a default config, and we usually just make some modifications to it. However, to understand it well, we need to learn how to write a config from scratch. Thus, after introducing these basic rules, I will start by explaining a minimal config, then discuss virtual hosts and some frequently-used directives, and end with deploying Python WSGI applications.
Let’s start with a minimal configuration supporting static files, directory listing, and CGI scripts. The environment is macOS with the default-installed httpd. httpd modules are installed in /usr/libexec/apache2.
# A minimal configuration
# Supporting static files, directory listing, and CGI scripts
ServerRoot /usr
LoadModule unixd_module libexec/apache2/mod_unixd.so
LoadModule authz_core_module libexec/apache2/mod_authz_core.so
LoadModule autoindex_module libexec/apache2/mod_autoindex.so
LoadModule cgi_module libexec/apache2/mod_cgi.so
User _www
Group _www
Listen 80
ServerName default
DocumentRoot /var/www
ErrorLog /var/log/apache2/error_log
<Directory />
Require all denied
Options None
AllowOverride None
</Directory>
<Directory /var/www>
Require all granted
</Directory>
<Directory /var/www/cgi-bin>
Options ExecCGI
SetHandler cgi-script
</Directory>
<Directory /var/www/files>
Options +Indexes
</Directory>
The ServerRoot directive sets the directory in which the server lives. We rarely modify the value after installation. Most directives resolve relative paths against this value. Be careful: Directory is an exception. Its path is resolved against the root of the file system, not the server root.
Then we load the modules we need. LoadModule modname modfile loads the module modname from the file modfile. If modfile is a relative path, it is resolved against the server root mentioned above.
unixd_module is necessary for httpd to run as a daemon. The User and Group directives below are provided by this module.
authz_core_module is for security. It provides the Require directive to control who can access a specified resource.
autoindex_module is for directory listing. It renders a file list of the directory when a user visits a directory with the Indexes option.
cgi_module enables the CGI handler.
The next two lines specify which user and group the httpd daemon runs as.
Listen 80 tells httpd to listen on port 80.
ServerName sets the name of the server. It’s not important in this example and you could use an arbitrary value. But it becomes important when virtual hosts come into play. Virtual hosts will be discussed later.
DocumentRoot /var/www maps the root of the network URI to the local path /var/www. Visiting http://yourdomain/* will access the file /var/www/*.
The ErrorLog directive, as its name implies, specifies where to put error logs.
By default, directives in the configuration apply to the entire server. If you want some directives to apply to only a part of the server, you scope them by placing them in <Directory>, <DirectoryMatch>, <Location>, <LocationMatch>, <Files>, <FilesMatch>, or <VirtualHost> blocks.
<Directory> and <Files> blocks mean what their names imply. <Location> means a network URI. <*Match> is the corresponding regex version.
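To make the difference between these scopes concrete, here is a minimal sketch. The /private URL is a hypothetical path chosen for illustration; the other paths follow the minimal configuration above.

```apache
# Applies to one directory on disk and everything below it.
<Directory /var/www/files>
    Options +Indexes
</Directory>

# Applies to every file whose name starts with ".ht", in any directory.
<FilesMatch "^\.ht">
    Require all denied
</FilesMatch>

# Applies to a URL path, regardless of which file ends up serving it.
# "/private" is a hypothetical path.
<Location "/private">
    Require all denied
</Location>
```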
We apply Require all denied to the directory /, which is the root of the file system. Require all denied means that the server should reject all requests accessing the directory. Subdirectories inherit the configuration, so no one can access any file on the host. This is what people often do: protect the whole file system first to avoid security issues, then open specified subdirectories for the web. httpd can let users put a file named .htaccess in a directory to override the configuration of the directory and its subdirectories. We place AllowOverride None to forbid all such overriding (since 2.3.9 this is also the default, but stating it makes the intent explicit). The Options directive controls which features are available in the directory and its subdirectories, such as executing CGI scripts or listing files. Thus we put Options None to forbid all of them.
As we specify /var/www as the document root, we need to allow people to visit the directory. That’s why we put Require all granted in the <Directory /var/www> block.
By default, visiting http://yourdomain/* returns the content of the file /var/www/*. However, we want to put CGI scripts in /var/www/cgi-bin so that when someone visits http://yourdomain/cgi-bin/*, the server executes the script and returns its output. We must give the directory permission to do it. That’s what Options ExecCGI does. Aside from the permission, we need to tell httpd to handle files in the directory as CGI programs instead of static files. So we put SetHandler cgi-script.
The final part of this example, Options +Indexes in the block <Directory /var/www/files>, permits users to see the file list of the directory. In this directory, we provide downloadable files for users. If a user visits http://yourdomain/files/, he or she will get the file list of /var/www/files.
You may have noticed a subtle difference between Options ExecCGI and Options +Indexes: the + sign. Without +, the options replace the inherited ones. With +, the options are added to the inherited ones. You may guess there is also a - sign. You’re right: it removes an option from the inherited ones.
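As a sketch of the - sign in action, the following turns listing back off for one subdirectory. The private subdirectory is a hypothetical name, not part of the example above.

```apache
<Directory /var/www/files>
    Options +Indexes
</Directory>

# Hypothetical subdirectory that should not be listed.
<Directory /var/www/files/private>
    Options -Indexes
</Directory>
```

Note that within a single Options directive, options with a sign and options without one should not be mixed.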
httpd allows us to build multiple sites on one host. This is implemented by the <VirtualHost> block.
For example, say you want to build two sites, www1.example.com and www2.example.com, with a structure similar to the above minimal configuration. Their document roots are /var/www/www1 and /var/www/www2.
ServerRoot /usr
LoadModule unixd_module libexec/apache2/mod_unixd.so
LoadModule authz_core_module libexec/apache2/mod_authz_core.so
LoadModule autoindex_module libexec/apache2/mod_autoindex.so
LoadModule cgi_module libexec/apache2/mod_cgi.so
User _www
Group _www
Listen 80
ErrorLog /var/log/apache2/error_log
<Directory />
Require all denied
AllowOverride None
</Directory>
<VirtualHost *:80>
ServerName www1.example.com
DocumentRoot /var/www/www1
ErrorLog /var/log/apache2/www1_error_log
<Directory /var/www/www1>
Require all granted
</Directory>
<Directory /var/www/www1/cgi-bin>
Options ExecCGI
SetHandler cgi-script
</Directory>
<Directory /var/www/www1/files>
Options +Indexes
</Directory>
</VirtualHost>
<VirtualHost *:80>
ServerName www2.example.com
DocumentRoot /var/www/www2
ErrorLog /var/log/apache2/www2_error_log
<Directory /var/www/www2>
Require all granted
</Directory>
<Directory /var/www/www2/cgi-bin>
Options ExecCGI
SetHandler cgi-script
</Directory>
<Directory /var/www/www2/files>
Options +Indexes
</Directory>
</VirtualHost>
After pointing the domains www1.example.com and www2.example.com to the IP of your host in your DNS (this can be tested by modifying /etc/hosts), you can visit the two domains, and you will find they give you two sites: one in /var/www/www1 and the other in /var/www/www2.
Why do we put an ErrorLog in the global scope? Because some errors concern the entire server. For example, httpd can’t launch for some reason.
We can visit the host by two domains now. What if one visits the host by IP? Which site will be served? The answer is the first virtual host. To forbid visiting by IP, we can create an empty virtual host before all other virtual hosts.
<VirtualHost *:80>
ServerName default
DocumentRoot /var/www
<Directory /var/www>
Require all denied
</Directory>
</VirtualHost>
If we visit the host by IP, httpd finds that no virtual host matches, so it falls back to the first one, and the first one forbids access to everything.
Finally, let’s discuss the VirtualHost directive itself.
<VirtualHost addr[:port] [addr[:port]] ...> ... </VirtualHost>
addr is an IP of the host; port is what the name implies and is optional. A host can have multiple IPs, so we can build a separate virtual host for each IP. In the above example, we use <VirtualHost *:80>. It means a request to port 80 on any IP will be sent to this virtual host if the Host header of the request is the server name specified by ServerName.
Be careful: setting addr and port in VirtualHost can’t replace the Listen directive. Every addr and port must be specified by a Listen directive in the global scope.
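A minimal sketch of this rule: to serve a second virtual host on port 8080, the port needs its own Listen line. The port number and the www1-staging directory are assumptions made up for this illustration.

```apache
# Every port used by a <VirtualHost> must also be listened on.
Listen 80
Listen 8080

<VirtualHost *:80>
    ServerName www1.example.com
    DocumentRoot /var/www/www1
</VirtualHost>

# Hypothetical second site on another port.
<VirtualHost *:8080>
    ServerName www1.example.com
    DocumentRoot /var/www/www1-staging
</VirtualHost>
```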
We have a basic understanding of httpd configuration now. As the configuration grows more complicated, it becomes unmaintainable if we write all directives in one file. We can split the configuration into multiple files and use the Include directive in the entry file to include the other parts.
Include file-path|directory-path|wildcard
A good example is the apt-installed httpd. Its configuration files, including the files that load modules, are all in /etc/apache2.
root@localhost:/etc/apache2# ll
total 88
drwxr-xr-x 8 root root 4096 Dec 31 09:43 ./
drwxr-xr-x 94 root root 4096 Dec 24 14:26 ../
-rw-r--r-- 1 root root 7224 Jul 10 08:27 apache2.conf
drwxr-xr-x 2 root root 4096 Jul 6 08:26 conf-available/
drwxr-xr-x 2 root root 4096 Jul 6 08:26 conf-enabled/
-rw-r--r-- 1 root root 1781 Jul 6 09:30 envvars
-rw-r--r-- 1 root root 31063 Oct 10 2018 magic
drwxr-xr-x 2 root root 12288 Jan 2 10:02 mods-available/
drwxr-xr-x 2 root root 4096 Aug 16 08:18 mods-enabled/
-rw-r--r-- 1 root root 451 Jul 10 04:34 ports.conf
drwxr-xr-x 2 root root 4096 Jan 1 13:53 sites-available/
drwxr-xr-x 2 root root 4096 Dec 23 06:28 sites-enabled/
apache2.conf is the entry point. All virtual hosts are in sites-available, one site per file. Not all sites (virtual hosts) are enabled. For each enabled site, a symbolic link is created in sites-enabled.
Just like virtual hosts, modules are organized with the mods-available and mods-enabled directories. In this version more modules, like unixd_module, are built directly into httpd, so we don’t have to load them explicitly.
To enable a site or a module, we don’t need to create symbolic links ourselves. The apt-installed httpd provides two commands, a2ensite and a2enmod.
If you have a site in /etc/apache2/sites-available/example.conf, you can use a2ensite example to enable it. Likewise, a2enmod proxy enables mod_proxy (the module is named without the mod_ prefix).
All Listen directives are placed in ports.conf.
Let’s see how apache2.conf includes virtual hosts.
# Include the virtual host configurations:
IncludeOptional sites-enabled/*.conf
Note that it uses IncludeOptional instead of Include. The difference is that IncludeOptional is silently ignored (instead of causing an error) if wildcards are used and they do not match any file or directory, or if a file path does not exist on the file system.
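The two directives can be put side by side in a short sketch. The file names follow the apt layout shown above; relative paths are resolved against the server root.

```apache
# Must exist: startup fails if ports.conf is missing.
Include ports.conf

# May match nothing: no error if no site has been enabled yet.
IncludeOptional sites-enabled/*.conf
```

Using Include for files the server cannot run without makes a missing file fail loudly instead of being skipped.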
Even if you’re not using the apt-installed httpd, I recommend this structure.
That concludes the basic concepts of httpd configuration. The next part introduces some frequently-used directives and common solutions. You can also stop reading this tutorial here and go to the official documentation.
From the next chapter on, I assume we are using Ubuntu with the apt-installed httpd.
We saw the ErrorLog directive when we discussed the minimal configuration and virtual hosts. Besides recording errors, we also want to record accesses. For that, we need mod_log_config. The apt-installed version already has the module built in. If your installation does not, load it explicitly.
LoadModule log_config_module path-to-file
Two directives matter here: CustomLog and LogFormat. CustomLog sets the path of the log file and the format. The format can be an explicit format string with printf-style % directives, or a nickname. A nickname refers to a format predefined by LogFormat.
# CustomLog with format nickname
LogFormat "%h %l %u %t \"%r\" %>s %b" common
CustomLog "logs/access_log" common
# CustomLog with explicit format string
CustomLog "logs/access_log" "%h %l %u %t \"%r\" %>s %b"
The complete list of format specifiers is in the mod_log_config documentation.
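As a further sketch, access logs can be kept per virtual host, next to the per-host ErrorLog seen earlier. The combined format below is the one given in the mod_log_config documentation; the log file names are assumptions for this example.

```apache
# "combined" adds the Referer and User-Agent request headers to "common".
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined

<VirtualHost *:80>
    ServerName www1.example.com
    DocumentRoot /var/www/www1
    # Hypothetical log file names, one pair per site.
    ErrorLog /var/log/apache2/www1_error.log
    CustomLog /var/log/apache2/www1_access.log combined
</VirtualHost>
```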
mod_alias provides the Redirect and RedirectMatch directives, which redirect one URL to another.
Redirect [status] [URL-path] URL
status sets how the redirect happens at the HTTP level: for example, a 301 permanent redirection or a 302 temporary redirection.
Redirect permanent /imgs http://your.cdn.com/imgs
With the above directive, any request to /imgs/logo.gif will return a 301 redirection to http://your.cdn.com/imgs/logo.gif.
RedirectMatch is the regex version of Redirect. To implement the same function as the above directive, use
RedirectMatch permanent ^/imgs/(.*)$ http://your.cdn.com/imgs/$1
As we see, regex captures are available through $1, $2, ... as in Perl.
Let’s see a real example here. A site has the following domains:
example.com
www.example.com
example.net
www.example.net
What we want is to redirect all requests to www.example.com permanently. Have a look at the configuration of example.net.
<VirtualHost *:80>
ServerName example.net
ErrorLog ${APACHE_LOG_DIR}/example.net.log
CustomLog ${APACHE_LOG_DIR}/example.net.log combined
Redirect permanent / http://www.example.com/
</VirtualHost>
Be careful here: I use Redirect permanent / http://www.example.com/. The target URL has a trailing slash. If it didn’t, visiting http://example.net/abc would redirect to http://www.example.comabc instead of http://www.example.com/abc. This only happens when redirecting the root URI.
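The other domains in the list need the same redirect. One way to avoid writing three nearly identical virtual hosts is the ServerAlias directive, which is not used elsewhere in this text; the following is a sketch under that assumption, not the only possible layout.

```apache
<VirtualHost *:80>
    ServerName example.net
    # Additional names answered by this virtual host.
    ServerAlias www.example.net example.com
    Redirect permanent / http://www.example.com/
</VirtualHost>
```

www.example.com itself keeps its own virtual host, which serves the real site.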
You may have noticed that I used ${APACHE_LOG_DIR}. This also demonstrates how to use environment variables in an httpd configuration. Any environment variable can be referenced as ${envvar_name}. In the apt-installed httpd, httpd-specific environment variables are defined in /etc/apache2/envvars.
root@localhost:/etc/apache2# grep export envvars
export APACHE_RUN_USER=www-data
export APACHE_RUN_GROUP=www-data
export APACHE_PID_FILE=/var/run/apache2$SUFFIX/apache2.pid
export APACHE_RUN_DIR=/var/run/apache2$SUFFIX
export APACHE_LOCK_DIR=/var/lock/apache2$SUFFIX
export APACHE_LOG_DIR=/var/log/apache2$SUFFIX
export LANG=C
export LANG
#export APACHE_LYNX='www-browser -dump'
#export APACHE_ARGUMENTS=''
#export APACHE2_MAINTSCRIPT_DEBUG=1
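A short sketch of how these exported variables can be referenced from the configuration. The error.log file name is an assumption for this example.

```apache
# Run the daemon as the user and group exported in envvars.
User ${APACHE_RUN_USER}
Group ${APACHE_RUN_GROUP}

# Build a path from an exported directory (file name is hypothetical).
ErrorLog ${APACHE_LOG_DIR}/error.log
```

Keeping such values in envvars means the same configuration files work unchanged when the log or run directories move.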
mod_alias provides simple redirection functions. However, sometimes we need more complicated behaviors. For example, suppose we have an SPA (Single Page Application) and our requirements are the following.
Requests to a static file or directory should be sent to the default handler
Requests to index.html should be sent to the default handler
Other requests should be sent to index.html; JavaScript code in index.html takes care of routes on the browser side.
This is different from redirection. We don’t send a 30x response to the browser but handle the request within the server. This behavior is called rewriting, and it needs mod_rewrite.
<VirtualHost *:80>
...
RewriteEngine On
<Location />
RewriteBase /
RewriteRule ^index\.html$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.html [L]
</Location>
</VirtualHost>
Let’s explain them line by line.
RewriteEngine on
Besides loading the module, mod_rewrite needs this directive in every virtual host in which you want to enable it.
RewriteBase /
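Sets the URL prefix to / for RewriteRule directives that substitute a relative path. RewriteBase can only apply to a directory scope. It can’t be placed in the global scope. Here we put it in <Location />, which is meant to behave like putting it in the block of the document root directory. Note that the mod_rewrite documentation describes rewrite rules inside <Location> as unsupported, so a <Directory> block is the safer place.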
RewriteRule ^index\.html$ - [L]
RewriteRule defines rules for the rewriting engine.
RewriteRule Pattern Substitution [flags]
Pattern is a regex matched against the original URL. In the above directive, we want to match only /index.html.
Substitution is the string that replaces the original URL. Captured groups can be used as $1, $2, ... in Substitution. Here we use a dash (-), which indicates that no substitution should be performed (the existing path is passed through untouched).
flags control other behaviors of the rewriting engine. flags is a comma-separated list surrounded by square brackets. Here we use [L], which contains only the L flag: if the rule matches, stop the rewriting process immediately and don’t apply any more rules.
RewriteCond %{REQUEST_FILENAME} !-f
The RewriteCond directive defines a rule condition. One or more RewriteCond directives can precede a RewriteRule directive. The following rule is then used only if the current state of the URI matches its pattern and all of these conditions are met.
RewriteCond TestString CondPattern [flags]
RewriteCond checks whether TestString meets CondPattern. TestString can be a supported attribute of the request, like the %{REQUEST_FILENAME} we use here, which indicates the requested filename. To be more specific, %{REQUEST_FILENAME} is the local path that the requested URI maps to. The CondPattern we use is !-f, which tests that TestString is not a regular file.
RewriteCond %{REQUEST_FILENAME} !-d
Apply the subsequent rewriting rule only if the requested path is not a directory.
RewriteRule . /index.html [L]
Rewrite any other request to /index.html and stop the rewriting process.
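The mod_rewrite documentation states that rewrite rules inside <Location> sections are syntactically permitted but unsupported. A sketch of the same rules in a <Directory> block follows; the document root /var/www/spa and the server name are hypothetical.

```apache
<VirtualHost *:80>
    # Hypothetical name and document root.
    ServerName spa.example.com
    DocumentRoot /var/www/spa
    <Directory /var/www/spa>
        Require all granted
        # Per-directory rewriting needs FollowSymLinks
        # (or SymLinksIfOwnerMatch) to be enabled.
        Options +FollowSymLinks
        RewriteEngine On
        RewriteBase /
        RewriteRule ^index\.html$ - [L]
        RewriteCond %{REQUEST_FILENAME} !-f
        RewriteCond %{REQUEST_FILENAME} !-d
        RewriteRule . /index.html [L]
    </Directory>
</VirtualHost>
```

In a per-directory context the directory prefix is stripped before matching, which is why the first pattern has no leading slash.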
Suppose we’re developing a web application which needs to fetch data from an external site. We can’t fetch the data in JavaScript because of the same-origin policy. Thus we need to create a service on our own server as a proxy. Requests to this service will be sent to the external site through our server, and responses from the external site will be sent to the client through our server. This behavior is called a gateway or reverse proxy. httpd creates gateways with mod_proxy.
ProxyPass /data http://www.external.com/data
Because we proxy to an HTTP server, we must ensure mod_proxy_http is enabled too.
Now http://yourdomain.com/data/* will be forwarded to http://www.external.com/data/*. However, if the external site sends a redirection, the browser will visit the original site directly, which is not what we want. Thus we need the ProxyPassReverse directive.
ProxyPass /data http://www.external.com/data
ProxyPassReverse /data http://www.external.com/data
ProxyPassReverse lets httpd adjust the URL in the Location, Content-Location and URI headers on HTTP redirect responses. This is essential when httpd is used as a reverse proxy (or gateway), to avoid bypassing the reverse proxy because of HTTP redirects on the backend servers which stay behind the reverse proxy.
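Put into a virtual host, the two directives might look like the following sketch. It uses the trailing-slash form of the paths; the mod_proxy documentation advises that if the first argument ends with a slash, the second should too. The server name is taken from the URL used above.

```apache
<VirtualHost *:80>
    ServerName yourdomain.com
    # Both arguments end with a slash, so /data/x maps to /data/x upstream.
    ProxyPass        "/data/" "http://www.external.com/data/"
    ProxyPassReverse "/data/" "http://www.external.com/data/"
</VirtualHost>
```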
Suppose you’re responsible for setting up a blog or an online document system for your department. What is the simplest way? Actually we don’t need to write any “real code”. Make a directory on the server and serve it by httpd with mod_autoindex. Documents are just placed in the directory, and a read-only document system is built. Read the documentation of mod_autoindex and you will find you can beautify the UI in many ways. OK, how do we allow users to write? Set up an FTP server serving the directory. You even get authentication for writing this way.
The only remaining problem is authentication for reading. httpd can do this with the authn/authz modules.
Enter the /etc/apache2 directory. Execute htpasswd -c authusers smith and type a password.
root@localhost:/etc/apache2# htpasswd -c authusers smith
New password:
Re-type new password:
Adding password for user smith
After you enter and confirm the password, the file authusers is created in /etc/apache2. It records a user named smith and his hashed password.
root@localhost:/etc/apache2# cat authusers
smith:$apr1$5316jbpB$7gD0bbTHrpG6ydUsMM.2l.
Then we create a new virtual host in /etc/apache2/sites-available/papers.your.com.conf and write the following.
<VirtualHost *:80>
ServerName papers.your.com
DocumentRoot /var/www/papers.your.com
<Directory /var/www/papers.your.com>
Options +Indexes
AuthType Basic
AuthName YourCompany
AuthUserFile authusers
Require valid-user
</Directory>
</VirtualHost>
Directives are so intuitive that I don’t think there is a need for more explanation.
Use a2ensite papers.your.com to enable the site. After DNS is set, you can visit the site and you will find the browser asks you for a username and a password.
Let users put their papers in /var/www/papers.your.com via FTP and your document system is online now.
How to add more users?
root@localhost:/etc/apache2# htpasswd authusers username
This command adds the user if it does not exist. If the user does exist, it updates the password. Do not use -c here, or your authusers file will be overwritten.
Deleting a user is also simple.
root@localhost:/etc/apache2# htpasswd -D authusers username
More usages of htpasswd are described in its documentation.
The authentication/authorization modules of httpd can do more than I have demonstrated; you can even set up user groups. Combined with <Directory> and <Location> blocks, they give us a very flexible authentication and authorization system. Check the official documentation to see more.
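As a sketch of user groups, the following restricts one subdirectory to members of a group. It assumes mod_authz_groupfile is enabled; the authgroups file, the editors group, and the drafts subdirectory are hypothetical names.

```apache
# Hypothetical /etc/apache2/authgroups, one group per line:
#   editors: smith jones
<Directory /var/www/papers.your.com/drafts>
    AuthType Basic
    AuthName YourCompany
    AuthUserFile authusers
    AuthGroupFile authgroups
    # Only authenticated users listed in the "editors" group get in.
    Require group editors
</Directory>
```

The users named in the group file still need entries in authusers, since the group file holds no passwords.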
When you set up the authentication above, you may have found that the browser warns you “Your password will be sent unencrypted.” So, let’s add HTTPS.
Make sure mod_ssl is loaded in httpd. Place your certificate and key on the server, for example, /etc/ca-certificates/papers.your.com.cert and /etc/ca-certificates/papers.your.com.key.
Copy papers.your.com.conf to papers.your.com-ssl.conf and modify it to the following.
<VirtualHost *:443>
ServerName papers.your.com
DocumentRoot /var/www/papers.your.com
SSLEngine on
SSLCertificateFile /etc/ca-certificates/papers.your.com.cert
SSLCertificateKeyFile /etc/ca-certificates/papers.your.com.key
<Directory /var/www/papers.your.com>
Options +Indexes
AuthType Basic
AuthName YourCompany
AuthUserFile authusers
Require valid-user
</Directory>
</VirtualHost>
The HTTPS site is online now but we’re not finished. When one visits the HTTP site, we want him or her to be redirected to HTTPS. To achieve that, we can modify papers.your.com.conf to use a Redirect directive. This is acceptable, but here we use RewriteRule, and the reason will be explained later.
<VirtualHost *:80>
ServerName papers.your.com
RewriteEngine on
RewriteRule ^ https://%{SERVER_NAME}%{REQUEST_URI} [END,NE,R=permanent]
</VirtualHost>
We rewrite all requests to the HTTPS site. Because RewriteRule can reference %{SERVER_NAME}, we write the domain only once, in ServerName. We choose RewriteRule instead of Redirect for maintainability.
The R=permanent flag indicates the rewriting is actually a 301 redirection. Other flags are not that important. Check the documentation of mod_rewrite to see details.
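For comparison, the Redirect variant mentioned as acceptable can be sketched as follows. It needs only mod_alias, but the domain has to be written twice.

```apache
<VirtualHost *:80>
    ServerName papers.your.com
    # The trailing slash on the target matters when redirecting the root URI.
    Redirect permanent / https://papers.your.com/
</VirtualHost>
```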
Your document system satisfies you and me. However, your boss may think it’s not cool and a real web application is required. Thus you write one with Python. Now you need to deploy this WSGI application.
Let’s write a simple Flask application as an example.
from flask import Flask

app = Flask(__name__)


@app.route('/hi/<name>')
def hi(name):
    return f'Hi, {name}.'
The easiest method is to deploy the WSGI application as CGI. Write the following script yourapp in your cgi-bin directory.
#!/usr/bin/env python3
from wsgiref.handlers import CGIHandler
from yourwsgiapp import app
CGIHandler().run(app)
Visiting http://yourdomain/cgi-bin/yourapp/hi/there, you will see “Hi, there.”.
Our WSGI application may be installed in a virtual environment. To make the CGI script run in the virtual environment, we can just modify the shebang line.
#!/path-to-your-venv/bin/python
from wsgiref.handlers import CGIHandler
from yourwsgiapp import app
CGIHandler().run(app)
Besides CGI, you can use mod_wsgi. See its documentation.
Both CGI and mod_wsgi are rather old technologies. Nowadays many people choose standalone WSGI containers like Gunicorn. A standalone WSGI container often runs on a non-80 port. We can use mod_proxy to forward requests for dynamic resources to the standalone WSGI container and serve static resources by httpd itself.
Suppose our standalone WSGI container is running on port 8000. We could write the following in the configuration.
<VirtualHost *:80>
...
ProxyPass /dynamic http://localhost:8000
ProxyPassReverse /dynamic http://localhost:8000
...
# Directives serving static files
...
</VirtualHost>
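A variant of this layout proxies everything except one static directory, instead of proxying one dynamic prefix. The sketch below assumes mod_proxy and mod_proxy_http are enabled; the /static prefix and the /var/www/yourapp directory are hypothetical.

```apache
<VirtualHost *:80>
    ServerName yourdomain
    # Hypothetical directory; /static/* is served from /var/www/yourapp/static.
    DocumentRoot /var/www/yourapp

    # "!" excludes a path from proxying; it must come before the general rule.
    ProxyPass /static !
    ProxyPass / http://localhost:8000/
    ProxyPassReverse / http://localhost:8000/

    <Directory /var/www/yourapp>
        Require all granted
    </Directory>
</VirtualHost>
```

With this layout the application sees the same URL paths as the browser, so it does not need to know about a /dynamic prefix.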