When the browser is about to send an HTTP request with the GET method,
it first checks whether it holds a fresh (not expired) cached copy of
the resource. If it does, the browser uses the cached copy directly
instead of sending the request. If the cached copy has expired, the
browser sends a validation request. If the server finds that the cached
copy is still up to date, it sends a 304 Not Modified response; the
browser then uses the cached copy as the response and marks it fresh
again. If the cached copy is indeed out of date, the server sends the
new content, which the browser stores as the new cached copy and marks
fresh.
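The flow above can be sketched in a few lines of Python. This is only an illustrative model, not how any real browser is implemented: the names CacheEntry, get_resource, and ask_server are hypothetical, time is passed in as a plain number of seconds, and the freshness lifetime is assumed to be known in advance.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class CacheEntry:
    body: bytes
    fresh_until: float  # time (in seconds) after which the entry counts as expired


def get_resource(
    entry: Optional[CacheEntry],
    now: float,
    lifetime: float,
    ask_server: Callable[[bool], Tuple[int, bytes]],
) -> Tuple[bytes, CacheEntry, str]:
    """Return (body, updated cache entry, short description of what happened).

    ask_server(validating) is a hypothetical stand-in for the network call;
    it returns (status code, body).
    """
    # Fresh cached copy: use it without touching the network.
    if entry is not None and now < entry.fresh_until:
        return entry.body, entry, "cache hit"

    # Expired copy: send a validation request. No copy at all: plain request.
    status, body = ask_server(entry is not None)

    if status == 304 and entry is not None:
        # The server confirmed the copy; reuse it and mark it fresh again.
        renewed = CacheEntry(entry.body, now + lifetime)
        return renewed.body, renewed, "revalidated"

    # The server sent new content; it becomes the new cached copy.
    stored = CacheEntry(body, now + lifetime)
    return body, stored, "downloaded"
```

Calling get_resource with an entry that is still fresh never invokes ask_server, which is the whole point of caching: the fastest request is the one that is never sent.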
The server uses the Cache-Control HTTP header to control how the
browser caches the response. Commonly used directives include the
following.
no-store
Do not allow the browser (or any other cache) to store the response.
no-cache
Allow the browser to cache the response, but require it to check with the origin server for validation before using the cached copy.
public
Allow intermediaries such as proxies and CDNs to cache the response.
private
The response must not be stored by a shared cache.
max-age=<seconds>
The maximum amount of time, in seconds, that a response is considered fresh.
For example, Cache-Control: public, max-age=600 means the response can
be cached by the browser, proxies, and CDNs for 10 minutes. Within
those 10 minutes, if the resource is requested again, the cached copy
can be used directly. After 10 minutes, the cached copy must be
validated before it is used.
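As a rough illustration of the example above, the following Python sketch parses a Cache-Control value and decides whether a cached copy of a given age may be used without contacting the server. The function names parse_cache_control and is_fresh are made up for this sketch, and the parser is deliberately simplified: it ignores quoted arguments that contain commas, and it treats a response without max-age as not fresh, whereas real caches may fall back to other rules.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def is_fresh(header_value: str, age_seconds: float) -> bool:
    """Return True if a cached copy of this age may be used without validation."""
    directives = parse_cache_control(header_value)
    # no-store forbids caching; no-cache requires validation on every use.
    if "no-store" in directives or "no-cache" in directives:
        return False
    max_age = directives.get("max-age")
    if max_age is None:
        return False  # simplification, see the note above
    return age_seconds < int(max_age)
```

With Cache-Control: public, max-age=600, is_fresh returns True for a copy that is 599 seconds old and False once the copy is 600 seconds old or older.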
There are two ways to validate a cached copy.
The first way is through the ETag and If-None-Match headers. An HTTP
response can carry an ETag header that represents the current state of
the resource. For example, the ETag can be a hash of the resource. The
browser stores the ETag value along with the cached copy. When it needs
the resource but the cached copy is considered expired, it sends a
validation request whose If-None-Match header carries the stored ETag
value. If the server finds that the stored ETag still matches the
current state of the resource, it tells the browser the cached copy is
still fresh with a 304 Not Modified response. If the ETag no longer
matches, the server sends the new content along with a new ETag.
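On the server side, the ETag check can be sketched as below. This is a minimal model under stated assumptions: the ETag is a SHA-256 hash of the content (one possible choice, as the text suggests), the function names make_etag and respond_with_etag are hypothetical, and the request is assumed to carry at most a single ETag in If-None-Match. Real servers also have to handle lists of tags, the * wildcard, and weak validators.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(content: bytes) -> str:
    # ETag values are quoted strings; here the tag is a hash of the content.
    return '"' + hashlib.sha256(content).hexdigest() + '"'


def respond_with_etag(
    content: bytes, if_none_match: Optional[str]
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET on a resource with this content."""
    etag = make_etag(content)
    headers = {"ETag": etag}
    if if_none_match is not None and if_none_match.strip() == etag:
        # The browser's copy matches the current content: no body is needed.
        return 304, headers, b""
    # No cached copy, or the content changed: send it with its current ETag.
    return 200, headers, content
```

The first request has no If-None-Match header and receives the full content. A later validation request that sends the stored ETag back receives 304 with an empty body as long as the content has not changed.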
The second way is through the Last-Modified and If-Modified-Since
headers. A response can carry a Last-Modified header that specifies
when the resource was last modified. When the browser caches the
resource, it also stores the Last-Modified value. If the cached copy
needs to be validated, the browser sends a request for the resource
with an If-Modified-Since header whose value is the stored
Last-Modified value. The server compares the If-Modified-Since value
with the actual last modification time of the resource to determine
whether the cached copy is still fresh.
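The date-based check can be sketched in the same way. The function name respond_with_last_modified is hypothetical, and the modification time is assumed to be a timezone-aware UTC datetime. The date helpers come from Python's standard email.utils module, which produces and parses the date format used in HTTP headers.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Optional, Tuple


def respond_with_last_modified(
    content: bytes, modified_at: datetime, if_modified_since: Optional[str]
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body); modified_at must be an aware UTC datetime."""
    # HTTP dates have one-second resolution, so drop the microseconds.
    modified_at = modified_at.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(modified_at, usegmt=True)}

    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # an unparsable date is ignored
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            if modified_at <= since:
                # Not modified since the browser's copy was stored.
                return 304, headers, b""

    return 200, headers, content
```

An unparsable If-Modified-Since value is simply ignored here, so the server falls back to sending the full content.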
Caching can speed up your web site, but it can also cause problems. If it is not configured properly, the browser may serve the user the wrong (stale) resource. Worse, the browser may serve stale resources for only part of your site, and the resulting mix of old and new files can break the whole web app.
The simplest way to balance performance and usability is to use
Cache-Control: no-cache. It asks the browser to validate the cached
copy every time. This still requires a network round trip on every
request, but if the cached copy is still valid, the server can send a
very small response, which saves traffic.
For an SPA (Single Page Application) built with a modern toolchain,
.js files, .css files, and other assets such as images are usually
named after their hash values, so changed content gets a new URL. Thus
we can set a very long freshness lifetime, such as
Cache-Control: public, max-age=315360000, which keeps the cached copy
fresh for about 10 years. However, the cache configuration for
index.html must be different, because if the browser uses an outdated
index.html, the asset URLs in it will be outdated as well. Thus we
could use Cache-Control: no-cache for index.html. Moreover, if there
are dynamic resources such as CGI scripts, we should use
Cache-Control: no-store for them.
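Independently of any particular web server, this policy can be summarized as a small function from request path to Cache-Control value. The sketch below is only illustrative: the function name cache_control_for, the /cgi-bin/ prefix, and the assumed file naming pattern (a hexadecimal hash of at least 8 characters before the extension, as in app.3f2a9c1b.js) are assumptions, and the fallback to no-cache for everything else is one cautious choice among several.

```python
import re

# Assumed naming scheme for hashed assets, e.g. "app.3f2a9c1b.js".
HASHED_ASSET = re.compile(r"\.[0-9a-f]{8,}\.(js|css|png|jpg|svg|woff2)$")

TEN_YEARS = 10 * 365 * 24 * 3600  # 315360000 seconds


def cache_control_for(path: str) -> str:
    """Pick a Cache-Control value for a request path."""
    if path.startswith("/cgi-bin/"):
        return "no-store"  # dynamic resources must not be cached
    if path == "/" or path.endswith("/index.html"):
        return "no-cache"  # the entry page is validated every time
    if HASHED_ASSET.search(path):
        return f"public, max-age={TEN_YEARS}"  # hashed assets never change
    return "no-cache"  # cautious default for anything else
```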
In Apache, we could implement the above ideas with the following snippet.
# Ensure `mod_headers` is enabled
# Assets named by hash can be cached for a very long time
Header set Cache-Control "public, max-age=315360000"
# The entry page must always be validated
<Files index.html>
    Header set Cache-Control no-cache
</Files>
# Dynamic resources must **not** be cached
<Directory ${PREFIX}/cgi-bin>
    Header set Cache-Control no-store
</Directory>