Improve performance by serving static files with a lightweight HTTP server

Posted by admin

This article is of interest to you if:
- You run your site(s) on a dedicated server.
- You run a Un*x system.
- You run Apache as your HTTP server.
- You have a lot of traffic, are experiencing performance issues, or anticipating them.

In this article I will explain how to install Mathopd as a static file/image serving HTTP server, and why it will be beneficial.


Understanding Apache's forking system

To see why the setup described here is useful, you first need a high-level picture of how Apache works.

Basically, Apache handles each request with a separate process, known as a child (this is the pre-forking model used by Apache 1.3 and by Apache 2's prefork MPM). Each simultaneous request is handled by a different child, so the more simultaneous connections are open to your server, the more child processes Apache creates to handle them.
This is why, when you run top (or ps), you see a number of httpd processes running on your server.



Several parameters in Apache's configuration file (httpd.conf) control how these child processes are created:

MaxClients defines the maximum number of child processes Apache can create (and hence the maximum number of simultaneous requests it can serve).
StartServers defines the number of child processes created when Apache starts.

Apache doesn't wait for a request to arrive before creating the process that will handle it: creating a process takes time and would result in poor response times. Instead, it creates processes in advance and keeps them waiting for new connections. MinSpareServers and MaxSpareServers define the minimum and maximum number of idle child processes kept waiting for incoming connections.

So the basic life of a child process is:

1) Apache creates the child process because fewer than MinSpareServers idle children are waiting for connections.
2) The child process waits for a connection.
3) A connection is made. The child process answers the request.
4) The child process goes back to the wait status to serve a new request.

Child processes can be terminated for several reasons, for example because more than MaxSpareServers are idle, or because the process has already served MaxRequestsPerChild requests.
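The spawn and retire rules above can be modelled in a few lines of Python. This is a simplified sketch, not Apache's actual scheduler: the function names and arguments are hypothetical, and real Apache also limits how many children it spawns per second, which is ignored here.

```python
def spare_server_adjustment(idle, total, min_spare, max_spare, max_clients):
    """Return how many children to spawn (positive) or retire (negative).

    Simplified model of the MinSpareServers / MaxSpareServers rules; the
    spawn-rate limiting that real Apache applies is deliberately left out.
    """
    if idle < min_spare:
        # Too few idle children: top up, but never go past MaxClients.
        return max(0, min(min_spare - idle, max_clients - total))
    if idle > max_spare:
        # Too many idle children: retire the surplus.
        return -(idle - max_spare)
    return 0


def child_should_exit(requests_served, max_requests_per_child):
    """A child retires after MaxRequestsPerChild requests (0 means no limit)."""
    return max_requests_per_child > 0 and requests_served >= max_requests_per_child
```

For example, with MinSpareServers 5, MaxSpareServers 10 and only 2 idle children out of 10, `spare_server_adjustment(2, 10, 5, 10, 255)` asks for 3 more; once 255 children exist, it asks for none, however many clients are waiting.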

KeepAlive is a mechanism, implemented in Apache and many other HTTP servers, that stops a child process from closing the connection as soon as it has answered a client's request. Instead, the connection is "kept alive" and only closed once KeepAliveTimeOut seconds pass without a new request from that client. This matters because opening a connection is costly, and clients very often send a group of requests in a short period of time.
For example, when you load a web page, your browser sends a request for the page itself and one request for each file referenced on it (images, CSS, JavaScript, etc.). Without KeepAlive, it would open a new connection to Apache for every one of these requests, and they could be served by different child processes. With KeepAlive on, these requests can travel over the same TCP connection and be served by the same child process, which significantly improves speed.
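To see a persistent connection in action, the sketch below uses Python's standard http.server and http.client modules on 127.0.0.1 (a stand-in for Apache, not Apache itself). The client fetches a page and two of its files over one HTTPConnection, and the server records the client port of each request; the file names are placeholders.

```python
import http.client
import http.server
import threading


class RecordingHandler(http.server.BaseHTTPRequestHandler):
    # http.server only keeps connections open when it speaks HTTP/1.1
    protocol_version = "HTTP/1.1"
    client_ports = []  # one entry per request: the client's TCP port

    def do_GET(self):
        # The client port identifies which TCP connection carried the request
        RecordingHandler.client_ports.append(self.client_address[1])
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))  # lets the connection stay open
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging for the demo


def load_page(port, paths):
    """Request every path over a single persistent HTTP/1.1 connection."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    try:
        for path in paths:
            conn.request("GET", path)
            conn.getresponse().read()  # drain the body before reusing the connection
    finally:
        conn.close()


server = http.server.HTTPServer(("127.0.0.1", 0), RecordingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
load_page(server.server_address[1], ["/index.html", "/logo.gif", "/style.css"])
server.shutdown()
server.server_close()
print(RecordingHandler.client_ports)
```

All three recorded ports are identical, meaning the three requests shared one TCP connection. Setting `protocol_version = "HTTP/1.0"` makes the server close after each response, and the client then opens a new connection per request.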

If your server has mod_status enabled, you can check the server status and view the state of each active child process.






Fork vs Select

The forking mechanism of Apache (it "forks" new processes), combined with KeepAlive, has many advantages, the main one being robustness: because every request is served by a separate process, one request failing for some reason can never crash the whole server.

However, when traffic increases and your system becomes busy, this mechanism starts to cause problems.
Each Apache child process uses CPU and memory, so the more connections you have, the more CPU and memory you need.
KeepAlive is necessary, but because of the way it works, a visitor or search engine robot that views a single page and leaves, or one of your images linked from another site, still ties up a child process for KeepAliveTimeOut seconds, during which it cannot serve other requests. The more traffic your server gets, the more processes are created, and the greater the risk of reaching MaxClients.
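A rough way to quantify this is Little's law: the average number of busy resources equals the arrival rate multiplied by the time each arrival holds one. Assuming each new visitor holds one child for the time needed to serve their requests plus the KeepAliveTimeOut that follows, a minimal estimate looks like this (the function name and the traffic figures are hypothetical):

```python
def busy_children(new_clients_per_sec, service_time_s, keepalive_timeout_s):
    """Little's law estimate of Apache children occupied on average.

    Assumes each visitor holds one child for service_time_s, plus the idle
    KeepAliveTimeOut period that follows their last request.
    """
    return new_clients_per_sec * (service_time_s + keepalive_timeout_s)


for timeout in (15, 5, 2):
    print(timeout, busy_children(20, 0.5, timeout))
```

With 20 new visitors per second and 0.5 s of work each, a 15 s timeout keeps about 310 children busy, more than a MaxClients of 255 allows, while a 2 s timeout needs about 50. Lowering the timeout only buys headroom: the estimate still grows linearly with traffic.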

Why not just raise MaxClients, you may ask? First, its maximum value is 255 by default; to go higher you need to edit Apache's source and recompile. Also, because each child process consumes memory and CPU, allowing more children than your hardware can support risks running out of resources. Consider as well the case where all these clients are real users requesting your scripts: each request may open an SQL connection, and your database could run out of resources (in effect, MaxClients is also an easy way to cap the number of connections to your database).
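A back-of-the-envelope budget makes the memory side concrete: divide the RAM left after the OS and the database by the memory used by one child. The figures below are placeholders (measure per-child memory on your own server, for example from the RSS column of ps), the function name is hypothetical, and the 255 cap is the limit quoted above.

```python
def max_clients_for_memory(total_ram_mb, reserved_mb, per_child_mb, hard_limit=255):
    """Largest MaxClients the RAM budget allows, capped at the compile-time limit.

    reserved_mb covers the OS, the database and everything that is not Apache.
    """
    available = max(0, total_ram_mb - reserved_mb)
    return min(hard_limit, available // per_child_mb)


print(max_clients_for_memory(1024, 256, 8))
```

With 1 GB of RAM, 256 MB reserved and 8 MB per child, this gives 96, well under the 255 ceiling: on such a machine, recompiling Apache for a higher limit would not help.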

A temporary solution is to lower the KeepAliveTimeOut value, so that child processes are released sooner after serving a client. However, this is only a quick fix: as traffic grows, you are likely to run into the same problem again.

Here is a better solution, which is not only more scalable but also significantly reduces resource usage on your server: use a different HTTP server to serve your static files.
In this article we will install and configure Mathopd, but other lightweight HTTP servers, such as thttpd or Boa, would also work.

The main idea behind these servers is a different way of serving requests: instead of forking processes, a single process serves all requests, using the select() system call to watch many connections at once and sendfile() to send file contents.
This gives much better performance for static files (the kernel copies them directly to the network socket, instead of the server reading and writing them as Apache does) and much lower overall resource usage (only one process is running). Because the server only handles static files, the risk of that single process crashing is very low.
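To make the single-process model concrete, here is a minimal sketch in Python (not Mathopd's code). One process uses the selectors module, which wraps select() or a faster equivalent such as epoll, to multiplex every client, and os.sendfile() so the kernel copies file contents straight to the socket. It assumes Linux, answers plain GET requests HTTP/1.0-style with one request per connection, and omits keep-alive, MIME types and most hardening; all helper names are hypothetical.

```python
import os
import selectors
import socket


def make_listener(host="127.0.0.1", port=0):
    """Bound, listening TCP socket (port 0 picks any free port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(128)
    return sock


def _close(sel, conn, state):
    sel.unregister(conn)
    conn.close()
    if isinstance(state, dict) and state["fd"] is not None:
        os.close(state["fd"])


def _start_response(sel, conn, root, request):
    # Only the request line matters here, e.g. "GET /logo.gif HTTP/1.1"
    parts = request.split(b"\r\n", 1)[0].decode("latin-1").split(" ")
    method, path = (parts[0], parts[1]) if len(parts) >= 2 else ("", "")
    rel = os.path.normpath(path.split("?", 1)[0].lstrip("/"))
    full = os.path.join(root, rel)
    if method != "GET" or rel.startswith("..") or not os.path.isfile(full):
        state = {"header": b"HTTP/1.0 404 Not Found\r\nContent-Length: 0\r\n\r\n",
                 "fd": None, "offset": 0, "remaining": 0}
    else:
        fd = os.open(full, os.O_RDONLY)
        size = os.fstat(fd).st_size
        header = b"HTTP/1.0 200 OK\r\nContent-Length: %d\r\n\r\n" % size
        state = {"header": header, "fd": fd, "offset": 0, "remaining": size}
    # From now on, only wait for the socket to become writable
    sel.modify(conn, selectors.EVENT_WRITE, data=state)


def _on_readable(sel, conn, buf, root):
    try:
        chunk = conn.recv(4096)
    except BlockingIOError:
        return
    except OSError:
        chunk = b""
    if not chunk:                      # client went away before finishing
        _close(sel, conn, None)
        return
    buf.extend(chunk)
    if b"\r\n\r\n" in buf:             # end of the request headers
        _start_response(sel, conn, root, bytes(buf))


def _on_writable(sel, conn, state):
    try:
        if state["header"]:
            sent = conn.send(state["header"])
            state["header"] = state["header"][sent:]
        elif state["remaining"]:
            # sendfile(): the kernel copies file data directly to the socket
            sent = os.sendfile(conn.fileno(), state["fd"], state["offset"], state["remaining"])
            state["offset"] += sent
            state["remaining"] = state["remaining"] - sent if sent else 0
    except BlockingIOError:
        return                         # socket buffer full: wait for the next select()
    except OSError:
        state["header"], state["remaining"] = b"", 0  # client reset: give up
    if not state["header"] and not state["remaining"]:
        _close(sel, conn, state)       # one request per connection


def serve_static(listen_sock, root):
    """Serve files under root forever, from this single process."""
    sel = selectors.DefaultSelector()  # select(), or epoll/kqueue where available
    listen_sock.setblocking(False)
    sel.register(listen_sock, selectors.EVENT_READ, data=None)
    while True:
        for key, events in sel.select():
            if key.data is None:       # listening socket: a new client
                try:
                    conn, _ = listen_sock.accept()
                except BlockingIOError:
                    continue
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ, data=bytearray())
            elif events & selectors.EVENT_READ:
                _on_readable(sel, key.fileobj, key.data, root)
            else:
                _on_writable(sel, key.fileobj, key.data)
```

However many clients connect, no process or thread is created per client: each connection is just an entry in the selector, which is why memory use per connection stays small.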

Our strategy is to serve all static files from this lightweight HTTP server and use Apache only for scripts. A client will then usually send Apache just one request at a time (the page itself), so we can turn KeepAlive off, which also improves performance on Apache's side.



Preparing the server

We will take the example of serving scripts from the regular www.example.com subdomain, and the static files from static.example.com.
First, we want both HTTP servers to listen on port 80, because many firewalls block non-standard ports and you don't want visitors behind those firewalls to miss your static files. Since two servers cannot listen on the same IP address and port, you need two different IP addresses configured on your server.
If your server does not come with two IP addresses, get a second one from your hosting provider, and have its reverse DNS configured to resolve to static.example.com.
Then configure the IP address on your server (we assume that 111.222.33.43 is your primary IP and 111.222.33.44 is your new IP):


Code:
# cd /etc/sysconfig/network-scripts/
# cp ifcfg-eth0 ifcfg-eth0:1
Then edit ifcfg-eth0:1 so that it looks like this (replace 111.222.33.44 with your new IP address):


Code:
# more ifcfg-eth0:1
DEVICE=eth0:1
BOOTPROTO=static
IPADDR=111.222.33.44
NETMASK=255.255.255.0
ONBOOT=yes
Then, edit your hosts file:


Code:
# more /etc/hosts
127.0.0.1              localhost.localdomain localhost
111.222.33.43          my.server.name
111.222.33.44          static.example.com
Restart the network:

Code:
# /etc/rc.d/init.d/network restart
You also need to add the new host name to your DNS server. If you are using BIND, this typically means adding a line like the following to /var/named/example.com.hosts:

Code:
static.example.com.   IN      A       111.222.33.44
Restart named:

Code:
# /etc/rc.d/init.d/named restart
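We're almost ready. The last thing to do is make sure Apache is bound only to the primary IP, so that port 80 on the new IP stays free for Mathopd.
Edit the BindAddress parameter in httpd.conf (BindAddress is an Apache 1.3 directive; on Apache 2.x, use Listen 111.222.33.43:80 instead):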


Code:
BindAddress 111.222.33.43
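Why this step matters: a server bound to the wildcard address (0.0.0.0, which is what Apache uses when no address is given) holds the port on every IP, so a second server cannot bind that port on any specific address. The Python sketch below reproduces this on the loopback interface, with a random free port standing in for port 80 (which would need root). The helper names are hypothetical, and the behaviour shown is that of Linux sockets without SO_REUSEADDR.

```python
import errno
import socket


def can_bind(ip, port):
    """Return True if a new TCP socket can bind (ip, port), False if it is taken."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((ip, port))
        return True
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return False
        raise  # anything else (e.g. address not configured here) is a real error
    finally:
        s.close()


def wildcard_conflict_demo():
    """Mimic Apache without BindAddress, then try to start a second server."""
    apache = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    apache.bind(("0.0.0.0", 0))   # every address; a free port stands in for 80
    apache.listen(5)
    port = apache.getsockname()[1]
    try:
        blocked = not can_bind("127.0.0.1", port)  # specific address, same port
    finally:
        apache.close()
    freed = can_bind("127.0.0.1", port)            # after "Apache" lets go
    return blocked, freed


print(wildcard_conflict_demo())
```

Once Apache is bound to 111.222.33.43 only, binding 111.222.33.44 on port 80 no longer conflicts with it, which is what lets Mathopd start.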


Installing Mathopd

We can now install mathopd:


Code:
# mkdir mathopdsrc
# cd mathopdsrc
# wget http://www.mathopd.org/dist/mathopd-1.5p5.tar.gz
# gzip -d mathopd-1.5p5.tar.gz
# tar xvf mathopd-1.5p5.tar
# cd mathopd-1.5p5/src
# make
That's it! Note that the mathopd tarball is only 60 kB. Now that's what you might call lightweight.



Configuring Mathopd

We now need to configure Mathopd. See the sample configuration file provided in the distribution for detailed information. Here are the most important settings to edit in /usr/local/etc/mathopd.conf:

In the Tuning section, set NumConnections to the maximum number of concurrent clients you expect. Mathopd implements KeepAlive with a mechanism called clobbering: it keeps all connections alive, and when NumConnections is reached, it closes idle connections to handle new requests. This is an excellent feature because, unlike with Apache, clients cannot be locked out by idle connections waiting in the KeepAlive state. Mathopd can always serve a new client unless every one of its connections is actively reading or writing. If you ever face that situation, simply set NumConnections to a higher value; this should have minimal impact on performance, since Mathopd uses very little memory per connection.
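The clobbering policy can be modelled as a fixed-size connection table. This is a toy model for illustration only, not Mathopd's source: the class, its methods, and the choice to evict the connection that has been idle the longest are assumptions.

```python
from collections import OrderedDict


class ConnectionTable:
    """Toy model of keep-alive 'clobbering' with NumConnections slots."""

    def __init__(self, num_connections):
        self.capacity = num_connections
        # conn id -> "active" or "idle"; order = when its state last changed
        self.conns = OrderedDict()

    def accept(self, conn_id):
        """Admit a new client.

        Returns the id of the clobbered idle connection, or None if a free
        slot was available. Raises RuntimeError if every slot is busy.
        """
        clobbered = None
        if len(self.conns) >= self.capacity:
            # Longest-idle keep-alive connection first; active ones are never touched
            clobbered = next((c for c, s in self.conns.items() if s == "idle"), None)
            if clobbered is None:
                raise RuntimeError("all connections are reading or writing")
            del self.conns[clobbered]
        self.conns[conn_id] = "active"
        return clobbered

    def mark_idle(self, conn_id):
        """Response sent; the connection now waits in the keep-alive state."""
        self.conns[conn_id] = "idle"
        self.conns.move_to_end(conn_id)

    def mark_active(self, conn_id):
        """A new request arrived on a kept-alive connection."""
        self.conns[conn_id] = "active"
        self.conns.move_to_end(conn_id)
```

With `ConnectionTable(2)`, accepting "a" and "b", then marking "a" idle, a third client "c" is admitted by clobbering "a". In Apache's model, by contrast, the child holding "a" would stay unavailable until KeepAliveTimeOut expired.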

You can configure the Log and LogFormat sections depending on what you wish to log. For optimum performance, you can disable logging completely by setting the Log entry to /dev/null.

Bind Mathopd to the new IP address by adding a Server entry:


Code:
Server {
        # here the IP you want to bind mathopd with
        Address 111.222.33.44
        Virtual {
                AnyHost
                Control {
                        # here we map a request to static.example.com/ to a specific folder
                        # this is similar to apache's document root
                        Alias /
                        Location /home/example.com/www/
                        # here you can add extra headers to your files
                        ExtraHeaders { "Cache-Control: max-age=7200, must-revalidate" }
                }
        }
}
As the example shows, it is very easy to add expiration headers to the HTTP responses. You probably want to do this if your images and other static files rarely change, since it saves bandwidth: browsers and proxies can reuse their cached copies instead of downloading the files again.
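For the header in the example, max-age=7200 lets a browser or proxy reuse a file for 7200 seconds (two hours) without contacting the server; once the copy is older than that, must-revalidate obliges the cache to check with the server before reusing it. A simplified freshness check might look like this; the function names are hypothetical, and a real cache also takes the Age and Date headers into account.

```python
def max_age(cache_control):
    """Return the max-age value (in seconds) from a Cache-Control header, or None."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


def is_fresh(fetched_at, now, cache_control):
    """True while a cached copy may be reused without asking the server again."""
    lifetime = max_age(cache_control)
    return lifetime is not None and (now - fetched_at) < lifetime


header = "max-age=7200, must-revalidate"
print(max_age(header), is_fresh(0, 3600, header), is_fresh(0, 7200, header))
```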



Starting Mathopd

Now that configuration is done, all that is left is to start Mathopd:


Code:
# /usr/local/mathopd/mathopd -f /usr/local/etc/mathopd.conf


Update your scripts

You now need to update your site and forum scripts so that images (and other static files) are referenced with the new URL, http://static.example.com/(oldpath). Depending on the size and complexity of your scripts and the number of images used, this could be the longest step of the installation.
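If your pages use root-relative paths, part of this step can be automated. Below is a hypothetical Python helper that rewrites src and href attributes pointing at common static file types; the extension list is an assumption, and URLs with query strings or URLs built dynamically by scripts are left untouched, so review the output before deploying it.

```python
import re

STATIC_BASE = "http://static.example.com"

# Root-relative src/href values ending in a static-file extension.
# (?!/) skips protocol-relative URLs such as //cdn.example.net/x.gif.
_STATIC_REF = re.compile(
    r"""(?P<attr>\b(?:src|href)=["'])"""
    r"""(?P<path>/(?!/)[^"']+\.(?:gif|jpe?g|png|css|js))"""
    r"""(?=["'])""",
    re.IGNORECASE,
)


def rewrite_static_urls(html):
    """Point static-file references at the static host; leave scripts on www."""
    return _STATIC_REF.sub(lambda m: m.group("attr") + STATIC_BASE + m.group("path"), html)


print(rewrite_static_urls('<img src="/images/logo.gif"> <a href="/index.php">Home</a>'))
```

Links to scripts such as /index.php keep pointing at Apache, which is what allows KeepAlive to be turned off there.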

You can then set KeepAlive Off in httpd.conf and restart Apache:


Code:
# /etc/rc.d/init.d/httpd restart


Done!

And there you go: Mathopd should now be serving all static files.
If you look at top or ps, you'll note that only one mathopd process appears.



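You can monitor Mathopd's status by sending a SIGWINCH signal to its process (the command below assumes Mathopd writes its process ID to /var/run/mathopd.pid; adjust the path to match your configuration):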


Code:
kill -SIGWINCH `cat /var/run/mathopd.pid`
This creates a dump file in /tmp with information similar to Apache's mod_status:


Code:
# more /tmp/mathopd-17756-dump
*** Dump performed at 1117542000.877746
Uptime: 686399 seconds
Active connections: 150 out of 150
Max simultaneous connections since last dump: 150
Forked child processes: 0
Exited child processes: 0
Requests executed: 52706071
Accepted connections: 50667648
Pipelined requests: 119
CPU time used by this process:    65262.39
                     children:        0.00
Connections:
-----W--------W--------------W------------W---------W-----W------------W----------W-------------------------------R-------------------------R---------
Reading: 2, Writing: 8, Waiting: 140, Forked: 0
*** End of dump
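To watch these numbers from a script, a small Python sketch can send the signal and pull the headline counters out of a dump. The function names are hypothetical, the PID file path is the one used in the kill command above, and the patterns only match the lines shown in this example dump.

```python
import os
import re
import signal

PID_FILE = "/var/run/mathopd.pid"  # depends on your Mathopd configuration


def request_dump(pid_file=PID_FILE):
    """Same effect as the kill command above: ask mathopd for a status dump."""
    with open(pid_file) as f:
        os.kill(int(f.read().strip()), signal.SIGWINCH)


def parse_dump(text):
    """Extract the headline counters from a mathopd dump file."""
    stats = {}
    m = re.search(r"Active connections: (\d+) out of (\d+)", text)
    if m:
        stats["active"], stats["max"] = int(m.group(1)), int(m.group(2))
    m = re.search(r"Reading: (\d+), Writing: (\d+), Waiting: (\d+)", text)
    if m:
        stats["reading"], stats["writing"], stats["waiting"] = map(int, m.groups())
    m = re.search(r"Requests executed: (\d+)", text)
    if m:
        stats["requests"] = int(m.group(1))
    return stats
```

For the dump above, this reports 150 of 150 connections in use but only 10 of them reading or writing. Following the advice on NumConnections, it is when reading plus writing approaches the maximum that the value should be raised.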


Conclusion

This article showed how you could install and configure Mathopd on your server, in order to serve all static files with this lightweight HTTP server. If your site is busy, doing so may improve performance drastically on your site, for the following reasons:

- Mathopd serves static files much faster than Apache.
- Mathopd uses a single process and can serve many more requests for static files than Apache, using far fewer resources.
- Once a lightweight HTTP server handles all static files, you can disable KeepAlive in Apache, which also improves performance.

Mathopd is very robust: I am currently using it to serve several hundred requests per second and have yet to see it cause any problems.

The installation and setup may seem a bit complex if you're new to Un*x systems, but the benefits are well worth the effort, and may delay, or even spare you, a server upgrade you thought was necessary.