Replacing Apache with NGINX and HTSCANNER


WHY?

We run a lot of WordPress sites, and the one thing that seems impossible to manage is the bloat we get from plugin usage and from incorrectly sized images uploaded by site administrators. A site can often get to the point where it takes more than 30 seconds to load. Needless to say, we then get complaints about how slow the site is.

Part of KND’s skill set is the development, management and deployment of highly scalable, high-performance web clusters. So our network engineer wondered if he could solve the problem by tuning the web software to within an inch of its life.

If we could decrease the latency of content loading, and use a webserver architecture that scales out under load, we should be able to make these sites load faster.

So onto centre stage came our favorite clustered webserver toolkit: php-fpm, nginx, memcached and htscanner. (He also noticed that the WordPress site itself uses NGINX.)

Just so you know, memcached usually needs you to populate its cache and manage it from within your application/website. This is more fiddling than I want to get into right now, so we will just be using memcached to handle PHP’s session data.

By the way, these instructions are heavily *nix-centric; sorry, Windows kids.

The Tools

You will know when you need to use these tools so I won’t wax lyrical here. You can go to these sites yourself:

PHP-FPM: http://php-fpm.org/Main_Page

HTSCANNER: http://pecl.php.net/package/htscanner

NGINX: http://nginx.net/ (better here: http://wiki.nginx.org/Main)

MEMCACHED: http://www.danga.com/memcached/

The Problem(s)

OK, so there were a couple of problems with changing over, beyond the fact that the config files differ between the two servers.

The first thing you should know when you change from Apache to NGINX is that you are going to lose mod_rewrite and .htaccess.

But don’t wig out; there are replacements. NGINX has its own request rewriting using regex, and some bright sparks have developed a replacement for .htaccess: HTSCANNER. You can see that the Windows crowd probably pushed this one along.

So I will deal with the problems and their solutions one at a time.

Let’s begin...

HTSCANNER: The .htaccess replacement

Installing most of this stuff is really well documented and pretty easy, but I hit a brick wall with the install of HTSCANNER. The documentation was a little light, and referred specifically to apache-cgi, not NGINX. Here are some instructions to help get you going:

Installing HTSCANNER

So my first hurdle was installing; the PECL documentation says:

pecl install htscanner

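Uh oh, this software is alpha, and pecl won’t install a non-stable release by default. So you have to name the channel and version explicitly (substitute whichever version is current):

pecl install channel://pecl.php.net/htscanner-0.9.0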

Configuring PHP

So you need to configure php.ini to recognize this module. Just pop the following guff into the bottom of your php.ini file (or wherever makes the most sense for you; Debian users, I am looking at you here*):

extension=htscanner.so
[htscanner]
htscanner.config_file=".htscanner"
htscanner.default_docroot="/var/www"
htscanner.stop_on_error = 0
htscanner.default_ttl = 500

You can check whether this change worked by running:

php -i | grep htscanner

If you get the following, then you have success:

htscanner
htscanner support => enabled
CVS Id => $Id: htscanner.c,v 1.25 2009/03/04 00:40:16 pajoye Exp $
htscanner.config_file => .htscanner => .htscanner
htscanner.default_docroot => /var/www => /var/www
htscanner.default_ttl => 500 => 500
htscanner.stop_on_error => 0 => 0
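
If you are rolling this out across a cluster, you probably don’t want to eyeball that output on every box. Here is a minimal sketch of a scripted check. It only looks for the "htscanner support => enabled" line shown above, and htscanner_enabled is just a name I made up:

```shell
# Reads `php -i` output on stdin and succeeds only if the
# "htscanner support => enabled" line is present.
# htscanner_enabled is a hypothetical helper name, not part of htscanner.
htscanner_enabled() {
    grep -q '^htscanner support => enabled'
}
```

You would use it along the lines of: php -i | htscanner_enabled && echo "htscanner loaded"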

An Example .htscanner File

All you need to put into your .htscanner file are your local php settings:

php_value register_long_arrays 1
php_value magic_quotes_gpc 1
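
If you are migrating an existing site, the PHP settings are probably buried in its old .htaccess alongside the rewrite rules. Here is a rough sketch of pulling them out with grep. It assumes htscanner accepts the same php_value and php_flag lines as Apache’s mod_php, and extract_php_directives is just a name I made up, so check the result by hand:

```shell
# Copy only the php_value / php_flag lines from an Apache .htaccess
# into a .htscanner file in the same directory.
# extract_php_directives is a hypothetical helper name.
extract_php_directives() {
    dir=$1
    # Nothing to do if the directory has no .htaccess.
    [ -f "$dir/.htaccess" ] || return 1
    # Keep lines that start (after optional whitespace) with php_value or
    # php_flag. Rewrite rules and commented-out lines are left behind.
    grep -E '^[[:space:]]*php_(value|flag)[[:space:]]' "$dir/.htaccess" > "$dir/.htscanner"
}
```

Run it as: extract_php_directives /var/www/mysite (the path is only an example). It returns non-zero if there is no .htaccess or no PHP directives were found in it.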

Configuring NGINX

It’s not a bad idea to ensure that your .htscanner file can’t be read from the web. So pop the following config in the server stanza for your NGINX configuration:

        location ~ /\.ht {
            deny  all;
        }

MOD_REWRITE: where did you go?

So if you know WordPress, you know that it massages its URLs so they are search friendly. This is done using mod_rewrite and results in a .htaccess file with the following contents:

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>

# END WordPress

Now as we mentioned, NGINX has its own URL rewriting method. You could be tempted to use it like so:

        location / {
            index index.php;
            if (-e $request_filename) {
                break;
            }
            rewrite ^(.+)$ /index.php?q=$1 last;
        }

But Igor Sysoev, the author of NGINX, has gone one better and introduced the following configuration directive:

try_files $uri $uri/ /index.php?q=$uri;

Yes, that’s right: all that guff boiled down to one easy-to-understand directive. Funtastic!!
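
If it isn’t obvious what try_files is deciding, here is a rough shell model of the same logic. It is only a sketch of the decision order (file, then directory, then fallback), not how NGINX does it internally; resolve_uri is a made-up name and docroot stands in for your NGINX root:

```shell
# Rough shell model of: try_files $uri $uri/ /index.php?q=$uri;
# resolve_uri is a hypothetical helper; docroot stands in for the nginx "root".
resolve_uri() {
    docroot=$1
    uri=$2
    if [ -f "$docroot$uri" ]; then
        # $uri: an existing file is served as-is.
        printf '%s\n' "$uri"
    elif [ -d "$docroot$uri" ]; then
        # $uri/: an existing directory is used with a trailing slash,
        # which is where the index directive takes over.
        printf '%s/\n' "${uri%/}"
    else
        # Fallback: everything else is passed to WordPress's index.php.
        printf '/index.php?q=%s\n' "$uri"
    fi
}
```

So a request for a stylesheet that exists on disk comes back untouched, while a pretty permalink that matches no file or directory ends up at /index.php with the original path in q.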

Conclusion

So maybe the title of this article was a little bit misleading. I will add the instructions for the other components at a later date. I really just wanted to ease the burden of installing the htscanner module.

The upshot is that our sites now take about 5-9 seconds to load, down from the 30 seconds plus we started with.

Apache can of course be tuned to perform better, but it’s the jack-of-all-trades webserver. The architecture of PHP-FPM and NGINX is already optimized and memory efficient, so out of the box I believe it performs better.

*Note for Debian users: your directives should probably go into an .ini file along the lines of:
/etc/php5/conf.d/htscanner.ini
