15 September 2010

Apache .htaccess performance issues

The .htaccess file allows us to make configuration changes on a per-directory basis. Apache reads it every time a document is requested.

According to the Apache manual, you should in general avoid .htaccess files unless you don't have access to the main server configuration file. The main reason not to use them is performance. When overrides are allowed, Apache looks in every directory for an .htaccess file, and this lookup costs a file system access whether or not the file exists. Furthermore, Apache checks every directory from the root down to the requested one on each HTTP request, so serving a file from /a/b/c means looking for /.htaccess, /a/.htaccess, /a/b/.htaccess and /a/b/c/.htaccess.

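Assuming that you do have access to the configuration files, set the AllowOverride directive to None. Apache then stops looking for .htaccess files, and the settings you place in the main configuration are loaded only once, when Apache starts (or restarts).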
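
As a minimal sketch of that setting, the block below turns overrides off for one directory tree and keeps a setting that might otherwise sit in .htaccess in the main configuration. The path /var/www/example and the DirectoryIndex line are assumptions for illustration only.

```apache
# Assumed document root; adjust to your own layout.
<Directory "/var/www/example">
    # Do not look for .htaccess files in this directory or below it.
    AllowOverride None
    # Settings that would otherwise live in .htaccess go here instead.
    DirectoryIndex index.html
</Directory>
```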

For those of you using shared hosting providers, note that this setting can be applied to your domain alone. If your provider tells you otherwise, it’s a good time to look for another one.

However, if you still want to go ahead and use .htaccess files, please do. Note that the default file name can be changed with the AccessFileName directive, for example:
AccessFileName .newName

In sum, if we can avoid hard drive reads on each request by placing the same configuration in a virtual host in the Apache configuration, then why not? Even the manual tells us to do it.

10 September 2010

Apache mod_rewrite and hotlink protection

Hot link protection is a technique to prevent other websites from using your resources, such as images, videos or scripts.
Hot link example:
<script type="text/javascript" src="http://www.example.com/js/jquery.js"></script>

If this bit of code were placed on the domain foo.bar, then the request for the jquery.js file would go to example.com. This leads to bandwidth costs for example.com and bandwidth savings for the foo.bar domain.

Another advantage of using other websites' resources is caching. If the script file already exists on the user's computer because they previously visited example.com, then the file won't have to be downloaded again. It's ready to be used, which speeds up the user experience (extra points for your website). Aware of this, Google, for example, allows us to hot link to some JavaScript libraries. The more websites using those libraries, the better.

But coming back to the prevention part. mod_rewrite is a module for the Apache web server which is normally part of the default installation. It can be configured either in the virtual host configuration or in .htaccess (be sure to read my post regarding the usage cost of .htaccess). First, enable the rewrite engine:
RewriteEngine on

After enabling it, you can create entries for the web sites you wish to exclude from accessing your resources.

Let's stop the foo.bar domain from hot linking:
RewriteCond %{HTTP_REFERER} ^https?://([^.]+\.)?foo\.bar [NC]
RewriteRule \.(gif|jpg|png)$ /stophotlinkingme.gif [NC,L]


In the RewriteCond directive we tell Apache which website we want to apply the rule to, and in the RewriteRule part we rewrite any request for .gif, .jpg and .png files coming from pages on the foo.bar domain so that an image called 'stophotlinkingme.gif' is served instead.
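
The rule above targets one named domain. A common alternative, sketched here as an assumption rather than something taken from this post, is to invert the test: allow only requests that come from your own pages (or that carry no Referer at all) and rewrite everything else. The domain example.com stands in for your own site.

```apache
RewriteEngine On
# Let requests with no Referer through (direct visits, some proxies).
RewriteCond %{HTTP_REFERER} !^$
# Let requests coming from our own pages through.
RewriteCond %{HTTP_REFERER} !^https?://([^.]+\.)?example\.com(/|$) [NC]
# Do not rewrite the replacement image onto itself.
RewriteCond %{REQUEST_URI} !/stophotlinkingme\.gif$ [NC]
RewriteRule \.(gif|jpg|png)$ /stophotlinkingme.gif [NC,L]
```

Keep in mind that the Referer header is sent by the browser and can be missing or forged, so this discourages casual hot linking and is not real access control.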

A final thought!!!
You could take advantage of the hot link: instead of stophotlinkingme.gif, you could serve a promotional image (payback time).

You can read more about the Google libraries at: http://code.google.com/apis/libraries/

09 September 2010

Apache changing index.html to customized.html

Using Apache mod_dir we can define a different default page for a website. It's also possible to set the default for all websites by placing the directive in the main server configuration. A separate module, mod_autoindex, is responsible for the automatic index (directory listing) generation.

When we type a web address, http://deverasarf.blogspot.com for example, we don't specify which page we want, so Apache will look for the resources specified in the DirectoryIndex directive.

We can specify more than one default name and the server will send the first one it finds. If the server doesn't find any of the defaults, it will generate a listing of the directory, provided that automatic indexes are enabled for it.

Example of alternative indexes:
DirectoryIndex index.php welcome.php welcome.html andreferreira.html
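
If you would rather not expose a directory listing when none of the defaults exists, a sketch like the following can be used. The directory path is an assumption; with Indexes switched off, Apache answers with a 403 Forbidden error instead of a generated listing.

```apache
# Assumed document root; adjust to your own layout.
<Directory "/var/www/example">
    DirectoryIndex index.php welcome.php welcome.html andreferreira.html
    # With Indexes off, mod_autoindex will not generate a listing.
    Options -Indexes
</Directory>
```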

02 September 2010

Apache and automatic redirection

Say we need to rename one of our files, but the page has a lot of incoming links from third-party websites. We don't want to lose that traffic and send a 404 error page to the user. One way to solve it is to use an Apache redirection from the old URL path to the new file. The new URL for the requested file (page, image, resource) is returned to the client, which will attempt to fetch the page again using the new address. The redirection can be achieved either in the .htaccess file or in the virtual host in Apache using mod_alias.

To redirect a renamed file, we give its old URL path first (starting with a slash and relative to the site root, not a file system path), followed by the new URL. Example:
Redirect /location/from/root/file.ext http://www.example.com/index.html

Apache also allows us to rename folders:
Redirect /oldir http://www.newsite.com/newdir

If we want to make the redirect permanent:
Redirect permanent /oldir http://www.newsite.com/newdir
This will send an HTTP status of 301, telling the client (browser) that the resource has moved for good.

By default, the status is 302. The keyword temp can still be given explicitly, which makes the configuration more readable once we have a lot of redirects.
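
For illustration, the explicit temporary form of the folder redirect shown earlier would look like this (same hypothetical paths):

```apache
# Equivalent to the default behaviour: HTTP status 302.
Redirect temp /oldir http://www.newsite.com/newdir
```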

If we removed a resource for good, won't replace it, and don't wish to send the user to a specific page, we use the status gone. The HTTP status 410 will be sent, indicating that the resource has been removed.
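
A small sketch of the gone status, using a made-up path. Note that no target URL is given, since there is nowhere to send the client.

```apache
# Hypothetical path; answers with HTTP status 410 and no Location header.
Redirect gone /oldir/removed.html
```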

If we replaced a resource with another, say we upgraded a library file which contains the version in its name, and we don't want to keep the older version available online, we can send the See Other status (seeother, HTTP 303), which indicates that the resource has been replaced.
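
As a sketch of the library upgrade case, assuming invented file names for the old and new versions:

```apache
# Hypothetical file names; answers with HTTP status 303 (See Other).
Redirect seeother /js/library-1.0.js http://www.example.com/js/library-1.1.js
```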

We can also use Apache mod_rewrite to achieve the redirects.
RewriteEngine On
RewriteRule /.* http://www.example.com/ [R]
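In a virtual host, this would redirect any request to the server to the root of the new website. Note that if the rule were served by www.example.com itself, the redirect target would match the rule again and the browser would loop, so it belongs on the old host only.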
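
When the old and new names are served by the same virtual host, one way to avoid redirecting a request that already arrived on the right host is to test the Host header first. This is only a sketch under that assumption. Unlike the rule above, it keeps the requested path and makes the redirect permanent.

```apache
RewriteEngine On
# Only redirect requests that did not arrive for the target host name.
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
# Virtual host context: the pattern sees the leading slash of the URL path.
RewriteRule ^/(.*)$ http://www.example.com/$1 [R=301,L]
```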


01 September 2010

Apache custom error pages (ErrorDocument)

Mostly neglected, the default error pages from Apache don't offer much. They inform the user that an error occurred and that's it. We can keep using error pages for that purpose, but custom ones can improve the user’s overall experience of the website beyond what the defaults offer.

Either in the virtual host configuration file or in .htaccess we can change the defaults to whatever combination of folder and file we desire.

Virtual host configuration example:
<Directory /www/example>
ErrorDocument 401 /errors/authorizationRequired.html
ErrorDocument 403 "Today you can't come in, try tomorrow"
ErrorDocument 404 /errors/notfound.html
ErrorDocument 500 http://foo.example.com/errors/internalservererror.html
</Directory>

Normally, when a user who was already browsing the website gets an error page, they click the browser's back button. However, if they have just arrived and the first page they get is an error page, clicking the back button takes them away from the website. So changing this 404 example:
<h2>Not Found</h2>
<h3>The requested URL /example.html was not found on this server.</h3>


to:

<h2>The page you are looking for doesn’t seem to exist anymore</h2>
<h3>Please use the search button to try and find the content you are looking for</h3>
<!-- And display the search form here -->


will improve the user's experience and might just keep them on the website. A simple change!

The Apache manual calls your attention to several bits of relevant and important information regarding these changes, so be sure to read its ErrorDocument section.