How To: Remove ‘www.’ Permanently With .htaccess

If you’ve ever tried to remove the ‘www.’ prefix from your URLs in Apache by rewriting them (find out why you should), you may have found that it was harder than you’d imagined. Why are URLs ending with a trailing slash modified while others aren’t?

You’ve probably written a rewrite rule that redirects URLs like www.sheeped.com/test/ to sheeped.com/test/ but leaves www.sheeped.com/test untouched. That is the trailing slash problem.

Excerpt from the Apache mod_rewrite guide:

Every webmaster can sing a song about the problem of the trailing slash on URL’s referencing directories. If they are missing, the server dumps an error, because if you say /~quux/foo instead of /~quux/foo/ then the server searches for a file named foo. And because this file is a directory it complains. Actually it tries to fix it itself in most of the cases, but sometimes this mechanism need to be emulated by you. For instance after you have done a lot of complicated URL rewritings to CGI scripts etc.

If you fix the trailing slash problem before removing the ‘www.’, every URL starting with ‘www.’ will be redirected, including those lacking a trailing slash. On its own, the trailing slash fix is also useful for enforcing trailing slashes, so that search engines don’t index the same page under two different URLs.

This .htaccess snippet removes ‘www.’ from every URL and adds a trailing slash to any path that contains no period, which leaves files with extensions (.gif, .php, etc.) alone:

RewriteEngine On
# Fix trailing slash problem
RewriteRule ^([^\.]+[^/])$ http://sheeped.com/$1/ [R=301,L]
# Remove www., always.
RewriteCond %{HTTP_HOST} ^www\.sheeped\.com$ [NC]
RewriteRule ^(.*)$ http://sheeped.com/$1 [R=301,L]

Try it out with this ‘faulty’ URL (look at the address after the page has loaded): http://www.sheeped.com/2007/01/03/back-on-track
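
The following Python sketch models the two rules above, so you can see what happens to that URL and a few others. It assumes .htaccess (per-directory) context, where the RewriteRule pattern sees the URL path without its leading slash. It only reports the first redirect a request receives. The function name `first_redirect` is hypothetical. Python’s `re` module stands in for Apache’s regex engine, which treats these simple patterns the same way.

```python
import re

# Rule 1: a path with no period that doesn't end in "/" gets one appended.
TRAILING_SLASH = re.compile(r"^([^\.]+[^/])$")
# Rule 2 condition: host is www.sheeped.com, case-insensitive like [NC].
WWW_HOST = re.compile(r"^www\.sheeped\.com$", re.IGNORECASE)


def first_redirect(host, path):
    """Return the 301 target for host + path, or None if no rule fires.

    `path` is the URL path without its leading slash, as mod_rewrite
    sees it in .htaccess context. Rules are tried in file order and
    the first match issues the redirect (R=301,L).
    """
    m = TRAILING_SLASH.match(path)
    if m:
        # Rule 1 has no RewriteCond, so it also drops "www." here.
        return "http://sheeped.com/" + m.group(1) + "/"
    if WWW_HOST.match(host):
        # ^(.*)$ always matches, so only the host condition matters.
        return "http://sheeped.com/" + path
    return None


for host, path in [
    ("www.sheeped.com", "2007/01/03/back-on-track"),
    ("www.sheeped.com", "test/"),
    ("www.sheeped.com", "logo.gif"),
    ("sheeped.com", "test/"),
]:
    print(host + "/" + path, "->", first_redirect(host, path))
```

The slash-less ‘www.’ request is fixed in a single redirect, straight to http://sheeped.com/2007/01/03/back-on-track/. The logo.gif request only loses its ‘www.’, and sheeped.com/test/ is left alone.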

If you don’t need the trailing slash fix, the second rule alone will remove ‘www.’ from your domain:

RewriteEngine On
# Remove www., always.
RewriteCond %{HTTP_HOST} ^www\.sheeped\.com$ [NC]
RewriteRule ^(.*)$ http://sheeped.com/$1 [R=301,L]
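
Used alone, this rule copies the path across unchanged, so a request without a trailing slash stays without one. Here is a small Python sketch of that single rule. It makes the same assumptions as a per-directory setup: the hypothetical function `strip_www` receives the path without its leading slash.

```python
import re

# Same condition as the RewriteCond: exact host, case-insensitive ([NC]).
WWW_HOST = re.compile(r"^www\.sheeped\.com$", re.IGNORECASE)


def strip_www(host, path):
    """Return the 301 target produced by the www-only rule, or None."""
    if WWW_HOST.match(host):
        # ^(.*)$ captures the whole path, so $1 is the path itself.
        return "http://sheeped.com/" + path
    return None


print(strip_www("WWW.Sheeped.com", "test"))   # http://sheeped.com/test
print(strip_www("www.sheeped.com", "test/"))  # http://sheeped.com/test/
print(strip_www("sheeped.com", "test"))       # None
```

Any trailing slash a directory needs is then left to Apache to add in a separate redirect.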

-----

Notice: The aforementioned trailing slash implementation does not always work, because every Apache setup is different. In fact, many servers are (cleverly enough) configured so that no trailing slash manipulation is needed at all. I came up with this fix not as a cure-all, but because it has worked on every (often poorly configured) shared host and software package I’ve used. If your software uses periods in its file or directory names, this implementation will not work for you. The pattern ^([^\.]+[^/])$ excludes periods, so the first rule skips those paths and they never get a trailing slash. Regex-savvy readers will spot this immediately in the rewrite pattern.
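
A quick way to check which paths the first rule will touch is to run its pattern through Python’s `re` module, which agrees with Apache’s regex engine on this simple expression. The sample paths below are made up for illustration.

```python
import re

# The trailing-slash pattern from the first RewriteRule.
TRAILING_SLASH = re.compile(r"^([^\.]+[^/])$")

paths = [
    "2007/01/03/back-on-track",  # no period: gets a slash
    "index.php",                 # extension: left alone (intended)
    "downloads/v1.2/notes",      # period in a directory name: also left alone
    "wiki/Main_Page",            # no period: gets a slash
    "a",                         # single character: too short to match
]

for p in paths:
    print(p, "matches" if TRAILING_SLASH.match(p) else "no match")
```

The third path shows the limitation. The URL points at a directory-like resource, but the period in "v1.2" stops the rule from adding a slash.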

Update: I’ve been asked why I came up with the trailing slash fix when the plain ‘www.’ removal works perfectly. Well, it doesn’t, at least not always. Many people implement it and then find a problem when they request software in a directory on their site without a trailing slash. The page loads, but the ‘www.’ prefix has been magically re-added, even though it had previously been removed. This has happened on almost all of the shared hosting services I’ve used, and it causes annoying problems with cookies and whatnot. Furthermore, automatically adding the trailing slash prevents search engines from indexing the same page twice, once with a trailing slash and once without.
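
You can check the duplicate-content point with a simulation that follows the redirects the way a crawler would. This is a Python model of the two rules only; it ignores mod_dir and any other server configuration. `next_hop` and `canonical` are hypothetical helper names, and `about` is a made-up page.

```python
import re

TRAILING_SLASH = re.compile(r"^([^\.]+[^/])$")
WWW_HOST = re.compile(r"^www\.sheeped\.com$", re.IGNORECASE)


def next_hop(host, path):
    """Apply both rules once; return (host, path) of the redirect or None."""
    m = TRAILING_SLASH.match(path)
    if m:
        return ("sheeped.com", m.group(1) + "/")
    if WWW_HOST.match(host):
        return ("sheeped.com", path)
    return None


def canonical(host, path, max_hops=5):
    """Follow redirects until no rule fires, like a crawler would."""
    for _ in range(max_hops):
        hop = next_hop(host, path)
        if hop is None:
            return "http://" + host + "/" + path
        host, path = hop
    raise RuntimeError("redirect loop")


variants = [
    ("www.sheeped.com", "about"),
    ("www.sheeped.com", "about/"),
    ("sheeped.com", "about"),
    ("sheeped.com", "about/"),
]
print({canonical(h, p) for h, p in variants})  # {'http://sheeped.com/about/'}
```

All four spellings collapse to one URL, so a search engine following the redirects only ever sees a single address for the page.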
