The ultimate urlReplacing character list for Umbraco
Friday, November 09, 2012 8:52:00 AM (GMT Standard Time, UTC+00:00)
If you're not already familiar with the built-in character replacing functionality for URLs in Umbraco, I highly recommend you check out the urlReplacing section of the umbracoSettings.config file:
urlReplacing: List of characters which will be replaced in generated urls. This ensures that urls does not contain characters that search engines or browsers do not understand. Umbraco comes with a predefined set of characters and you can add your own
One thing I find we often forget to update is the default list of characters, which isn't that comprehensive, so I thought I would update our default and share it with others. Without further ado, here's the list:
<char org=" ">-</char>
I hope this is of use to someone. If you have ones that I've missed please let me know and I'll get them added. In some ways it would be nice if this was a regex rather than a character replace. Maybe that's a commit to the core I would look at one day.
Update: It would appear that my blogging engine/syntax highlighter is causing issues: the last rule should be a Euro symbol (€) and the quotes need to be encoded for XML, e.g. &quot;. Thanks @jbclarke and @greystate for spotting those.
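For anyone who wants to see the idea in code, here's a rough sketch in Python (not Umbraco's actual implementation) of what a character-replace list like this does, plus a helper that writes a rule with the quotes encoded for XML as mentioned in the update. The names `REPLACEMENTS`, `replace_url_chars` and `char_rule` are made up for illustration, and only the space rule comes from the list above; the other two rules are assumptions.

```python
from typing import Mapping
from xml.sax.saxutils import escape

# Hypothetical replacement table. Only the space rule is taken from the post;
# the quote and ampersand rules are illustrative assumptions.
REPLACEMENTS = {
    " ": "-",   # from the post: <char org=" ">-</char>
    '"': "",    # assumption: drop double quotes
    "&": "",    # assumption: drop ampersands
}


def replace_url_chars(name: str, replacements: Mapping[str, str]) -> str:
    """Swap every listed character for its replacement, one character at a time."""
    return "".join(replacements.get(ch, ch) for ch in name)


def char_rule(org: str, replacement: str) -> str:
    """Build a <char> rule with the attribute value safely encoded for XML."""
    # escape() handles &, < and >; the extra entity covers double quotes.
    encoded = escape(org, {'"': "&quot;"})
    return '<char org="{}">{}</char>'.format(encoded, escape(replacement))


if __name__ == "__main__":
    print(replace_url_chars('my "new" page', REPLACEMENTS))
    print(char_rule('"', ""))
```

Because this is a plain character-for-character swap, removing a character that sits between two spaces leaves two dashes in a row, which is one reason a regex-based approach could be nicer.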
Is Google using Analytics data to crawl additional pages?
Monday, July 28, 2008 2:19:41 PM (GMT Daylight Time, UTC+01:00)
I've been wondering for a while how Google has managed to find a couple of hidden pages. Although they were securely locked down we noticed a few rejected GoogleBot requests in the audit logs. We put this down to the users having a Google toolbar installed but today we got an error from the new Avant Garde hair salons site that's just gone into beta testing which got me thinking.
This particular link is hidden behind a form post and within a jQuery call (to track an action) so not something the GoogleBot has easy access to. I know they're getting more clever but not *that* clever! We started getting the errors shortly after adding the final Google Analytics code so the only conclusion I can come to is that they're not just registering the URLs for reporting purposes but they're also using them to crawl additional pages.
Does anyone know if they use the URLs tracked in Google Analytics to find new pages? All I can say is that if this is the case, you'd better make sure your "secure" pages check the access permissions on a page level!
What have I been up to?
Friday, September 21, 2007 11:20:01 PM (GMT Daylight Time, UTC+01:00)
It's been rather quiet on my blog recently. If you're wondering why (and don't chat to me on/off-line), I thought I would share what we've been working on.
For the past month or so The Site Doctor has been developing a new web site (www.wineandhampergifts.co.uk) for Porter and Woodman Gifts Ltd, a local company that produces personalised corporate hampers and gifts. It's been quite a challenge as they have a rather unusual ordering system that allows multiple recipients/addresses with multiple items. Looking at it now, it's not so complicated, but the delivery charge calculations and initial specs took a while to fully grasp. It's been really enjoyable.
I doubt most of my readers are interested in the ins and outs of the project itself but, from an SEO perspective, I for one am expecting pretty decent results. We opted to use the URL Rewriting ISAPI from Helicon this time round over our usual IISMods URL Rewriting ISAPI because, for some reason, the IISMods site has been offline for a while (and, checking now, it has been converted into a very weird site).
There's still more work that's needed to finalise the content and various aspects of the Wine and Hamper Gifts website but if you have a chance, check out the new Porter and Woodman Gifts Ltd Wine and Hamper Gifts website and leave a comment here letting me know what you think :D
Oh, and they've given us a pretty high target to hit before Christmas, so if you're thinking about treating your customers to a personalised corporate hamper or gift, give a little thought to using www.wineandhampergifts.co.uk
Give your site a pulse
Tuesday, March 13, 2007 10:45:25 AM (GMT Standard Time, UTC+00:00)
Get your finger on the pulse of your site with this great new (free) RSS statistics service “PulseRSS”. I met the developers of PulseRSS the other day at my first Multipack meet (a West Midlands based new media meet) which, if you’re nearby, you should check out in the future as they’re a lovely bunch of guys (and girls apparently, but they were nowhere to be seen on Saturday).
Back to PulseRSS! As already mentioned, PulseRSS is a statistics service delivered via an RSS/XML feed that works in a very similar way to Google Analytics. Unlike Google Analytics, though, they’ve followed the principle of KISS, which I think works really well: the interface is simple and easy to use, and have I already mentioned it's free?
So if you’re looking for a simple free statistics package then check out PulseRSS. I’ve got it running on my blog already so it’ll be interesting to see how the stats compare to Google Analytics...
Custom 404 Error Pages
Friday, June 16, 2006 9:48:54 PM (GMT Daylight Time, UTC+01:00)
I made an interesting discovery this morning. A few weeks ago I was doing a little SEO on The Wargame Company (Devon) and thought I would look into utilising Google SiteMaps. After creating the XML file in the correct format, it's just a matter of having Google approve it. They do this by requesting a random page, i.e. www.domain.com/GooglesWonderfulPageddmmyyyyhhmmssmmm (which clearly should return a 404), and checking the response code, I guess to ensure that you're not trying to spoof the pages in some way.
"What's the problem? I've got custom 404 pages" I hear you cry! Well, if like us you've written some fancy page to handle the error and email you/log it to a database, it turns out that you're not returning a 404 error at all!
What I discovered was that if you configure IIS to handle 404 error pages with a URL, you're actually returning a response code of 200. After a little thinking, the only conclusion we could come to was that when setting it as a URL in IIS you're actually redirecting the request, which is either a 301 or perhaps a 307 (see http://www.w3.org/Protocols/rfc2616/rfc2616-sec6.html for more information on response codes), and then the final page the user hits returns a 200 (Response Status "OK") rather than the desired 404, which is clearly not what we want!
After a little more investigation we also found that the same thing happens when using ASP.Net's built-in handlers. The only time it doesn't is when you handle the 404 with a File in IIS rather than a URL.
"What can I do about it?" Well, that's simple: if you're going to use a URL to handle your 404 errors, make sure you set the Response Status Code to the correct code, i.e. 404. This is pretty simple to do:
ASP.Net 2.0: Page.Response.StatusCode = 404;
ASP.Net 1.1 (I think): Response.StatusCode = 404;
ASP: Response.Status = "404 Not Found"
I hope that helps someone out there!
Update: I've just run Fiddler on The Wargame Company (Devon) and can confirm you get a Response Status Code of 301 before the 200.
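If you want to see the principle outside of IIS, here's a minimal sketch in Python (a tiny WSGI app, not ASP.Net) that serves a friendly error page but still reports a 404, along with a helper that asks for a page that shouldn't exist and reads back the status code, much like the check described above. `KNOWN_PAGES`, `ERROR_PAGE`, `app` and `status_of` are all hypothetical names for illustration.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical site content and custom error page, for illustration only.
KNOWN_PAGES = {"/": b"<h1>Home</h1>"}
ERROR_PAGE = b"<h1>Sorry, we couldn't find that page</h1>"


def app(environ, start_response):
    """Serve known pages; anything else gets the custom page WITH a 404 status."""
    body = KNOWN_PAGES.get(environ.get("PATH_INFO", "/"))
    if body is None:
        # The friendly page is the body, but the status line still says 404.
        status, body = "404 Not Found", ERROR_PAGE
    else:
        status = "200 OK"
    headers = [("Content-Type", "text/html"), ("Content-Length", str(len(body)))]
    start_response(status, headers)
    return [body]


def status_of(path: str) -> int:
    """Call the app directly (no network) and return the numeric status code."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    b"".join(app(environ, start_response))  # consume the response body
    return int(captured["status"].split()[0])


if __name__ == "__main__":
    print(status_of("/"))
    print(status_of("/GooglesWonderfulPage16062006"))
```

The point is simply that the body and the status code are set independently, so a custom error page is only a real 404 if you explicitly say so.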
New TSD Design
Friday, June 09, 2006 5:08:07 PM (GMT Daylight Time, UTC+01:00)
Ok, The Site Doctor has moved on a fair amount since I started it up. We started off with a somewhat techy design (Version 1) which at the time I loved, but as time went on I felt it was somewhat cheesy, so it was replaced with Version 2 in September 2004 and this has remained the face of The Site Doctor ever since. It’s a nice site but as far as the code quality is concerned it’s terrible, not to mention the SEO issues (to say the least!)
Site Design 1
Site Design 2
Site Design 3?
In the past both designs were table-based and didn’t care too much for accessibility standards, which are now at the forefront of our minds, so we felt it was once again time for a change. But what to do? Although I dabble, and with enough time I can come up with some snazzy designs, this time I felt it was necessary to have someone “in the know” put something together for us.
Mike from Butterfly Media stepped up to the mark with some great concepts. The design he’s currently finalising for us is based on the following concepts: The New TSD design
Watch this space for an update in the next few weeks. I hope to have the final TSD design live shortly after I return from holiday, along with a few other, well, niceties.
I'd be interested to hear other people's thoughts on the new design.