feed.rss and sitemap.xml

posted by Stephan Brumme

Discovery

In November 2010 I saw space shuttle Discovery in Cape Canaveral. A completely different kind of discovery is widespread among internet sites, especially blogs:
  1. RSS feeds for news readers
  2. Sitemaps for search engines
Both are available now for create.stephan-brumme.com.

News feed - feed.rss

My RSS 2.0 feed is generated on-the-fly by a simple PHP script.
Except for <lastBuildDate> and <pubDate>, the header (everything up to <item>) is static.
All <item> tags are filled in while scanning the file system. The optional <description> is missing at the moment but may follow within the next few days. A small excerpt is shown below:
feed.rss:
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>create.stephan-brumme.com News Feed</title>
    <description>blogging about some weird computer stuff</description>
    <link>http://create.stephan-brumme.com</link>
    <atom:link href="http://create.stephan-brumme.com/feed.rss" rel="self" type="application/rss+xml" />
    <lastBuildDate>Sat, 1 Oct 2011 22:00:00 +0000</lastBuildDate>
    <pubDate>Sat, 1 Oct 2011 22:00:00 +0000</pubDate>
    <language>en-us</language>
    <copyright>(C)2011 Stephan Brumme</copyright>
    <generator>Homegrown And Environmentally Friendly</generator>
    <image>
      <url>http://create.stephan-brumme.com/favicon.png</url>
      <title>create.stephan-brumme.com News Feed</title>
      <link>http://create.stephan-brumme.com</link>
      <width>16</width>
      <height>16</height>
    </image>
    <item>
      <title><![CDATA[Feed.rss and Sitemap.xml]]></title>
      <link>http://create.stephan-brumme.com/misc/rss-and-sitemap.html</link>
      <pubDate>Sat, 1 Oct 2011 22:00:00 +0000</pubDate>
      <guid isPermaLink="true">http://create.stephan-brumme.com/adsense/</guid>
      <source url="http://create.stephan-brumme.com/feed.rss">create.stephan-brumme.com</source>
    </item>
    <!-- several more items will follow here ... -->
  </channel>
</rss>
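A minimal sketch of how one <item> could be assembled while scanning the file system. This is not the actual script behind the feed: the function name rssItem, the posts/ directory, and the example.com base URL are assumptions made purely for illustration.

```php
<?php
// Hypothetical helper (not the author's actual script): build one <item>
// from a title, a permalink and a Unix timestamp such as filemtime().
function rssItem($title, $url, $timestamp)
{
    // A literal "]]>" would end the CDATA section early, so split it.
    $safeTitle = str_replace(']]>', ']]]]><![CDATA[>', $title);
    // Escape &, <, > and quotes so the URL is valid XML character data.
    $safeUrl = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
    // RFC 2822 date in UTC, the same format as in the excerpt above.
    $pubDate = gmdate(DATE_RFC2822, $timestamp);

    return "<item>\n"
         . "  <title><![CDATA[" . $safeTitle . "]]></title>\n"
         . "  <link>" . $safeUrl . "</link>\n"
         . "  <pubDate>" . $pubDate . "</pubDate>\n"
         . "  <guid isPermaLink=\"true\">" . $safeUrl . "</guid>\n"
         . "</item>\n";
}

// Assumed layout: one HTML file per post in a hypothetical posts/ directory.
$files = glob(__DIR__ . '/posts/*.html');
if ($files === false) {
    $files = array(); // glob() returns false on error
}
foreach ($files as $file) {
    $url = 'http://example.com/posts/' . basename($file);
    echo rssItem(basename($file, '.html'), $url, filemtime($file));
}
```

Using the file's modification time as the publication date is only one possible choice; a real system might store the date separately so that later edits do not change <pubDate>.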
I had to run the W3C RSS validator several times until the feed passed all tests.

My main problem was that the validator is quite picky about the date format. While getting everything right, I learnt about PHP's constant DATE_RFC2822 (which I had never heard of before), learnt that I should use gmdate instead of date, and found at least two small bugs in my homegrown Content Management System.
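The difference between date and gmdate can be seen in a short sketch. The time zone Europe/Berlin is an assumption, chosen only because it matches the +02:00 offsets that appear elsewhere in this post; the timestamp corresponds to the date used in the feed excerpt.

```php
<?php
// Assumed server time zone (illustration only).
date_default_timezone_set('Europe/Berlin');

$timestamp = 1317506400; // 2011-10-01 22:00:00 UTC

// date() formats the timestamp in the configured local time zone ...
$local = date(DATE_RFC2822, $timestamp);
// ... while gmdate() always formats it in UTC.
$utc = gmdate(DATE_RFC2822, $timestamp);

echo $local, "\n"; // Sun, 02 Oct 2011 00:00:00 +0200
echo $utc, "\n";   // Sat, 01 Oct 2011 22:00:00 +0000
```

Both strings describe the same instant; gmdate simply keeps the feed independent of the server's time zone setting.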

Search Engine Feed - sitemap.xml

Search engines like Google are responsible for the vast majority of my web site visitors. Often it takes several days - and sometimes over a month - for their web spiders to discover new additions to my web site. A sitemap.xml can speed up this discovery process considerably. Read more about sitemaps and their XML specification.

The code is almost the same as the one used for the RSS feed. This time, date("c", $someTimestamp) works best for <lastmod> because it produces an ISO 8601 date. The <changefreq> is manually set to monthly for all blog entries and to weekly for the front page. Priorities are set manually as well, to either 0.5 or 0.8. A small excerpt is shown below:
sitemap.xml:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://create.stephan-brumme.com/</loc>
    <lastmod>2011-10-02T09:35:09+02:00</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>http://create.stephan-brumme.com/misc/rss-and-sitemap.html</loc>
    <lastmod>2011-10-02T00:00:00+02:00</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.5</priority>
  </url>
  <!-- several more URLs will follow here ... -->
</urlset>
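A single <url> entry like the ones above could be produced roughly as follows. This is a sketch under assumptions, not the code running on this site: the function name sitemapUrl and the Europe/Berlin time zone are made up for the example.

```php
<?php
// Hypothetical helper: build one <url> entry for sitemap.xml.
function sitemapUrl($loc, $timestamp, $changefreq, $priority)
{
    // Escape &, <, > and quotes so the URL is valid XML character data.
    $safeLoc = htmlspecialchars($loc, ENT_QUOTES, 'UTF-8');
    // date('c') yields an ISO 8601 date in the configured local time zone.
    $lastmod = date('c', $timestamp);
    // Always print one decimal with a dot, independent of the locale.
    $prio = number_format($priority, 1, '.', '');

    return "<url>\n"
         . "  <loc>" . $safeLoc . "</loc>\n"
         . "  <lastmod>" . $lastmod . "</lastmod>\n"
         . "  <changefreq>" . $changefreq . "</changefreq>\n"
         . "  <priority>" . $prio . "</priority>\n"
         . "</url>\n";
}

// Assumed server time zone, matching the +02:00 offsets in the excerpt.
date_default_timezone_set('Europe/Berlin');
echo sitemapUrl('http://create.stephan-brumme.com/', 1317506400, 'weekly', 0.8);
```

Unlike the RSS feed, the sitemap can keep the local time zone because ISO 8601 carries the UTC offset explicitly.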

Configuring Apache

Most Apache web servers only send .php files to the PHP interpreter. There are two options to generate the feeds with PHP anyway:
  1. mod_rewrite
  2. parse .rss and .xml with PHP
If mod_rewrite is available, you can add these rules (regular expressions) to .htaccess:
RewriteEngine On
RewriteRule ^sitemap\.xml$ /sitemap.php [last]
RewriteRule ^feed\.rss$ /feed.php [last]
All requests for sitemap.xml and feed.rss are rewritten internally to sitemap.php and feed.php, which in turn must contain the code that generates the feeds.

The mod_rewrite technique works very well, but this time I went for method 2 and added this single line to .htaccess:
AddType x-mapp-php5 .xml .rss
sitemap.xml and feed.rss now actually exist on my server and contain all required PHP code.
Note: Be careful when adding other .rss or .xml files, because they will be parsed by PHP as well. This can produce undesired results, especially when PHP misinterprets the <? and ?> of the XML declaration in the first line as PHP tags.
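One common way to avoid this problem is to print the XML declaration from inside PHP instead of leaving it as literal text. With short_open_tag enabled, a literal <?xml at the top of a PHP-parsed file is treated as the start of PHP code and causes a parse error. The following is a minimal sketch; the function name xmlDeclaration is hypothetical.

```php
<?php
// Hypothetical helper: return the XML declaration as a string. Inside a
// quoted string, "<?" and "?>" are plain text and never act as PHP tags.
function xmlDeclaration($encoding = 'UTF-8')
{
    return '<?xml version="1.0" encoding="' . $encoding . '"?>' . "\n";
}

echo xmlDeclaration();
echo "<rss version=\"2.0\">\n";
echo "  <!-- channel and items follow here -->\n";
echo "</rss>\n";
```

Static .xml or .rss files that must stay untouched are better kept in a directory where the AddType line does not apply.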