RSS as HTML
Update (2024-01-28): I've removed support for this on my site's blog feed. To find out why, see my new post explaining why I removed my XSLT feeds template.
Have you seen the RSS Feed for this blog? Turns out the major browsers all support a decent subset of XSLT.
What's that mean though? It means you don't need to format your content as both a page and a feed. Your pages can be RSS feeds and vice versa. Your audience can visit the same URL in their browser as in their feed reader. Even if you aren't replacing existing pages with their feed alternates, you don't have to link browsers to unstyled XML documents (angle brackets panic the users). Instead, you can use all the same styles, scripting, and multimedia from the rest of your pages to hide RSS in plain sight.
RSS Primer
RSS, despite many companies' desire to see it die, remains one of the best ways to keep tabs on what people publish to the web. Thanks to Aaron Swartz, Dave Winer, and dozens of the folks behind many of the most popular CMSes, a large contingent of web content authors today produce feeds you can subscribe to. Other people have done a better job espousing the virtues of RSS than I will here. What I hope you're more interested in is the guts of how it works in this context.
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
<channel>
<title>VE3ZSH - Blog</title>
<link>https://ve3zsh.ca/blog/index.html</link>
<description>Newest posts from VE3ZSH's blog.</description>
<language>en</language>
<item>
<title>RSS as HTML</title>
<link>https://ve3zsh.ca/blog/2020/08/03/rss-as-html.html</link>
</item>
<item>
<title>Animated Reel Menu</title>
<link>https://ve3zsh.ca/blog/2020/04/15/animated-reel-menu.html</link>
</item>
<item>
<title>How To Secure Application Credentials</title>
<link>https://ve3zsh.ca/blog/2020/01/24/how-to-secure-application-credentials.html</link>
</item>
</channel>
</rss>
It's almost comically simple if you really know HTML (nesting <div> elements isn't HTML). Start with an XML declaration <?xml …?> in place of a doctype. Next comes a document root <rss> and body <channel>. Instead of a separate head, you put the channel's metadata in the channel itself along with the <item> elements. Each item consists of a <title> and <link>. That's it!
Sure, you can add more. You can actually add anything you like. That's the extensible part of XML. It's all a matter of what feed readers are looking for. We'll come back to that later.
Format Wars
Some of you might be asking, what about Atom? I think there's some unnamed law of humans and technology that leads to format wars. Problems it solves:
- Invented at the IETF.
- Requires more code.
- Pretends more than text and HTML work in descriptions.
- Has xmlns attributes ensuring you link to the W3C.
You can use either RSS 2.0 or Atom; it doesn't matter. Just about every reader you can find supports both because they're functionally equivalent. They're just different names for the same information.
Wait, RSS 2.0? What happened to earlier versions? There's a number of v0.XX versions as RSS went through changes before v1 was tagged. Some claim this protocol versioning and Mozilla's early non-compliant implementation were reasons Atom had the traction it did. While many feed readers can deal with these older versions, it makes much more sense to just use v2.0.
Another fun fact, RSS 3.0 is a thing. Nobody publishes their feed in it as far as I know. No readers support it as far as I know. Aaron standardized and published the specification for it though. What makes it different?
- No more XML; it's essentially plain text.
- No more HTML; it's plain text.
- Dates use an ISO 8601 profile.
Then there's JSON Feed. Again, nobody publishes or reads these as far as I know. What are its differences?
- It uses JSON instead of XML.
- Items don't need titles.
- Modern needs such as avatar images…
You can even go h-feed if you want. It's a part of the Microformats standard and a way of adding a set of class attributes to your existing pages. In theory, feed readers could use your existing pages as a feed. In reality, it seems only to appease the SEO gods.
There are a couple of feed readers for this, but they're not clients. It seems the people working on this want you to set up a number of servers for everything from authentication to pub/sub and join what they're calling the IndieWeb. You can read more about the architecture in their page about social readers if that interests you.
I'm sure there's a dozen other feed formats that nobody actually reads or writes, specified on some blog, wiki, or webpage. Every single one is the same essential thing: a document that contains a series of URLs where new ones indicate new content.
Theory of Operation
With all of that out of the way, how do you get webpages that work in feed readers? Remember how I said XML can contain anything you want, and that it's up to a reader what it wants to look at? Well, you can include an xml-stylesheet processing instruction right below the <?xml …?> declaration.
<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet href="style.xml" type="text/xsl" media="screen"?>
<rss version="2.0">
<channel>
…
With that, you can link a document containing XSLT. XSLT is just a set of XML elements that an XSLT processor can use to transform an XML document. Feel free to explore the full element list along with the list of XPath functions. Wait, functions? Yeah, XPath and by extension XSLT offer a set of data transformation functions.
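To give a taste of those functions, here's a minimal sketch of a stylesheet that calls a few XPath 1.0 functions against the feed from the primer. The page layout is made up for illustration, and the template structure itself is unpacked in the XSL Templates section further down.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
	<xsl:template match="/rss/channel">
		<html lang="en">
			<head>
				<title><xsl:value-of select="title"/></title>
			</head>
			<body>
				<!-- count() returns how many <item> children the channel has -->
				<p><xsl:value-of select="count(item)"/> posts</p>
				<!-- concat() joins strings; normalize-space() trims and collapses whitespace -->
				<p><xsl:value-of select="concat('About: ', normalize-space(description))"/></p>
			</body>
		</html>
	</xsl:template>
</xsl:stylesheet>
```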
XHTML
How does this extra XML document help us? Well, the W3C spent a really long time between HTML version 4 and HTML version 5 specifying and standardizing XHTML. Despite being a complete waste of time, it means you can put HTML directly in XML.
Assuming your HTML isn't garbage, there's usually little required to convert your HTML to XML. The first change from HTML 5 is that every tag must be closed. This means self-closing tags like <br> need to indicate self closure (<br/>). This goes for all self-closing tags including <link>, <meta>, and <input> elements. You also can't get away with incorrect nesting (<p>Hello, <em>World!</p></em>) or the magic rules around tags like <p> and <li> which automatically close when they encounter any element other than phrasing elements.
Likewise, any boolean attributes without a value need to be given their proper value. This includes attributes like hidden, defer, and checked. They all need their own name as the attribute value (hidden="hidden").
Besides that, the only other gotcha is & entities. XML predefines only five named entities (&lt;, &gt;, &amp;, &apos;, and &quot;). Everything else must use its numbered Unicode code point escape. For example, &nbsp; would be &#160;, and even &lt; can be written as &#60;. You can get any code point fairly easily using CyberChef.
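As a rough before-and-after, here's a small made-up fragment with all three fixes applied. The markup is purely illustrative and not taken from my site.

```xml
<!-- HTML 5 would accept: <p><input type="checkbox" checked><br>Fish &nbsp;&amp; chips -->
<p>
	<!-- Boolean attribute spelled out and the tag explicitly self-closed -->
	<input type="checkbox" checked="checked"/>
	<br/>
	<!-- &nbsp; becomes &#160; while &amp; is one of the predefined entities -->
	Fish &#160;&amp; chips
</p>
```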
XSL Templates
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/rss/channel">
<html lang="en">
<head>
<meta charset="utf-8"/>
<meta name="viewport" content="width=device-width, initial-scale=1"/>
<title><xsl:value-of select="title"/></title>
<script src="script.js" defer="defer"></script>
<link rel="stylesheet" href="style.css"/>
</head>
<body>
<h1>Recent Blog Posts</h1>
<ul>
<xsl:for-each select="item">
<li>
<a href="{link}">
<xsl:copy-of select="title/node()"/>
</a>
</li>
</xsl:for-each>
</ul>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
Now that your HTML is also XML we can define a template using it. We begin with an XML declaration <?xml …?>. The next line is an <xsl:stylesheet> root element that gives us access to the XSL namespace (xmlns) of XSLT using the xsl: element name prefix. Technically this element's full name is the namespace URI http://www.w3.org/1999/XSL/Transform plus the local name stylesheet, but the prefix lets us shorten that.
With that out of the way we can define the body of our XSLT which will be an <xsl:template> element. This element has a match attribute which specifies an XPath of the elements in the document the template is applied to. Here we specify /rss/channel as there's nothing we want access to from outside that scope. Inside the template, we put our XHTML. Note that this can include a <head> section with elements like <link rel="stylesheet"> and <script>. Once the XSLT is applied you have all the same HTML features you've come to expect.
One thing we can do in the <head> is use an <xsl:value-of select="title"> to get the value of the RSS <title> from the <channel> and use it as the title of the page. We select title because our template already set the context to /rss/channel and this is actually selecting /rss/channel/title.
Most of the document is boilerplate with a simple <h1> and <ul> list to keep the example simple. The next important element is the <xsl:for-each select="item">. This element loops over every element matching /rss/channel/item because our current context is /rss/channel and it selects item. For each item, it inserts its own contents into the output.
The <xsl:copy-of select="title/node()"> here is similar to the <xsl:value-of> we saw in the head, except <xsl:copy-of> doesn't give us just the text. Instead it gives us the nodes we select and all their children. We use title/node() to give us just the contents of the <title> element rather than the element itself, meaning we don't put a <title> tag on the page. Note our context is /rss/channel/item so it's selecting /rss/channel/item/title/node().
The other new construct is {link}. These curly brackets can be used in XSLT when you are inside an attribute string but want to use an XPath expression to insert a value. Here the XPath is link, or /rss/channel/item/link given the context.
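If the curly brackets feel too magical, the same thing can be spelled out long-hand with <xsl:attribute>. This is just a sketch of the equivalent form using the same feed structure as above, trimmed down to the list. I'd still reach for the curly brackets in practice.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
	<xsl:template match="/rss/channel">
		<html lang="en">
			<head>
				<title><xsl:value-of select="title"/></title>
			</head>
			<body>
				<ul>
					<xsl:for-each select="item">
						<li>
							<a>
								<!-- Long-hand equivalent of href="{link}" -->
								<xsl:attribute name="href">
									<xsl:value-of select="link"/>
								</xsl:attribute>
								<xsl:copy-of select="title/node()"/>
							</a>
						</li>
					</xsl:for-each>
				</ul>
			</body>
		</html>
	</xsl:template>
</xsl:stylesheet>
```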
Advanced Cases
We've already covered 99% of what you'd need to write your own. In this section I'll go over some things that you might want to try in your own setup.
Accessing Element Attributes
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
<title>VE3ZSH - Blog</title>
<link rel="alternate" href="https://ve3zsh.ca/blog/index.html"/>
<updated>2020-08-03T00:00:00-04:00</updated>
<author>
<name>VE3ZSH</name>
<uri>https://ve3zsh.ca/</uri>
</author>
<id>https://ve3zsh.ca/</id>
<entry>
<title>RSS as HTML</title>
<link rel="alternate" href="https://ve3zsh.ca/blog/2020/08/03/rss-as-html.html"/>
<id>https://ve3zsh.ca/blog/2020/08/03/rss-as-html.html</id>
<updated>2020-08-03T00:00:00-04:00</updated>
</entry>
</feed>
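If you do go the route of an Atom feed, you'll notice links are found inside attributes instead of the values of <link> elements. To access attributes in XPath, you use the @attr selector. For example, the anchor elements would go from <a href="{link}"> to <a href="{link/@href}">. One catch: because the feed declares a default xmlns, its elements live in the Atom namespace, so the stylesheet also has to bind a prefix to http://www.w3.org/2005/Atom and select with it (atom:link/@href if you pick the prefix atom).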
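Here's a minimal sketch of what an Atom version of the template could look like. It assumes the feed uses Atom's standard <entry> elements, and the atom prefix is my own choice (any prefix bound to the Atom namespace URI works). In XSLT 1.0 an unprefixed name in an XPath only matches elements in no namespace, which is why every step carries the prefix.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
	xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
	xmlns:atom="http://www.w3.org/2005/Atom"
	exclude-result-prefixes="atom">
	<xsl:template match="/atom:feed">
		<html lang="en">
			<head>
				<meta charset="utf-8"/>
				<title><xsl:value-of select="atom:title"/></title>
			</head>
			<body>
				<ul>
					<xsl:for-each select="atom:entry">
						<li>
							<!-- @href reads the attribute; the predicate picks the rel="alternate" link -->
							<a href="{atom:link[@rel='alternate']/@href}">
								<xsl:copy-of select="atom:title/node()"/>
							</a>
						</li>
					</xsl:for-each>
				</ul>
			</body>
		</html>
	</xsl:template>
</xsl:stylesheet>
```

The exclude-result-prefixes attribute just keeps the atom namespace declaration from leaking onto the generated <html> element.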
Descriptions & Other Tags
While all you need is <title> and <link>, I'd also suggest the <description> element. While some feeds don't include content, many people who use feed readers prefer to get the content of the site in the feed itself (not me). To do this, you can add a <description> element to your RSS feed <item> elements. This description technically supports HTML but it's often best to stick to plain text for the widest compatibility.
While you're adding optional elements that some feed readers deem mandatory, I'd also suggest <pubDate> as there are some readers that won't display your feed at all without it. It's an RFC 822 formatted date (e.g. <pubDate>03 Aug 2020 00:00:00 GMT</pubDate>).
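Put together, an item carrying both optional elements might look something like this. The description text is a placeholder I made up for the example.

```xml
<item>
	<title>RSS as HTML</title>
	<link>https://ve3zsh.ca/blog/2020/08/03/rss-as-html.html</link>
	<!-- Placeholder summary; plain text for the widest reader compatibility -->
	<description>Styling an RSS feed with XSLT so it doubles as a web page.</description>
	<!-- RFC 822 date; the leading day of the week is optional -->
	<pubDate>03 Aug 2020 00:00:00 GMT</pubDate>
</item>
```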
If you're eager to put images into everyone's feed reader there's the <image> element. It lets you link a GIF, JPEG, or PNG for the channel to be displayed in a feed reader. Be sure to read the spec for it as there are multiple required sub-elements. On a fun note, the specification says the assumed dimensions of your image are 88x31, so feel free to get retro with it.
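For reference, a channel image might look roughly like this. The image URL is hypothetical. As I read the RSS 2.0 spec, <url>, <title>, and <link> are the required sub-elements, with the title and link normally matching the channel's own, while <width> and <height> are optional.

```xml
<image>
	<!-- Hypothetical image location; point this at a real GIF, JPEG, or PNG -->
	<url>https://example.com/badge.png</url>
	<title>VE3ZSH - Blog</title>
	<link>https://ve3zsh.ca/blog/index.html</link>
	<!-- Optional; these are the defaults readers assume when omitted -->
	<width>88</width>
	<height>31</height>
</image>
```

It goes inside <channel> alongside the other channel metadata, not inside an <item>.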
Sadly, nobody supports the <cloud> element as far as I know, but technically this provides push-based RSS. Using an HTTP-POST, XML-RPC, or SOAP based API you can actually run a server implementing the rssCloud API. I wouldn't suggest going through the effort to implement it though, given the broad lack of support in both feeds and readers. It's interesting to see new standards like ActivityPub and Micropub don't support or even acknowledge this existing standard.
Better Dates
One of my only complaints with RSS is its use of RFC 822 dates. To get better dates, you can go one of two routes: substrings or a custom element.
The substring method would be using an <xsl:value-of> element with a select that includes the substring() function to grab pieces of the date. An example that grabs just the date would be select="substring(pubDate, 1, 11)" which would get you the value 03 Aug 2020. It's not bad, but it's not really the date format I prefer.
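In context, the substring route could look something like the sketch below. It assumes every <pubDate> starts with a two-digit day and has no leading day of the week (so no "Mon, " prefix), otherwise the offsets would need adjusting.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
	<xsl:template match="/rss/channel">
		<html lang="en">
			<head>
				<title><xsl:value-of select="title"/></title>
			</head>
			<body>
				<ul>
					<xsl:for-each select="item">
						<li>
							<!-- Characters 1 through 11 of "03 Aug 2020 00:00:00 GMT" are "03 Aug 2020" -->
							<xsl:value-of select="substring(pubDate, 1, 11)"/>
							<xsl:text>: </xsl:text>
							<a href="{link}">
								<xsl:copy-of select="title/node()"/>
							</a>
						</li>
					</xsl:for-each>
				</ul>
			</body>
		</html>
	</xsl:template>
</xsl:stylesheet>
```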
To use an alternative format like ISO 8601 it's simpler to just add a custom element. I use <isoDate> but you can use whatever you prefer. If you're using Atom, don't forget to obey the namespacing rules and put your new element in its own namespace. Fun observation, most feed parsers don't care about XML dogma.
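As a sketch of the custom element route for RSS, the feed item just gains one extra child. The exact date value and its precision are up to you. This one is only an example.

```xml
<item>
	<title>RSS as HTML</title>
	<link>https://ve3zsh.ca/blog/2020/08/03/rss-as-html.html</link>
	<pubDate>03 Aug 2020 00:00:00 GMT</pubDate>
	<!-- Custom element that feed readers ignore; only the stylesheet reads it -->
	<isoDate>2020-08-03</isoDate>
</item>
```

Inside the <xsl:for-each select="item"> loop, the template can then read it like any other element. Wrapping it in a <time> element is my own suggestion here rather than anything the feed requires.

```xml
<time datetime="{isoDate}">
	<xsl:value-of select="isoDate"/>
</time>
```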
Conclusion
I know it's been a long one. Why share all this? I'd love it if more devs working on projects could sneak RSS into the system. These days I just check my feed reader about once a day or so for updates instead of getting sucked into hours of checking and rechecking all the addicting content portals. Humanist technology should focus on improving quality of life and getting out of the way. Too bad the economic model in vogue focuses on trying to do the opposite.