Amtrak, B-Movies, Web Development, and other nonsense

Draining the swamp

It’s best to imagine WordPress’s plugin ecosystem as a swamp. Swamps are terrible. You don’t want to be there. You run a constant risk of disease and/or drowning. Anything that sinks into the swamp–it’s not coming back.

[Image: peat mining in a swamp near Rudzensk, Belarus]

I’ve been debugging an odd problem on our WordPress installations involving categories. On some sites, posts which have multiple categories don’t display more than one category. That would be strange enough, but the category permalinks are coming out in the format SITE_URL/category/foo with the link text baz, where foo is one category and baz a different category:

<a href="http://my.wordpress.site/category/category1">Category2</a>

Strange, seemingly non-deterministic behavior? The usual suspects would be database corruption or a theme bug. Yet neither seemed likely in this case. Database corruption usually isn’t so…predictable…and we quickly verified that this error was occurring in both our custom themes and stock TwentyTwelve. That would leave a core bug (unlikely with something so fundamental, but still possible) or a bad plugin.

After several patient hours of tracing execution I’d narrowed the problem to the function WordPress uses when building up the category list: the_category(). The category link string was correct before going in for formatting and it came out mangled. WordPress uses filters to allow plugins to “hook in” and modify output. A search of our plugin code revealed the culprit: Remove Title Attributes.

WordPress adds title attributes to links by default, a behavior which apparently annoys the hell out of many people, including at least one person at Lafayette in the past. This plugin simply removes them with a regex (I would be remiss if I didn’t link to the famous Stack Overflow thread about why you should never, ever, parse HTML with a regex). To accomplish this the plugin added a filter which washed the generated category code through its regex.

Unfortunately, the regex is improperly written. In jargon, it’s greedy. This is the expression evaluated:

` title='(.+)'`

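If you pass a string with multiple links, the match begins at the title attribute of the first link and, because `.+` grabs as much as it can, runs all the way to the last closing quote in the string. Everything in between gets removed, including the first link’s text and the second link’s href, which is exactly how you end up with one category’s URL wrapped around another category’s name. A more properly focused regex stops at the first closing quote: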

` title=\"([^"]*)\"`
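To make the difference concrete, here is a minimal sketch in Python rather than the plugin's PHP. The sample markup is an assumption on my part: two category links with single-quoted title attributes, chosen so that the pattern quoted above applies as written. The names `links`, `GREEDY`, `BOUNDED`, and `strip_titles` are hypothetical, and the bounded pattern is the same negated-character-class idea as the suggested fix, adapted to single quotes.

```python
import re

# Hypothetical input: two category links in one string, roughly the shape
# the category list has when it reaches the plugin's filter. The
# single-quoted title attributes are an assumption made for this sketch.
links = (
    "<a href=\"http://my.wordpress.site/category/category1\""
    " title='View all posts in Category1'>Category1</a>, "
    "<a href=\"http://my.wordpress.site/category/category2\""
    " title='View all posts in Category2'>Category2</a>"
)

# The pattern as quoted in the post: `.+` is greedy, so it runs to the
# LAST single quote in the string, not the first one after title='.
GREEDY = re.compile(r" title='(.+)'")

# Same idea as the suggested fix: a negated character class cannot cross
# a quote, so each match ends at its own attribute's closing quote.
BOUNDED = re.compile(r" title='([^']*)'")


def strip_titles(pattern, html):
    """Remove every match of `pattern` from `html`."""
    return pattern.sub("", html)


if __name__ == "__main__":
    print(strip_titles(GREEDY, links))
    print(strip_titles(BOUNDED, links))
```

With this input the greedy pattern leaves a single link whose href points at category1 and whose text is Category2, which is the symptom described earlier. The bounded pattern leaves both links intact, minus their title attributes.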

That’s it. Mystery solved.

Unresolved, however, is the larger problem with the WordPress plugin ecosystem. This plugin was added to the plugin repository in August 2009. It has not been updated since. It has been broken from the very beginning. The author has disappeared. The support forums are moribund. There’s no GitHub repository for me to fork, should I want to continue support, since WordPress in its infinite wisdom uses SVN for everything. Spend some time Googling and you’ll find people talking up this plugin, never realizing its inherent problems. It’s still being downloaded. This may be inexperience (I’m a Moodle veteran and new to WordPress) but I don’t see a good way to get the word out that this plugin has a serious bug. If WordPress allowed you to usurp a plugin then I could push out an updated version so at least you’d get notified in your Dashboard. All I can do is leave a review indicating that it’s broken in 3.5.1 (for this specific use case) and link back to this post.

Not that it matters overmuch in this case since we’re likely to deep-six it here, but the situation feels inadequate. There’s got to be a way to do better.

4 Comments

  1. Joe Dolson

    Hi! I re-wrote the remove title attribute elements for a plug-in with a larger-scale focus on accessibility called WP Accessibility, which replaces all of the functionality in Remove Title Attributes. I was just browsing the support notes for Remove Title Attributes (which I follow up on because of the nature of the plug-in repository, as you’ve commented…), and noticed your helpful notes. I’m actually using a slightly different regex to replace title attributes, though it’s not the same as what you suggest, and I’d appreciate your observations, if possible, on whether or not it causes the same bug.

    I’m not including the regex here in this comment, because I’m in doubt about whether the comment would get through with it – but I’ll leave a second comment including it.

    Thanks!

  2. Joe Dolson

    The regex:

    /\s*title\s*=\s*(["\']).*?\1/

  3. fultonc

    Looks safe enough, though I have to think that a solution based on DOM parsing would be safer–I don’t know if anyone’s doing that with WordPress. Thanks for following up and we’ll be giving WP Accessibility a look.

  4. Joe Dolson

    DOM Parsing would be better, certainly – but it would require loading a DOM parser into the front end of WordPress, which is likely to come with some significant overhead of its own. Ultimately, patching WordPress to eliminate the unnecessary title attributes is the best solution. Any other choice is just a stop-gap.

    Thanks!
