Improved URL auto-linking in Horde

Horde now uses John Gruber's regex pattern for matching URLs in text. The pattern is used in the Horde_Text_Filter_Linkurls class. The previous pattern was already solid, but the new one improves on it in several ways:

  • Much better support for Unicode characters
  • We now auto-link URLs like example.com/foo
  • Limited support for parentheses matching, so that URLs containing balanced parentheses are matched properly, even if a parenthesis is the last character in the URL
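
To make the parentheses point concrete, here is a minimal sketch in Python. It is not the pattern Horde or Gruber uses: it is a much-simplified illustration that only handles http and https, and the names URL_SKETCH and find_urls are made up for this example. The idea it shows is that a URL body may contain a balanced (...) group, and that the last character must be either such a group or a character that is not trailing punctuation.

```python
import re

# Simplified illustration only; NOT the actual Horde or Gruber pattern.
URL_SKETCH = re.compile(
    r"""
    https?://                  # scheme (the real pattern accepts far more)
    (?:
        [^\s()<>]              # any single char except space, parens, angle brackets
      | \([^\s()<>]*\)         # or one balanced, non-nested (...) group
    )*
    (?:
        \([^\s()<>]*\)         # the URL may end with a balanced (...) group
      | [^\s()<>.,;:!?]        # or with a char that is not trailing punctuation
    )
    """,
    re.VERBOSE,
)


def find_urls(text):
    """Return every substring of text that the sketch pattern matches."""
    return URL_SKETCH.findall(text)


if __name__ == "__main__":
    print(find_urls("See http://en.wikipedia.org/wiki/PHP_(language) for details."))
    print(find_urls("(see http://example.com/foo)"))
```

With this sketch, the closing parenthesis in the first sentence stays part of the URL because it closes a group opened inside the URL, while the closing parenthesis in the second sentence is left out because it belongs to the surrounding text.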

Horde's pattern differs from the one posted by Gruber in a few ways, as well:

  • It does not match mailto: URLs. This is to better match how Horde has historically separated linking of web addresses from email addresses; mailto: links are handled by Horde_Text_Filter_Emails.
  • It matches URLs that start with ://
  • URL protocols are limited to 20 characters to avoid excessive memory use when PCRE does backtracking
  • "+" is allowed in a protocol (so svn+ssh:// works)
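
The protocol rules above can be sketched as follows, again in Python and again only as an illustration. The exact character class and the helper name split_scheme are assumptions made for this example, not Horde's code. The bounded quantifier {0,19} caps the protocol at 20 characters, which limits how much the regex engine can backtrack, and "+" is part of the allowed characters.

```python
import re

# Hypothetical sketch: a letter followed by up to 19 more letters, digits,
# "+" or "-" (20 characters in total), then "://".
SCHEME_SKETCH = re.compile(r"([A-Za-z][A-Za-z0-9+-]{0,19})://")


def split_scheme(url):
    """Return the protocol at the start of url, or None if there is none."""
    match = SCHEME_SKETCH.match(url)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(split_scheme("svn+ssh://example.com/repo"))
    print(split_scheme("a" * 21 + "://example.com"))
```

A 21-character protocol is rejected here because the pattern is anchored at the start of the string; a pattern that searches anywhere in the text would need an extra guard to avoid matching just the tail of an over-long protocol.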

A good-sized list of test data has been incorporated into the Horde_Text_Filter test suite, but let me know if there are kinds of URLs that used to work but now don't, or if you see any other problems with the new auto-linking. Hopefully the new pattern will help polish the experience for those using the new Horde 4 alphas!