It is about time for another corollary of Godwin’s law:
As an online discussion about validation grows longer, the probability of mentioning unencoded ampersands approaches one.
No kidding! Such ampersands are easily the most common validation error. I have heard “What does that damn validator want from me?” more than once. Some know the answer already but don’t really care. Either it seems so innocent and unimportant that it is not worth the time, or code production is so out of control that trying to fix it may bring the entire company down.
So who cares about ampersands? Only two? Roger says that unencoded ampersands can be a problem. Inspired by him, I wrote a little demo to show how it works.
Nothing too complex: I simply pass 14 parameters to my PHP script, which displays their values. The first link has the parameters separated by unencoded ampersands; the second link has a properly encoded href attribute. Try clicking them. What do we see? With the unencoded link, instead of 14 values we get only two (more if you are using Opera; it is browser-dependent), and they look weird…
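The mangling can be reproduced offline. Here is a minimal sketch in Python, assuming hypothetical parameter names (`copy`, `reg`, `sect`; the names used in the demo are not listed in this post) and using `html.unescape` as a stand-in for a lenient entity decoder. Real browsers differ in how they treat entity-like text inside attribute values, so take this as an illustration rather than a prediction for any particular browser.

```python
from html import escape, unescape
from urllib.parse import parse_qs

# Hypothetical query string; these names happen to match entity names.
raw_query = "id=1&copy=2&reg=3&sect=4"

# Unencoded href: a lenient decoder turns &copy, &reg and &sect into characters.
mangled = unescape(raw_query)
broken_params = parse_qs(mangled)

# Properly encoded href: each & is written as &amp; in the markup,
# so decoding gives back the original query string.
encoded = escape(raw_query)
intact_params = parse_qs(unescape(encoded))

print(mangled)        # id=1©=2®=3§=4
print(broken_params)  # {'id': ['1©=2®=3§=4']}
print(intact_params)  # {'id': ['1'], 'copy': ['2'], 'reg': ['3'], 'sect': ['4']}
```

In this sketch the script on the receiving end would see a single parameter with a strange value instead of four.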
Now, can this behaviour break your application? I’ll leave that for you to decide.
One more point to add: valid pages can also behave like this. The validator does not bark at every ampersand in a URL. If an ampersand is followed by a known entity name, the validator will be happy. It is an unrecognized entity that produces the validation error.
You can check this here. All I did was remove the parameter dummy from the href. All the remaining parameters (except for id, which is not preceded by an ampersand) have corresponding entities, so the validator remains silent. However, the results produced by the script should make a programmer cry out loud.
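Since the validator stays quiet in exactly the dangerous case, a separate check can help. The following is a rough sketch, assuming Python's `html.entities.name2codepoint` table (the HTML 4 entity names) is a fair approximation of what the validator recognizes. The function name `entity_like_params` and the sample query string are made up for illustration.

```python
from html.entities import name2codepoint

def entity_like_params(query):
    """Return parameter names that follow an ampersand and match an HTML 4 entity name."""
    # The first parameter is skipped: it is not preceded by an ampersand.
    pairs_after_ampersand = query.split("&")[1:]
    names = [pair.split("=", 1)[0] for pair in pairs_after_ampersand]
    return [name for name in names if name in name2codepoint]

# Hypothetical query string: 'dummy' is not an entity name, 'copy' and 'reg' are.
risky = entity_like_params("id=1&copy=2&dummy=3&reg=4")
print(risky)  # ['copy', 'reg']
```

Names reported by such a check are the ones that can pass validation unencoded and still be rewritten on the way to the script.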
So what can we do? There are some options:
- Do nothing.
- Encode them.
- Avoid ampersands in our hrefs, especially if we pass parameters for the script to extract some content. Here is more on that.
- Avoid ampersands by using a different separator. We may use a semicolon (;) for that purpose, as encouraged by W3C.
If you are using PHP, take a look at the arg_separator.input and arg_separator.output settings in your php.ini file.
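The "encode them" and "different separator" options can be sketched side by side. This is only an illustration in Python, with a hypothetical script name (`show.php`), made-up parameters and a helper `build_query` invented for the example. It assumes the receiving script is configured to split on semicolons, which in PHP is what arg_separator.input controls.

```python
from html import escape
from urllib.parse import quote_plus

# Hypothetical parameters for a hypothetical show.php script.
params = [("id", "1"), ("copy", "2"), ("reg", "3")]

def build_query(pairs, separator="&"):
    """Join URL-encoded name=value pairs with the given separator."""
    return separator.join(f"{quote_plus(k)}={quote_plus(v)}" for k, v in pairs)

# Option: keep ampersands, but encode them for use inside an href attribute.
amp_href = escape("show.php?" + build_query(params), quote=True)

# Option: use semicolons, which need no encoding in HTML.
semi_href = escape("show.php?" + build_query(params, separator=";"), quote=True)

print(amp_href)   # show.php?id=1&amp;copy=2&amp;reg=3
print(semi_href)  # show.php?id=1;copy=2;reg=3
```

Either string can be placed in an href as is. The semicolon version only works if the server side knows to split on semicolons.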