htmlentities() in PHP is Your Friend
Friday, October 14. 2011
After constantly badgering a certain library calendar vendor over 2 years to fix his software's RSS feed charset issues. Personally I don't think getting raw text describing times in the form of "6:30–8:30 p.m." is all that valuable...that's just a single example. The calendar website declared no charset information. I have no idea what charset the database is in, and the RSS feed was declared as ISO-8859-1. Our website, database, and everything else was declared as UTF-8, not that it really mattered though since the raw incoming text from the RSS feed was all garbled to begin with.
Every once and awhile I'd randomly try to find an answer to the problem. I've been through using all sorts of different algorithms to solve the problem. None of them seemed to work, until one day I saw someone mention on StackOverflow (unfortunately I've lost the link) that he tried using htmlentities() to solve his problem and it worked. I thought, "It couldn't be that simple..." However, I had nothing to lose and tested it. It worked. (What???) I still don't know why or how htmlentities managed to run a translation table on the garbled input to output the appropriate values, but I'm happy! Even my attempts at REGEX were unsuccessful, though I probably was just unable to find ALL the right bit-level character code sequences needed. Apparently the translation table that htmlentities uses is pretty darn thorough! Thanks, PHP team!
Okay, so that was the first use of htmlentities(). The second one?
I realized I overlooked a severe security hole in my forms. When users did not provide correct details in their forms, I was simply reinserting the values they provided back in to the HTML form's VALUE tag (or in the case of a textarea, just rendering the value between the tags). For some reason this didn't strike me as being severely stupid at the time. I don't know why. I guess the "never reprint what your users submit to you" only made me think of "back to the DOM" - but only outside form elements. Who knows why. This let someone who actually put some (minimal) thought into it to run whatever PHP code they wanted simply by submitting a form without all required data. Escape the form element by using a standard HTML closing tag, then start writing the PHP. If you wanted valid HTML, just make sure to also include a dummy HTML input or textarea field once done. Simple. (Note: I am also in the process of re-examining CHMOD values of files and folders.)
When I went back to "fix" my stupidity, I also initially thought of using PHP's filter functions. Although they worked, they also would sometimes (depending on user input) remove certain characters. Like a bolt of lightning (while I was eating lunch) it came to me. I just used htmlentities(), why not just use it again? ...so I did. Now my forms are a bit more protected and our RSS feed is no longer displaying obnoxious characters to visitors due to the encoding mishaps of an external developer.
Sometimes PHP's little gems are so awesome...
Every once and awhile I'd randomly try to find an answer to the problem. I've been through using all sorts of different algorithms to solve the problem. None of them seemed to work, until one day I saw someone mention on StackOverflow (unfortunately I've lost the link) that he tried using htmlentities() to solve his problem and it worked. I thought, "It couldn't be that simple..." However, I had nothing to lose and tested it. It worked. (What???) I still don't know why or how htmlentities managed to run a translation table on the garbled input to output the appropriate values, but I'm happy! Even my attempts at REGEX were unsuccessful, though I probably was just unable to find ALL the right bit-level character code sequences needed. Apparently the translation table that htmlentities uses is pretty darn thorough! Thanks, PHP team!
Okay, so that was the first use of htmlentities(). The second one?
I realized I overlooked a severe security hole in my forms. When users did not provide correct details in their forms, I was simply reinserting the values they provided back in to the HTML form's VALUE tag (or in the case of a textarea, just rendering the value between the tags). For some reason this didn't strike me as being severely stupid at the time. I don't know why. I guess the "never reprint what your users submit to you" only made me think of "back to the DOM" - but only outside form elements. Who knows why. This let someone who actually put some (minimal) thought into it to run whatever PHP code they wanted simply by submitting a form without all required data. Escape the form element by using a standard HTML closing tag, then start writing the PHP. If you wanted valid HTML, just make sure to also include a dummy HTML input or textarea field once done. Simple. (Note: I am also in the process of re-examining CHMOD values of files and folders.)
When I went back to "fix" my stupidity, I also initially thought of using PHP's filter functions. Although they worked, they also would sometimes (depending on user input) remove certain characters. Like a bolt of lightning (while I was eating lunch) it came to me. I just used htmlentities(), why not just use it again? ...so I did. Now my forms are a bit more protected and our RSS feed is no longer displaying obnoxious characters to visitors due to the encoding mishaps of an external developer.
Sometimes PHP's little gems are so awesome...
