htmlentities() in PHP is Your Friend

Friday, October 14. 2011

I spent over two years badgering a certain library calendar vendor to fix his software's RSS feed charset issues. Personally, I don't think getting raw text describing times in the form of "6:30–8:30 p.m." is all that valuable, and that's just a single example. The calendar website declared no charset information, I have no idea what charset the database is in, and the RSS feed was declared as ISO-8859-1. Our website, database, and everything else were declared as UTF-8. Not that it really mattered, though, since the raw text coming in from the RSS feed was garbled to begin with.

Every once in a while I'd try again to find an answer to the problem. I went through all sorts of different approaches, and none of them seemed to work. Then one day I saw someone mention on StackOverflow (unfortunately I've lost the link) that he had used htmlentities() to solve his problem and it worked. I thought, "It couldn't be that simple..." However, I had nothing to lose, so I tested it. It worked. (What???) I still don't know exactly why or how htmlentities() managed to run a translation table on the garbled input and output the appropriate values, but I'm happy! Even my attempts at regex were unsuccessful, though I probably just couldn't find ALL the right byte sequences. Apparently the translation table that htmlentities() uses is pretty darn thorough! Thanks, PHP team!
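
One plausible explanation for why this works: once a character has been turned into an HTML entity, the output is plain ASCII. Plain ASCII reads the same whether a page is declared ISO-8859-1 or UTF-8, so it survives the kind of charset confusion described above. The JavaScript sketch below illustrates only that idea and is not how htmlentities() is implemented. PHP uses a translation table of named entities such as &ndash;, while this sketch emits numeric references. It also assumes the input has already been decoded into a proper string, and the function name toAsciiEntities is hypothetical.

```javascript
// Hypothetical helper (not PHP's htmlentities()): escape HTML special
// characters and turn every non-ASCII character into a numeric character
// reference, so the output is plain ASCII and reads the same under
// ISO-8859-1 or UTF-8.
const SPECIALS = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;" };

function toAsciiEntities(text) {
  let out = "";
  // for...of walks the string by code point, so characters outside the
  // Basic Multilingual Plane become a single reference, not two halves.
  for (const ch of text) {
    const code = ch.codePointAt(0);
    if (SPECIALS[ch]) {
      out += SPECIALS[ch];
    } else if (code > 0x7f) {
      out += "&#" + code + ";";
    } else {
      out += ch;
    }
  }
  return out;
}

console.log(toAsciiEntities("6:30\u20138:30 p.m.")); // 6:30&#8211;8:30 p.m.
```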

Okay, so that was the first use of htmlentities(). The second one?

I realized I had overlooked a severe security hole in my forms. When users did not provide correct details in a form, I was simply reinserting the values they had submitted back into the HTML input's value attribute (or, for a textarea, just rendering the value between the tags). For some reason this didn't strike me as severely stupid at the time. I don't know why. I guess the rule "never reprint what your users submit to you" only made me think of output elsewhere in the page, not inside form elements. Who knows why. This let anyone who put some (minimal) thought into it inject whatever HTML or JavaScript they wanted (cross-site scripting), simply by submitting a form without all the required data. Break out of the form element with a closing quote or a standard HTML closing tag, then start writing your own markup or script. If you wanted valid HTML, just make sure to also include a dummy input or textarea field once done. Simple. (Note: I am also in the process of re-examining the CHMOD values of files and folders.)

When I went back to "fix" my stupidity, I also initially thought of using PHP's filter functions. Although they worked, they also would sometimes (depending on user input) remove certain characters. Like a bolt of lightning (while I was eating lunch) it came to me. I just used htmlentities(), why not just use it again? I did. Now my forms are a bit more protected and our RSS feed is no longer displaying obnoxious characters to visitors due to the encoding mishaps of an external developer.
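
To illustrate the second use, here is a JavaScript sketch of escaping a submitted value before echoing it back into a form. In PHP the comparable step is running the value through htmlentities(), ideally with ENT_QUOTES so that single quotes are encoded too. The function names are hypothetical, and the attack string is just an example of breaking out of a value attribute.

```javascript
// Hypothetical sketch: escape a user-supplied value before echoing it back
// into a form, in the spirit of PHP's htmlentities($value, ENT_QUOTES).
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;") // must run first so later entities aren't double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#039;");
}

// Re-render a text input with the user's previous (escaped) value.
function renderTextInput(name, value) {
  return '<input type="text" name="' + escapeHtml(name) +
         '" value="' + escapeHtml(value) + '">';
}

// Re-render a textarea; the same escaping keeps a submitted "</textarea>" inert.
function renderTextarea(name, value) {
  return '<textarea name="' + escapeHtml(name) + '">' +
         escapeHtml(value) + "</textarea>";
}

const attack = '"><script>alert(1)</script>';
console.log(renderTextInput("email", attack));
// <input type="text" name="email" value="&quot;&gt;&lt;script&gt;alert(1)&lt;/script&gt;">
```

Without escapeHtml(), the same attack string would close the value attribute and the input tag, and the script element would land in the page as live markup.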

Sometimes PHP's little gems are so awesome...

jQuery and Google Analytics: Take 2

Thursday, December 9. 2010

A while back I reported my findings on how to use jQuery to automatically tag links on my site that point to other sites, for event tracking with Google Analytics.

Although there are ways to use the _trackPageview method and filter the data within Analytics so that it does not skew your view of overall visits and hits, I find it simpler to record these clicks as events, since that's what they are: events, not page views.

Still, it's helpful to see which external links matter to people, which page they're leaving your site from, and where they're going. So rather than using a field to identify the event type itself (click), I used that field for a value instead, so the events could be categorized by any of these things from within Google Analytics.

The problem I personally ran into was that the values I was passing were not strings (in my own tests, at least; the code on the blog should theoretically have worked). I was passing document.location (an object) to the _trackEvent() method, which checks the types of its arguments and rejects incorrect calls. Here is the updated code (using the new asynchronous syntax), which is tested and works:

Remember to change your own account ID in the code above!

There is no need to define your domain's actual information anywhere unless you want to track subdomains as well as the top-level domain under a single account. For that, take a look at the Google Analytics documentation. I haven't done this myself, so I can't suggest a best approach. This code tracks any HTML-based links that point anywhere NOT within your domain. (Links inserted dynamically with JavaScript will not be tracked; to cover those, use $(this).live('click', function()...) instead of $(this).click().) If you use a redirect script as part of your domain (ex: ...tracker.php?go=), this code will not track that click as-is.
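
The "is this link outside my domain?" test can be sketched as a small function. This is a hypothetical helper, not the code from this post. It assumes the WHATWG URL API, which modern browsers and Node provide but which is far newer than this post. It treats any different hostname as outbound, so subdomains count as outbound, and it ignores non-HTTP links such as mailto:. A redirect script on your own domain stays internal, as described above.

```javascript
// Hypothetical helper: decide whether a link's href leaves the current site.
// Relative links resolve against the current page, so they count as internal.
function isOutbound(href, currentUrl) {
  let target;
  try {
    target = new URL(href, currentUrl);
  } catch (e) {
    return false; // unparseable href: leave it alone
  }
  // Only treat real web links as outbound (skip mailto:, javascript:, etc.).
  if (target.protocol !== "http:" && target.protocol !== "https:") {
    return false;
  }
  // Strict hostname comparison: blog.example.com and www.example.com differ.
  return target.hostname !== new URL(currentUrl).hostname;
}
```

In the page, you would call it with each anchor's this.href and document.location.href.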

This code specifically defines the event as "Outbound", the category as the currently viewed page's URL, and the action field as the URL that the user will be taken to. Modify it to your heart's content. :)
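
Here is a sketch of the event call itself, assuming ga.js's asynchronous _gaq queue. The helper name outboundEventCommand is hypothetical. Note that _trackEvent takes its arguments positionally as category, action, and label. In Google Analytics' own terms, then, "Outbound" is the category, the current page is the action, and the destination is the label. Converting both URLs to strings sidesteps the document.location type problem.

```javascript
// Hypothetical helper: build the command array for ga.js's async queue.
// _trackEvent's positional arguments are (category, action, opt_label), so
// "Outbound" is GA's category, the current page its action, and the
// destination its label.
function outboundEventCommand(pageUrl, targetUrl) {
  // _trackEvent expects strings; String() guards against passing an object
  // such as document.location (whose string form is its href).
  return ["_trackEvent", "Outbound", String(pageUrl), String(targetUrl)];
}

// In the page this would be the global _gaq created by the GA snippet, e.g.
//   _gaq.push(outboundEventCommand(document.location.href, this.href));
// A plain array stands in for it here.
const queue = [];
queue.push(outboundEventCommand("http://www.mysite.com/blog/", "http://example.org/"));
console.log(queue[0]);
```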

For even more accurate results, see how to delay the loading of the external site so that the tracking code has more time to accomplish its task (race conditions are fun): Google's Solution
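
Here is a sketch of the delayed-navigation idea: record the event, cancel the link's default action, and follow the link yourself after a short pause. The 100 ms default and the function name are assumptions for illustration. So is passing track, navigate, and schedule in as parameters, which lets the logic run without a browser. Check Google's own solution for the approach they recommend.

```javascript
// Hypothetical helper: record the click, then follow the link after a short
// delay so the tracking request has time to be sent. track, navigate, and
// schedule are passed in so the logic isn't tied to a specific page.
function trackThenNavigate(href, track, navigate, delayMs = 100, schedule = setTimeout) {
  track(href);
  schedule(function () {
    navigate(href);
  }, delayMs);
  // Returning false from a jQuery click handler cancels the default
  // navigation, so only the scheduled navigate() call moves the page.
  return false;
}

// Browser usage (not run here; restrict the selector to outbound links):
// $("a").click(function () {
//   return trackThenNavigate(
//     this.href,
//     function (url) { _gaq.push(["_trackEvent", "Outbound", document.location.href, url]); },
//     function (url) { window.location.href = url; }
//   );
// });
```
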
When you're working with plugins, or trying to be as unobtrusive as possible while doing a bunch of different things to your original markup, sometimes you have the problem of specific elements being triggered by multiple actions.

What happens when one action ends up replacing (visually) or covering up one element with another? Typically it's not a problem, unless the element that was replaced or covered up had an event attached to it. This happened to me recently when one jQuery script (a dynamic icon overlay using a span) covered up an element used by another script (an onclick event on an image that loads a modal window for a video player). Uh oh!

I could have ignored the problem and simply duplicated the original images with the "play" icon added directly to the images themselves. However, if I ever wanted to change the icon, I'd have to modify all of the images again, and I'd need to keep the originals somewhere in case that ever happened. I could have built the overlay into the original script, but then that script would drift from its original focus and intention.

It seemed the best way to combat the problem was to get the event(s) (a click event in this case) registered on the IMG element and apply them to the SPAN overlaid on top of it.

While searching for answers, I stumbled across slides from a talk by John Resig that contained a bit of jQuery I had never known about before, though now everyone I hear mention it raves about it: jQuery's $.data() utility method. Using this method (and outputting the data directly in an alert box with Firefox's proprietary toSource() function), I was able to see information about the events associated with a specific element.
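
If you swap toSource() for JSON.stringify(), expect one difference. Functions are not valid JSON, so stringify() silently drops the handler functions and keeps only the plain properties. The object below is a hypothetical stand-in for what jQuery stores per element, since the real shape varies between jQuery versions. Mapping handlers to their names first is one way to keep them visible.

```javascript
// Hypothetical stand-in for the per-element events object jQuery stores
// (its exact shape varies between jQuery versions).
const events = {
  click: [{ type: "click", namespace: "", handler: function openVideo() {} }],
};

// Functions are not valid JSON, so JSON.stringify omits the handler
// property entirely.
console.log(JSON.stringify(events));
// {"click":[{"type":"click","namespace":""}]}

// To see which handlers are attached, map them to their names first.
const summary = {};
for (const type in events) {
  summary[type] = events[type].map(h => h.handler.name || "(anonymous)");
}
console.log(JSON.stringify(summary)); // {"click":["openVideo"]}
```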

Note: The following alert will only work in Firefox. To alert JSON in browsers without native JSON support, you'd want to use Douglas Crockford's JSON2. In newer browsers, you should be able to just use JSON.stringify() (I just chose not to).

The above JS code acting on the HTML element (in my code, anyway) created the following string:

At first I thought I could just copy the data from one element's data property to another element. Unfortunately that doesn't work, since it doesn't actually register anything. I asked a question over at StackOverflow, and although the answers weren't exactly what I was looking for, someone pointed out that a jQuery events copy plugin already existed. I didn't want my code to depend on other plugins, so I looked over the plugin's code to work out my own solution.

Since most handlers aren't registered through a plugin (they simply reference a function), very few will have a namespace. I didn't bother with namespaces in my code (but I may refactor later). Instead, I took the necessary information, looped over the events registered with the element, and rebound them using jQuery's own bind() method.
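
Here is a sketch of that copy loop. It is not the code from this post. It assumes the jQuery 1.4-era structure returned by $.data(element, "events"): an object mapping each event type to an array of objects with a handler function. As far as I know, newer jQuery versions moved this to the internal, undocumented $._data(). The target can be anything with a bind(type, fn) method, such as a jQuery object. The onlyTypes whitelist is a hypothetical addition for skipping helper types such as mouseenter. Like the approach described above, the sketch ignores namespaces.

```javascript
// Hypothetical sketch of the copy loop. Assumes eventsMap looks like the
// jQuery 1.4-era $.data(element, "events") result:
//   { click: [{ handler: fn, ... }, ...], mouseover: [...] }
function copyEvents(eventsMap, target, onlyTypes) {
  if (!eventsMap) return 0; // element has no handlers registered
  let copied = 0;
  for (const type in eventsMap) {
    // Optional whitelist, e.g. ["click"], to skip helper types such as
    // mouseenter that jQuery implements on top of native events.
    if (onlyTypes && onlyTypes.indexOf(type) === -1) continue;
    const handlers = eventsMap[type];
    for (let i = 0; i < handlers.length; i++) {
      target.bind(type, handlers[i].handler); // namespaces are ignored
      copied++;
    }
  }
  return copied;
}

// Browser usage (not run here):
// copyEvents($.data(img, "events"), $(span).parent(), ["click"]);
```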

Keep in mind that the specific code I was working on created an overlay (an element that wrapped the target element and rested on top of it), hence the parent().bind(). Your particular uses may vary. The second bind() was also specific to this code's purpose. Although a SPAN tag was created, if the target element's parent was not an anchor element, the code would create one and wrap itself in it. The only way I found to prevent the anchor's default behavior was to issue a second bind() (I didn't know how to pass it into the initial bind call).
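
One way to avoid the second bind() is to wrap the copied handler so it cancels the default action itself. This is a hypothetical helper. jQuery also treats a handler that returns false as "prevent the default action and stop propagation", which is another option if stopping propagation is acceptable.

```javascript
// Hypothetical helper: wrap a handler so it also cancels the element's
// default action (e.g. following an anchor's href), so a single bind()
// call covers both jobs.
function withPreventDefault(handler) {
  return function (event) {
    event.preventDefault();
    // Keep the original `this` (the element) and arguments intact.
    return handler.apply(this, arguments);
  };
}

// Browser usage (not run here):
// $(span).parent().bind("click", withPreventDefault(originalHandler));
```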

Note: jQuery's "helper" events (ex: mouseenter, mouseleave) duplicate and/or enhance the mouseover and mouseout behaviors. Because of this, if you use jQuery to create a mouseenter event, internally it binds a mouseover event to the element and then binds again with its own overriding event (from what I can tell, anyway). Therefore, my code copies and registers both events (mouseenter and mouseover) and, unfortunately, will also fire both of them. As-is, this code works best on native event types.

Unfortunately, I was unable to determine how to unregister or remove specific events. After looping I could remove all events from an element, but I couldn't see a way to remove them individually within the loop itself. So if you need to target a specific handler, I don't know whether it's possible to move events rather than only copy them.
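
For what it's worth, jQuery's unbind() accepts a handler function as its second argument and removes only that handler. That suggests "move" can be built as bind-then-unbind. Here is a hypothetical sketch. It assumes the stored events object maps each event type to an array of objects with a handler function, as in the jQuery 1.4-era $.data(element, "events"). The handler list is snapshotted before unbinding, in case unbinding modifies the stored list while the loop is still running.

```javascript
// Hypothetical sketch: move handlers from source to target by binding each
// one on the target and unbinding that specific function from the source.
function moveEvents(eventsMap, source, target, onlyTypes) {
  if (!eventsMap) return 0;
  let moved = 0;
  // Object.keys() snapshots the types in case an emptied type is deleted.
  for (const type of Object.keys(eventsMap)) {
    if (onlyTypes && onlyTypes.indexOf(type) === -1) continue;
    const handlers = eventsMap[type].slice(); // snapshot before unbinding
    for (const h of handlers) {
      target.bind(type, h.handler);
      source.unbind(type, h.handler); // removes only this handler
      moved++;
    }
  }
  return moved;
}

// Browser usage (not run here):
// moveEvents($.data(img, "events"), $(img), $(span).parent(), ["click"]);
```
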
At some point you may need a list of all the controllers in your CakePHP application. It's actually quite simple, as long as you don't need controllers from any plugins.

Place this code in any of your controllers and view it from the web (for instance, from the Users Controller):

You'll see something similar to the following:

The App::objects() method returns an array of objects of the given type, such as 'model', 'controller', 'helper', or 'plugin'. It also accepts other parameters, such as a path, in case you eventually need to check your plugins' controllers.

The array_diff() is a simple way to remove AppController and PagesController from the returned results. They will most likely exist in your application regardless, and they aren't normally something you'd need to worry about with ACL, since there are other means within Cake to handle access to them. You'll notice that array_diff() doesn't return an array starting at index 0, because it preserves the original array's keys. That shouldn't be a problem, but if it is, you can wrap the result in array_values() to reindex it, or use a loop instead of array_diff() and unset or splice the matching values.
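
For comparison, here is the same filtering step as a hypothetical JavaScript helper. JavaScript's filter() always returns a freshly indexed array, so the index gap you get from PHP's array_diff() doesn't arise. The default excluded names are an assumption, since the exact names App::objects() returns depend on your CakePHP version.

```javascript
// Hypothetical JavaScript analogue of the array_diff() step. The default
// excluded names are an assumption; match them to what App::objects()
// returns in your CakePHP version.
function withoutDefaultControllers(controllers, excluded = ["App", "Pages"]) {
  // filter() builds a new array indexed from 0, unlike PHP's array_diff(),
  // which keeps the original keys.
  return controllers.filter(name => !excluded.includes(name));
}

console.log(withoutDefaultControllers(["App", "Pages", "Posts", "Users"]));
```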

I used the CakePHP convenience function pr() (print_r() wrapped in PRE tags) and a die() simply to print the returned results to the screen. You'd probably prefer to prefix the function name with a double underscore (__listControllers()) to make it private to the class (rather than publicly accessible via the web), and replace the pr() with a return.

What's the purpose of this?
- Maybe you'd want to create a web interface for ACL and need to know which controllers to give/deny access to/from
- Maybe you want to create a navigation menu based on your controllers
...maybe you can think of something that I can't. :)
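
For the navigation-menu idea, here is a sketch of turning controller names into links. It assumes CakePHP's default (1.x/2.x era) routing, where a controller like BlogPostsController is reached at /blog_posts. It uses a simplified regex in place of Cake's Inflector::underscore(), and the function names are hypothetical.

```javascript
// Hypothetical sketch: turn controller names into menu links using the
// default CakePHP 1.x/2.x URL convention (BlogPostsController -> /blog_posts).
// The regex is a simplified stand-in for Inflector::underscore().
function controllerPath(name) {
  const base = name.replace(/Controller$/, "");
  // CamelCase -> lower_case_underscored, e.g. "BlogPosts" -> "blog_posts".
  return "/" + base.replace(/([a-z0-9])([A-Z])/g, "$1_$2").toLowerCase();
}

function buildMenu(controllers) {
  return controllers.map(name => ({
    label: name.replace(/Controller$/, ""),
    url: controllerPath(name),
  }));
}

console.log(buildMenu(["Users", "BlogPostsController"]));
```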

Alter to your own tastes. This is only a starting point.

Other useful links (check version compatibility in these resources):
- Quick dessert: List all controllers of a CakePHP application
- Automatically load all controllers and actions into ACO tables for ACL with a CakePHP Task
- How to list all controllers


Thursday, September 23. 2010

My blog system has been hacked for the second time now. I've begun writing a new system at the root of this domain (no more subdomain). So if your RSS feeds break, I've either moved to my new blog or upgraded this blog to fix the hack in the interim. That's until I can get comments working on the new system; all that's left is Akismet integration, as comments are otherwise technically ready.

I don't think the blog system has been secured yet, so I'd suggest not posting any comments until I update this post to say it is safe.

Update: Despite the German theme's "Written by" text, this blog has been reinstalled from scratch with the newest version of the software, but it is still using the same database. Hopefully nothing's still infected. This should give me a decent amount of time to properly move the article entries and comments to the new blog system without worrying too much (and to figure out a mod_rewrite rule for the SEO redirects).