ElementParser Library Lets You Parse XML & HTML Using CSS Selectors
One of the most common things for an iPhone or Mac app to do is to call out to the web, ask for some data, parse some bits and pieces out of it, and display it to the user all shiny and pretty. But if you’ve ever ventured down this path, you’ve probably found that the “parsing” bit can be quite tedious, especially if you’re coming from the web development world. It almost seems like Apple just forgot to give us that part of their SDKs. Well, not anymore. The great folks over at Touch Tank have blessed your dev days with the ElementParser library, which lets you parse XML and HTML using traditional CSS selectors!
Presently, accessing and manipulating HTML and XML in Cocoa can be incredibly frustrating. There are two existing choices (NSXMLParser and libxml2), but neither works with HTML or with “real-world” XML documents, which are often not “perfect”. Their interfaces put all the work on you to map between the document and your program’s domain objects, and they force you to write code that is hard to write and maintain. Somehow, something that starts out looking straightforward ends up becoming a science project or worse.
ElementParser is a lightweight framework that provides easy access to XML and HTML content. Rather than get lost in the complexities of the HTML and XML specifications, it aspires to not obscure their essential simplicity. It doesn’t do everything; it aspires to do “just enough”.
Let’s begin with some examples:
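The first example is a single parse call. The sketch below assumes the `parseXML:` class method on `Element` and the `DocumentRoot` type as I recall them from the project's README, so check the names against `ElementParser.h`. The sample string and the helper function are made up for illustration.

```objc
#import <Foundation/Foundation.h>
#import "ElementParser.h"

// Hypothetical helper: parse a small sample string and hand back the tree.
static DocumentRoot *ParseSampleFeed(void) {
    // Made-up sample input, used only for this sketch.
    NSString *xmlString =
        @"<feed><entry><title>Hello</title></entry></feed>";

    // Assumed API: +[Element parseXML:] returns the root of the parsed tree.
    DocumentRoot *document = [Element parseXML:xmlString];
    return document;
}
```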
That document is a special element object that holds the top-level element(s) (e.g. <xml>) of your document. You now have a tree of Element objects, which you can walk using methods like parent. You can also access the data each element contains.
Nice start. And sometimes this is enough. But let’s say you don’t want to walk the tree to find the data you need. How about:
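A minimal sketch of a selector lookup, assuming a `selectElement:` method that takes a CSS-style selector string. The selector `feed entry` and the helper function are illustrative, not taken from a real feed.

```objc
#import <Foundation/Foundation.h>
#import "ElementParser.h"

// Hypothetical helper: find the first <entry> nested anywhere inside <feed>.
static Element *FirstFeedEntry(DocumentRoot *document) {
    // Assumed API: -selectElement: returns the first element matching the
    // selector. That it returns nil when nothing matches is an assumption.
    Element *found = [document selectElement:@"feed entry"];
    return found;
}
```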
Here we’re using a CSS-style selector to locate and return a matching element. Nice. Now we can parse a document and conveniently find elements of interest. (And before you wonder, yep, there’s a corresponding selectElements: method that returns all matches.)
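As a rough illustration of the plural form, the sketch below loops over every match. It assumes `selectElements:` returns an `NSArray` of `Element` objects and that `contentsText` returns the text inside an element. Both are assumptions to confirm in the headers.

```objc
#import <Foundation/Foundation.h>
#import "ElementParser.h"

// Hypothetical helper: log the title of every entry in the document.
static void LogEntryTitles(DocumentRoot *document) {
    // Assumed: -selectElements: returns all matching Element objects.
    for (Element *entry in [document selectElements:@"feed entry"]) {
        // Selectors are applied relative to the element they are sent to
        // (assumption), so this looks for a <title> inside each <entry>.
        Element *title = [entry selectElement:@"title"];
        // Assumed: -contentsText returns the element's text content.
        NSLog(@"%@", [title contentsText]);
    }
}
```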
Next, let’s bind together your world of objects and the world of elements more closely. To do this, we’ll use the ElementParser directly to register callbacks into your code when an element is found (and its contents parsed).
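A sketch of the registration step, written for ARC. The `performSelector:forElementsMatching:` and `parseXML:` calls are recalled from the project's README and should be treated as assumptions. `FeedImporter` is a hypothetical class standing in for your own code.

```objc
#import <Foundation/Foundation.h>
#import "ElementParser.h"

// Hypothetical class that owns the parsing step.
@interface FeedImporter : NSObject
- (void)parseFeed:(NSString *)xmlString;
@end

@implementation FeedImporter

- (void)parseFeed:(NSString *)xmlString {
    ElementParser *parser = [[ElementParser alloc] init];

    // Assumed API: ask the parser to send -processFeedEntry: each time an
    // element matching the selector has been fully parsed. How the parser
    // learns which object receives the message is not shown here; check
    // ElementParser.h for the delegate setup.
    [parser performSelector:@selector(processFeedEntry:)
        forElementsMatching:@"feed entry"];

    // Assumed API: parse the whole string, firing callbacks along the way.
    [parser parseXML:xmlString];
}

@end
```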
Your code could then look like this:
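A sketch of such a callback, again for ARC. `FeedEntry` and `FeedImporter` are hypothetical classes invented for illustration. The `id` return type, and whatever the parser does with the returned value, are assumptions.

```objc
#import <Foundation/Foundation.h>
#import "ElementParser.h"

// Hypothetical domain object.
@interface FeedEntry : NSObject
@property (nonatomic, copy) NSString *title;
@end

@implementation FeedEntry
@end

// Hypothetical class that receives the parser's callbacks.
@interface FeedImporter : NSObject
@property (nonatomic, strong) NSMutableArray *entries;
@end

@implementation FeedImporter

- (instancetype)init {
    if ((self = [super init])) {
        _entries = [NSMutableArray array];
    }
    return self;
}

// Sent once per matching <entry>, with that element already parsed.
- (id)processFeedEntry:(Element *)feedEntryElement {
    FeedEntry *entry = [[FeedEntry alloc] init];
    // Assumed API: -selectElement: and -contentsText on Element.
    entry.title = [[feedEntryElement selectElement:@"title"] contentsText];
    [self.entries addObject:entry];
    return nil; // the meaning of the return value is not covered here
}

@end
```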
Finally, these HTML and XML documents often reside on the web. Wouldn’t it be great if we could use the same pattern to request the data from a URL and process the document incrementally, as soon as its data arrives? Yep, it’s got that too:
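A sketch of the URL-based variant, for ARC. The `URLParser` class, `initWithCallbackDelegate:` and `parseURL:` are recalled from the project's README and are assumptions to verify against the source. `RemoteFeedImporter` is a hypothetical class.

```objc
#import <Foundation/Foundation.h>
#import "ElementParser.h" // URLParser may need its own header (assumption)

// Hypothetical class that loads and parses a feed from the web.
@interface RemoteFeedImporter : NSObject
// Keep a strong reference so the parser survives while data is loading.
@property (nonatomic, strong) URLParser *parser;
- (void)loadFeedFromURL:(NSURL *)feedURL;
@end

@implementation RemoteFeedImporter

- (void)loadFeedFromURL:(NSURL *)feedURL {
    // Assumed API: callbacks are delivered to the delegate passed here.
    self.parser = [[URLParser alloc] initWithCallbackDelegate:self];

    // Same registration pattern as for an in-memory string.
    [self.parser performSelector:@selector(processFeedEntry:)
             forElementsMatching:@"feed entry"];

    // Assumed API: start the request; matching elements are reported as
    // their data arrives rather than after the whole download.
    [self.parser parseURL:feedURL];
}

// Callback target; a real app would build domain objects here.
- (id)processFeedEntry:(Element *)feedEntryElement {
    NSLog(@"%@", [[feedEntryElement selectElement:@"title"] contentsText]);
    return nil;
}

@end
```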
There’s a lot more available under the hood of ElementParser but this is probably plenty to get you excited. I have to thank the guys over at Touch Tank for really creating an absolutely great library and especially for making it open source to the community. Speaking of, Touch Tank would love to hear your feedback about ElementParser so send any thoughts you have to firstname.lastname@example.org.
The ElementParser framework (and its source code) is free of charge for noncommercial use (via a GPL license). For commercial use, the license fee is $100 per product. (That’s a couple of hours of your time, right?) Support plans are also available. Please contact email@example.com.
You can read more about ElementParser, peruse the source code and download the library to drop into your project at the project’s GitHub page: github.com/Objective3/ElementParser
P.S. - If you’re wondering how to “install” the library into your project, just drag the entire ‘Classes’ folder from the source code you get from GitHub into your Xcode project (you’ll probably want to rename the folder to something like “ElementParser”). Then import “ElementParser.h” wherever you want to use it. By the way, ElementParser also works perfectly well on the iPhone.