DOMDocument, DOMXPath and invalid HTML
I've recently been using DOMDocument and DOMXPath to parse web pages. Even though I could copy an XPath expression straight from Firefox or Chrome, it would not match anything when passed to $xpath->query().
What I found is that $dom->loadHTML($page) will handle invalid HTML, but mostly just by stripping it out.
This is fine, unless you depend on the structure for the XPath query.
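One way to see the structure a query actually runs against is to ask the parsed document for its own node paths. This is a minimal sketch, not a definitive recipe: the markup in $page is a made-up sample, and it assumes the libxml-backed DOMDocument that ships with PHP.

```php
<?php
// Hypothetical sample markup: a table without TBODY and an unclosed <p>.
$page = '<table><tr><td>one</td><td>two</td></tr></table><p>unclosed paragraph';

// Collect parser warnings internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($page);
libxml_clear_errors();

// Ask each cell for the path DOMDocument itself assigns to it.
$paths = [];
foreach ($dom->getElementsByTagName('td') as $cell) {
    $paths[] = $cell->getNodePath();
}

// Print the paths so they can be compared with the one copied from the browser.
echo implode(PHP_EOL, $paths), PHP_EOL;
```

Comparing these paths with the expression copied from the browser shows quickly where the two trees disagree.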
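The problem was that the page contained tables that were missing their TBODY tags (we've all built them). The browser adds the implied TBODY to its own DOM, so the XPath it reports includes a tbody step. The tree built by loadHTML() has no such element, so the query finds nothing.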
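The mismatch can be reproduced with a small script. This is a sketch under stated assumptions: the page content, the "prices" id and the three expressions are invented for illustration, and the browser path is only an approximation of what a "Copy XPath" feature might produce.

```php
<?php
// Hypothetical sample page: a table written without explicit TBODY tags.
$page = '<html><body>'
      . '<table id="prices"><tr><td>Widget</td><td>9.99</td></tr></table>'
      . '</body></html>';

// Keep parser warnings out of the output.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($page);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Roughly what a browser would report, including the implied tbody step.
$browserPath = '//table[@id="prices"]/tbody/tr/td';

// Option 1: drop the tbody step from the copied expression.
$withoutTbody = str_replace('/tbody', '', $browserPath);

// Option 2: use the descendant axis so the query works with or without tbody.
$tolerantPath = '//table[@id="prices"]//td';

// Report how many nodes each expression matches in the parsed document.
foreach ([$browserPath, $withoutTbody, $tolerantPath] as $expr) {
    $nodes = $xpath->query($expr);
    printf("%-40s %d match(es)\n", $expr, $nodes->length);
}
```

Removing the tbody step is the smallest change to a copied expression. The descendant-axis form is looser, but it keeps working if some pages do include TBODY in their source.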