DOMParser Docs

    Conventions used in this document
  • The first option listed is the default (used if no argument is provided).

Class Initializer

Parses the HTML. Call it once, before using any of the other class functions.

$dom = new DOMParser($html [,$options]);
Returns the DOMParser object
  • $html [string] -
    • HTML/XML markup. Can be a properly encoded URL or a string containing the markup itself.
  • $options [associative array] -
    • curl => [true, false]
    • advancedSelectors => [false, true]
    • stripTags => Comma-delimited list of tags to ignore while creating the XML tree. Defaults to 'comments, php, script'.
      // In both examples below, the span is considered to be inside the div.
      <div> <!-- </div> --> <span></span> </div>
      <div> <? echo "</div>"; ?> <span></span> </div>
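
To see why ignoring these tags matters, here is a minimal sketch of a pre-pass that removes comments, PHP blocks and script elements before any tag matching is done. The function name stripIgnoredBlocks() and the regular expressions are assumptions made for illustration; this is not DOMParser's actual implementation, only one way the same effect could be achieved.

```php
<?php
// Hypothetical helper, NOT part of DOMParser: removes blocks that could
// contain misleading tags before the markup is scanned for elements.
function stripIgnoredBlocks(string $html): string
{
    $patterns = [
        '/<!--.*?-->/s',                     // HTML comments
        '/<\?.*?\?>/s',                      // PHP blocks
        '/<script\b[^>]*>.*?<\/script>/is',  // script elements
    ];
    return preg_replace($patterns, '', $html);
}

// The stray closing div inside the comment is gone, so the span
// stays inside the outer div.
echo stripIgnoredBlocks('<div> <!-- </div> --> <span></span> </div>'), "\n";
```

With the stray closing tag removed, a simple open/close tag matcher pairs the outer div correctly in both examples above.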

GET Function

Retrieve any elements without modifying their content

$dom->get($css [,$options]);
Returns the desired elements
  • $css [string] -
    • CSS selectors; see the Selectors section at the bottom of the page.
  • $options [associative array] -
    • target => [innerHTML, outerHTML, innerTEXT, tags, object]
      • e.g. $dom = new DOMParser('<div>text<img /></div>');
      • innerHTML - returns 'text<img />'
      • outerHTML - returns '<div>text<img /></div>'
      • innerTEXT - (innerHTML without its children) returns 'text'
      • tags - returns array([0]=>'<div>', [1]=>'</div>')
      • object - returns the DOMParser instance (to allow linked actions)
    • action => [run, close]
      • run - do nothing further after returning the value.
      • close - the DOMParser instance is deleted after returning its value.
    • response => [auto, string, array, object]
      • 'auto' - if only one item, return as string, otherwise return array.
      • 'string' - '<div>A</div><div>B</div>'
      • 'array' - array([0]=>'<div>A</div>', [1]=>'<div>B</div>')
    • properties => Comma-delimited list of properties, e.g. 'id, class, rel'. Defaults to ''.
      • The response is returned as an associative array in which each property is an element and the HTML is stored under 'content'.
        $dom = new DOMParser('<div id="a">txt</div>');
        $opts = array('properties' => 'id, class');
        $dom->get('div', $opts); // array([id]=>'a', [class]=>null, [content]=>'txt')
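
The 'response' and 'properties' options described above can be sketched in plain PHP. Both helpers below, formatResponse() and pickProperties(), are hypothetical names invented for this illustration and are not part of DOMParser; they only mirror the behaviour the option descriptions state, assuming the matched elements are already available as strings.

```php
<?php
// Hypothetical helper: shapes a list of matched items per the 'response' option.
function formatResponse(array $items, string $mode = 'auto')
{
    switch ($mode) {
        case 'string':
            return implode('', $items);          // all items concatenated
        case 'array':
            return array_values($items);         // always an array
        case 'auto':
        default:
            // A single item comes back as a string, anything else as an array.
            return count($items) === 1 ? reset($items) : array_values($items);
    }
}

// Hypothetical helper: builds the associative array returned when
// 'properties' is set. Missing properties are reported as null.
function pickProperties(array $attributes, string $content, string $list): array
{
    $result = [];
    foreach (preg_split('/[\s,]+/', trim($list), -1, PREG_SPLIT_NO_EMPTY) as $name) {
        $result[$name] = $attributes[$name] ?? null;
    }
    $result['content'] = $content;
    return $result;
}

$items = ['<div>A</div>', '<div>B</div>'];
echo formatResponse($items, 'string'), "\n";     // <div>A</div><div>B</div>
var_dump(pickProperties(['id' => 'a'], 'txt', 'id, class'));
```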

SET Function

Retrieve and modify elements

$dom->set([$css][,$html][,$target][,$action]);
Returns the modified HTML
    Additional conventions used in set()
  • All arguments are optional
  • Arguments are not case sensitive.
  • Arguments may be in any order, but $css must be after $html
  • $css [string] -
    • CSS selectors; see the Selectors section below.
      If skipped and a queue exists, the queue is processed.
  • $html [non-associative array of elements or associative list of properties/styles]
      Content to replace the selected elements with. If skipped, the selected elements are deleted. Accepted formats are:
    • [formatted HTML]
    • array([formatted HTML], [formatted HTML])
    • array([property] => [value])
    • array([style]=>[value])
  • $target [innerHTML, outerHTML, top, bottom, before, after, tag, properties, styles]
  • $action [rebuild, run, close, queue]
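
Because $target and $action take values from fixed keyword lists, they can be recognized regardless of position or letter case. The sketch below shows one way this could work; classifySetArgs() is a hypothetical helper, not DOMParser's own code, and it deliberately leaves the remaining arguments ($html and $css) in their original order rather than guessing which is which.

```php
<?php
// Hypothetical sketch, NOT DOMParser's implementation: picks the $target and
// $action keywords out of a positional argument list, ignoring case.
function classifySetArgs(array $args): array
{
    $targets = ['innerhtml', 'outerhtml', 'top', 'bottom', 'before',
                'after', 'tag', 'properties', 'styles'];
    $actions = ['rebuild', 'run', 'close', 'queue'];
    $out = ['target' => null, 'action' => null, 'rest' => []];

    foreach ($args as $arg) {
        // Only strings can be keywords; arrays are always content.
        $key = is_string($arg) ? strtolower(trim($arg)) : null;
        if ($key !== null && in_array($key, $targets, true)) {
            $out['target'] = $key;
        } elseif ($key !== null && in_array($key, $actions, true)) {
            $out['action'] = $key;
        } else {
            $out['rest'][] = $arg;   // $html and/or $css, order preserved
        }
    }
    return $out;
}

print_r(classifySetArgs(['OuterHTML', '<p>new</p>', 'div', 'QUEUE']));
```

A selector that happens to equal a keyword would be misread by this sketch, which is an accepted limitation of the illustration.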

Selectors

DOMParser supports most CSS2 and CSS3 selectors, as well as a few pseudo selectors

CSS3 Selectors
  • A list of what is and is not supported still needs to be written.
  • See here for the list of CSS3 selectors.
Custom Pseudo Selectors
  • At the moment, there is no support for custom pseudo selectors, but they are scheduled for the 0.7 release.
Native Pseudo Selectors
  • Parent selector (<): returns the parent of the previously selected elements.
  • Iterator (<3): The following can be followed by an iterator: '>', '<', '+', ''.
    '>3' means all descendants three levels lower, otherwise written as: ' > * > * >'.
  • Iterator includes (<3+): The iterator can be followed by a plus or minus.
    '>3-' means all descendants within 3 generations (inclusive), otherwise written '* > *, * > * > *, * > * > * > *'.
    '>3+' means all descendants from 3 generations on (inclusive), otherwise written '* > * > * *'.
  • Multiple detractors (div[class!~^=myClas]): All detractors may be combined, so the above would mean "a div with no classes that begin with 'myClas'".
    Why is this not part of the standard?!
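
The '>' iterator forms above can be rewritten into standard CSS with a little string handling. The sketch below is an illustration only: expandChildIterator() and withinGenerations() are hypothetical helpers, not DOMParser functions, and they cover just the '>N' and '>N-' forms as described in this section.

```php
<?php
// Hypothetical helper: rewrites '>N' into N child combinators,
// e.g. 'div >3 span' becomes 'div > * > * > span'.
function expandChildIterator(string $selector): string
{
    return preg_replace_callback(
        '/\s*>(\d+)(?![\d+-])\s*/',   // '>N' not followed by '+' or '-'
        function (array $m): string {
            $levels = max(1, (int) $m[1]);
            // N levels need N '>' combinators with N-1 wildcards between them.
            return ' >' . str_repeat(' * >', $levels - 1) . ' ';
        },
        $selector
    );
}

// Hypothetical helper: the selector list equivalent to '>N-',
// i.e. every descendant within N generations (inclusive).
function withinGenerations(int $n): string
{
    $parts = [];
    for ($i = 1; $i <= $n; $i++) {
        $parts[] = '*' . str_repeat(' > *', $i);
    }
    return implode(', ', $parts);
}

echo expandChildIterator('div >3 span'), "\n";  // div > * > * > span
echo withinGenerations(3), "\n";                // * > *, * > * > *, * > * > * > *
```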