This commit is contained in:
steven 2025-08-11 22:23:30 +02:00
commit 72a26edcff
22092 changed files with 2101903 additions and 0 deletions

lib/sd/CHANGELOG.md Normal file

@ -0,0 +1,265 @@
# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
## [1.9.1] - 2019-10-20
### Fixed
- Fixed broken "text" selectors [#175](https://sourceforge.net/p/simplehtmldom/bugs/175/)
## [1.9] - 2019-05-30
### Added
- Added unit test for bug reports
- Added test for bug [#153](https://sourceforge.net/p/simplehtmldom/bugs/153/)
- Added test for bug [#163](https://sourceforge.net/p/simplehtmldom/bugs/163/)
- Added test for bug [#166](https://sourceforge.net/p/simplehtmldom/bugs/166/)
- Added test for bug [#169](https://sourceforge.net/p/simplehtmldom/bugs/169/)
- Added unit test for character sets UTF-8, CP1251 and CP1252 (#142)
- Added support for meta charset to parse_charset
- Added detection for CP1251 to parse_charset, using iconv
- Added LICENSE file (MIT) to the project root
- Added functions to `simple_html_dom_node`
- `remove`: Removes the current node recursively from the DOM tree
- `removeChild`: Removes a child node recursively from the DOM tree
- `hasClass`: Checks if the current node has the specified class name
- `addClass`: Adds one or more classes to the current node
- `removeClass`: Removes one or more classes from the current node
- `save`: Saves the current node to disk
### Changed
- Changed manual from custom implementation to MkDocs (https://www.mkdocs.org/)
### Fixed
- Fixed warning when trying to clear() the DOM on a null nodes list (#153)
- Fixed missing whitespace when returning plaintext (#163)
- Fixed broken detection of duplicate attributes (#166)
- Fixed broken detection of CP1252 (ISO-8859-1) documents (#142)
- Fixed error using next-sibling combinator ('E + F') on last child
- Fixed selector parsing for attribute selectors ending on "s" or "i" (#169)
## [1.8.1] - 2019-01-13
### Fixed
- Fixed various bugs related to parsing classes and ids
## [1.8] - 2019-01-13
### Added
- Added documentation for `simple_html_dom_node::find`
- Added documentation for `simple_html_dom_node::parse_selector`
- Added documentation for `simple_html_dom_node::seek`
- Added documentation for `simple_html_dom_node::match`
- Added unit tests for bug reports
- Added test for bug [#62](https://sourceforge.net/p/simplehtmldom/bugs/62/)
- Added test for bug [#79](https://sourceforge.net/p/simplehtmldom/bugs/79/)
- Added test for bug [#144](https://sourceforge.net/p/simplehtmldom/bugs/144/)
- Added unit tests for CSS selectors
- Added ability to define constants before simple_html_dom does
- 'DEFAULT_TARGET_CHARSET'
- 'DEFAULT_BR_TEXT'
- 'DEFAULT_SPAN_TEXT'
- 'MAX_FILE_SIZE'
- Added support for CSS combinators
- Added support for Child Combinator (`>`)
- Added support for Next Sibling Combinator (`+`)
- Added support for Subsequent Sibling Combinator (`~`)
- Added support for multiclass selectors (`.class.class.class`)
- Added support for multiattribute selectors (`[attr1][attr2][attribute3]`)
- Added support for attribute selectors
- Added support for pipe selectors (`|=`)
- Added support for tilde selectors (`~=`)
- Added support for case sensitivity selectors (`i` and `s`)
- Added unit tests for PHP compatibility to PHP 5.6+
- Added coding standard using PHP_CodeSniffer
### Changed
- Removed automatic filtering of 'tbody' selectors (#79)
> Remove 'tbody' from all selectors to maintain the previous state!
- Coding standard using PHP_CodeSniffer
### Fixed
- Fixed broken CSS selector attributes with value "0" (#62)
- Fixed broken simple_html_dom::load_file
- Fixed forward slashes in CSS selector breaks value matching using '*=' (#144)
- Fixed Universal Selectors
## [1.7] - 2018-12-10
### Added
- Added code documentation to improve readability
- Added unit tests for `simple_html_dom::$self_closing_tags`
- Added unit tests for `simple_html_dom::$optional_closing_tags`
- Added unit tests for bug reports
- Added test for bug [#56](https://sourceforge.net/p/simplehtmldom/bugs/56/)
- Added test for bug [#97](https://sourceforge.net/p/simplehtmldom/bugs/97/)
- Added test for bug [#116](https://sourceforge.net/p/simplehtmldom/bugs/116/)
- Added test for bug [#121](https://sourceforge.net/p/simplehtmldom/bugs/121/)
- Added test for bug [#127](https://sourceforge.net/p/simplehtmldom/bugs/127/)
- Added test for bug [#154](https://sourceforge.net/p/simplehtmldom/bugs/154/)
- Added test for bug [#160](https://sourceforge.net/p/simplehtmldom/bugs/160/)
- Added unit tests for memory management of the parser
- Added bit flags to `simple_html_dom::load()`
- Added bit flag `HDOM_SMARTY_AS_TEXT` to optionally filter Smarty scripts (#154)\
**Note**: Smarty scripts are no longer filtered by default!\
- Added build script to automate releases
- Added support for attributes without whitespace to separate them
### Changed
- Improved documentation and readability for `$self_closing_tags`
- Improved documentation and readability for `$block_tags`
- Improved documentation and readability for `$optional_closing_tags`
- Updated list of `simple_html_dom::$self_closing_tags`
- Removed 'spacer' (obsolete)
- Added 'area'
- Added 'col'
- Added 'meta'
- Added 'param'
- Added 'source'
- Added 'track'
- Added 'wbr'
- Updated list of `simple_html_dom::$optional_closing_tags`
- Removed "nobr" (obsolete)
- Added 'th' as closable element to 'td'
- Added 'td' as closable element to 'th'
- Added 'optgroup' with 'optgroup' and 'option' as closable elements
- Added 'optgroup' as closable element to 'option'
- Added 'rp' with 'rp' and 'rt' as closable elements
- Added 'rt' with 'rt' and 'rp' as closable elements
- Clarified meaning of `simple_html_dom->parent`
- Changed default `$offset` for `file_get_html()` from -1 to 0 (#161)
- Changed `simple_html_dom::load()` to remove script tags before replacing newline characters
- `simple_html_dom_node::text()` no longer adds whitespace to top level span elements (only to sub-elements)
- `simple_html_dom_node::text()` adds blank lines between paragraphs
- Normalized line endings in the repository to LF via `.gitattributes`
- Improved performance of `simple_html_dom::parse_charset()` by approximately 25%
- Improved performance of `simple_html_dom::parse()` by approximately 10%
### Deprecated
- `str_get_html()` is deprecated and should be replaced by `new simple_html_dom()`
### Removed
- Removed protected function `simple_html_dom::copy_until_char_escaped()`
### Fixed
- Fixed compatibility issues with PHP 7.3
- Fixed typo (#147)
- Fixed handling of incorrectly escaped text (#160)
- Restore functionality of `$maxLen` in `file_get_html()`
- Fixed `load_file` breaking if an error occurred in another script
## [1.6] - 2014-05-28
### Added
- Added some ability to insert and create nodes
- Add ability to search the "noise" array
## [1.5] - 2012-09-10
### Added
- Added flag: LOCK_EX while calling "file_put_contents()"
- Added support for detecting the source html character set. This is used to convert characters when plaintext is requested.
- Other little fixes and features, too numerous to categorize
### Changed
- Error of "file_get_contents()" will be thrown as an exception
### Fixed
- Fixed the typo of "token_blank_t"
- Memory leak fixed
## [1.11] - 2008-12-14
### Added
- Supports xpath generated from Firebug
- New method "dump" of "simple_html_dom_node"
- New attribute "xmltext" of "simple_html_dom_node"
### Changed
- Remove preg_quote on selector match function: `[attribute*=value]`
- "Comment" elements are now treated as children
### Fixed
- Fixed the problem with `<pre>`
- Fixed bug #2207477 (does not load some pages properly)
- Fixed bug #2315853 (Error with character after < sign)
## [1.10] - 2008-10-25
### Changed
- Added support for negative indexes in the "find" method, thanks to Vadim Voituk
- Constructor automatically loads contents from either text or file/URL, thanks to Antcs
- Fully supports wildcard in selectors
### Fixed
- Fixed bug of confusing by the < symbol inside the text
- Fixed bug of dash in selectors
- Fixed bug of `<nobr>`
- Fixed bug #2155883 (Nested List Parses Incorrectly)
- Fixed bug #2155113 (error with unclosed html tags)
## [1.00] - 2008-09-05
### Added
- New method "getAllAttributes" of "simple_html_dom_node"
- Supports full javascript string in selector: `$e->find("a[onclick=alert('hello')]")`
### Changed
- Changed selector "*=" to case-insensitive
### Fixed
- Fixed the bug of selector in some critical conditions
- Fixed the bug of stripping PHP tags
- Fixed the bug of remove_noise()
- Fixed the bug of noise in attributes
## [0.99] - 2008-08-03
### Changed
- Performance tuning (boost 10%)
- Memory requirement reduced by 25%
- Changed function name from "file_get_dom()" to "file_get_html()"
- Changed function name from "str_get_dom()" to "str_get_html()"
### Fixed
- Fixed bug #2011286 (Error with unclosed html tags)
- Fixed bug #2012551 (Error parsing divs)
- Fixed bug #2020924 (Error for missed tag)
- Fixed bug (problem with `<body>` tag's innertext)
## [0.98] - 2008-06-24
### Added
- Supports "multiple class" selector feature: `<div class="a b c"></div>`
- New "callback function" feature
- New "multiple selectors" feature: $dom->find('p,a,b')
- New examples
- Supports extract contents from HTML features: $dom->plaintext
### Changed
- Performance tuning (boost 20%)
- Changed simple_html_dom_node method name from "text()" to "makeup()"
### Fixed
- Fixed the bug of $dom->clear()
- Fixed the bug of text nodes' innertext
- Fixed the bug of comment nodes' innertext
- Fixed the bug of descendant selector with optional tags
## [0.97] - 2008-05-09
### Added
- New node type "comment" (eg. $dom->find('comment'))
- Add self-closing tags: 'base', 'spacer'
- New example "simple_html_dom_utility.php"
### Changed
- File and class name changed (html_dom_parser->simple_html_dom)
### Removed
- `$dom->save_file` is no longer supported
- Remove example "example_customize_parser.php"
### Fixed
- Fixed the bug of outertext (th)
- Fixed the bug of regular expression escaping chars ($dom->find)
- Fixed the bug with line breaks and "\t" in tags
## [0.96] - 2008-04-27
### Added
- Reference section in manual
- Added traverse section in manual
- Added a solution for servers behind a proxy to the FAQ (Thanks to Yousuke Shaggy)
- New method to remove attribute.
- New DOM operations(first_child, last_child, next_sibling, previous_sibling) (Request #1936000)
### Changed
- Now file_get_dom supports full file_get_contents parameters
### Fixed
- Fixed the bug of self-closing tags in the end of file
- Fixed the bug of blanks in the end of tag
- Fixed some typos in the test cases
## [0.95] - 2008-04-13
### Added
- Supports tag name with namespace
### Changed
- New attribute filters (Thanks to Yousuke Kumakura)
- Refine structure of testcase
### Fixed
- Fix the bug of optional-closing tags
- Fix the bug of parsing the line break next to the tag's name
## [0.94] - 2008-04-06
### Added
- Add FAQ section in manual
### Fixed
- Fixed infinite loop when the source content is bad HTML
- Fixed the bug of adding new attributes to self closing tags
- Fixed the bug of customize parser without $dom->remove_noise()

lib/sd/LICENSE Normal file

@ -0,0 +1,21 @@
MIT License
Copyright (c) 2019 S.C. Chen, John Schlick, logmanoriginal
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


@ -0,0 +1,54 @@
<?php
// example of how to use advanced selector features
include('../simple_html_dom.php');
// -----------------------------------------------------------------------------
// descendant selector
$str = <<<HTML
<div>
<div>
<div class="foo bar">ok</div>
</div>
</div>
HTML;
$html = str_get_html($str);
echo $html->find('div div div', 0)->innertext . '<br>'; // result: "ok"
// -----------------------------------------------------------------------------
// nested selector
$str = <<<HTML
<ul id="ul1">
<li>item:<span>1</span></li>
<li>item:<span>2</span></li>
</ul>
<ul id="ul2">
<li>item:<span>3</span></li>
<li>item:<span>4</span></li>
</ul>
HTML;
$html = str_get_html($str);
foreach($html->find('ul') as $ul) {
foreach($ul->find('li') as $li)
echo $li->innertext . '<br>';
}
// -----------------------------------------------------------------------------
// parsing checkbox
$str = <<<HTML
<form name="form1" method="post" action="">
<input type="checkbox" name="checkbox1" value="checkbox1" checked>item1<br>
<input type="checkbox" name="checkbox2" value="checkbox2">item2<br>
<input type="checkbox" name="checkbox3" value="checkbox3" checked>item3<br>
</form>
HTML;
$html = str_get_html($str);
foreach($html->find('input[type=checkbox]') as $checkbox) {
if ($checkbox->checked)
echo $checkbox->name . ' is checked<br>';
else
echo $checkbox->name . ' is not checked<br>';
}
?>


@ -0,0 +1,37 @@
<?php
// example of how to use basic selector to retrieve HTML contents
include('../simple_html_dom.php');
// get DOM from URL or file
$html = file_get_html('http://www.google.com/');
// find all link
foreach($html->find('a') as $e)
echo $e->href . '<br>';
// find all image
foreach($html->find('img') as $e)
echo $e->src . '<br>';
// find all image with full tag
foreach($html->find('img') as $e)
echo $e->outertext . '<br>';
// find all div tags with id=gbar
foreach($html->find('div#gbar') as $e)
echo $e->innertext . '<br>';
// find all span tags with class=gb1
foreach($html->find('span.gb1') as $e)
echo $e->outertext . '<br>';
// find all td tags with attribute align=center
foreach($html->find('td[align=center]') as $e)
echo $e->innertext . '<br>';
// extract text from table
echo $html->find('td[align="center"]', 1)->plaintext.'<br><hr>';
// extract text from HTML
echo $html->plaintext;
?>


@ -0,0 +1,28 @@
<?php
include_once('../simple_html_dom.php');
// 1. Write a function with parameter "$element"
function my_callback($element) {
if ($element->tag=='input')
$element->outertext = 'input';
if ($element->tag=='img')
$element->outertext = 'img';
if ($element->tag=='a')
$element->outertext = 'a';
}
// 2. create HTML Dom
$html = file_get_html('http://www.google.com/');
// 3. Register the callback function with its function name
$html->set_callback('my_callback');
// 4. Callback function will be invoked while dumping
echo $html;
?>


@ -0,0 +1,5 @@
<?php
include_once('../simple_html_dom.php');
echo file_get_html('http://www.google.com/')->plaintext;
?>


@ -0,0 +1,18 @@
<?php
// example of how to modify HTML contents
include('../simple_html_dom.php');
// get DOM from URL or file
$html = file_get_html('http://www.google.com/');
// remove all image
foreach($html->find('img') as $e)
$e->outertext = '';
// replace all input
foreach($html->find('input') as $e)
$e->outertext = '[INPUT]';
// dump contents
echo $html;
?>


@ -0,0 +1,44 @@
<?php
include_once('../../simple_html_dom.php');
function scraping_digg() {
// create HTML DOM
$html = file_get_html('http://digg.com/');
// get news block
foreach($html->find('div.news-summary') as $article) {
// get title
$item['title'] = trim($article->find('h3', 0)->plaintext);
// get details
$item['details'] = trim($article->find('p', 0)->plaintext);
// get intro
$item['diggs'] = trim($article->find('li a strong', 0)->plaintext);
$ret[] = $item;
}
// clean up memory
$html->clear();
unset($html);
return $ret;
}
// -----------------------------------------------------------------------------
// test it!
// "http://digg.com" will check user_agent header...
ini_set('user_agent', 'My-Application/2.5');
$ret = scraping_digg();
foreach($ret as $v) {
echo $v['title'].'<br>';
echo '<ul>';
echo '<li>'.$v['details'].'</li>';
echo '<li>Diggs: '.$v['diggs'].'</li>';
echo '</ul>';
}
?>


@ -0,0 +1,59 @@
<?php
include_once('simple_html_dom.php');
function scraping_generic($url, $search) {
// Didn't find it yet.
$return = false;
echo "reading the url: " . $url . "<br/>";
// create HTML DOM
$html = file_get_html($url);
echo "url has been read.<br/>";
// get article block
foreach($html->find($search) as $found) {
// Found at least one.
$return = true;
echo "found a: " . $search . "<br/><pre>";
$found->dump();
echo "</pre><br/>";
}
// clean up memory
$html->clear();
unset($html);
return $return;
}
// ------------------------------------------
error_log ("post:" . print_r($_POST, true));
$url = "";
if (isset($_POST['url']))
{
$url = $_POST['url'];
}
$search = "";
if (isset($_POST['search']))
{
$search = $_POST['search'];
}
?>
<form method="post">
URL: <input name="url" type="text" value="<?= htmlspecialchars($url, ENT_QUOTES) ?>"/><br/>
Search: <input name="search" type="text" value="<?= htmlspecialchars($search, ENT_QUOTES) ?>"/>
<input name="submit" type="submit" value="Submit"/>
</form>
<?php
// -----------------------------------------------------------------------------
// test it!
if (isset ($_POST['submit']))
{
$response = scraping_generic($_POST['url'], $_POST['search']);
if (!$response)
{
echo "Did not find any: " . $_POST['search'] . "<br />";
}
}
?>


@ -0,0 +1,51 @@
<?php
include_once('../../simple_html_dom.php');
function scraping_IMDB($url) {
// create HTML DOM
$html = file_get_html($url);
// get title
$ret['Title'] = $html->find('title', 0)->innertext;
// get rating
$ret['Rating'] = $html->find('div[class="general rating"] b', 0)->innertext;
// get overview
foreach($html->find('div[class="info"]') as $div) {
// skip user comments
if($div->find('h5', 0)->innertext=='User Comments:')
return $ret;
$key = '';
$val = '';
foreach($div->find('*') as $node) {
if ($node->tag=='h5')
$key = $node->plaintext;
if ($node->tag=='a' && $node->plaintext!='more')
$val .= trim(str_replace("\n", '', $node->plaintext));
if ($node->tag=='text')
$val .= trim(str_replace("\n", '', $node->plaintext));
}
$ret[$key] = $val;
}
// clean up memory
$html->clear();
unset($html);
return $ret;
}
// -----------------------------------------------------------------------------
// test it!
$ret = scraping_IMDB('http://imdb.com/title/tt0335266/');
foreach($ret as $k=>$v)
echo '<strong>'.$k.' </strong>'.$v.'<br>';
?>


@ -0,0 +1,35 @@
<?php
include_once('../../simple_html_dom.php');
function scraping_slashdot() {
// create HTML DOM
$html = file_get_html('http://slashdot.org/');
// get article block
foreach($html->find('div[id^=firehose-]') as $article) {
// get title
$item['title'] = trim($article->find('a.datitle', 0)->plaintext);
// get body
$item['body'] = trim($article->find('div.body', 0)->plaintext);
$ret[] = $item;
}
// clean up memory
$html->clear();
unset($html);
return $ret;
}
// -----------------------------------------------------------------------------
// test it!
$ret = scraping_slashdot();
foreach($ret as $v) {
echo $v['title'].'<br>';
echo '<ul>';
echo '<li>'.$v['body'].'</li>';
echo '</ul>';
}
?>


@ -0,0 +1,35 @@
<?php
include_once('../simple_html_dom.php');
// -----------------------------------------------------------------------------
// remove HTML comments
function html_no_comment($url) {
// create HTML DOM
$html = file_get_html($url);
// remove all comment elements
foreach($html->find('comment') as $e)
$e->outertext = '';
$ret = $html->save();
// clean up memory
$html->clear();
unset($html);
return $ret;
}
// -----------------------------------------------------------------------------
// search elements that contain a specific text
function find_contains($html, $selector, $keyword, $index=-1) {
$ret = array();
foreach ($html->find($selector) as $e) {
if (strpos($e->innertext, $keyword)!==false)
$ret[] = $e;
}
if ($index<0) return $ret;
return (isset($ret[$index])) ? $ret[$index] : null;
}
?>

lib/sd/manual/README.md Normal file

@ -0,0 +1,72 @@
This folder contains the source files for http://simplehtmldom.sourceforge.net/,
the project page for PHP Simple HTML DOM Parser.
Source files are written in Markdown: https://en.wikipedia.org/wiki/Markdown
Site data is generated by MkDocs, a lightweight static site generator for project
documentation: https://www.mkdocs.org/
# Folder structure
`custom_theme` : Contains customizations to the theme provided by MkDocs.
`docs` : Contains the source files for the project page (the actual pages).
`site` : Contains the output files for the project page when built with MkDocs.
`extra.css` : Customizations to the styles provided by MkDocs.
`mkdocs.yml` : The configuration file that is used by MkDocs to generate pages.
# Adding new pages
Place new files in `docs`. Use subfolders (as few levels as possible) to
separate categories.
Files added to the manual will **not** appear on the project page automatically.
All pages need to be specified in the _mkdocs.yml_ file under "nav:". Simply add
the relative path to the new file where appropriate.
Note: Files are not added automatically because they are sorted by name if not
specified manually. Since readability is a key factor for manuals, the files must
be sorted in a way that makes it clear to users.
# Setting up MkDocs
The installation instructions for MkDocs are provided on their homepage:
https://www.mkdocs.org/#installation
MkDocs automatically builds the project based on the _mkdocs.yml_ file. Find the
specification for this file at https://www.mkdocs.org/user-guide/configuration/.
# Building project pages
The build process depends on your installation of MkDocs. Typically MkDocs is
made available via the command line.
## Step 1 - Check your version of MkDocs
To check your version of MkDocs run this command:
`mkdocs --version` or
`python3 -m mkdocs --version`
This should return `version 1.0.4` or higher. If it doesn't, make sure to install the
latest version using `pip install mkdocs` or `python3 -m pip install mkdocs`. If
you don't have pip installed, install it via package manager or follow the
instructions at https://pip.pypa.io/en/stable/installing/
## Step 2 - View the project locally
MkDocs allows you to view the project files in a browser on your local machine:
`mkdocs serve` or
`python3 -m mkdocs serve`
If the process is successful you can access the site at http://127.0.0.1:8000.
## Step 3 - Build the project
If you are satisfied with the results of the project, build the final project
with this command:
`mkdocs build` or
`python3 -m mkdocs build`
Find the output files in the `site` folder.


@ -0,0 +1,7 @@
{% extends "base.html" %}
{% block footer %}
{% include "footer.html" %}
<hr>
<a class="logo" href="https://sourceforge.net/p/simplehtmldom/"><img alt="Download PHP Simple HTML DOM Parser" src="https://sourceforge.net/sflogo.php?type=16&group_id=218559" ></a>
{% endblock %}


@ -0,0 +1,68 @@
---
title: API Reference
---
# Parsing documents
The parser accepts documents in the form of URLs, files and strings. The document
must be accessible for reading and cannot exceed [`MAX_FILE_SIZE`](constants.md#max_file_size).
Name | Description
---- | -----------
`str_get_html( string $content ) : object` | Creates a DOM object from string.
`file_get_html( string $filename ) : object` | Creates a DOM object from file or URL.
# DOM methods & properties
Name | Description
---- | -----------
`__construct( [string $filename] ) : void` | Constructor. If the filename parameter is set, the contents are loaded automatically, either from text or from a file/URL.
`plaintext : string` | Returns the contents extracted from HTML.
`clear() : void` | Clean up memory.
`load( string $content ) : void` | Load contents from string.
`save( [string $filename] ) : string` | Dumps the internal DOM tree back into a string. If `$filename` is set, the resulting string is also saved to that file.
`load_file( string $filename ) : void` | Load contents from a file or a URL.
`set_callback( string $function_name ) : void` | Set a callback function.
`find( string $selector [, int $index] ) : mixed` | Find elements by the CSS selector. Returns the Nth element object if index is set, otherwise returns an array of objects.
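A minimal sketch of how the methods in the table above fit together. The include path and the sample markup are assumptions made for illustration; adjust them to your installation.

```php
<?php
// Assumption: the parser file is reachable at this relative path.
include_once 'simple_html_dom.php';

// Create an empty DOM object and load markup from a string.
$html = new simple_html_dom();
$html->load('<p class="a">Hello</p><p>World</p>');

// Without an index, find() returns an array of elements.
$paragraphs = $html->find('p');

// With an index, find() returns a single element.
$first = $html->find('p', 0);
$first->innertext = 'Hi';

// save() dumps the (modified) tree back into a string.
$output = $html->save();
echo $output;

// Release the memory held by the DOM tree.
$html->clear();
```

Passing a filename to `save()` should additionally write the same string to disk, as described in the table.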
# Element methods & properties
Name | Description
---- | -----------
`[attribute] : string` | Read or write element's attribute value.
`tag : string` | Read or write the tag name of element.
`outertext : string` | Read or write the outer HTML text of element.
`innertext : string` | Read or write the inner HTML text of element.
`plaintext : string` | Read or write the plain text of element.
`find( string $selector [, int $index] ) : mixed` | Find children by the CSS selector. Returns the Nth element object if index is set, otherwise returns an array of objects.
# DOM traversing
Name | Description
---- | -----------
`$e->children( [int $index] ) : mixed` | Returns the Nth child object if index is set, otherwise returns an array of children.
`$e->parent() : element` | Returns the parent of element.
`$e->first_child() : element` | Returns the first child of element, or null if not found.
`$e->last_child() : element` | Returns the last child of element, or null if not found.
`$e->next_sibling() : element` | Returns the next sibling of element, or null if not found.
`$e->prev_sibling() : element` | Returns the previous sibling of element, or null if not found.
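The traversal methods above can be sketched as follows. The include path, the `menu` id and the list markup are assumptions chosen for the example.

```php
<?php
// Assumption: the parser file is reachable at this relative path.
include_once 'simple_html_dom.php';

$html = str_get_html('<ul id="menu"><li>one</li><li>two</li><li>three</li></ul>');
$ul = $html->find('ul#menu', 0);

// children() without an index returns all child elements.
$childCount = count($ul->children());

// Walk from the first child to its neighbour.
$first = $ul->first_child();
$firstText = $first->plaintext;
$secondText = $first->next_sibling()->plaintext;

// Walk backwards from the last child.
$beforeLastText = $ul->last_child()->prev_sibling()->plaintext;

// parent() leads back to the enclosing element.
$parentTag = $first->parent()->tag;

echo $childCount . ' items, first: ' . $firstText . ', parent: ' . $parentTag;

$html->clear();
```

Because the sibling and child accessors return null when nothing is found, check the return value before chaining calls on documents you do not control.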
# Camel naming conventions
Method | Mapping
------ | -------
`$e->getAllAttributes()` | `$e->attr`
`$e->getAttribute( $name )` | `$e->attribute`
`$e->setAttribute( $name, $value )` | `$e->attribute = $value`
`$e->hasAttribute( $name )` | `isset($e->attribute)`
`$e->removeAttribute ( $name )` | `$e->attribute = null`
`$e->getElementById ( $id )` | `$e->find ( "#$id", 0 )`
`$e->getElementsById ( $id [,$index] )` | `$e->find ( "#$id" [, int $index] )`
`$e->getElementByTagName ($name )` | `$e->find ( $name, 0 )`
`$e->getElementsByTagName ( $name [, $index] )` | `$e->find ( $name [, int $index] )`
`$e->parentNode ()` | `$e->parent ()`
`$e->childNodes ( [$index] )` | `$e->children ( [int $index] )`
`$e->firstChild ()` | `$e->first_child ()`
`$e->lastChild ()` | `$e->last_child ()`
`$e->nextSibling ()` | `$e->next_sibling ()`
`$e->previousSibling ()` | `$e->prev_sibling ()`
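As a short illustration of the mapping table, the camel-case methods and the property-style accessors can be used interchangeably. The include path and the sample link are assumptions.

```php
<?php
// Assumption: the parser file is reachable at this relative path.
include_once 'simple_html_dom.php';

$html = str_get_html('<a href="/home" title="Home">Home</a>');
$link = $html->find('a', 0);

// getAttribute() reads the same value as the magic property.
$sameValue = ($link->getAttribute('href') === $link->href);

// setAttribute() writes the same value as assigning to the property.
$link->setAttribute('title', 'Start');
$title = $link->title;

// hasAttribute() corresponds to isset() on the property.
$hasHref = $link->hasAttribute('href');

// parentNode() is an alias for parent().
$sameParent = ($link->parentNode() === $link->parent());

echo $title;

$html->clear();
```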


@ -0,0 +1,33 @@
---
title: Constants
---
# Constants
Constants define how the parser treats documents. They can be defined before
loading the parser to globally replace the default values.
## DEFAULT_TARGET_CHARSET
Defines the default target charset for text returned by the parser.
Default: `'UTF-8'`
## DEFAULT_BR_TEXT
Defines the default text to return for `<br>` elements.
Default: `"\r\n"`
## DEFAULT_SPAN_TEXT
Defines the default text to return for `<span>` elements.
Default: `' '`
## MAX_FILE_SIZE
Defines the maximum number of bytes the parser can load into memory. This limit
only applies to the source file or string.
Default: `600000`
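A minimal sketch of overriding the defaults, assuming the parser file sits next to the calling script (the path and the chosen values are illustrative only). The constants must be defined before the parser is loaded.

```php
<?php
// Define custom values first; the parser keeps constants that already exist.
define('MAX_FILE_SIZE', 2000000);  // allow documents of up to 2,000,000 bytes
define('DEFAULT_BR_TEXT', "\n");   // return a single newline for <br> elements

// Assumption: the parser file is located next to this script.
$parser = __DIR__ . '/simple_html_dom.php';
if (is_file($parser)) {
    include_once $parser;
}

echo MAX_FILE_SIZE, PHP_EOL;
```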


@ -0,0 +1,100 @@
---
title: Definitions
---
# Definitions
The definitions below are an essential part of the parser.
## Node Types
The type of a node is determined during parsing and represented by one of the elements in the list below.
| Type | Description
| ---- | -----------
| `HDOM_TYPE_ELEMENT` | Start tag (e.g. `<html>`)
| `HDOM_TYPE_COMMENT` | HTML comment (e.g. `<!-- Hello, World! -->`)
| `HDOM_TYPE_TEXT` | Plain text (e.g. `Hello, World!`)
| `HDOM_TYPE_ENDTAG` | End tag (e.g. `</html>`)
| `HDOM_TYPE_ROOT` | Root element. There can only ever be one root element in the DOM.
| `HDOM_TYPE_UNKNOWN` | Unknown type (e.g. CDATA or DOCTYPE)
### Example
```html
<!DOCTYPE html><html><!-- Hello, World! --></html>Hello, World!
```
_Note_: `HDOM_TYPE_ROOT` always exists regardless of the actual document structure.
| HTML | Node Type
| ---- | ---------
| | `HDOM_TYPE_ROOT`
| `<!DOCTYPE html>` | `HDOM_TYPE_UNKNOWN`
| `<html>` | `HDOM_TYPE_ELEMENT`
| `<!-- Hello, World! -->` | `HDOM_TYPE_COMMENT`
| `</html>` | `HDOM_TYPE_ENDTAG`
| `Hello, World!` | `HDOM_TYPE_TEXT`
## Quote Types
Identifies the quoting type on attribute values.
| Type | Description
| ---- | -----------
| `HDOM_QUOTE_DOUBLE` | Double quotes (`""`)
| `HDOM_QUOTE_SINGLE` | Single quotes (`''`)
| `HDOM_QUOTE_NO` | Not quoted (flag)
_Note_: Attributes with no values (flags) are stored as `HDOM_QUOTE_NO`.
### Example
```html
<p class="paragraph" id='info1' hidden>Hello, World!</p>
```
| Attribute | Description
| --------- | -----------
| `class="paragraph"` | `HDOM_QUOTE_DOUBLE`
| `id='info1'` | `HDOM_QUOTE_SINGLE`
| `hidden` | `HDOM_QUOTE_NO`
## Node Info Types
Each node stores additional information (metadata) that is identified by the elements below.
| Type | Description
| ---- | -----------
| `HDOM_INFO_BEGIN` | Cursor position for the start tag of a node.
| `HDOM_INFO_END` | Cursor position for the end tag of a node. A value of zero indicates a node with no end tag (missing closing tag).
| `HDOM_INFO_QUOTE` | Quote type for attribute values. The value must be an element of [Quote Type](#quote-types).
| `HDOM_INFO_SPACE` | Array of whitespace around attributes (see [Attribute Whitespace](#attribute-whitespace)).
| `HDOM_INFO_TEXT` | Non-HTML text in tags (e.g. comments or doctype).
| `HDOM_INFO_INNER` | Inner text of a node.
| `HDOM_INFO_OUTER` | Outer text of a node.
| `HDOM_INFO_ENDSPACE` | Whitespace at the end of a tag before the closing bracket.
## Attribute Whitespace
Whitespace around attributes is stored in the form of an array with three elements:
| Element | Description
| ------- | -----------
| `0` | Whitespace before the attribute name.
| `1` | Whitespace between attribute name and the equal sign.
| `2` | Whitespace between the equal sign and the attribute value
### Example
```html
<p class="paragraph" id = 'info1'hidden>Hello, World!</p>
```
_Note_: Whitespace before attribute names is not displayed in the browser. It is, however, part of the attributes.
| Attribute | Description
| --------- | -----------
| ` class="paragraph"` | `[0] => ' ', [1] => '', [2] => ''`
| ` id = 'info1'` | `[0] => ' ', [1] => ' ', [2] => ' '`
| `hidden` | `[0] => '', [1] => '', [2] => ''`
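To make the three-element layout concrete, the sketch below rebuilds the attribute text of the example from its parts. `render_attribute()` is a hypothetical helper written for this illustration and is not part of the parser; the quote character is passed as a plain string rather than as a quote type constant.

```php
<?php
/**
 * Hypothetical helper: rebuilds the source text of one attribute from
 * its name, value, quote character and whitespace array.
 */
function render_attribute($name, $value, $quote, array $space)
{
    // Flag attributes have no value and therefore no equal sign.
    if ($value === null) {
        return $space[0] . $name;
    }

    // [0] precedes the name, [1] precedes '=', [2] follows '='.
    return $space[0] . $name . $space[1] . '=' . $space[2] . $quote . $value . $quote;
}

$class = render_attribute('class', 'paragraph', '"', array(' ', '', ''));
$id = render_attribute('id', 'info1', "'", array(' ', ' ', ' '));
$hidden = render_attribute('hidden', null, '', array('', '', ''));

$tag = '<p' . $class . $id . $hidden . '>Hello, World!</p>';
echo $tag;
```

The output reproduces the example markup, including the missing whitespace before `hidden`.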


@ -0,0 +1,25 @@
---
title: file_get_html
---
# file_get_html
```php
file_get_html ( string $url [, bool $use_include_path = false [, resource $context = null [, int $offset = 0 [, int $maxLen = -1 [, bool $lowercase = true [, bool $forceTagsClosed = true [, string $target_charset = DEFAULT_TARGET_CHARSET [, bool $stripRN = true [, string $defaultBRText = DEFAULT_BR_TEXT [, string $defaultSpanText = DEFAULT_SPAN_TEXT ]]]]]]]]]] )
```
Parses the provided file and returns the DOM object.
| Parameter | Description
| --------- | -----------
| `url` | Name or URL of the file to read.
| `use_include_path` | See [`file_get_contents`](http://php.net/manual/en/function.file-get-contents.php#refsect1-function.file-get-contents-parameters)
| `context` | See [`file_get_contents`](http://php.net/manual/en/function.file-get-contents.php#refsect1-function.file-get-contents-parameters)
| `offset` | See [`file_get_contents`](http://php.net/manual/en/function.file-get-contents.php#refsect1-function.file-get-contents-parameters)
| `maxLen` | See [`file_get_contents`](http://php.net/manual/en/function.file-get-contents.php#refsect1-function.file-get-contents-parameters)
| `lowercase` | Forces lowercase matching of tags if enabled. This is very useful when loading documents with mixed naming conventions.
| `forceTagsClosed` | Obsolete. This parameter is no longer used by the parser.
| `target_charset` | Defines the target charset when returning text from the document.
| `stripRN` | If enabled, removes newlines before parsing the document.
| `defaultBRText` | Defines the default text to return for `<br>` elements.
| `defaultSpanText` | Defines the default text to return for `<span>` elements.

View file

@ -0,0 +1,20 @@
# __construct
```php
__construct ( [ string $str = null [, bool $lowercase = true [, bool $forceTagsClosed = true [, string $target_charset = DEFAULT_TARGET_CHARSET [, bool $stripRN = true [, string $defaultBRText = DEFAULT_BR_TEXT [, string $defaultSpanText = DEFAULT_SPAN_TEXT [, int $options = 0 ]]]]]]]]) : object
```
Creates a new `simple_html_dom` object.
| Parameter | Description
| --------- | -----------
| `str` | The HTML document string.
| `lowercase` | Tag names are parsed in lowercase letters if enabled.
| `forceTagsClosed` | Tags inside block tags are forcefully closed if the closing tag was omitted.
| `target_charset` | Defines the target charset for text returned by the parser.
| `stripRN` | Newline characters are replaced by whitespace if enabled.
| `defaultBRText` | Defines the default text to return for `<br>` elements.
| `defaultSpanText` | Defines the default text to return for `<span>` elements.
| `options` | Additional options for the parser. Currently supports the constant `HDOM_SMARTY_AS_TEXT` to remove [Smarty](https://www.smarty.net/) scripts.
Returns the object.

View file

@ -0,0 +1,7 @@
# __destruct
```php
__destruct ()
```
Destroys the current object and clears memory.

View file

@ -0,0 +1,17 @@
# __get
```php
__get ( string $name ) : mixed
```
See [magic methods](http://php.net/manual/en/language.oop5.overloading.php#object.get).
Supports the following names:
| Name | Description
| ---- | -----------
| `outertext` | Returns the outer text of the root element.
| `innertext` | Returns the inner text of the root element.
| `plaintext` | Returns the plain text of the root element.
| `charset` | Returns the charset for the document.
| `target_charset` | Returns the target charset for the document.

View file

@ -0,0 +1,7 @@
# __toString
```php
__toString () : string
```
Returns the inner text of the root element of the DOM.

View file

@ -0,0 +1,13 @@
# as_text_node (protected)
```php
as_text_node ( string $tag ) : bool
```
Adds a tag as text node.
| Parameter | Description
| --------- | -----------
| `tag` | The element's tag name.
Returns true on success.

View file

@ -0,0 +1,11 @@
# childNodes
```php
childNodes ( [ int $idx = -1 ] ) : mixed
```
Returns children of the root element.
| Parameter | Description
| --------- | -----------
| `idx` | Index of the child element to return.

View file

@ -0,0 +1,7 @@
# clear
```php
clear ()
```
Cleans up memory to prevent [PHP 5 circular references memory leak](https://bugs.php.net/bug.php?id=33595).

View file

@ -0,0 +1,13 @@
# copy_skip (protected)
```php
copy_skip ( string $chars ) : string
```
Skips characters starting at the current parsing position in the document. Sets the parsing position to the first character not in the provided list of characters.
| Parameter | Description
| --------- | -----------
| `chars` | A list of characters to skip.
Returns the skipped characters.
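The behavior described above can be sketched with PHP's built-in `strspn`. This is an illustrative stand-in, not the library's implementation: `copy_skip_sketch` is a hypothetical name, and it takes the document and position as explicit arguments, whereas the real method works on the parser's internal state.

```php
<?php
// Hypothetical sketch: skips all leading characters at $pos that appear in
// $chars, advances $pos past them and returns the skipped characters.
function copy_skip_sketch(string $doc, int &$pos, string $chars): string
{
    // strspn() counts how many consecutive characters, starting at $pos,
    // are contained in $chars.
    $len = strspn($doc, $chars, $pos);
    $skipped = substr($doc, $pos, $len);

    // $pos now points at the first character that is not in $chars.
    $pos += $len;

    return $skipped;
}

$doc = "  \t<p>text</p>";
$pos = 0;
$skipped = copy_skip_sketch($doc, $pos, " \t\r\n");
// $skipped holds the two spaces and the tab; $doc[$pos] is '<'.
```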

View file

@ -0,0 +1,13 @@
# copy_until (protected)
```php
copy_until ( string $chars ) : string
```
Copies all characters starting at the current parsing position in the document. Sets the parsing position to the first character that matches any of the characters in the provided list of characters.
| Parameter | Description
| --------- | -----------
| `chars` | A list of characters to stop copying at.
Returns the copied characters.
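A minimal sketch of this behavior, assuming the document and position are passed explicitly instead of being read from the parser's internal state. The name `copy_until_sketch` is hypothetical; the built-in `strcspn` does the counting.

```php
<?php
// Hypothetical sketch: copies characters starting at $pos until one of the
// characters in $chars is found, and advances $pos to that character.
function copy_until_sketch(string $doc, int &$pos, string $chars): string
{
    // strcspn() counts how many consecutive characters, starting at $pos,
    // are NOT contained in $chars.
    $len = strcspn($doc, $chars, $pos);
    $copied = substr($doc, $pos, $len);

    // $pos now points at the first stop character (or at the end of $doc
    // if no stop character was found).
    $pos += $len;

    return $copied;
}

$doc = 'class="paragraph">';
$pos = 0;
$name = copy_until_sketch($doc, $pos, "=/> \t\r\n");
// $name is 'class'; $doc[$pos] is '='.
```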

View file

@ -0,0 +1,13 @@
# copy_until_char (protected)
```php
copy_until_char ( string $char ) : string
```
Copies all characters starting at the current parsing position in the document. Sets the parsing position to the first character that matches the provided character.
| Parameter | Description
| --------- | -----------
| `char` | A character to stop copying at.
Returns the copied characters.

View file

@ -0,0 +1,14 @@
# createElement
```php
createElement ( string $name [, string $value = null ] ) : object
```
Creates a new element.
| Parameter | Description
| --------- | -----------
| `name` | Name of the element
| `value` | Value of the element
Returns the element.

View file

@ -0,0 +1,9 @@
# createTextNode
```php
createTextNode ( string $value ) : object
```
Creates a new text element.
Returns the element.

View file

@ -0,0 +1,13 @@
# dump
```php
dump ( [ bool $show_attr = true ] ) : string
```
Dumps the entire DOM into a string. Useful for debugging purposes.
| Parameter | Description
| --------- | -----------
| `show_attr` | Attributes are included in the dump when enabled.
Returns the DOM tree as string.

View file

@ -0,0 +1,15 @@
# find
```php
find ( string $selector [, int $idx = null [, bool $lowercase = false ]] ) : mixed
```
Finds elements in the DOM.
| Parameter | Description
| --------- | -----------
| `selector` | A [CSS style selector](/manual/selectors).
| `idx` | Index of the element to return.
| `lowercase` | Matches tag names case-insensitively when enabled.
Returns an array of matches or a single element if `idx` is defined.

View file

@ -0,0 +1,7 @@
# firstChild
```php
firstChild () : object
```
Returns the first child of the root element.

View file

@ -0,0 +1,13 @@
# getElementById
```php
getElementById ( string $id ) : object
```
Searches an element by id.
| Parameter | Description
| --------- | -----------
| `id` | ID of the element to find.
Returns the element or null if no match was found.

View file

@ -0,0 +1,13 @@
# getElementByTagName
```php
getElementByTagName ( string $name ) : object
```
Searches an element by tag name.
| Parameter | Description
| --------- | -----------
| `name` | Tag name of the element to find.
Returns the element or null if no match was found.

View file

@ -0,0 +1,14 @@
# getElementsById
```php
getElementsById ( string $id [, int $idx = null ] ) : object
```
Searches elements by id.
| Parameter | Description
| --------- | -----------
| `id` | ID of the element to find.
| `idx` | Returns the element at the specified index if defined.
Returns the element(s) or null if no match was found.

View file

@ -0,0 +1,14 @@
# getElementsByTagName
```php
getElementsByTagName ( string $name [, int $idx = -1 ] ) : object
```
Searches elements by tag name.
| Parameter | Description
| --------- | -----------
| `name` | Tag name of the element to find.
| `idx` | Returns the element at the specified index.
Returns the element(s) or null if no match was found.

View file

@ -0,0 +1,7 @@
# lastChild
```php
lastChild () : object
```
Returns the last child of the root element.

View file

@ -0,0 +1,12 @@
# link_nodes (protected)
```php
link_nodes ( object &$node, bool $is_child )
```
Links the provided node to the DOM tree.
| Parameter | Description
| --------- | -----------
| `node` | The node to link to the DOM tree.
| `is_child` | If active, makes the node a sibling of the current node (child of parent).

View file

@ -0,0 +1,18 @@
# load
```php
load ( string $str [, bool $lowercase = true [, bool $stripRN = true [, string $defaultBRText = DEFAULT_BR_TEXT [, string $defaultSpanText = DEFAULT_SPAN_TEXT [, int $options = 0 ]]]]]) : object
```
Loads the provided HTML document string.
| Parameter | Description
| --------- | -----------
| `str` | The HTML document string.
| `lowercase` | Tag names are parsed in lowercase letters if enabled.
| `stripRN` | Newline characters are replaced by whitespace if enabled.
| `defaultBRText` | Defines the default text to return for `<br>` elements.
| `defaultSpanText` | Defines the default text to return for `<span>` elements.
| `options` | Additional options for the parser. Currently supports the constant `HDOM_SMARTY_AS_TEXT` to remove [Smarty](https://www.smarty.net/) scripts.
Returns the object.

View file

@ -0,0 +1,7 @@
# loadFile
```php
loadFile (...)
```
This function is a wrapper for [`load_file`](#load_file).

View file

@ -0,0 +1,9 @@
# load_file
```php
load_file (...) : object
```
Loads an HTML document from a file. Supports the arguments of [`file_get_contents`](http://php.net/manual/en/function.file-get-contents.php).
Returns the object.

View file

@ -0,0 +1,7 @@
# parse (protected)
```php
parse ()
```
Parses the document. This function is called after the document was loaded into `$this->doc`.

View file

@ -0,0 +1,13 @@
# parse_attr (protected)
```php
parse_attr ( object $node, string $name, array &$space )
```
Parses a single attribute starting at the current parsing position in the document.
| Parameter | Description
| --------- | -----------
| `node` | The current element (node).
| `name` | The attribute name.
| `space` | An array of whitespace surrounding the current attribute (see [Attribute Whitespace](../definitions/#attribute-whitespace)).

View file

@ -0,0 +1,15 @@
# parse_charset (protected)
```php
parse_charset ()
```
Parses the charset.
If the callback function `get_last_retrieve_url_contents_content_type` exists, it is assumed to return the content type header for the current document as string.
Uses the charset from the metadata of the page if defined.
If none of the previous conditions are met, the charset is determined by `mb_detect_encoding` if multi-byte support is active.
If multi-byte support is not active the charset is assumed to be `'UTF-8'`.
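The fallback order described above can be sketched as follows. This is not the library's code: `detect_charset_sketch` is a hypothetical function, the content type header is passed as an argument instead of being obtained from the callback, and the regular expressions and the upper-casing of the result are simplifying assumptions.

```php
<?php
// Hypothetical sketch of the fallback chain: header, then page metadata,
// then content-based detection, then a fixed default.
function detect_charset_sketch(string $html, ?string $contentType = null): string
{
    // 1. Charset from a content type header such as
    //    "text/html; charset=utf-8".
    if ($contentType !== null
        && preg_match('/charset=([\w-]+)/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }

    // 2. Charset from the page metadata. Both <meta charset="...">
    //    and the http-equiv variant contain "charset=".
    if (preg_match('/<meta[^>]+charset=["\']?([\w-]+)/i', $html, $m)) {
        return strtoupper($m[1]);
    }

    // 3. Guess from the content if multi-byte support is available.
    if (function_exists('mb_detect_encoding')) {
        $guess = mb_detect_encoding($html, array('UTF-8', 'ISO-8859-1'), true);
        if ($guess !== false) {
            return $guess;
        }
    }

    // 4. Last resort when nothing else applies.
    return 'UTF-8';
}

$fromHeader = detect_charset_sketch('<p>x</p>', 'text/html; charset=windows-1251');
$fromMeta   = detect_charset_sketch('<meta charset="iso-8859-1"><p>x</p>');
$fallback   = detect_charset_sketch('<p>plain ASCII</p>');
```

The header is checked first because it describes the bytes as they were actually delivered, while metadata inside the page may be stale or missing.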

View file

@ -0,0 +1,14 @@
# prepare (protected)
```php
prepare ( string $str [, bool $lowercase = true [, string $defaultBRText = DEFAULT_BR_TEXT [, string $defaultSpanText = DEFAULT_SPAN_TEXT ]]] )
```
Initializes the DOM object.
| Parameters | Description
| ---------- | -----------
| `str` | The HTML document string.
| `lowercase` | Tag names are parsed in lowercase letters if enabled.
| `defaultBRText` | Defines the default text to return for `<br>` elements.
| `defaultSpanText` | Defines the default text to return for `<span>` elements.

View file

@ -0,0 +1,9 @@
# read_tag (protected)
```php
read_tag () : bool
```
Reads a single tag starting at the current parsing position in the document. The tag is automatically added to the DOM.
Returns true if a tag was found.

View file

@ -0,0 +1,7 @@
# remove_callback
```php
remove_callback ()
```
Removes the callback set by [`set_callback`](#set_callback).

View file

@ -0,0 +1,14 @@
# remove_noise (protected)
```php
remove_noise ( string $pattern [, bool $remove_tag = false] )
```
Replaces noise in the document (e.g. scripts) with placeholders and adds the removed contents to `$this->noise`.
_Note_: Noise is replaced by placeholders in order to allow restoring the original contents. Placeholders take the form of `'___noise___1000'`, where the number is increased by one for each removed piece of noise.
| Parameter | Description
| --------- | -----------
| `pattern` | A regular expression that matches the noise to remove.
| `remove_tag` | Removes the entire match when enabled or submatches when disabled.
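The placeholder mechanism can be sketched with `preg_replace_callback`. This is a simplified illustration, not the library's implementation: `remove_noise_sketch` is a hypothetical function that keeps the noise list in an explicit array, always replaces the entire match, and assumes that numbering starts at 1000 as in the example placeholder above.

```php
<?php
// Hypothetical sketch: replaces every match of $pattern by a numbered
// placeholder and records the removed text in $noise.
function remove_noise_sketch(string $doc, string $pattern, array &$noise): string
{
    return preg_replace_callback($pattern, function (array $match) use (&$noise) {
        // One placeholder per removed match, numbered consecutively.
        $key = '___noise___' . (1000 + count($noise));
        $noise[$key] = $match[0];
        return $key;
    }, $doc);
}

$noise = array();
$doc = '<p>a</p><script>x()</script><p>b</p><script>y()</script>';
$clean = remove_noise_sketch($doc, '/<script[^>]*>.*?<\/script>/is', $noise);
// $clean no longer contains any script; $noise maps placeholders to scripts.
```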

View file

@ -0,0 +1,13 @@
# restore_noise (protected)
```php
restore_noise ( string $text ) : string
```
Restores noise in the provided string by replacing noise placeholders by their original contents.
| Parameter | Description
| --------- | -----------
| `text` | A string (potentially) containing noise placeholders.
Returns the string with original contents restored or the original string if it doesn't contain noise placeholders.
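A minimal sketch of the restore step, assuming the noise list is an array that maps placeholders to their original contents. `restore_noise_sketch` is a hypothetical name and the array layout is an assumption made for illustration.

```php
<?php
// Hypothetical sketch: replaces noise placeholders in $text by the original
// contents stored in $noise (placeholder => original text).
function restore_noise_sketch(string $text, array $noise): string
{
    // Return the text unchanged if it contains no placeholder at all.
    if (strpos($text, '___noise___') === false) {
        return $text;
    }

    // strtr() replaces every known placeholder key by its value in one pass.
    return strtr($text, $noise);
}

$noise = array('___noise___1000' => '<!-- note -->');
$restored  = restore_noise_sketch('<p>a</p>___noise___1000', $noise);
$untouched = restore_noise_sketch('<p>a</p>', $noise);
```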

View file

@ -0,0 +1,13 @@
# save
```php
save ( [ string $filepath = '' ] ) : string
```
Writes the current DOM to file.
| Parameter | Description
| --------- | -----------
| `filepath` | Writes to file if the provided file path is not empty.
Returns the document string.

View file

@ -0,0 +1,13 @@
# search_noise (protected)
```php
search_noise ( string $text ) : string
```
Finds a single noise element by its noise placeholder text.
| Parameter | Description
| --------- | -----------
| `text` | The noise placeholder to find.
Returns the original contents for the placeholder.

View file

@ -0,0 +1,12 @@
# set_callback
```php
set_callback ( string $function_name )
```
Sets the callback function which is called on each element of the DOM when building outertext.
The function must accept a single parameter of type `simple_html_dom_node`.
| Parameter | Description
| --------- | -----------
| `function_name` | Name of the function.

View file

@ -0,0 +1,40 @@
---
title: simple_html_dom
---
# simple_html_dom
Represents the [DOM](https://en.wikipedia.org/wiki/Document_Object_Model) in memory. Provides functions to parse documents and access individual elements (see [`simple_html_dom_node`](../simple_html_dom_node/simple_html_dom_node.md)).
# Public Properties
| Property | Description
| -------- | -----------
| `root` | Root node of the document.
| `nodes` | List of top-level nodes in the document.
| `callback` | Callback function that is called for each element in the DOM when generating outertext.
| `lowercase` | If enabled, all tag names are converted to lowercase when parsing documents.
| `original_size` | Original document size in bytes.
| `size` | Current document size in bytes.
| `_charset` | Charset of the original document.
| `_target_charset` | Target charset for the current document.
| `default_span_text` | Text to return for `<span>` elements.
# Protected Properties
| Property | Description
| -------- | -----------
| `pos` | Current parsing position within `doc`.
| `doc` | The original document.
| `char` | Character at position `pos` in `doc`.
| `cursor` | Current element cursor in the document.
| `parent` | Parent element node.
| `noise` | Noise from the original document (e.g. scripts and comments).
| `token_blank` | Tokens that are considered whitespace in HTML.
| `token_equal` | Tokens to identify the equal sign for attributes, stopping either at the closing tag ("/" i.e. `<html />`) or the end of an opening tag (">" i.e. `<html>`).
| `token_slash` | Tokens to identify the end of a tag name. A tag name either ends on the ending slash ("/" i.e. `<html/>`) or whitespace (`"\s\r\n\t"`).
| `token_attr` | Tokens to identify the end of an attribute.
| `default_br_text` | Text to return for `<br>` elements.
| `self_closing_tags` | A list of tag names where the closing tag is omitted.
| `block_tags` | A list of tag names where remaining unclosed tags are forcibly closed.
| `optional_closing_tags` | A list of tag names where the closing tag can be omitted.

View file

@ -0,0 +1,12 @@
# skip (protected)
```php
skip ( string $chars )
```
Skips characters starting at the current parsing position in the document. Sets the parsing position to the first character not in the provided list of characters.
| Parameter | Description
| --------- | -----------
| `chars` | A list of characters to skip.

View file

@ -0,0 +1,11 @@
# __construct
```php
__construct ( [ object $dom ] ) : object
```
| Parameter | Description
| --------- | -----------
| `dom` | An object of type [`simple_html_dom`](api/simple_html_dom/).
Constructs a new object of type `simple_html_dom_node`, assigns `$dom` as its DOM object and adds itself to the list of nodes in `$dom`.

View file

@ -0,0 +1,7 @@
# __destruct
```php
__destruct ( )
```
Destructs the current object and frees memory.

View file

@ -0,0 +1,22 @@
# __get
```php
__get ( string $name ) : mixed
```
| Parameter | Description
| --------- | -----------
| `name` | `outertext`, `innertext`, `plaintext`, `xmltext` or attribute name.
See [magic methods](http://php.net/manual/en/language.oop5.overloading.php#object.get).
If the provided name is a valid attribute name, returns the attribute value. Otherwise, returns a value according to the table below.
| Name | Description
| ---- | -----------
| `outertext` | Returns the outer text of the current node.
| `innertext` | Returns the inner text of the current node.
| `plaintext` | Returns the plain text of the current node.
| `xmltext` | Returns the xml representation for the inner text of the current node as a CDATA section.
Returns nothing if the provided name is neither a valid attribute name, nor a valid parameter name.

View file

@ -0,0 +1,19 @@
# __isset
```php
__isset ( string $name ) : bool
```
| Parameter | Description
| --------- | -----------
| `name` | `outertext`, `innertext`, `plaintext` or attribute name.
See [magic methods](http://php.net/manual/en/language.oop5.overloading.php#object.isset).
Returns true if the provided name is a valid attribute name or any of the values in the table below. False otherwise.
| Name | Description
| ---- | -----------
| `outertext` | Returns the outer text of the current node.
| `innertext` | Returns the inner text of the current node.
| `plaintext` | Returns the plain text of the current node.

View file

@ -0,0 +1,18 @@
# __set
```php
__set ( string $name, mixed $value )
```
| Parameter | Description
| --------- | -----------
| `name` | `outertext`, `innertext` or attribute name.
| `value` | Value to set.
See [magic methods](http://php.net/manual/en/language.oop5.overloading.php#object.set).
Sets the outer text of the current node to `$value` if `$name` is `outertext`.
Sets the inner text of the current node to `$value` if `$name` is `innertext`.
Otherwise, adds or updates an attribute with name `$name` and value `$value` to the current node.

View file

@ -0,0 +1,7 @@
# __toString
```php
__toString ( ) : string
```
Returns the outer text of the current node.

View file

@ -0,0 +1,7 @@
# __unset
```php
__unset ( string $name )
```
Removes the attribute with name `$name` from the current node if it exists.

View file

@ -0,0 +1,23 @@
# addClass
```php
addClass ( mixed $class )
```
| Parameter | Description
| --------- | -----------
| `class` | Specifies one or more class names to be added.
Adds one or more class names to the current node.
**Remarks**
* To add more than one class, separate the class names with space or provide them as an array.
**Examples**
```php
$node->addClass('hidden');
$node->addClass('article important');
$node->addClass(array('article', 'new'));
```

View file

@ -0,0 +1,13 @@
# appendChild
```php
appendChild ( object $node ) : object
```
| Parameter | Description
| --------- | -----------
| `node` | An object of type [`simple_html_dom_node`](../simple_html_dom_node/)
Makes the current node parent of the node provided to this function.
Returns the provided node.

View file

@ -0,0 +1,15 @@
# childNodes
```php
childNodes ( [ int $idx = -1 ] ) : mixed
```
| Parameter | Description
| --------- | -----------
| `idx` | Index of the node to return or `-1` to return all nodes.
Returns all or one specific child node from the current node.
## Remarks
This function is a wrapper for [`children`](../children/).

View file

@ -0,0 +1,11 @@
# children
```php
children ( [ int $idx = -1 ] ) : mixed
```
| Parameter | Description
| --------- | -----------
| `idx` | Index of the node to return or `-1` to return all nodes.
Returns all or one specific child node from the current node.

View file

@ -0,0 +1,7 @@
# clear
```php
clear ( )
```
Sets all properties in the current node, which contain objects, to null.

View file

@ -0,0 +1,13 @@
# convert_text
```php
convert_text ( string $text ) : string
```
| Parameter | Description
| --------- | -----------
| `text` | Text to convert.
Assumes that the provided text is encoded in the configured source character set (see [`sourceCharset`](../simple_html_dom_node/)) and converts it to the specified target character set (see [`targetCharset`](../simple_html_dom_node/)).
Returns the converted text.

View file

@ -0,0 +1,12 @@
# dump
```php
dump ( [ bool $show_attr = false [, int $depth = 0 ]] )
```
| Parameter | Description
| --------- | -----------
| `show_attr` | Attribute names are included in the output if enabled.
| `depth` | Depth of the current element
Dumps information about the current node and all child nodes recursively.

View file

@ -0,0 +1,11 @@
# dump_node
```php
dump_node ( [ bool $echo = true ] ) : mixed
```
| Parameter | Description
| --------- | -----------
| `echo` | Echoes the dump details directly if enabled.
Dumps information about the current document node. Returns a string if `$echo` is set to false, null otherwise.

View file

@ -0,0 +1,44 @@
# find
```php
find (
string $selector
[, int $idx = null ]
[, bool $lowercase = false ]
) : mixed
```
| Parameter | Description
| --------- | -----------
| `selector` | [CSS](https://www.w3.org/TR/selectors/) selector.
| `idx` | Index of element to return.
| `lowercase` | Matches tag names case-insensitively (lowercase) if enabled.
Finds one or more nodes in the current document, using CSS selectors.
* Returns null if no match was found.
* Returns an array of [`simple_html_dom_node`](../simple_html_dom_node/) if `$idx` is null.
* Returns an object of type [`simple_html_dom_node`](../simple_html_dom_node/) if `$idx` is anything __but__ null.
## Supported Selectors
| Selector | Description
| --------- | -----------
| `*` | [Universal selector](https://www.w3.org/TR/selectors/#the-universal-selector)
| `E` | [Type (tag name) selector](https://www.w3.org/TR/selectors/#type-selectors)
| `E#id` | [ID selector](https://www.w3.org/TR/selectors/#id-selectors)
| `E.class` | [Class selector](https://www.w3.org/TR/selectors/#class-html)
| `E[attr]` | [Attribute selector](https://www.w3.org/TR/selectors/#attribute-selectors)
| `E[attr="value"]` | [Attribute selector](https://www.w3.org/TR/selectors/#attribute-selectors)
| `E[attr="value"] i` | [Case-sensitivity](https://www.w3.org/TR/selectors/#attribute-case)
| `E[attr="value"] s` | [Case-sensitivity](https://www.w3.org/TR/selectors/#attribute-case)
| `E[attr~="value"]` | [Attribute selector](https://www.w3.org/TR/selectors/#attribute-selectors)
| `E[attr^="value"]` | [Substring matching attribute selector](https://www.w3.org/TR/selectors/#attribute-substrings)
| `E[attr$="value"]` | [Substring matching attribute selector](https://www.w3.org/TR/selectors/#attribute-substrings)
| `E[attr*="value"]` | [Substring matching attribute selector](https://www.w3.org/TR/selectors/#attribute-substrings)
| `E[attr|="value"]` | [Attribute selector](https://www.w3.org/TR/selectors/#attribute-selectors)
| `E F` | [Descendant combinator](https://www.w3.org/TR/selectors/#descendant-combinators)
| `E > F` | [Child combinator](https://www.w3.org/TR/selectors/#child-combinators)
| `E + F` | [Next-sibling combinator](https://www.w3.org/TR/selectors/#adjacent-sibling-combinators)
| `E ~ F` | [Subsequent-sibling combinator](https://www.w3.org/TR/selectors/#general-sibling-combinators)
| `E, F` | [Selector list](https://www.w3.org/TR/selectors/#selector-list)

View file

@ -0,0 +1,11 @@
# find_ancestor_tag
```php
find_ancestor_tag ( string $tag ) : object
```
| Parameter | Description
| --------- | -----------
| `tag` | Tag name of the element to find.
Returns the first ancestor node that matches the specified tag name, or null if no match was found.

View file

@ -0,0 +1,7 @@
# firstChild
```php
firstChild ( ) : mixed
```
This function is a wrapper for [`first_child`](../first_child/).

View file

@ -0,0 +1,7 @@
# first_child
```php
first_child ( ) : mixed
```
Returns the first child node of the current node or null if the current node has no child nodes.

View file

@ -0,0 +1,7 @@
# getAllAttributes
```php
getAllAttributes ( ) : array
```
Returns all attributes for the current node.

View file

@ -0,0 +1,11 @@
# getAttribute
```php
getAttribute ( string $name ) : mixed
```
| Parameter | Description
| --------- | -----------
| `name` | Attribute name.
Returns the value for the attribute `$name`.

View file

@ -0,0 +1,11 @@
# getElementById
```php
getElementById ( string $id ) : object
```
| Parameter | Description
| --------- | -----------
| `id` | Element id.
Returns the first element with the specified id.

View file

@ -0,0 +1,11 @@
# getElementByTagName
```php
getElementByTagName ( string $name ) : object
```
| Parameter | Description
| --------- | -----------
| `name` | Tag name.
Returns the first element with the specified tag name.

View file

@ -0,0 +1,12 @@
# getElementsById
```php
getElementsById ( string $id [, int $idx = null] ) : mixed
```
| Parameter | Description
| --------- | -----------
| `id` | Element id.
| `idx` | Index of element to return.
Returns all elements with the specified id if `$idx` is null, or a specific one if `$idx` is a valid index.

View file

@ -0,0 +1,12 @@
# getElementsByTagName
```php
getElementsByTagName ( string $name [, int $idx = null ] ) : mixed
```
| Parameter | Description
| --------- | -----------
| `name` | Tag name.
| `idx` | Index of the element to return.
Returns all elements with the specified tag name if `$idx` is null, or a specific one if `$idx` is a valid index.

View file

@ -0,0 +1,9 @@
# get_display_size
```php
get_display_size ( ) : mixed
```
Returns false if the current node is not an image.
Otherwise, returns an associative array with two elements, `height` and `width`, that represent the display size of the image.

View file

@ -0,0 +1,11 @@
# hasAttribute
```php
hasAttribute ( string $name ) : bool
```
| Parameter | Description
| --------- | -----------
| `name` | Name of the attribute.
Returns true if the current node has an attribute with the specified name.

View file

@ -0,0 +1,7 @@
# hasChildNodes
```php
hasChildNodes ( ) : bool
```
This is a wrapper function for [`has_child`](../has_child/).

View file

@ -0,0 +1,17 @@
# hasClass
```php
hasClass ( string $class ) : bool
```
| Parameter | Description
| --------- | -----------
| `class` | Specifies the class name to search for.
Returns true if the current node has the specified class name.
**Examples**
```php
$node->hasClass('article');
```

View file

@ -0,0 +1,7 @@
# has_child
```php
has_child ( ) : bool
```
Returns true if the current node has one or more child nodes.

View file

@ -0,0 +1,7 @@
# innertext
```php
innertext ( ) : string
```
Returns the inner text (everything inside the opening and closing tags) of the current node.

View file

@ -0,0 +1,11 @@
# is_utf8 (static)
```php
is_utf8 ( string $str ) : bool
```
| Parameter | Description
| --------- | -----------
| `str` | String to test.
Returns true if the provided string is a valid UTF-8 string.
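One common way to perform such a check in plain PHP is shown below. It is a sketch of the idea only and not necessarily how the library implements it; `is_utf8_sketch` is a hypothetical name.

```php
<?php
// Hypothetical sketch: tests whether a string is valid UTF-8 without
// requiring the mbstring extension.
function is_utf8_sketch(string $str): bool
{
    // With the `u` modifier, preg_match() returns false (instead of 0 or 1)
    // if the subject is not valid UTF-8. The empty pattern matches any
    // valid subject, so a valid string yields 1.
    return preg_match('//u', $str) === 1;
}

$valid   = is_utf8_sketch("caf\xC3\xA9"); // "é" encoded as UTF-8
$invalid = is_utf8_sketch("caf\xE9");     // "é" encoded as ISO-8859-1
```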

View file

@ -0,0 +1,7 @@
# lastChild
```php
lastChild ( ) : object
```
This is a wrapper for [`last_child`](../last_child/).

View file

@ -0,0 +1,7 @@
# last_child
```php
last_child ( ) : object
```
Returns the last child of the current node or null if the current node has no child elements.

View file

@ -0,0 +1,7 @@
# makeup
```php
makeup ( ) : string
```
Returns the HTML representation of the current node.

View file

@ -0,0 +1,19 @@
# match (protected)
```php
match (
string $exp
, string $pattern
, string $value
, string $case_sensitivity
) : bool
```
| Parameter | Description
| --------- | -----------
| `exp` | Expression
| `pattern` | Pattern
| `value` | Value
| `case_sensitivity` | Case sensitivity
Matches a single attribute value against the specified attribute selector. See also [`find`](../find/).
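The following sketch shows how the attribute selector operators listed under `find` could be evaluated. It assumes that `exp` is the operator (such as `^=`), `pattern` is the value from the selector, `value` is the attribute value of the node, and `case_sensitivity` is `'i'` or `'s'`. The function `match_sketch` is hypothetical and the library's actual implementation may differ.

```php
<?php
// Hypothetical sketch: evaluates one attribute selector operator.
function match_sketch(string $exp, string $pattern, string $value, string $case_sensitivity = 's'): bool
{
    // 'i' requests a case-insensitive comparison: lowercase both sides.
    if ($case_sensitivity === 'i') {
        $pattern = strtolower($pattern);
        $value = strtolower($value);
    }

    // Substring operators with an empty pattern match nothing.
    if ($pattern === '' && in_array($exp, array('^=', '$=', '*='), true)) {
        return false;
    }

    switch ($exp) {
        case '=':  // value equals pattern
            return $value === $pattern;
        case '^=': // value starts with pattern
            return strncmp($value, $pattern, strlen($pattern)) === 0;
        case '$=': // value ends with pattern
            return substr($value, -strlen($pattern)) === $pattern;
        case '*=': // value contains pattern
            return strpos($value, $pattern) !== false;
        case '~=': // pattern is one of the whitespace-separated words in value
            return in_array($pattern, preg_split('/\s+/', trim($value)), true);
        case '|=': // value equals pattern or starts with "pattern-"
            return $value === $pattern || strpos($value, $pattern . '-') === 0;
    }

    // Unknown operators never match in this sketch.
    return false;
}

$startsWith = match_sketch('^=', 'http', 'https://example.org');
$endsWithCs = match_sketch('$=', '.png', 'logo.PNG');      // case-sensitive
$endsWithCi = match_sketch('$=', '.png', 'logo.PNG', 'i'); // case-insensitive
$hasWord    = match_sketch('~=', 'new', 'article new');
$langPrefix = match_sketch('|=', 'en', 'en-US');
```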

View file

@ -0,0 +1,7 @@
# nextSibling
```php
nextSibling ( ) : object
```
This is a wrapper for [`next_sibling`](../next_sibling/).

View file

@ -0,0 +1,7 @@
# next_sibling
```php
next_sibling ( ) : object
```
Returns the next sibling of the current node or null if the current node has no next sibling.

View file

@ -0,0 +1,7 @@
# nodeName
```php
nodeName ( ) : string
```
Returns the name of the current node (tag name).

View file

@ -0,0 +1,7 @@
# outertext
```php
outertext ( ) : string
```
Returns the outer text (everything including the opening and closing tags) of the current node.

View file

@ -0,0 +1,12 @@
# parent
```php
parent ( [ object $parent = null ] ) : object
```
| Parameter | Description
| --------- | -----------
| `parent` | The parent node
* Returns the parent node of the current node if `$parent` is null.
* Sets the parent node of the current node if `$parent` is not null. In this case the current node is automatically added to the list of nodes in the parent node.

View file

@ -0,0 +1,7 @@
# parentNode
```php
parentNode () : object
```
Returns the parent of the current node.

View file

@ -0,0 +1,11 @@
# parse_selector (protected)
```php
parse_selector ( string $selector_string ) : array
```
| Parameter | Description
| --------- | -----------
| `selector_string` | The selector string
Parses a CSS selector into an internal format for further use. See also [`find`](../find/).

View file

@ -0,0 +1,7 @@
# prevSibling
```php
prevSibling ( ) : object
```
This is a wrapper for [`prev_sibling`](../prev_sibling/).

View file

@ -0,0 +1,7 @@
# prev_sibling
```php
prev_sibling ( ) : object
```
Returns the previous sibling of the current node, or null if the current node has no previous sibling.

Some files were not shown because too many files have changed in this diff.