init
This commit is contained in:
commit
72a26edcff
22092 changed files with 2101903 additions and 0 deletions
265
lib/sd/CHANGELOG.md
Normal file
|
|
@ -0,0 +1,265 @@
|
|||
# Changelog
|
||||
All notable changes to this project will be documented in this file.
|
||||
|
||||
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
|
||||
|
||||
## [1.9.1] - 2019-10-20
|
||||
### Fixed
|
||||
- Fixed broken "text" selectors [#175](https://sourceforge.net/p/simplehtmldom/bugs/175/)
|
||||
|
||||
## [1.9] - 2019-05-30
|
||||
### Added
|
||||
- Added unit test for bug reports
|
||||
- Added test for bug [#153](https://sourceforge.net/p/simplehtmldom/bugs/153/)
|
||||
- Added test for bug [#163](https://sourceforge.net/p/simplehtmldom/bugs/163/)
|
||||
- Added test for bug [#166](https://sourceforge.net/p/simplehtmldom/bugs/166/)
|
||||
- Added test for bug [#169](https://sourceforge.net/p/simplehtmldom/bugs/169/)
|
||||
- Added unit test for character sets UTF-8, CP1251 and CP1252 (#142)
|
||||
- Added support for meta charset to parse_charset
|
||||
- Added detection for CP1251 to parse_charset, using iconv
|
||||
- Added LICENSE file (MIT) to the project root
|
||||
- Added functions to `simple_html_dom_node`
|
||||
- `remove`: Removes the current node recursively from the DOM tree
|
||||
- `removeChild`: Removes a child node recursively from the DOM tree
|
||||
- `hasClass`: Checks if the current node has the specified class name
|
||||
- `addClass`: Adds one or more classes to the current node
|
||||
- `removeClass`: Removes one or more classes from the current node
|
||||
- `save`: Saves the current node to disk
|
||||
### Changed
|
||||
- Changed manual from custom implementation to MkDocs (https://www.mkdocs.org/)
|
||||
### Fixed
|
||||
- Fixed warning when trying to clear() the DOM on a null nodes list (#153)
|
||||
- Fixed missing whitespace when returning plaintext (#163)
|
||||
- Fixed broken detection of duplicate attributes (#166)
|
||||
- Fixed broken detection of CP1252 (ISO-8859-1) documents (#142)
|
||||
- Fixed error using next-sibling combinator ('E + F') on last child
|
||||
- Fixed selector parsing for attribute selectors ending on "s" or "i" (#169)
|
||||
|
||||
## [1.8.1] - 2019-01-13
|
||||
### Fixed
|
||||
- Fixed various bugs related to parsing classes and ids
|
||||
|
||||
## [1.8] - 2019-01-13
|
||||
### Added
|
||||
- Added documentation for `simple_html_dom_node::find`
|
||||
- Added documentation for `simple_html_dom_node::parse_selector`
|
||||
- Added documentation for `simple_html_dom_node::seek`
|
||||
- Added documentation for `simple_html_dom_node::match`
|
||||
- Added unit tests for bug reports
|
||||
- Added test for bug [#62](https://sourceforge.net/p/simplehtmldom/bugs/62/)
|
||||
- Added test for bug [#79](https://sourceforge.net/p/simplehtmldom/bugs/79/)
|
||||
- Added test for bug [#144](https://sourceforge.net/p/simplehtmldom/bugs/144/)
|
||||
- Added unit tests for CSS selectors
|
||||
- Added ability to define constants before simple_html_dom does
|
||||
- 'DEFAULT_TARGET_CHARSET'
|
||||
- 'DEFAULT_BR_TEXT'
|
||||
- 'DEFAULT_SPAN_TEXT'
|
||||
- 'MAX_FILE_SIZE'
|
||||
- Added support for CSS combinators
|
||||
- Added support for Child Combinator (`>`)
|
||||
- Added support for Next Sibling Combinator (`+`)
|
||||
- Added support for Subsequent Sibling Combinator (`~`)
|
||||
- Added support for multiclass selectors (`.class.class.class`)
|
||||
- Added support for multiattribute selectors (`[attr1][attr2][attribute3]`)
|
||||
- Added support for attribute selectors
|
||||
- Added support for pipe selectors (`|=`)
|
||||
- Added support for tilde selectors (`~=`)
|
||||
- Added support for case sensitivity selectors (`i` and `s`)
|
||||
- Added unit tests for PHP compatibility to PHP 5.6+
|
||||
- Added coding standard using PHP_CodeSniffer
|
||||
### Changed
|
||||
- Removed automatic filtering of 'tbody' selectors (#79)
|
||||
> Remove 'tbody' from all selectors to maintain the previous state!
|
||||
- Coding standard using PHP_CodeSniffer
|
||||
### Fixed
|
||||
- Fixed broken CSS selector attributes with value "0" (#62)
|
||||
- Fixed broken simple_html_dom::load_file
|
||||
- Fixed forward slashes in CSS selector breaks value matching using '*=' (#144)
|
||||
- Fixed Universal Selectors
|
||||
|
||||
## [1.7] - 2018-12-10
|
||||
### Added
|
||||
- Added code documentation to improve readability
|
||||
- Added unit tests for `simple_html_dom::$self_closing_tags`
|
||||
- Added unit tests for `simple_html_dom::$optional_closing_tags`
|
||||
- Added unit tests for bug reports
|
||||
- Added test for bug [#56](https://sourceforge.net/p/simplehtmldom/bugs/56/)
|
||||
- Added test for bug [#97](https://sourceforge.net/p/simplehtmldom/bugs/97/)
|
||||
- Added test for bug [#116](https://sourceforge.net/p/simplehtmldom/bugs/116/)
|
||||
- Added test for bug [#121](https://sourceforge.net/p/simplehtmldom/bugs/121/)
|
||||
- Added test for bug [#127](https://sourceforge.net/p/simplehtmldom/bugs/127/)
|
||||
- Added test for bug [#154](https://sourceforge.net/p/simplehtmldom/bugs/154/)
|
||||
- Added test for bug [#160](https://sourceforge.net/p/simplehtmldom/bugs/160/)
|
||||
- Added unit tests for memory management of the parser
|
||||
- Added bit flags to `simple_html_dom::load()`
|
||||
- Added bit flag `HDOM_SMARTY_AS_TEXT` to optionally filter Smarty scripts (#154)\
|
||||
**Note**: Smarty scripts are no longer filtered by default!\
|
||||
- Added build script to automate releases
|
||||
- Added support for attributes without whitespace to separate them
|
||||
### Changed
|
||||
- Improved documentation and readability for `$self_closing_tags`
|
||||
- Improved documentation and readability for `$block_tags`
|
||||
- Improved documentation and readability for `$optional_closing_tags`
|
||||
- Updated list of `simple_html_dom::$self_closing_tags`
|
||||
- Removed 'spacer' (obsolete)
|
||||
- Added 'area'
|
||||
- Added 'col'
|
||||
- Added 'meta'
|
||||
- Added 'param'
|
||||
- Added 'source'
|
||||
- Added 'track'
|
||||
- Added 'wbr'
|
||||
- Updated list of `simple_html_dom::$optional_closing_tags`
|
||||
- Removed "nobr" (obsolete)
|
||||
- Added 'th' as closable element to 'td'
|
||||
- Added 'td' as closable element to 'th'
|
||||
- Added 'optgroup' with 'optgroup' and 'option' as closable elements
|
||||
- Added 'optgroup' as closable element to 'option'
|
||||
- Added 'rp' with 'rp' and 'rt' as closable elements
|
||||
- Added 'rt' with 'rt' and 'rp' as closable elements
|
||||
- Clarified meaning of `simple_html_dom->parent`
|
||||
- Changed default `$offset` for `file_get_html()` from -1 to 0 (#161)
|
||||
- Changed `simple_html_dom::load()` to remove script tags before replacing newline characters
|
||||
- `simple_html_dom_node::text()` no longer adds whitespace to top level span elements (only to sub-elements)
|
||||
- `simple_html_dom_node::text()` adds blank lines between paragraphs
|
||||
- Normalized line endings in the repository to LF via `.gitattributes`
|
||||
- Improved performance of `simple_html_dom::parse_charset()` by approximately 25%
|
||||
- Improved performance of `simple_html_dom::parse()` by approximately 10%
|
||||
### Deprecated
|
||||
- `str_get_html()` is deprecated and should be replaced by `new simple_html_dom()`
|
||||
### Removed
|
||||
- Removed protected function `simple_html_dom::copy_until_char_escaped()`
|
||||
### Fixed
|
||||
- Fixed compatibility issues with PHP 7.3
|
||||
- Fixed typo (#147)
|
||||
- Fixed handling of incorrectly escaped text (#160)
|
||||
- Restore functionality of `$maxLen` in `file_get_html()`
|
||||
- Fixed load_file breaking if an error occurred in another script
|
||||
|
||||
## [1.6] - 2014-05-28
|
||||
### Added
|
||||
- Added some ability to insert and create nodes
|
||||
- Add ability to search the "noise" array
|
||||
|
||||
## [1.5] - 2012-09-10
|
||||
### Added
|
||||
- Added flag: LOCK_EX while calling "file_put_contents()"
|
||||
- Added support for detecting the source html character set. This is used to convert characters when plaintext is requested.
|
||||
- Other little fixes and features, too numerous to categorize
|
||||
### Changed
|
||||
- Error of "file_get_contents()" will be thrown as an exception
|
||||
### Fixed
|
||||
- Fixed the typo of "token_blank_t"
|
||||
- Memory leak fixed
|
||||
|
||||
## [1.11] - 2008-12-14
|
||||
### Added
|
||||
- Supports xpath generated from Firebug
|
||||
- New method "dump" of "simple_html_dom_node"
|
||||
- New attribute "xmltext" of "simple_html_dom_node"
|
||||
### Changed
|
||||
- Remove preg_quote on selector match function: `[attribute*=value]`
|
||||
- "Comment" elements are now treated as children
|
||||
### Fixed
|
||||
- Fixed the problem with `<pre>`
|
||||
- Fixed bug #2207477 (does not load some pages properly)
|
||||
- Fixed bug #2315853 (Error with character after < sign)
|
||||
|
||||
## [1.10] - 2008-10-25
|
||||
### Changed
|
||||
- Added support for negative indexes in the "find" method, thanks to Vadim Voituk
|
||||
- Constructor now automatically loads contents from either text or file/url, thanks to Antcs
|
||||
- Fully supports wildcard in selectors
|
||||
### Fixed
|
||||
- Fixed bug of confusing by the < symbol inside the text
|
||||
- Fixed bug of dash in selectors
|
||||
- Fixed bug of `<nobr>`
|
||||
- Fixed bug #2155883 (Nested List Parses Incorrectly)
|
||||
- Fixed bug #2155113 (error with unclosed html tags)
|
||||
|
||||
## [1.00] - 2008-09-05
|
||||
### Added
|
||||
- New method "getAllAttributes" of "simple_html_dom_node"
|
||||
- Supports full javascript string in selector: `$e->find("a[onclick=alert('hello')]")`
|
||||
### Changed
|
||||
- Changed selector "*=" to case-insensitive
|
||||
### Fixed
|
||||
- Fixed the bug of selector in some critical conditions
|
||||
- Fixed the bug of stripping php tags
|
||||
- Fixed the bug of remove_noise()
|
||||
- Fixed the bug of noise in attributes
|
||||
|
||||
## [0.99] - 2008-08-03
|
||||
### Changed
|
||||
- Performance tuning (boost 10%)
|
||||
- Memory requirement reduced by 25%
|
||||
- Changed function name from "file_get_dom()" to "file_get_html()"
|
||||
- Changed function name from "str_get_dom()" to "str_get_html()"
|
||||
### Fixed
|
||||
- Fixed bug #2011286 (Error with unclosed html tags)
|
||||
- Fixed bug #2012551 (Error parsing divs)
|
||||
- Fixed bug #2020924 (Error for missed tag)
|
||||
- Fixed bug (problem with `<body>` tag's innertext)
|
||||
|
||||
## [0.98] - 2008-06-24
|
||||
### Added
|
||||
- Supports "multiple class" selector feature: `<div class="a b c"></div>`
|
||||
- New "callback function" feature
|
||||
- New "multiple selectors" feature: $dom->find('p,a,b')
|
||||
- New examples
|
||||
- Supports extract contents from HTML features: $dom->plaintext
|
||||
### Changed
|
||||
- Performance tuning (boost 20%)
|
||||
- Changed simple_html_dom_node method name from "text()" to "makeup()"
|
||||
### Fixed
|
||||
- Fixed the bug of $dom->clear()
|
||||
- Fixed the bug of text nodes' innertext
|
||||
- Fixed the bug of comment nodes' innertext
|
||||
- Fixed the bug of descendant selector with optional tags
|
||||
|
||||
## [0.97] - 2008-05-09
|
||||
### Added
|
||||
- New node type "comment" (eg. $dom->find('comment'))
|
||||
- Add self-closing tags: 'base', 'spacer'
|
||||
- New example "simple_html_dom_utility.php"
|
||||
### Changed
|
||||
- File and class name changed (html_dom_parser->simple_html_dom)
|
||||
### Removed
|
||||
- `$dom->save_file` is no longer supported
|
||||
- Remove example "example_customize_parser.php"
|
||||
### Fixed
|
||||
- Fixed the bug of outertext (th)
|
||||
- Fixed the bug of regular expression escaping chars ($dom->find)
|
||||
- Fixed the bug while line-breaker and "\t" in tags
|
||||
|
||||
## [0.96] - 2008-04-27
|
||||
### Added
|
||||
- Reference section in manual
|
||||
- Added traverse section in manual
|
||||
- Added the solution while server behind proxy in FAQ (Thanks to Yousuke Shaggy)
|
||||
- New method to remove attribute.
|
||||
- New DOM operations(first_child, last_child, next_sibling, previous_sibling) (Request #1936000)
|
||||
### Changed
|
||||
- Now file_get_dom supports full file_get_contents parameters
|
||||
### Fixed
|
||||
- Fixed the bug of self-closing tags in the end of file
|
||||
- Fixed the bug of blanks in the end of tag
|
||||
- Fixed some typo of testcase
|
||||
|
||||
## [0.95] - 2008-04-13
|
||||
### Added
|
||||
- Supports tag name with namespace
|
||||
### Changed
|
||||
- New attribute filters (Thanks to Yousuke Kumakura)
|
||||
- Refine structure of testcase
|
||||
### Fixed
|
||||
- Fix the bug of optional-closing tags
|
||||
- Fix the bug of parsing the line break next to the tag's name
|
||||
|
||||
## [0.94] - 2008-04-06
|
||||
### Added
|
||||
- Add FAQ section in manual
|
||||
### Fixed
|
||||
- Fixed infinite loop when the source content is bad HTML
|
||||
- Fixed the bug of adding new attributes to self closing tags
|
||||
- Fixed the bug of customize parser without $dom->remove_noise()
|
||||
21
lib/sd/LICENSE
Normal file
|
|
@ -0,0 +1,21 @@
|
|||
MIT License
|
||||
|
||||
Copyright (c) 2019 S.C. Chen, John Schlick, logmanoriginal
|
||||
|
||||
Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
of this software and associated documentation files (the "Software"), to deal
|
||||
in the Software without restriction, including without limitation the rights
|
||||
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||
copies of the Software, and to permit persons to whom the Software is
|
||||
furnished to do so, subject to the following conditions:
|
||||
|
||||
The above copyright notice and this permission notice shall be included in all
|
||||
copies or substantial portions of the Software.
|
||||
|
||||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
||||
SOFTWARE.
|
||||
54
lib/sd/example/example_advanced_selector.php
Normal file
|
|
@ -0,0 +1,54 @@
|
|||
<?php
|
||||
// example of how to use advanced selector features
|
||||
include('../simple_html_dom.php');
|
||||
|
||||
// -----------------------------------------------------------------------------
|
||||
// descendant selector
|
||||
$str = <<<HTML
|
||||
<div>
|
||||
<div>
|
||||
<div class="foo bar">ok</div>
|
||||
</div>
|
||||
</div>
|
||||
HTML;
|
||||
|
||||
$html = str_get_html($str);
|
||||
echo $html->find('div div div', 0)->innertext . '<br>'; // result: "ok"
|
||||
|
||||
// -----------------------------------------------------------------------------
|
||||
// nested selector
|
||||
$str = <<<HTML
|
||||
<ul id="ul1">
|
||||
<li>item:<span>1</span></li>
|
||||
<li>item:<span>2</span></li>
|
||||
</ul>
|
||||
<ul id="ul2">
|
||||
<li>item:<span>3</span></li>
|
||||
<li>item:<span>4</span></li>
|
||||
</ul>
|
||||
HTML;
|
||||
|
||||
$html = str_get_html($str);
|
||||
foreach($html->find('ul') as $ul) {
|
||||
foreach($ul->find('li') as $li)
|
||||
echo $li->innertext . '<br>';
|
||||
}
|
||||
|
||||
// -----------------------------------------------------------------------------
|
||||
// parsing checkbox
|
||||
$str = <<<HTML
|
||||
<form name="form1" method="post" action="">
|
||||
<input type="checkbox" name="checkbox1" value="checkbox1" checked>item1<br>
|
||||
<input type="checkbox" name="checkbox2" value="checkbox2">item2<br>
|
||||
<input type="checkbox" name="checkbox3" value="checkbox3" checked>item3<br>
|
||||
</form>
|
||||
HTML;
|
||||
|
||||
$html = str_get_html($str);
|
||||
foreach($html->find('input[type=checkbox]') as $checkbox) {
|
||||
if ($checkbox->checked)
|
||||
echo $checkbox->name . ' is checked<br>';
|
||||
else
|
||||
echo $checkbox->name . ' is not checked<br>';
|
||||
}
|
||||
?>
|
||||
37
lib/sd/example/example_basic_selector.php
Normal file
|
|
@ -0,0 +1,37 @@
|
|||
<?php
|
||||
// example of how to use basic selector to retrieve HTML contents
|
||||
include('../simple_html_dom.php');
|
||||
|
||||
// get DOM from URL or file
|
||||
$html = file_get_html('http://www.google.com/');
|
||||
|
||||
// find all link
|
||||
foreach($html->find('a') as $e)
|
||||
echo $e->href . '<br>';
|
||||
|
||||
// find all image
|
||||
foreach($html->find('img') as $e)
|
||||
echo $e->src . '<br>';
|
||||
|
||||
// find all image with full tag
|
||||
foreach($html->find('img') as $e)
|
||||
echo $e->outertext . '<br>';
|
||||
|
||||
// find all div tags with id=gbar
|
||||
foreach($html->find('div#gbar') as $e)
|
||||
echo $e->innertext . '<br>';
|
||||
|
||||
// find all span tags with class=gb1
|
||||
foreach($html->find('span.gb1') as $e)
|
||||
echo $e->outertext . '<br>';
|
||||
|
||||
// find all td tags with attribute align=center
|
||||
foreach($html->find('td[align=center]') as $e)
|
||||
echo $e->innertext . '<br>';
|
||||
|
||||
// extract text from table
|
||||
echo $html->find('td[align="center"]', 1)->plaintext.'<br><hr>';
|
||||
|
||||
// extract text from HTML
|
||||
echo $html->plaintext;
|
||||
?>
|
||||
28
lib/sd/example/example_callback.php
Normal file
|
|
@ -0,0 +1,28 @@
|
|||
<?php
|
||||
include_once('../simple_html_dom.php');
|
||||
|
||||
|
||||
// 1. Write a function with parameter "$element"
|
||||
function my_callback($element) {
|
||||
if ($element->tag=='input')
|
||||
$element->outertext = 'input';
|
||||
|
||||
if ($element->tag=='img')
|
||||
$element->outertext = 'img';
|
||||
|
||||
if ($element->tag=='a')
|
||||
$element->outertext = 'a';
|
||||
}
|
||||
|
||||
|
||||
// 2. create HTML Dom
|
||||
$html = file_get_html('http://www.google.com/');
|
||||
|
||||
|
||||
// 3. Register the callback function with its function name
|
||||
$html->set_callback('my_callback');
|
||||
|
||||
|
||||
// 4. Callback function will be invoked while dumping
|
||||
echo $html;
|
||||
?>
|
||||
5
lib/sd/example/example_extract_html.php
Normal file
|
|
@ -0,0 +1,5 @@
|
|||
<?php
|
||||
include_once('../simple_html_dom.php');
|
||||
|
||||
echo file_get_html('http://www.google.com/')->plaintext;
|
||||
?>
|
||||
18
lib/sd/example/example_modify_contents.php
Normal file
|
|
@ -0,0 +1,18 @@
|
|||
<?php
|
||||
// example of how to modify HTML contents
|
||||
include('../simple_html_dom.php');
|
||||
|
||||
// get DOM from URL or file
|
||||
$html = file_get_html('http://www.google.com/');
|
||||
|
||||
// remove all image
|
||||
foreach($html->find('img') as $e)
|
||||
$e->outertext = '';
|
||||
|
||||
// replace all input
|
||||
foreach($html->find('input') as $e)
|
||||
$e->outertext = '[INPUT]';
|
||||
|
||||
// dump contents
|
||||
echo $html;
|
||||
?>
|
||||
44
lib/sd/example/scraping/example_scraping_digg.php
Normal file
|
|
@ -0,0 +1,44 @@
|
|||
<?php
|
||||
include_once('../../simple_html_dom.php');
|
||||
|
||||
function scraping_digg() {
|
||||
// create HTML DOM
|
||||
$html = file_get_html('http://digg.com/');
|
||||
|
||||
// get news block
|
||||
foreach($html->find('div.news-summary') as $article) {
|
||||
// get title
|
||||
$item['title'] = trim($article->find('h3', 0)->plaintext);
|
||||
// get details
|
||||
$item['details'] = trim($article->find('p', 0)->plaintext);
|
||||
// get digg count
|
||||
$item['diggs'] = trim($article->find('li a strong', 0)->plaintext);
|
||||
|
||||
$ret[] = $item;
|
||||
}
|
||||
|
||||
// clean up memory
|
||||
$html->clear();
|
||||
unset($html);
|
||||
|
||||
return $ret;
|
||||
}
|
||||
|
||||
|
||||
// -----------------------------------------------------------------------------
|
||||
// test it!
|
||||
|
||||
// "http://digg.com" will check user_agent header...
|
||||
ini_set('user_agent', 'My-Application/2.5');
|
||||
|
||||
$ret = scraping_digg();
|
||||
|
||||
foreach($ret as $v) {
|
||||
echo $v['title'].'<br>';
|
||||
echo '<ul>';
|
||||
echo '<li>'.$v['details'].'</li>';
|
||||
echo '<li>Diggs: '.$v['diggs'].'</li>';
|
||||
echo '</ul>';
|
||||
}
|
||||
|
||||
?>
|
||||
59
lib/sd/example/scraping/example_scraping_general.php
Normal file
|
|
@ -0,0 +1,59 @@
|
|||
<?php
|
||||
include_once('../../simple_html_dom.php');
|
||||
|
||||
function scraping_generic($url, $search) {
|
||||
// Didn't find it yet.
|
||||
$return = false;
|
||||
|
||||
echo "reading the url: " . $url . "<br/>";
|
||||
// create HTML DOM
|
||||
$html = file_get_html($url);
|
||||
echo "url has been read.<br/>";
|
||||
|
||||
// get article block
|
||||
foreach($html->find($search) as $found) {
|
||||
// Found at least one.
|
||||
$return = true;
|
||||
echo "found a: " . $search . "<br/><pre>";
|
||||
$found->dump();
|
||||
echo "</pre><br/>";
|
||||
}
|
||||
|
||||
// clean up memory
|
||||
$html->clear();
|
||||
unset($html);
|
||||
|
||||
return $return;
|
||||
}
|
||||
|
||||
|
||||
// ------------------------------------------
|
||||
error_log ("post:" . print_r($_POST, true));
|
||||
$url = "";
|
||||
if (isset($_POST['url']))
|
||||
{
|
||||
$url = $_POST['url'];
|
||||
}
|
||||
$search = "";
|
||||
if (isset($_POST['search']))
|
||||
{
|
||||
$search = $_POST['search'];
|
||||
}
|
||||
?>
|
||||
<form method="post">
|
||||
URL: <input name="url" type="text" value="<?=htmlspecialchars($url);?>"/><br/>
|
||||
Search: <input name="search" type="text" value="<?=htmlspecialchars($search);?>"/>
|
||||
<input name="submit" type="submit" value="Submit"/>
|
||||
</form>
|
||||
<?php
|
||||
// -----------------------------------------------------------------------------
|
||||
// test it!
|
||||
if (isset ($_POST['submit']))
|
||||
{
|
||||
$response = scraping_generic($_POST['url'], $_POST['search']);
|
||||
if (!$response)
|
||||
{
|
||||
echo "Did not find any: " . $_POST['search'] . "<br />";
|
||||
}
|
||||
}
|
||||
?>
|
||||
51
lib/sd/example/scraping/example_scraping_imdb.php
Normal file
|
|
@ -0,0 +1,51 @@
|
|||
<?php
|
||||
include_once('../../simple_html_dom.php');
|
||||
|
||||
function scraping_IMDB($url) {
|
||||
// create HTML DOM
|
||||
$html = file_get_html($url);
|
||||
|
||||
// get title
|
||||
$ret['Title'] = $html->find('title', 0)->innertext;
|
||||
|
||||
// get rating
|
||||
$ret['Rating'] = $html->find('div[class="general rating"] b', 0)->innertext;
|
||||
|
||||
// get overview
|
||||
foreach($html->find('div[class="info"]') as $div) {
|
||||
// skip user comments
|
||||
if($div->find('h5', 0)->innertext=='User Comments:')
|
||||
return $ret;
|
||||
|
||||
$key = '';
|
||||
$val = '';
|
||||
|
||||
foreach($div->find('*') as $node) {
|
||||
if ($node->tag=='h5')
|
||||
$key = $node->plaintext;
|
||||
|
||||
if ($node->tag=='a' && $node->plaintext!='more')
|
||||
$val .= trim(str_replace("\n", '', $node->plaintext));
|
||||
|
||||
if ($node->tag=='text')
|
||||
$val .= trim(str_replace("\n", '', $node->plaintext));
|
||||
}
|
||||
|
||||
$ret[$key] = $val;
|
||||
}
|
||||
|
||||
// clean up memory
|
||||
$html->clear();
|
||||
unset($html);
|
||||
|
||||
return $ret;
|
||||
}
|
||||
|
||||
|
||||
// -----------------------------------------------------------------------------
|
||||
// test it!
|
||||
$ret = scraping_IMDB('http://imdb.com/title/tt0335266/');
|
||||
|
||||
foreach($ret as $k=>$v)
|
||||
echo '<strong>'.$k.' </strong>'.$v.'<br>';
|
||||
?>
|
||||
35
lib/sd/example/scraping/example_scraping_slashdot.php
Normal file
|
|
@ -0,0 +1,35 @@
|
|||
<?php
|
||||
include_once('../../simple_html_dom.php');
|
||||
|
||||
function scraping_slashdot() {
|
||||
// create HTML DOM
|
||||
$html = file_get_html('http://slashdot.org/');
|
||||
|
||||
// get article block
|
||||
foreach($html->find('div[id^=firehose-]') as $article) {
|
||||
// get title
|
||||
$item['title'] = trim($article->find('a.datitle', 0)->plaintext);
|
||||
// get body
|
||||
$item['body'] = trim($article->find('div.body', 0)->plaintext);
|
||||
|
||||
$ret[] = $item;
|
||||
}
|
||||
|
||||
// clean up memory
|
||||
$html->clear();
|
||||
unset($html);
|
||||
|
||||
return $ret;
|
||||
}
|
||||
|
||||
// -----------------------------------------------------------------------------
|
||||
// test it!
|
||||
$ret = scraping_slashdot();
|
||||
|
||||
foreach($ret as $v) {
|
||||
echo $v['title'].'<br>';
|
||||
echo '<ul>';
|
||||
echo '<li>'.$v['body'].'</li>';
|
||||
echo '</ul>';
|
||||
}
|
||||
?>
|
||||
35
lib/sd/example/simple_html_dom_utility.php
Normal file
|
|
@ -0,0 +1,35 @@
|
|||
<?php
|
||||
include_once('../simple_html_dom.php');
|
||||
|
||||
// -----------------------------------------------------------------------------
|
||||
// remove HTML comments
|
||||
function html_no_comment($url) {
|
||||
// create HTML DOM
|
||||
$html = file_get_html($url);
|
||||
|
||||
// remove all comment elements
|
||||
foreach($html->find('comment') as $e)
|
||||
$e->outertext = '';
|
||||
|
||||
$ret = $html->save();
|
||||
|
||||
// clean up memory
|
||||
$html->clear();
|
||||
unset($html);
|
||||
|
||||
return $ret;
|
||||
}
|
||||
|
||||
// -----------------------------------------------------------------------------
|
||||
// search for elements that contain a specific text
|
||||
function find_contains($html, $selector, $keyword, $index=-1) {
|
||||
$ret = array();
|
||||
foreach ($html->find($selector) as $e) {
|
||||
if (strpos($e->innertext, $keyword)!==false)
|
||||
$ret[] = $e;
|
||||
}
|
||||
|
||||
if ($index<0) return $ret;
|
||||
return (isset($ret[$index])) ? $ret[$index] : null;
|
||||
}
|
||||
?>
|
||||
72
lib/sd/manual/README.md
Normal file
|
|
@ -0,0 +1,72 @@
|
|||
This folder contains the source files for http://simplehtmldom.sourceforge.net/,
|
||||
the project page for PHP Simple HTML DOM Parser.
|
||||
|
||||
Source files are written in Markdown: https://en.wikipedia.org/wiki/Markdown
|
||||
|
||||
Site data is generated by MkDocs, a lightweight static site generator for project
|
||||
documentation: https://www.mkdocs.org/
|
||||
|
||||
# Folder structure
|
||||
|
||||
`custom_theme` : Contains customizations to the theme provided by MkDocs.
|
||||
`docs` : Contains the source files for the project page (the actual pages).
|
||||
`site` : Contains the output files for the project page when built with MkDocs.
|
||||
`extra.css` : Customizations to the styles provided by MkDocs.
|
||||
`mkdocs.yml` : The configuration file that is used by MkDocs to generate pages.
|
||||
|
||||
# Adding new pages
|
||||
|
||||
Place new files in `docs`. Use subfolders (as few levels as possible) to
|
||||
separate categories.
|
||||
|
||||
Files added to the manual will **not** appear on the project page automatically.
|
||||
All pages need to be specified in the _mkdocs.yml_ file under "nav:". Simply add
|
||||
the relative path to the new file where appropriate.
|
||||
|
||||
Note: Files are not added automatically because they are sorted by name if not
|
||||
specified manually. Since readability is a key factor for manuals, the files must
|
||||
be sorted in a way that makes it clear to users.
|
||||
|
||||
# Setting up MkDocs
|
||||
|
||||
The installation instructions for MkDocs are provided on their homepage:
|
||||
https://www.mkdocs.org/#installation
|
||||
|
||||
MkDocs automatically builds the project based on the _mkdocs.yml_ file. Find the
|
||||
specification for this file at https://www.mkdocs.org/user-guide/configuration/.
|
||||
|
||||
# Building project pages
|
||||
|
||||
The build process depends on your installation of MkDocs. Typically MkDocs is
|
||||
made available via the command line.
|
||||
|
||||
## Step 1 - Check your version of MkDocs
|
||||
|
||||
To check your version of MkDocs run this command:
|
||||
|
||||
`mkdocs --version` or
|
||||
`python3 -m mkdocs --version`
|
||||
|
||||
This should return `version 1.0.4` or higher. If it doesn't, make sure to install the
|
||||
latest version using `pip install mkdocs` or `python3 -m pip install mkdocs`. If
|
||||
you don't have pip installed, install it via package manager or follow the
|
||||
instructions at https://pip.pypa.io/en/stable/installing/
|
||||
|
||||
## Step 2 - View the project locally
|
||||
|
||||
MkDocs allows you to view the project files in a browser on your local machine:
|
||||
|
||||
`mkdocs serve` or
|
||||
`python3 -m mkdocs serve`
|
||||
|
||||
If the process is successful you can access the site at http://127.0.0.1:8000.
|
||||
|
||||
## Step 3 - Build the project
|
||||
|
||||
If you are satisfied with the results of the project, build the final project
|
||||
with this command:
|
||||
|
||||
`mkdocs build` or
|
||||
`python3 -m mkdocs build`
|
||||
|
||||
Find the output files in the `site` folder.
|
||||
7
lib/sd/manual/custom_theme/main.html
Normal file
|
|
@ -0,0 +1,7 @@
|
|||
{% extends "base.html" %}
|
||||
|
||||
{% block footer %}
|
||||
{% include "footer.html" %}
|
||||
<hr>
|
||||
<a class="logo" href="https://sourceforge.net/p/simplehtmldom/"><img alt="Download PHP Simple HTML DOM Parser" src="https://sourceforge.net/sflogo.php?type=16&group_id=218559" ></a>
|
||||
{% endblock %}
|
||||
68
lib/sd/manual/docs/api/api.md
Normal file
|
|
@ -0,0 +1,68 @@
|
|||
---
|
||||
title: API Reference
|
||||
---
|
||||
|
||||
# Parsing documents
|
||||
|
||||
The parser accepts documents in the form of URLs, files and strings. The document
|
||||
must be accessible for reading and cannot exceed [`MAX_FILE_SIZE`](constants.md#max_file_size).
|
||||
|
||||
Name | Description
|
||||
---- | -----------
|
||||
`str_get_html( string $content ) : object` | Creates a DOM object from string.
|
||||
`file_get_html( string $filename ) : object` | Creates a DOM object from file or URL.
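
The two entry points above can be sketched as follows. This is a minimal example, not part of the original manual: the include path and the file name `page.html` are assumptions, and the `false` check reflects how the 1.x releases appear to handle empty or oversized input.

```php
<?php
// Assumption: simple_html_dom.php sits next to this script; adjust the path as needed.
include_once 'simple_html_dom.php';

// Parse a document from a string.
$html = str_get_html('<div id="greeting"><p>Hello</p></div>');

// Empty input or input larger than MAX_FILE_SIZE is rejected, so check before use.
if ($html === false) {
    exit('Could not parse the document');
}

$text = $html->find('#greeting p', 0)->innertext;
echo $text; // Hello

// file_get_html() works the same way but takes a local path or a URL:
// $html = file_get_html('page.html'); // 'page.html' is a hypothetical file
```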
|
||||
|
||||
# DOM methods & properties
|
||||
|
||||
Name | Description
|
||||
---- | -----------
|
||||
`__construct( [string $filename] ) : void` | Constructor. If the filename parameter is set, the contents are loaded automatically, either from text or from a file/URL.
|
||||
`plaintext : string` | Returns the contents extracted from HTML.
|
||||
`clear() : void` | Clean up memory.
|
||||
`load( string $content ) : void` | Load contents from string.
|
||||
`save( [string $filename] ) : string` | Dumps the internal DOM tree back into a string. If `$filename` is set, the resulting string is also saved to that file.
|
||||
`load_file( string $filename ) : void` | Load contents from a file or a URL.
|
||||
`set_callback( string $function_name ) : void` | Set a callback function.
|
||||
`find( string $selector [, int $index] ) : mixed` | Finds elements by CSS selector. Returns the Nth element object if index is set, otherwise returns an array of objects.
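
As a rough illustration of the DOM methods listed above, the sketch below parses a small list, queries it with and without an index, changes one item and dumps the result. The include path is an assumption, and the markup is made up for the example.

```php
<?php
// Assumption: simple_html_dom.php sits next to this script.
include_once 'simple_html_dom.php';

$html = str_get_html('<ul><li>one</li><li>two</li><li>three</li></ul>');

// Without an index, find() returns an array of element objects.
$items = $html->find('li');
$count = count($items);                  // 3

// With an index, find() returns a single element (or null if there is none).
$first = $html->find('li', 0)->innertext;  // one
$last  = $html->find('li', -1)->innertext; // three (negative indexes count from the end)

// Elements can be modified in place; save() dumps the tree back into a string.
$html->find('li', 1)->innertext = 'TWO';
$output = $html->save();
echo $output;

// Release the memory held by the DOM once it is no longer needed.
$html->clear();
unset($html);
```

Calling `clear()` before dropping the object follows the pattern used in the example scripts shipped with the library.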
|
||||
|
||||
# Element methods & properties
|
||||
|
||||
Name | Description
|
||||
---- | -----------
|
||||
`[attribute] : string` | Read or write element's attribute value.
|
||||
`tag : string` | Read or write the tag name of element.
|
||||
`outertext : string` | Read or write the outer HTML text of element.
|
||||
`innertext : string` | Read or write the inner HTML text of element.
|
||||
`plaintext : string` | Read or write the plain text of element.
|
||||
`find( string $selector [, int $index] ) : mixed` | Finds children by the CSS selector. Returns the Nth element object if index is set, otherwise returns an array of objects.
|
||||
|
||||
# DOM traversing
|
||||
|
||||
Name | Description
|
||||
---- | -----------
|
||||
`$e->children( [int $index] ) : mixed` | Returns the Nth child object if index is set, otherwise returns an array of children.
|
||||
`$e->parent() : element` | Returns the parent of element.
|
||||
`$e->first_child() : element` | Returns the first child of element, or null if not found.
|
||||
`$e->last_child() : element` | Returns the last child of element, or null if not found.
|
||||
`$e->next_sibling() : element` | Returns the next sibling of element, or null if not found.
|
||||
`$e->prev_sibling() : element` | Returns the previous sibling of element, or null if not found.
|
||||
|
||||
# Camel naming conventions
|
||||
|
||||
Method | Mapping
|
||||
------ | -------
|
||||
`$e->getAllAttributes()` | `$e->attr`
|
||||
`$e->getAttribute( $name )` | `$e->attribute`
|
||||
`$e->setAttribute( $name, $value)` | `$e->attribute = $value`
|
||||
`$e->hasAttribute( $name )` | `isset($e->attribute)`
|
||||
`$e->removeAttribute ( $name )` | `$e->attribute = null`
|
||||
`$e->getElementById ( $id )` | `$e->find ( "#$id", 0 )`
|
||||
`$e->getElementsById ( $id [,$index] )` | `$e->find ( "#$id" [, int $index] )`
|
||||
`$e->getElementByTagName ($name )` | `$e->find ( $name, 0 )`
|
||||
`$e->getElementsByTagName ( $name [, $index] )` | `$e->find ( $name [, int $index] )`
|
||||
`$e->parentNode ()` | `$e->parent ()`
|
||||
`$e->childNodes ( [$index] )` | `$e->children ( [int $index] )`
|
||||
`$e->firstChild ()` | `$e->first_child ()`
|
||||
`$e->lastChild ()` | `$e->last_child ()`
|
||||
`$e->nextSibling ()` | `$e->next_sibling ()`
|
||||
`$e->previousSibling ()` | `$e->prev_sibling ()`
|
||||
33
lib/sd/manual/docs/api/constants.md
Normal file
|
|
@ -0,0 +1,33 @@
|
|||
---
|
||||
title: Constants
|
||||
---
|
||||
|
||||
# Constants
|
||||
|
||||
Constants define how the parser treats documents. They can be defined before
|
||||
loading the parser to globally replace the default values.
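As a minimal sketch of how such an override could work, assume the parser defines each constant only when it is not already defined. The two guard lines below stand in for the library and are an assumption for illustration, not its verified source; the value `1000000` is arbitrary.

```php
<?php
// Application code: define the constant BEFORE the parser file is loaded.
define('MAX_FILE_SIZE', 1000000);

// Stand-in for the parser's own definitions (assumed guard pattern):
// a constant that already exists keeps the application's value.
defined('MAX_FILE_SIZE') || define('MAX_FILE_SIZE', 600000);
defined('DEFAULT_BR_TEXT') || define('DEFAULT_BR_TEXT', "\r\n");

// MAX_FILE_SIZE keeps the overridden value, DEFAULT_BR_TEXT the default.
var_dump(MAX_FILE_SIZE, DEFAULT_BR_TEXT);
```

Because PHP constants cannot be redefined, the override has to run before the parser is included; defining it afterwards would have no effect under this assumption.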
|
||||
|
||||
## DEFAULT_TARGET_CHARSET
|
||||
|
||||
Defines the default target charset for text returned by the parser.
|
||||
|
||||
Default: `'UTF-8'`
|
||||
|
||||
## DEFAULT_BR_TEXT
|
||||
|
||||
Defines the default text to return for `<br>` elements.
|
||||
|
||||
Default: `"\r\n"`
|
||||
|
||||
## DEFAULT_SPAN_TEXT
|
||||
|
||||
Defines the default text to return for `<span>` elements.
|
||||
|
||||
Default: `' '`
|
||||
|
||||
## MAX_FILE_SIZE
|
||||
|
||||
Defines the maximum number of bytes the parser can load into memory. This limit
|
||||
only applies to the source file or string.
|
||||
|
||||
Default: `600000`
|
||||
100
lib/sd/manual/docs/api/definitions.md
Normal file
|
|
@ -0,0 +1,100 @@
|
|||
---
|
||||
title: Definitions
|
||||
---
|
||||
|
||||
# Definitions
|
||||
|
||||
The definitions below are an essential part of the parser.
|
||||
|
||||
## Node Types
|
||||
|
||||
The type of a node is determined during parsing and represented by one of the elements in the list below.
|
||||
|
||||
| Type | Description
|
||||
| ---- | -----------
|
||||
| `HDOM_TYPE_ELEMENT` | Start tag (e.g. `<html>`)
|
||||
| `HDOM_TYPE_COMMENT` | HTML comment (e.g. `<!-- Hello, World! -->`)
|
||||
| `HDOM_TYPE_TEXT` | Plain text (e.g. `Hello, World!`)
|
||||
| `HDOM_TYPE_ENDTAG` | End tag (e.g. `</html>`)
|
||||
| `HDOM_TYPE_ROOT` | Root element. There can always only be one root element in the DOM.
|
||||
| `HDOM_TYPE_UNKNOWN` | Unknown type (e.g. CDATA, DOCTYPE)
|
||||
|
||||
### Example
|
||||
|
||||
```html
|
||||
<!DOCTYPE html><html><!-- Hello, World! --></html>Hello, World!
|
||||
```
|
||||
|
||||
_Note_: `HDOM_TYPE_ROOT` always exists regardless of the actual document structure.
|
||||
|
||||
| HTML | Node Type
|
||||
| ---- | ---------
|
||||
| | `HDOM_TYPE_ROOT`
|
||||
| `<!DOCTYPE html>` | `HDOM_TYPE_UNKNOWN`
|
||||
| `<html>` | `HDOM_TYPE_ELEMENT`
|
||||
| `<!-- Hello, World! -->` | `HDOM_TYPE_COMMENT`
|
||||
| `</html>` | `HDOM_TYPE_ENDTAG`
|
||||
| `Hello, World!` | `HDOM_TYPE_TEXT`
|
||||
|
||||
## Quote Types
|
||||
|
||||
Identifies the quoting type on attribute values.
|
||||
|
||||
| Type | Description
|
||||
| ---- | -----------
|
||||
| `HDOM_QUOTE_DOUBLE` | Double quotes (`""`)
|
||||
| `HDOM_QUOTE_SINGLE` | Single quotes (`''`)
|
||||
| `HDOM_QUOTE_NO` | Not quoted (flag)
|
||||
|
||||
_Note_: Attributes with no values (flags) are stored as `HDOM_QUOTE_NO`.
|
||||
|
||||
### Example
|
||||
|
||||
```html
|
||||
<p class="paragraph" id='info1' hidden>Hello, World!</p>
|
||||
```
|
||||
|
||||
| Attribute | Description
|
||||
| --------- | -----------
|
||||
| `class="paragraph"` | `HDOM_QUOTE_DOUBLE`
|
||||
| `id='info1'` | `HDOM_QUOTE_SINGLE`
|
||||
| `hidden` | `HDOM_QUOTE_NO`
|
||||
|
||||
## Node Info Types
|
||||
|
||||
Each node stores additional information (metadata) that is identified by the elements below.
|
||||
|
||||
| Type | Description
|
||||
| ---- | -----------
|
||||
| `HDOM_INFO_BEGIN` | Cursor position for the start tag of a node.
|
||||
| `HDOM_INFO_END` | Cursor position for the end tag of a node. A value of zero indicates a node with no end tag (missing closing tag).
|
||||
| `HDOM_INFO_QUOTE` | Quote type for attribute values. The value must be an element of [Quote Type](#quote-types).
|
||||
| `HDOM_INFO_SPACE` | Array of whitespace around attributes (see [Attribute Whitespace](#attribute-whitespace)).
|
||||
| `HDOM_INFO_TEXT` | Non-HTML text in tags (e.g. comments, doctype).
|
||||
| `HDOM_INFO_INNER` | Inner text of a node.
|
||||
| `HDOM_INFO_OUTER` | Outer text of a node.
|
||||
| `HDOM_INFO_ENDSPACE` | Whitespace at the end of a tag before the closing bracket.
|
||||
|
||||
## Attribute Whitespace
|
||||
|
||||
Whitespace around attributes is stored in the form of an array with three elements:
|
||||
|
||||
| Element | Description
|
||||
| ------- | -----------
|
||||
| `0` | Whitespace before the attribute name.
|
||||
| `1` | Whitespace between attribute name and the equal sign.
|
||||
| `2` | Whitespace between the equal sign and the attribute value.
|
||||
|
||||
### Example
|
||||
|
||||
```html
|
||||
<p class="paragraph" id = 'info1'hidden>Hello, World!</p>
|
||||
```
|
||||
|
||||
_Note_: Whitespace before attribute names is not displayed in the browser. It is, however, part of the attributes.
|
||||
|
||||
| Attribute | Description
|
||||
| --------- | -----------
|
||||
| ` class="paragraph"` | `[0] => ' ', [1] => '', [2] => ''`
|
||||
| ` id = 'info1'` | `[0] => ' ', [1] => ' ', [2] => ' '`
|
||||
| `hidden` | `[0] => '', [1] => '', [2] => ''`
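The three slots in the table above can be reproduced with a small stand-alone sketch. The helper `attr_space_sketch` is hypothetical and uses a regular expression purely for illustration; the parser itself gathers this information while tokenizing, and the sketch does not handle attribute names that also occur inside attribute values.

```php
<?php
// Hypothetical helper, not part of the library: returns the whitespace
// before the attribute name, before "=", and after "=".
function attr_space_sketch(string $tag, string $name): array
{
    $pattern = '/(\s*)\b' . preg_quote($name, '/') . '\b(\s*)(?:=(\s*))?/';
    if (!preg_match($pattern, $tag, $m)) {
        return []; // attribute not present in the tag
    }
    // $m[3] is missing for flags (attributes without "=" and value).
    return [$m[1], $m[2], $m[3] ?? ''];
}

$tag = "<p class=\"paragraph\" id = 'info1'hidden>";
$classSpace  = attr_space_sketch($tag, 'class');  // one space, then two empty strings
$idSpace     = attr_space_sketch($tag, 'id');     // three single spaces
$hiddenSpace = attr_space_sketch($tag, 'hidden'); // three empty strings
```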
|
||||
25
lib/sd/manual/docs/api/file_get_html.md
Normal file
|
|
@ -0,0 +1,25 @@
|
|||
---
|
||||
title: file_get_html
|
||||
---
|
||||
|
||||
# file_get_html
|
||||
|
||||
```php
|
||||
file_get_html ( string $url [, bool $use_include_path = false [, resource $context = null [, int $offset = 0 [, int $maxLen = -1 [, bool $lowercase = true [, bool $forceTagsClosed = true [, string $target_charset = DEFAULT_TARGET_CHARSET [, bool $stripRN = true [, string $defaultBRText = DEFAULT_BR_TEXT [, string $defaultSpanText = DEFAULT_SPAN_TEXT ]]]]]]]]]] )
|
||||
```
|
||||
|
||||
Parses the provided file and returns the DOM object.
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `url` | Name or URL of the file to read.
|
||||
| `use_include_path` | See [`file_get_contents`](http://php.net/manual/en/function.file-get-contents.php#refsect1-function.file-get-contents-parameters)
|
||||
| `context` | See [`file_get_contents`](http://php.net/manual/en/function.file-get-contents.php#refsect1-function.file-get-contents-parameters)
|
||||
| `offset` | See [`file_get_contents`](http://php.net/manual/en/function.file-get-contents.php#refsect1-function.file-get-contents-parameters)
|
||||
| `maxLen` | See [`file_get_contents`](http://php.net/manual/en/function.file-get-contents.php#refsect1-function.file-get-contents-parameters)
|
||||
| `lowercase` | Forces lowercase matching of tags if enabled. This is very useful when loading documents with mixed naming conventions.
|
||||
| `forceTagsClosed` | Obsolete. This parameter is no longer used by the parser.
|
||||
| `target_charset` | Defines the target charset when returning text from the document.
|
||||
| `stripRN` | If enabled, removes newlines before parsing the document.
|
||||
| `defaultBRText` | Defines the default text to return for `<br>` elements.
|
||||
| `defaultSpanText` | Defines the default text to return for `<span>` elements.
|
||||
20
lib/sd/manual/docs/api/simple_html_dom/__construct.md
Normal file
|
|
@ -0,0 +1,20 @@
|
|||
# __construct
|
||||
|
||||
```php
|
||||
__construct ( [ string $str = null [, bool $lowercase = true [, bool $forceTagsClosed = true [, string $target_charset = DEFAULT_TARGET_CHARSET [, bool $stripRN = true [, string $defaultBRText = DEFAULT_BR_TEXT [, string $defaultSpanText = DEFAULT_SPAN_TEXT [, int $options = 0 ]]]]]]]]) : object
|
||||
```
|
||||
|
||||
Creates a new `simple_html_dom` object.
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `str` | The HTML document string.
|
||||
| `lowercase` | Tag names are parsed in lowercase letters if enabled.
|
||||
| `forceTagsClosed` | Tags inside block tags are forcefully closed if the closing tag was omitted.
|
||||
| `target_charset` | Defines the target charset for text returned by the parser.
|
||||
| `stripRN` | Newline characters are replaced by whitespace if enabled.
|
||||
| `defaultBRText` | Defines the default text to return for `<br>` elements.
|
||||
| `defaultSpanText` | Defines the default text to return for `<span>` elements.
|
||||
| `options` | Additional options for the parser. Currently supports `HDOM_SMARTY_AS_TEXT` to remove [Smarty](https://www.smarty.net/) scripts.
|
||||
|
||||
Returns the object.
|
||||
7
lib/sd/manual/docs/api/simple_html_dom/__destruct.md
Normal file
|
|
@ -0,0 +1,7 @@
|
|||
# __destruct
|
||||
|
||||
```php
|
||||
__destruct ()
|
||||
```
|
||||
|
||||
Destroys the current object and clears memory.
|
||||
17
lib/sd/manual/docs/api/simple_html_dom/__get.md
Normal file
|
|
@ -0,0 +1,17 @@
|
|||
# __get
|
||||
|
||||
```php
|
||||
__get ( string $name ) : mixed
|
||||
```
|
||||
|
||||
See [magic methods](http://php.net/manual/en/language.oop5.overloading.php#object.get)
|
||||
|
||||
Supports the following names:
|
||||
|
||||
| Name | Description
|
||||
| ---- | -----------
|
||||
| `outertext` | Returns the outer text of the root element.
|
||||
| `innertext` | Returns the inner text of the root element.
|
||||
| `plaintext` | Returns the plain text of the root element.
|
||||
| `charset` | Returns the charset for the document.
|
||||
| `target_charset` | Returns the target charset for the document.
|
||||
7
lib/sd/manual/docs/api/simple_html_dom/__toString.md
Normal file
|
|
@ -0,0 +1,7 @@
|
|||
# __toString
|
||||
|
||||
```php
|
||||
__toString () : string
|
||||
```
|
||||
|
||||
Returns the inner text of the root element of the DOM.
|
||||
13
lib/sd/manual/docs/api/simple_html_dom/as_text_node.md
Normal file
|
|
@ -0,0 +1,13 @@
|
|||
# as_text_node (protected)
|
||||
|
||||
```php
|
||||
as_text_node ( string $tag ) : bool
|
||||
```
|
||||
|
||||
Adds a tag as text node.
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `tag` | The element's tag name.
|
||||
|
||||
Returns true on success.
|
||||
11
lib/sd/manual/docs/api/simple_html_dom/childNodes.md
Normal file
|
|
@ -0,0 +1,11 @@
|
|||
# childNodes
|
||||
|
||||
```php
|
||||
childNodes ( [ int $idx = -1 ] ) : mixed
|
||||
```
|
||||
|
||||
Returns children of the root element.
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `idx` | Index of the child element to return.
|
||||
7
lib/sd/manual/docs/api/simple_html_dom/clear.md
Normal file
|
|
@ -0,0 +1,7 @@
|
|||
# clear
|
||||
|
||||
```php
|
||||
clear ()
|
||||
```
|
||||
|
||||
Cleans up memory to prevent [PHP 5 circular references memory leak](https://bugs.php.net/bug.php?id=33595).
|
||||
13
lib/sd/manual/docs/api/simple_html_dom/copy_skip.md
Normal file
|
|
@ -0,0 +1,13 @@
|
|||
# copy_skip (protected)
|
||||
|
||||
```php
|
||||
copy_skip ( string $chars ) : string
|
||||
```
|
||||
|
||||
Skips characters starting at the current parsing position in the document. Sets the parsing position to the first character not in the provided list of characters.
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `chars` | A list of characters to skip.
|
||||
|
||||
Returns the skipped characters.
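The behaviour described above can be sketched with PHP's built-in `strspn`. The function `copy_skip_sketch` is a hypothetical stand-alone helper: it takes the document and position as arguments, whereas the real method operates on the object's internal document and parsing position.

```php
<?php
// Hypothetical stand-alone version for illustration only.
function copy_skip_sketch(string $doc, int &$pos, string $chars): string
{
    // Length of the leading run (starting at $pos) made only of $chars.
    $len = strspn($doc, $chars, $pos);
    $skipped = substr($doc, $pos, $len);
    $pos += $len; // now at the first character not in $chars
    return $skipped;
}

$doc = "  \t<p>Hello</p>";
$pos = 0;
$skipped = copy_skip_sketch($doc, $pos, " \t\r\n");
// $skipped holds the two spaces and the tab; $doc[$pos] is '<'.
```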
|
||||
13
lib/sd/manual/docs/api/simple_html_dom/copy_until.md
Normal file
|
|
@ -0,0 +1,13 @@
|
|||
# copy_until (protected)
|
||||
|
||||
```php
|
||||
copy_until ( string $chars ) : string
|
||||
```
|
||||
|
||||
Copies all characters starting at the current parsing position in the document. Sets the parsing position to the first character that matches any of the characters in the provided list of characters.
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `chars` | A list of characters to stop copying at.
|
||||
|
||||
Returns the copied characters.
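A minimal sketch of this behaviour, using PHP's built-in `strcspn`. The function `copy_until_sketch` is hypothetical and takes the document and position as arguments instead of reading them from the object.

```php
<?php
// Hypothetical stand-alone version for illustration only.
function copy_until_sketch(string $doc, int &$pos, string $chars): string
{
    // Length of the run (starting at $pos) that contains none of $chars.
    $len = strcspn($doc, $chars, $pos);
    $copied = substr($doc, $pos, $len);
    $pos += $len; // now at the first character that is in $chars, or at the end
    return $copied;
}

$doc = 'class="paragraph">';
$pos = 0;
$name = copy_until_sketch($doc, $pos, "=/> \t\r\n");
// $name is 'class'; $doc[$pos] is '='.
```

If none of the stop characters occur, the sketch copies the rest of the document and leaves the position at its end.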
|
||||
13
lib/sd/manual/docs/api/simple_html_dom/copy_until_char.md
Normal file
|
|
@ -0,0 +1,13 @@
|
|||
# copy_until_char (protected)
|
||||
|
||||
```php
|
||||
copy_until_char ( string $char ) : string
|
||||
```
|
||||
|
||||
Copies all characters starting at the current parsing position in the document. Sets the parsing position to the first character that matches the provided character.
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `char` | A character to stop copying at.
|
||||
|
||||
Returns the copied characters.
|
||||
14
lib/sd/manual/docs/api/simple_html_dom/createElement.md
Normal file
|
|
@ -0,0 +1,14 @@
|
|||
# createElement
|
||||
|
||||
```php
|
||||
createElement ( string $name [, string $value = null ] ) : object
|
||||
```
|
||||
|
||||
Creates a new element.
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `name` | Name of the element
|
||||
| `value` | Value of the element
|
||||
|
||||
Returns the element.
|
||||
9
lib/sd/manual/docs/api/simple_html_dom/createTextNode.md
Normal file
|
|
@ -0,0 +1,9 @@
|
|||
# createTextNode
|
||||
|
||||
```php
|
||||
createTextNode ( string $value ) : object
|
||||
```
|
||||
|
||||
Creates a new text element.
|
||||
|
||||
Returns the element.
|
||||
13
lib/sd/manual/docs/api/simple_html_dom/dump.md
Normal file
|
|
@ -0,0 +1,13 @@
|
|||
# dump
|
||||
|
||||
```php
|
||||
dump ( [ bool $show_attr = true ] ) : string
|
||||
```
|
||||
|
||||
Dumps the entire DOM into a string. Useful for debugging purposes.
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `show_attr` | Attributes are included in the dump when enabled.
|
||||
|
||||
Returns the DOM tree as string.
|
||||
15
lib/sd/manual/docs/api/simple_html_dom/find.md
Normal file
|
|
@ -0,0 +1,15 @@
|
|||
# find
|
||||
|
||||
```php
|
||||
find ( string $selector [, int $idx = null [, bool $lowercase = false ]] ) : mixed
|
||||
```
|
||||
|
||||
Finds elements in the DOM.
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `selector` | A [CSS style selector](/manual/selectors).
|
||||
| `idx` | Index of the element to return.
|
||||
| `lowercase` | Matches tag names case insensitive when enabled.
|
||||
|
||||
Returns an array of matches or a single element if `idx` is defined.
|
||||
7
lib/sd/manual/docs/api/simple_html_dom/firstChild.md
Normal file
|
|
@ -0,0 +1,7 @@
|
|||
# firstChild
|
||||
|
||||
```php
|
||||
firstChild () : object
|
||||
```
|
||||
|
||||
Returns the first child of the root element.
|
||||
13
lib/sd/manual/docs/api/simple_html_dom/getElementById.md
Normal file
|
|
@ -0,0 +1,13 @@
|
|||
# getElementById
|
||||
|
||||
```php
|
||||
getElementById ( string $id ) : object
|
||||
```
|
||||
|
||||
Searches for an element by ID.
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `id` | ID of the element to find.
|
||||
|
||||
Returns the element or null if no match was found.
|
||||
|
|
@ -0,0 +1,13 @@
|
|||
# getElementByTagName
|
||||
|
||||
```php
|
||||
getElementByTagName ( string $name ) : object
|
||||
```
|
||||
|
||||
Searches for an element by tag name.
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `name` | Tag name of the element to find.
|
||||
|
||||
Returns the element or null if no match was found.
|
||||
14
lib/sd/manual/docs/api/simple_html_dom/getElementsById.md
Normal file
|
|
@ -0,0 +1,14 @@
|
|||
# getElementsById
|
||||
|
||||
```php
|
||||
getElementsById ( string $id [, int $idx = null ] ) : object
|
||||
```
|
||||
|
||||
Searches for elements by ID.
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `id` | ID of the element to find.
|
||||
| `idx` | Returns the element at the specified index if defined.
|
||||
|
||||
Returns the element(s) or null if no match was found.
|
||||
|
|
@ -0,0 +1,14 @@
|
|||
# getElementsByTagName
|
||||
|
||||
```php
|
||||
getElementsByTagName ( string $name [, int $idx = -1 ] ) : object
|
||||
```
|
||||
|
||||
Searches for elements by tag name.
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `name` | Tag name of the element to find.
|
||||
| `idx` | Returns the element at the specified index.
|
||||
|
||||
Returns the element(s) or null if no match was found.
|
||||
7
lib/sd/manual/docs/api/simple_html_dom/lastChild.md
Normal file
|
|
@ -0,0 +1,7 @@
|
|||
# lastChild
|
||||
|
||||
```php
|
||||
lastChild () : object
|
||||
```
|
||||
|
||||
Returns the last child of the root element.
|
||||
12
lib/sd/manual/docs/api/simple_html_dom/link_nodes.md
Normal file
|
|
@ -0,0 +1,12 @@
|
|||
# link_nodes (protected)
|
||||
|
||||
```php
|
||||
link_nodes ( object &$node, bool $is_child )
|
||||
```
|
||||
|
||||
Links the provided node to the DOM tree.
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `node` | The node to link to the DOM tree.
|
||||
| `is_child` | If active, makes the node a sibling of the current node (child of parent).
|
||||
18
lib/sd/manual/docs/api/simple_html_dom/load.md
Normal file
|
|
@ -0,0 +1,18 @@
|
|||
# load
|
||||
|
||||
```php
|
||||
load ( string $str [, bool $lowercase = true [, bool $stripRN = true [, string $defaultBRText = DEFAULT_BR_TEXT [, string $defaultSpanText = DEFAULT_SPAN_TEXT [, int $options = 0 ]]]]]) : object
|
||||
```
|
||||
|
||||
Loads the provided HTML document string.
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `str` | The HTML document string.
|
||||
| `lowercase` | Tag names are parsed in lowercase letters if enabled.
|
||||
| `stripRN` | Newline characters are replaced by whitespace if enabled.
|
||||
| `defaultBRText` | Defines the default text to return for `<br>` elements.
|
||||
| `defaultSpanText` | Defines the default text to return for `<span>` elements.
|
||||
| `options` | Additional options for the parser. Currently supports `HDOM_SMARTY_AS_TEXT` to remove [Smarty](https://www.smarty.net/) scripts.
|
||||
|
||||
Returns the object.
|
||||
7
lib/sd/manual/docs/api/simple_html_dom/loadFile.md
Normal file
|
|
@ -0,0 +1,7 @@
|
|||
# loadFile
|
||||
|
||||
```php
|
||||
loadFile (...)
|
||||
```
|
||||
|
||||
This function is a wrapper for [`load_file`](#load_file)
|
||||
9
lib/sd/manual/docs/api/simple_html_dom/load_file.md
Normal file
|
|
@ -0,0 +1,9 @@
|
|||
# load_file
|
||||
|
||||
```php
|
||||
load_file (...) : object
|
||||
```
|
||||
|
||||
Loads an HTML document from a file. Supports the arguments of [`file_get_contents`](http://php.net/manual/en/function.file-get-contents.php).
|
||||
|
||||
Returns the object.
|
||||
7
lib/sd/manual/docs/api/simple_html_dom/parse.md
Normal file
|
|
@ -0,0 +1,7 @@
|
|||
# parse (protected)
|
||||
|
||||
```php
|
||||
parse ()
|
||||
```
|
||||
|
||||
Parses the document. This function is called after the document was loaded into `$this->doc`.
|
||||
13
lib/sd/manual/docs/api/simple_html_dom/parse_attr.md
Normal file
|
|
@ -0,0 +1,13 @@
|
|||
# parse_attr (protected)
|
||||
|
||||
```php
|
||||
parse_attr ( object $node, string $name, array &$space )
|
||||
```
|
||||
|
||||
Parses a single attribute starting at the current parsing position in the document.
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `node` | The current element (node).
|
||||
| `name` | The attribute name.
|
||||
| `space` | An array of whitespace surrounding the current attribute (see [Attribute Whitespace](../definitions/#attribute-whitespace)).
|
||||
15
lib/sd/manual/docs/api/simple_html_dom/parse_charset.md
Normal file
|
|
@ -0,0 +1,15 @@
|
|||
# parse_charset (protected)
|
||||
|
||||
```php
|
||||
parse_charset ()
|
||||
```
|
||||
|
||||
Parses the charset.
|
||||
|
||||
If the callback function `get_last_retrieve_url_contents_content_type` exists, it is assumed to return the content type header for the current document as string.
|
||||
|
||||
Uses the charset from the metadata of the page if defined.
|
||||
|
||||
If none of the previous conditions are met, the charset is determined by `mb_detect_encoding` if multi-byte support is active.
|
||||
|
||||
If multi-byte support is not active the charset is assumed to be `'UTF-8'`.
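The order of checks described above can be sketched as a plain function. Everything here is an assumption for illustration: `detect_charset_sketch` is a hypothetical name, the content type header is passed in as an argument rather than fetched through the callback, the regular expressions are simplified, and the candidate encoding list is arbitrary.

```php
<?php
// Hypothetical stand-alone sketch of the fallback chain; not the library's code.
function detect_charset_sketch(?string $contentTypeHeader, string $html): string
{
    // 1. Charset from the content type header, if one is available.
    if ($contentTypeHeader !== null
        && preg_match('/charset=([\w-]+)/i', $contentTypeHeader, $m)) {
        return strtoupper($m[1]);
    }

    // 2. Charset declared in the page metadata (a <meta> element).
    if (preg_match('/<meta[^>]+charset=["\']?([\w-]+)/i', $html, $m)) {
        return strtoupper($m[1]);
    }

    // 3. Detection via the mbstring extension, if it is loaded.
    if (function_exists('mb_detect_encoding')) {
        $detected = mb_detect_encoding($html, ['UTF-8', 'ISO-8859-1'], true);
        if ($detected !== false) {
            return $detected;
        }
    }

    // 4. Final fallback when nothing else applies.
    return 'UTF-8';
}

$fromHeader = detect_charset_sketch('text/html; charset=iso-8859-1', '<p>Hi</p>');
$fromMeta   = detect_charset_sketch(null, '<meta charset="cp1251"><p>Hi</p>');
```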
|
||||
14
lib/sd/manual/docs/api/simple_html_dom/prepare.md
Normal file
|
|
@ -0,0 +1,14 @@
|
|||
# prepare (protected)
|
||||
|
||||
```php
|
||||
prepare ( string $str [, bool $lowercase = true [, string $defaultBRText = DEFAULT_BR_TEXT [, string $defaultSpanText = DEFAULT_SPAN_TEXT ]]] )
|
||||
```
|
||||
|
||||
Initializes the DOM object.
|
||||
|
||||
| Parameter | Description
|
||||
| ---------- | -----------
|
||||
| `str` | The HTML document string.
|
||||
| `lowercase` | Tag names are parsed in lowercase letters if enabled.
|
||||
| `defaultBRText` | Defines the default text to return for `<br>` elements.
|
||||
| `defaultSpanText` | Defines the default text to return for `<span>` elements.
|
||||
9
lib/sd/manual/docs/api/simple_html_dom/read_tag.md
Normal file
|
|
@ -0,0 +1,9 @@
|
|||
# read_tag (protected)
|
||||
|
||||
```php
|
||||
read_tag () : bool
|
||||
```
|
||||
|
||||
Reads a single tag starting at the current parsing position in the document. The tag is automatically added to the DOM.
|
||||
|
||||
Returns true if a tag was found.
|
||||
|
|
@ -0,0 +1,7 @@
|
|||
# remove_callback
|
||||
|
||||
```php
|
||||
remove_callback ()
|
||||
```
|
||||
|
||||
Removes the callback set by [`set_callback`](#set_callback).
|
||||
14
lib/sd/manual/docs/api/simple_html_dom/remove_noise.md
Normal file
|
|
@ -0,0 +1,14 @@
|
|||
# remove_noise (protected)
|
||||
|
||||
```php
|
||||
remove_noise ( string $pattern [, bool $remove_tag = false] )
|
||||
```
|
||||
|
||||
Replaces noise in the document (e.g. scripts) with placeholders and adds the removed contents to `$this->noise`.
|
||||
|
||||
_Note_: Noise is replaced with placeholders so that the original contents can be restored later. Placeholders take the form of `'___noise___1000'`, where the number is increased by one for each removed piece of noise.
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `pattern` | A regular expression that matches the noise to remove.
|
||||
| `remove_tag` | Removes the entire match when enabled or submatches when disabled.
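The placeholder mechanism can be sketched as follows. `remove_noise_sketch` is a hypothetical stand-alone function that keeps the noise map in a caller-supplied array instead of `$this->noise`, always replaces the entire match (comparable to enabling `remove_tag`), and uses the placeholder format from the note above; the script pattern is only an example.

```php
<?php
// Hypothetical stand-alone sketch; not the library's implementation.
function remove_noise_sketch(string $doc, string $pattern, array &$noise): string
{
    return preg_replace_callback(
        $pattern,
        function (array $match) use (&$noise): string {
            // Placeholder numbers start at 1000 and grow by one per match.
            $key = '___noise___' . (count($noise) + 1000);
            $noise[$key] = $match[0]; // remember the original contents
            return $key;
        },
        $doc
    );
}

$noise = [];
$doc = '<p>a</p><script>x()</script><p>b</p><script>y()</script>';
$clean = remove_noise_sketch($doc, '#<script[^>]*>.*?</script>#is', $noise);
// $clean contains two placeholders; $noise maps each one to its script block.
```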
|
||||
13
lib/sd/manual/docs/api/simple_html_dom/restore_noise.md
Normal file
|
|
@ -0,0 +1,13 @@
|
|||
# restore_noise (protected)
|
||||
|
||||
```php
|
||||
restore_noise ( string $text ) : string
|
||||
```
|
||||
|
||||
Restores noise in the provided string by replacing noise placeholders with their original contents.
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `text` | A string (potentially) containing noise placeholders.
|
||||
|
||||
Returns the string with original contents restored or the original string if it doesn't contain noise placeholders.
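A minimal sketch of the restore step, assuming the noise is available as a map from placeholder to original contents. `restore_noise_sketch` and the sample map are hypothetical; the real method reads the map from the object.

```php
<?php
// Hypothetical stand-alone sketch; not the library's implementation.
function restore_noise_sketch(string $text, array $noise): string
{
    if ($noise === []) {
        return $text; // nothing recorded, nothing to restore
    }
    // strtr() swaps every placeholder key for its stored original and
    // prefers the longest matching key, so ...1000 and ...10000 do not clash.
    return strtr($text, $noise);
}

$noise = ['___noise___1000' => '<script>x()</script>'];
$restored = restore_noise_sketch('<p>a</p>___noise___1000', $noise);
$untouched = restore_noise_sketch('<p>no placeholders</p>', $noise);
```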
|
||||
13
lib/sd/manual/docs/api/simple_html_dom/save.md
Normal file
|
|
@ -0,0 +1,13 @@
|
|||
# save
|
||||
|
||||
```php
|
||||
save ( [ string $filepath = '' ] ) : string
|
||||
```
|
||||
|
||||
Writes the current DOM to file.
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `filepath` | Writes to file if the provided file path is not empty.
|
||||
|
||||
Returns the document string.
|
||||
13
lib/sd/manual/docs/api/simple_html_dom/search_noise.md
Normal file
|
|
@ -0,0 +1,13 @@
|
|||
# search_noise (protected)
|
||||
|
||||
```php
|
||||
search_noise ( string $text ) : string
|
||||
```
|
||||
|
||||
Finds a single noise element by its noise placeholder text.
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `text` | The noise placeholder to find.
|
||||
|
||||
Returns the original contents for the placeholder.
|
||||
12
lib/sd/manual/docs/api/simple_html_dom/set_callback.md
Normal file
|
|
@ -0,0 +1,12 @@
|
|||
# set_callback
|
||||
|
||||
```php
|
||||
set_callback ( string $function_name )
|
||||
```
|
||||
|
||||
Sets the callback function which is called on each element of the DOM when building outertext.
|
||||
The function must accept a single parameter of type `simple_html_dom_node`.
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `function_name` | Name of the function.
|
||||
40
lib/sd/manual/docs/api/simple_html_dom/simple_html_dom.md
Normal file
|
|
@ -0,0 +1,40 @@
|
|||
---
|
||||
title: simple_html_dom
|
||||
---
|
||||
|
||||
# simple_html_dom
|
||||
|
||||
Represents the [DOM](https://en.wikipedia.org/wiki/Document_Object_Model) in memory. Provides functions to parse documents and access individual elements (see [`simple_html_dom_node`](../simple_html_dom_node/simple_html_dom_node.md)).
|
||||
|
||||
# Public Properties
|
||||
|
||||
| Property | Description
|
||||
| -------- | -----------
|
||||
| `root` | Root node of the document.
|
||||
| `nodes` | List of top-level nodes in the document.
|
||||
| `callback` | Callback function that is called for each element in the DOM when generating outertext.
|
||||
| `lowercase` | If enabled, all tag names are converted to lowercase when parsing documents.
|
||||
| `original_size` | Original document size in bytes.
|
||||
| `size` | Current document size in bytes.
|
||||
| `_charset` | Charset of the original document.
|
||||
| `_target_charset` | Target charset for the current document.
|
||||
| `default_span_text` | Text to return for `<span>` elements.
|
||||
|
||||
# Protected Properties
|
||||
|
||||
| Property | Description
|
||||
| -------- | -----------
|
||||
| `pos` | Current parsing position within `doc`.
|
||||
| `doc` | The original document.
|
||||
| `char` | Character at position `pos` in `doc`.
|
||||
| `cursor` | Current element cursor in the document.
|
||||
| `parent` | Parent element node.
|
||||
| `noise` | Noise from the original document (e.g. scripts, comments).
|
||||
| `token_blank` | Tokens that are considered whitespace in HTML.
|
||||
| `token_equal` | Tokens to identify the equal sign for attributes, stopping either at the closing tag ("/", e.g. `<html />`) or the end of an opening tag (">", e.g. `<html>`).
|
||||
| `token_slash` | Tokens to identify the end of a tag name. A tag name ends either at the closing slash ("/", e.g. `<html/>`) or at whitespace (`"\s\r\n\t"`).
|
||||
| `token_attr` | Tokens to identify the end of an attribute.
|
||||
| `default_br_text` | Text to return for `<br>` elements.
|
||||
| `self_closing_tags` | A list of tag names where the closing tag is omitted.
|
||||
| `block_tags` | A list of tag names where remaining unclosed tags are forcibly closed.
|
||||
| `optional_closing_tags` | A list of tag names where the closing tag can be omitted.
|
||||
12
lib/sd/manual/docs/api/simple_html_dom/skip.md
Normal file
|
|
@ -0,0 +1,12 @@
|
|||
|
||||
# skip (protected)
|
||||
|
||||
```php
|
||||
skip ( string $chars )
|
||||
```
|
||||
|
||||
Skips characters starting at the current parsing position in the document. Sets the parsing position to the first character not in the provided list of characters.
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `chars` | A list of characters to skip.
|
||||
11
lib/sd/manual/docs/api/simple_html_dom_node/__construct.md
Normal file
|
|
@ -0,0 +1,11 @@
|
|||
# __construct
|
||||
|
||||
```php
|
||||
__construct ( [ object $dom ] ) : object
|
||||
```
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `dom` | An object of type [`simple_html_dom`](api/simple_html_dom/).
|
||||
|
||||
Constructs a new object of type `simple_html_dom_node`, assigns `$dom` as its DOM object and adds itself to the list of nodes in `$dom`.
|
||||
|
|
@ -0,0 +1,7 @@
|
|||
# __destruct
|
||||
|
||||
```php
|
||||
__destruct ( )
|
||||
```
|
||||
|
||||
Destructs the current object and frees memory.
|
||||
22
lib/sd/manual/docs/api/simple_html_dom_node/__get.md
Normal file
|
|
@ -0,0 +1,22 @@
|
|||
# __get
|
||||
|
||||
```php
|
||||
__get ( string $name ) : mixed
|
||||
```
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `name` | `outertext`, `innertext`, `plaintext`, `xmltext` or attribute name.
|
||||
|
||||
See [magic methods](http://php.net/manual/en/language.oop5.overloading.php#object.get)
|
||||
|
||||
If the provided name is a valid attribute name, returns the attribute value. Otherwise, returns a value according to the table below.
|
||||
|
||||
| Name | Description
|
||||
| ---- | -----------
|
||||
| `outertext` | Returns the outer text of the current node.
|
||||
| `innertext` | Returns the inner text of the current node.
|
||||
| `plaintext` | Returns the plain text of the current node.
|
||||
| `xmltext` | Returns the xml representation for the inner text of the current node as a CDATA section.
|
||||
|
||||
Returns nothing if the provided name is neither a valid attribute name, nor a valid parameter name.
|
||||
19
lib/sd/manual/docs/api/simple_html_dom_node/__isset.md
Normal file
|
|
@ -0,0 +1,19 @@
|
|||
# __isset
|
||||
|
||||
```php
|
||||
__isset ( string $name ) : bool
|
||||
```
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `name` | `outertext`, `innertext`, `plaintext` or attribute name.
|
||||
|
||||
See [magic methods](http://php.net/manual/en/language.oop5.overloading.php#object.isset)
|
||||
|
||||
Returns true if the provided name is a valid attribute name or any of the values in the table below. False otherwise.
|
||||
|
||||
| Name | Description
|
||||
| ---- | -----------
|
||||
| `outertext` | Returns the outer text of the current node.
|
||||
| `innertext` | Returns the inner text of the current node.
|
||||
| `plaintext` | Returns the plain text of the current node.
|
||||
18
lib/sd/manual/docs/api/simple_html_dom_node/__set.md
Normal file
|
|
@ -0,0 +1,18 @@
|
|||
# __set
|
||||
|
||||
```php
|
||||
__set ( string $name, mixed $value )
|
||||
```
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `name` | `outertext`, `innertext` or attribute name.
|
||||
| `value` | Value to set.
|
||||
|
||||
See [magic methods](http://php.net/manual/en/language.oop5.overloading.php#object.set)
|
||||
|
||||
Sets the outer text of the current node to `$value` if `$name` is `outertext`.
|
||||
|
||||
Sets the inner text of the current node to `$value` if `$name` is `innertext`.
|
||||
|
||||
Otherwise, adds or updates an attribute with name `$name` and value `$value` to the current node.
|
||||
|
|
@ -0,0 +1,7 @@
|
|||
# __toString
|
||||
|
||||
```php
|
||||
__toString ( ) : string
|
||||
```
|
||||
|
||||
Returns the outer text of the current node.
|
||||
7
lib/sd/manual/docs/api/simple_html_dom_node/__unset.md
Normal file
|
|
@ -0,0 +1,7 @@
|
|||
# __unset
|
||||
|
||||
```php
|
||||
__unset ( string $name )
|
||||
```
|
||||
|
||||
Removes the attribute with name `$name` from the current node if it exists.
|
||||
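The property overloading pattern behind `__get`, `__isset`, `__set` and `__unset` can be sketched with a small stand-in class. `DemoNode` is hypothetical and only handles `innertext` plus arbitrary attributes; returning null for unknown names is an assumption of this sketch, not a statement about the library.

```php
<?php
// Hypothetical stand-in class; it only illustrates PHP property overloading.
class DemoNode
{
    private $attr = array();   // attribute name => value
    private $inner = '';       // simplified inner text

    public function __get($name)
    {
        // Attributes take precedence over the special names.
        if (isset($this->attr[$name])) {
            return $this->attr[$name];
        }
        return $name === 'innertext' ? $this->inner : null;
    }

    public function __set($name, $value)
    {
        if ($name === 'innertext') {
            $this->inner = $value;
        } else {
            $this->attr[$name] = $value;   // add or update an attribute
        }
    }

    public function __isset($name)
    {
        return $name === 'innertext' || isset($this->attr[$name]);
    }

    public function __unset($name)
    {
        unset($this->attr[$name]);         // no-op if the attribute is missing
    }
}

$node = new DemoNode();
$node->href = 'https://example.com/';      // calls __set
$node->innertext = 'Example';
var_dump(isset($node->href));              // calls __isset: true
unset($node->href);                        // calls __unset
var_dump(isset($node->href));              // false
```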
23
lib/sd/manual/docs/api/simple_html_dom_node/addClass.md
Normal file
|
|
@ -0,0 +1,23 @@
|
|||
# addClass
|
||||
|
||||
```php
|
||||
addClass ( mixed $class )
|
||||
```
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `class` | Specifies one or more class names to be added.
|
||||
|
||||
Adds one or more class names to the current node.
|
||||
|
||||
**Remarks**
|
||||
|
||||
* To add more than one class, separate the class names with space or provide them as an array.
|
||||
|
||||
**Examples**
|
||||
|
||||
```php
|
||||
$node->addClass('hidden');
|
||||
$node->addClass('article important');
|
||||
$node->addClass(array('article', 'new'));
|
||||
```
|
||||
13
lib/sd/manual/docs/api/simple_html_dom_node/appendChild.md
Normal file
|
|
@ -0,0 +1,13 @@
|
|||
# appendChild
|
||||
|
||||
```php
|
||||
appendChild ( object $node ) : object
|
||||
```
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `node` | An object of type [`simple_html_dom_node`](../simple_html_dom_node/)
|
||||
|
||||
Makes the current node the parent of the node provided to this function.
|
||||
|
||||
Returns the provided node.
|
||||
15
lib/sd/manual/docs/api/simple_html_dom_node/childNodes.md
Normal file
|
|
@ -0,0 +1,15 @@
|
|||
# childNodes
|
||||
|
||||
```php
|
||||
childNodes ( [ int $idx = -1 ] ) : mixed
|
||||
```
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `idx` | Index of the node to return or `-1` to return all nodes.
|
||||
|
||||
Returns all or one specific child node from the current node.
|
||||
|
||||
## Remarks
|
||||
|
||||
This function is a wrapper for [`children`](../children/)
|
||||
11
lib/sd/manual/docs/api/simple_html_dom_node/children.md
Normal file
|
|
@ -0,0 +1,11 @@
|
|||
# children
|
||||
|
||||
```php
|
||||
children ( [ int $idx = -1 ] ) : mixed
|
||||
```
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `idx` | Index of the node to return or `-1` to return all nodes.
|
||||
|
||||
Returns all or one specific child node from the current node.
|
||||
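The `$idx` convention shared by `children` and `childNodes` can be sketched as follows. `pick_child` is a hypothetical helper operating on a plain array, and returning null for an out-of-range index is an assumption of this sketch.

```php
<?php
// Hypothetical helper mirroring the documented `$idx` convention.
// Assumption: an index that does not exist yields null.
function pick_child(array $children, int $idx = -1)
{
    if ($idx === -1) {
        return $children;   // all child nodes
    }
    return isset($children[$idx]) ? $children[$idx] : null;
}

$children = array('<li>one</li>', '<li>two</li>');
var_dump(pick_child($children));      // the whole list
var_dump(pick_child($children, 1));   // the second child
var_dump(pick_child($children, 5));   // no such child
```

Because the return type depends on `$idx`, callers need to know which form they asked for before iterating over the result.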
7
lib/sd/manual/docs/api/simple_html_dom_node/clear.md
Normal file
|
|
@ -0,0 +1,7 @@
|
|||
# clear
|
||||
|
||||
```php
|
||||
clear ( )
|
||||
```
|
||||
|
||||
Sets all properties of the current node that contain objects to null.
|
||||
13
lib/sd/manual/docs/api/simple_html_dom_node/convert_text.md
Normal file
|
|
@ -0,0 +1,13 @@
|
|||
# convert_text
|
||||
|
||||
```php
|
||||
convert_text ( string $text ) : string
|
||||
```
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `text` | Text to convert.
|
||||
|
||||
Assumes that the provided text is encoded in the configured source character set (see [`sourceCharset`](../simple_html_dom_node/)) and converts it to the specified target character set (see [`targetCharset`](../simple_html_dom_node/)).
|
||||
|
||||
Returns the converted text.
|
||||
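A character set conversion of this kind can be sketched with PHP's `iconv` function. This assumes the iconv extension is available. `convert_charset` is a hypothetical helper that takes both charset names as arguments instead of reading them from the node's configuration, and falling back to the unchanged input on failure is a choice made for this sketch.

```php
<?php
// Hypothetical helper; the charset names are passed explicitly here.
function convert_charset(string $text, string $source, string $target): string
{
    if (strcasecmp($source, $target) === 0) {
        return $text;   // same charset, nothing to convert
    }
    $converted = iconv($source, $target, $text);
    // iconv() returns false on failure; fall back to the unchanged input.
    return $converted === false ? $text : $converted;
}

$cp1251 = "\xCF\xF0\xE8\xE2\xE5\xF2";   // "Привет" encoded as CP1251
echo convert_charset($cp1251, 'CP1251', 'UTF-8'), PHP_EOL;
```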
12
lib/sd/manual/docs/api/simple_html_dom_node/dump.md
Normal file
|
|
@ -0,0 +1,12 @@
|
|||
# dump
|
||||
|
||||
```php
|
||||
dump ( [ bool $show_attr = false [, int $depth = 0 ]] )
|
||||
```
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `show_attr` | Attribute names are included in the output if enabled.
|
||||
| `depth` | Depth of the current element
|
||||
|
||||
Dumps information about the current node and all child nodes recursively.
|
||||
11
lib/sd/manual/docs/api/simple_html_dom_node/dump_node.md
Normal file
|
|
@ -0,0 +1,11 @@
|
|||
# dump_node
|
||||
|
||||
```php
|
||||
dump_node ( [ bool $echo = true ] ) : mixed
|
||||
```
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `echo` | Echoes the dump details directly if enabled.
|
||||
|
||||
Dumps information about the current document node. Returns a string if `$echo` is set to false, null otherwise.
|
||||
44
lib/sd/manual/docs/api/simple_html_dom_node/find.md
Normal file
|
|
@ -0,0 +1,44 @@
|
|||
# find
|
||||
|
||||
```php
|
||||
find (
|
||||
string $selector
|
||||
[, int $idx = null ]
|
||||
[, bool $lowercase = false ]
|
||||
) : mixed
|
||||
```
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `selector` | [CSS](https://www.w3.org/TR/selectors/) selector.
|
||||
| `idx` | Index of element to return.
|
||||
| `lowercase` | Matches tag names case-insensitively (lowercase) if enabled.
|
||||
|
||||
Finds one or more nodes in the current document, using CSS selectors.
|
||||
|
||||
* Returns null if no match was found.
|
||||
* Returns an array of [`simple_html_dom_node`](../simple_html_dom_node/) if `$idx` is null.
|
||||
* Returns an object of type [`simple_html_dom_node`](../simple_html_dom_node/) if `$idx` is anything __but__ null.
|
||||
|
||||
## Supported Selectors
|
||||
|
||||
| Selector | Description
|
||||
| --------- | -----------
|
||||
| `*` | [Universal selector](https://www.w3.org/TR/selectors/#the-universal-selector)
|
||||
| `E` | [Type (tag name) selector](https://www.w3.org/TR/selectors/#type-selectors)
|
||||
| `E#id` | [ID selector](https://www.w3.org/TR/selectors/#id-selectors)
|
||||
| `E.class` | [Class selector](https://www.w3.org/TR/selectors/#class-html)
|
||||
| `E[attr]` | [Attribute selector](https://www.w3.org/TR/selectors/#attribute-selectors)
|
||||
| `E[attr="value"]` | [Attribute selector](https://www.w3.org/TR/selectors/#attribute-selectors)
|
||||
| `E[attr="value"] i` | [Case-sensitivity](https://www.w3.org/TR/selectors/#attribute-case)
|
||||
| `E[attr="value"] s` | [Case-sensitivity](https://www.w3.org/TR/selectors/#attribute-case)
|
||||
| `E[attr~="value"]` | [Attribute selector](https://www.w3.org/TR/selectors/#attribute-selectors)
|
||||
| `E[attr^="value"]` | [Substring matching attribute selector](https://www.w3.org/TR/selectors/#attribute-substrings)
|
||||
| `E[attr$="value"]` | [Substring matching attribute selector](https://www.w3.org/TR/selectors/#attribute-substrings)
|
||||
| `E[attr*="value"]` | [Substring matching attribute selector](https://www.w3.org/TR/selectors/#attribute-substrings)
|
||||
| `E[attr|="value"]` | [Attribute selector](https://www.w3.org/TR/selectors/#attribute-selectors)
|
||||
| `E F` | [Descendant combinator](https://www.w3.org/TR/selectors/#descendant-combinators)
|
||||
| `E > F` | [Child combinator](https://www.w3.org/TR/selectors/#child-combinators)
|
||||
| `E + F` | [Next-sibling combinator](https://www.w3.org/TR/selectors/#adjacent-sibling-combinators)
|
||||
| `E ~ F` | [Subsequent-sibling combinator](https://www.w3.org/TR/selectors/#general-sibling-combinators)
|
||||
| `E, F` | [Selector list](https://www.w3.org/TR/selectors/#selector-list)
|
||||
|
|
@ -0,0 +1,11 @@
|
|||
# find_ancestor_tag
|
||||
|
||||
```php
|
||||
find_ancestor_tag ( string $tag ) : object
|
||||
```
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `tag` | Tag name of the element to find.
|
||||
|
||||
Returns the first ancestor node that matches the specified tag name, or null if no match was found.
|
||||
|
|
@ -0,0 +1,7 @@
|
|||
# firstChild
|
||||
|
||||
```php
|
||||
firstChild ( ) : mixed
|
||||
```
|
||||
|
||||
This function is a wrapper for [`first_child`](../first_child/)
|
||||
|
|
@ -0,0 +1,7 @@
|
|||
# first_child
|
||||
|
||||
```php
|
||||
first_child ( ) : mixed
|
||||
```
|
||||
|
||||
Returns the first child node of the current node, or null if the current node has no child nodes.
|
||||
|
|
@ -0,0 +1,7 @@
|
|||
# getAllAttributes
|
||||
|
||||
```php
|
||||
getAllAttributes ( ) : array
|
||||
```
|
||||
|
||||
Returns all attributes for the current node.
|
||||
11
lib/sd/manual/docs/api/simple_html_dom_node/getAttribute.md
Normal file
|
|
@ -0,0 +1,11 @@
|
|||
# getAttribute
|
||||
|
||||
```php
|
||||
getAttribute ( string $name ) : mixed
|
||||
```
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `name` | Attribute name.
|
||||
|
||||
Returns the value for the attribute `$name`.
|
||||
|
|
@ -0,0 +1,11 @@
|
|||
# getElementById
|
||||
|
||||
```php
|
||||
getElementById ( string $id ) : object
|
||||
```
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `id` | Element id.
|
||||
|
||||
Returns the first element with the specified id.
|
||||
|
|
@ -0,0 +1,11 @@
|
|||
# getElementByTagName
|
||||
|
||||
```php
|
||||
getElementByTagName ( string $name ) : object
|
||||
```
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `name` | Tag name.
|
||||
|
||||
Returns the first element with the specified tag name.
|
||||
|
|
@ -0,0 +1,12 @@
|
|||
# getElementsById
|
||||
|
||||
```php
|
||||
getElementsById ( string $id [, int $idx = null] ) : mixed
|
||||
```
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `id` | Element id.
|
||||
| `idx` | Index of element to return.
|
||||
|
||||
Returns all elements with the specified id if `$idx` is null, or a specific one if `$idx` is a valid index.
|
||||
|
|
@ -0,0 +1,12 @@
|
|||
# getElementsByTagName
|
||||
|
||||
```php
|
||||
getElementsByTagName ( string $name [, int $idx = null ] ) : mixed
|
||||
```
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `name` | Tag name.
|
||||
| `idx` | Index of the element to return.
|
||||
|
||||
Returns all elements with the specified tag name if `$idx` is null, or a specific one if `$idx` is a valid index.
|
||||
|
|
@ -0,0 +1,9 @@
|
|||
# get_display_size
|
||||
|
||||
```php
|
||||
get_display_size ( ) : mixed
|
||||
```
|
||||
|
||||
Returns false if the current node is not an image.
|
||||
|
||||
Returns an associative array of two elements - `height` and `width` - that represent the display size of the image.
|
||||
11
lib/sd/manual/docs/api/simple_html_dom_node/hasAttribute.md
Normal file
|
|
@ -0,0 +1,11 @@
|
|||
# hasAttribute
|
||||
|
||||
```php
|
||||
hasAttribute ( string $name ) : bool
|
||||
```
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `name` | Name of the attribute.
|
||||
|
||||
Returns true if the current node has an attribute with the specified name.
|
||||
|
|
@ -0,0 +1,7 @@
|
|||
# hasChildNodes
|
||||
|
||||
```php
|
||||
hasChildNodes ( ) : bool
|
||||
```
|
||||
|
||||
This is a wrapper function for [`has_child`](../has_child/).
|
||||
17
lib/sd/manual/docs/api/simple_html_dom_node/hasClass.md
Normal file
|
|
@ -0,0 +1,17 @@
|
|||
# hasClass
|
||||
|
||||
```php
|
||||
hasClass ( string $class ) : bool
|
||||
```
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `class` | Specifies the class name to search for.
|
||||
|
||||
Returns true if the current node has the specified class name.
|
||||
|
||||
**Examples**
|
||||
|
||||
```php
|
||||
$node->hasClass('article');
|
||||
```
|
||||
7
lib/sd/manual/docs/api/simple_html_dom_node/has_child.md
Normal file
|
|
@ -0,0 +1,7 @@
|
|||
# has_child
|
||||
|
||||
```php
|
||||
has_child ( ) : bool
|
||||
```
|
||||
|
||||
Returns true if the current node has one or more child nodes.
|
||||
7
lib/sd/manual/docs/api/simple_html_dom_node/innertext.md
Normal file
|
|
@ -0,0 +1,7 @@
|
|||
# innertext
|
||||
|
||||
```php
|
||||
innertext ( ) : string
|
||||
```
|
||||
|
||||
Returns the inner text (everything inside the opening and closing tags) of the current node.
|
||||
11
lib/sd/manual/docs/api/simple_html_dom_node/is_utf8.md
Normal file
|
|
@ -0,0 +1,11 @@
|
|||
# is_utf8 (static)
|
||||
|
||||
```php
|
||||
is_utf8 ( string $str ) : bool
|
||||
```
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `str` | String to test.
|
||||
|
||||
Returns true if the provided string is a valid UTF-8 string.
|
||||
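One common way to test a string for valid UTF-8 in plain PHP is to let PCRE validate it. The sketch below uses that idiom; `looks_like_utf8` is a hypothetical name, and the library's own check may work differently.

```php
<?php
// Hypothetical helper: validates UTF-8 by way of PCRE's /u modifier.
function looks_like_utf8(string $str): bool
{
    // With /u, PCRE validates the subject as UTF-8 before matching;
    // preg_match() returns false (not 0) when the subject is invalid.
    // The empty pattern matches any valid subject, including "".
    return preg_match('//u', $str) === 1;
}

var_dump(looks_like_utf8("Gr\xC3\xBC\xC3\x9Fe"));  // valid two-byte sequences
var_dump(looks_like_utf8("\xC3\x28"));             // broken two-byte sequence
```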
7
lib/sd/manual/docs/api/simple_html_dom_node/lastChild.md
Normal file
|
|
@ -0,0 +1,7 @@
|
|||
# lastChild
|
||||
|
||||
```php
|
||||
lastChild ( ) : object
|
||||
```
|
||||
|
||||
This is a wrapper for [`last_child`](../last_child/).
|
||||
|
|
@ -0,0 +1,7 @@
|
|||
# last_child
|
||||
|
||||
```php
|
||||
last_child ( ) : object
|
||||
```
|
||||
|
||||
Returns the last child of the current node or null if the current node has no child elements.
|
||||
7
lib/sd/manual/docs/api/simple_html_dom_node/makeup.md
Normal file
|
|
@ -0,0 +1,7 @@
|
|||
# makeup
|
||||
|
||||
```php
|
||||
makeup ( ) : string
|
||||
```
|
||||
|
||||
Returns the HTML representation of the current node.
|
||||
19
lib/sd/manual/docs/api/simple_html_dom_node/match.md
Normal file
|
|
@ -0,0 +1,19 @@
|
|||
# match (protected)
|
||||
|
||||
```php
|
||||
match (
|
||||
string $exp
|
||||
, string $pattern
|
||||
, string $value
|
||||
, string $case_sensitivity
|
||||
) : bool
|
||||
```
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `exp` | Expression
|
||||
| `pattern` | Pattern
|
||||
| `value` | Value
|
||||
| `case_sensitivity` | Case sensitivity
|
||||
|
||||
Matches a single attribute value against the specified attribute selector. See also [`find`](../find/).
|
||||
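How the attribute selector operators listed for `find` compare a pattern with an attribute value can be sketched as below. `attr_matches` is a hypothetical helper that follows the CSS Selectors definitions. Its signature and its ASCII-only case folding are assumptions of this sketch and do not describe the protected `match` method itself.

```php
<?php
// Hypothetical helper following the CSS attribute-selector definitions.
function attr_matches(string $operator, string $pattern, string $value, bool $ignoreCase = false): bool
{
    if ($ignoreCase) {
        // Simplification: strtolower() only folds single-byte letters.
        $pattern = strtolower($pattern);
        $value = strtolower($value);
    }
    switch ($operator) {
        case '=':   // exact match
            return $value === $pattern;
        case '^=':  // value starts with pattern
            return $pattern !== '' && strncmp($value, $pattern, strlen($pattern)) === 0;
        case '$=':  // value ends with pattern
            return $pattern !== '' && substr($value, -strlen($pattern)) === $pattern;
        case '*=':  // value contains pattern
            return $pattern !== '' && strpos($value, $pattern) !== false;
        case '~=':  // pattern is one of the whitespace-separated words
            return $pattern !== '' && in_array($pattern, preg_split('/\s+/', trim($value)), true);
        case '|=':  // value is pattern, or starts with pattern followed by "-"
            return $value === $pattern
                || strncmp($value, $pattern . '-', strlen($pattern) + 1) === 0;
    }
    return false;   // unknown operator
}

var_dump(attr_matches('~=', 'new', 'article new'));  // word match
var_dump(attr_matches('|=', 'en', 'en-US'));         // language-prefix match
```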
|
|
@ -0,0 +1,7 @@
|
|||
# nextSibling
|
||||
|
||||
```php
|
||||
nextSibling ( ) : object
|
||||
```
|
||||
|
||||
This is a wrapper for [`next_sibling`](../next_sibling/).
|
||||
|
|
@ -0,0 +1,7 @@
|
|||
# next_sibling
|
||||
|
||||
```php
|
||||
next_sibling ( ) : object
|
||||
```
|
||||
|
||||
Returns the next sibling of the current node or null if the current node has no next sibling.
|
||||
7
lib/sd/manual/docs/api/simple_html_dom_node/nodeName.md
Normal file
|
|
@ -0,0 +1,7 @@
|
|||
# nodeName
|
||||
|
||||
```php
|
||||
nodeName ( ) : string
|
||||
```
|
||||
|
||||
Returns the name of the current node (tag name).
|
||||
7
lib/sd/manual/docs/api/simple_html_dom_node/outertext.md
Normal file
|
|
@ -0,0 +1,7 @@
|
|||
# outertext
|
||||
|
||||
```php
|
||||
outertext ( ) : string
|
||||
```
|
||||
|
||||
Returns the outer text (everything including the opening and closing tags) of the current node.
|
||||
12
lib/sd/manual/docs/api/simple_html_dom_node/parent.md
Normal file
|
|
@ -0,0 +1,12 @@
|
|||
# parent
|
||||
|
||||
```php
|
||||
parent ( [ object $parent = null ] ) : object
|
||||
```
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `parent` | The parent node
|
||||
|
||||
* Returns the parent node of the current node if `$parent` is null.
|
||||
* Sets the parent node of the current node if `$parent` is not null. In this case the current node is automatically added to the list of nodes in the parent node.
|
||||
|
|
@ -0,0 +1,7 @@
|
|||
# parentNode
|
||||
|
||||
```php
|
||||
parentNode () : object
|
||||
```
|
||||
|
||||
Returns the parent of the current node.
|
||||
|
|
@ -0,0 +1,11 @@
|
|||
# parse_selector (protected)
|
||||
|
||||
```php
|
||||
parse_selector ( string $selector_string ) : array
|
||||
```
|
||||
|
||||
| Parameter | Description
|
||||
| --------- | -----------
|
||||
| `selector_string` | The selector string
|
||||
|
||||
Parses a CSS selector into an internal format for further use. See also [`find`](../find/).
|
||||
|
|
@ -0,0 +1,7 @@
|
|||
# prevSibling
|
||||
|
||||
```php
|
||||
prevSibling ( ) : object
|
||||
```
|
||||
|
||||
This is a wrapper for [`prev_sibling`](../prev_sibling/).
|
||||
|
|
@ -0,0 +1,7 @@
|
|||
# prev_sibling
|
||||
|
||||
```php
|
||||
prev_sibling ( ) : object
|
||||
```
|
||||
|
||||
Returns the previous sibling of the current node, or null if the current node has no previous sibling.
|
||||
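Sibling lookup can be pictured as an index shift within the parent's list of children. The sketch below assumes a plain, zero-indexed array stands in for that list; `sibling_at` is a hypothetical helper, not the library's implementation.

```php
<?php
// Hypothetical helper: $children stands in for the parent's child list.
function sibling_at(array $children, $node, int $offset)
{
    // Locate the node in its parent's list (strict comparison).
    $index = array_search($node, $children, true);
    if ($index === false) {
        return null;   // the node is not in this list
    }
    $target = $index + $offset;
    // A missing position (before the first or after the last child) gives null.
    return isset($children[$target]) ? $children[$target] : null;
}

$children = array('<h1>', '<p>', '<ul>');
var_dump(sibling_at($children, '<p>', 1));    // next sibling
var_dump(sibling_at($children, '<p>', -1));   // previous sibling
var_dump(sibling_at($children, '<h1>', -1));  // no previous sibling
```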
Some files were not shown because too many files have changed in this diff