I have this HTML on my website (http://testsite.com/test.php):
<div class="first">
<div class="second">
<a href="/test.PHP">click</a>
<span>back</span>
</div>
</div>
<div class="first">
<div class="second">
<a href="/test.PHP">click</a>
<span>back</span>
</div>
</div>
I want to get:
<div class="first">
<div class="second">
<a href="/test.PHP">click</a>
</div>
</div>
<div class="first">
<div class="second">
<a href="/test.PHP">click</a>
</div>
</div>
So I want to remove the span. I am using Goutte in Symfony2, based on http://symfony.com/doc/current/components/dom_crawler.html:
use Goutte\Client;

$client = new Client();
$crawler = $client->request('GET', 'http://testsite.com/test.php');
$crawler->filter('.first .second')->each(function ($node) {
    // ?????? how do I remove the <span> here?
});
Solution: As explained in the docs:
The DomCrawler component eases DOM navigation for HTML and XML documents.
And:
While possible, the DomCrawler component is not designed for manipulation of the DOM or re-dumping HTML/XML.
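DomCrawler is designed to extract details from DOM documents, not to modify them.
However…
Since PHP passes objects by reference, and a Crawler is basically a wrapper around DOMNodes, it is technically possible to modify the underlying DOM document: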
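use Symfony\Component\DomCrawler\Crawler;

// will remove all span nodes inside .second nodes
$crawler->filter('.first .second span')->each(function (Crawler $crawler) {
    // each Crawler wraps the matched DOMNode(s); detach them from their parent
    foreach ($crawler as $node) {
        $node->parentNode->removeChild($node);
    }
});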
Here is a working example: https://gist.github.com/jakzal/8dd52d3df9a49c1e5922
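To see the same idea without Goutte or a network request, here is a minimal sketch that uses only PHP's built-in DOM extension (DOMDocument and DOMXPath) rather than DomCrawler. It applies the same parentNode->removeChild() call to the underlying nodes and then dumps the remaining markup. The "wrapper" id, the variable names, and the XPath expression (a rough stand-in for the CSS selector ".first .second span", assuming each element carries exactly one class) are illustrative assumptions, not part of the original answer.

```php
<?php
// Sample markup copied from the question (two identical blocks).
$html = <<<HTML
<div class="first">
<div class="second">
<a href="/test.PHP">click</a>
<span>back</span>
</div>
</div>
<div class="first">
<div class="second">
<a href="/test.PHP">click</a>
<span>back</span>
</div>
</div>
HTML;

$doc = new DOMDocument();
// Hypothetical wrapper element: gives the fragment a single root to dump from.
$doc->loadHTML('<div id="wrapper">' . $html . '</div>');

$xpath = new DOMXPath($doc);
// Assumed XPath equivalent of ".first .second span" for single-class elements.
$spans = $xpath->query('//div[@class="first"]//div[@class="second"]//span');

// Copy the matches into an array first, then detach each span from its parent.
foreach (iterator_to_array($spans) as $span) {
    $span->parentNode->removeChild($span);
}

// Serialize only the children of the wrapper, skipping the html/body
// elements that loadHTML() adds around the fragment.
$wrapper = $xpath->query('//*[@id="wrapper"]')->item(0);
$result = '';
foreach ($wrapper->childNodes as $child) {
    $result .= $doc->saveHTML($child);
}

echo $result;
```

With DomCrawler the removal step is the same; only the node lookup (filter() with a CSS selector) and the way you get back to the owning DOMDocument differ.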