html parsing with php DOMDocument

Stack Overflow · ghbarratt · popularity 162 · 0 comments

1. Original question: html parsing with php DOMDocument

I'm trying to extract content from a forum. I want to get the links of all topics that have more than one page. This is the topic format:




And this is the topic format if it has more than one page:




[Ir à página: 1 2 3 4 5]

I want to get the id (_t_1594517) of those topics with 5 or more pages. How can I do that? This is what I was trying, but I got lost and I didn't understand the DOMDocument documentation very well. I'm new to programming and PHP, so any help is appreciated:

loadHTML($url + $page);

foreach($html->getElementsByTagName('td') as $td)
{
    if($td->hasAttributes())
    {
        if($td->getAttribute('align') == "left")
        {
            $div = $td->getElementsByTagName('div');
            if($div->hasAttributes())
            {
                if($td->getAttribute('class') == "topicos")
                {
                    $a = $td->getElementsByTagName('a');
                    {
                        if($a->hasAttributes())
                        {
                            /*$return['link'][] =*/ echo $a->getElementById('href')->tagName;
                        }
                    }
                }
            }
        }
    }
}   
}
?>

2. Accepted answer

There is no accepted answer yet; please refer to the other answers below.

3. Other highly voted answers

3.1. Answer 1

I think XPath can help you.

If $with_links holds the HTML content with the 5 paging links, then:

$doc = new DOMDocument();
$doc->loadHTML($with_links);
$xpath = new DOMXPath($doc);

// Select the href attribute of every quick-paging link whose URL contains "_t_"
$quick_paging_links = $xpath->query('//span[@class="quickPaging"]/a[contains(@href,"_t_")]/@href');
if ($quick_paging_links->length > 4)
{
  // Skip the first character of the href and keep everything before the "?"
  $first_href = $quick_paging_links->item(0)->value;
  $id = substr($first_href, 1, strpos($first_href, '?') - 1);
  echo 'Topic with id '.$id.' has '.$quick_paging_links->length." links.\n";
}

will produce the output:

Topic with id _t_1594517 has 5 links.
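
The snippet above handles a single topic. To cover a whole listing page, the same idea could be applied once per paging block. The following is only a sketch under stated assumptions: the sample markup, the `?start=` query parameter, the function name `topicsWithManyPages`, and the `$minPages` parameter are all hypothetical, since the forum's real HTML is not shown here. Only the `quickPaging` class and the `_t_` href pattern come from the XPath expression above.

```php
<?php
// Hypothetical listing markup (an assumption; the forum's real HTML is not shown).
$html = <<<'HTML'
<table>
  <tr><td class="topicos"><a href="/_t_1111111">Short topic</a></td></tr>
  <tr><td class="topicos"><a href="/_t_2222222">Medium topic</a>
    <span class="quickPaging">
      <a href="/_t_2222222?start=0">1</a><a href="/_t_2222222?start=15">2</a>
    </span></td></tr>
  <tr><td class="topicos"><a href="/_t_1594517">Long topic</a>
    <span class="quickPaging">
      <a href="/_t_1594517?start=0">1</a><a href="/_t_1594517?start=15">2</a>
      <a href="/_t_1594517?start=30">3</a><a href="/_t_1594517?start=45">4</a>
      <a href="/_t_1594517?start=60">5</a>
    </span></td></tr>
</table>
HTML;

// Return the ids of topics whose paging block has at least $minPages links.
function topicsWithManyPages(string $html, int $minPages = 5): array
{
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $ids = [];

    // Visit each paging block separately so links are counted per topic.
    foreach ($xpath->query('//span[@class="quickPaging"]') as $span) {
        // Relative query: only the links inside this particular span.
        $hrefs = $xpath->query('a[contains(@href, "_t_")]/@href', $span);
        if ($hrefs->length < $minPages) {
            continue;
        }
        // Pull the "_t_<digits>" id out of the first href.
        if (preg_match('/_t_\d+/', $hrefs->item(0)->value, $match)) {
            $ids[] = $match[0];
        }
    }
    return $ids;
}

print_r(topicsWithManyPages($html));
```

With the sample markup above, only `_t_1594517` is returned, because it is the only topic with five paging links. Note that `loadHTML()` expects a string of markup, not a URL; for a remote page, `DOMDocument::loadHTMLFile()` or fetching the page first (for example with `file_get_contents()`) would be needed. PHP also concatenates strings with `.` rather than `+`, so building the address as `$url . $page` is probably what the question intended.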
