Well, now you can scrape your own simple phpBB, as long as you keep it in the [phpBB3]* folder in your site root. If you host on something like iPage, change the few settings I note below and remote fetching (curl, or file_get_contents as used here) will work. The script pulls your main topic areas, then reads each area's page and pulls all of its entries. So yes, it only goes 2 levels deep, but you'll learn how to do it here.
files here
_files/phpbb_scraper_sc.rar
my setup from home
win 10x64
apache apache2_4_33x64-vc15
php7211x64vc15
-curl enabled (on windows, turn it on by removing the ; from ;extension=curl in php.ini; no .dll suffix is needed in php 7.2 and later)
-allow_url_fopen=on (in your php.ini)
-remember to add the php directory (and php's /bin directory) to your windows PATH environment variable if you haven't yet
my setup on ipage (01/12/2021)
php 7.2x
-allow_url_fopen=on (in your php.ini)
* /phpBB3/ is where my forum lives, directly under my site root. If yours is somewhere else, change the $base variable in my code to your path.
**** do NOT remove the ; from the module lines in ipage's php.ini. It doesn't change anything; run your own phpinfo() test to see.
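If you'd rather check the two settings above from php itself than dig through phpinfo, here's a minimal sketch. The function name checkScrapeSetup is made up for this example and is not in the download.

```php
<?php
// hypothetical helper (not part of the original files):
// reports the two settings the scraper setup above depends on
function checkScrapeSetup()
{
    // ini_get() hands back a string like "1", "On", "0" or "", or false if the setting is unknown
    $rawFopen = ini_get('allow_url_fopen');
    $fopenOn = filter_var($rawFopen, FILTER_VALIDATE_BOOLEAN);
    return array(
        'allow_url_fopen' => $fopenOn,
        'curl' => extension_loaded('curl'),
    );
}

foreach (checkScrapeSetup() as $name => $isOn) {
    echo $name . " is " . ($isOn ? "on" : "OFF") . "<br />\n";
}
```

Run it once at home and once on the host; if allow_url_fopen shows OFF on the host, that's the php.ini line to change.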
So, I chopped up some online tutorials and came up with this very long, super basic method. In the chrome browser, right click the title of a topic in your phpBB and, with the chrome plugin called SCRAPER installed, choose [scrape similar...]. I then read up on xpath, and the xpath that scraper generated gave me a basic pull. There were two columns (the link text and the link url), and in php the url is read from the [a] element's [href] attribute, which I figured out from what scraper showed as a row in its output.
I also saved the xpath as a preset inside scraper's interface and named it "topics", just in case I forget how to do this in chrome. Very nice feature from them. I then looked at a topic page and right clicked an entry title. The xpath was different (because the dom is different there), so I saved that scraper search as well.
main page topic xpath
Code: Select all
'//div[1]/div/ul[2]/li/dl/dt/div/a'
topic page entry xpath
Code: Select all
'//dt/div/a'
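To see the xpath part by itself without hitting a live forum, here's a minimal sketch that runs the second query against a small made up chunk of html. The sample markup and the pullLinks helper are assumptions for illustration only; a real phpBB page has a lot more around it.

```php
<?php
// made up sample markup, shaped like the entry rows a phpBB theme prints (dt > div > a)
$sampleHtml = '
<html><body>
<dl><dt><div><a href="./viewforum.php?f=3">General</a></div></dt></dl>
<dl><dt><div><a href="./viewforum.php?f=7">Projects</a></div></dt></dl>
</body></html>';

// hypothetical helper: returns one array(text, href) per link the xpath matches
function pullLinks($html, $xpathStr)
{
    libxml_use_internal_errors(true); // keep html parser warnings off the page
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);
    $rows = array();
    foreach ($xpath->query($xpathStr) as $node) {
        $rows[] = array(
            'text' => trim($node->nodeValue),       // the visible link text
            'href' => $node->getAttribute('href'),  // the link target
        );
    }
    return $rows;
}

foreach (pullLinks($sampleHtml, '//dt/div/a') as $row) {
    echo $row['text'] . " -> " . $row['href'] . "\n";
}
```

Swap in the longer main page xpath and your own fetched html once this much works.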
test_board_index.php
Code: Select all
<?php
/*
test_board_index.php v1.0 by kristoffe brodeur. ©2021 All Rights Reserved.
01-11-2021
simple program to scrape the board indexes
of the main area of a phpbb with [Lucid Lime] theme
*/
$to_root="./";
$base="http://www.supercala.net/phpBB3/";//allows us to use the scraped data to make links to the forum page areas
$url=$base."index.php";
$html=file_get_contents($url);
//echo "scraping away v1.0 by kristoffe<hr />";
$phpbbMain_doc=new DomDocument();
libxml_use_internal_errors(TRUE);//disable libxml errors
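$bbStr="";
$topMenuStr="";
//counters start at 0 out here so the html at the bottom can always echo them, even if the fetch fails
$lenT=0;
$lenSS=0;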
//
if(!empty($html))
{
//echo "cool we got the website as data<hr />";
$phpbbMain_doc->loadHTML($html);
libxml_clear_errors();//remove any of the weird html errors etc we don't need
$phpbbMain_xpath=new DOMXPath($phpbbMain_doc);
$phpbbMain_topics=$phpbbMain_xpath->query('//div[1]/div/ul[2]/li/dl/dt/div/a');
$lenT=0;
$lenSS=0;
//
if($phpbbMain_topics->length >0)
{
//echo "[let's look into an object node for it's parts now]<br/>";
$lenT=$phpbbMain_topics->length;//->length works everywhere, count() on a DOMNodeList needs php 7.2+
$pagePos=-1;
//
foreach($phpbbMain_topics as $row)
{
$pagePos++;
$a_textCol=$row->nodeValue;
$a_hrefCol=$row->getAttribute("href");
//get rid of the two leading characters (./) in the returned ./viewforum.php?f= etc, since $base already ends with a /
$aStr=(strpos($a_hrefCol,"./")===0)?substr($a_hrefCol,2):$a_hrefCol;
/*
use the $base not the $url so it starts from the phpbb forum folder root as the ^ link goes to on the page(s)
pretend we're at the scraped url and going to the link as if we were locally there (./)
*/
$a_hrefLink=$base.$aStr;
//now let's get the variables (queries) from both ? and & in the href string phpbb outputs to each area
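$queryArr=parse_url($a_hrefCol);
//reset once per topic row, not once per url part, so a part that comes after "query" can't wipe the result
$subPageStr="";
//parse_url() returns false on a badly broken url, so fall back to an empty array
if(!is_array($queryArr)){$queryArr=array();}
//
foreach($queryArr as $key=>$val)
{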
//echo "[$key][$val]<br />";
//
if($key=="query")
{
$tmpQListArr=array();
parse_str($val,$tmpQListArr);
//echo count($tmpQListArr)."<br />";
$forumNode=-1;
//
foreach($tmpQListArr as $key2=>$val2)
{
//
switch($key2)
{
case("f"):
$forumNode=$val2;
//echo "[forum id]$forumNode<br />";
break;
case("sid"):
break;
default:
break;
}
}
//
if($forumNode!=-1)
{
/*
test with $pagePos==0 to save time and debug if one loop of doSubArea(#) will work
test with $pagePos<$lenT for all of them.
it might be better to call this with javascript and populate areas with a 1 second wait
instead of a really long pause. maybe
html->
jquery php that does this with a result->
add to area on screen per main area with sub-areas via DOM js->
wait 1s->
loop->
*/
//
if($pagePos<$lenT)
{
$subPageStr=doSubAreaPage($forumNode);
//echo "!!!!! [$subPageStr] !!!!! <br />";
}
}
}
/*
now the forum areas have titles and links to all of their listed nodes too
time to loop but with a different target per loop on the xpath found with [scraper] and chrome (I'm learning)
*/
}
$topMenuStr.="
<div class='bb_areaButton'><a href='#area".$pagePos."'>".$a_textCol."</a></div>
";
$bbStr.="
<div class='bb_page' id='area".$pagePos."'>
<div class='dataBox_m'>
$a_textCol
</div>
<div class='dataBox_l'>
<a href='$a_hrefLink'>$a_hrefCol</a>
</div>
<div class='clearBoth'></div>
$subPageStr
</div>
";
}
}
}
//
function doSubAreaPage($sentArea)
{
global $base;
global $lenSS;
$sub_url=$base."viewforum.php?f=".$sentArea;
//echo "$sub_url<br />";
$sub_html=file_get_contents($sub_url);
$phpbbSub_doc=new DomDocument();
//maybe only declare this once, not each time per php page (done in page root)
//libxml_use_internal_errors(TRUE);
$sub_bbStr="";
//
if(!empty($sub_html))
{
$phpbbSub_doc->loadHTML($sub_html);
libxml_clear_errors();
$phpbbSub_xpath=new DOMXPath($phpbbSub_doc);
/*
new query, the template puts the areas in a different level of the DOM structure
found with scraper by just right clicking the title of the sub area entry and choosing [scrape similar...]
lazy, but it's a lot to learn at once
*/
$phpbbSub_topics=$phpbbSub_xpath->query('//dt/div/a');
//echo "subTopic areas found with xpath [".count($phpbbSub_topics)."]<br />";
$lenST=0;
//
if($phpbbSub_topics->length >0)
{
//echo "sub topic page[$sentArea] so far so good!<br />";
$pagePos=-1;
//
foreach($phpbbSub_topics as $row)
{
$pagePos++;
$a_textCol=$row->nodeValue;
$a_hrefCol=$row->getAttribute("href");
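$aStr=(strpos($a_hrefCol,"./")===0)?substr($a_hrefCol,2):$a_hrefCol;//drop the leading ./ so the link doesn't get a double slash
$a_hrefLink=$base.$aStr;//$base is already declared global at the top of this function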
$lenSS++;
//
$sub_bbStr.="
<div class='bb_page_sub'>
<div class='dataBox_m'>
$a_textCol
</div>
<div class='dataBox_l'>
<a href='$a_hrefLink'>$a_hrefCol</a>
</div>
<div class='clearBoth'></div>
</div>
";
}
}
}
//
else
{
$sub_bbStr="page load error<br />";
}
return $sub_bbStr;
}
//
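//debug helper, not called anywhere yet. pass in the DOMNodeList from the xpath query, a function can't see page level variables on its own
function showRows($phpbbMain_topics)
{
$lenT=$phpbbMain_topics->length;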
//echo "wow. I found [$lenT] rows of topics! thanks to php xpath and chrome with the 'scraper' plugin<hr />";
//
foreach($phpbbMain_topics as $row)
{
//echo "<div class=''>".$row->nodeValue."</div>";
}
}
//echo "<hr />finished<hr />";
?>
<html>
<head>
<link type="text/css" rel="stylesheet" href="<?php echo $to_root;?>css/page.css" />
</head>
<body>
<div class='dataBox_m'>Forum Main Sections</div>
<div class='dataBox_l'><?php echo $lenT;?></div>
<div class="clearBoth"></div>
<div class='dataBox_m'>Total Postings In All Main Areas</div>
<div class='dataBox_l'><?php echo $lenSS;?></div>
<div class="clearBoth"></div>
<div class="topMenuButtons">
<?php echo $topMenuStr;?>
</div>
<div class="clearBoth"></div>
<?php echo $bbStr;?>
</body>
</html>
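The part of the script that digs the forum id out of each href and builds the full link can be tried on its own. This is only a sketch under my own assumptions: the two helper names, the example href and the example.com base url are made up and are not in the download.

```php
<?php
// hypothetical helper: pulls the forum id (f) out of a phpBB style href, or returns -1 if there isn't one
function forumIdFromHref($href)
{
    $query = parse_url($href, PHP_URL_QUERY); // null when there is no ?..., false on a broken url
    if (!is_string($query)) {
        return -1;
    }
    $vars = array();
    parse_str($query, $vars); // splits f=12&sid=... into an array
    return isset($vars['f']) ? (int)$vars['f'] : -1;
}

// hypothetical helper: turns ./viewforum.php?... into a full link under $base without a double slash
function absoluteLink($base, $href)
{
    if (strpos($href, './') === 0) {
        $href = substr($href, 2);
    }
    return rtrim($base, '/') . '/' . $href;
}

$href = './viewforum.php?f=12&sid=abc123'; // made up example href
$base = 'http://www.example.com/phpBB3/';  // placeholder, use your own forum url

echo forumIdFromHref($href) . "\n";
// escape before printing into html, the scraped values are plain text and can contain & or quotes
echo htmlspecialchars(absoluteLink($base, $href), ENT_QUOTES) . "\n";
```

The htmlspecialchars call is an extra I'd suggest whenever scraped text or links get echoed back into a page; the script above prints them raw.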
css/page.css
Code: Select all
.dataBox_m,.dataBox_l
{
float:left;
padding:4px;
}
.clearBoth
{
clear:both;
}
.dataBox_m
{
width:600px;
}
.bb_page,.bb_page_sub
{
width:100%;
}
.bb_page_sub
{
padding:0px 0px 0px 72px;
background-color:#DDFFDD;
border:solid;
border-color:#55FF55;
border-width:0px 0px 2px 0px;
}
.bb_page_sub:hover
{
background-color:#AAFFAA;
}
.bb_areaButton
{
padding:8px;
margin:2px;
background-color:#00CC00;
color:#FFFFFF;
float:left;
}
ipage test page (checks that the forum can be fetched remotely)
Code: Select all
<?php
$to_root="./";
?>
<html>
<head>
<link type="text/css" rel="stylesheet" href="<?php echo $to_root;?>css/ipage_curl_test.css" />
</head>
<body>
seems that ipage has
<br />
<b>allow_url_fopen=Off</b>
<br />
<br />
by default, so change it in the php.ini on the ipage shared server control panel to
<br />
<b>allow_url_fopen=On</b>
<hr />
<div class='story_half'><img src="ipage_curl_step1.jpg" /></div>
<div class='story_half'><img src="ipage_curl_step2.jpg" /></div>
<div class='clearBoth'></div>
<div class='story_half'><img src="ipage_curl_step3.jpg" /></div>
<div class='story_half'><img src="ipage_curl_step4.jpg" /></div>
<div class='clearBoth'></div>
<?php
/*
test_board_index.php v1.0 by kristoffe brodeur. ©2021 All Rights Reserved.
01-11-2021
simple program to scrape the board indexes
of the main area of a phpbb with [Lucid Lime] theme
seems that ipage has
allow_url_fopen=Off
by default, so change it in the php.ini on the ipage shared server control panel to
allow_url_fopen=On
*/
echo "testing curl on ipage<br />";
$base="http://www.supercala.net/phpBB3/";//allows us to use the scraped data to make links to the forum page areas
$url=$base."index.php";
$html=file_get_contents($url);
echo "scraping away v1.0 by kristoffe<hr />";
$phpbbMain_doc=new DomDocument();
libxml_use_internal_errors(TRUE);//disable libxml errors
$bbStr="";
//
if(!empty($html))
{
echo "cool we got the website as data<hr />";
}
?>
</body>
</html>
css/ipage_curl_test.css
Code: Select all
.clearBoth
{
clear:both;
}
.story_half
{
width:47%;
float:left;
border:solid;
border-width:2px;
margin:4px;
border-color:#AAFFAA;
}
.story_half img,.story_half a img
{
width:100%;
}
@media screen and (min-width:400px) and (max-width:1000px)
{
.story_half
{
width:100% !important;
}
}
curl check script
Code: Select all
<?php
// Script to test if the CURL extension is installed on this server
// Define function to test
function _is_curl_installed() {
// extension_loaded() is php's built in check for a single extension
return extension_loaded('curl');
}
// Output text to user based on test
if (_is_curl_installed()) {
echo "cURL is <span style=\"color:blue\">installed</span> on this server";
} else {
echo "cURL is NOT <span style=\"color:red\">installed</span> on this server";
}
?>
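The scraper above fetches pages with file_get_contents, which is why allow_url_fopen matters. If curl shows as installed, the same fetch can go through curl instead. Here's a minimal sketch of that idea; the fetchPage name, the 10 second timeout and the example url are my own assumptions, not something from the download.

```php
<?php
// hypothetical helper: fetch a page with curl when it is loaded, else fall back to file_get_contents
// returns the page as a string, or null when the fetch fails
function fetchPage($url, $timeoutSec = 10)
{
    if (extension_loaded('curl')) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // hand the body back instead of printing it
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSec); // don't hang forever on a slow host
        $body = curl_exec($ch);
        return ($body === false) ? null : $body;
    }
    // no curl: this path needs allow_url_fopen=On for http urls
    $body = @file_get_contents($url);
    return ($body === false) ? null : $body;
}

// usage (placeholder url, swap in your own forum):
// $html = fetchPage('http://www.example.com/phpBB3/index.php');
// echo ($html === null) ? "page load error<br />" : "got " . strlen($html) . " bytes<br />";
```

Dropping this in place of the file_get_contents calls would be one way to keep the scraper working on a host where allow_url_fopen can't be switched on but curl is loaded.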
http://www.supercala.net/sites/phpbb_sc ... _step1.jpg
http://www.supercala.net/sites/phpbb_sc ... _step2.jpg
http://www.supercala.net/sites/phpbb_sc ... _step3.jpg
http://www.supercala.net/sites/phpbb_sc ... _step4.jpg