Content Crawler in PHP: How to Create a Web Crawler Using PHP

Before getting started, I'll give a quick summary of web scraping. Web scraping means extracting information from the HTML of a web page. Web scraping with PHP is no different from web scraping with any other programming language or tool.

This article shows how a beginner can build a simple web crawler in PHP. If you plan to learn PHP and use it for web scraping, follow the steps below.




Web Crawler in PHP


Step 1. 

Add an input box and a submit button to the web page. We enter the address of the page to crawl into the input box.
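A minimal sketch of this step, assuming the input box is named "url" (the field name and both function names are hypothetical). It renders the form and reads the submitted address back from the POST data, accepting only http and https URLs. The check matters because file_get_contents() will happily read a local file path if one is typed into the box.

```php
<?php
// Hypothetical helpers for Step 1: render the form and read the submitted URL.
// The input name "url" is an assumption, not taken from the original article.

function render_url_form(): string
{
    return '<form method="post">'
        . '<input type="text" name="url" placeholder="https://example.com/">'
        . '<input type="submit" value="Crawl">'
        . '</form>';
}

// Return the submitted URL if it is a valid http(s) address, otherwise null.
function get_submitted_url(array $post): ?string
{
    $url = trim($post['url'] ?? '');
    if ($url === '' || filter_var($url, FILTER_VALIDATE_URL) === false) {
        return null;
    }
    // Reject other schemes such as ftp:// or file://
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return in_array($scheme, ['http', 'https'], true) ? $url : null;
}
```

In the page itself you would call echo render_url_form(); and then $url = get_submitted_url($_POST); before fetching anything.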



Step 2. 

Regular expressions are needed when extracting data.


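function preg_substr($start, $end, $str) // Regular expression
{
    // Split once on the start pattern; the text after it is in $temp[1]
    $temp = preg_split($start, $str, 2);
    if (count($temp) < 2) return "";
    // Split the rest once on the end pattern; the text before it is in $content[0]
    $content = preg_split($end, $temp[1], 2);
    return $content[0];
}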
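The same extraction can also be done with a single preg_match() call and a lazy capture group instead of two preg_split() calls. This is a sketch of that alternative technique; the function name extract_between_regex is hypothetical, and the patterns are passed without delimiters, so they must not contain an unescaped "~".

```php
<?php
// Hypothetical alternative to preg_substr(): one regex with a capture group.
function extract_between_regex(string $startPattern, string $endPattern, string $html): ?string
{
    // (.*?) is lazy, so it stops at the first match of $endPattern;
    // the "s" modifier lets "." match newlines as well.
    $regex = '~' . $startPattern . '(.*?)' . $endPattern . '~s';
    return preg_match($regex, $html, $m) === 1 ? $m[1] : null;
}
```

Returning null when nothing matches makes a missing element easy to tell apart from an element that is present but empty.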

Step 3.

String splitting can also be used to extract data.


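function str_substr($start, $end, $str) // string split
{
    // Split once on the start marker; the text after it is in $temp[1]
    $temp = explode($start, $str, 2);
    if (count($temp) < 2) return "";
    // Split the rest once on the end marker; the text before it is in $content[0]
    $content = explode($end, $temp[1], 2);
    return $content[0];
}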
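A sketch of the same string-based extraction using strpos() and substr(), which locates the two markers directly instead of building intermediate arrays. The function name extract_between is hypothetical; it returns null when either marker is missing.

```php
<?php
// Hypothetical alternative to str_substr(): find the markers by position.
function extract_between(string $start, string $end, string $str): ?string
{
    $from = strpos($str, $start);
    if ($from === false) {
        return null;
    }
    $from += strlen($start);           // skip past the start marker
    $to = strpos($str, $end, $from);   // search for the end marker after it
    if ($to === false) {
        return null;
    }
    return substr($str, $from, $to - $from);
}
```

For example, extract_between('var imageSrc = "', '"', $str) pulls out the same image URL that Step 6 extracts with str_substr().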


Step 4.

Add a function to save the extracted content:

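function writelog($str)
{
    // Open log.txt in append mode so each call adds to the end of the file
    $open = fopen("log.txt", "a");
    fwrite($open, $str);
    fclose($open);
}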

When the extracted content does not match what the browser displays, we cannot work out the correct regular expressions from the browser view. In that case, open the saved .txt file and look for the correct string there.
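A variant of the logging function, sketched under the assumption that you want one timestamped entry per line. file_put_contents() with FILE_APPEND opens, writes and closes the file in one call, and LOCK_EX keeps concurrent requests from interleaving their writes. The name write_log and the $path parameter are hypothetical.

```php
<?php
// Hypothetical variant of writelog(): timestamped, locked append.
function write_log(string $str, string $path = 'log.txt'): bool
{
    $line = '[' . date('Y-m-d H:i:s') . '] ' . $str . PHP_EOL;
    return file_put_contents($path, $line, FILE_APPEND | LOCK_EX) !== false;
}
```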

Step 5.

If you also need to download images, you will need another function.

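// Required parameters come first; optional ones after (PHP 8 deprecates the reverse)
function getImage($url, $dirName, $fileType, $filename = "", $type = 0)
{
    if ($url == "") return false;

    // get the default file name
    $defaultFileName = basename($url);

    // file type: only download extensions listed in $fileType
    $suffix = substr(strrchr($url, "."), 1);
    if (!in_array($suffix, $fileType)) return false;

    // set the file name
    $filename = $filename == "" ? time().rand(0,9).".".$suffix : $defaultFileName;

    // get the remote file resource: cURL when $type is truthy, readfile() otherwise
    if ($type) {
        $ch = curl_init();
        $timeout = 5;
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
        $file = curl_exec($ch);
        curl_close($ch);
    } else {
        ob_start();
        readfile($url);
        $file = ob_get_contents();
        ob_end_clean();
    }

    // set file path: $dirName/YYYY/MM/DD/
    $dirName = $dirName."/".date("Y", time())."/".date("m", time())."/".date("d", time())."/";
    if (!file_exists($dirName)) {
        mkdir($dirName, 0777, true);
    }

    // save file (binary mode, overwriting any existing file with the same name)
    $res = fopen($dirName.$filename, "wb");
    fwrite($res, $file);
    fclose($res);

    return $dirName.$filename;
}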
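The file-type check and the dated directory path in getImage() can be tested without any network access if they are pulled into a pure function. This is a sketch under that assumption; build_image_path and its $timestamp parameter are hypothetical. Unlike the strrchr() approach, it reads the extension from the URL path only, so a query string such as "?v=2" does not break the check.

```php
<?php
// Hypothetical helper mirroring getImage()'s path logic.
function build_image_path(string $url, string $dirName, array $fileTypes, int $timestamp): ?string
{
    // Use only the path part of the URL, e.g. "/img/a.jpg" from "a.jpg?v=2"
    $path = (string) parse_url($url, PHP_URL_PATH);
    $suffix = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    if ($suffix === '' || !in_array($suffix, $fileTypes, true)) {
        return null;
    }
    // Same layout as getImage(): dir/YYYY/MM/DD/
    $dir = rtrim($dirName, '/') . '/' . date('Y/m/d', $timestamp) . '/';
    return $dir . basename($path);
}
```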

Step 6.

Now we write the extraction code. Let's take an Amazon product page as an example: enter a product link.



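// "url" is assumed to be the name of the input box from Step 1
$str = file_get_contents($_POST["url"]);
$str = mb_convert_encoding($str, "utf-8", "iso-8859-1");

//echo $str;

// assumed pattern for the title element; adjust it to the target page's HTML
echo "Title: " . preg_substr('/<span id="productTitle"[^>]*>/', '/<\/span>/', $str);

$imgurl = str_substr('var imageSrc = "', '"', $str);
// display the extracted image
echo '<img src="' . htmlspecialchars($imgurl) . '">';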
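As an alternative to regular expressions, the fetched HTML can be parsed with PHP's DOMDocument and DOMXPath classes, which cope better with changes in attribute order or whitespace. This is a sketch only: the default XPath query (an element with id="productTitle") is an assumption about the page's markup, and extract_title is a hypothetical name.

```php
<?php
// Hypothetical DOM-based title extraction (requires the dom extension).
function extract_title(string $html, string $xpathQuery = '//*[@id="productTitle"]'): ?string
{
    if (trim($html) === '') {
        return null; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Real pages are rarely valid HTML; collect parser warnings instead of printing them
    $previous = libxml_use_internal_errors(true);
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }
    $nodes = (new DOMXPath($doc))->query($xpathQuery);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}
```

Note that loadHTML() treats input without a charset declaration as ISO-8859-1, so pages containing non-ASCII text may need one before parsing.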



Web Crawling for Non-coders

You don't need to code a web crawler anymore if you use an automatic web crawler.

As mentioned previously, PHP is only one tool for creating a web crawler. Other programming languages, like Python and JavaScript, are also good options for those who are familiar with them. Nowadays, with the development of web scraping technology, more and more web scraping tools, such as Beautiful Soup and Parsehub, are emerging. They simplify the process of creating a web crawler.

Take Web Scraping Templates as an example: it lets anyone scrape data using pre-built templates, with no crawler setup. Simply enter the keywords to search for and get the data instantly.



Spanish version of this article: Crear un Simple web Crawler en PHP. You can also read web scraping articles on the official website.