Broad Network


PHP Search this Website Application that Works

Foreword: This is a PHP search engine program for your own website.

By: Chrysanthus
Date Published: 26 Aug 2018

Introduction

Many of the programs available for searching one's own website do not work. This one is written in PHP and it works. It starts its search from the current working directory and searches that directory and all the directories under it. It searches all the HTML files in those directories for keywords. It is a website search engine.

The application is just one PHP file, which you place in a directory (preferably the home directory).

Code Segments
The file has the following code segments:

- HTML Code Strings

- Global Variables

- Placing of Useful Keywords into Array

- The scanTree() Recursive Function

HTML Code Strings
This code segment has the top and bottom HTML code. In this segment, replace the title, "Searching Title goes Here", with one of your choice.

Global Variables
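The recursive function needs global variables to work. They hold its state between calls, such as the current directory level and the index at which to resume scanning a directory that has already been visited.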

Placing of Useful Keywords into Array
The HTTP POST method is used to send the keyword phrase from the browser to the web server. At the server, this code segment removes the non-keywords such as prepositions. The keywords are placed into an array.
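The two steps described above can be sketched on their own as follows. This is only an illustration, not the article's code: the function name extractKeywords() and the shortened stop-word list are assumptions, and the case-insensitive matching and the use of preg_quote() are choices made for the sketch.

```php
<?php
// Hypothetical helper: removes the non-keywords, then splits what is left into keywords.
function extractKeywords(string $phrase, array $nonKeywords): array
{
    foreach ($nonKeywords as $word) {
        // \b limits the match to whole words; preg_quote() escapes any special characters
        $phrase = preg_replace('/\b' . preg_quote($word, '/') . '\b/i', '', $phrase);
    }
    // Keep "c++" in one piece, otherwise take runs of word characters
    preg_match_all('/c\+\+|\b\w+\b/i', $phrase, $matches);
    return $matches[0];
}

$nonKeywords = array('the', 'of', 'in', 'for', 'and');   // shortened list, for illustration only
$keywords = extractKeywords('the basics of arrays in PHP', $nonKeywords);
print_r($keywords);
```

With the shortened list above, the phrase "the basics of arrays in PHP" is reduced to the keywords basics, arrays and PHP.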

The scanTree() Recursive Function
This code segment searches the directory tree for HTML files (files whose names end in .htm), beginning from the current working directory. It checks each HTML file for the presence of any of the keywords. If your site is big, the search may take some time.
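The idea of walking the tree and testing each file can be shown in a much smaller form. The sketch below is not the scanTree() function of this article: it passes full paths instead of changing directory and it uses no global variables. The names findHtmFiles() and fileHasKeyword() are hypothetical.

```php
<?php
// Hypothetical helper: returns the paths of all .htm files in $dir and below it.
function findHtmFiles(string $dir): array
{
    $found = array();
    $items = scandir($dir);
    if ($items === false) {
        return $found;   // directory could not be read: skip it
    }
    foreach ($items as $item) {
        if ($item === '.' || $item === '..') {
            continue;    // skip the self and parent entries
        }
        $path = $dir . '/' . $item;
        if (is_dir($path)) {
            // go down into the subdirectory and collect its results
            $found = array_merge($found, findHtmFiles($path));
        } elseif (preg_match('/\.htm$/', $item)) {
            $found[] = $path;
        }
    }
    return $found;
}

// Hypothetical helper: true if any of the keywords occurs in the text of the file.
function fileHasKeyword(string $file, array $keywords): bool
{
    $text = file_get_contents($file);
    if ($text === false) {
        return false;    // file could not be read
    }
    foreach ($keywords as $keyword) {
        if (preg_match('/' . preg_quote($keyword, '/') . '/', $text)) {
            return true;
        }
    }
    return false;
}
```

To use it, call findHtmFiles('.') from the directory where the search should begin, and pass each returned path to fileHasKeyword() together with the array of keywords.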

The Complete Program
Here is the complete program. Replace the value of the variable, $domainURL, with the URL of your own domain (for example, http://www.google.com). You are free to copy and modify the complete code and use it for any purpose. The complete code is:


<?php

$pageTop = "<!DOCTYPE HTML>
<html lang='en'>
<head>
    <title>Searching Title goes Here</title>
</head>
<body >
<article>";

$pageBottom = "</article>
</body>
</html>";

    echo $pageTop;  //top of page without data
    echo "<h1>Search Result</h1>";

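    //obtain the search string (empty if the form field was not sent)
    $searchStr = isset($_POST['searchStr']) ? trim($_POST['searchStr']) : '';
    $searchStrArr = array(array());  //keywords; stays empty if there is nothing to search for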

    if ($searchStr == "")
        {
            echo "<strong>Search string is empty!</strong>";
        }
    else
        {
            //remove the non-keywords using regex
            $nonKeywords = array("about", "along", "among", "before", "after", "by", "for", "in", "from", "on", "of", "since", "to", "until", "till", "up", "with", "between", "the", "a", "an", "while", "whereas", "since", "as", "for", "therefore", "but", "and", "or", "I", "you", "he", "she", "we", "they", "me", "him", "her", "us", "them", "my", "your", "his", "her", "our", "their", "mine", "yours", "hers", "ours", "theirs", "some", "few", "many", "much", "little");

            $arrLength = count($nonKeywords); // no. of elements in the array
            $newSearchStr = $searchStr;  //search string after removing non-keywords

            for ($i=0; $i<$arrLength; ++$i)
                {
                    $newSearchStr = preg_replace("/\b$nonKeywords[$i]\b/", "", $searchStr);
                    $searchStr = $newSearchStr;
                }


            //place each word of search string into an array
            $searchStrArr = array();
            preg_match_all("/c\+\+|\b\w+\b/i", $newSearchStr, $searchStrArr);
        }

    $foundSomething = false;
    $dr = '.';
    $level = 0;
    $noItems = 0;
    $us = array(0);  //array of indexes to scan already visited directory
    $u = 0;
    $begin = 0;
    $j = 0;
    $goingUp = false;

    chdir('.');
    $iPath = getcwd();
    $domainURL = 'http://localhost';

    //search entire site
    function scanTree($path, $begin)
        {
            global $foundSomething, $searchStrArr, $level, $begin, $j, $noItems, $us, $u, $goingUp, $iPath, $domainURL;

            $arrDir = scandir($path);
            $noItems = count($arrDir);

            //to end recursion
            if (($level === 0)&&($goingUp === true)&&(count($us) === 0))
                 {
                     $noItems = 0;
                     $begin = 0;
                 }

            if ($begin == $noItems)
                {
                    array_pop($us);  
                    $indx = count($us) - 1;
                    $u = $us[$indx];
                }

            for ($j=$begin; $j<$noItems; ++$j)
                {
                    if ($arrDir[$j] === '.')
                        continue;
                    if (($arrDir[$j] === '..')&&($noItems == 2))
                        {
                            continue;
                        }
                    elseif ($arrDir[$j] === '..')
                        continue;

                    if (is_dir($arrDir[$j]))
                        {

                                    if ($level === 0)
                                        $us[0] = $j + 1;   //reset comeback index for topmost directory

                                    if (($goingUp === true)&&($level !== 0))
                                        {
                                            array_pop($us);
                                            $goingUp = false;
                                        }
                                    elseif ($level === 0)
                                        $goingUp = false;

                                    if (!isset($y))
                                        $y = $j + 1;   //for the very first (top) scan

                                    $currPath = $path . '/' . $arrDir[$j] . '/';
                                    chdir($currPath);
                                    $level = $level + 1;
                                    $u = $j + 1;
                                    $us[] = $u;


                                    $begin = 0;
                                    scanTree(getcwd(), 0);

                        }
                    else
                        {
                            if (preg_match("/\.htm$/", $arrDir[$j]))
                                {
                                    $fileStr = file_get_contents($arrDir[$j]);
                                    $fileStr = preg_replace('/^\s*|\s*$/', '', $fileStr);  #remove leading and trailing whitespaces

                                    $title = array();
                                    preg_match("/<title>.+<\/title>/", $fileStr, $title);
                                    $titl = isset($title[0]) ? $title[0] : '';  //empty if the file has no title
                                    $titl = preg_replace("/<title>/", '<strong>', $titl);
                                    $titl = preg_replace("/<\/title>/", '</strong><br>', $titl);
                                    $keyStrArrD = array();
                                    for ($k=0; $k<count($searchStrArr[0]); ++$k)
                                        {
                                            $keyArr = array();
                                            $keyStrArr = array();
                                            $keyStrArr1 = array();
                                            $keyStrArrNoT = array();
                                            $regex0 = preg_quote($searchStrArr[0][$k], '/');  //escape regex characters, e.g. in c++
                                            if (preg_match_all("/$regex0/", $fileStr, $keyArr))
                                                {
                                                    for ($l=0; $l<count($keyArr[0]); ++$l)
                                                         {
                                                             $regex1 = preg_quote($keyArr[0][$l], '/');
                                                             preg_match_all("/.{0,66}$regex1.{0,66}/", $fileStr, $keyStrArr);
                                                             array_push($keyStrArr1, $keyStrArr[0][$l]);
                                                         }
                                                     for ($m=0; $m<count($keyStrArr1); ++$m)
                                                         {
                                                             $strNoD = preg_replace("/<.+>/", '', $keyStrArr1[$m]);
                                                             $strNoD = preg_replace("/<\/.+>/", '', $strNoD);
                                                             array_push($keyStrArrNoT, $strNoD);   
                                                         }
                                                     for ($n=0; $n<count($keyStrArrNoT); ++$n)
                                                         {
                                                             if (preg_match("/\w/", $keyStrArrNoT[$n]))
                                                                 {
                                                                     $strD = '. . . ' . $keyStrArrNoT[$n] . ' . . . ';
                                                                     array_push($keyStrArrD, $strD);
                                                                 }
                                                          }
                                                   }
                                           }
                                       if (count($keyStrArrD) > 0)
                                           {
                                               $foundSomething = true;
                                               $ePath = $path;
                                               $ePath = str_replace($iPath, '', $ePath);
                                               echo "<a href='$domainURL" . $ePath . '/' . $arrDir[$j] . "'>$titl</a>";
                                               for ($q=0; $q<count($keyStrArrD); ++$q)
                                                   {
                                                       echo $keyStrArrD[$q];
                                                   }
                                               echo '<br><br>';
                                           }

                                }

                        }

                    if (($noItems - 1) == $j)
                        {
                            $arrDirPresent = scandir($path);
                            $usefulDirPrsnt = 'No';
                            for ($w=0; $w<count($arrDirPresent); ++$w)
                                {
                                    if (($arrDirPresent[$w] === '.')||($arrDirPresent[$w] === '..'))
                                        continue;
                                    if (is_dir($arrDirPresent[$w]))
                                        $usefulDirPrsnt = 'Yes';

                                }
                            if ($usefulDirPrsnt == 'Yes')
                                {
                                    array_pop($us);
                                    $inx = count($us) - 1;
                                    $u = $us[$inx];
                                }

                        }

                }


            if ($level > 0)
                {
                    $goingUp = true;
                    chdir('..');
                    $level = $level - 1;
                    $currPath = getcwd();
                    $begin = $u;
                    scanTree($currPath, $u);
                }

        }

    scanTree($dr, 0);

    if ($foundSomething === false)
        {
            echo "<strong>No match found! </strong>";
            echo "<br>Go to: <a href='categories.htm'>Categories</a>";
        }

    echo $pageBottom;   //bottom of page without data

?>
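The program reads the phrase from $_POST['searchStr'], so some page of the site has to post a field with that name. The article does not show that page. The following is a minimal sketch of one, assuming the program has been saved as search.php; both that file name and the function name searchForm() are hypothetical.

```php
<?php
// Hypothetical helper: builds a form that posts the phrase as 'searchStr'.
function searchForm(string $action): string
{
    // htmlspecialchars() keeps the URL safe inside the attribute
    $safeAction = htmlspecialchars($action, ENT_QUOTES);
    return "<form method='post' action='$safeAction'>
    <input type='text' name='searchStr'>
    <button type='submit'>Search</button>
</form>";
}

echo searchForm('search.php');   // 'search.php' is an assumed file name
```

The name of the text field must stay searchStr, because that is the key the program looks up in $_POST.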

Chrys
