2007-12-01 12:24:01

Automatically extracting keywords

As many of you might already know, our favorite search engine Google no longer thinks keywords are worth its time. Google simply does not read the meta keywords tag anymore. So the question comes to mind: should we still bother writing the meta keywords tag?

For me this was a hard one to answer, because many other search engines that could deliver you some traffic still use the meta keywords tag to index pages.
Should we neglect these engines? I think not, but we should stop putting a lot of hours into these low-traffic engines.

The solution: automatically generate your keywords using PHP.

It might sound simple and, well, it is. We can use some of the nifty PHP functions to extract, rate and collect keywords from a web page and put them inside the meta keywords tag. This might not always give the perfect keywords, but at least they appear in the content of the page and are relevant to the text.

Introduction: generating your keywords using PHP

The vessie engine uses Smarty as a templating engine. A key feature of Smarty is that it buffers the output and sends complete blocks of content when you want it to. This lets you take the completely generated HTML of a PHP web page, extract its keywords, place them inside the meta keywords tag, and only then send the content to the browser. This article will not use Smarty, but I'll be happy to write a Smarty version if readers ask for one. Also note that vessie is written in a fully object-oriented, model-view-controller style. I left that out here because it is simply too complex to cover, and we mainly want to focus on the task at hand. I do, however, strongly encourage object-oriented programming in PHP!

Tell PHP to buffer its output.

The first thing we need to do is tell PHP to buffer any output, because we want to count and manipulate this output before it is sent.
We can do this by putting this code at the beginning of the script.
<?php
ob_start();
?>

This starts the output buffering. Keep in mind that the output buffer is sent to the browser automatically at the end of a script. To prevent this we can use ob_get_clean(). Add the following code at the end of your script so that PHP does not send the buffered content before we have changed a few things, such as the keywords meta tag.
<?php
$content = ob_get_clean();
?>

This takes the contents of the output buffer and puts them inside the variable $content, after which the output buffer is discarded.
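
As a minimal sketch of the two calls working together (the echoed markup is just a made-up sample), the following script collects its output instead of sending it:

```php
<?php
ob_start();                     // from here on, output is collected instead of sent
echo '<title>My page</title>';  // lands in the buffer, not in the browser

$content = ob_get_clean();      // copy the buffer into $content and discard the buffer

// Nothing has been sent so far; the markup only lives in $content
echo strlen($content);
```

Only the final echo reaches the browser, so anything you change in $content before echoing it ends up in the page.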

Counting the keywords.

So now we have a page that outputs nothing, great. We also have the entire contents of the page inside the variable $content, so we can count the keywords and put them in place. To count the keywords I have created a simple function. You can use, alter and distribute it as you like, but you must leave the copyright notice intact. Put this function after the ob_start() call and before the point where you call it at the end of your script.


<?php
/**
 * Extract the keywords from the content string and return the keywords string
 * @param string $content          the complete html of the page
 * @param int $minLength           minimum length a keyword must have
 * @param int $headingWeight       rating for a word inside a heading, divided by the heading level
 * @param int $linksWeight         rating for a word inside a link
 * @param int $numberOfKeywords    maximum number of keywords to return
 * @return string                  comma separated lowercase keywords
 * @copyright remco verton <info@seomagnifier.com> 
 * @copyright Visit http://seomagnifier.com for more free php scripts
 */
function extractKeywords($content,$minLength,$headingWeight,$linksWeight,$numberOfKeywords){
    $keywordArray = array();

    // Count the link keywords
    // <a\b only matches real anchor tags, not <abbr> or <address>
    $links = array();
    preg_match_all('#<a\b[^>]*>(.*?)</a>#is',$content,$links);
    foreach($links[1] as $key => $value){
        $keywords = explode(' ',strip_tags($value));
        foreach($keywords as $id => $keyword){
            // Keep only the letters of the keyword, in lowercase
            $keyword = strtolower(preg_replace('/[^[:alpha:]]/','',$keyword));
            if(strlen($keyword) >= $minLength){
                if(!array_key_exists($keyword,$keywordArray)){
                    $keywordArray[$keyword] = $linksWeight;
                }
                else{
                    $keywordArray[$keyword] += $linksWeight;
                }
            }
        }
    }

    // Count the heading keywords
    // Only h1 to h6 are matched, so <head>, <html> and <hr> are not seen as headings
    $headings = array();
    preg_match_all('#<h([1-6])\b[^>]*>(.*?)</h\1>#is',$content,$headings);
    foreach($headings[2] as $key => $value){
        // The heading level divides the weight, so an h1 counts more than an h3
        $headingNumber = (int)$headings[1][$key];
        $keywords = explode(' ',strip_tags($value));
        foreach($keywords as $id => $keyword){
            // Keep only the letters of the keyword, in lowercase
            $keyword = strtolower(preg_replace('/[^[:alpha:]]/','',$keyword));
            if(strlen($keyword) >= $minLength){
                if(!array_key_exists($keyword,$keywordArray)){
                    $keywordArray[$keyword] = $headingWeight/$headingNumber;
                }
                else{
                    $keywordArray[$keyword] += $headingWeight/$headingNumber;
                }
            }
        }
    }

    // Count the text keywords including the heading and link texts!
    // Meaning these are counted double once with a rating of 1 and once with the rating set for them!
    // Put a space in front of every tag so words from neighbouring elements
    // do not run together, then remove the tags
    $text = strip_tags(str_replace('<',' <',$content));
    // Slashes and line breaks separate words as well
    $text = str_replace(array('/',"\n","\r","\t"),' ',$text);
    $keywords = explode(' ',$text);
    foreach($keywords as $key => $keyword){
        // Keep only the letters of the keyword, in lowercase
        $keyword = strtolower(preg_replace('/[^[:alpha:]]/','',$keyword));
        if(strlen($keyword) >= $minLength){
            if(!array_key_exists($keyword,$keywordArray)){
                $keywordArray[$keyword] = 1;
            }
            else{
                $keywordArray[$keyword] += 1;
            }
        }
    }

    // Sort the keywords, highest rating first
    arsort($keywordArray);

    // Take only the number of keywords set in the config
    $keywordArray = array_slice($keywordArray,0,$numberOfKeywords);
    return implode(',',array_keys($keywordArray));
}
?>

What this function does is count the words inside the headings, links and text. It uses a rating for each and returns the number of keywords set in the parameters. The parameter values used in the working example code are a good starting point. I will not go into depth on the workings of this function. You don't have to learn how to write one, just use this.
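
To get a feeling for the ratings, here is a small sketch of the arithmetic for a single occurrence of a word. The helper occurrenceScore() is hypothetical and not part of the function above; it only mirrors the rules as described: every word counts once as plain text, a word in a link gets the link weight on top, and a word in a heading gets the heading weight divided by the heading level on top.

```php
<?php
// Hypothetical helper for illustration only: the rating that one occurrence
// of a word receives, depending on where it appears in the page.
function occurrenceScore($where, $headingWeight, $linksWeight, $headingLevel = 1){
    $score = 1; // every occurrence is counted once as plain text
    if($where === 'heading'){
        // h1 gets the full weight, h2 half of it, and so on
        $score += $headingWeight / max(1, $headingLevel);
    }
    elseif($where === 'link'){
        $score += $linksWeight;
    }
    return $score;
}

// With a heading weight of 20 and a link weight of 2:
echo occurrenceScore('heading', 20, 2, 1), "\n"; // 21
echo occurrenceScore('heading', 20, 2, 2), "\n"; // 11
echo occurrenceScore('link', 20, 2), "\n";       // 3
echo occurrenceScore('text', 20, 2), "\n";       // 1
```

So with these weights, a word that appears once in an h1 outranks a word that is repeated twenty times in the body text.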

Putting the keywords inside the meta tag.

Putting the keywords inside the meta tag simply requires a bit of string replacing. I have placed this code inside my meta keywords tag: {$keywords}. You can choose any code you want, but make it descriptive and be sure you will never write it as regular content. I chose {$keywords} because it's the Smarty way of putting data inside templates. The replacement only requires one line of PHP code.

<?php
$content = str_replace('{$keywords}',$keywords,$content);
?>
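
As a small illustration of the replacement, assume the page contains a meta tag like the one below. The tag and the sample keyword list are made up for this sketch; only the {$keywords} placeholder comes from the text above.

```php
<?php
// Hypothetical template line containing the placeholder
$content = '<meta name="keywords" content="{$keywords}" />';
// Sample value, normally this is the result of extractKeywords()
$keywords = 'keywords,google,search';

// Single quotes keep '{$keywords}' literal. Inside double quotes PHP would
// insert the value of $keywords and the search would no longer look for
// the placeholder.
$content = str_replace('{$keywords}', $keywords, $content);

echo $content;
```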

Putting it all together, adding the keywords to your document.

So now that we have a function and know how to replace the {$keywords} code with our keywords, how do we continue? At the end of your PHP file, add the following lines. They take the content from the output buffer, extract the keywords and put them inside the meta tag. Once that is done, the content is echoed to the browser.

<?php
$content = ob_get_clean();
$keywords = extractKeywords($content,5,20,2,10);
$content = str_replace('{$keywords}',$keywords,$content);
echo $content;
?>
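
For orientation, a complete page could be laid out roughly as follows. This is only a sketch: the HTML is a made-up sample, and the stand-in extractKeywords() at the top exists solely so the sketch runs on its own. In a real page you would put the function from this article there instead.

```php
<?php
// Stand-in so this sketch runs on its own; replace it with the real
// extractKeywords() function from this article.
if(!function_exists('extractKeywords')){
    function extractKeywords($content, $minLength, $headingWeight, $linksWeight, $numberOfKeywords){
        return 'example,keywords';
    }
}

ob_start(); // start buffering before any output is produced
?>
<html>
<head>
<title>Example page</title>
<meta name="keywords" content="{$keywords}" />
</head>
<body>
<h1>Example heading</h1>
<p>Some example content.</p>
</body>
</html>
<?php
// Take the page out of the buffer, fill in the placeholder and send it
$content = ob_get_clean();
$keywords = extractKeywords($content, 5, 20, 2, 10);
$content = str_replace('{$keywords}', $keywords, $content);
echo $content;
```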

A working example that adds its keywords to the meta tag.

A working example can be found at working example and the source can be seen at code. Feel free to use any part of this example. It would be nice if you kept the copyright notice in place. If you have any questions or suggestions, feel free to leave me a note in my profile.
