<?php
// start the output buffering
ob_start();
/**
 * Extract the keywords from the content string and return them as a
 * comma-separated, lowercase string
 * @param string $content HTML content to analyse
 * @param int $minLength minimum length a keyword must have
 * @param int $headingWeight weight of a heading keyword (divided by the heading level)
 * @param int $linksWeight weight of a keyword found in link text
 * @param int $numberOfKeywords maximum number of keywords to return
 * @return string
 * @copyright remco verton <info@seomagnifier.com>
 * @copyright Visit http://seomagnifier.com for more free php scripts
 */
function extractKeywords($content, $minLength, $headingWeight, $linksWeight, $numberOfKeywords) {
    $keywordArray = array();

    // Count the link keywords
    $links = array();
    // \b keeps tags such as <abbr> or <address> from matching as links
    preg_match_all('#<a\b[^>]*>(.*?)</a\s*>#is', $content, $links);
    foreach ($links[1] as $value) {
        $keywords = preg_split('/\s+/', strip_tags($value), -1, PREG_SPLIT_NO_EMPTY);
        foreach ($keywords as $keyword) {
            // Keep only the alphabetic characters, lowercased so that
            // "Sitelinks" and "sitelinks" are counted as one keyword
            $keyword = strtolower(preg_replace('/[^[:alpha:]]/', '', $keyword));
            if (strlen($keyword) >= $minLength) {
                if (!array_key_exists($keyword, $keywordArray)) {
                    $keywordArray[$keyword] = $linksWeight;
                } else {
                    $keywordArray[$keyword] += $linksWeight;
                }
            }
        }
    }

    // Count the heading keywords
    $headings = array();
    // Match <h1> to <h6> only, so that <html> and <head> are not treated as headings
    preg_match_all('#<h([1-6])\b[^>]*>(.*?)</h\1\s*>#is', $content, $headings);
    foreach ($headings[2] as $key => $value) {
        // The heading level (1 to 6) divides the weight: an h1 counts fully, an h3 for a third
        $headingNumber = (int)$headings[1][$key];
        $keywords = preg_split('/\s+/', strip_tags($value), -1, PREG_SPLIT_NO_EMPTY);
        foreach ($keywords as $keyword) {
            // Keep only the alphabetic characters, lowercased
            $keyword = strtolower(preg_replace('/[^[:alpha:]]/', '', $keyword));
            if (strlen($keyword) >= $minLength) {
                if (!array_key_exists($keyword, $keywordArray)) {
                    $keywordArray[$keyword] = $headingWeight / $headingNumber;
                } else {
                    $keywordArray[$keyword] += $headingWeight / $headingNumber;
                }
            }
        }
    }

    // Count the text keywords including the heading and link texts!
    // Meaning these are counted double: once with a rating of 1 and once with the rating set for them!
    // Strip the tags before splitting on '/', otherwise closing tags such as
    // </strong> are mangled and their names can leak into the text.
    // The added space keeps words in adjacent tags from merging.
    $text = strip_tags(str_replace('<', ' <', $content));
    $keywords = preg_split('#[\s/]+#', $text, -1, PREG_SPLIT_NO_EMPTY);
    foreach ($keywords as $keyword) {
        // Keep only the alphabetic characters, lowercased
        $keyword = strtolower(preg_replace('/[^[:alpha:]]/', '', $keyword));
        if (strlen($keyword) >= $minLength) {
            if (!array_key_exists($keyword, $keywordArray)) {
                $keywordArray[$keyword] = 1;
            } else {
                $keywordArray[$keyword] += 1;
            }
        }
    }

    // Sort the keywords by weight, highest first
    arsort($keywordArray);
    // Take only the number of keywords set in the config
    $keywordArray = array_slice($keywordArray, 0, $numberOfKeywords);
    return implode(',', array_keys($keywordArray));
}
?>
<html>
<head>
<title>Google sitelinks</title>
<meta name="keywords" content="{$keywords}" />
</head>
<body>
<h1>Google sitelinks</h1>
<p>Last year Google introduced sitelinks for pages that appear to be highly relevant to the user's search query. For those not yet familiar with <strong>sitelinks</strong>: they are an extra set of links shown under the first listing in the SERPs, pointing the user to further topics and information on that top-ranking site. An example can be found here: <a title="google sitelinks for cnn" href="http://www.google.com/search?hl=en&q=cnn&btnG=Google+Search">google sitelinks for cnn</a>. For the query cnn, Google apparently considers cnn.com so important that extra links from the site should be displayed. </p>
<h3>Do we love sitelinks?</h3>
<p>In one word: yes. Sitelinks give your already top-ranking site a very trustworthy feel. People seem to click far more often on listings that have sitelinks than on listings that do not. </p>
<h3>Sitelinks bug</h3>
<p>Google seems to be failing to remove the second listing of a site that appears right after the sitelink listing. So after the sitelink listing for cnn you still get a regular listing for cnn.com. Hopefully this will be removed soon, as it looks a bit spammy. </p>
<h3>Which sites get the sitelinks?</h3>
<p>It seems that only searches for trademarks get sitelinks, e.g. cnn, hp, ebay, intel... you get my drift. Search for any major online trademark and sitelinks seem to appear out of the gloom. What makes these sites so special that they get a sitelink listing? Let's take a look at this very interesting topic to see if we can find some clues, starting with a quote from the Google webmaster help center:</p>
<p><em> The links shown below some sites in our search results, called Sitelinks, are meant to help users navigate your site. Our systems analyze the link structure of your site to find shortcuts that will save users time and allow them to quickly find the information they're looking for. <br />
<br />
We only show Sitelinks for results when we think they'll be useful to the user. If the structure of your site doesn't allow our algorithms to find good Sitelinks, or we don't think that the Sitelinks for your site are relevant for the user's query, we won't show them.<br />
<br />
At the moment, Sitelinks are completely automated. We're always working to improve our Sitelinks algorithms, and we may incorporate webmaster input in the future.<br />
<strong>Source:</strong> <a title="Google webmaster help center" href="http://www.google.com/support/webmasters/bin/answer.py?answer=47334&topic=8523">Google webmaster help center</a></em></p>
<p>Google seems to have a nice new algorithm for calculating the sitelink factor of a site. A few things I have noticed while working on sitelinked web sites:</p>
<ul>
<li>Sitelinks appear only on the first SERP listing.</li>
<li> Sitelinks appear only for a <em>"trademark"</em> search.</li>
<li> Sitelinks appear to still be buggy.</li>
<li> Sitelinks appear only on sites that are very old and have a decent amount of content.</li>
<li> Sitelinks appear mainly on authority sites.</li>
<li> Sitelinks are still a mystery!</li>
</ul>
<h3>How does Google calculate Sitelinks?</h3>
<p>Google claims that Sitelinks are created automatically. I have several theories as to how Google calculates them; they are based on my best guess. </p>
<ul>
<li>Google might track the number of clicks on different results. If a web site gets a lot of traffic for a particular keyword, it may get Sitelinks on Google's result page.
<br />
<em>For example, if you use a unique trademark term on your web pages that cannot be found on other web sites, many people will click on your web site in Google's results when they search for that term. Your web site is then likely to get Sitelinks for that search term.</em><br />
</li>
<li>The link architecture of a web site might help. Links at the top of the HTML source of a web site seem to have a better chance of being included as Sitelinks.
<br />
</li>
<li>Google might use the Google toolbar to determine Sitelinks. The more often a page is bookmarked, the more likely it is to be used as a Sitelink. Google's toolbar can collect a lot of information about a web site.</li>
<li>Google might be able to single out "trademark" searches; trademarks seem to have the highest chance of obtaining sitelinks. </li>
</ul>
<h3>Final sitelink thoughts</h3>
<p>At the moment, time spent trying to obtain sitelinks is time not well spent. You would do better to work towards authority site status, which gets you a lot of traffic and might in the end get you a sitelink listing. It is, however, so uncertain how to get sitelinks that it is not advisable to target your site at them. <br /> </p>
<p>
Author: <a href="/remco">Remco</a><br />
Last update: <b>2006-12-04 16:07:06</b> <i>Europe/Amsterdam</i>
</p>
</body>
</html>
<?php
// Get the text from the output buffer and clean the buffer
$content = ob_get_clean();
// Extract the keywords from the text
$keywords = extractKeywords($content,5,20,2,10);
// Put the keywords inside the text
$content = str_replace('{$keywords}',$keywords,$content);
// Send the text including the keywords to the browser
echo $content;
?>