Welcome

Calais Marmoset is a simple yet powerful tool that makes it easy for you to generate and embed metadata in your content in preparation for Yahoo! Search's new open developer platform, SearchMonkey, as well as other metacrawlers and semantic applications.

In order to use Marmoset you must manage your own web site and be proficient in installing some simple PHP code templates.

If this doesn't sound like you, wait a little bit. Over the coming months the Marmoset capabilities will be made available in other forms that require less technical knowledge to deploy.

Be sure to read the documentation and then download the Marmoset package.

 

Introduction

The Calais Microformats Injector provides a simple way to make your Web site's contents available through intelligent search.

Intelligent Search

Search engines allow filtering documents based on keywords that appear in a Web site. One word can refer to several different things. For example, when you search for 'Washington' using a search engine, results will likely relate to the city, the state and the person.

Intelligent search allows users to specify with greater precision what they are looking for. For example, a user may ask for results relating to 'Washington', but only when it refers to the state.

OpenCalais Semantic Analysis

The OpenCalais Web service can analyze text and return rich semantic data about it. For example, when the word 'Washington' appears in the text, the OpenCalais Web service can determine whether it refers to the city, the state or the person.

Rich Content on Your Web Site

Using OpenCalais you can provide search engine crawlers with rich semantic data to consider when they index your page. Yahoo!'s search engine analyzes semantic data provided in Microformats, and other search engines are likely to follow.

The Calais Microformats Injector allows you to inject Microformats data into dynamic Web pages on your site with no development effort on your part. As a result, users accessing your Web site through search engines will get better targeted results.

How It Works

To supply Microformats data only to search engines, the Calais Microformats Injector identifies page requests made by search robots. For requests made by browsers, your Web site content is returned unchanged.

When a search robot is identified, the Calais Microformats Injector invokes the OpenCalais Web service and retrieves rich semantic data for the requested page. It then injects the resulting Microformats into the original Web page and returns the result to the search robot.

Search engines that analyze the Microformats can offer intelligent search for your Web site.

Example

Consider, for example, the following PHP page:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Your Page Title</title>
</head>
<body>
<div>Your Page Contents</div>
<?php Your PHP Code ?>
</body>
</html>

When a browser requests the page, it will be returned as is. However, when a search robot requests the page, the following result will be sent instead:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Your Page Title</title>
</head>
<body>
<div class="vcard">
<span class="fn">A person from your page</span>
<div class="org">The person's organization</div>
<div class="title">The person's title</div>
</div>
<a href="Link to organization" rel="tag">Organization's name</a>
<div>Your Page Contents</div>
<?php Your PHP Code ?>
</body>
</html>

As you can see, semantic data is injected at the beginning of the HTML <body> element.
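
The injection point described above can be sketched in a few lines of PHP. This is only an illustration under stated assumptions, not the Injector's actual code: the function name inject_after_body_open is hypothetical, and the sketch assumes the first opening <body> tag in the page is the one to use.

```php
<?php
// Hypothetical helper (not part of the package): insert $microformats
// immediately after the first opening <body ...> tag found in $html.
function inject_after_body_open($html, $microformats)
{
    // Match the opening <body> tag, with or without attributes.
    if (!preg_match('/<body\b[^>]*>/i', $html, $m, PREG_OFFSET_CAPTURE)) {
        // No <body> tag found: return the page unchanged.
        return $html;
    }
    // Offset of the first character after the matched tag.
    $insertAt = $m[0][1] + strlen($m[0][0]);
    return substr($html, 0, $insertAt) . "\n" . $microformats . substr($html, $insertAt);
}

$page  = '<html><body class="home"><div>Your Page Contents</div></body></html>';
$vcard = '<div class="vcard"><span class="fn">A person from your page</span></div>';
echo inject_after_body_open($page, $vcard), "\n";
```

Splicing by offset rather than using a regular-expression replacement avoids any special handling of characters such as $ in the injected markup.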

Using Calais Microformats Injector

License

The Calais Microformats Injector is open-source and released under the BSD license (see LICENSE file in package).

System Requirements

The Calais Microformats Injector requires the following of your Web server:

  • PHP 4 or 5
  • The cURL PHP extension, enabled
  • Permission to establish outbound connections to http://api.opencalais.com/ (some Web hosting services disable outbound connections for security reasons, and allow such connections only upon specific request)
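
Before installing, you may want to check the PHP-side requirements with a small script. The following is a minimal sketch, not part of the package: the function name check_injector_requirements is hypothetical, it checks only the lower PHP version bound (whether versions later than PHP 5 are supported is not stated here), and it does not test outbound connectivity.

```php
<?php
// Hypothetical pre-installation check (not part of the package).
// Returns a list of human-readable problems; an empty list means the
// PHP version and cURL requirements appear to be met.
function check_injector_requirements($phpVersion, $curlLoaded)
{
    $problems = array();
    if (version_compare($phpVersion, '4.0.0', '<')) {
        $problems[] = 'PHP 4 or later is required, found ' . $phpVersion;
    }
    if (!$curlLoaded) {
        $problems[] = 'The cURL PHP extension is not enabled';
    }
    return $problems;
}

// Check the PHP installation that runs this script.
$problems = check_injector_requirements(PHP_VERSION, extension_loaded('curl'));
echo count($problems) === 0 ? "Requirements look OK\n" : implode("\n", $problems) . "\n";
```

Run the script through your Web server rather than the command line if the two use different PHP configurations.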

Installation

Extract the files in the package you downloaded. Under the public directory you will find a directory called calaismf. Copy this directory to a public location on your Web server.

Example

If your public root folder is /var/www/html, copy the calaismf directory under /var/www/html. You should now have a new directory called /var/www/html/calaismf.

Configuration

Edit conf.php found under the calaismf directory that you copied to your Web server.

Find the line:

$calaismf_APIKey = "your-api-key-goes-here";

Replace the value within the double quotes with your OpenCalais API key. To obtain an API key please see http://opencalais.com/APIKey. Make sure the key starts immediately after the first double quotes and ends immediately before the second double quotes. Do not insert any space characters.
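
If you want to double-check the value you entered, a quick sanity test along the following lines may help. It is a sketch only: the function name api_key_looks_valid is hypothetical, the key abc123 is made up, and the check says nothing about whether OpenCalais will actually accept the key.

```php
<?php
// Hypothetical sanity check for the value assigned to $calaismf_APIKey.
function api_key_looks_valid($key)
{
    // Still the placeholder shipped in conf.php.
    if ($key === 'your-api-key-goes-here') {
        return false;
    }
    // Empty, or contains a space or any other whitespace character.
    if ($key === '' || preg_match('/\s/', $key)) {
        return false;
    }
    return true;
}

var_dump(api_key_looks_valid('abc123'));   // a made-up key with no spaces
var_dump(api_key_looks_valid(' abc123'));  // leading space
```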

It is recommended that you leave other values in conf.php at their original values. If you wish to make changes, the following table explains each value.

$calaismf_InvokeEnlightenTimeout
  Description: The maximum time, in seconds, to wait for the OpenCalais Web service to return results when it is invoked to inject semantic data for a search robot.
  When to change: If your tests show that your pages do not include the additional semantic data, try increasing this value.

$calaismf_UserAgentSubstringList
  Description: When a request is received, the Calais Microformats Injector checks whether the HTTP User-Agent header field in the request includes one of the values in this list as a substring. If it does, the request is treated as a request by a search robot and Microformats data is injected. Otherwise, the original page is returned.
  When to change: If you wish to provide Microformats data to additional search robots, and you know the User-Agent used by the search robot, you can add the User-Agent or a substring of it to this list. Add a comma after the last double quote, and insert the value, enclosed in double quotes, before the closing parenthesis.
  Important: Make sure the value you enter is specific; otherwise it might match certain Web browsers, causing visitors to your site to see the Microformats data.

$calaismf_LogIdentity
  Description: When errors occur, the Calais Microformats Injector outputs them to the system log (under Windows you can see these values in the Event Viewer). The Calais Microformats Injector uses this value to identify itself when writing error messages.
  When to change: If you wish to see a different value in Calais Microformats Injector log messages, change this to any other value.

$calaismf_ReltagBaseURL
  Description: This prefix is attached to rel-tags inserted by the Calais Microformats Injector.
  When to change: See the ReltagBaseURL section below.

NOTE: Additional values in conf.php are for internal use and should not be changed.
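
The User-Agent check described for $calaismf_UserAgentSubstringList can be illustrated as follows. This is a sketch under stated assumptions, not the package's code: the function name is_search_robot and the variable $exampleList are hypothetical, the list values are illustrative rather than the defaults shipped in conf.php, and the comparison here is case-sensitive, which may differ from the real behavior.

```php
<?php
// Illustrative values only; the defaults shipped in conf.php may differ.
// "ExampleBot" stands for a value added after the last existing entry.
$exampleList = array("Slurp", "ExampleBot");

// Hypothetical helper: true if $userAgent contains any listed substring.
function is_search_robot($userAgent, $substrings)
{
    foreach ($substrings as $needle) {
        // Skip empty entries, which would match every User-Agent.
        if ($needle !== '' && strpos($userAgent, $needle) !== false) {
            return true;
        }
    }
    return false;
}

$robot   = 'Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)';
$browser = 'Mozilla/5.0 (Windows NT 10.0; rv:120.0) Gecko/20100101 Firefox/120.0';

var_dump(is_search_robot($robot, $exampleList));        // robot: Microformats would be injected
var_dump(is_search_robot($browser, $exampleList));      // browser: original page
var_dump(is_search_robot($browser, array("Mozilla")));  // too generic: matches the browser too
```

The last call shows why a list value must be specific: a substring such as "Mozilla" appears in most browser User-Agent strings.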

ReltagBaseURL

The $calaismf_ReltagBaseURL is used as the base URL for OpenCalais generated Rel-Tags microformats. (More information about OpenCalais microformats support can be found in the OpenCalais API Documentation).

Rel-tags generated by Calais meet the rel-tag definition (http://microformats.org/wiki/rel-tag) and appear in this format:

<a href="http://www.YourWebSite.com/YourPath/TagName" rel="tag">Tag</a>

Note that this URL MUST exist. The part you provide (www.YourWebSite.com/YourPath/), which is the value of $calaismf_ReltagBaseURL in conf.php, must therefore combine with the Calais-generated tag to form a valid URL.

In many cases this link would look like this:

http://www.MyWebSite.com/Tags?name=CalaisGenerateTag

where $calaismf_ReltagBaseURL = www.MyWebSite.com/Tags?name=

and Calais appends the suggested tag to it.
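
To see how a base URL and a tag combine into a rel-tag link, consider the following sketch. It is not the Injector's code: the function name build_rel_tag is hypothetical, and both prepending http:// when the base URL has no scheme and URL-encoding the tag name are assumptions made for this illustration.

```php
<?php
// Hypothetical helper: build a rel-tag link from a base URL and a tag name.
function build_rel_tag($baseUrl, $tagName)
{
    // Assumption: prepend http:// when the base URL has no scheme.
    if (!preg_match('#^https?://#i', $baseUrl)) {
        $baseUrl = 'http://' . $baseUrl;
    }
    // Assumption: the tag name is URL-encoded before being appended.
    $href = $baseUrl . rawurlencode($tagName);
    return '<a href="' . htmlspecialchars($href) . '" rel="tag">'
         . htmlspecialchars($tagName) . '</a>';
}

echo build_rel_tag('www.MyWebSite.com/Tags?name=', 'Washington'), "\n";
```

Whatever base URL you choose, open one of the resulting links in a browser to confirm that it resolves to a real page.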

Activation

In order to activate the Calais Microformats Injector for the PHP pages you want, edit each PHP page and add the following:

At the beginning of the file (immediately, before any other text) insert the following lines.

<?php
if (!defined("CALAISMF_INCLUDE_DIR")) define("CALAISMF_INCLUDE_DIR", "path/to/calaismf/");
require_once (CALAISMF_INCLUDE_DIR . "header.php");
?>

Replace the words path/to/calaismf/ with the actual path. Note that the path you specify is relative to the location of the PHP page available to the search robot, so if different PHP pages are placed in different locations, the header stub will have to be changed accordingly.

Important: Make sure you include the last forward slash ( / ) in the path.

At the end of the file (after any other text) insert the following line:

<?php require_once (CALAISMF_INCLUDE_DIR . "footer.php"); ?>

Once the header and footer are added, the Calais Microformats Injector is activated for the page.
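
The contents of header.php and footer.php are not shown in this document. One common way for such a header and footer pair to work is PHP output buffering, sketched below purely as an assumption about the general technique. The names sketch_header, sketch_footer and add_marker are hypothetical, and add_marker merely stands in for whatever enrichment step is applied.

```php
<?php
// Hypothetical stand-in for what a header include might do.
function sketch_header()
{
    // Start capturing everything the page prints from here on.
    ob_start();
}

// Hypothetical stand-in for what a footer include might do.
function sketch_footer($isRobot, $enrich)
{
    // Stop capturing and take the page's complete output.
    $html = ob_get_clean();
    // Browsers get the page as is; robots get the enriched version.
    return $isRobot ? call_user_func($enrich, $html) : $html;
}

// Placeholder enrichment step used only for this demonstration.
function add_marker($html)
{
    return "<!-- enriched -->" . $html;
}

sketch_header();
echo "<html><body><div>Your Page Contents</div></body></html>";
echo sketch_footer(true, 'add_marker'), "\n";
```

A design like this would explain why the header must come before any other text and the footer after all of it: only output produced between the two is captured.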

Example

Consider the following PHP page:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>
<?php echo "Testing page for Microformats"; ?>
</title>
</head>
<body>
<?php

$h1 = "This is the Microformats test page";
$text = "Sample text Sample text Sample text Sample text Sample text Sample text";

echo "<h1>".$h1."</h1>\n";
echo "<div>\n".$text."\n</div>\n";

?>
</body>
</html>

In order to activate the Calais Microformats Injector for this page add the header and footer as follows:

<?php
if (!defined("CALAISMF_INCLUDE_DIR")) define("CALAISMF_INCLUDE_DIR", "../calaismf/");
require_once (CALAISMF_INCLUDE_DIR . "header.php");
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>
<?php echo "Testing page for Microformats"; ?>
</title>
</head>
<body>
<?php

$h1 = "This is the Microformats test page";
$text = "Sample text Sample text Sample text Sample text Sample text Sample text";

echo "<h1>".$h1."</h1>\n";
echo "<div>\n".$text."\n</div>\n";

?>
</body>
</html>
<?php require_once (CALAISMF_INCLUDE_DIR . "footer.php"); ?>

In this example we assume the page is located under /var/www/html/pages and the calaismf directory is at /var/www/html/calaismf.

Which Pages to Activate

Activate any PHP page for which semantic data can help. Keep in mind the following:

  • The page must be processed as a PHP page by the Web server (usually it means it has a .php extension)
  • The page should be reachable by search robots. This usually means you can follow links from your index page to reach it, or that you have submitted it to search engines specifically
    • There is no need to add the header and footer to PHP pages that are not visible, and are only included/required by other PHP pages
  • The page must return content of type text/html

Verification

You can take the following steps to verify your installation and activation:

  1. Consider a PHP page page.php that you previously activated by adding the header and footer
  2. Request the page using your browser (e.g., enter http://your-domain/page.php in your browser's address bar) - you should see the page as before
  3. Now request the page adding the overrideUserAgent parameter with value true (e.g., enter http://your-domain/page.php?overrideUserAgent=true) - you should see the page with the Microformats data
  4. Use your browser's 'View Source' or 'Page Source' option to see the injected Microformats. Search for '<body' to find the location
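
If you verify many pages, the steps above can be partly scripted. The following sketch uses two hypothetical helpers, with_override_param and has_microformats. It assumes the overrideUserAgent parameter can be combined with an existing query string using &, it does not handle URLs containing a # fragment, and its markup check is rough: it would also report true for a page that already contains hCard or rel-tag markup of its own.

```php
<?php
// Hypothetical helper: append overrideUserAgent=true to a page URL.
function with_override_param($url)
{
    // Use ? for the first parameter and & for any further one.
    $separator = (strpos($url, '?') === false) ? '?' : '&';
    return $url . $separator . 'overrideUserAgent=true';
}

// Hypothetical helper: rough check for hCard or rel-tag markup in a page.
function has_microformats($html)
{
    return preg_match('/class="vcard"|rel="tag"/i', $html) === 1;
}

echo with_override_param('http://your-domain/page.php'), "\n";
var_dump(has_microformats('<body><div class="vcard"></div><div>Your Page Contents</div></body>'));
```

Fetch the URL returned by with_override_param with any HTTP client and pass the response body to has_microformats.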

Troubleshooting

In the event of an error the Calais Microformats Injector outputs errors using the PHP syslog() function. No errors are displayed on the resulting page in order to ensure search robots get the original PHP page (without semantic data) in such cases.

The location of these errors depends on your Web server's configuration. First:

  • Ensure that error logging is enabled in your Web server's configuration
  • Determine the location of the errors in your php.ini file
    • Normally this would either be a log file or the system log
    • In Windows based systems, system log messages can be viewed using the Event Viewer (Control Panel -- Administrative Tools)
    • Under Linux based systems system log messages can be viewed at /var/log/messages
  • In case the location is the system log, ensure that the user running your Web server process has sufficient permissions to write to the system log
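
To find out where syslog() messages end up on your server, you can write a test message yourself and then look for it. The sketch below uses PHP's standard openlog(), syslog() and closelog() functions. The wrapper name log_test_message and the identity string calaismf-sketch are hypothetical, and the message text is made up for this test.

```php
<?php
// Hypothetical wrapper: write one error-level message to the system log.
function log_test_message($identity, $message)
{
    // $identity plays the role that $calaismf_LogIdentity has in conf.php:
    // it is the name under which the message appears in the log.
    openlog($identity, LOG_PID, LOG_USER);
    $written = syslog(LOG_ERR, $message);
    closelog();
    return $written;
}

log_test_message('calaismf-sketch', 'Test message written to locate the system log');
```

After running the script through your Web server, search the candidate log locations for the identity string. If the message does not appear anywhere, check the permissions of the user running the Web server process.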

Common Problems

Calling my page with ?overrideUserAgent=true returns the original page

This usually means that an error occurred. Use the information above to find the location of error messages, and obtain additional information about the error.

An error occurs, but there are no errors in the log

In this case, make sure that:

  1. You know the location where PHP's syslog() outputs error messages (this depends on your specific Web server and PHP configuration)
  2. The operating system user running the Web server has sufficient permissions to write to the system log (under Windows, a user belonging to group 'Guests' may not be allowed to write to the system log, causing error messages to disappear)

ERROR: Non-empty content is required

This usually means that the original PHP page you activated produces an empty page.

Enlighten ERROR: <optional text>

This means the invocation of the Calais Web service failed, normally due to temporary connectivity problems. Ensure your Web server is allowed to establish outbound connections to the Calais Web service (your hosting provider may have disabled this for security reasons, and may enable it if you ask).
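
To test outbound connectivity independently of the Injector, a generic cURL request with a timeout can be used. The following is a minimal sketch: the function name post_with_timeout is hypothetical, and it does not reproduce the request format that the OpenCalais Web service expects, which is not covered in this document.

```php
<?php
// Hypothetical helper: POST $body to $url, giving up after $timeoutSeconds.
// Returns the response body as a string, or false on any failure.
function post_with_timeout($url, $body, $timeoutSeconds)
{
    if (!function_exists('curl_init')) {
        return false; // the cURL PHP extension is not enabled
    }
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $body);
    // Return the response instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Limit both the connection phase and the whole transfer.
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeoutSeconds);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
    // curl_exec() returns false when the request fails or times out.
    return curl_exec($ch);
}
```

A false result for a request to http://api.opencalais.com/ from your Web server, together with a successful request from another machine, would suggest that outbound connections are blocked by your host.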

Attachment                        Size
CalaisMarmoset_08Dec17.zip        22.66 KB
CalaisMarmoset_08Dec17.tar_.gz    13.28 KB