Problem with Marmoset
Posted on: Sat, 10/04/2008 - 04:02
Hi,
I posted this in the SIG: Developers forum as well. I'm not sure exactly where our query should go, so my apologies in advance.
We're trying to set up the Calais Marmoset injector but are encountering this error:
Warning: Invalid argument supplied for foreach() in /mnt/local/home/cameran/babasgourmet.com/calaismf/common.php on line 92
It's on a test domain right now. Does anyone know what the problem is and what is causing it? We're trying to implement it on a WordPress blog.
thanks a ton,
cameran

Comments
Marmoset is intended for use with plain PHP websites - that is, not within a framework. Future versions of Marmoset or other OpenCalais tools may support attaching Microformats within PHP-based frameworks.
Hi,
Normally this means that the value passed to foreach is null or not an array.
To help resolve the problem, can you please paste the following 3 lines before line 92 in common.php (it's the only line with 'foreach' in the file) and post the output you get?
echo CALAISMF_INCLUDE_DIR;
echo "***";
var_dump($calaismf_UserAgentSubstringList);
In addition, can you post the content of your conf.php file in the same directory (without your API key)?
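As a rough illustration of the point above about foreach receiving null: the sketch below guards the loop with is_array() so a missing list yields no matches instead of a warning. The function name find_matches() and the sample User-Agent string are made up for this example and are not part of Marmoset; newer PHP versions word the warning differently from the one quoted in this thread.

```php
<?php
// Hypothetical helper, not Marmoset code: returns the configured substrings
// that occur in the given User-Agent string.
function find_matches($list, $ua) {
    $matches = array();
    if (!is_array($list)) {
        // Iterating NULL here is what triggers the foreach() warning,
        // so bail out early with an empty result instead.
        return $matches;
    }
    foreach ($list as $substring) {
        if (strpos($ua, $substring) !== false) {
            $matches[] = $substring;
        }
    }
    return $matches;
}

// A NULL list (the situation in this thread) produces no warning.
var_dump(find_matches(null, "Yahoo! Slurp"));
// A proper array behaves as expected.
var_dump(find_matches(array("Slurp"), "Yahoo! Slurp"));
```

A guard like this only hides the symptom; the underlying question of why the list is NULL still has to be answered.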
Thanks,
Shai Dagan
Hi,
Great thanks for responding, I really appreciate it. Here's the info you have requested:
Output I got:
/mnt/local/home/cameran/babasgourmet.com/calaismf/***NULL
Content (shortened by removing top comments) of my conf.php:
/*
* The Calais API key used in calls to the Web service.
*
* ***NOTE*** You must change this value to a valid key. To obtain an API key
* see http://www.opencalais.com
*/
$calaismf_APIKey = "my api key";
/*
* Timeout in seconds for the Calais Enlighten call
* This HTTP POST request is made when the injector identifies the caller
* as a search robot. If this timeout expires the original page is
* returned to the robot.
*/
$calaismf_InvokeEnlightenTimeout = 60;
/*
* List of substrings to look for in the caller's User-Agent field, in order
* to determine whether it is a search robot.
*
* Any caller with a User-Agent that includes one of the strings in this
* array, will be treated as a search robot, and the injector will place
* the Microformats in the returned page.
*
* Comparison is case sensitive.
*
* Initially only Slurp is included (identifies Yahoo's crawler)
*/
$calaismf_UserAgentSubstringList = array("Slurp");
/*
* Identification of OpenCalais Microformats Injector in log output
*/
$calaismf_LogIdentity = "CalaisMFInjector";
/*
* reltagBaseURL is used as the base URL for OpenCalais
* generated Rel-Tags microformats. (More information about OpenCalais
* microformats support can be found in the OpenCalais API
* documentation at http://www.opencalais.com/page/documentation).
*
* Rel-tags generated by Calais will be injected into your page when
* requested by a search robot (HTML elements). By default the
* link's URL (href) will equal '/Tag' where Tag is the identified
* Calais rel-tag.
*
* If you'd like to attach a prefix to the tag, put it here. If you
* change this value to "http://www.example.com", the links for Tag
* will now point to http://www.example.com/Tag
*/
$calaismf_ReltagBaseURL = "http://www.mydomain.com";
/*
* THE VALUES BELOW ARE FOR INTERNAL USE - PLEASE DO NOT CHANGE
*/
$calaismf_OverrideDefaultURL = false;
$calaismf_CalaisWebServiceURL = "";
$calaismf_VerifySSLCertificates = true;
?>
Thanks a lot for responding, I really appreciate it! Please let me know what we need to do if possible.
Cameran
The NULL at the end of the output you received indicates that the variable $calaismf_UserAgentSubstringList is NULL. It should be an array containing a single string, 'Slurp'.
The NULL value suggests that the variable was not defined, although in the conf.php file you posted it is defined properly. There can be several reasons for this. Can you please post the content of your common.php file?
Thanks,
Shai Dagan
Sure, thanks so much for helping out. I really appreciate it.
Here is the content of my common.php file, except for the comments at the beginning:
require_once (CALAISMF_INCLUDE_DIR . "conf.php");
/**
* Returns true if the caller is to be treated as a search robot
*/
function calaismf_is_search_robot()
{
global $calaismf_UserAgentSubstringList;
static $hasanswer = false;
static $savedanswer = false;
/*
* Save time on the 2nd call
*/
if ($hasanswer)
{
return $savedanswer;
}
/*
* First check for the overrideUserAgent flag (testing flag)
*
* To view the page as seen by search robots call
*
* page.php?overrideUserAgent=true
*
*/
$override = (isset($_GET["overrideUserAgent"]) &&
($_GET["overrideUserAgent"] == "true"));
if ($override)
{
$hasanswer = true;
$savedanswer = true;
return true;
}
/**
* Check the user agent against the configured list
*/
if (!isset($_SERVER["HTTP_USER_AGENT"]) ||
$_SERVER["HTTP_USER_AGENT"] == "")
{
$hasanswer = true;
$savedanswer = false;
return false;
}
$ua = $_SERVER["HTTP_USER_AGENT"];
echo CALAISMF_INCLUDE_DIR;
echo "***";
var_dump($calaismf_UserAgentSubstringList);
foreach ($calaismf_UserAgentSubstringList as $substring)
{
if (strpos($ua, $substring) != FALSE)
{
$hasanswer = true;
$savedanswer = true;
return true;
}
}
$hasanswer = true;
$savedanswer = false;
return false;
}
/**
* Logs an error to the user log
*/
function calaismf_log_error($error)
{
global $calaismf_LogIdentity;
define_syslog_variables();
openlog($calaismf_LogIdentity, 0, LOG_USER);
syslog(LOG_ERR, $error);
closelog();
}
?>
I have a theory that might explain this. Can you check the following? Copy the line below immediately after the require_once line at the beginning of common.php (the one that includes conf.php) and post the output.
var_dump($calaismf_UserAgentSubstringList);
Thanks,
Shai Dagan
Hi,
Thanks again for all the help. I pasted the line in common.php, and here is the output I got:
array(1) { [0]=> string(5) "Slurp" } ./calaismf/***NULL
Cameran
OK, this confirms it.
As you can see, the same variable is properly defined outside the function calaismf_is_search_robot() but is NULL inside it.
This means the original definition was not in global scope, but inside another function.
When using Marmoset you have to include the header.php file at the very beginning of the requested PHP file, and the footer.php at the end of the requested PHP file. By requested PHP file, I mean the file that is requested by the HTTP request (usually the one in the address bar of the browser), unlike other PHP files that this file includes using include and require statements. If the requested PHP file can be one of several, you have to add the require statements at the top and bottom of each of these files.
In your case, I believe the Marmoset code is included from within another function (inside your code), which means that the global variables declared by Marmoset become local to your function. For this reason they cannot be used within the calaismf_is_search_robot() function.
For example, suppose the PHP file that you wish to run through Marmoset is a.php with the following content:
function f() {
    require_once("b.php");
    // ... some other code ...
}
And suppose you added the require statements for header.php and footer.php at the beginning and end of b.php.
If you open b.php in your browser, Marmoset will work, but if you open a.php in your browser, the globals defined by Marmoset will all be local to function f(), causing the error that you have.
The solution, in this example, would be to add the header.php and footer.php to both a.php and b.php, and then both would work.
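The scoping behaviour described above can be reproduced in isolation. The sketch below is only an illustration: the temporary file, the variable $demo_list and both function names are invented for the example and do not appear in Marmoset. It includes a small "config" file from inside a function and then reads the variable through a global statement, the way calaismf_is_search_robot() does.

```php
<?php
// Write a throwaway "config" file; file and variable names are illustrative.
$confFile = tempnam(sys_get_temp_dir(), "conf");
file_put_contents($confFile, '<?php $demo_list = array("Slurp");');

// Reads the variable via "global", as calaismf_is_search_robot() does.
function read_global_list() {
    global $demo_list;
    return $demo_list;
}

// Including from inside a function makes $demo_list local to this function.
function include_inside_function($file) {
    require $file;
    return array(
        "local"  => $demo_list,          // visible in this function's scope
        "global" => read_global_list(),  // never reached global scope
    );
}

$result = include_inside_function($confFile);
var_dump($result["local"]);   // the array containing "Slurp"
var_dump($result["global"]);  // NULL
unlink($confFile);
```

This mirrors the two var_dump outputs earlier in the thread: the array is visible right after the include, and NULL is seen from inside the function that declares the variable global.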
Does this resolve the problem?
Thanks,
Shai Dagan
Hi,
We've inserted the code from marmoset at the beginning and end of our header and footer.php files (this is for Wordpress). Unfortunately that doesn't seem to solve the problem. The code we're using is the one asked for on the Marmoset page - http://www.opencalais.com/Crawler.
Do you mean we should insert the entire contents of the calais header.php into our own header.php and the entire contents of the calais footer.php into our footer.php?
Thanks and please let me know,
Cameran
Unfortunately the current version of Marmoset is intended for use in plain PHP, i.e. not within frameworks such as WordPress. This might change in future versions. I did not fully appreciate what working within WordPress implies: Marmoset expects the entire HTML page, and your PHP page seems to be only a part of the entire page returned to the browser/robot.
I can suggest the following workaround, but note that it was tested only briefly. It would be helpful if you post your experience with this workaround.
To apply it there are two changes to make:
1. Move the variables to global scope
2. Allow injection into a partial page
Step 1:
In conf.php, change the declaration of every parameter (any non-comment line) as described below.
In footer.php, change the declaration of the 5 parameters at the beginning of the file (the first one is $calaismf_inject_arr and the last one is $calaismf_parse_err) as described below.
Replace each line of the form:
$name = some_value;
With the two lines:
---START---
$GLOBALS['name'] = some_value;
global $name;
---END---
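To show why the two-line form helps, here is a small sketch under the same made-up setup as before (a temporary file standing in for conf.php, and the invented names $demo_fixed_list, read_fixed_list() and include_fixed_config()). Assigning through $GLOBALS puts the value in global scope even when the file is included from inside a function, and the global statement makes the plain variable name usable in the including scope.

```php
<?php
// Throwaway config written in the workaround's two-line form (names invented).
$confFile = tempnam(sys_get_temp_dir(), "conf");
file_put_contents(
    $confFile,
    '<?php $GLOBALS["demo_fixed_list"] = array("Slurp"); global $demo_fixed_list;'
);

// Reads the variable via "global", as the Marmoset functions do.
function read_fixed_list() {
    global $demo_fixed_list;
    return $demo_fixed_list;
}

// The file is still included from function scope, as a framework might do.
function include_fixed_config($file) {
    require $file;
    return $demo_fixed_list;  // bound to the global by the "global" statement
}

$local = include_fixed_config($confFile);
$viaGlobal = read_fixed_list();
var_dump($local, $viaGlobal); // both are the array containing "Slurp"
unlink($confFile);
```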
Step 2:
In footer.php find:
---START---
if ($afterbodyidx < 0)
{
calaismf_log_error("Failed to find body element in original page");
echo $origpage;
}
else
{
/*
* Success - glue it all together and inject
*/
$injectstr = implode("", $calaismf_inject_arr);
echo substr($origpage, 0, $afterbodyidx).$injectstr.
substr($origpage, $afterbodyidx);
}
---END---
And replace with:
---START---
if ($afterbodyidx < 0)
{
$afterbodyidx = 0;
}
/*
* Success - glue it all together and inject
*/
$injectstr = implode("", $calaismf_inject_arr);
echo substr($origpage, 0, $afterbodyidx).$injectstr.
substr($origpage, $afterbodyidx);
---END---
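The effect of the Step 2 replacement can be seen with a minimal stand-alone sketch. The helper inject_microformats() and the sample markup are hypothetical and only mirror the logic above; they are not Marmoset's own code. When no body element is found, the injected string is simply prepended to the partial page.

```php
<?php
// Hypothetical helper mirroring the replacement logic; not part of Marmoset.
function inject_microformats($origpage, $afterbodyidx, array $inject_arr) {
    if ($afterbodyidx < 0) {
        // No body element found (partial page): inject at the very start.
        $afterbodyidx = 0;
    }
    $injectstr = implode("", $inject_arr);
    return substr($origpage, 0, $afterbodyidx) . $injectstr
         . substr($origpage, $afterbodyidx);
}

// Full page: offset 6 is the position right after the opening "<body>" tag.
echo inject_microformats("<body><p>Hi</p></body>", 6, array("<a>", "</a>")), "\n";
// Partial page: no body tag was found, so the injection is prepended.
echo inject_microformats("<p>Hi</p>", -1, array("<a>", "</a>")), "\n";
```

Note that with this change a page that genuinely lacks a body element no longer logs an error, so a failed injection on a full page would be harder to spot.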
Thanks,
Shai Dagan