I guess I need a "Calais for dummies" section. I'm not a programmer, and not familiar with scripting languages. I have no experience with RDF, but I do have web pages in HTML and am interested in semantically enhancing them.
I suppose one basic question--I already have Dublin Core metatags in my header for the most important entities in my pages. Are any semantic technologies taking advantage of those, or do they only pay attention to RDF?
I'm assuming that the RDFs will live somewhere in the web page? How does that work? Here are two things I tried. First, I tried to submit the HTML file to the viewer (when I could get the viewer to work!), to see if the resulting file would be something you could then re-post to the web. I copied and pasted it in. It seems to have changed all the angle brackets into what look like escape characters, so the resulting file is no longer valid HTML and isn't viewable in a browser. For example, here is a sample HTML paragraph from the original file:
<p>Big news--<i><b>Beyond the Rocks</b></i> has been found! This is Swanson's only film with Rudolph Valentino and was long thought lost. Check the link to the half-hour Dutch news story with clips <a href="http://cgi.omroep.nl/cgi-bin/streams?/tv/vpro/ram/bb.20040418.rm?start=50.30&end=30:48.80&title=Beyond%20the%20rocks%20teruggevonden">here</a></p>
and here is what Calais returned:
<p>Big news--<i><b>Beyond the Rocks</b></i> has been found! This is Swanson's only film with Rudolph Valentino and was long thought lost. Check the link to the half-hour Dutch news story with clips <a href="http://cgi.omroep.nl/cgi-bin/streams?/tv/vpro/ram/bb.20040418.rm?start=50.30&end=30:48.80&title=Beyond%20the%20rocks%20teruggevonden">here</a></p>
At the end of the file was:
</Body></Document>]]></c:document><c:externalMetadata/><c:submitter>calaisbridge</c:submitter>
and some RDF stuff, but none of it looked like entities. So apparently Calais isn't intended to run on a raw HTML file?
OK, so this time I copied and pasted just the text from my browser window into the Calais viewer. What was returned was:
terms of service stuff
<!--Relations: PersonPolitical
Facility: Silent Ladies photo gallery
Movie: The Love of Sunya, Don't Change your Husband, Why Change Your Wife?
IndustryTerm: Internet Movie Database
Person: Rudolph Valentino, Sadie Thompson, Denny Jackson, Phillip Oliver, Norma Desmond, Gloria Swanson, Alfred A. Knopf, Kelly
City: New York
ProvinceOrState: New Jersey--><rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:c="http://s.opencalais.com/1/pred/"><rdf:Description c:allowDistribution="true" c:allowSearch="true" c:externalID="calaisbridge" c:id="http://id.opencalais.com/GsVvBaqIsdZ86gFD5HTMJA" rdf:about="http://d.opencalais.com/dochash-1/5b4e1344-c0d1-36ab-8d8c-e9e3b8adffa3"><rdf:type rdf:resource="http://s.opencalais.com/1/type/sys/DocInfo"/><c:document><![CDATA[<Document><Title>1213199560587-3CB60A39-4696</Title><Date>2008-06-11</Date><Body>Gloria Swanson
Following this was my original text. This was followed by:
</Body></Document>]]></c:document><c:externalMetadata/><c:submitter>calaisbridge</c:submitter>
and some stuff similar to what was returned with the HTML file. This was followed by lots of RDF-tagged stuff which did recognizably relate to entities in the file, though apparently in random order. Buried in there I think I found my paragraph again:
Big news--Beyond the Rocks has been found! This]</c:detection><c:offset>81</c:offset><c:length>14</c:length></rdf:Description><rdf:Description rdf:about="http://d.opencalais.com/dochash-1/5b4e1344-c0d1-36ab-8d8c-e9e3b8adffa3/Instance/12"><rdf:type rdf:resource="http://s.opencalais.com/1/type/sys/InstanceInfo"/><c:docId rdf:resource="http://d.opencalais.com/dochash-1/5b4e1344-c0d1-36ab-8d8c-e9e3b8adffa3"/><c:subject rdf:resource="http://d.opencalais.com/pershash-1/c71c407d-b829-3a41-bfc4-45013644eed7"/><!--Person: Gloria Swanson--><c:detection>[news--Beyond the Rocks has been found! This is ]Swanson['s only film with Rudolph Valentino and was long]</c:detection><c:offset>148</c:offset><c:length>7</c:length></rdf:Description><rdf:Description rdf:about="http://d.opencalais.com/dochash-1/5b4e1344-c0d1-36ab-8d8c-e9e3b8adffa3/Instance/13"><rdf:type rdf:resource="http://s.opencalais.com/1/type/sys/InstanceInfo"/><c:docId rdf:resource="http://d.opencalais.com/dochash-1/5b4e1344-c0d1-36ab-8d8c-e9e3b8adffa3"/><c:subject rdf:resource="http://d.opencalais.com/pershash-1/c71c407d-b829-3a41-bfc4-45013644eed7"/><!--Person: Gloria Swanson--><c:detection>[the half-hour Dutch news story with clips here
So, how do I actually make use of this metadata? Is there something I'm supposed to copy and paste into the HTML file of my web page? All the RDF? Some of it? The list-like text naming the entities, just after the terms of service stuff at the beginning of the file? That doesn't appear to be in RDF format. If I do paste it into my HTML file, where does it go? In the header? After the /body but before the /html tag? Is there some other sort of tag that should be used?
Thanks for your patience in reading this and sorry for the really dumb questions. But for those of us not in the field, it's very difficult to know where to begin dipping our toes into the semantic web.
sincerely
Greta de Groat

Comments
Hi,
Thanks for your reply, though it's still a bit technically advanced for me. Are the input and output settings you mention only available in the application, or are they also available in the document viewer? I was using the latter, because I'm not sure I have enough web pages for it to be worth my while learning how to use the program. I don't see any alternative settings there: you get either the RDF format or the text with your entities highlighted, and I don't see an option for microformat output. Looking at the Marmoset page, from what little I can figure out from the examples and from looking at a few microformats, they seem pretty semantically limited.
Seems like it would be simpler just to have a few RDF templates for the various types of entities and enter them by hand. It looks like the microformats are supposed to live in the HTML header. Is that where the RDFs are supposed to live? I'm still not sure what you do with the RDFs of the web page.
Also, what is this:
that was returned in my previous question? It seems to be a list of entities, but is it something that's functional as far as Yahoo is concerned, or is it just a simple list? It's certainly a simpler format, but it looks more like a comment.
thanks
greta
Hi,
The list of entities/facts you see in the header of the RDF is indeed a comment. It gives you a quick view of the entities/facts in the input text without any reference to the text.
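For anyone who wants to read that comment with a script, here is a minimal Python sketch. It is an illustration only, not an official Calais tool. It assumes the summary comment starts with "Relations:" and uses one "Type: name, name" line per entity type, as in the output pasted in the question; the function name parse_entity_comment is made up for this example.

```python
import re

def parse_entity_comment(rdf_text):
    """Return {entity_type: [names]} from the summary comment, or {} if absent.

    Assumption: the summary is the comment that begins with "Relations:",
    as in the sample output quoted in this thread.
    """
    match = re.search(r"<!--\s*(Relations:.*?)-->", rdf_text, flags=re.DOTALL)
    if match is None:
        return {}
    entities = {}
    for line in match.group(1).splitlines():
        if ":" not in line:
            continue
        kind, _, names = line.partition(":")
        # Naive split: a name that itself contains a comma would be cut in two.
        entities[kind.strip()] = [n.strip() for n in names.split(",") if n.strip()]
    return entities

# Shortened copy of the comment from the question above.
sample = """<!--Relations: PersonPolitical
Facility: Silent Ladies photo gallery
Movie: The Love of Sunya, Don't Change your Husband, Why Change Your Wife?
City: New York
ProvinceOrState: New Jersey--><rdf:RDF></rdf:RDF>"""

result = parse_entity_comment(sample)
for kind, names in result.items():
    print(kind, "->", names)
```

Because the list is only a comment, nothing guarantees its layout will stay the same, so the RDF itself is the safer thing to rely on for anything long-lived.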
The document viewer uses RDF only. However, you can use our applications for Simple Format and JSON at http://www.opencalais.com/node/1229 and http://www.opencalais.com/node/289. These samples also have a client that visually shows the entities/facts but uses the Simple output format or JSON format.
Another option for extracting metadata in Simple output format or microformats is to use the submission tool at http://www.opencalais.com/node/307 and set the outputFormat for these options according to our documentation.
Microformats need to live inside the HTML, as you mentioned (you will need to add them to your HTML yourself or, better, use our Marmoset tool for this).
The RDF is very powerful and includes a lot of info about each entity. You can make use of it if you run a repository of all the metadata of your pages, and can then do linking, analytics and other relations between the metadata of the different pages you have. You can decide what you want to embed inside your pages.
Simple format will be very useful for you if you just need the list of entities/facts in a simpler representation.
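To make the "repository" idea a little more concrete, here is a small Python sketch of one possible approach: keep the entities found on each page, then build an index from each entity to the pages that mention it. The page file names and the build_entity_index function are hypothetical, and the entity lists are typed in by hand here rather than taken from real Calais output.

```python
from collections import defaultdict

def build_entity_index(pages):
    """Map (entity_type, name) -> set of pages that mention it."""
    index = defaultdict(set)
    for page, entities in pages.items():
        for kind, names in entities.items():
            for name in names:
                index[(kind, name)].add(page)
    return index

# Hypothetical pages and hand-entered entity lists, for illustration only.
pages = {
    "swanson.html": {"Person": ["Gloria Swanson", "Rudolph Valentino"]},
    "valentino.html": {"Person": ["Rudolph Valentino"], "City": ["New York"]},
}

index = build_entity_index(pages)
# Entities that appear on more than one page are candidates for cross-links.
shared = {key: sorted(found) for key, found in index.items() if len(found) > 1}
print(shared)
```

An index like this is what lets you add "see also" links between pages that mention the same person or film.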
Hope this helps.
Ofer
Hi,
When sending HTML, please make sure the input format is TEXT/HTML. For better results, send Open Calais only the plain text you want annotated.
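If you would rather not copy the text out of the browser by hand, a short script can strip the tags for you. The sketch below uses only Python's standard library. The class and function names are made up for this example, the sample paragraph is a shortened version of the one in the question with a placeholder link, and it is not part of any Calais tool.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects the text between tags and ignores the tags themselves."""

    def __init__(self):
        super().__init__()  # character references such as &amp; are decoded by default
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so the result is one clean line of text.
    return " ".join("".join(parser.parts).split())

# Shortened sample paragraph; the link target is a placeholder.
sample = ('<p>Big news--<i><b>Beyond the Rocks</b></i> has been found! '
          'Clips <a href="http://example.org/">here</a></p>')
print(html_to_text(sample))
```

One limitation of this simple version: the contents of script and style elements would be kept as text, so pages that contain those need extra handling.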
We do not embed the tags inside the original text. We leave this to you to decide how to implement that and how to link the tags to other content you have.
The RDF is a very detailed representation of the extracted metadata. It lets you easily locate the entities in the original text and do different kinds of things with them.
For simplicity, you can use outputFormat=Text/Simple and get a more simplified presentation of the tags extracted by Open Calais.
To make your web page searchable by Yahoo!, use outputFormat=Text/Microformats and embed the microformat tags inside your page. You can use our Marmoset documentation as a sample of how to embed the microformats, or read about them at microformats.org. There are also Firefox plugins that show the microformats embedded in an HTML page.
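As a rough illustration of locating entities, the Python sketch below reads the InstanceInfo descriptions (the ones with c:subject, c:offset and c:length, as in the output pasted in the question) and uses the offset and length to pull the matching words out of the text. The RDF snippet is a hand-made miniature with made-up offsets that fit the short sample sentence, and the code assumes the offset counts characters from the start of the string you compare against; that has not been checked against real Calais output.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
C_NS = "http://s.opencalais.com/1/pred/"
INSTANCE_TYPE = "http://s.opencalais.com/1/type/sys/InstanceInfo"

def find_instances(rdf_xml):
    """Return a list of (subject_uri, offset, length) for each InstanceInfo."""
    root = ET.fromstring(rdf_xml)
    instances = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        rdf_type = desc.find(f"{{{RDF_NS}}}type")
        if rdf_type is None or rdf_type.get(f"{{{RDF_NS}}}resource") != INSTANCE_TYPE:
            continue
        subject = desc.find(f"{{{C_NS}}}subject")
        offset = desc.findtext(f"{{{C_NS}}}offset")
        length = desc.findtext(f"{{{C_NS}}}length")
        if subject is None or offset is None or length is None:
            continue
        instances.append((subject.get(f"{{{RDF_NS}}}resource"), int(offset), int(length)))
    return instances

# Hand-made miniature; offsets are invented to match TEXT below.
TEXT = "This is Swanson's only film with Rudolph Valentino."
SAMPLE_RDF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:c="http://s.opencalais.com/1/pred/">
<rdf:Description rdf:about="http://d.opencalais.com/dochash-1/example/Instance/1">
<rdf:type rdf:resource="http://s.opencalais.com/1/type/sys/InstanceInfo"/>
<c:subject rdf:resource="http://d.opencalais.com/pershash-1/c71c407d-b829-3a41-bfc4-45013644eed7"/>
<c:detection>[This is ]Swanson['s only film]</c:detection>
<c:offset>8</c:offset><c:length>7</c:length>
</rdf:Description>
</rdf:RDF>"""

found = find_instances(SAMPLE_RDF)
for subject_uri, offset, length in found:
    print(TEXT[offset:offset + length], "=>", subject_uri)
```

The subject URI is what ties several mentions together: every InstanceInfo that points at the same c:subject refers to the same entity, which is how "Swanson" and "Gloria Swanson" end up linked.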
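For a feel of what embedding a microformat means, here is a small Python sketch that wraps the first mention of a person's name in hCard markup (class "vcard" with an "fn" name property, as described at microformats.org). It is only an illustration: the mark_person function is made up for this example, and the markup it produces may not match what outputFormat=Text/Microformats or Marmoset actually generate.

```python
from html import escape

def mark_person(html, name):
    """Wrap the first occurrence of name in minimal hCard markup.

    Naive approach: plain string replacement, so a name that also appears
    inside a tag attribute (e.g. a title="...") would be wrapped there too.
    """
    safe_name = escape(name, quote=False)
    marked = f'<span class="vcard"><span class="fn">{safe_name}</span></span>'
    return html.replace(safe_name, marked, 1)

page = "<p>This is Swanson's only film with Rudolph Valentino.</p>"
print(mark_person(page, "Rudolph Valentino"))
```

Note that the markup goes in the body of the page, around the visible text, not in the header.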
Hope this helps.
Ofer