Here's an interesting question for you. When we throw text at OpenCalais, we like the entity extraction we get. However, there's a problem in how entities are (not) normalized. Looking at the examples below, you'd assume you could normalize these to one single entity, "Microsoft". I realize there could be other problematic situations with local company entities (Microsoft Israel, Microsoft Ireland, etc.), but this one should be a given. We've found many other similar examples.

Is it a bug, or does the system not attempt to "normalize", for example, company names back to one single entity? Or is it something we're doing wrong? Can we get "entity normalization" in some way we haven't realized?

http://d.opencalais.com/comphash-1/49bf454b-3fed-3244-94fc-b3d5115f7df4
Company Microsoft

http://d.opencalais.com/comphash-1/6ebf4eca-11af-3728-ae8b-1013c7ee8b5b
Company Microsoft Corp.

http://d.opencalais.com/comphash-1/95432728-abd1-35b0-9ddb-5abf80eb0d43
Company Microsoft Corp

BTW - I wish I had the text for this very example, but we've had the issue over and over...

 thanks for any input!

B


Comments

Hi there guys - wanted to check if you had any input on this issue? It seems pretty important... (to us of course :-)

bobjones -

Apologies for the delayed response.
You'll be happy to hear that Company Normalization is already in our labs and we're planning to release this functionality pretty soon - maybe even this quarter.
Note that even today OpenCalais supports name normalization, but only at the article level (e.g. Microsoft and Microsoft Corp. mentioned in the same document will be normalized the same way), not across articles.
We're planning the same functionality for geographies as well, so "U.S.", "U.S.A" and "United States of America" will all point to the same URI.

Michal
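
Until cross-article normalization ships, a client-side stopgap is possible. The following is only a minimal sketch, not how OpenCalais does (or will do) it: it groups the three URIs from the original post under one crude key by lower-casing the name, removing punctuation, and dropping trailing legal suffixes. The names `LEGAL_SUFFIXES`, `company_key`, and `group_entities` are made up for this illustration, and the suffix list is an assumption you would need to extend.

```python
import re
from collections import defaultdict

# Assumed, incomplete list of legal suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"corp", "corporation", "inc", "incorporated",
                  "ltd", "limited", "llc", "co", "plc"}


def company_key(name):
    """Return a crude grouping key for a company name."""
    # Lower-case, replace punctuation with spaces, split into tokens.
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    # Drop trailing legal suffixes, but never the last remaining token.
    while len(tokens) > 1 and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def group_entities(entities):
    """Group (uri, name) pairs by their company_key."""
    groups = defaultdict(list)
    for uri, name in entities:
        groups[company_key(name)].append(uri)
    return dict(groups)


# The three entities reported at the top of this thread.
entities = [
    ("http://d.opencalais.com/comphash-1/49bf454b-3fed-3244-94fc-b3d5115f7df4", "Microsoft"),
    ("http://d.opencalais.com/comphash-1/6ebf4eca-11af-3728-ae8b-1013c7ee8b5b", "Microsoft Corp."),
    ("http://d.opencalais.com/comphash-1/95432728-abd1-35b0-9ddb-5abf80eb0d43", "Microsoft Corp"),
]
groups = group_entities(entities)
```

With this key, all three URIs land in one group, while a name such as "Microsoft Israel" keeps its own key and stays separate. A purely string-based key like this will also merge unrelated companies that happen to share a name, so treat it as a rough heuristic.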

Hi there - any update on when this improved normalization functionality might be coming? thanks!

bobjones -
We are going to announce pretty soon, probably in a matter of days, the scope of OpenCalais's upcoming releases. One of them will include improved normalization and disambiguation capabilities.
Please check our blog in the next few days - it will be posted there.

Michal

Good to hear! Keep us posted.

I've another normalization issue regarding duplicate values. City names like "Amsterdam" and "AMSTERDAM" are treated as two different city values.

Below is a link to a test-drive application. When you enter a stock symbol, a request to the Yahoo search web service is made for the latest 10 news entries on this company and this resultset (item title + summary) is forwarded to OpenCalais (results are shown in the "semantics" panel):

http://prima.appspot.com

Just enter some values (e.g. "fortis", "^aex") and look at the city semantics. The handshake with OC would be even more responsive if native JSON output were supported, as requested here:

http://opencalais.com/node/2612

And here's a default input set for OC:

[{"Title": "(AFX UK Focus) 2008-06-30 10:39 Amsterdam shares TFN market data at 11.16 a.m. - Financials lead down", "Summary": "AMSTERDAM (Thomson Financial) - AMSTERDAM (Thomson Financial) - Market data at 11.16 a.m.", "Url": "http://www.iii.co.uk/news/?type=afxnews&articleid=6784548&action=article", "ClickUrl": "http://www.iii.co.uk/news/?type=afxnews&articleid=6784548&action=article", "NewsSource": "Interactive Investor", "NewsSourceUrl": "http://www.iii.co.uk/", "Language": "en", "PublishDate": "1214819178", "ModificationDate": "1214819180"},
{"Title": "Europe markets closes mostly lower", "Summary": "LONDON: European stock markets closed mostly lower yesterday, extending losses in a global sell-off sparked by growing concerns about the economic outlook as oil prices soared to fresh records above $142.", "Url": "http://www.gulf-times.com/site/topics/article.asp?cu_no=2&item_no=226900&version=1&template_id=48&parent_id=28", "ClickUrl": "http://www.gulf-times.com/site/topics/article.asp?cu_no=2&item_no=226900&version=1&template_id=48&parent_id=28", "NewsSource": "Gulf Times", "NewsSourceUrl": "http://www.gulf-times.com/", "Language": "en", "PublishDate": "1214632643", "ModificationDate": "1214632644"},
{"Title": "Global markets roil as oil prices rocket", "Summary": "US stock markets have closed lower on Friday, extending losses in a global sell-off sparked by growing concerns about the economic outlook as oil prices soared to fresh records above $US142.", "Url": "http://www.thewest.com.au/aapstory.aspx?StoryName=493993", "ClickUrl": "http://www.thewest.com.au/aapstory.aspx?StoryName=493993", "NewsSource": "The West Australian", "NewsSourceUrl": "http://www.thewest.com.au/", "Language": "en", "PublishDate": "1214604627", "ModificationDate": "1214604628"},
{"Title": "Global markets roil as oil prices rocket", "Summary": "US stock markets have closed lower on Friday, extending losses in a global sell-off sparked by growing concerns about the economic outlook as oil prices soared.", "Url": "http://au.biz.yahoo.com/080627/2/1t4w5.html", "ClickUrl": "http://au.biz.yahoo.com/080627/2/1t4w5.html", "NewsSource": "AAP via Yahoo!7 Finance", "NewsSourceUrl": "http://au.biz.yahoo.com/financenews/", "Language": "en", "PublishDate": "1214604012", "ModificationDate": "1214604286"},
{"Title": "FTSE just bucks downward trend", "Summary": "EUROPEAN stock markets closed mostly lower overnight, extending losses in a global sell-off sparked by growing concerns about the economic outlook as oil prices soared to fresh records above $US142.", "Url": "http://www.news.com.au/heraldsun/story/0,21985,23935662-5005961,00.html?from=public_rss", "ClickUrl": "http://www.news.com.au/heraldsun/story/0,21985,23935662-5005961,00.html?from=public_rss", "NewsSource": "Herald Sun", "NewsSourceUrl": "http://www.heraldsun.news.com.au/", "Language": "en", "PublishDate": "1214601240", "ModificationDate": "1214602771"},
{"Title": "(AFX UK Focus) 2008-06-27 17:43 Benelux shares close lower on high oil prices; Fortis bucks trend UPDATE", "Summary": "(updating with full report)", "Url": "http://www.iii.co.uk/news/?type=afxnews&articleid=6783455&subject=markets&action=article", "ClickUrl": "http://www.iii.co.uk/news/?type=afxnews&articleid=6783455&subject=markets&action=article", "NewsSource": "Interactive Investor", "NewsSourceUrl": "http://www.iii.co.uk/", "Language": "en", "PublishDate": "1214587032", "ModificationDate": "1214587034"},
{"Title": "Benelux shares close lower on high oil prices; Fortis bucks trend UPDATE", "Summary": "(updating with full report) AMSTERDAM (Thomson Financial) - Benelux shares closed lower on worries surrounding rising oil prices, with Fortis bucking the trend.", "Url": "http://www.sharewatch.com/story.php?storynumber=366914", "ClickUrl": "http://www.sharewatch.com/story.php?storynumber=366914", "NewsSource": "Sharewatch", "NewsSourceUrl": "http://www.sharewatch.com/", "Language": "en", "PublishDate": "1214584825", "ModificationDate": "1214584827"},
{"Title": "STOCKWATCH Macintosh lower after H1 opg profit warning", "Summary": "AMSTERDAM (Thomson Financial) - Shares in Macintosh were lower in morning trade on Friday after the Dutch retailer gave a first-half profit warning. At 9.25 a.m., the stock was down 2.67 percent at 14.60 euros. The AEX was up 0.30 percent at 427.32 points.", "Url": "http://www.sharewatch.com/story.php?storynumber=366347", "ClickUrl": "http://www.sharewatch.com/story.php?storynumber=366347", "NewsSource": "Sharewatch", "NewsSourceUrl": "http://www.sharewatch.com/", "Language": "en", "PublishDate": "1214552449", "ModificationDate": "1214552450"},
{"Title": "Europe share marts close sharply lower", "Summary": "LONDON: European stock markets closed sharply lower yesterday as news of a large cash call by Fortis hit the banks and investors fretted about the prospect of higher interest rates to curb inflation.", "Url": "http://www.gulf-times.com/site/topics/article.asp?cu_no=2&item_no=226695&version=1&template_id=48&parent_id=28", "ClickUrl": "http://www.gulf-times.com/site/topics/article.asp?cu_no=2&item_no=226695&version=1&template_id=48&parent_id=28", "NewsSource": "Gulf Times", "NewsSourceUrl": "http://www.gulf-times.com/", "Language": "en", "PublishDate": "1214549866", "ModificationDate": "1214549867"},
{"Title": "Fortis shares plummet after massive cash call", "Summary": "THE HAGUE (AFP) - Fortis shares slumped more than 19 percent Thursday after the Belgian-Dutch bank announced plans to raise eight billion euros (12.5 billion dollars) to help it cope with the global credit crunch.", "Url": "http://dailynews.muzi.com/news/ll/english/10073144.shtml", "ClickUrl": "http://dailynews.muzi.com/news/ll/english/10073144.shtml", "NewsSource": "Muzi", "NewsSourceUrl": "http://dailynews.muzi.com/", "Language": "en", "PublishDate": "1214519821", "ModificationDate": "1214519822"}]

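For readers wondering what "item title + summary is forwarded" might look like, here is a rough sketch of that flattening step only. It is a guess at the shape of the test-drive app's logic, not its actual code; `items_to_text` is a hypothetical helper, and the request to OpenCalais itself is left out because this thread does not show it.

```python
def items_to_text(items):
    """Join each item's Title and Summary into one paragraph per item."""
    paragraphs = []
    for item in items:
        title = item.get("Title", "").strip()
        summary = item.get("Summary", "").strip()
        # Skip whichever part is missing so no stray separators appear.
        paragraphs.append(". ".join(part for part in (title, summary) if part))
    # Blank line between items keeps the news entries apart in the submitted text.
    return "\n\n".join(paragraphs)


# One item from the input set above, reduced to the two fields that are used.
sample = [
    {
        "Title": "(AFX UK Focus) 2008-06-27 17:43 Benelux shares close lower "
                 "on high oil prices; Fortis bucks trend UPDATE",
        "Summary": "(updating with full report)",
    }
]
text = items_to_text(sample)
```

Because every entry ends up in one submitted document, the article-level normalization mentioned earlier in this thread would presumably apply across all ten entries at once.
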
focusfriend -

I tried the "fortis" example on prima.appspot.com but I got just one version of Amsterdam. (I guess someone changed the app to be case-insensitive?)
To clarify, OpenCalais will generate the same URI for mentions like Amsterdam vs. AMSTERDAM (although in the RDF, each InstanceInfo element might show different capitalization).

If you look at my response to bobjones, you'll also learn about our plans to release geography normalization/resolution functionality that will "know" to resolve Amsterdam, The Netherlands and Amsterdam, NY as two different entities.

Hope this helps.
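
In client code, that suggests keying on the entity URI rather than on the surface text. A minimal sketch, assuming the RDF has already been parsed into (URI, mention text) pairs; the `urn:example:` URIs and the name `merge_mentions` are placeholders invented for this illustration, not real OpenCalais values:

```python
from collections import defaultdict


def merge_mentions(mentions):
    """Collapse (uri, surface_text) pairs into one display name per URI."""
    variants_by_uri = defaultdict(set)
    for uri, text in mentions:
        variants_by_uri[uri].add(text)

    merged = {}
    for uri, variants in variants_by_uri.items():
        # Prefer a variant that is not all upper-case, then sort alphabetically.
        merged[uri] = sorted(variants, key=lambda v: (v.isupper(), v))[0]
    return merged


# Placeholder URIs; real ones would come from the OpenCalais response.
mentions = [
    ("urn:example:city-amsterdam", "AMSTERDAM"),
    ("urn:example:city-amsterdam", "Amsterdam"),
    ("urn:example:city-london", "LONDON"),
]
cities = merge_mentions(mentions)
```

Here the two Amsterdam mentions collapse into a single entry displayed as "Amsterdam", while London keeps its only available spelling.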

Hi there,

I have a rather short question: do you use a dictionary for company name canonicalization, or is this achieved fully automatically?

Thanks and best Regards

Markus

Markus:

We'll be releasing company normalization in the next few weeks and will share a little bit about what goes into making it happen at that time. Dictionaries are only one part of the solution.