Named Entity Recognition Bugs
Posted on: Wed, 10/15/2008 - 15:49
Here is a short list of bugs I have noticed in Calais's entity recognition for Persons. I know it is a hard task, so I do not blame anyone; Calais works really well. Still, it can be improved, so here is the list:
These were obtained by submitting the abstracts from the Yahoo search results for "Barack Obama" to Calais.


Comments
I also noticed something odd: when I request the text/simple output format, "relevance" is not always an attribute of Persons in the XML file. I no longer have the input, sorry...
I can provide one if you ask.
Here is an example:
Hi,
Please send us the documents or links you used so we can check both issues.
Tx,
Ofer
In each file you can find the input and the output (maybe not in that order, but I believe you will figure out which is which, and do not forget to view the source, i.e. Ctrl-U; it is much more readable :D):
First bug: Tom Baldwin is tagged as a Person but without a "relevance" attribute. This really looks like a bug, so...
http://ks36587.kimsufi.com/calaisbug/bug1.html
The second and third bugs are rather mistakes of the entity recognition tools.
"Women Voters" is not a person (though this one is debatable...)
http://ks36587.kimsufi.com/calaisbug/bug2.html
But "On Sunday" is definitely not a person.
http://ks36587.kimsufi.com/calaisbug/bug3.html
I found some others if you want them, but I guess you'll be busy enough with these :D
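To check other documents for the first issue, a small script can flag Person entries that come back without a relevance attribute. This is only a sketch under an assumption: it supposes that the text/simple output marks each person as a `<Person>` element with relevance stored as an XML attribute. The wrapper element names and the sample document below are hypothetical, so compare them against your own output before relying on it.

```python
import xml.etree.ElementTree as ET


def persons_missing_relevance(xml_text):
    """Return the names of <Person> elements that lack a "relevance" attribute.

    Assumption (hypothetical layout): each detected person is a <Person>
    element somewhere in the document, with relevance as an XML attribute.
    """
    root = ET.fromstring(xml_text)
    missing = []
    # iter() walks the whole tree, so nesting depth does not matter.
    for person in root.iter("Person"):
        if "relevance" not in person.attrib:
            missing.append((person.text or "").strip())
    return missing


# Hypothetical sample shaped like the reported bug, not real Calais output.
sample = """<OpenCalaisSimple>
  <CalaisSimpleOutputFormat>
    <Person count="3" relevance="0.8">Barack Obama</Person>
    <Person count="1">Tom Baldwin</Person>
  </CalaisSimpleOutputFormat>
</OpenCalaisSimple>"""

print(persons_missing_relevance(sample))  # ['Tom Baldwin']
```

Matching on the tag name alone keeps the check independent of how the entities are grouped in the file.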
Hi,
The Tom Baldwin issue is indeed a bug; thanks for noticing and letting us know! This will be fixed in the next version.
Thanks,
Orgad
I forgot to upload the last two files. It is done now.
The XML was deleted... I hope you'll still get it...