Here is a short list of bugs I have noticed in the entity recognition for Persons. I know it is a hard task, so I do not blame anyone; Calais works really well. Still, it can be improved, so here is the list:

 These were obtained by submitting the abstracts from the Yahoo search results for "Barack Obama" to Calais.


  • Comments

    I also noticed a weird thing: if I request text/simple output, "relevance" is not always an attribute
    of Persons in the XML file. I do not have the input at hand, sorry...
    I can provide it if you ask.

    Here is an example:

    Claude Zervas
    Tom Baldwin
    California
    General Hospital

    Hi,

    Please send us the documents or links you used so we can check both issues.
    Tx,
    Ofer

    In the files you can find the input and the output (maybe not in that order, but I believe you will figure out which is which; do not forget to view the page source, i.e. Ctrl-U, which is much more readable :D):

    First bug: Tom Baldwin is tagged as a person but without a "relevance" attribute. This really sounds like a bug, so...
    http://ks36587.kimsufi.com/calaisbug/bug1.html
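
    For anyone who wants to look for the same symptom in their own results, here is a minimal sketch of a check for Person entries that lack a "relevance" attribute. The XML fragment is invented for illustration: the <Entities> wrapper, the "count" attribute and the relevance value are assumptions, not the actual Calais text/simple output, so adjust the names to whatever your response really contains.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the wrapper element, the "count" attribute and the
# relevance value are made up for illustration, not real Calais output.
SAMPLE = """<Entities>
  <Person count="1" relevance="0.5">Claude Zervas</Person>
  <Person count="1">Tom Baldwin</Person>
</Entities>"""


def persons_missing_relevance(xml_text):
    """Return the text of every <Person> element that has no "relevance" attribute."""
    root = ET.fromstring(xml_text)
    return [
        (el.text or "").strip()
        for el in root.iter("Person")
        if "relevance" not in el.attrib
    ]


if __name__ == "__main__":
    print(persons_missing_relevance(SAMPLE))
```

    On the invented sample above this reports only Tom Baldwin; an empty list means every Person carries the attribute.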

    The second and third bugs are rather mistakes of the entity recognition tool.

    "Women Voters" is not a person (though that is debatable...)
    http://ks36587.kimsufi.com/calaisbug/bug2.html

    But "On Sunday" is definitely not a person
    http://ks36587.kimsufi.com/calaisbug/bug3.html

    I found some others if you want, but I guess you'll be busy enough with these :D

    Hi,

    The Tom Baldwin issue is indeed a bug; thanks for noticing and letting us know! This will be fixed in the next version.

    Thanks,
    Orgad

    I forgot to upload the last two files. It is done now.

    The XML is deleted... hope you'll get it...