I have spotted an annoying bug.

The XML output is not necessarily well-formed: if a name spotted by Calais contains &lt;, the output contains a literal < and the file breaks.


Comments

It took me quite a while but here is the file.

http://ks36587.kimsufi.com/calaisbug/bug5.html

I had the same sort of problem with double quotes.
The unnormalized form of certain companies may contain double quotes,
and that makes the parser crash (the XML is no longer well-formed).
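
For illustration only, here is a minimal Python sketch of how a name containing double quotes, < and & can be written into both an attribute and element text without breaking the XML. The `Company` element, the `name` attribute and the `build_company_element` helper are hypothetical and are not taken from the actual Calais output; the escaping functions come from the Python standard library.

```python
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET


def build_company_element(name):
    # Hypothetical element and attribute names, used only for this sketch.
    # quoteattr() escapes &, < and > and picks a quoting style that is safe
    # for the value; escape() handles the element text.
    return "<Company name=%s>%s</Company>" % (quoteattr(name), escape(name))


raw_name = 'Acme "Global" <Holdings> & Co'
xml_text = build_company_element(raw_name)

# A conforming parser reads the escaped markup and returns the original string.
element = ET.fromstring(xml_text)
print(element.get("name"))
print(element.text)
```

Both the attribute and the text come back as the original name, so no manual unescaping is needed on the consumer side.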

Hi Genz,

From a first glance at your output, it seems that you are unescaping the output XML,
which leads to these errors.

I suspect this is the case because of the "M&A" text:
We do not output it like this:

<Event count="3">M&A</Event>

We output it like this:

<Event count="3">M&amp;A</Event>

To achieve XML validity.
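
As a rough check of the difference, the following Python sketch feeds both forms to a standard XML parser. The `try_parse` helper is a made-up name for this example; the two strings are the ones shown above.

```python
import xml.etree.ElementTree as ET

escaped = '<Event count="3">M&amp;A</Event>'
unescaped = '<Event count="3">M&A</Event>'


def try_parse(xml_text):
    # Return the element text if the input is well-formed XML, else None.
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError:
        return None


print(try_parse(escaped))    # the parser decodes &amp; back to & by itself
print(try_parse(unescaped))  # a bare & is not well-formed, so parsing fails
```

The escaped form parses and the text comes back with a plain &, so the entity only needs to exist in the serialized file, not in the data you work with afterwards.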

My guess is that you used to work with our older HTTP-Post interface,
which returned the results in a SOAP style, where you had to unescape them.

Well - this is old - please make sure NOT to unescape the output XML.

Could you please confirm here that this was the case?

Thanks and HTH
Meir

Hi,
Can you send us the XML/RDF file AND the original text you sent?
We will investigate this.
Thanks,
Ofer