Amieltech LLC

This code simply reads through XML documents and identifies whether each one can be relied upon for processing.


I am noticing that a *very large* percentage of government records stored as XML are not valid and are suddenly unusable for searching across multiple documents to summarize, aggregate, or otherwise extract facts from the record.

Below is the Expat code that I used to make this determination. It includes a source module, a header file, and a Makefile. Build with GNU make. The code as provided compiles with both the clang and GNU C compilers. It requires that the Expat development libraries (click) be installed.

The code is here (click).
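
The linked module is C. As a rough sketch of the same idea, and not the author's code, the following uses Python's standard-library binding to the Expat parser (`xml.parsers.expat`). Expat is a non-validating parser, so a check like this only establishes that a document is well-formed, not that it conforms to a DTD or schema. The function name `check_xml` is made up for illustration.

```python
import xml.parsers.expat


def check_xml(data: bytes):
    """Return (True, None) if data parses cleanly, else (False, error message)."""
    # A parser object handles exactly one document, so create one per call.
    parser = xml.parsers.expat.ParserCreate()
    try:
        # The second argument tells Expat this is the final chunk of input.
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as err:
        # The error text includes the line and column of the first problem.
        return False, str(err)
    return True, None
```

A truncated or mangled download typically fails with an unclosed or mismatched tag, which is the kind of damage this check is meant to catch before any further processing.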

[update Oct 23, 2020]

I'm three-quarters of the way into downloading a complete set of records. 50,000+ documents. I've run the pre-processing check and have zero bad documents. Perhaps this was a transient situation with the government servers that served up the mangled documents. Hopefully so, and it would be forgivable, especially considering that it appears to be cleaned up at this point. We'll see.

Having a bit of logic available that can validate XML is useful for preventing errors caused by bad XML documents being fed to code that expects a specific format.
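
One way such a check might be used as a pre-processing pass is sketched below: scan a download directory and list the files Expat rejects, so that only the remaining files are handed to the real processing code. This is an illustration in Python using the standard-library Expat binding, not the author's C code. The function name `find_bad_xml` and the assumption that the records sit in one flat directory as `*.xml` files are hypothetical.

```python
import pathlib
import xml.parsers.expat


def find_bad_xml(directory):
    """Return a list of (path, error message) for files Expat cannot parse."""
    bad = []
    for path in sorted(pathlib.Path(directory).glob("*.xml")):
        # Each parser object can parse only one document.
        parser = xml.parsers.expat.ParserCreate()
        try:
            with open(path, "rb") as fh:
                # ParseFile reads the file in chunks rather than all at once.
                parser.ParseFile(fh)
        except xml.parsers.expat.ExpatError as err:
            bad.append((path, str(err)))
    return bad
```

An empty result corresponds to the "zero bad documents" outcome described above. A non-empty result gives the list of files to re-download or set aside.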

So, O.K., maybe it was all just a false alarm.