Over the weekend, I implemented an importer for the Penn Treebank format. All it does is read Penn Treebank data and transform it into CREATE OBJECT statements (well, CREATE OBJECTS WITH OBJECT TYPE statements, actually). The resulting MQL can then be imported into Emdros via the mql(1) program.

The importer works fine on both the BLLIP corpus and the Penn version of the TIGER Corpus. I haven't had a chance to test it with "the real thing" (aka the Penn Treebank) yet. If anyone has access to "the real thing" and wants to test the importer on that corpus, please drop me a line.

The importer recognizes and resolves coreference links, and it splits labels such as "NP-SUBJ" into a "type" (NP) and a "function" (SUBJ). Even unparsed (but POS-tagged) sentences are imported correctly.
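
To illustrate the label splitting, here is a minimal sketch in Python. It is not the importer's actual code: the function name split_label, the returned tuple layout, and the rule that a trailing all-digit part is a coreference index are all assumptions made for the example.

```python
def split_label(label):
    """Split a node label such as "NP-SUBJ" into (type, function, index).

    Hypothetical helper for illustration only; the real importer may
    handle these cases differently.
    """
    # Labels that start with a hyphen (e.g. "-NONE-", "-LRB-") are kept whole.
    if label.startswith("-"):
        return label, None, None

    parts = label.split("-")
    phrase_type = parts[0]
    rest = parts[1:]

    # Assumption: a trailing all-digit part is a coreference index.
    index = None
    if rest and rest[-1].isdigit():
        index = int(rest.pop())

    # Whatever remains is treated as the function, possibly compound.
    function = "-".join(rest) if rest else None
    return phrase_type, function, index
```

Under these assumptions, split_label("NP-SUBJ") yields the type "NP" and the function "SUBJ", while a bare label such as "VP" yields a type with no function.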

The importer will make its debut in the next public release after 1.2.0.pre191.