Thursday, April 17, 2008

WikiXMLDB: Querying Wikipedia with XQuery

Interesting times...

For all the benefits that Wikipedia promises, it is not easy to use off the shelf in applications. Although Wikipedia is available for download as XML, the individual articles inside that dump are written in MediaWiki's own wiki markup rather than structured XML. So the most interesting uses of Wikipedia in applications remain locked behind this parsing problem.

Here is where WikiXMLDB comes to the rescue. We have parsed the entire English Wikipedia into an XML representation (about 21GB in total), loaded it into the Sedna XML database, and provided a query interface to it. Now you can dissect individual articles and pull out abstracts, sections, links, infoboxes and other components. You can also combine pieces of existing documents into new XML documents and, for example, convert them to web pages with XSLT. And you can do all of this in XQuery, the standard W3C query language for XML. So you can finally start enriching your content with data from Wikipedia and unlock its power for your applications.
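
To make the idea concrete, here is a rough Python sketch of that kind of dissect-and-recombine step, run against a single in-memory article rather than the database. The element names (article, abstract, section, link, infobox, field) and the helpers dissect and build_summary are assumptions made up for illustration, not the actual WikiXMLDB schema or API. In WikiXMLDB itself you would express the same thing as an XQuery against Sedna.

```python
import xml.etree.ElementTree as ET

# Hypothetical article layout, invented for illustration only;
# the real WikiXMLDB schema may use different element names.
SAMPLE_ARTICLE = """
<article>
  <title>Example article</title>
  <abstract>A short abstract.</abstract>
  <infobox>
    <field name="type">demo</field>
  </infobox>
  <section>
    <title>History</title>
    <p>See <link target="XQuery">XQuery</link> and <link target="XSLT">XSLT</link>.</p>
  </section>
  <section>
    <title>Usage</title>
    <p>No links here.</p>
  </section>
</article>
"""


def dissect(xml_text):
    """Pull the main components out of one article (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return {
        "title": root.findtext("title"),
        "abstract": root.findtext("abstract"),
        # Section titles, in document order.
        "sections": [s.findtext("title") for s in root.iter("section")],
        # Link targets anywhere in the article, in document order.
        "links": [a.get("target") for a in root.iter("link")],
        # Infobox fields as a name -> value mapping.
        "infobox": {f.get("name"): f.text for f in root.findall("infobox/field")},
    }


def build_summary(parts):
    """Combine extracted pieces into a new, smaller XML document."""
    summary = ET.Element("summary", {"of": parts["title"]})
    ET.SubElement(summary, "abstract").text = parts["abstract"]
    for target in parts["links"]:
        ET.SubElement(summary, "see-also").text = target
    return ET.tostring(summary, encoding="unicode")


parts = dissect(SAMPLE_ARTICLE)
summary_xml = build_summary(parts)
print(parts["sections"])
print(summary_xml)
```

The resulting summary document is plain XML, so it could be handed to an XSLT stylesheet to produce a web page, as described above.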

The WikiXMLDB demo is deployed on Amazon EC2 and runs on a virtual machine with restricted resources. For better performance and unrestricted customization, you can run WikiXMLDB on your own computer.

