Monday, March 18, 2013

R - Simple Recursive XML Parsing

This is intended for those who are starting out in R and interested in parsing an XML document recursively. It uses DT Lang's XML package.

If you only want to read certain types of nodes, then XPath is great. This document by DT Lang is perfect for that.
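As a rough illustration of the XPath route, here is a minimal sketch using `xmlParse` and `xpathSApply` from the XML package. The sample document, its tag names (`catalog`, `book`, `author`, `price`) and the variable names are assumptions made up for this example, not taken from the post's own files.

```r
library(XML)

# Hypothetical sample document, used only for illustration.
sampleXml <- '<catalog>
  <book><author>Gambardella, Matthew</author><price>44.95</price></book>
  <book><author>Ralls, Kim</author><price>5.95</price></book>
</catalog>'

# Parse from a string instead of a file.
doc <- xmlParse(sampleXml, asText = TRUE)

# Pull out only the nodes of interest; xmlValue extracts each node's text.
authors <- xpathSApply(doc, "//book/author", xmlValue)
prices <- as.numeric(xpathSApply(doc, "//book/price", xmlValue))
```

This works well when you know in advance which elements you need; it does not help when the structure is unknown or you need to touch every node.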

However, if you want to read the whole document, then you have to recursively visit every node. Here's the way I ended up doing it. The generic function visitNode could be useful if you are just starting out reading XML in R.
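To give an idea of what a recursive visit can look like, here is a minimal sketch. It is not the code linked from this post: it only reuses the name `visitNode`, and the sample document, its tag names and the `depth` parameter are assumptions for illustration. It loops over `xmlChildren` directly rather than indexing children by position, so an element with no children, such as an empty `<description/>`, is just a loop that never runs.

```r
library(XML)

# Hypothetical sample document; note the empty <description/> element.
sampleXml <- '<catalog>
  <book>
    <author>Gambardella, Matthew</author>
    <price>44.95</price>
    <description/>
  </book>
</catalog>'

# Visit a node and all of its descendants, returning one indented line per element.
visitNode <- function(node, depth = 0) {
  # Skip text, comment and other non-element nodes.
  if (!inherits(node, "XMLInternalElementNode")) {
    return(character(0))
  }

  # Recurse into the children; for a childless element this loop does nothing.
  childLines <- character(0)
  for (child in xmlChildren(node)) {
    childLines <- c(childLines, visitNode(child, depth + 1))
  }

  label <- paste0(strrep("  ", depth), xmlName(node))

  # For elements without element children, append their text if there is any.
  if (length(childLines) == 0) {
    value <- trimws(xmlValue(node))
    if (length(value) == 1 && !is.na(value) && nzchar(value)) {
      label <- paste0(label, ": ", value)
    }
  }

  c(label, childLines)
}

doc <- xmlParse(sampleXml, asText = TRUE)
lines <- visitNode(xmlRoot(doc))
cat(lines, sep = "\n")
```

Returning the lines instead of printing them inside the function keeps the traversal separate from the output, so the same skeleton can be adapted to build a list or data frame.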

The full code, along with a sample XML file to test it, is here.



2 comments:

  1. I tried testing this code with the following XML and upon processing the node '' an error is generated. The error is 'cannot index an internal node with a negative number -1'. How to solve this issue?



    Gambardella, Matthew
    Computer
    44.95
    2000-10-01



    Ralls, Kim
    Fantasy
    5.95
    2000-12-16


    Replies
    1. The XML isn't being published in the comments, so I'll just mention the specific element that I am referring to. The error occurs when a node like the one shown below is visited.

      < description / >
