If you only want to read certain types of nodes, then XPath is great. This document by DT Lang is perfect for that.
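As a rough illustration of the XPath route, here is a small sketch using the XML package's xmlParse() and xpathSApply() on an inline document. The catalog/book/author/genre/price layout is an assumption standing in for books.xml, and the values are only sample data.

```r
library(XML)

# Inline stand-in for books.xml; the element names are assumed
xml.text <- paste0(
  "<catalog>",
  "<book id='bk101'><author>Gambardella, Matthew</author>",
  "<genre>Computer</genre><price>44.95</price></book>",
  "<book id='bk102'><author>Ralls, Kim</author>",
  "<genre>Fantasy</genre><price>5.95</price></book>",
  "</catalog>"
)
doc <- xmlParse(xml.text, asText = TRUE)

# Apply xmlValue to every <author> element, wherever it sits in the tree
authors <- xpathSApply(doc, "//author", xmlValue)
print(authors)

# A predicate narrows the match: prices of books whose genre is Computer
computer.prices <- as.numeric(
  xpathSApply(doc, "//book[genre='Computer']/price", xmlValue)
)
print(computer.prices)
```

No recursion is needed here: the XPath expression does the searching, and xpathSApply() just applies a function to each match.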
However, if you want to read the whole document, then you have to recursively visit every node. Here's the way I ended up doing it. The general-purpose function visitNode could be useful if you are just starting out reading XML in R.
library(XML)

# Recursive function to visit the XML tree (depth first)
visitNode <- function(node) {
  if (is.null(node)) {
    # Nothing to visit. Turn back
    return(invisible(NULL))
  }
  print(paste("Node: ", xmlName(node)))
  num.children <- xmlSize(node)
  if (num.children == 0) {
    # Add your code to process the leaf node here
    print(paste("  ", xmlValue(node)))
  }
  # Go one level deeper.
  # seq_len() gives an empty loop for leaf nodes; 1:num.children would
  # give c(1, 0) and end up calling node[[0]]
  for (i in seq_len(num.children)) {
    visitNode(node[[i]])  # the i-th child of node
  }
}

xmlfile <- "books.xml"
# Read the XML tree into memory
xtree <- xmlInternalTreeParse(xmlfile)
root <- xmlRoot(xtree)
visitNode(root)
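If you would rather collect the leaf values than print them, the same depth-first idea can return them instead. This is a sketch only: collectLeaves is a hypothetical helper name, and naming each value by its element path (e.g. catalog/book/author) is an arbitrary choice for illustration. It iterates over xmlChildren() rather than indexing with node[[i]], so an empty element such as <description/> is simply treated as a leaf with an empty value.

```r
library(XML)

# collectLeaves() is a hypothetical helper (not part of the XML package).
# It walks the tree depth first, like visitNode, but returns the leaf values
# as a character vector named by the element path leading to each one.
collectLeaves <- function(node, path = character(0)) {
  if (xmlSize(node) == 0) {
    # Leaf: a text node keeps its parent element's path; an empty
    # element such as <description/> adds its own name to the path
    if (!inherits(node, "XMLInternalTextNode")) {
      path <- c(path, xmlName(node))
    }
    value <- xmlValue(node)
    if (length(value) == 0) value <- ""  # treat "no content" as empty text
    return(setNames(value, paste(path, collapse = "/")))
  }
  path <- c(path, xmlName(node))
  # Iterate over the child list directly, so no numeric index is ever
  # computed for a node without children
  unlist(unname(lapply(xmlChildren(node), collectLeaves, path = path)))
}

# Inline sample with no whitespace between tags, so there are no
# whitespace-only text nodes to skip
xml.text <- paste0(
  "<catalog><book id='bk101'>",
  "<author>Gambardella, Matthew</author><price>44.95</price><description/>",
  "</book></catalog>"
)
doc <- xmlParse(xml.text, asText = TRUE)
leaves <- collectLeaves(xmlRoot(doc))
print(leaves)
```

With a real file containing indentation you may also get whitespace-only text nodes as leaves; filtering those out is left as an exercise.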
I tried testing this code with the following XML, and an error is generated when the node '' is processed. The error is 'cannot index an internal node with a negative number -1'. How can I solve this issue?
Gambardella, Matthew
Computer
44.95
2000-10-01
Ralls, Kim
Fantasy
5.95
2000-12-16
The XML isn't being published in the comments, so I'll just mention the specific element I am referring to. The error occurs when a node like the one below is visited.
<description/>
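The error most likely comes from the loop header for (i in 1 : num.children). In R, 1:0 counts down and gives c(1, 0), so for an element with no children (such as <description/>) the loop body still runs, and visitNode is eventually called with node[[0]], which appears to be what raises 'cannot index an internal node with a negative number -1'. Writing the loop as for (i in seq_len(num.children)) gives an empty loop instead. A quick check in plain R:

```r
num.children <- 0

# 1:num.children counts *down* from 1 to 0 when num.children is 0
print(1:num.children)
# seq_len() returns an empty integer vector instead
print(seq_len(num.children))

# Record which indices each loop actually visits
visited.colon <- integer(0)
for (i in 1:num.children) visited.colon <- c(visited.colon, i)
print(visited.colon)  # the body ran twice, with i = 1 and then i = 0

visited.seq <- integer(0)
for (i in seq_len(num.children)) visited.seq <- c(visited.seq, i)
print(visited.seq)    # the body never ran
```

The same pattern is worth avoiding in any R loop whose upper bound can be zero.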