
XML: Applying a function to an xmlNodeList in R (not the whole XML file)

December 25, 2025 · yyy_WW

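I'm trying to parse information out of XML files using R. Each file can contain multiple records, and I ultimately want a list of objects representing those records.

Taking this file as an example (the PubMed efetch response retrieved in the code below), I set out to apply a function to the nodes under each PubmedArticle. When I try to do this with xpathApply from the XML library, each record contains information from every published article in the file, rather than the function being applied only to the nodes under a given PubmedArticle. A minimal example to illustrate: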

library(XML) 
library(RCurl) 
 
raw_record <- getURI("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?&db=pubmed&id=20203609,11959827,19409887&rettype=xml") 
parsed <- xmlTreeParse(raw_record, useInternalNodes=TRUE) 
 
get_title <- function(node) xpathApply(node, "//ArticleTitle", xmlValue) 
xpathApply(parsed, "/PubmedArticleSet/PubmedArticle", get_title) 
#[[1]] 
#[[1]][[1]] 
#[1] "Changes in Hox genes' structure and function during the evolution of the squamate body plan." 
# 
#[[1]][[2]] 
#[1] "Cdx1 and Cdx2 have overlapping functions in anteroposterior patterning and posterior axis elongation."
#
#[[1]][[3]]
#[1] "Axial patterning in snakes and caecilians: evidence for an alternative interpretation of the Hox code."
#
#
#[[2]]
#[[2]][[1]]
#[1] "Changes in Hox genes' structure and function during the evolution of the squamate body plan."
# 
#[[2]][[2]] 
#[1] "Cdx1 and Cdx2 have overlapping functions in anteroposterior patterning and posterior axis elongation." 
#[SNIP] 


What is the right way to extract information only from each of the nodes created by xpathApply or getNodeSet?

Answer:

You just need to use a relative path in your get_title function. Try:

# The leading "." anchors the XPath at the node passed in, so only its descendants are searched
get_title <- function(node) xpathApply(node, ".//ArticleTitle", xmlValue)
titles <- xpathApply(parsed, "/PubmedArticleSet/PubmedArticle", get_title)
unlist(titles)


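The .// means the search starts at the current node and matches anywhere beneath it. By contrast, //ArticleTitle in the question's version is evaluated from the document root no matter which node is passed in, which is why every record came back with every title. The relative version gives you: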

[1] "Changes in Hox genes' structure and function during the evolution of the squamate body plan."           
[2] "Cdx1 and Cdx2 have overlapping functions in anteroposterior patterning and posterior axis elongation."  
[3] "Axial patterning in snakes and caecilians: evidence for an alternative interpretation of the Hox code."
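
Since the stated goal was a list of objects, one per record, the same relative-path idea can be sketched without any network access. This is only an illustration under stated assumptions: the two-record XML string, the PMID values and titles, and the names xml_text, absolute_titles, relative_titles and records are all made up here and are not taken from the real PubMed response.

```r
library(XML)

# Hypothetical two-record document standing in for the PubMed response.
xml_text <- '<PubmedArticleSet>
  <PubmedArticle>
    <PMID>1</PMID>
    <ArticleTitle>First title</ArticleTitle>
  </PubmedArticle>
  <PubmedArticle>
    <PMID>2</PMID>
    <ArticleTitle>Second title</ArticleTitle>
  </PubmedArticle>
</PubmedArticleSet>'

doc <- xmlParse(xml_text, asText = TRUE)
articles <- getNodeSet(doc, "/PubmedArticleSet/PubmedArticle")

# Absolute path: evaluated from the document root, whatever node is passed in.
absolute_titles <- lapply(articles, function(node) {
  unlist(xpathApply(node, "//ArticleTitle", xmlValue))
})

# Relative path: the leading "." restricts the search to the current node.
relative_titles <- lapply(articles, function(node) {
  unlist(xpathApply(node, ".//ArticleTitle", xmlValue))
})

# One small list per record, built with relative paths only.
records <- lapply(articles, function(node) {
  list(
    pmid  = xpathSApply(node, "./PMID", xmlValue),
    title = xpathSApply(node, ".//ArticleTitle", xmlValue)
  )
})
```

Here relative_titles should hold one title per article, while absolute_titles repeats both titles for each article, mirroring the problem in the question. Looping over the node set with lapply and keeping every inner XPath relative makes it straightforward to add more fields to each record later.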