I have two different data.frames, "String" and "Keywords", each containing a single column as shown below. "String" has 50,000 rows and "Keywords" has 10,000 rows.
String
#I love New York
#Live in Los Angeles
#He stays in Yorkshire
#Condo in Lowell
# ...
Keywords
#Ohio
#Montreal
#Los Vego
#York
#New York
#Lowell
#...
The result should be stored in a data frame with the columns "String" and "Result", as shown below:
Result
# String                  Result
# I love New York         New York
# Live in Los Angeles     NA
# He stays in Yorkshire   York
# Condo in Lowell         Lowell
The string match should be exact, but it may be case-insensitive.
Please refer to the following approach:
I don't think this is the most elegant solution, but it does work:
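# Sample data from the question
stringFrame <- data.frame(String = c("I love New York", "Live in Los Angeles",
                                     "He stays in Yorkshire", "Condo in Lowell"),
                          stringsAsFactors = FALSE)
wordFrame <- data.frame(Keywords = c("Ohio", "Montreal", "Los Vego", "York", "New York", "Lowell"),
                        stringsAsFactors = FALSE)

result <- stringFrame
result$Result <- NA_character_  # rows with no matching keyword stay NA

for (i in seq_len(nrow(result))) {
  string <- result[i, "String"]
  longest <- ""  # longest keyword matched so far for this row
  for (word in wordFrame$Keywords) {
    # Case-insensitive match; note that grepl() treats `word` as a regular expression
    if (grepl(word, string, ignore.case = TRUE) && nchar(word) > nchar(longest)) {
      result[i, "Result"] <- word
      longest <- word
    }
  }
}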
I saw in the title that you are looking for the longest word, so I updated my answer. Now you will always get
String                  Result
I love New York         New York
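At the sizes mentioned in the question (50,000 strings and 10,000 keywords), the nested loop above can make up to 500 million separate grepl() calls. A possible alternative, offered only as a sketch and not as part of the original answer, is to loop over the keywords alone and let grepl() test all strings in a single vectorized call. The helper name longest_keyword is hypothetical. The sketch also assumes that keywords should be matched as literal text rather than as regular expressions, so it lowercases both sides and uses fixed = TRUE.

```r
# Hypothetical helper (not from the original answer): for each string, return
# the longest keyword it contains, ignoring case, or NA if none matches.
longest_keyword <- function(strings, keywords) {
  # Drop missing or empty keywords, which cannot produce a meaningful match
  keywords <- keywords[!is.na(keywords) & nzchar(keywords)]
  # Visit keywords from longest to shortest; ties keep their original order
  keywords <- keywords[order(-nchar(keywords))]
  lower_strings <- tolower(strings)
  out <- rep(NA_character_, length(strings))
  for (word in keywords) {
    # fixed = TRUE matches `word` literally; one call tests every string at once
    hit <- grepl(tolower(word), lower_strings, fixed = TRUE)
    # Only fill rows that have no (longer) keyword assigned yet
    out[hit & is.na(out)] <- word
  }
  out
}

# Sample data from the question
stringFrame <- data.frame(
  String = c("I love New York", "Live in Los Angeles",
             "He stays in Yorkshire", "Condo in Lowell"),
  stringsAsFactors = FALSE
)
wordFrame <- data.frame(
  Keywords = c("Ohio", "Montreal", "Los Vego", "York", "New York", "Lowell"),
  stringsAsFactors = FALSE
)

result <- stringFrame
result$Result <- longest_keyword(stringFrame$String, wordFrame$Keywords)
print(result)
```

On the sample data this should give the same Result column as the loop version (New York, NA, York, Lowell). Because keywords are visited from longest to shortest and only unfilled rows are updated, the first keyword of maximal length wins on ties, which mirrors the strict `>` comparison in the loop.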