Natural Language Data Annotation Method (Script)
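This dataset is mainly used to evaluate the relatedness between natural-language words and programming-language APIs.
Each pair consists of one word and one API. A pair is labeled 1 if the two are judged related and 0 if they are judged unrelated.
Judgment criterion: decide based on the meaning of the word and the functionality the API provides. If the API's functionality involves the meaning of the word, the word and the API can be considered related.
For example, for the noun "bean", the two are considered related if the API operates on a bean or exposes a bean's properties. For the verb "exchange", the two are considered related if the API's functionality includes actions such as receiving and sending data.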
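The labeling scheme above can be pictured as a small record type. The following is only a sketch: the class name `LabeledPair` and the API strings are hypothetical and invented for illustration, not taken from the dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledPair:
    """One annotated (word, API) pair; rel is 1 for related, 0 for unrelated."""
    word: str
    api: str
    rel: int

    def __post_init__(self):
        # The annotation scheme only allows the two labels 0 and 1.
        if self.rel not in (0, 1):
            raise ValueError(f"rel must be 0 or 1, got {self.rel!r}")


# Hypothetical pairs: these API strings are invented for illustration only.
examples = [
    LabeledPair("bean", "BeanRegistry.getBeanProperty", 1),  # operates on a bean
    LabeledPair("exchange", "Channel.sendAndReceive", 1),    # sends and receives data
    LabeledPair("exchange", "StringUtil.trim", 0),           # unrelated to exchanging data
]
```

Rejecting any label other than 0 or 1 at construction time is one way to catch typing mistakes early when pairs are labeled by hand.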
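Annotation example
For each row, take the word in the word column and check whether the corresponding API text contains a word with a similar meaning. If it does, output 1 for rel; otherwise output 0.
This example labels rows whose API text contains a word similar in meaning to "current" as 1, and all other rows as 0.
Synonyms of "current": ["current","present","existing","recent","up-to-date","contemporary","present-day","modern","in progress","up to date","dated"]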
import pandas as pd

# Check whether the API column contains a synonym of the word in the word column.
# Install pandas, place the CSV file in the same directory as test.py, then run the script.
data_map = {
    # Format: word to annotate -> list of its synonyms,
    # e.g. "load" -> get, load, read, import.
    # CHANGE ME: replace these entries with the words being annotated and their synonyms.
    "current": ["current", "present", "existing", "recent", "up-to-date", "contemporary", "present-day", "modern", "in progress", "up to date", "dated"],
    "agent": ["agent", "go-between", "manager", "negotiator", "mediator", "representative", "proxy"],
    "cache": ["board", "store", "supply", "accumulation", "reserve", "collection"],
    "mode": ["mode", "pattern", "model"],
    "message": ["message", "uri", "url", "trace", "print", "get"],
}

# CHANGE ME: name of the file to annotate
src_name = "18.csv"
# Name of the file generated after annotation
target_name = "18answer.csv"
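def find_rel(row):
    """Return 1 if the row's API text contains the word or one of its synonyms, else 0."""
    word = row["word"]
    # Compare case-insensitively by upper-casing both sides.
    api = row["API"].upper()
    # Raises KeyError if the word has no entry in data_map.
    for word_alike in data_map[word]:
        if word_alike.upper() in api:
            return 1
    return 0


df = pd.read_csv(src_name)
# Label every row, then write the result to the target file.
df["rel"] = df.apply(find_rel, axis=1)
df.to_csv(target_name, index=False, columns=["word", "API", "rel"])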
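The places in the Python script that need to be changed for each task (the synonym map data_map and the source file name src_name) are flagged in its comments.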
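One caveat of plain substring matching is that a short synonym can match inside a longer, unrelated word: "dated" is a substring of "updated", and "get" is a substring of "target". The sketch below is a possible alternative, not the method used by the script above: it splits the API text into tokens at camelCase boundaries and non-alphanumeric characters, then matches whole tokens only. The helper names `split_identifier` and `is_related` are hypothetical, and the sketch assumes the API column holds identifier-like text.

```python
import re


def split_identifier(text: str) -> list[str]:
    """Split text into lowercase tokens at camelCase boundaries and non-alphanumerics."""
    # Insert a space between a lowercase letter or digit and an uppercase letter
    # (getUser -> get User), and before the last capital of an acronym that is
    # followed by a word (URLString -> URL String).
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])", " ", text)
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t]


def is_related(api: str, synonyms: list[str]) -> int:
    """Return 1 if any synonym occurs in api as a run of whole tokens, else 0."""
    api_tokens = split_identifier(api)
    for synonym in synonyms:
        # Multi-word synonyms such as "up-to-date" become several tokens.
        syn_tokens = split_identifier(synonym)
        n = len(syn_tokens)
        if n == 0:
            continue
        # Look for the synonym's tokens as a contiguous run in the API tokens.
        for i in range(len(api_tokens) - n + 1):
            if api_tokens[i:i + n] == syn_tokens:
                return 1
    return 0
```

For instance, `is_related("lastUpdatedTime", ["dated"])` yields 0 here, whereas the substring test would yield 1. The trade-off is that whole-token matching misses inflected forms such as "messages" for "message", so which rule fits better depends on how the API text in the dataset is written.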