Pandas (6): Merging, Concatenating, Deduplicating, and Replacing Across DataFrames
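Pandas provides full-featured, high-performance in-memory join operations that closely resemble those of relational databases such as SQL.
pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False)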
1. merge: similar to Excel's VLOOKUP
1.1 The on parameter: the join key
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                    'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3']})
df2 = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']})
print(df1)
print(df2)
# left: the first DataFrame
# right: the second DataFrame
# on: the join key; rows whose key values match are merged
df = pd.merge(df1, df2, on='key')
print(df)
df3 = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                    'key2': ['K0', 'K1', 'K0', 'K1'],
                    'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3']})
df4 = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                    'key2': ['K0', 'K0', 'K0', 'K0'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']})
# Multiple join keys: rows are merged only when both key1 and key2 match
print(pd.merge(df3, df4, on=['key1', 'key2']))
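A multi-key merge is easier to follow with the result spelled out. The sketch below reuses the key columns from df3 and df4 under the illustrative names `left` and `right` (those names, and dropping the B and D columns, are simplifications made here). It shows that a row whose key pair appears twice on the right is repeated in the result.

```python
import pandas as pd

# Same key columns as df3/df4 above; B and D are omitted to keep the output short.
left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                     'A': ['A0', 'A1', 'A2', 'A3']})
right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'C': ['C0', 'C1', 'C2', 'C3']})

# Default how='inner': only (key1, key2) pairs present in both frames survive.
merged = pd.merge(left, right, on=['key1', 'key2'])
print(merged)

# (K0, K0) matches once; (K1, K0) matches two right-hand rows,
# so the left row holding A2 appears twice in the result.
print(len(merged))
```

The pair (K2, K1) on the left and (K2, K0) on the right share key1 but not key2, so neither makes it into the result.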
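1.2 The how parameter: the join type
# inner: the default; keeps only key combinations present in both frames (intersection)
print(pd.merge(df3, df4, on=['key1', 'key2'], how='inner'))
# outer: keeps key combinations from either frame (union); missing values become NaN
print(pd.merge(df3, df4, on=['key1', 'key2'], how='outer'))
# left: keeps every key combination from df3; values missing from df4 become NaN
print(pd.merge(df3, df4, on=['key1', 'key2'], how='left'))
# right: keeps every key combination from df4; values missing from df3 become NaN
print(pd.merge(df3, df4, on=['key1', 'key2'], how='right'))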
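To compare the four join types side by side, the following sketch counts the rows each one produces and uses the `indicator` parameter from the signature above to label where each row of an outer join came from. The frames `left` and `right` are illustrative stand-ins with the same keys as df3 and df4.

```python
import pandas as pd

left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                     'A': ['A0', 'A1', 'A2', 'A3']})
right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'C': ['C0', 'C1', 'C2', 'C3']})

# Number of result rows for each join type.
row_counts = {
    how: len(pd.merge(left, right, on=['key1', 'key2'], how=how))
    for how in ['inner', 'outer', 'left', 'right']
}
print(row_counts)

# indicator=True adds a '_merge' column with the values
# 'both', 'left_only', or 'right_only' for every row.
outer = pd.merge(left, right, on=['key1', 'key2'], how='outer', indicator=True)
print(outer)
```

With this data the counts are 3 for inner, 6 for outer, 5 for left, and 4 for right. The left join has more rows than `left` itself because one left row matches two right rows.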
1.3 Parameters left_on, right_on, left_index, right_index: setting the left and right keys separately when they are not one shared column
df1 = pd.DataFrame({'lkey': list('bbacaab'),
                    'data1': range(7)})
df2 = pd.DataFrame({'rkey': list('abd'),
                    'data2': range(3)})
# df1 uses the 'lkey' column as its key, df2 uses 'rkey'
print(pd.merge(df1, df2, left_on='lkey', right_on='rkey'))
print('------')
df1 = pd.DataFrame({'key': list('abcdfeg'),
                    'data1': range(7)})
df2 = pd.DataFrame({'data2': range(100, 105)},
                   index=list('abcde'))
# df1 uses the 'key' column as its key, df2 uses its index
print(pd.merge(df1, df2, left_on='key', right_index=True))
# left_index: when True, the first DataFrame uses its index as the key; default False
# right_index: when True, the second DataFrame uses its index as the key; default False
# left_on, right_on, left_index, and right_index can therefore be combined:
# left_on + right_on, left_on + right_index, left_index + right_on, left_index + right_index
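Of the four combinations listed, left_index + right_index is not demonstrated above, so here is a minimal sketch of it. It is followed by a check of which columns a left_on + right_on merge keeps. All frame names and values are made up for illustration.

```python
import pandas as pd

# Both frames are keyed by their index.
left = pd.DataFrame({'data1': range(3)}, index=list('abc'))
right = pd.DataFrame({'data2': range(100, 103)}, index=list('bcd'))

# Inner join on the two indexes: only labels 'b' and 'c' appear in both.
by_index = pd.merge(left, right, left_index=True, right_index=True)
print(by_index)

# With left_on + right_on the key columns have different names,
# so both of them are kept in the result.
lk = pd.DataFrame({'lkey': list('ab'), 'data1': [0, 1]})
rk = pd.DataFrame({'rkey': list('bc'), 'data2': [10, 11]})
by_cols = pd.merge(lk, rk, left_on='lkey', right_on='rkey')
print(by_cols)
```

If the duplicated key column is unwanted, it can be removed afterwards with `by_cols.drop(columns='rkey')`.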
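2. concat: concatenation
s1 = pd.Series([1, 2, 3])
s2 = pd.Series([2, 3, 4])
# The default is axis=0: rows are appended to rows
print(pd.concat([s1, s2]))
print('-----')
s3 = pd.Series([1, 2, 3], index=['a', 'c', 'h'])
s4 = pd.Series([2, 3, 4], index=['b', 'e', 'd'])
print(pd.concat([s3, s4]).sort_index())
# axis=1: columns are placed side by side, producing a DataFrame
print(pd.concat([s3, s4], axis=1))
print('-----')
# Reset the index to the default 0..n-1
s_new = pd.concat([s3, s4], axis=1)
s_new.reset_index(inplace=True, drop=False)  # drop: whether to discard the old index instead of keeping it as a column
print(s_new)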
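As a complement to the index reset shown above, the sketch below uses two options this post has not covered: `ignore_index=True`, which renumbers rows during a row-wise concat, and `reset_index()` without `inplace`, which returns a new DataFrame. Variable names other than s3 and s4 are illustrative.

```python
import pandas as pd

s3 = pd.Series([1, 2, 3], index=['a', 'c', 'h'])
s4 = pd.Series([2, 3, 4], index=['b', 'e', 'd'])

# axis=0 with ignore_index=True: the original labels are discarded
# and the result is renumbered from 0.
stacked = pd.concat([s3, s4], ignore_index=True)
print(stacked)

# axis=1: the indexes are aligned; labels missing from one Series become NaN.
side_by_side = pd.concat([s3, s4], axis=1)
print(side_by_side)

# reset_index() without inplace returns a new frame; the old index
# is kept as a column named 'index' because drop defaults to False.
reset = side_by_side.reset_index()
print(reset)
```

Because s3 and s4 share no labels, every row of `side_by_side` has exactly one NaN.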
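3. duplicated: removing duplicates
Method 1
s = pd.Series([1, 1, 1, 1, 2, 2, 2, 3, 4, 5, 5, 5, 5])
print(s.duplicated())  # True for every value that repeats an earlier one
print(s[~s.duplicated()])  # boolean indexing keeps the non-duplicated values
Method 2
# drop_duplicates removes duplicate values
# inplace parameter: whether to modify the original object; default False
s_re = s.drop_duplicates()
Method 3
sq = s.unique()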
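The three methods give the same values but not the same kind of object, which the following sketch makes explicit. It also shows the `keep` parameter of `drop_duplicates`, which the text above does not mention. Variable names are illustrative.

```python
import pandas as pd

s = pd.Series([1, 1, 1, 1, 2, 2, 2, 3, 4, 5, 5, 5, 5])

via_mask = s[~s.duplicated()]   # Series, original index labels kept
via_drop = s.drop_duplicates()  # Series, same result as the mask
via_unique = s.unique()         # NumPy array, index is lost

print(via_mask.index.tolist())
print(type(via_unique).__name__)

# keep='last' retains the last occurrence of each value instead of the first.
keep_last = s.drop_duplicates(keep='last')
print(keep_last.index.tolist())
```

Methods 1 and 2 are the better fit when the positions of the surviving rows matter; method 3 is enough when only the distinct values are needed.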
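4. replace: replacing values
s = pd.Series(list('ascaazsd'))
print(s.replace('a', np.nan))
print(s.replace(['a', 's'], np.nan))
print(s.replace({'a': 'hello world!', 's': 123}))
# One call can replace a single value or several values
# A list or a dict can be passed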
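The three call styles can be checked with a short sketch. One assumption made here that is not in the original: the Series is created with `dtype=object` so that the mixed string and integer result behaves the same across pandas versions.

```python
import numpy as np
import pandas as pd

# dtype=object is an assumption added for this sketch (see the note above).
s = pd.Series(list('ascaazsd'), dtype=object)

single = s.replace('a', np.nan)          # one value -> NaN
several = s.replace(['a', 's'], np.nan)  # each value in the list -> NaN
mapped = s.replace({'a': 'hello world!', 's': 123})  # per-value mapping

print(single.isna().sum())
print(several.isna().sum())
print(mapped.tolist())

# replace returns a new Series; the original is left unchanged.
print(s.tolist())
```

Matching is on whole values, not substrings, so 'hello world!' is not touched by the 's' rule even though it contains no standalone 's' entry to begin with.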