How to Find Duplicated Values in a pandas Column (Python)

2021-05-24 00:31 耗子来啦 Python

This article shows how to find duplicated values in a pandas column, followed by a few NumPy equivalents. Hopefully it serves as a useful reference.

pandas

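The code is as follows. It builds a small DataFrame and calls duplicated on the Bonus column: keep='first' flags every occurrence after the first as a duplicate, and keep='last' flags every occurrence before the last.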

    import pandas as pd
    import numpy as np  # used in the NumPy section further down

    salaries = pd.DataFrame({
        'name': ['BOSS', 'Lilei', 'Lilei', 'Han', 'BOSS', 'BOSS', 'Han', 'BOSS'],
        'Year': [2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017],
        'Salary': [1, 2, 3, 4, 5, 6, 7, 8],
        'Bonus': [2, 2, 2, 2, 3, 4, 5, 6]
    })
    print(salaries)

    # keep='first': the first occurrence is False, later repeats are True
    print(salaries['Bonus'].duplicated(keep='first'))
    # Index labels of the rows flagged as duplicates
    print(salaries[salaries['Bonus'].duplicated(keep='first')].index)
    # The flagged rows themselves
    print(salaries[salaries['Bonus'].duplicated(keep='first')])

    # keep='last': the last occurrence is False, earlier repeats are True
    print(salaries['Bonus'].duplicated(keep='last'))
    print(salaries[salaries['Bonus'].duplicated(keep='last')].index)
    print(salaries[salaries['Bonus'].duplicated(keep='last')])

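The output is as follows (recorded on an older pandas version; recent versions print the columns in insertion order, name, Year, Salary, Bonus, and show Index rather than Int64Index):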

       Bonus  Salary  Year   name
    0      2       1  2016   BOSS
    1      2       2  2016  Lilei
    2      2       3  2016  Lilei
    3      2       4  2016    Han
    4      3       5  2017   BOSS
    5      4       6  2017   BOSS
    6      5       7  2017    Han
    7      6       8  2017   BOSS
    0    False
    1     True
    2     True
    3     True
    4    False
    5    False
    6    False
    7    False
    Name: Bonus, dtype: bool
    Int64Index([1, 2, 3], dtype='int64')
       Bonus  Salary  Year   name
    1      2       2  2016  Lilei
    2      2       3  2016  Lilei
    3      2       4  2016    Han
    0     True
    1     True
    2     True
    3    False
    4    False
    5    False
    6    False
    7    False
    Name: Bonus, dtype: bool
    Int64Index([0, 1, 2], dtype='int64')
       Bonus  Salary  Year   name
    0      2       1  2016   BOSS
    1      2       2  2016  Lilei
    2      2       3  2016  Lilei
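The calls above only flag the repeated rows. To count how often each value repeats, one option is value_counts. The sketch below is an illustration rather than part of the original article; the names counts, repeated, all_dupes and n_extra are introduced here for the example, and the DataFrame is a trimmed copy of the one above. It also uses keep=False, which flags every member of a repeated group.

```python
import pandas as pd

salaries = pd.DataFrame({
    'name': ['BOSS', 'Lilei', 'Lilei', 'Han', 'BOSS', 'BOSS', 'Han', 'BOSS'],
    'Bonus': [2, 2, 2, 2, 3, 4, 5, 6],
})

# How many times each Bonus value occurs.
counts = salaries['Bonus'].value_counts()

# Keep only the values that occur more than once.
repeated = counts[counts > 1]
print(repeated)

# keep=False flags every row whose Bonus value is repeated,
# including the first and the last occurrence.
all_dupes = salaries[salaries['Bonus'].duplicated(keep=False)]
print(all_dupes.index.tolist())  # [0, 1, 2, 3]

# Number of rows that repeat an earlier value (keep='first' is the default).
n_extra = int(salaries['Bonus'].duplicated().sum())
print(n_extra)  # 3
```

Here repeated holds a single entry, the value 2 with a count of 4, which matches the three rows flagged by keep='first' plus the first occurrence.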

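Without pandas

In NumPy, the equivalent operations are roughly as follows.

Suppose we have the array

a = np.array([1, 2, 1, 3, 3, 3, 0])

and want to find the values that occur more than once, [1 3]. Methods 1, 2 and 5 below return one entry for every repeat after the first occurrence ([1 3 3]), while Method 4 returns each duplicated value exactly once ([1 3]).

The options are: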

Method 1

    # Mark the first occurrence of each value, then select everything else.
    m = np.zeros_like(a, dtype=bool)
    m[np.unique(a, return_index=True)[1]] = True
    a[~m]  # array([1, 3, 3])

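Method 2

    # Same mask as Method 1, built in one expression.
    # np.in1d is deprecated as of NumPy 2.0; np.isin is the replacement.
    a[~np.in1d(np.arange(len(a)), np.unique(a, return_index=True)[1], assume_unique=True)]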

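Method 3

    # Caution: a is not unique, so assume_unique=True breaks setxor1d's documented
    # precondition; the result is not guaranteed and may differ between NumPy versions.
    np.setxor1d(a, np.unique(a), assume_unique=True)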

Method 4

    # Count how often each unique value occurs and keep those seen more than once.
    u, i = np.unique(a, return_inverse=True)
    u[np.bincount(i) > 1]  # array([1, 3])

Method 5

    # After sorting, a repeated value equals its right-hand neighbour.
    s = np.sort(a, axis=None)
    s[:-1][s[1:] == s[:-1]]  # array([1, 3, 3])
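To see the two kinds of result side by side, the sketch below computes both the distinct duplicated values (with their counts) and the extra occurrences. It is an illustration, not taken from the referenced answer: it relies on the return_counts option of np.unique instead of bincount, and the names values, counts, dup_values, dup_counts, mask and extra are chosen for this example.

```python
import numpy as np

a = np.array([1, 2, 1, 3, 3, 3, 0])

# Distinct values and how many times each one occurs.
values, counts = np.unique(a, return_counts=True)

# Each duplicated value once, as in Method 4.
dup_values = values[counts > 1]
print(dup_values)  # [1 3]

# Occurrence count of each duplicated value.
dup_counts = dict(zip(dup_values.tolist(), counts[counts > 1].tolist()))
print(dup_counts)  # {1: 2, 3: 3}

# Every occurrence after the first, as in Method 1.
first_idx = np.unique(a, return_index=True)[1]
mask = np.ones(len(a), dtype=bool)
mask[first_idx] = False
extra = a[mask]
print(extra)  # [1 3 3]
```

Which form is appropriate depends on the goal: dup_values answers "which values repeat", while extra answers "which entries would be dropped if only first occurrences were kept".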

Reference: https://stackoverflow.com/questions/11528078/determining-duplicate-values-in-an-array

Original article: https://blog.csdn.net/hguo11/article/details/82556171
