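Background:
I was reading data from a file to train a small model and found that the data contained anomalous values, as follows:
I loaded the data with pandas and then applied min-max normalization to the numeric features, which raised an error: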
import pandas as pd


def minmax_norm(df):
    # Scale each column to [0, 1]; every value must be numeric.
    return (df - df.min()) / (df.max() - df.min())


if __name__ == '__main__':
    train_data_path = 'train_1205_shanghai.txt'
    test_data_path = 'test_1206_shanghai.txt'
    # load_data_to_df(path)
    col_name = ['a', 'b', 'c']

    # The files are tab-separated and have no header row.
    train_data = pd.read_table(train_data_path, header=None)
    train_data.columns = col_name
    test_data = pd.read_table(test_data_path, header=None)
    test_data.columns = col_name
    # print(data.head(3)) 'avg_rider_done_ord_cnt'

    # Numeric feature columns to normalize.
    number_feat = ['a', 'b']
    for feat in number_feat:
        train_data[[feat]] = minmax_norm(train_data[[feat]])
        test_data[[feat]] = minmax_norm(test_data[[feat]])
The error:
Traceback (most recent call last):
  File "/Users/alsc/.conda/envs/algoTest/lib/python3.6/site-packages/pandas/core/ops/array_ops.py", line 143, in na_arithmetic_op
    result = expressions.evaluate(op, left, right)
  File "/Users/alsc/.conda/envs/algoTest/lib/python3.6/site-packages/pandas/core/computation/expressions.py", line 233, in evaluate
    return _evaluate(op, op_str, a, b)  # type: ignore
  File "/Users/alsc/.conda/envs/algoTest/lib/python3.6/site-packages/pandas/core/computation/expressions.py", line 68, in _evaluate_standard
    return op(a, b)
TypeError: unsupported operand type(s) for -: 'str' and 'float'
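Investigation:
The error means there is a type mismatch: the subtraction operator '-' cannot be applied to a str and a float.
Key takeaway: I finally understood what "for -:" means. The type mismatch occurred at a subtraction, so to fix the problem, look at the minus signs in the code and check whether any of them operates on values of different types. Here the subtractions are in minmax_norm, so at least one value in a column passed to it is a string rather than a number.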
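One way to confirm this kind of diagnosis is to look for values that cannot be parsed as numbers. The sketch below is only an illustration: the sample DataFrame and the helper name find_non_numeric are made up for this example and are not from the original script. It relies on pd.to_numeric with errors='coerce', which turns unparseable values into NaN.

```python
import pandas as pd

# Hypothetical sample: column 'a' holds one non-numeric entry, so the
# whole column is stored as strings (object dtype).
df = pd.DataFrame({
    'a': ['1.0', '2.5', 'N/A', '4.0'],
    'b': [1.0, 2.0, 3.0, 4.0],
})


def find_non_numeric(df, cols):
    """Return, per column, the entries that cannot be parsed as numbers."""
    bad = {}
    for col in cols:
        parsed = pd.to_numeric(df[col], errors='coerce')
        # Entries that were present originally but became NaN after parsing.
        mask = parsed.isna() & df[col].notna()
        if mask.any():
            bad[col] = df.loc[mask, col]
    return bad


bad_rows = find_non_numeric(df, ['a', 'b'])
print(df.dtypes)   # an object dtype on a "numeric" column is a warning sign
print(bad_rows)
```

Checking df.dtypes first is usually the quickest hint: a feature that should be numeric but shows up as object almost always contains stray strings.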
Solution:
Remove the anomalous values from the data.
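The post does not say how the anomalous values were removed. A minimal sketch of one possible approach, assuming the anomalies are non-numeric strings, is to coerce the feature columns to numbers and drop the rows that fail to parse. The helper drop_non_numeric_rows and the sample data are hypothetical; minmax_norm is the function from the script above.

```python
import pandas as pd


def drop_non_numeric_rows(df, cols):
    """Coerce cols to numeric and drop rows where any of them fails to parse."""
    cleaned = df.copy()
    for col in cols:
        # Unparseable values become NaN instead of raising.
        cleaned[col] = pd.to_numeric(cleaned[col], errors='coerce')
    return cleaned.dropna(subset=cols).reset_index(drop=True)


def minmax_norm(df):
    return (df - df.min()) / (df.max() - df.min())


# Hypothetical data: 'oops' stands in for an anomalous value.
raw = pd.DataFrame({
    'a': ['1.0', 'oops', '3.0', '5.0'],
    'b': [10.0, 20.0, 30.0, 50.0],
    'c': ['x', 'y', 'z', 'w'],
})

number_feat = ['a', 'b']
clean = drop_non_numeric_rows(raw, number_feat)
clean[number_feat] = minmax_norm(clean[number_feat])
print(clean)
```

Note that this also drops rows whose feature values were already missing, which may or may not be what you want.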