Not in index: Python error

I have the following code,

df = pd.read_csv(CsvFileName)

p = df.pivot_table(index=['Hour'], columns='DOW', values='Changes', aggfunc=np.mean).round(0)
p.fillna(0, inplace=True)

p[["1Sun", "2Mon", "3Tue", "4Wed", "5Thu", "6Fri", "7Sat"]] = p[["1Sun", "2Mon", "3Tue", "4Wed", "5Thu", "6Fri", "7Sat"]].astype(int)

This has always worked, but it fails when the CSV file does not cover every day of the week. For example, with the following .csv file:

DOW,Hour,Changes
4Wed,01,237
3Tue,07,2533
1Sun,01,240
3Tue,12,4407
1Sun,09,2204
1Sun,01,240
1Sun,01,241
1Sun,01,241
3Tue,11,662
4Wed,01,4
2Mon,18,4737
1Sun,15,240
2Mon,02,4
6Fri,01,1
1Sun,01,240
2Mon,19,2300
2Mon,19,2532

I’ll get the following error:

KeyError: "['5Thu' '7Sat'] not in index"

It seems to have a very easy fix, but I’m just too new to Python to know how to fix it.
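One possible fix, sketched below, is to replace the column selection with reindex(), which creates the weekday columns that the pivot table lacks instead of raising. The in-memory CSV is a hypothetical stand-in built from a few rows of the file above, and filling the absent days with 0 is an assumption that matches the fillna(0) call in the question.

```python
import io
import pandas as pd

# Hypothetical stand-in for the CSV file: a subset of the rows shown above.
csv_text = """DOW,Hour,Changes
4Wed,01,237
3Tue,07,2533
1Sun,01,240
2Mon,18,4737
6Fri,01,1
"""
days = ["1Sun", "2Mon", "3Tue", "4Wed", "5Thu", "6Fri", "7Sat"]

df = pd.read_csv(io.StringIO(csv_text))
p = df.pivot_table(index="Hour", columns="DOW", values="Changes", aggfunc="mean")

# reindex() adds every weekday column that is absent (here 5Thu and 7Sat)
# and fills it with 0, where p[days] would raise KeyError.
p = p.reindex(columns=days, fill_value=0).fillna(0).round(0).astype(int)
print(p)
```

After the reindex, every weekday column exists, so the final astype(int) can be applied to the whole frame without selecting columns by name.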

A KeyError of the form "[...] not in index" in pandas (often hit while preparing data for scikit-learn) means that one or more of the labels you asked for are not present in the index or columns of the DataFrame or Series. The methods below cover several ways to find the missing labels and get your code running again.

Method 1: Check index labels

To fix the KeyError: [....] not in index error in Pandas and scikit-learn, you can check the index labels using the isin() method. Here are the steps to do it:

  1. First, import the necessary libraries:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
  2. Load the data into a Pandas DataFrame:
df = pd.read_csv('data.csv')
  3. Split the data into training and testing sets:
X_train, X_test, y_train, y_test = train_test_split(df[['feature1', 'feature2', 'feature3']], df['target'], test_size=0.2, random_state=42)
  4. Fit a linear regression model to the training data:
model = LinearRegression()
model.fit(X_train, y_train)
  5. Predict the target values for the testing data. model.predict() returns a plain NumPy array, which has no index, so wrap it in a Series that reuses the index of X_test:
y_pred = pd.Series(model.predict(X_test), index=X_test.index)
  6. Check the index labels of the testing data using the isin() method:
missing_labels = y_test[~y_test.index.isin(y_pred.index)]
print(missing_labels)

This will print out any index labels that are in y_test but not in y_pred. You can then investigate why these labels are missing and take appropriate action.

Note that the isin() method returns a Boolean mask indicating whether each index label is in the specified index or not. The tilde (~) operator is used to invert this mask, so that we get the labels that are not in the index.
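The same isin() idea can be applied to column labels, which is where the "not in index" message usually comes from. The following is a minimal sketch; the frame and the feature names are hypothetical, and 'feature3' is left out on purpose.

```python
import pandas as pd

# Hypothetical frame: 'feature3' is deliberately absent.
df = pd.DataFrame({"feature1": [1, 2], "feature2": [3, 4], "target": [0, 1]})

wanted = pd.Index(["feature1", "feature2", "feature3"])
present = wanted.isin(df.columns)   # Boolean mask, one entry per requested label
missing = wanted[~present]          # labels that would trigger the KeyError

print(list(missing))
X = df[wanted[present]]             # select only the labels that exist
```

Whether to drop the missing labels silently, as done here, or to stop with an error is a design choice that depends on how important the absent feature is.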

Method 2: Reindex data

If you encounter a KeyError in Pandas and Scikit-Learn, it means that the key you are trying to access is not in the index. One way to fix this is by reindexing the data. Here’s how you can do it in Python:

Step 1: Import the necessary libraries

import pandas as pd
from sklearn.model_selection import train_test_split

Step 2: Load the data

df = pd.read_csv("data.csv")

Step 3: Split the data into training and testing sets

X = df.drop("target", axis=1)
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Step 4: Reindex the data

X_train = X_train.reindex(columns=X.columns)
X_test = X_test.reindex(columns=X.columns)

Step 5: Train your model

from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X_train, y_train)

Step 6: Test your model

y_pred = model.predict(X_test)

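Reindexing guarantees that the training and testing sets expose the same columns in the same order. Any column named in the target layout but absent from the data is created and filled with NaN instead of raising a KeyError, so check for NaN before fitting a model.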
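As a small illustration of the alignment step, the sketch below uses two hypothetical frames in which the test set lacks one column and orders the others differently. Filling the created column with 0 is an assumption; a different default may suit a real feature better.

```python
import pandas as pd

# Hypothetical frames: X_test lacks column 'c' and lists its columns in another order.
X_train = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
X_test = pd.DataFrame({"b": [7], "a": [8]})

# Align X_test to the training layout; 'c' is created and filled with 0.
X_test_aligned = X_test.reindex(columns=X_train.columns, fill_value=0)
print(X_test_aligned)
```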

Method 3: Use .loc or .iloc

To fix the KeyError in Pandas and scikit-learn, you can use the .loc or .iloc methods. Here’s how to use them:

Step 1: Import the necessary libraries and load your data into a pandas DataFrame.

import pandas as pd
from sklearn.model_selection import train_test_split

data = pd.read_csv('data.csv')

Step 2: Split your data into training and testing sets.

X_train, X_test, y_train, y_test = train_test_split(data.drop('target', axis=1), data['target'], test_size=0.2)

Step 3: Use .loc or .iloc to access the data.

X_train.loc[:, ['column1', 'column2']]

X_train.iloc[:, [0, 1]]

Step 4: If you encounter a KeyError, check if the column or index you are trying to access exists in your DataFrame.

'column1' in X_train.columns

0 in X_train.index

Example:

import pandas as pd
from sklearn.model_selection import train_test_split

data = pd.read_csv('data.csv')

X_train, X_test, y_train, y_test = train_test_split(data.drop('target', axis=1), data['target'], test_size=0.2)

X_train.loc[:, ['column1', 'column2']]

X_train.iloc[:, [0, 1]]

'column1' in X_train.columns

0 in X_train.index

In summary, you can use .loc or .iloc to access your data and check if the column or index you are trying to access exists in your DataFrame to fix the KeyError.
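Note that .loc does not avoid the error by itself: in current pandas versions, passing a list that contains an absent label still raises KeyError. A sketch of one way to select only the labels that exist, using Index.intersection() on a hypothetical frame without 'column2':

```python
import pandas as pd

# Hypothetical frame: there is no 'column2'.
X_train = pd.DataFrame({"column1": [1, 2], "column3": [3, 4]})
wanted = ["column1", "column2"]

try:
    X_train.loc[:, wanted]
except KeyError as exc:
    print("KeyError:", exc)

# Keep only the requested labels that are really present.
existing = X_train.columns.intersection(wanted)
subset = X_train.loc[:, existing]
print(subset)
```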

Method 4: Verify data type

One possible solution to fix the "KeyError: [...] not in index" error in Pandas and scikit-learn is to verify the data type of the column or feature that is causing the error. Here are the steps to do that:

Step 1: Identify the column or feature that is causing the error. For example, let’s say the error message is "KeyError: 'age' not in index", which means the 'age' column is causing the error.

Step 2: Check the data type of the ‘age’ column using the Pandas ‘dtypes’ attribute. For example:

import pandas as pd

df = pd.read_csv('data.csv')
print(df.dtypes)

This will print the data types of all columns in the DataFrame. Look for the ‘age’ column and make sure it has the expected data type (e.g. integer, float, etc.).

Step 3: If the data type of the ‘age’ column is not what you expected, you can use the Pandas ‘astype’ method to convert it to the desired data type. For example:

df['age'] = df['age'].astype(int)

This will convert the ‘age’ column to integer data type. Make sure to assign the result back to the DataFrame.

Step 4: If the data type of the ‘age’ column is correct, but you still get the error, you can try converting the DataFrame to a NumPy array and then back to a DataFrame using the correct data type. For example:

import numpy as np

X = df.values.astype(np.int32)
df = pd.DataFrame(X, columns=df.columns)

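This will convert the entire DataFrame to an array of integers and then back to a DataFrame with the correct data type. It fails if any column holds non-numeric text, and it leaves the column labels unchanged, so it cannot help when the label itself is missing.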

By following these steps, you should be able to fix the "KeyError: [...] not in index" error in Pandas and scikit-learn by verifying the data type of the problematic column or feature.
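A related case worth checking is the data type of the labels themselves rather than of the values: a column labelled with the integer 1 is not found when asked for as the string "1". The sketch below uses a hypothetical frame with integer column labels and assumes that string labels are wanted throughout.

```python
import pandas as pd

# Hypothetical frame whose column labels are integers, not strings.
df = pd.DataFrame([[10, 20], [30, 40]], columns=[1, 2])

string_found_before = "1" in df.columns   # lookup by string label
int_found_before = 1 in df.columns        # lookup by integer label
print(string_found_before, int_found_before)

# Assumption: string labels are wanted everywhere, so convert them once.
df.columns = df.columns.astype(str)
subset = df[["1", "2"]]
print(subset)
```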

Method 5: Handle missing values

To handle missing values in Pandas, you can use the fillna() method. This method replaces missing values with a specified value or method. For example, you can replace missing values with the mean value of the column:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan, 4], 'B': [5, np.nan, 7, 8]})

df.fillna(df.mean(), inplace=True)

To handle missing values in scikit-learn, you can use the SimpleImputer class. This class replaces missing values with a specified value or method. For example, you can replace missing values with the mean value of the column:

from sklearn.impute import SimpleImputer

imputer = SimpleImputer(strategy='mean')

imputer.fit(X)

X_imputed = imputer.transform(X)

If you are getting a KeyError in Pandas or scikit-learn, it may be because the column or feature you are trying to access is missing. Calling drop() on a label that does not exist raises a KeyError as well, so when a column may or may not be present, pass errors='ignore' to remove it only if it exists:

df.drop('missing_column', axis=1, inplace=True, errors='ignore')

X = np.delete(X, missing_feature_index, axis=1)

Alternatively, you can use the dropna() method to remove rows with missing values:

df.dropna(inplace=True)

X = X[~np.isnan(X).any(axis=1)]

Solving Keyerror exceptions in pandas

The most likely reason for a KeyError when working with a pandas DataFrame is a typo in a column or row label. When in doubt, check the actual labels with the following commands:

print(your_df.columns)  # column labels
print(your_df.index)    # row labels

Define an Example DataFrame

Let’s start by creating a very simple DataFrame that you can use to follow along with this tutorial. Feel free to use the following snippet in your Jupyter notebook or Python script:

import pandas as pd
month = ['November', 'March', 'December']
language = ['Javascript', 'R', 'Java']
office = ['New York', 'New York', 'Los Angeles']
salary = [155.0, 137.0, 189.0]
hiring = dict(month=month, language = language, salary = salary)
hrdf = pd.DataFrame(data=hiring)

Key error not found in axis exception

Let’s assume that we would like to drop one or more columns from our DataFrame. We’ll purposely make a spelling mistake in the column name: instead of salary we’ll write salaries.

hrdf.drop(columns='salaries')

Pandas will throw the following exception:

KeyError: "['salaries'] not found in axis"

The reason is simple: we have a typo in the column name. If in doubt about a column label, inspect the columns attribute (it is an attribute, not a method, so it takes no parentheses):

print(hrdf.columns)

This will return:

Index(['month', 'language', 'salary'], dtype='object')

All we need now is to fix the column name:

hrdf.drop(columns='salary')

Key error not in index pandas

Another very similar error happens when we try to subset columns or rows from our DataFrame, and accidentally have a typo in one or more of our row or column label names. In the following example we would like to select a couple of columns from our DataFrame:

subset = hrdf[['language', 'salaries']]

This raises KeyError: "['salaries'] not in index". Fixing the typo will do the trick.

subset = hrdf[['language', 'salary']]
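When the typo is not obvious, the standard library can propose the closest existing label. The sketch below relies on difflib.get_close_matches; suggest_column is a hypothetical helper written for this illustration, not part of pandas, and the frame is a cut-down version of hrdf.

```python
import difflib
import pandas as pd

# Cut-down version of the example frame, with the same column names.
hrdf = pd.DataFrame({"month": ["November"], "language": ["R"], "salary": [137.0]})

def suggest_column(df, label):
    """Hypothetical helper: return existing column names that resemble `label`."""
    return difflib.get_close_matches(label, list(df.columns), n=3, cutoff=0.6)

suggestions = suggest_column(hrdf, "salaries")
print(suggestions)
```

The cutoff of 0.6 is the difflib default; raising it makes the helper stricter about what counts as a near match.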

I have a dataframe called delivery and when I print(delivery.columns) I get the following:

Index(['Complemento_endereço', 'cnpj', 'Data_fundação', 'Número',
   'Razão_social', 'CEP', 'situacao_cadastral', 'situacao_especial', 'Rua',
   'Nome_Fantasia', 'last_revenue_normalized', 'last_revenue_year',
   'Telefone', 'email', 'Capital_Social', 'Cidade', 'Estado',
   'Razão_social', 'name_bairro', 'Natureza_Jurídica', 'CNAE', '#CNAE',
   'CNAEs_secundários', 'Pessoas', 'percent'],
  dtype='object')

Well, we can clearly see that there is a column ‘Rua’.

Also, if I print(delivery.Rua) I get a proper result:

82671                         R JUDITE MELO DOS SANTOS
817797                                R DOS GUAJAJARAS
180081           AV MARCOS PENTEADO DE ULHOA RODRIGUES
149373                                 AL MARIA TEREZA
455511                               AV RANGEL PESTANA
...

Even if I write "if 'Rua' in delivery.columns: print('here I am')" it does print 'here I am'. So 'Rua' is in fact there.

Well, in the immediate line after I have this code:

delivery=delivery.set_index('cnpj')[['Razão_social','Nome_Fantasia','Data_fundação','CEP','Estado','Cidade','Bairro','Rua','Número','Complemento_endereço','Telefone','email','Capital_Social', 'CNAE', '#CNAE', 'Natureza_Jurídica','Pessoas' ]]

And voilà, I get this weird error:

Traceback (most recent call last):
  File "/file.py", line 45, in <module>
    'Telefone','email','Capital_Social', 'CNAE', '#CNAE', 'Natureza_Jurídica','Pessoas' ]]
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/frame.py", line 1991, in __getitem__
    return self._getitem_array(key)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/frame.py", line 2035, in _getitem_array
    indexer = self.ix._convert_to_indexer(key, axis=1)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/indexing.py", line 1214, in _convert_to_indexer
    raise KeyError('%s not in index' % objarr[mask])
KeyError: "['Rua'] not in index"

Can someone help? I tried stackoverflow but no one could help. I’m starting to think I’m crazy and ‘Rua’ is an illusion of my troubled mind.

ADDITIONAL INFO

I’m using this code right before the error line:

delivery=pd.DataFrame()

for i in selection.index:
    sample=groups.get_group(selection['#CNAE'].loc[i]).sample(selection['samples'].loc[i])
    delivery=pd.concat((delivery,sample)).sort_values('Capital_Social',ascending=False)


print(delivery.columns)
print(delivery.Rua)
print(delivery.set_index('cnpj').columns)

delivery=delivery.set_index('cnpj')[['Razão_social','Nome_Fantasia','Data_fundação','CEP','Estado','Cidade','Bairro','Rua','Número','Complemento_endereço',
                                 'Telefone','email','Capital_Social', 'CNAE', '#CNAE', 'Natureza_Jurídica','Pessoas' ]]

EDIT

New weird stuff:
I gave up and deleted ‘Rua’ from that last piece of code, hoping that it would work. To my surprise, I had the same problem, but now with the column ‘Número’.

delivery=delivery.set_index('cnpj')[['Razão_social','Nome_Fantasia','Data_fundação','CEP','Estado','Cidade','Bairro','Número','Complemento_endereço',
                                                 'Telefone','email','Capital_Social', 'CNAE', '#CNAE', 'Natureza_Jurídica' ]]

KeyError: "['Número'] not in index"

EDIT 2

And then I gave up on ‘Número’ and took it out. Then the same problem happened with ‘Complemento_endereço’. Then I deleted ‘Complemento_endereço’. And it happened to ‘Telefone’ and so on.

EDIT 3

If I do a pd.show_versions(), that’s the output:

INSTALLED VERSIONS

commit: None
python: 3.5.0.final.0
python-bits: 64
OS: Darwin
OS-release: 16.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.1
nose: None
pip: 8.1.2
setuptools: 18.2
Cython: None
numpy: 1.11.0
scipy: 0.17.1
statsmodels: 0.6.1
xarray: None
IPython: None
sphinx: None
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.1
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.5.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.1.3
pymysql: 0.7.11.None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None
None
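Going only by the printed column list in the question, two things stand out: 'Bairro' is requested but the frame only has 'name_bairro', and 'Razão_social' appears twice. The sketch below surfaces both by comparing the requested labels with the existing ones. The labels are copied from the post; whether the duplicated label is what makes this pandas version name the wrong column in the error message is an assumption that the post does not confirm.

```python
import pandas as pd

# Column labels copied from the print(delivery.columns) output in the question.
columns = pd.Index([
    'Complemento_endereço', 'cnpj', 'Data_fundação', 'Número',
    'Razão_social', 'CEP', 'situacao_cadastral', 'situacao_especial', 'Rua',
    'Nome_Fantasia', 'last_revenue_normalized', 'last_revenue_year',
    'Telefone', 'email', 'Capital_Social', 'Cidade', 'Estado',
    'Razão_social', 'name_bairro', 'Natureza_Jurídica', 'CNAE', '#CNAE',
    'CNAEs_secundários', 'Pessoas', 'percent',
])

# Labels requested in the failing selection.
wanted = ['Razão_social', 'Nome_Fantasia', 'Data_fundação', 'CEP', 'Estado',
          'Cidade', 'Bairro', 'Rua', 'Número', 'Complemento_endereço',
          'Telefone', 'email', 'Capital_Social', 'CNAE', '#CNAE',
          'Natureza_Jurídica', 'Pessoas']

missing = [label for label in wanted if label not in columns]
duplicated = columns[columns.duplicated()].tolist()

print("requested but absent:", missing)
print("present more than once:", duplicated)
```

Running the same two checks on the real frame (with delivery.columns in place of the hard-coded index) is a quick way to see which label is actually at fault before trusting the name in the error message.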

When working with Pandas and scikit-learn, you might encounter a KeyError with the message “[….] not in index”. This error occurs when you are trying to access a column or row that does not exist in your dataset. In this guide, we will explore some common scenarios where this error occurs and how to fix it.

Scenario 1: Trying to access a non-existent column

Suppose you have a dataset called “df” with columns “A”, “B”, and “C”. You are trying to access column “D” using the following code:

df["D"]

This will result in the following error message:

KeyError: 'D'

To fix this error, you need to make sure that the column “D” exists in your dataset. You can check the columns of your dataset using the following code:

print(df.columns)

Scenario 2: Trying to access a non-existent row

Suppose you have a dataset called “df” with rows indexed by dates. You are trying to access the row for the date “2022-01-01” using the following code:

df.loc["2022-01-01"]

This will result in the following error message:

KeyError: '2022-01-01'

To fix this error, you need to make sure that the row for the date “2022-01-01” exists in your dataset. You can check the index of your dataset using the following code:

print(df.index)
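Instead of reading the whole index by eye, membership can be tested directly before the lookup. A minimal sketch, assuming a hypothetical frame with a DatetimeIndex that does not contain the requested day:

```python
import pandas as pd

# Hypothetical frame indexed by dates; 2022-01-01 is not among them.
df = pd.DataFrame(
    {"value": [1, 2]},
    index=pd.to_datetime(["2022-01-02", "2022-01-03"]),
)

day = pd.Timestamp("2022-01-01")
if day in df.index:
    row = df.loc[day]
else:
    row = None  # the label is absent, so df.loc[day] would raise KeyError

print(row)
```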

Scenario 3: Trying to fit a model using non-existent features

Suppose you have a dataset called “df” with columns “A”, “B”, and “C”. You are trying to fit a model using the features “A”, “B”, “C”, and “D” using the following code:

X = df[["A", "B", "C", "D"]]
y = df["target"]
model.fit(X, y)

This will result in the following error message:

KeyError: "['D'] not in index"

To fix this error, you need to make sure that the feature “D” exists in your dataset. You can check the columns of your dataset using the following code:

print(df.columns)

Conclusion

In this guide, we explored some common scenarios where the KeyError with the message “[….] not in index” occurs and how to fix it. Always make sure that the column or row you are trying to access exists in your dataset before accessing it. Also, make sure that the features you are trying to fit a model with exist in your dataset. Happy coding!
