Unknown label type: 'continuous' error

Aug 17, 2022
1 min read

One common error you may encounter in Python is:

ValueError: Unknown label type: 'continuous'

This error usually occurs when you use sklearn to fit a classification model, such as logistic regression, and the values of the response variable are continuous rather than categorical.

The following example shows how to reproduce and fix this error.

How to reproduce the error

Suppose we try to fit a logistic regression model with the following code:

import numpy as np
from sklearn.linear_model import LogisticRegression

#define values for predictor and response variables
x = np.array([[2, 2, 3], [3, 4, 3], [5, 6, 6], [7, 5, 5]])
y = np.array([0, 1.02, 1.02, 0])

#attempt to fit logistic regression model
classifier = LogisticRegression()
classifier.fit(x, y)

ValueError: Unknown label type: 'continuous'

We get the error because the values of our response variable are currently continuous.

Recall that a logistic regression model requires the values of the response variable to be categorical, for example:

0 or 1
"Yes" or "No"
"Pass" or "Fail"

Our response variable currently contains the non-integer value 1.02, so sklearn treats it as continuous.
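To see how scikit-learn categorizes a target array before fitting, one option is the type_of_target utility in sklearn.utils.multiclass. The following is a minimal sketch; the extra arrays and variable names are illustrative assumptions, not part of the original example:

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Same response values as in the example above: 1.02 is not a whole number
y_continuous = np.array([0, 1.02, 1.02, 0])

# Illustrative arrays (assumptions): whole-number labels stored as int or float
y_int = np.array([0, 1, 1, 0])
y_whole_floats = np.array([0.0, 1.0, 1.0, 0.0])

kind_continuous = type_of_target(y_continuous)
kind_int = type_of_target(y_int)
kind_whole_floats = type_of_target(y_whole_floats)

print(kind_continuous)    # continuous
print(kind_int)           # binary
print(kind_whole_floats)  # binary
```

Note that floats holding only whole numbers are still accepted as class labels; it is the fractional value that triggers the 'continuous' type.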

How to fix the error

One way to resolve this error is to convert the continuous values of the response variable to categorical values with the LabelEncoder class from sklearn:

from sklearn import preprocessing

#convert y values to categorical values
lab = preprocessing.LabelEncoder()
y_transformed = lab.fit_transform(y)

#view transformed values
print(y_transformed)

[0 1 1 0]

Each of the original values is now encoded as 0 or 1.

We can now fit the logistic regression model:

#fit logistic regression model
classifier = LogisticRegression()
classifier.fit(x, y_transformed)

This time we get no error because the response values for the model are categorical.
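One caveat worth keeping in mind: LabelEncoder assigns one integer code per distinct value, so it only makes sense when the response really has a handful of repeated values. A small sketch with made-up measurements (the array is an assumption for illustration):

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Illustrative measurements (an assumption, not data from the article)
y_measured = np.array([0.5, 1.7, 2.9, 1.7, 3.14])

encoder = LabelEncoder()
codes = encoder.fit_transform(y_measured)
n_classes = len(encoder.classes_)

print(codes)             # one integer code per distinct value
print(encoder.classes_)  # the sorted distinct values
print(n_classes)         # number of classes the classifier would see
```

With genuinely continuous data almost every value is distinct, so the classifier would end up with nearly as many classes as samples; in that case a regression model is usually the better choice.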

Additional resources

The following tutorials explain how to fix other common errors in Python:

How to Fix: ValueError: Index contains duplicate entries, cannot reshape
How to Fix: TypeError: expected string or bytes-like object
How to Fix: TypeError: 'numpy.float64' object is not callable

Source

There are two types of supervised learning algorithms: regression and classification. Classification problems require a categorical or discrete response variable (the y variable). If you try to train a scikit-learn classification model on a continuous response variable, you will encounter the error ValueError: Unknown label type: ‘continuous’.

To solve this error, you can encode the continuous y variable into categories using scikit-learn’s preprocessing.LabelEncoder or, if it is a regression problem, use a regression model suitable for the data.

This tutorial will go through the error in detail and how to solve it with code examples.

ValueError: Unknown label type: ‘continuous’
- What Does Continuous Mean?
- What is the Difference Between Regression and Classification?
Example #1: Evaluating the Data
- Solution
Example #2: Evaluating the Model
- Solution
Summary

ValueError: Unknown label type: ‘continuous’

In Python, a value is a piece of information stored within a particular object. You will encounter a ValueError in Python when you use a built-in operation or function that receives an argument with the right type but an inappropriate value. In this case, the y variable data has continuous values instead of discrete or categorical values.

What Does Continuous Mean?

There are two categories of data:

Discrete data: categorical data, for example, True/False, Pass/Fail, 0/1 or count data, for example, number of students in a class.
Continuous data: Data that we can measure on an infinite scale; it can take any value between two numbers, no matter how small. For example, the length of a string can be 1.00245 centimetres.

However, you cannot have 1.5 of a student in a class; count is a discrete measure. Measures of time, height, and temperature are all examples of continuous data.
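As a rough illustration of the distinction, the following sketch checks whether every value in a collection is a whole number. The helper name and sample values are hypothetical, and the check is only a heuristic, since a continuous measurement can happen to be whole:

```python
import numpy as np

def is_whole_valued(values):
    """Return True if every value is a whole number (hypothetical helper)."""
    arr = np.asarray(values, dtype=float)
    return bool(np.all(arr == np.round(arr)))

class_sizes = [28, 31, 25]          # counts of students: discrete
string_lengths = [1.00245, 2.5, 3]  # lengths in centimetres: continuous

print(is_whole_valued(class_sizes))
print(is_whole_valued(string_lengths))
```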

What is the Difference Between Regression and Classification?

We can classify supervised learning algorithms into two types: Regression and Classification. For regression, the response variable or label is continuous, for example, weight, height, price, or time. In each case, a regression model seeks to predict a continuous quantity.

For classification, the response variable or label is categorical, for example, Pass or Fail, True or False. A classification model seeks to predict a class label.
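The choice between the two can be sketched in code. The helper below is hypothetical: it inspects the response variable with scikit-learn's type_of_target and returns a k-nearest-neighbours regressor for continuous targets and a classifier otherwise. The sample arrays are made up for illustration:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.utils.multiclass import type_of_target

def make_knn_for_target(y, n_neighbors=1):
    """Hypothetical helper: pick a k-NN regressor or classifier from the target type."""
    if type_of_target(y).startswith("continuous"):
        return KNeighborsRegressor(n_neighbors=n_neighbors)
    return KNeighborsClassifier(n_neighbors=n_neighbors)

prices = np.array([587.95, 392.20, 487.55])    # continuous response
outcomes = np.array(["Pass", "Fail", "Pass"])  # categorical response

price_model = make_knn_for_target(prices)
outcome_model = make_knn_for_target(outcomes)

print(type(price_model).__name__)
print(type(outcome_model).__name__)
```

An automatic switch like this is only a convenience; deciding whether a problem is regression or classification should still come from what the response variable means.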

Example #1: Evaluating the Data

Let’s look at an example of training a Logistic Regression model to perform classification on numeric arrays. First, let’s look at the data. We will import numpy to create our explanatory variable data X and our response variable data y. Note that the data used here has no real relationship and is for illustration only.

import numpy as np

# Values for Predictor and Response variables
X = np.array([[2, 4, 1, 7], [3, 5, 9, 1], [5, 7, 1, 2], [7, 4, 2, 8], [4, 2, 3, 8]])
y = np.array([0, 1.02, 1.02, 0, 0])

Next, we will import the LogisticRegression class and create an object of this class, our logistic regression model. We will then fit the model using the values for the predictor and response variables.

from sklearn.linear_model import LogisticRegression

# Attempt to fit Logistic Regression Model
cls = LogisticRegression()
cls.fit(X, y)

Let’s run the code to see what happens:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-14-556cca8758bd> in <module>
      3 # Attempt to fit Logistic Regression Model
      4 cls = LogisticRegression()
----> 5 cls.fit(X, y)

~/opt/anaconda3/lib/python3.8/site-packages/sklearn/linear_model/_logistic.py in fit(self, X, y, sample_weight)
   1514             accept_large_sparse=solver not in ["liblinear", "sag", "saga"],
   1515         )
-> 1516         check_classification_targets(y)
   1517         self.classes_ = np.unique(y)
   1518 

~/opt/anaconda3/lib/python3.8/site-packages/sklearn/utils/multiclass.py in check_classification_targets(y)
    195         "multilabel-sequences",
    196     ]:
--> 197         raise ValueError("Unknown label type: %r" % y_type)
    198 
    199 

ValueError: Unknown label type: 'continuous'

The error occurs because logistic regression is a classification algorithm that requires the values of the response variable to be categorical or discrete, such as “Yes” or “No”, “True” or “False”, 0 or 1. In the above code, the response variable contains the non-integer value 1.02, so scikit-learn treats it as continuous.
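If you would rather handle the failure than let the script stop, the call can be wrapped in a try/except block. This is a minimal sketch using the same data; the variable error_message is an illustrative name, and the exact wording of the message may differ between scikit-learn versions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[2, 4, 1, 7], [3, 5, 9, 1], [5, 7, 1, 2], [7, 4, 2, 8], [4, 2, 3, 8]])
y = np.array([0, 1.02, 1.02, 0, 0])

error_message = None
try:
    LogisticRegression().fit(X, y)
except ValueError as err:
    # Keep the message so it can be logged or shown to the user
    error_message = str(err)
    print("Could not fit classifier:", error_message)
```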

Solution

To solve this error, we can convert the continuous values of the response variable y to categorical values using the LabelEncoder class under sklearn.preprocessing. Let’s look at the revised code:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn import preprocessing

# Values for Predictor and Response variables
X = np.array([[2, 4, 1, 7], [3, 5, 9, 1], [5, 7, 1, 2], [7, 4, 2, 8], [4, 2, 3, 8]])

y = np.array([0, 1.02, 1.02, 0, 0])

# Create label encoder object
labels = preprocessing.LabelEncoder()

# Convert continuous y values to categorical
y_cat = labels.fit_transform(y)

print(y_cat)

[0 1 1 0 0]

We have encoded the original values as 0 or 1. Now, we can fit the logistic regression model and perform a prediction on test data:

# Attempt to fit Logistic Regression Model
cls = LogisticRegression()
cls.fit(X, y_cat)

X_pred = np.array([5, 6, 9, 1])

X_pred = X_pred.reshape(1, -1)

y_pred = cls.predict(X_pred)

print(y_pred)

Let’s run the code to get the result:

[1]

We successfully fit the model and used it to predict unseen data.
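The prediction above is an encoded class (0 or 1), not one of the original y values. If the original labels are needed, the encoder's inverse_transform method maps the codes back. A minimal sketch that repeats the example end to end; the name y_pred_original is an assumption:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder

X = np.array([[2, 4, 1, 7], [3, 5, 9, 1], [5, 7, 1, 2], [7, 4, 2, 8], [4, 2, 3, 8]])
y = np.array([0, 1.02, 1.02, 0, 0])

# Encode the two distinct response values as classes 0 and 1
labels = LabelEncoder()
y_cat = labels.fit_transform(y)

cls = LogisticRegression()
cls.fit(X, y_cat)

# Predict an encoded class, then map it back to the original value
y_pred = cls.predict(np.array([[5, 6, 9, 1]]))
y_pred_original = labels.inverse_transform(y_pred)

print(y_pred, y_pred_original)
```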

Example #2: Evaluating the Model

Let’s look at an example where we want to train a k-Nearest Neighbours classifier to fit on some data. The data, which we will store in a file called regression_data.csv looks like this:

Avg.Session Length,TimeonApp,TimeonWebsite,LengthofMembership,Yearly Amount Spent
34.497268,12.655651,39.577668,4.082621,587.951054
31.926272,11.109461,37.268959,2.664034,392.204933
33.000915,11.330278,37.110597,4.104543,487.547505
34.305557,13.717514,36.721283,3.120179,581.852344
33.330673,12.795189,37.536653,4.446308,599.406092
33.871038,12.026925,34.476878,5.493507,637.102448
32.021596,11.366348,36.683776,4.685017,521.572175

Next, we will import the data into a DataFrame. We will define four columns as the explanatory variables and the last column as the response variable. Then, we will split the data into training and test data:

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv('regression_data.csv')

X = df[['Avg.Session Length', 'TimeonApp','TimeonWebsite', 'LengthofMembership']]

y = df['Yearly Amount Spent']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

Next, we will define a KNeighborsClassifier model and fit to the data:

from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=1)

knn.fit(X_train,y_train)

Let’s run the code to see what happens:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-889312abc571> in <module>
----> 1 knn.fit(X_train,y_train)

~/opt/anaconda3/lib/python3.8/site-packages/sklearn/neighbors/_classification.py in fit(self, X, y)
    196         self.weights = _check_weights(self.weights)
    197 
--> 198         return self._fit(X, y)
    199 
    200     def predict(self, X):

~/opt/anaconda3/lib/python3.8/site-packages/sklearn/neighbors/_base.py in _fit(self, X, y)
    418                     self.outputs_2d_ = True
    419 
--> 420                 check_classification_targets(y)
    421                 self.classes_ = []
    422                 self._y = np.empty(y.shape, dtype=int)

~/opt/anaconda3/lib/python3.8/site-packages/sklearn/utils/multiclass.py in check_classification_targets(y)
    195         "multilabel-sequences",
    196     ]:
--> 197         raise ValueError("Unknown label type: %r" % y_type)
    198 
    199 

ValueError: Unknown label type: 'continuous'

The error occurs because the k-nearest neighbors classifier is a classification algorithm and therefore requires categorical data for the response variable. The data we provide in the df['Yearly Amount Spent'] series is continuous.

Solution

We can interpret this problem as a regression problem, not a classification problem, because the response variable is continuous and it is not intuitive to encode “Yearly Amount Spent” into categories. We need to use the regression algorithm KNeighborsRegressor instead of KNeighborsClassifier to solve this error. Let’s look at the revised code:

from sklearn.neighbors import KNeighborsRegressor

knn = KNeighborsRegressor(n_neighbors=1)

knn.fit(X_train,y_train)

Once we have fit to the data we can get our predictions with the test data.

y_pred = knn.predict(X_test)
print(y_pred)

Let’s run the code to see the result:

[599.406092 487.547505 521.572175]

We successfully predicted three “Yearly Amount Spent” values for the test data.
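For readers who want to run the example without creating regression_data.csv, the sketch below loads the same rows from an in-memory string and also reports the mean absolute error on the test set. Using io.StringIO and mean_absolute_error is our addition, not part of the original walkthrough:

```python
import io

import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Same rows as regression_data.csv, kept in memory for convenience
csv_text = """Avg.Session Length,TimeonApp,TimeonWebsite,LengthofMembership,Yearly Amount Spent
34.497268,12.655651,39.577668,4.082621,587.951054
31.926272,11.109461,37.268959,2.664034,392.204933
33.000915,11.330278,37.110597,4.104543,487.547505
34.305557,13.717514,36.721283,3.120179,581.852344
33.330673,12.795189,37.536653,4.446308,599.406092
33.871038,12.026925,34.476878,5.493507,637.102448
32.021596,11.366348,36.683776,4.685017,521.572175
"""

df = pd.read_csv(io.StringIO(csv_text))
X = df[['Avg.Session Length', 'TimeonApp', 'TimeonWebsite', 'LengthofMembership']]
y = df['Yearly Amount Spent']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# A regressor accepts the continuous response directly
knn = KNeighborsRegressor(n_neighbors=1)
knn.fit(X_train, y_train)

y_pred = knn.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
print(y_pred)
print("Mean absolute error:", mae)
```

With n_neighbors=1, each prediction is the “Yearly Amount Spent” of the closest training row, which is why the predicted values coincide with values from the training data.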

Summary

Congratulations on reading to the end of this tutorial! The ValueError: Unknown label type: ‘continuous’ occurs when you try to use continuous values for your response variable in a classification problem. Classification requires categorical or discrete values of the response variable. To solve this error, you can re-evaluate the response variable data and encode it to categorical. Alternatively, you can re-evaluate the model and use a regression model instead of a classification model.

Although “regression” is in the name, logistic regression is a classification algorithm that attempts to classify observations from a dataset into discrete categories. Whenever you want to perform logistic regression, ensure the response variable data is categorical.
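To make the point concrete, the sketch below fits logistic regression on categorical labels and compares its two kinds of output. The 0/1 labels are assumed for illustration; predict_proba returns continuous probabilities, while predict returns discrete class labels:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[2, 4, 1, 7], [3, 5, 9, 1], [5, 7, 1, 2], [7, 4, 2, 8], [4, 2, 3, 8]])
y = np.array([0, 1, 1, 0, 0])  # categorical response (assumed labels)

model = LogisticRegression()
model.fit(X, y)

probabilities = model.predict_proba(X)  # one row per sample, one column per class
predicted_labels = model.predict(X)     # always one of model.classes_

print(probabilities)
print(predicted_labels)
```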

For further reading on Scikit-learn, go to the article: How to Solve Python ValueError: input contains nan, infinity or a value too large for dtype(‘float64’).

Have fun and happy researching!

Source

“Unknown label type: continuous” means that you’re trying to use a classifier with a continuous target variable. It’s a Python “ValueError” raised by scikit-learn (sklearn), the machine learning package for Python. This article shows code examples that reproduce this error and related ones you may encounter if your project uses scikit-learn.

We know that you’re here because you want to solve this error quickly; we can assure you that we’ll do our best not to disappoint. With that said, open your favorite integrated development environment, such as VS Code, and let’s start.

Contents

Why do “Unknown” Label Types and Related Errors Occur?
- – A Continuous Target Variable Cannot Work With the Classifier
- – The Classifiers Are Not Suitable for Multi-Output Regression
- – You’re Using “LinearRegression” To Predict Discrete Values
How To Fix Sklearn “Unknown” Label Types and Related Errors?
- – Use an Encoder To Convert Continuous To Categorical Values
- – Discretize the Continuous Labels Into Different Classes
- – Use Linear Regression for Continuous Labels
- – Use Suitable Regressors for Multi-Output Regression
- – Use Suitable Classification Algorithms for Discrete Values
Conclusion

Why do “Unknown” Label Types and Related Errors Occur?

“Unknown” label type and related errors occur because a “continuous” target variable is not suitable for the classifier that you’re using. You’ll also get an error if you pass multi-output regression targets to a classifier, or if you score continuous predictions against discrete class labels.

– A Continuous Target Variable Cannot Work With the Classifier

If you’re using a classifier that’s not suited for a continuous target variable, you’ll get a “ValueError” for an “unknown” label. For example, in the following code, we’re using “LogisticRegression” from sklearn in an attempt to model a linear regression.

However, a “ValueError” will occur because the target variable, “y” contains continuous values (the decimal values).

import numpy as np
from sklearn.linear_model import LogisticRegression

# Define variables.
X = np.array([[3, 1, 3], [4, 4, 3], [3, 6, 1], [4, 3, 3]])
y = np.array([0, 1.02, 1.03, 0])

# Try and fit the logistic regression model
LR_classifier = LogisticRegression()
LR_classifier.fit(X, y)

The same error will occur in the following code, but this time we’re using a support vector machine (SVM). Here, the code raises “ValueError: Unknown label type: ‘continuous’” because the SVM classifier “SVC” is designed to work with discrete labels that represent different classes. That means it will not work with the continuous values in the variable “y”.

from sklearn import svm

# Sample data
X = [[1, 2], [3, 4], [5, 6]]
y = [2.5, 4.2, 6.7]  # Continuous labels

# Create an SVM classifier
clf = svm.SVC()

# Attempting to train the classifier raises a ValueError; the traceback
# may include a line like: raise ValueError("Unknown label type: %r" % y_type)
clf.fit(X, y)

– The Classifiers Are Not Suitable for Multi-Output Regression

An attempt to use a classifier that’s not suitable for multi-output regression can also lead to “ValueError” because you’re asking it to do what it’s not designed for. For example, in the following code, we generated a synthetic dataset with “make_regression” where “n_targets” is set to three, indicating a multi-output scenario.

Then, we attempt to fit a “DecisionTreeClassifier” to the data. Running this code results in “ValueError: Unknown label type: ‘continuous-multioutput’”.

# Generate the continuous-multioutput.
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeClassifier

# Generate a regression dataset with
# continuous target variable.
X, y = make_regression(n_samples=100, n_features=1, n_targets=3, noise=0.1)

# Create and train a DecisionTreeClassifier
DT_classifier = DecisionTreeClassifier()
DT_classifier.fit(X, y)

Also, in the following, we are using “make_regression” to generate a synthetic dataset with 100 samples, 10 features, and three target variables. However, we’re using “RandomForestClassifier” which cannot handle multi-output regression problems because it’s designed for classification tasks with discrete target variables.

As a result, running the code leads to “ValueError: Unknown label type: ‘continuous-multioutput’”.

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_regression

# Generate a synthetic dataset for
# multioutput regression.
X, y = make_regression(n_samples=100, n_features=10, n_targets=3, random_state=42)

# Create a Random Forest classifier
rf = RandomForestClassifier()

# Try to fit the model with multioutput
# regression targets.
rf.fit(X, y)

– You’re Using “LinearRegression” To Predict Discrete Values

Using “LinearRegression” from sklearn to predict discrete class labels and then scoring the result with a classification metric can lead to “ValueError: Classification metrics can’t handle a mix of multiclass and continuous targets”. For example, in the following, we’re using it to predict discrete class labels in the Iris dataset.

Since it predicts continuous values rather than class labels, the true labels are multiclass while the predictions are continuous. As a result, when we attempt to calculate the accuracy score using “accuracy_score()”, the error will be raised.

from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score

# Load the Iris dataset.
data = load_iris()
X = data.data
y = data.target

# Create a linear regression model.
model = LinearRegression()

# Fit the model to the data.
model.fit(X, y)

# Predict the target variable.
y_pred = model.predict(X)

# Calculate the accuracy score (this line raises the ValueError).
accuracy = accuracy_score(y, y_pred)
print("Accuracy:", accuracy)

How To Fix Sklearn “Unknown” Label Types and Related Errors?

To fix sklearn “unknown” label type and related errors, you can convert continuous values to categorical values, discretize continuous labels, or switch to a regression model for continuous labels. Also, use regressors for multi-output regression and classifiers for discrete class labels.

– Use an Encoder To Convert Continuous To Categorical Values

If your target contains continuous values and you still want logistic regression, you can convert them to categorical values. To do this, you’ll need the “LabelEncoder” class from the “preprocessing” module in sklearn. The following code shows how to do this; the code comments explain each step.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn import preprocessing

# Define predictor and response variables.
X = np.array([[3, 1, 3], [4, 4, 3], [3, 6, 1], [4, 3, 3]])
y = np.array([0, 1.02, 1.03, 0])

# Convert values of "y" to discrete
# labels, also called categorical values
LE_preprocessing = preprocessing.LabelEncoder()
convert_y_categorical = LE_preprocessing.fit_transform(y)

# Prepare the Logistic Regression.
LR_classifier = LogisticRegression()
LR_classifier.fit(X, convert_y_categorical)

# Predict the output for new data.
new_data = np.array([[4, 5, 2], [1, 3, 6]])
predicted_output = LR_classifier.predict(new_data)

# Print the output.
print(predicted_output)

– Discretize the Continuous Labels Into Different Classes

When you’re using SVM from sklearn, you can discretize the continuous labels in your code to prevent a possible “ValueError”. For example, based on the previous SVM code, we have discretized the continuous labels “[2.5, 4.2, 6.7]” into the corresponding classes “[‘low’, ‘medium’, ‘high’]”. By providing these discrete labels to the SVM classifier (“clf.fit(X, y_discrete)”), you can train the model without encountering the error.

from sklearn import svm

# Sample data.
X = [[1, 2], [3, 4], [5, 6]]
y = [2.5, 4.2, 6.7]  # Continuous labels

# Discretize the continuous labels into classes.
y_discrete = ['low', 'medium', 'high']

# Create an SVM classifier.
clf = svm.SVC()

# Train the classifier.
clf.fit(X, y_discrete)

# New data for prediction.
new_data = [[2, 3], [4, 5]]

# Make predictions using the trained classifier.
predictions = clf.predict(new_data)

# Print the predicted values.
for prediction in predictions:
    print(prediction)
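The class names above are written out by hand. For larger datasets, the binning can be computed instead. The following sketch uses NumPy's digitize with cut points of 3.0 and 5.0; those thresholds and the variable names are assumptions chosen only to reproduce the same three classes:

```python
import numpy as np
from sklearn import svm

X = [[1, 2], [3, 4], [5, 6]]
y = [2.5, 4.2, 6.7]  # Continuous labels

# Assumed cut points: below 3.0 is "low", 3.0 up to 5.0 is "medium",
# 5.0 and above is "high"
bin_edges = [3.0, 5.0]
class_names = np.array(["low", "medium", "high"])

# np.digitize returns the index of the bin each value falls into
bin_index = np.digitize(y, bin_edges)
y_discrete = class_names[bin_index]
print(y_discrete)

clf = svm.SVC()
clf.fit(X, y_discrete)
```

Where the cut points go is a modelling decision; discretizing discards information, so it is worth checking whether a regression model would serve the task better.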

– Use Linear Regression for Continuous Labels

Using linear regression for continuous labels also works to prevent the “ValueError” in our previous example. That means we can replace SVM totally with “LinearRegression” from sklearn.

For example, we’ve changed the previous code, and now we’re using “LinearRegression” to perform regression instead of classification.

from sklearn.linear_model import LinearRegression

# Sample data
X = [[1, 2], [3, 4], [5, 6]]
y = [2.5, 4.2, 6.7]  # Continuous labels

# Create a Linear Regression model
reg = LinearRegression()

# Train the model
reg.fit(X, y)

# Predict using the trained model
predictions = reg.predict(X)

# Print the predicted values
for prediction in predictions:
    print(prediction)
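If you would prefer to stay with a support vector model, scikit-learn also offers “SVR”, the regression counterpart of “SVC”, which accepts continuous labels. A minimal sketch with the same sample data and default settings (no tuning is implied):

```python
from sklearn import svm

# Sample data
X = [[1, 2], [3, 4], [5, 6]]
y = [2.5, 4.2, 6.7]  # Continuous labels

# Support vector regression accepts continuous targets
reg = svm.SVR()
reg.fit(X, y)

# Predict continuous values for new data
predictions = reg.predict([[2, 3], [4, 5]])
for prediction in predictions:
    print(prediction)
```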

– Use Suitable Regressors for Multi-Output Regression

If you’re working with multi-output regression, use a regressor rather than a “DecisionTreeClassifier”, which raises a “ValueError” on continuous targets. One regressor that supports multiple outputs is “DecisionTreeRegressor”, and the following shows how to use it.

from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

# Generate a multi-output regression dataset.
X, y = make_regression(n_samples=100, n_features=1, n_targets=3, noise=0.1)

# Create and train a DecisionTreeRegressor.
DT_regressor = DecisionTreeRegressor()
DT_regressor.fit(X, y)

# Make predictions on new data.
new_data = [[0.5]]
predictions = DT_regressor.predict(new_data)

# Print the predicted values.
print("Predicted values:")
for i, target_name in enumerate(["Target 1", "Target 2", "Target 3"]):
    print(f"{target_name}: {predictions[0][i]}")

Gradient Boosting is another popular method that can be extended to handle multi-output regression. Scikit-learn provides a “MultiOutputRegressor” wrapper that allows you to use any base regressor to perform multi-output regression. Here, the base regressor will be “GradientBoostingRegressor” from “sklearn.ensemble”, and the following shows how to apply it to the previous code.

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

# Generate a multi-output regression dataset.
X, y = make_regression(n_samples=100, n_features=1, n_targets=3, noise=0.1)

# Create a base Gradient Boosting regressor.
base_regressor = GradientBoostingRegressor()

# Create and train a "MultiOutputRegressor"
# with the base regressor.
MO_regressor = MultiOutputRegressor(base_regressor)
MO_regressor.fit(X, y)

# Make predictions on new data.
new_data = [[0.5]]
predictions = MO_regressor.predict(new_data)

# Print the predicted values.
print("Predicted values:")
for i, target_name in enumerate(["Target 1", "Target 2", "Target 3"]):
    print(f"{target_name}: {predictions[0][i]}")
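The earlier “RandomForestClassifier” example can be repaired in the same spirit. As a sketch, “RandomForestRegressor” handles several continuous targets without a wrapper; the n_estimators value and the choice to predict on the first two rows are assumptions made to keep the example quick:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Generate a synthetic dataset for multioutput regression.
X, y = make_regression(n_samples=100, n_features=10, n_targets=3, random_state=42)

# A random forest regressor accepts the three continuous targets directly
rf = RandomForestRegressor(n_estimators=20, random_state=42)
rf.fit(X, y)

# Predict all three targets for the first two samples
predictions = rf.predict(X[:2])
print(predictions.shape)  # (2, 3)
```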

– Use Suitable Classification Algorithms for Discrete Values

A suitable classification algorithm for discrete values is “LogisticRegression”, and we can use it to fix the “ValueError” from the Iris dataset example. This works because the dataset contains discrete class labels, which makes it a good fit for logistic regression.

The following code has more details on how it works, and you can run it to get a printed output.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the Iris dataset
data = load_iris()
X = data.data
y = data.target

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a logistic regression model
model = LogisticRegression(max_iter=1000)
# model = LogisticRegression(solver='liblinear')

# Fit the model to the training data
model.fit(X_train, y_train)

# Predict the target variable for the test set
y_pred = model.predict(X_test)

# Calculate the accuracy score
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
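If for some reason you need to keep “LinearRegression” in the earlier Iris example, a crude workaround is to round its continuous predictions to the nearest valid class before calling “accuracy_score()”. This sketch is an assumption on our part rather than a recommended practice, and a real classifier is usually the better fit:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score

data = load_iris()
X, y = data.data, data.target

model = LinearRegression().fit(X, y)
y_pred_continuous = model.predict(X)

# Round to the nearest integer and clip to the valid class range 0..2
y_pred_labels = np.clip(np.rint(y_pred_continuous), 0, 2).astype(int)

# Both arguments are now discrete, so the metric no longer raises
accuracy = accuracy_score(y, y_pred_labels)
print("Accuracy:", accuracy)
```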

Conclusion

This article explained common errors that you can encounter while you’re working with sklearn, and we discussed solutions that can solve them. Before you leave, kindly read the following recap of what we talked about:

Using a classifier with a continuous target variable leads to “ValueError: Unknown label type”.
Classifiers given multi-output regression targets also raise a “ValueError”.
Choosing an estimator that matches the target type (a classifier for discrete labels, a regressor for continuous values) resolves the “unknown” label type errors in sklearn.

We appreciate that you took your time to read to the end, and we hope that you’ve learned something new. Stay safe and happy coding!


Source

One common error you may encounter in Python is:

ValueError: Unknown label type: 'continuous'

This error usually occurs when you attempt to use sklearn to fit a classification model like logistic regression and the values that you use for the response variable are continuous instead of categorical.

The following example shows how to reproduce and fix this error.

How to Reproduce the Error

Suppose we attempt to use the following code to fit a logistic regression model:

import numpy as np
from sklearn.linear_model import LogisticRegression

#define values for predictor and response variables
x = np.array([[2, 2, 3], [3, 4, 3], [5, 6, 6], [7, 5, 5]])
y = np.array([0, 1.02, 1.02, 0])

#attempt to fit logistic regression model
classifier = LogisticRegression()
classifier.fit(x, y)

ValueError: Unknown label type: 'continuous'

We receive an error because currently the values for our response variable are continuous.

Recall that a logistic regression model requires the values of the response variable to be categorical such as:

0 or 1
“Yes” or “No”
“Pass” or “Fail”

Currently our response variable contains continuous values such as 0 and 1.02.

How to Fix the Error

One way to resolve this error is to convert the continuous values of the response variable to categorical values using the LabelEncoder class from sklearn:

from sklearn import preprocessing

#convert y values to categorical values
lab = preprocessing.LabelEncoder()
y_transformed = lab.fit_transform(y)

#view transformed values
print(y_transformed)

[0 1 1 0]

Each of the original values is now encoded as a 0 or 1.

We can now fit the logistic regression model:

#fit logistic regression model
classifier = LogisticRegression()
classifier.fit(x, y_transformed)

This time we don’t receive any error because the response values for the model are categorical.
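When the continuous values represent a score rather than a set of repeated labels, an alternative to LabelEncoder is to apply a cutoff. The sketch below assumes a threshold of 0.5, which is an illustrative choice and not something the original example specifies:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

x = np.array([[2, 2, 3], [3, 4, 3], [5, 6, 6], [7, 5, 5]])
y = np.array([0, 1.02, 1.02, 0])

# Assumed cutoff: treat anything above 0.5 as the positive class
threshold = 0.5
y_binary = (y > threshold).astype(int)
print(y_binary)

# The binary labels are accepted by the classifier
classifier = LogisticRegression()
classifier.fit(x, y_binary)
```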

Additional Resources

The following tutorials explain how to fix other common errors in Python:

How to Fix: ValueError: Index contains duplicate entries, cannot reshape
How to Fix: TypeError: expected string or bytes-like object
How to Fix: TypeError: ‘numpy.float64’ object is not callable

Source

Causes of ValueError: Unknown label type: 'continuous' in Python
Use Scikit’s LabelEncoder() Function to Fix ValueError: Unknown label type: 'continuous'
Evaluate the Data to Fix ValueError: Unknown label type: 'continuous'

Python ValueError: Unknown Label Type: 'continuous'

This article will tackle the causes and solutions to the ValueError: Unknown label type: 'continuous' error in Python.

Causes of `ValueError: Unknown label type: 'continuous'` in Python

scikit-learn raises this error when we try to train a classifier on a continuous target variable.

Classifiers such as K Nearest Neighbor, Decision Tree, Logistic Regression, etc., predict the class of input variables. Class variables take discrete or categorical forms, such as 0 or 1, True or False, and Pass or Fail.

If a scikit-learn classification algorithm, e.g., Logistic Regression, is trained on a continuous target variable, it raises ValueError: Unknown label type: 'continuous'.

Code:

import numpy as np
from sklearn.linear_model import LogisticRegression
input_var=np.array([[1.1,1.2,1.5,1.6],[0.5,0.9,0.6,0.8]])
target_var=np.array([1.4,0.4])
classifier_logistic_regression=LogisticRegression()
classifier_logistic_regression.fit(input_var,target_var)

Output:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [6], in <module>
----> 1 lr.fit(x,y)

File c:\users\hp 840 g3\appdata\local\programs\python\python39\lib\site-packages\sklearn\linear_model\_logistic.py:1516, in LogisticRegression.fit(self, X, y, sample_weight)
   1506     _dtype = [np.float64, np.float32]
   1508 X, y = self._validate_data(
   1509     X,
   1510     y,
   (...)
   1514     accept_large_sparse=solver not in ["liblinear", "sag", "saga"],
   1515 )
-> 1516 check_classification_targets(y)
   1517 self.classes_ = np.unique(y)
   1519 multi_class = _check_multi_class(self.multi_class, solver, len(self.classes_))

File c:\users\hp 840 g3\appdata\local\programs\python\python39\lib\site-packages\sklearn\utils\multiclass.py:197, in check_classification_targets(y)
    189 y_type = type_of_target(y)
    190 if y_type not in [
    191     "binary",
    192     "multiclass",
   (...)
    195     "multilabel-sequences",
    196 ]:
--> 197     raise ValueError("Unknown label type: %r" % y_type)

ValueError: Unknown label type: 'continuous'

Float values are passed as the target label y to the logistic regression classifier, which accepts only categorical or discrete class labels. As a result, the code raises an error at the fit() call, and the model refuses to train on the given data.

Use Scikit’s `LabelEncoder()` Function to Fix `ValueError: Unknown label type: 'continuous'`

The LabelEncoder class encodes the continuous target values into discrete or categorical labels.

The classifier now accepts these values. It trains on the given data and predicts the output class.

Code:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn import preprocessing
input_var=np.array([[1.1,1.2,1.5,1.6],[0.5,0.9,0.6,0.8]])
target_var=np.array([1.4,0.4])
predict_var=np.array([ [1.3, 1.7, 1.8,1.4], [0.2, 0.6, 0.3, 0.4] ])
encoded = preprocessing.LabelEncoder()
encoded_target= encoded.fit_transform(target_var)
print(encoded_target)
classifier_logistic_regression=LogisticRegression()
classifier_logistic_regression.fit(input_var,encoded_target)
predict=classifier_logistic_regression.predict(predict_var)
print(predict)

Output:

The float values of the target variable target_var are encoded into discrete, categorical values (encoded_target) using the LabelEncoder class.

The classifier now accepts these values. The classifier is trained to predict the class of new data, denoted by predict_var.

Evaluate the Data to Fix `ValueError: Unknown label type: 'continuous'`

Sometimes the data must be carefully examined to determine whether the problem is one of regression or classification. Some output variables, such as house price, are not naturally classified or discretized.

In such cases, the issue is one of regression. Because the regression model accepts continuous target variables, the target variable does not need to be encoded.

Code:

import numpy as np
from sklearn.linear_model import LinearRegression
input_var=np.array([[1.1,1.2,1.5,1.6],[0.5,0.9,0.6,0.8]])
target_var=np.array([1.4,0.4])
predict_var=np.array([ [1.3, 1.7, 1.8,1.4], [0.2, 0.6, 0.3, 0.4] ])
linear_Regressor_model=LinearRegression()
linear_Regressor_model.fit(input_var,target_var)
predict=linear_Regressor_model.predict(predict_var)
print(predict)

Output:

The float values in the output variable target_var show that this is a regression problem. The model must predict a numeric value for each input rather than its class.

A linear regression model is trained and predicts the outcome value of new data.
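When it is unclear which kind of estimator an object is, scikit-learn's is_classifier and is_regressor helpers from sklearn.base can confirm it before fitting. A short sketch; the loop and variable names are illustrative:

```python
from sklearn.base import is_classifier, is_regressor
from sklearn.linear_model import LinearRegression, LogisticRegression

for model in (LogisticRegression(), LinearRegression()):
    if is_classifier(model):
        kind = "classifier: expects discrete class labels"
    elif is_regressor(model):
        kind = "regressor: expects continuous targets"
    else:
        kind = "other"
    print(type(model).__name__, "->", kind)

logistic_is_classifier = is_classifier(LogisticRegression())
linear_is_regressor = is_regressor(LinearRegression())
```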

Source
