ParserError in Python

I am trying to load a TSV file into pandas.

import pandas as pd
df = pd.read_csv(filename, sep='\t')
print(df)

After running this code, I see the following error in the console:

    df = pd.read_csv(filename, sep='\t')
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/io/parsers.py", line 655, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/io/parsers.py", line 411, in _read
    data = parser.read(nrows)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/io/parsers.py", line 982, in read
    ret = self._engine.read(nrows)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/io/parsers.py", line 1719, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 890, in pandas._libs.parsers.TextReader.read (pandas/_libs/parsers.c:10862)
  File "pandas/_libs/parsers.pyx", line 912, in pandas._libs.parsers.TextReader._read_low_memory (pandas/_libs/parsers.c:11138)
  File "pandas/_libs/parsers.pyx", line 966, in pandas._libs.parsers.TextReader._read_rows (pandas/_libs/parsers.c:11884)
  File "pandas/_libs/parsers.pyx", line 953, in pandas._libs.parsers.TextReader._tokenize_rows (pandas/_libs/parsers.c:11755)
  File "pandas/_libs/parsers.pyx", line 2184, in pandas._libs.parsers.raise_parser_error (pandas/_libs/parsers.c:28765)
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 5, saw 9

Could you tell me what might be causing this?
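One possible cause, stated here as an assumption because the file itself is not shown: the message "Expected 1 fields in line 5, saw 9" would fit a file that starts with four single-field lines (a title, comments, export metadata) before the real nine-column table begins. The sketch below uses a made-up three-column sample and a hypothetical file name to show that pattern and how skiprows gets past it.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample: four preamble lines, then a tab-separated table.
sample = (
    "Report exported from some tool\n"
    "Date: 2020-01-01\n"
    "Source: example\n"
    "Units: none\n"
    "id\tname\tscore\n"
    "1\tAli\t87\n"
    "2\tJohn\t78\n"
)

path = os.path.join(tempfile.mkdtemp(), "sample.tsv")
with open(path, "w", encoding="utf-8") as fh:
    fh.write(sample)

# Look at the raw lines first to see where the real table begins.
with open(path, encoding="utf-8") as fh:
    for number, line in enumerate(fh, start=1):
        print(number, line.rstrip("\n").count("\t") + 1, "field(s)")

# Skip the four preamble lines so that line 5 becomes the header.
df = pd.read_csv(path, sep="\t", skiprows=4)
print(df)
```

If the preamble lines all start with a marker such as #, passing comment='#' may work in place of skiprows.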

In this tutorial you’ll learn how to fix the “ParserError: Error tokenizing data. C error: Expected X fields in line Y, saw Z” in the Python programming language.

Let’s get started.

Example Data & Software Libraries

Consider the CSV file illustrated below as a basis for this tutorial:

[Image: the example CSV file]

You may have already noticed that rows 4 and 6 contain one value too many: those two rows contain four values, whereas the other rows contain only three.

Let’s assume that we want to read this CSV file as a pandas DataFrame into Python.

For this, we first have to import the pandas library:

import pandas as pd                        # Load pandas

Let’s move on to the examples!

Reproduce the ParserError: Error tokenizing data. C error: Expected X fields in line Y, saw Z

In this section, I’ll show how to replicate the error message “ParserError: Error tokenizing data. C error: Expected X fields in line Y, saw Z”.

Let’s assume that we want to read our example CSV file using the default settings of the read_csv function. Then, we might try to import our data as shown below:

data_import = pd.read_csv('data.csv')      # Try to import CSV file
# ParserError: Error tokenizing data. C error: Expected 3 fields in line 4, saw 4

Unfortunately, the “ParserError: Error tokenizing data. C error: Expected X fields in line Y, saw Z” is returned after executing the Python syntax above.

The reason for this is that our CSV file contains too many values in some of the rows.

In the next section, I’ll show an easy solution for this problem. So keep on reading…

Debug the ParserError: Error tokenizing data. C error: Expected X fields in line Y, saw Z

In this example, I’ll explain an easy fix for the “ParserError: Error tokenizing data. C error: Expected X fields in line Y, saw Z” in the Python programming language.

We can skip all malformed lines in our CSV file by setting the on_bad_lines argument to 'skip'. This argument is available in pandas 1.3 and later; it replaces the older error_bad_lines = False setting, which was deprecated in pandas 1.3 and removed in pandas 2.0.

Have a look at the example code below:

data_import = pd.read_csv('data.csv',      # Remove rows with errors
                          on_bad_lines = 'skip')
print(data_import)                         # Print imported pandas DataFrame

[Table 1: the imported pandas DataFrame]

As shown in Table 1, we have created a valid pandas DataFrame output using the previous code. As you can see, we have simply skipped the rows with too many values.

This is a simple trick that usually works. However, use it with care, since the error message typically points to more general issues with your data.

For that reason, it’s advisable to investigate why some of the rows are not formatted properly.
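One way to investigate is to count the fields on every line before handing the file to pandas. The following is only a sketch: the data is a made-up stand-in for the example file (with the extra values on lines 4 and 6), and find_bad_lines is a hypothetical helper name, not a pandas function.

```python
import csv
import io

# Made-up stand-in for the example file: lines 4 and 6 carry an extra value.
raw = (
    "x1,x2,x3\n"
    "1,a,10\n"
    "2,b,20\n"
    "3,c,30,99\n"
    "4,d,40\n"
    "5,e,50,99\n"
)


def find_bad_lines(handle, delimiter=","):
    """Return (line_number, field_count) for rows wider than the header."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    return [
        (reader.line_num, len(row))
        for row in reader
        if len(row) > len(header)
    ]


bad_lines = find_bad_lines(io.StringIO(raw))
print(bad_lines)  # [(4, 4), (6, 4)]
```

For a file on disk, the same helper can be called with open('data.csv', newline='') in place of the io.StringIO object.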

For this, I can also recommend this thread on Stack Overflow. It discusses how to identify wrong lines, and it also discusses other less common reasons for the error message “ParserError: Error tokenizing data. C error: Expected X fields in line Y, saw Z”.


Summary: In this article, I have explained how to handle the “ParserError: Error tokenizing data. C error: Expected X fields in line Y, saw Z” in the Python programming language.

What Is ParserError in Python/Pandas and How to Fix It?

As a data scientist or software engineer, you might have encountered the ParserError when working with Python/Pandas. This error occurs when the Pandas module is unable to read a file due to a formatting issue. The error message usually reads: “ParserError: Error tokenizing data. C error: Expected x fields in line i, saw y”. In this blog post, we will explore what ParserError is, what causes it, and how to fix it.

What is ParserError in Python/Pandas?

ParserError is an error that occurs when the Pandas module is unable to read a file due to a formatting issue. The error message indicates that the parser was expecting a certain number of fields in a specific line of the file but found more. It typically arises when reading delimited text files, such as CSV and TSV files, with read_csv or read_table.

What Causes ParserError?

ParserError is caused by a variety of issues, including:

Incorrect Delimiter

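One of the common causes of ParserError is an incorrect delimiter. For instance, if you are working with a CSV file and use a comma (‘,’) as a delimiter when the actual delimiter is a semicolon (‘;’), Pandas splits each line on whatever commas it happens to contain. If the number of commas varies from line to line, you will get a ParserError; if it does not, the file is read without an error but every row ends up in a single column.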

Inconsistent Number of Fields

Another common cause of ParserError is an inconsistent number of fields in the file. For instance, if you are working with a CSV file and some rows have more fields than the parser expects, Pandas will not be able to read the file, and you will get a ParserError. Rows with fewer fields do not trigger the error; their missing values are filled with NaN.

Special Characters

Special characters such as quotes, commas, and semicolons can also cause ParserError. For instance, if a comma appears within a field that is not enclosed in quotes, or if the quotes in the file are unbalanced, Pandas may interpret the comma as a delimiter and raise a ParserError.

How to Fix ParserError

Fixing ParserError requires identifying the cause of the error and taking appropriate action. Here are some possible solutions:

Check the Delimiter

If the cause of the ParserError is an incorrect delimiter, you will need to check the delimiter used in the file and update it in your code. For instance, if the delimiter is a semicolon, you can specify it in your code using the sep parameter.

import pandas as pd
df = pd.read_csv('file.csv', sep=';')
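A wrong separator does not always raise an error, so it can help to check the shape of the result. The snippet below is a rough illustration with made-up semicolon-separated data held in a string; the variable names are chosen for this example only.

```python
import io

import pandas as pd

# Made-up semicolon-separated data, used only for illustration.
raw = "name;city;score\nAli;Berlin;87\nJohn;Paris;78\n"

# With the default comma separator every line is read as one wide field.
df_wrong = pd.read_csv(io.StringIO(raw))
print(df_wrong.shape)

# With the matching separator the three columns are recovered.
df_right = pd.read_csv(io.StringIO(raw), sep=";")
print(df_right.shape)
```

A single column whose name still contains the real delimiter is a strong hint that sep needs to be changed.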

Deal with Inconsistent Fields

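If the cause of the ParserError is an inconsistent number of fields, the best fix is to repair the file, either by removing the extra fields from the offending rows or by removing those rows altogether. If repairing the file is not an option, you can skip the offending rows with on_bad_lines='skip' (pandas 1.3 and later; older versions used error_bad_lines=False, which was removed in pandas 2.0).

import pandas as pd
try:
    df = pd.read_csv('file.csv')
except pd.errors.ParserError:
    # Fall back to skipping rows that have too many fields
    df = pd.read_csv('file.csv', on_bad_lines='skip')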
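Skipping rows silently makes it easy to lose data without noticing. As a sketch of one alternative, on_bad_lines also accepts a callable when engine='python' is used (pandas 1.4 and later), which makes it possible to record what was dropped. The data and the remember_bad_line helper below are made up for illustration.

```python
import io

import pandas as pd

# Made-up data: the third line has one field too many.
raw = (
    "name,city,score\n"
    "Ali,Berlin,87\n"
    "John,Paris,78,extra\n"
    "Maria,Rome,91\n"
)

skipped = []


def remember_bad_line(fields):
    """Record the malformed row and return None so pandas drops it."""
    skipped.append(fields)
    return None


# A callable for on_bad_lines requires engine="python" (pandas 1.4+).
df = pd.read_csv(io.StringIO(raw), on_bad_lines=remember_bad_line, engine="python")
print(df)
print(skipped)
```

Returning a list of strings from the callable instead of None would keep a repaired version of the row.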

Deal with Special Characters

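If the cause of the ParserError is special characters, you will need to modify the file so that the problematic characters are quoted or escaped, and tell Pandas which conventions the file uses. For instance, if a comma appears within a field that is not enclosed in quotes, it can be escaped with a backslash (\), and the escape character is then passed to read_csv through the escapechar parameter.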

import pandas as pd
df = pd.read_csv('file.csv', quotechar='"', escapechar='\\')
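To make the effect of these two parameters concrete, here is a small sketch with made-up data in which one comma is protected by quotes and another by a backslash. It assumes the file really follows these conventions; other files may need different settings.

```python
import io

import pandas as pd

# Made-up data: one comma is protected by quotes, the other by a backslash.
raw = (
    "name,comment\n"
    'Ali,"hello, world"\n'
    "John,see you\\, later\n"
)

df = pd.read_csv(io.StringIO(raw), quotechar='"', escapechar="\\")
print(df["comment"].tolist())
```

Both comments come through as single fields with their commas intact.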

Conclusion

In conclusion, ParserError is a common error that occurs when working with Python/Pandas. It is caused by various issues, including incorrect delimiters, inconsistent number of fields, and special characters. Fixing ParserError requires identifying the cause of the error and taking appropriate action, including checking the delimiter, dealing with inconsistent fields, and dealing with special characters. By following the solutions outlined in this post, you can fix ParserError and continue working with your data with minimal disruption.



  1. What Is the ParserError: Error tokenizing data. C error in Python
  2. How to Fix the ParserError: Error tokenizing data. C error in Python
  3. Skip Rows to Fix the ParserError: Error tokenizing data. C error
  4. Use the Correct Separator to Fix the ParserError: Error tokenizing data. C error
  5. Use dropna() to Fix the ParserError: Error tokenizing data. C error
  6. Use the fillna() Function to Fill Up the NaN Values

Error Tokenizing Data C Error in Python

When working with data for any purpose, it is important to clean it first: fill the null values and remove invalid entries so that they do not affect the results and the program runs smoothly.

The ParserError: Error tokenizing data. C error can be caused by malformed data in the file, such as mixed data, rows with differing numbers of columns, or several data files stored as a single file.

You can also encounter this error if you read a file with read_csv but specify a separator or line terminator that does not match the file.

What Is the ParserError: Error tokenizing data. C error in Python

As discussed, the ParserError: Error tokenizing data. C error occurs when pandas parses CSV data and encounters a row that it cannot split into the expected number of fields, for example a row with extra delimiters.

Let’s say we have the following data in the data.csv file and try to read it with pandas, although the last row contains an error.

Name,Roll,Course,Marks,CGPA
Ali,1,SE,87,3
John,2,CS,78,
Maria,3,DS,13,,

Code example:

import pandas as pd
pd.read_csv('data.csv')

Output:

ParserError: Error tokenizing data. C error: Expected 5 fields in line 4, saw 6

As you can see, the above code has thrown a ParserError: Error tokenizing data. C error while reading data from the data.csv file, which says that the parser was expecting 5 fields in line 4 but got 6 instead.

The error itself is self-explanatory; it indicates the exact point of the error and shows the reason for the error, too, so we can fix it.
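Because the message names the line, it can be used to pull out the offending row. The sketch below parses the message text with a regular expression; this relies on the wording shown above, which is not a documented interface and may differ between pandas versions. The data.csv contents are kept in a string here so the example is self-contained.

```python
import io
import re

import pandas as pd

# The data.csv contents from above, kept in a string for this sketch.
raw = (
    "Name,Roll,Course,Marks,CGPA\n"
    "Ali,1,SE,87,3\n"
    "John,2,CS,78,\n"
    "Maria,3,DS,13,,\n"
)

pattern = re.compile(r"Expected (\d+) fields in line (\d+), saw (\d+)")
expected = line_no = saw = offending = None

try:
    pd.read_csv(io.StringIO(raw))
except pd.errors.ParserError as exc:
    match = pattern.search(str(exc))
    if match:
        expected, line_no, saw = (int(group) for group in match.groups())
        # Line numbers in the message are 1-based.
        offending = raw.splitlines()[line_no - 1]
        print(f"line {line_no}: expected {expected} fields, saw {saw}: {offending!r}")
```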

How to Fix the ParserError: Error tokenizing data. C error in Python

So far, we have understood the ParserError: Error tokenizing data. C error in Python; now let’s see how we can fix it.

It is always recommended to clean the data before analyzing it, because dirty data may affect the results or cause your program to fail.

Data cleansing helps in removing invalid data inputs, null values, and invalid entries; basically, it is a pre-processing stage of the data analysis.

In Python, we have different functions and parameters that help clean the data and avoid errors.

Skip Rows to Fix the ParserError: Error tokenizing data. C error

Skipping the row that causes the error is one of the most common techniques; as you can see from the data above, the last line is the one causing the error.

With the argument on_bad_lines='skip' (available in pandas 1.3 and later), read_csv ignores the malformed row and stores the remaining rows in the data frame df.

import pandas as pd
df = pd.read_csv('data.csv', on_bad_lines='skip')
df

Output:

   Name  Roll Course  Marks  CGPA
0   Ali     1     SE     87   3.0
1  John     2     CS     78   NaN

The above code skips every line that causes an error and reads the others; as you can see in the output, the last line was skipped because it caused the error.

But we now have NaN values that need to be handled; otherwise, they may affect the results of our statistical analysis.

Use the Correct Separator to Fix the ParserError: Error tokenizing data. C error

Using the wrong separator can also cause the ParserError, so it is important to use the separator that matches the data you provide.

Sometimes CSV-style data is separated by tabs or spaces instead of commas, so it is important to specify that separator in your program too.

import pandas as pd
pd.read_csv('data.csv', sep=',', on_bad_lines='skip', lineterminator='\n')

Output:

   Name  Roll Course  Marks CGPA\r
0   Ali     1     SE     87    3\r
1  John     2     CS     78     \r

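The separator is a comma, which is why we pass sep=','; lineterminator='\n' tells pandas that each line ends with \n. Note the trailing \r in the output: it indicates that the file actually uses Windows line endings (\r\n), so forcing lineterminator='\n' leaves a stray \r in the last column. In that case, it is usually better to omit lineterminator and let pandas detect the line endings.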
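The effect of the line terminator on the last column can be checked on a small sample. The following sketch uses made-up two-column data with \r\n line endings and compares a forced lineterminator='\n' with the default behaviour.

```python
import io

import pandas as pd

# Made-up data with Windows (\r\n) line endings.
raw = "Name,Roll\r\nAli,1\r\nJohn,2\r\n"

# Forcing "\n" as the terminator keeps "\r" as part of the last field.
forced = pd.read_csv(io.StringIO(raw), lineterminator="\n")
print(list(forced.columns))

# Leaving lineterminator unset lets the parser handle "\r\n" itself.
default = pd.read_csv(io.StringIO(raw))
print(list(default.columns))
```

With the forced terminator the last column name and its values carry a trailing \r, which also prevents the column from being parsed as numbers.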

Use dropna() to Fix the ParserError: Error tokenizing data. C error

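The dropna function drops all rows that contain any null or NaN values. It does not fix the ParserError itself; it cleans up the NaN values that remain after the malformed rows have been skipped.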

import pandas as pd
df = pd.read_csv('data.csv', on_bad_lines='skip')
print("      **** Before dropna ****")
print(df)

print("\n      **** After dropna ****")
print(df.dropna())

Output:

      **** Before dropna ****
   Name  Roll Course  Marks  CGPA
0   Ali     1     SE     87   3.0
1  John     2     CS     78   NaN

      **** After dropna ****
  Name  Roll Course  Marks  CGPA
0  Ali     1     SE     87   3.0

We have only two rows: the first has all of its attributes, but the second contains a NaN value, so the dropna() function has dropped the row with the NaN value and displayed just a single row.

Use the fillna() Function to Fill Up the NaN Values

When you get NaN values in your data, you can use the fillna() function to replace them with a value of your choice, such as 0.

Code Example:

import pandas as pd

print("      **** Before fillna ****")
df = pd.read_csv('data.csv', on_bad_lines='skip')
print(df,"\n\n")

print("      **** After fillna ****")
print(df.fillna(0))  # use 0 in place of NaN

Output:

      **** Before fillna ****
   Name  Roll Course  Marks  CGPA
0   Ali     1     SE     87   3.0
1  John     2     CS     78   NaN


      **** After fillna ****
   Name  Roll Course  Marks  CGPA
0   Ali     1     SE     87   3.0
1  John     2     CS     78   0.0

The fillna() call has replaced the NaN with 0, so the column no longer contains missing values. Keep in mind that a filled-in 0 still takes part in statistics such as the mean.
