I have created a list containing the paths of three folders, each of which holds many .txt files.
I am trying to load each file in those folders into a pandas DataFrame, but I get the error shown below.
CODE-
import os
import pandas as pd

for l in list:  # list holds the three folder paths
    for root, dirs, files in os.walk(l, topdown=False):
        for name in files:
            #print(os.path.join(root, name))
            df = pd.read_csv(os.path.join(root, name))
ERROR-
Traceback (most recent call last):
File "feature_drebin.py", line 18, in <module>
df = pd.read_csv(os.path.join(root, name))
File "E:\anaconda\lib\site-packages\pandas\io\parsers.py", line 709, in parser_f
return _read(filepath_or_buffer, kwds)
File "E:\anaconda\lib\site-packages\pandas\io\parsers.py", line 449, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "E:\anaconda\lib\site-packages\pandas\io\parsers.py", line 818, in __init__
self._make_engine(self.engine)
File "E:\anaconda\lib\site-packages\pandas\io\parsers.py", line 1049, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "E:\anaconda\lib\site-packages\pandas\io\parsers.py", line 1695, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 565, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
Handling data and working with file formats like CSV, JSON, and Excel is a common task for developers. When reading delimited text files such as CSV with pandas, however, you may encounter an EmptyDataError. This error occurs when there are no columns to parse from the file, which usually means the file is empty or contains only whitespace. In this guide, we'll explore the reasons behind this error and walk through a step-by-step approach to resolving the 'No columns to parse from file' issue.
Table of Contents
- Understanding the EmptyDataError
- How to Resolve the EmptyDataError
- Step 1: Check the File Path
- Step 2: Inspect the File Content
- Step 3: Clean the File Content
- Step 4: Use the skip_blank_lines Parameter
- FAQ
- Related Resources
Understanding the EmptyDataError
The EmptyDataError is often encountered while using the pandas library in Python. Pandas is an open-source data analysis and data manipulation library that provides the data structures and functions needed to work with structured data. The error usually occurs when you try to read an empty or whitespace-only file with pandas' text parsers, such as pd.read_csv() or pd.read_table(). Other readers, such as pd.read_json() and pd.read_excel(), typically report empty input with different exceptions.
Here’s an example of the error message you might see:
EmptyDataError: No columns to parse from file
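A minimal sketch that reproduces the error in memory, using io.StringIO as a stand-in for a file on disk, and catches pandas.errors.EmptyDataError. The helper name read_or_none is hypothetical; it simply returns None instead of raising.

```python
from io import StringIO

import pandas as pd
from pandas.errors import EmptyDataError


def read_or_none(text):
    """Parse CSV text; return None instead of raising on empty input."""
    try:
        return pd.read_csv(StringIO(text))
    except EmptyDataError as exc:
        print(f"EmptyDataError: {exc}")
        return None


# A zero-length input and a blank-lines-only input both trigger the error.
empty_result = read_or_none("")
blank_result = read_or_none("\n\n")

# A single header line is enough to avoid it (zero rows, two columns).
header_only = read_or_none("a,b\n")
print(header_only.shape)
```

The last case shows the distinction pandas makes: a file with column names but no rows yields an empty DataFrame, not an error.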
How to Resolve the EmptyDataError
To resolve the EmptyDataError, follow these steps:
Step 1: Check the File Path
Make sure you're using the correct file path when reading the file. A path that doesn't exist makes pandas raise FileNotFoundError, but a path that points at the wrong file (for example, an empty placeholder) can produce the EmptyDataError. You can use os.path to verify the file's existence.
import os

file_path = "path/to/your/file.csv"

if os.path.exists(file_path):
    print("File exists")
else:
    print("File not found")
Step 2: Inspect the File Content
Check the file contents to ensure it contains data. Open the file using a text editor or a spreadsheet application and inspect the content. If the file is empty or contains only whitespace, it will cause the EmptyDataError.
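When there are many files, inspecting them by hand is impractical. The sketch below does the same check programmatically with os.path.isfile(), os.path.getsize(), and a whitespace test; the function describe_file and its category labels are hypothetical. It reads each file fully, which is fine for small text files but could be replaced with a partial read for very large ones.

```python
import os
import tempfile


def describe_file(path):
    """Classify a path as 'missing', 'empty', 'whitespace-only', or 'has data'."""
    if not os.path.isfile(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        if not fh.read().strip():
            return "whitespace-only"
    return "has data"


# Demo on throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    samples = {"empty.txt": "", "blank.txt": "  \n\n", "data.txt": "a,b\n1,2\n"}
    for name, text in samples.items():
        with open(os.path.join(tmp, name), "w", encoding="utf-8") as fh:
            fh.write(text)
    results = {name: describe_file(os.path.join(tmp, name)) for name in samples}
    results["nope.txt"] = describe_file(os.path.join(tmp, "nope.txt"))
    print(results)
```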
Step 3: Clean the File Content
Before reading the file using pandas, ensure that the file contains valid data. Remove any unnecessary whitespace or empty rows and columns from the file. You can use a text editor or a spreadsheet application to clean the file manually. Alternatively, you can use Python’s built-in functions to remove whitespace and empty lines programmatically.
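with open("path/to/your/file.csv", "r") as file:
    lines = file.readlines()

# Keep only non-blank lines; strip() also removes each line's trailing newline
cleaned_lines = [line.strip() for line in lines if line.strip()]

with open("path/to/your/cleaned_file.csv", "w") as file:
    # Re-add a newline after each line, otherwise everything ends up on one line
    file.writelines(line + "\n" for line in cleaned_lines)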
Step 4: Use the skip_blank_lines Parameter
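When reading a CSV file with pandas, the skip_blank_lines parameter of pd.read_csv() controls whether empty lines are ignored. It already defaults to True, so passing it explicitly matters mainly when it was set to False elsewhere. Note that it cannot help when the file contains no data at all, because such a file still has no columns to parse.
import pandas as pd

# skip_blank_lines=True is the default; it is passed here only for clarity
data_frame = pd.read_csv("path/to/your/cleaned_file.csv", skip_blank_lines=True)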
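To see what the parameter actually changes, here is a small in-memory comparison, assuming a CSV with one blank line between data rows: with the default the blank line is dropped, and with skip_blank_lines=False it becomes a row of NaN values.

```python
from io import StringIO

import pandas as pd

csv_text = "a,b\n1,2\n\n3,4\n"

# Default (skip_blank_lines=True): the blank line is ignored, leaving 2 rows.
skipped = pd.read_csv(StringIO(csv_text))

# skip_blank_lines=False: the blank line is kept as a row of NaN, giving 3 rows.
kept = pd.read_csv(StringIO(csv_text), skip_blank_lines=False)

print(len(skipped), len(kept))
```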
By following these steps, you should be able to resolve the EmptyDataError issue.
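Applied to the folder-walking loop from the question at the top of this page, one hedged option is to catch the exception per file and record which files were skipped, instead of letting one empty file stop the whole loop. The function name load_text_files is hypothetical, and the sketch assumes comma-separated content; pass a different sep= if the .txt files use another delimiter.

```python
import os
import tempfile

import pandas as pd
from pandas.errors import EmptyDataError


def load_text_files(folders):
    """Walk each folder and read every file into a DataFrame.

    Returns (frames, skipped): frames maps path -> DataFrame, and skipped
    lists the paths pandas reported as having no columns to parse.
    """
    frames, skipped = {}, []
    for folder in folders:
        for root, _dirs, files in os.walk(folder):
            for name in files:
                path = os.path.join(root, name)
                try:
                    frames[path] = pd.read_csv(path)
                except EmptyDataError:
                    skipped.append(path)  # empty or blank-only file
    return frames, skipped


# Demo: one file with data and one empty file in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    for name, text in {"ok.txt": "x,y\n1,2\n", "empty.txt": ""}.items():
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write(text)
    frames, skipped = load_text_files([tmp])
    print(sorted(os.path.basename(p) for p in frames))
    print([os.path.basename(p) for p in skipped])
```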
FAQ
1. What is pandas in Python?
Pandas is an open-source data analysis and data manipulation library for Python. It provides data structures and functions needed to work with structured data seamlessly. Pandas is widely used for data cleaning, transformation, analysis, and visualization.
2. What causes the EmptyDataError in pandas?
The EmptyDataError occurs when there are no columns to parse from the file. This usually means that the file is empty or contains only whitespace.
3. How to check if a file exists in Python?
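You can use the os.path.exists() function to check whether a path exists in Python. Pass the path as an argument, and the function returns True if it exists and False otherwise. Because it also returns True for directories, os.path.isfile() is the stricter check when you need a regular file.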
4. How do I skip blank lines while reading a CSV file using pandas?
Use the skip_blank_lines parameter of pd.read_csv(). It is True by default, so blank lines are already skipped unless you pass skip_blank_lines=False.
5. Can I use pandas with other file formats like JSON and Excel?
Yes, pandas can be used to work with various file formats like CSV, JSON, Excel, and more. You can use functions like pd.read_json() and pd.read_excel() to read JSON and Excel files, respectively.
Related Resources
- Pandas Official Documentation
- Working with CSV Files in Python
- Python File Handling: Create, Open, Append, Read, Write
Problem Description:
I have a string object («textData») which contains CSV data.
I'm able to save it as CSV with:
with open(fileName, "w") as text_file:
    print(textData, file=text_file)
but I would like to work with the data in pandas before saving the CSV, so I'm trying to get the data into a pandas DataFrame.
import pandas as pd
from io import StringIO
df = pd.read_csv(StringIO(textData), sep=",")
I get this error: EmptyDataError: No columns to parse from file
This is the textData string:
R$M21,2021-01-26,1.3265,1.3265,1.3265,1.3265,0,0
R$M21,2021-01-27,1.3263,1.3263,1.3263,1.3263,0,0
R$M21,2021-01-28,1.3319,1.3319,1.3319,1.3319,0,0
R$M21,2021-01-29,1.3287,1.3287,1.3287,1.3287,0,0
R$M21,2021-02-01,1.3315,1.3315,1.3315,1.3315,0,0
R$M21,2021-02-02,1.3328,1.3328,1.3328,1.3328,0,0
R$M21,2021-02-03,1.3331,1.3331,1.3331,1.3331,0,0
R$M21,2021-02-04,1.3361,1.3361,1.3361,1.3361,0,0
R$M21,2021-02-05,1.3383,1.3383,1.3383,1.3383,0,0
R$M21,2021-02-08,1.3354,1.3354,1.3354,1.3354,0,0
R$M21,2021-02-09,1.3279,1.3279,1.3279,1.3279,0,0
R$M21,2021-02-10,1.3259,1.3259,1.3259,1.3259,0,0
R$M21,2021-02-11,1.3253,1.3253,1.3253,1.3253,0,0
R$M21,2021-02-12,1.3272,1.3272,1.3272,1.3272,0,0
R$M21,2021-02-15,1.3224,1.3224,1.3224,1.3224,0,0
R$M21,2021-02-16,1.3232,1.3232,1.3232,1.3232,0,0
R$M21,2021-02-17,1.329,1.329,1.329,1.329,0,0
R$M21,2021-02-18,1.3275,1.3275,1.3275,1.3275,0,0
R$M21,2021-02-19,1.3246,1.3246,1.3246,1.3246,0,0
R$M21,2021-02-22,1.3235,1.3235,1.3235,1.3235,0,0
R$M21,2021-02-23,1.3216,1.3216,1.3216,1.3216,0,0
R$M21,2021-02-24,1.321,1.321,1.321,1.321,0,0
R$M21,2021-02-25,1.3181,1.3181,1.3181,1.3181,0,0
R$M21,2021-02-26,1.3313,1.3313,1.3313,1.3313,0,0
R$M21,2021-03-01,1.3323,1.3323,1.3323,1.3323,0,0
R$M21,2021-03-02,1.3315,1.3315,1.3315,1.3315,0,0
R$M21,2021-03-03,1.3309,1.3309,1.3309,1.3309,0,0
R$M21,2021-03-04,1.3328,1.3328,1.3328,1.3328,0,0
R$M21,2021-03-05,1.3417,1.3417,1.3417,1.3417,0,0
R$M21,2021-03-08,1.3479,1.3479,1.3479,1.3479,0,0
R$M21,2021-03-09,1.345,1.345,1.345,1.345,0,0
R$M21,2021-03-10,1.3476,1.3476,1.3476,1.3476,0,0
R$M21,2021-03-11,1.3403,1.3403,1.3403,1.3403,0,0
R$M21,2021-03-12,1.3463,1.3463,1.3463,1.3463,0,0
R$M21,2021-03-15,1.3456,1.3456,1.3456,1.3456,35,35
R$M21,2021-03-16,1.3455,1.3456,1.3452,1.3454,85,20
R$M21,2021-03-17,1.3457,1.3479,1.3451,1.3479,0,20
R$M21,2021-03-18,1.3432,1.3432,1.3432,1.3432,0,20
R$M21,2021-03-19,1.3425,1.3425,1.3425,1.3425,20,0
R$M21,2021-03-22,1.3434,1.3434,1.3405,1.3405,20,0
R$M21,2021-03-23,1.3433,1.3433,1.3433,1.3433,0,0
R$M21,2021-03-24,1.3461,1.3461,1.3461,1.3461,6,6
R$M21,2021-03-25,1.3476,1.3476,1.3472,1.3472,0,6
R$M21,2021-03-26,1.3477,1.3477,1.3477,1.3477,0,6
R$M21,2021-03-29,1.3467,1.3467,1.3467,1.3467,0,6
R$M21,2021-03-30,1.3483,1.3483,1.3483,1.3483,0,6
R$M21,2021-03-31,1.3448,1.3448,1.3448,1.3448,0,6
R$M21,2021-04-01,1.3461,1.3461,1.3461,1.3461,0,6
R$M21,2021-04-02,1.3442,1.3442,1.3442,1.3442,0,6
R$M21,2021-04-05,1.3446,1.3446,1.3446,1.3446,0,6
R$M21,2021-04-06,1.3418,1.3418,1.3418,1.3418,10,11
R$M21,2021-04-07,1.339,1.3398,1.3389,1.3389,0,11
R$M21,2021-04-08,1.3406,1.3406,1.3406,1.3406,0,11
R$M21,2021-04-09,1.3411,1.3411,1.3411,1.3411,23,28
R$M21,2021-04-12,1.3427,1.3427,1.3406,1.3406,3,31
R$M21,2021-04-13,1.3425,1.3431,1.3425,1.3431,20,51
R$M21,2021-04-14,1.3374,1.3378,1.3374,1.3375,0,51
R$M21,2021-04-15,1.335,1.335,1.335,1.335,217,222
R$M21,2021-04-16,1.3358,1.3358,1.3337,1.3337,416,407
R$M21,2021-04-19,1.3344,1.3346,1.331,1.331,370,428
R$M21,2021-04-20,1.3305,1.3316,1.3265,1.3283,5,431
R$M21,2021-04-21,1.3291,1.3302,1.3291,1.3302,100,422
R$M21,2021-04-22,1.3304,1.3304,1.3279,1.3279,10,427
R$M21,2021-04-23,1.3277,1.3277,1.3274,1.3274,16,437
R$M21,2021-04-26,1.3273,1.3273,1.3256,1.326,204,438
R$M21,2021-04-27,1.3259,1.3267,1.3255,1.3257,79,429
R$M21,2021-04-28,1.3274,1.3278,1.3262,1.3262,22,441
R$M21,2021-04-29,1.326,1.3265,1.3245,1.3255,16,457
R$M21,2021-04-30,1.3266,1.3277,1.3266,1.3277,60,457
R$M21,2021-05-03,1.328,1.3341,1.328,1.3318,8,458
R$M21,2021-05-04,1.3298,1.3366,1.3298,1.3366,110,466
R$M21,2021-05-05,1.3376,1.3387,1.3351,1.3358,0,466
R$M21,2021-05-06,1.3349,1.3349,1.3349,1.3349,1,467
R$M21,2021-05-07,1.332,1.332,1.3316,1.3316,25,466
R$M21,2021-05-10,1.3263,1.3263,1.3247,1.3247,187,480
R$M21,2021-05-11,1.3244,1.3276,1.3244,1.3251,6,486
R$M21,2021-05-12,1.329,1.329,1.3287,1.3287,119,586
R$M21,2021-05-13,1.3312,1.3366,1.3294,1.3343,270,738
R$M21,2021-05-14,1.3346,1.3371,1.3338,1.3338,392,841
R$M21,2021-05-17,1.3332,1.3361,1.3319,1.3356,99,835
R$M21,2021-05-18,1.3358,1.3358,1.3295,1.33,93,785
R$M21,2021-05-19,1.3295,1.333,1.3287,1.3328,25,784
R$M21,2021-05-20,1.335,1.3354,1.3326,1.3329,26,773
R$M21,2021-05-21,1.3309,1.3309,1.3301,1.3301,25,777
R$M21,2021-05-24,1.3298,1.3318,1.3298,1.3301,39,767
R$M21,2021-05-25,1.3293,1.3293,1.3253,1.3254,28,782
R$M21,2021-05-26,1.3249,1.3249,1.323,1.3235,48,770
R$M21,2021-05-27,1.3245,1.3247,1.3229,1.3229,51,805
R$M21,2021-05-28,1.3238,1.3247,1.323,1.3244,76,826
R$M21,2021-05-31,1.3237,1.3237,1.3223,1.3226,16,826
R$M21,2021-06-01,1.3194,1.3227,1.3194,1.3227,34,808
R$M21,2021-06-02,1.323,1.3248,1.322,1.3248,50,785
R$M21,2021-06-03,1.3235,1.3245,1.3228,1.3244,137,720
R$M21,2021-06-04,1.3276,1.3285,1.3274,1.3285,219,564
R$M21,2021-06-07,1.3251,1.3252,1.3232,1.3232,42,544
R$M21,2021-06-08,1.3236,1.3238,1.3226,1.3237,290,343
R$M21,2021-06-09,1.3232,1.3243,1.3231,1.3233,48,343
R$M21,2021-06-10,1.3239,1.3253,1.3238,1.3244,406,292
R$M21,2021-06-11,1.3249,1.3261,1.3217,1.324,107,0
R$M21,2021-06-14,1.3252,1.3271,1.3252,1.3261,107,0
What am I doing wrong?
Thanks
Solution – 1
The error is in the parts you aren’t showing us, because your code works fine. I’m guessing you don’t have newlines separating the lines.
C:\tmp>type x.py
textData="""
R$M21,2021-06-08,1.3236,1.3238,1.3226,1.3237,290,343
R$M21,2021-06-09,1.3232,1.3243,1.3231,1.3233,48,343
R$M21,2021-06-10,1.3239,1.3253,1.3238,1.3244,406,292
R$M21,2021-06-11,1.3249,1.3261,1.3217,1.324,107,0
R$M21,2021-06-14,1.3252,1.3271,1.3252,1.3261,107,0"""
import pandas as pd
from io import StringIO
df = pd.read_csv(StringIO(textData), sep=",")
print(df)
C:\tmp>python x.py
R$M21 2021-06-08 1.3236 1.3238 1.3226 1.3237 290 343
0 R$M21 2021-06-09 1.3232 1.3243 1.3231 1.3233 48 343
1 R$M21 2021-06-10 1.3239 1.3253 1.3238 1.3244 406 292
2 R$M21 2021-06-11 1.3249 1.3261 1.3217 1.3240 107 0
3 R$M21 2021-06-14 1.3252 1.3271 1.3252 1.3261 107 0
C:\tmp>
Solution – 2
First, make sure to add a newline after each line, for example with os.linesep.
Then move the StringIO buffer's position back to the start (0) before passing it to pandas:
import os
import pandas as pd
from io import StringIO
buffer = StringIO()
buffer.write('hello,23,2022,bye' + os.linesep)
buffer.write('world,43,2025,then' + os.linesep)
buffer.seek(0)
df = pd.read_csv(buffer, sep=',', header=None)
print(df)
This will yield:
0 1 2 3
0 hello 23 2022 bye
1 world 43 2025 then
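To illustrate why the seek(0) step matters: after the writes, the buffer's position sits at the end, and pd.read_csv() reads from the current position, so it sees nothing and raises the same EmptyDataError as in the question. A minimal sketch, using "\n" instead of os.linesep for brevity:

```python
from io import StringIO

import pandas as pd
from pandas.errors import EmptyDataError

buffer = StringIO()
buffer.write("hello,23,2022,bye\n")
buffer.write("world,43,2025,then\n")

# The position is now at the end of the buffer, so this read finds no data.
try:
    pd.read_csv(buffer, header=None)
    error = None
except EmptyDataError as exc:
    error = str(exc)
print(error)

buffer.seek(0)  # rewind to the beginning, then read again
df = pd.read_csv(buffer, header=None)
print(df.shape)
```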
[Python-3.9]
I am trying to extract 4 tables from an input file. Here is an extract of the file with one of the four tables:
...
*** USER INFORMATION MESSAGE 7570 (GPWG1S)
RESULTS OF RIGID BODY CHECKS OF MATRIX KGG (G-SET) FOLLOW:
PRINT RESULTS IN ALL SIX DIRECTIONS AGAINST THE LIMIT OF 1.000000E-03
DIRECTION STRAIN ENERGY PASS/FAIL
--------- ------------- ---------
# Table I am trying to extract
1 2.783836E-05 PASS
2 1.069251E-04 PASS
3 1.004842E-04 PASS
4 1.589776E-04 PASS
5 1.644181E-06 PASS
6 2.628610E-05 PASS
# End of table
SOME POSSIBLE REASONS MAY LEAD TO THE FAILURE:
1. CELASI ELEMENTS CONNECTING TO ONLY ONE GRID POINT;
...
My code is:
import pandas as pd

tram_f06 = open('tram.txt', 'r')
g_set_word = 'KGG'
n_set_word = 'KNN'
f_set_word = 'KFF'
a_set_word = 'KAA'
index_tram = 0

with tram_f06 as in_file:
    ## First we get the lines of the tables we're interested in
    for line in in_file:
        if g_set_word in line:
            start_g = index_tram + 4  # The table starts 4 lines after the keyword appears
            stop_g = start_g + 6  # There are 6 lines, one for each degree of freedom
        elif n_set_word in line:
            start_n = index_tram + 4
            stop_n = start_n + 6
        elif f_set_word in line:
            start_f = index_tram + 4
            stop_f = start_f + 6
        elif a_set_word in line:
            start_a = index_tram + 4
            stop_a = start_a + 6
        index_tram = index_tram + 1

    in_file.seek(0)
    ## Then we extract those lines
    gset_df = pd.read_csv(in_file, header = None, delim_whitespace=True, skiprows = start_g, skipfooter = index_tram - stop_g, engine = 'python')
    nset_df = pd.read_csv(in_file, header = None, delim_whitespace=True, skiprows = start_n, skipfooter = index_tram - stop_n, engine = 'python')
    fset_df = pd.read_csv(in_file, header = None, delim_whitespace=True, skiprows = start_f, skipfooter = index_tram - stop_f, engine = 'python')
    aset_df = pd.read_csv(in_file, header = None, delim_whitespace=True, skiprows = start_a, skipfooter = index_tram - stop_a, engine = 'python')
I get the indices of the lines I want to extract (the start_X/stop_X variables), but I can't get the DataFrames.
With or without the in_file.seek(0) line, I get the error:
EmptyDataError: No columns to parse from file
Would anyone know how to solve this?
Thank you in advance!
Solution 1:
First, pass your filename to pd.read_csv() as a string, and make sure the file is either in the current working directory or reachable through the correct file path.
import pandas as pd
testdata = pd.read_csv("filename.csv", header=None, delim_whitespace=True)
If that does not work, post some information about the environment you are using.
Solution 2:
First, you probably don’t need header=None as you seem to have headers in the file.
Also try removing the blank line between the headers and the first line of data.
Check and double-check your file name.
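A likely cause that the answers above don't address: every pd.read_csv() call in the question consumes the same file handle. Without seek(0), the keyword-scanning loop has already left the handle at end-of-file; with seek(0), the first call (which must read every line to apply skipfooter) again leaves it at end-of-file, so the following calls find no columns to parse. One workaround is to read the lines once and parse each 6-line slice separately. The sketch below assumes the layout described in the question (table starts 4 lines after the keyword, 6 rows). SAMPLE stands in for tram.txt, and extract_tables and the column names are hypothetical. It uses sep=r"\s+", which pandas treats like delim_whitespace=True (that flag is deprecated in recent pandas), and it keeps the first occurrence of each keyword, whereas the original loop kept the last.

```python
from io import StringIO

import pandas as pd

# Stand-in for the contents of tram.txt, trimmed to the KGG table shown above.
SAMPLE = """\
*** USER INFORMATION MESSAGE 7570 (GPWG1S)
RESULTS OF RIGID BODY CHECKS OF MATRIX KGG (G-SET) FOLLOW:
PRINT RESULTS IN ALL SIX DIRECTIONS AGAINST THE LIMIT OF 1.000000E-03
DIRECTION STRAIN ENERGY PASS/FAIL
--------- ------------- ---------
1 2.783836E-05 PASS
2 1.069251E-04 PASS
3 1.004842E-04 PASS
4 1.589776E-04 PASS
5 1.644181E-06 PASS
6 2.628610E-05 PASS
SOME POSSIBLE REASONS MAY LEAD TO THE FAILURE:
"""


def extract_tables(lines, keywords, offset=4, n_rows=6):
    """Return {keyword: DataFrame} for the first occurrence of each keyword."""
    tables = {}
    for i, line in enumerate(lines):
        for key in keywords:
            if key in line and key not in tables:
                block = lines[i + offset:i + offset + n_rows]
                # Each slice gets its own in-memory buffer, so no shared handle.
                tables[key] = pd.read_csv(
                    StringIO("".join(block)),
                    sep=r"\s+",
                    header=None,
                    names=["direction", "strain_energy", "status"],
                )
    return tables


# With the real file: with open("tram.txt") as fh: lines = fh.readlines()
lines = StringIO(SAMPLE).readlines()
tables = extract_tables(lines, ["KGG", "KNN", "KFF", "KAA"])
print(tables["KGG"])
```

Keywords that never appear (KNN, KFF, and KAA in this trimmed sample) are simply absent from the result instead of causing an undefined-variable error.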