Checking a CSV file for errors

Check whether a CSV file is valid. The tool also hints at the location of each error to make debugging easier.

CSV Validator & Linter

What is CSV?

A comma-separated values (CSV) file is a delimited text file that uses a comma or another delimiter to separate values.

This tool supports custom delimiters.
It also reports details about any errors it finds, which helps with debugging (linting).

Sample of valid CSV text:

cat, dog, horse
house, car, cycle
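
A minimal sketch of this kind of check, using Python's standard `csv` module. It accepts a custom delimiter and reports the line number of every row whose field count differs from the first row. The function name `find_ragged_rows` is made up for this illustration, and nothing here reflects how the Toolkit Bay tool itself is implemented.

```python
import csv
import io


def find_ragged_rows(text, delimiter=","):
    """Return (line_number, field_count) for rows that differ from the first row."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    expected = None
    problems = []
    for row in reader:
        if expected is None:
            # The first row defines the expected number of fields.
            expected = len(row)
        elif len(row) != expected:
            # reader.line_num is the physical line on which this row ended.
            problems.append((reader.line_num, len(row)))
    return problems


sample = "cat, dog, horse\nhouse, car, cycle\n"
broken = "cat; dog; horse\nhouse; car\n"
print(find_ragged_rows(sample))
print(find_ragged_rows(broken, delimiter=";"))
```

The line number is what makes the result useful for debugging: the caller can jump straight to the offending row.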

About

Toolkit Bay or TKB is an online tools website providing free and easy to use tools to increase productivity.

If you have any inquiries or suggestions or issues, you can contact us on:

contact@toolkitbay.com

Data & Privacy

We respect your data. Uploaded files, data, and input are deleted automatically, and processed data is deleted in less than a day.

Copyright © 2021 Toolkit Bay. All Rights Reserved

I wrote an open source Python tool, cutplace, to simplify validation of such files. It is available from http://pypi.python.org/pypi/cutplace/.

The basic idea is that you describe the data format in a structured interface specification using OpenOffice.org, Excel or plain CSV. This takes a few minutes, and the result is legible enough to serve as documentation too. We use it to validate files with about 200,000 rows on a daily basis.

You can validate a CSV file using the command line:

cutplace specification.csv data.csv

In case invalid data rows are found, the exit code is 1. If you need more control, you can write a little Python script that imports the cutplace module and adds a listener for validation events.
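
As a rough sketch of how the exit code could be used from a script, the wrapper below runs a validation command and treats any non-zero exit code as a failure. It relies only on the command-line form shown above; the names `run_validator` and `main` are hypothetical, and it does not use the cutplace Python module or its listener interface.

```python
import subprocess
import sys


def run_validator(command):
    """Run a validation command and return True when it exits with code 0."""
    completed = subprocess.run(command, capture_output=True, text=True)
    if completed.returncode != 0:
        # According to the text above, cutplace exits with 1 on invalid rows.
        print(completed.stderr or completed.stdout, file=sys.stderr)
        return False
    return True


def main():
    # File names are the ones from the command line shown above.
    ok = run_validator(["cutplace", "specification.csv", "data.csv"])
    return 0 if ok else 1
```

Calling `sys.exit(main())` from a script passes the result on to a shell or a scheduled job.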

As an example, here’s a specification that would validate the sample data you provided, filling the gaps of your short description by making a few assumptions. (I’m writing the specification in CSV to inline it in this post. In practice I prefer OpenOffice.org’s Calc and ODS because I can use more formatting, which makes it easier to read and maintain.)

,"Interface: Show statistics"
,
,"Data format"
"D","Format","CSV"
"D","Item delimiter",";"
"D","Header","1"
"D","Encoding","ASCII"
,
,"Fields"
,"Name","Example","Empty","Length","Type","Rule"
"F","date","15-Mar-10",,,"RegEx","\d\d-[A-Z][a-z][a-z]-\d\d"
"F","id","231",,,"Integer","0:"
"F","shown","345",,,"Integer","0:"
,
,"Checks"
,"Description","Type","Rule"
"C","id per date must be unique","IsUnique","date, id"

Lines starting with «D» describe the basic data format. In this case it is a CSV file using «;» as delimiter with 1 header line in ASCII encoding.

Lines starting with «F» describe the various fields. For example,

,"Name","Example","Empty","Length","Type","Rule"
"F","id","231",,,"Integer","0:"

defines a mandatory field «id» of type Integer with a value of 0 or greater. To allow the field to be empty, specify an «X» in the «Empty» column:

,"Name","Example","Empty","Length","Type","Rule"
"F","id","231","X",,"Integer","0:"

Finally, there is an optional section for more advanced checks that span the whole file, not only single rows. For example, if each date in your file must provide data for an id only once, you can state this using:

,"Description","Type","Rule"
"C","id per date must be unique","IsUnique","date, id"

Any row that starts with an empty column can contain any text you like and will not be processed during validation. This is useful for headings, comments and so on.
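
To make the rules in the specification concrete, here is a plain Python sketch that applies the same checks by hand: «;» as delimiter, one header line, the date pattern, non-negative integers for «id» and «shown», and uniqueness of the (date, id) pair. It does not use cutplace; the function name `validate_rows` and the sample data are invented for illustration.

```python
import csv
import io
import re

# Same pattern as the "Rule" column of the date field.
DATE_RULE = re.compile(r"\d\d-[A-Z][a-z][a-z]-\d\d")


def validate_rows(text):
    """Return a list of (line_number, message) for rows that break the rules."""
    errors = []
    seen = set()
    reader = csv.reader(io.StringIO(text), delimiter=";")
    next(reader, None)  # skip the single header line
    for row in reader:
        line = reader.line_num
        if len(row) != 3:
            errors.append((line, "expected 3 fields"))
            continue
        date, id_, shown = row
        if not DATE_RULE.fullmatch(date):
            errors.append((line, "date does not match the rule"))
        for name, value in (("id", id_), ("shown", shown)):
            # "0:" means an integer of 0 or greater.
            if not (value.isascii() and value.isdigit()):
                errors.append((line, f"{name} must be an integer of 0 or greater"))
        # Whole-file check: each (date, id) pair may appear only once.
        if (date, id_) in seen:
            errors.append((line, "id per date must be unique"))
        seen.add((date, id_))
    return errors


data = "date;id;shown\n15-Mar-10;231;345\n15-Mar-10;231;12\n16-mar-10;-4;7\n"
for line, message in validate_rows(data):
    print(line, message)
```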

Are there any good sites or services for validating the consistency of a CSV file?

Something like the W3C validator, but for CSV?

Thanks!

asked Jul 18, 2011 at 20:27 by Scherbius.com

The Open Data Institute is developing a CSV validation service that will allow users to check the structure of their data as well as validate it against a simple schema.

The service is still very much in alpha but can be found here:

http://csvlint.io/

The code for the application and the underlying library are both open source:

https://github.com/theodi/csvlint

https://github.com/theodi/csvlint.rb

The README in the library provides a summary of the errors and warnings that can be generated. The following types of error can be reported:

  • :wrong_content_type — content type is not text/csv
  • :ragged_rows — row has a different number of columns (than the first row in the file)
  • :blank_rows — completely empty row, e.g. blank line or a line where all column values are empty
  • :invalid_encoding — encoding error when parsing row, e.g. because of invalid characters
  • :not_found — HTTP 404 error when retrieving the data
  • :quoting — problem with quoting, e.g. missing or stray quote, unclosed quoted field
  • :whitespace — a quoted column has leading or trailing whitespace
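
Two of these structural errors are easy to illustrate. The sketch below is a Python approximation, not the csvlint.rb implementation: it flags rows that are completely empty and rows whose column count differs from the first non-blank row. The function name `structural_errors` and the choice to take the expected width from the first non-blank row are assumptions.

```python
import csv
import io


def structural_errors(text):
    """Return (error_type, line_number) tuples for blank and ragged rows."""
    errors = []
    expected = None
    reader = csv.reader(io.StringIO(text))
    for row in reader:
        line = reader.line_num
        # An empty line parses as [], and ",," parses as three empty strings.
        if all(value == "" for value in row):
            errors.append(("blank_rows", line))
            continue
        if expected is None:
            expected = len(row)
        elif len(row) != expected:
            errors.append(("ragged_rows", line))
    return errors


text = "a,b,c\n1,2\n\n,,\n4,5,6\n"
print(structural_errors(text))
```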

The following types of warning can be reported:

  • :no_encoding — the Content-Type header returned in the HTTP request does not have a charset parameter
  • :encoding — the character set is not UTF-8
  • :no_content_type — file is being served without a Content-Type header
  • :excel — no Content-Type header and the file extension is .xls
  • :check_options — CSV file appears to contain only a single column
  • :inconsistent_values — inconsistent values in the same column. Reported if <90% of values seem to have same data type (either numeric or alphanumeric including punctuation)
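
The 90% rule behind :inconsistent_values can be sketched as follows. This is an illustrative Python version, not the library's code: a value counts as numeric if `float()` accepts it (which also accepts strings such as "nan"), empty values are skipped, and the names `is_numeric` and `inconsistent_columns` are hypothetical.

```python
def is_numeric(value):
    """Treat a value as numeric when float() can parse it."""
    try:
        float(value)
    except ValueError:
        return False
    return True


def inconsistent_columns(rows, threshold=0.9):
    """Return indexes of columns whose dominant type covers < threshold of values."""
    flagged = []
    width = max((len(row) for row in rows), default=0)
    for index in range(width):
        # Skipping empty values is an assumption made for this sketch.
        values = [row[index] for row in rows if index < len(row) and row[index] != ""]
        if not values:
            continue
        numeric = sum(1 for value in values if is_numeric(value))
        dominant = max(numeric, len(values) - numeric)
        if dominant / len(values) < threshold:
            flagged.append(index)
    return flagged


rows = [["1", "a"], ["2", "b"], ["x", "c"], ["4", "d"]]
print(inconsistent_columns(rows))
```

In the sample rows, the first column is only 75% numeric and is flagged, while the second column is entirely alphanumeric and passes.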

answered Feb 11, 2014 at 17:55 by ldodds

The National Archives developed a CSV Schema Language and CSV Validator, software written in Java. It’s open source.

answered Aug 7, 2016 at 12:05 by Milos

To validate a CSV file, I use the Rainbow CSV extension in Visual Studio Code, and I also open the CSV file in Excel.

answered Feb 15, 2018 at 16:18 by mruanova

There is a great way to validate your CSV file. I am referring to this article, where the whole process is explained in detail.

The validation process has two steps. The first is to post the file to the API. Once your file is accepted, the API returns a polling endpoint that contains the results of the validation process. There is a 10 MB limit per file.
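
The post-then-poll flow can be sketched generically. The answer does not name the API, so the example below takes the two network calls as parameters: `submit` and `fetch_result` are hypothetical callables supplied by the caller, and reading the 10 MB limit as 10 * 1024 * 1024 bytes is also an assumption.

```python
import time

# Assumed interpretation of the 10 MB per-file limit mentioned above.
MAX_BYTES = 10 * 1024 * 1024


def validate_via_api(data, submit, fetch_result, interval=1.0, attempts=30):
    """Post the file, then poll the returned endpoint until a result is ready.

    submit(data) must return the polling endpoint; fetch_result(endpoint)
    must return None while validation is still running.
    """
    if len(data) > MAX_BYTES:
        raise ValueError("file exceeds the 10 MB limit")
    endpoint = submit(data)  # step 1: post the file
    for _ in range(attempts):
        result = fetch_result(endpoint)  # step 2: poll for the results
        if result is not None:
            return result
        time.sleep(interval)
    raise TimeoutError("validation did not finish in time")
```

Passing the network calls in as functions keeps the sketch independent of any particular HTTP client or service.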

answered Feb 5, 2020 at 23:45 by monkrus

CSV Lint at csvlint.com (not .io :) is a service we’re building to solve this problem. It checks CSV files against user-defined validation rules / schemas cell by cell.

We spent a lot of time tweaking the UI to allow users to create complex validation rules / schemas easily that meet their business needs without a single line of code.

Our offline validation feature lets users see results in real time, even when validating multiple large files (millions of rows or more), and, most importantly, it keeps user data 100% private.

answered Jun 17, 2018 at 6:57 by Joe

CSV File Validator

Validates a CSV file against a user-defined schema and returns an object containing the parsed data and any validation messages.

Getting csv-file-validator

npm

npm install --save csv-file-validator

yarn

yarn add csv-file-validator --save

Example

import CSVFileValidator from 'csv-file-validator'

CSVFileValidator(file, config)
    .then(csvData => {
        csvData.data // Array of objects from file
        csvData.inValidData // Array of error messages
    })
    .catch(err => {})

Please see Demo for more details /demo/index.html

API

CSVFileValidator(file, config)

Returns a Promise.

file

Type: File

.csv file

config

Type: Object

The config object should contain:
headers (Type: Array): row header (title) objects
isHeaderNameOptional (Type: Boolean): skip a header name if it is empty
isColumnIndexAlphabetic (Type: Boolean): convert the numeric column index to an alphabetic letter
parserConfig (Type: Object, optional): papaparse options.
Default options that can’t be overridden: skipEmptyLines, complete and error

const config = {
    headers: [], // required
    isHeaderNameOptional: false, // default (optional)
    isColumnIndexAlphabetic: false // default (optional)
}
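
To illustrate what converting a numeric column index to an alphabetic letter can look like, here is a small Python sketch using spreadsheet-style lettering. Whether the library counts columns from 1 and how it handles indexes beyond 26 is not stated here, so both are assumptions, and `column_letter` is a hypothetical name.

```python
def column_letter(index):
    """Convert a 1-based column index to letters: 1 -> A, 26 -> Z, 27 -> AA."""
    if index < 1:
        raise ValueError("column index must be 1 or greater")
    letters = ""
    while index > 0:
        # Shift to 0-based before dividing so that 26 maps to Z, not to A0.
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


print([column_letter(n) for n in (1, 2, 26, 27, 52)])
```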

name

Type: String
name of the row header (title)

inputName

Type: String
Key name under which the column value is returned

optional

Type: Boolean

Makes the column optional. If true, the column value will be returned

headerError

Type: Function

If a header name is omitted or differs from the name in the config, the headerError function is called with the arguments
headerValue, headerName, rowNumber, columnNumber

required

Type: Boolean

If required is true, the column value is checked to make sure it is not empty

requiredError

Type: Function

If the value is empty, the requiredError function is called with the arguments
headerName, rowNumber, columnNumber

unique

Type: Boolean

If true, all values in this header (title) column are checked for uniqueness

uniqueError

Type: Function

If one of the column values is not unique, the uniqueError function is called with the arguments headerName, rowNumber

validate

Type: Function

Validates the column value. The column value is passed as the argument.
For example:

/**
 * @param {String} email
 * @return {Boolean}
 */
function(email) {
    return isEmailValid(email);
}

validateError

Type: Function

If validate returns false, the validateError function is called with the arguments headerName, rowNumber, columnNumber

dependentValidate

Type: Function

Validates a column value that depends on values in other columns.
The column value and the row are passed as arguments.
For example:

/**
 * @param {String} email
 * @param {Array<string>} row
 * @return {Boolean}
 */
function(email, row) {
    return isEmailDependsOnSomeDataInRow(email, row);
}

isArray

Type: Boolean

If the column contains a comma-separated list of values, it is returned as an array in the result object

Config example

const config = {
    headers: [
        {
            name: 'First Name',
            inputName: 'firstName',
            required: true,
            requiredError: function (headerName, rowNumber, columnNumber) {
                return `${headerName} is required in the ${rowNumber} row / ${columnNumber} column`
            }
        },
        {
            name: 'Last Name',
            inputName: 'lastName',
            required: false
        },
        {
            name: 'Email',
            inputName: 'email',
            unique: true,
            uniqueError: function (headerName) {
                return `${headerName} is not unique`
            },
            validate: function(email) {
                return isEmailValid(email)
            },
            validateError: function (headerName, rowNumber, columnNumber) {
                return `${headerName} is not valid in the ${rowNumber} row / ${columnNumber} column`
            }
        },
        {
            name: 'Roles',
            inputName: 'roles',
            isArray: true
        },
        {
            name: 'Country',
            inputName: 'country',
            optional: true,
            dependentValidate: function(email, row) {
                return isEmailDependsOnSomeDataInRow(email, row);
            }
        }
    ]
}
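
For readers who want to see the idea behind such a config without the library, the following Python sketch applies a similar set of per-column rules (required, unique, validate, array splitting) with the standard `csv` module. It is an analogue, not a port of csv-file-validator: the dictionary keys, the `EMAIL_RULE` pattern, the function name `validate_csv`, and counting the header as row 1 are all assumptions, and header-name checks are left out.

```python
import csv
import io
import re

# Deliberately loose pattern, used only for this illustration.
EMAIL_RULE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

HEADERS = [
    {"name": "First Name", "key": "firstName", "required": True},
    {"name": "Last Name", "key": "lastName"},
    {"name": "Email", "key": "email", "unique": True,
     "validate": lambda value: bool(EMAIL_RULE.fullmatch(value))},
    {"name": "Roles", "key": "roles", "is_array": True},
]


def validate_csv(text, headers=HEADERS):
    """Return (data, messages): parsed rows plus a list of error messages."""
    data, messages = [], []
    seen = {h["name"]: set() for h in headers if h.get("unique")}
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # header row; its names are not checked in this sketch
    for row_number, row in enumerate(reader, start=2):
        record = {}
        for column_number, header in enumerate(headers, start=1):
            name = header["name"]
            value = row[column_number - 1].strip() if column_number <= len(row) else ""
            if header.get("required") and not value:
                messages.append(
                    f"{name} is required in the {row_number} row / {column_number} column")
            elif value and "validate" in header and not header["validate"](value):
                messages.append(
                    f"{name} is not valid in the {row_number} row / {column_number} column")
            if header.get("unique") and value:
                if value in seen[name]:
                    messages.append(f"{name} is not unique")
                seen[name].add(value)
            if header.get("is_array"):
                # Split a comma-separated cell into a list of trimmed items.
                record[header["key"]] = [v.strip() for v in value.split(",")] if value else []
            else:
                record[header["key"]] = value
        data.append(record)
    return data, messages


text = (
    "First Name,Last Name,Email,Roles\n"
    'Ada,Lovelace,ada@example.com,"admin,dev"\n'
    ",Smith,not-an-email,\n"
    "Bob,Jones,ada@example.com,user\n"
)
rows, problems = validate_csv(text)
for problem in problems:
    print(problem)
```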

Contributing

Any contributions you make are greatly appreciated.

Please read the Contributions Guidelines before submitting a PR.

License

MIT © Vasyl Stokolosa

Number of input fields different from number of schema fields

Sequence of input fields different from sequence of schema fields

Input fields that are not defined in the schema

Schema fields for which no input fields are defined

Names of input fields with different casing compared to schema fields

Multiple input fields defined with the same name

Multiple input fields mapped to the same schema field
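
Most of the mismatches listed above can be detected by comparing two lists of field names. The Python sketch below is one possible way to do it; the function name `compare_fields` and the message wording are invented for illustration, and the last case (several input fields mapped to one schema field) is not covered because it needs an explicit mapping that this sketch does not model.

```python
from collections import Counter


def compare_fields(input_fields, schema_fields):
    """Return a list of human-readable mismatches between input and schema."""
    issues = []
    if len(input_fields) != len(schema_fields):
        issues.append("field count differs")
    for name, count in Counter(input_fields).items():
        if count > 1:
            issues.append(f"duplicate input field: {name}")
    schema_lower = {name.lower(): name for name in schema_fields}
    for name in dict.fromkeys(input_fields):  # unique names, original order
        if name in schema_fields:
            continue
        match = schema_lower.get(name.lower())
        if match is not None:
            issues.append(f"casing differs: {name} vs {match}")
        else:
            issues.append(f"not in schema: {name}")
    input_lower = {name.lower() for name in input_fields}
    for name in schema_fields:
        if name.lower() not in input_lower:
            issues.append(f"missing input field: {name}")
    # Same names, different sequence.
    same_names = sorted(input_fields) == sorted(schema_fields)
    if same_names and list(input_fields) != list(schema_fields):
        issues.append("field order differs")
    return issues


print(compare_fields(["name", "id"], ["id", "name"]))
print(compare_fields(["ID", "name", "extra", "name"], ["id", "name", "email"]))
```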
