psql: ignore errors

I am using psql with a PostgreSQL database and the following copy command:

\COPY isa (np1, np2, sentence) FROM 'c:\Downloads\isa.txt' WITH DELIMITER '|'

I get:

ERROR:  extra data after last expected column

How can I skip the lines with errors?

asked Apr 14, 2016 at 21:30 by Superdooperhero

Up to and including Postgres 14, you cannot skip the errors without aborting the whole command. There is no more sophisticated error handling built in.

\copy is just a wrapper around SQL COPY that channels results through psql. The manual for COPY:

COPY stops operation at the first error. This should not lead to problems in the event of a COPY TO, but the target table will already have received earlier rows in a COPY FROM. These rows will not be visible or accessible, but they still occupy disk space. This might amount to a considerable amount of wasted disk space if the failure happened well into a large copy operation. You might wish to invoke VACUUM to recover the wasted space.

Bold emphasis mine. And:

COPY FROM will raise an error if any line of the input file contains
more or fewer columns than are expected.

COPY is an extremely fast way to import / export data. Sophisticated checks and error handling would slow it down.

There was an attempt to add error logging to COPY in Postgres 9.0 but it was never committed.

Solution

Fix your input file instead.

If you have one or more additional columns in your input file and the file is otherwise consistent, you might add dummy columns to your table isa and drop those afterwards. Or (cleaner with production tables) import to a temporary staging table and INSERT selected columns (or expressions) to your target table isa from there.
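The "fix your input file" advice can be automated for the question's case. Assuming the three-column, '|'-delimited isa.txt from the question (the clean.txt/bad.txt file names are illustrative), a small awk pre-filter separates well-formed lines from the rest before \copy runs:

```shell
# Sample stand-in for the question's isa.txt: three '|'-separated
# columns, with two malformed lines mixed in.
printf 'np1a|np2a|sent a\nonly|two\none|two|three|extra\nnp1b|np2b|sent b\n' > isa.txt

# Keep lines with exactly 3 fields in clean.txt; divert the rest
# to bad.txt for manual inspection.
awk -F'|' 'NF == 3 { print > "clean.txt"; next } { print > "bad.txt" }' isa.txt
```

\copy can then load clean.txt, while bad.txt keeps the rejected lines for repair.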

Related answers with detailed instructions:

  • How to update selected rows with values from a CSV file in Postgres?
  • COPY command: copy only specific columns from csv

answered Apr 14, 2016 at 22:46

Erwin Brandstetter

It is too bad that in 25 years Postgres hasn't gained an ignore-errors flag or option for the COPY command. In this era of big data you get a lot of dirty records, and it can be very costly for a project to fix every outlier.

I had to make a workaround this way:

  1. Copy the original table and call it dummy_original_table.
  2. On the dummy table, create a trigger that redirects each row into the original table, swallowing errors:
    CREATE OR REPLACE FUNCTION on_insert_in_original_table() RETURNS trigger AS $$
    DECLARE
        v_rec   RECORD;
    BEGIN
        -- we use the trigger to prevent 'duplicate index' errors by returning NULL on duplicates
        SELECT * FROM original_table WHERE primary_key = NEW.primary_key INTO v_rec;
        IF v_rec IS NOT NULL THEN
            RETURN NULL;
        END IF;
        BEGIN
            INSERT INTO original_table(datum, primary_key) VALUES (NEW.datum, NEW.primary_key)
                ON CONFLICT DO NOTHING;
        EXCEPTION
            WHEN OTHERS THEN
                NULL;
        END;
        RETURN NULL;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER redirect_insert
        BEFORE INSERT ON dummy_original_table
        FOR EACH ROW EXECUTE FUNCTION on_insert_in_original_table();
  3. Run a copy into the dummy table. No record will be inserted there, but all valid ones will be inserted into original_table:

psql dbname -c "\copy dummy_original_table(datum,primary_key) FROM '/home/user/data.csv' delimiter E'\t'"

answered Dec 30, 2020 at 22:59 by Nulik

Workaround: remove the reported errant line using sed and run \copy again

Later versions of Postgres (including Postgres 13) report the line number of the error. You can then remove that line with sed and run \copy again, e.g.:

#!/bin/bash
bad_line_number=5  # assuming line 5 is the bad line
sed "${bad_line_number}d" < input.csv > filtered.csv

[per the comment from @Botond_Balázs ]

answered Jan 12, 2021 at 0:24

Rob Bednark

Here's one solution: import the file one line at a time. The performance can be much slower, but it may be sufficient for your scenario:

#!/bin/bash

input_file=./my_input.csv
tmp_file=/tmp/one-line.csv
while IFS= read -r input_line; do
    echo "$input_line" > "$tmp_file"
    psql my_database \
     -c "\copy my_table FROM '$tmp_file' DELIMITER '|' CSV"
done < "$input_file"

Additionally, you could modify the script to capture psql's stdout/stderr and exit status; if the exit status is non-zero, echo $input_line and the captured output to stderr and/or append them to a file.
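That suggestion might look like the following sketch, reusing the hypothetical my_database/my_table names from the script above; rejected lines are collected in failed_lines.csv and psql's message is echoed to stderr:

```shell
#!/bin/bash
# Sketch: per-line load that records rejected lines instead of aborting.
# my_database, my_table, and the file names are illustrative.
printf 'a|b|c\nd|e|f\n' > my_input.csv     # tiny sample input for the demo
: > failed_lines.csv
while IFS= read -r input_line; do
    printf '%s\n' "$input_line" > one-line.csv
    if ! output=$(psql my_database -c "\copy my_table FROM 'one-line.csv' DELIMITER '|' CSV" 2>&1); then
        # keep the bad line for a later retry, and surface psql's error
        printf '%s\n' "$input_line" >> failed_lines.csv
        printf 'load failed: %s\n' "$output" >&2
    fi
done < my_input.csv
```

The failed_lines.csv file can then be inspected or fed back through the loop after fixing the data.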

answered Jan 11, 2021 at 23:20

Rob Bednark

The \copy meta-command in PostgreSQL is used to import or export data between a file and a database table. During the process, errors can occur, for example when a file contains invalid data or when the table structure doesn't match the data being imported. In such cases the \copy operation aborts and raises an error, rolling back the rows copied so far (though the dead rows still occupy disk space until vacuumed). The methods below describe ways to work around errors during a \copy operation.

Method 1: Use the "NULL" option

The "NULL" option of COPY (written NULL AS in the older, unparenthesized syntax) does not skip malformed rows, but it removes one common source of errors: it specifies which string in the data file should be read as an SQL NULL, so such values are no longer rejected by the column's type.

Here's an example of how to use it with the psql \copy meta-command:

\copy mytable FROM 'data.csv' WITH (FORMAT CSV, NULL 'NULL');

In this example, we're copying data from a CSV file into a table called "mytable". The FORMAT CSV option tells PostgreSQL to expect a CSV file, and NULL 'NULL' tells it to treat the literal string NULL in the file as an SQL NULL instead of, say, failing to parse it as an integer.

You can also use the option with other formats, such as plain text. Here's an example:

\copy mytable FROM 'data.txt' WITH (FORMAT TEXT, NULL '');

In this example, we're copying data from a text file into the "mytable" table, and NULL '' tells PostgreSQL to treat empty fields as NULL.

Note that this does not make \copy skip arbitrary bad rows; it only prevents errors caused by NULL values being represented unexpectedly in the file.

Method 2: Use a Staging Table

To ignore errors with the psql \copy meta-command in PostgreSQL, you can use a staging table. Here are the steps:

  1. Create a staging table with the same columns as the target table, but without the strict constraints (no primary key, no CHECK constraints), so that dirty rows can land in it:
CREATE TABLE my_table_staging (
  id SERIAL,
  name TEXT,
  email TEXT
);
  2. Use the \copy meta-command to copy the data from the source file into the staging table:
\copy my_table_staging(name, email) FROM 'path/to/source/file.csv' WITH (FORMAT csv, HEADER true);
  3. Use a SQL query to insert the data from the staging table into the target table, skipping conflicting rows:
INSERT INTO my_table (name, email)
SELECT name, email FROM my_table_staging
ON CONFLICT DO NOTHING;

In this query, the ON CONFLICT DO NOTHING clause tells PostgreSQL to skip any rows that would violate a unique constraint on the target table instead of raising an error.

  4. Optionally, drop the staging table once the data has been successfully inserted into the target table:
DROP TABLE my_table_staging;

By using a staging table, you can copy the data from the source file into the database without worrying about constraint errors, and then insert the data into the target table while ignoring conflicts. This approach can be useful when dealing with large datasets or when the source data is not perfectly clean.

Method 3: Use the "ON CONFLICT" clause with a DO NOTHING statement

To ignore duplicate-key errors when loading with the psql \copy meta-command in PostgreSQL, you can combine an unconstrained staging table with the ON CONFLICT clause and a DO NOTHING statement.

Here's an example:

CREATE TABLE example_table (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);

CREATE TABLE example_staging (
    id INTEGER,
    name TEXT
);

\copy example_staging FROM 'example.csv' WITH (FORMAT csv);

INSERT INTO example_table (id, name)
SELECT id, name
FROM example_staging
ON CONFLICT (id) DO NOTHING;

In this example, we create a table called "example_table" with two columns, "id" and "name", plus a constraint-free staging table "example_staging". We use the \copy meta-command to import data from a CSV file into the staging table, where duplicate ids cannot cause an error.

Finally, we use the INSERT statement with the ON CONFLICT clause to move the data into the target table. The ON CONFLICT clause specifies that if there is a conflict on the "id" column (i.e., the value already exists in the table), the row is silently skipped.

This ensures that duplicate-key errors during the import are ignored, and the rows that can be inserted will be inserted.

History

Error logging in COPY was a proposed feature developed by Aster Data against the PostgreSQL 9.0 code base. It was submitted and reviewed, but not accepted into the core product for that or any other version so far.

Overview

The purpose of error logging in COPY is to prevent the backend from erroring out if a malformed tuple is encountered during a COPY operation. Bad tuples can either be skipped or logged into an error logging table.

The format of the error logging table is as follows:

 CREATE TABLE error_logging_table(
   tupletimestamp TIMESTAMP WITH TIME ZONE,
   targettable    VARCHAR,
   dmltype        CHAR(1),
   errmessage     VARCHAR,
   sqlerrcode     CHAR(5),
   label          VARCHAR,
   key            BIGINT,
   rawdata        BYTEA
 );

The COPY command returns the number of successfully copied tuples only.

COPY options

Error logging is set by adding options to the COPY command. Here is the list of the available options:

  • ERROR_LOGGING (default: true): Enables error handling for COPY commands when set to true.
  • ERROR_LOGGING_SKIP_BAD_ROWS (default: true): Enables skipping malformed tuples that are encountered in COPY commands when set to true.
  • ERROR_LOGGING_MAX_ERRORS (default: 0): Maximum number of bad rows to log before stopping the COPY operation (0 means unlimited).
  • ERROR_LOGGING_SCHEMA_NAME (default: 'public'): Schema name of the table where malformed tuples are inserted by the error logging module.
  • ERROR_LOGGING_TABLE_NAME (default: 'error_table'): Relation name where malformed tuples are inserted by the error logging module. The table is automatically created if it does not exist.
  • ERROR_LOGGING_LABEL (default: the COPY command text): Optional label that is used to identify malformed tuples.
  • ERROR_LOGGING_KEY (default: index of the tuple in the COPY stream): Optional key to identify malformed tuples.

Bad tuples can be rejected for a number of reasons (extra or missing column, constraint violation, …). The error table tries to capture as much context as possible about the error. If the table does not exist, it is created automatically, with the format shown above.

tupletimestamp stores the time at which the error occurred. targettable describes the table into which the row was being inserted when the error occurred, and dmltype records the kind of statement ('C' for COPY). The exact error message and SQL error code are recorded in errmessage and sqlerrcode, respectively. The original data of the row can be found in rawdata.

Example

 CREATE TEMP TABLE foo (a bigint, b text);

— input_file.txt —

 1	one
 2	
 3	three	111
 four    4
 5	five

— end of input_file.txt —

error logging off

 COPY foo FROM 'input_file.txt';
 ERROR:  missing data for column "b"
 CONTEXT:  COPY foo, line 2: "2"

skip bad rows

 --skip bad rows
 COPY foo FROM 'input_file.txt' (ERROR_LOGGING, ERROR_LOGGING_SKIP_BAD_ROWS);
 SELECT * from foo;
  a |  b   
 ---+------
  1 | one
  5 | five
 (2 rows)

turn error logging on (default logs in error_logging_table)

 --turn error logging on (default logs in error_logging_table)
 COPY foo FROM 'input_file.txt' (ERROR_LOGGING);
 SELECT * from foo;
  a |  b   
 ---+------
  1 | one
  5 | five
 (2 rows)
 SELECT * FROM error_logging_table;
  key |           tupletimestamp            |              label              |  targettable  | dmltype |                errmessage                | sqlerrcode |         rawdata          
 -----+-------------------------------------+---------------------------------+---------------+---------+------------------------------------------+------------+--------------------------
    2 | Thu Sep 10 07:09:17.869521 2009 PDT | COPY foo FROM 'input_file.txt'; | pg_temp_2.foo | C       | missing data for column "b"              | 22P04      | \x32
    3 | Thu Sep 10 07:09:17.86953 2009 PDT  | COPY foo FROM 'input_file.txt'; | pg_temp_2.foo | C       | extra data after last expected column    | 22P04      | \x3309746872656509313131
    4 | Thu Sep 10 07:09:17.869538 2009 PDT | COPY foo FROM 'input_file.txt'; | pg_temp_2.foo | C       | invalid input syntax for integer: "four" | 22P02      | \x666f75720934
 (3 rows)

Redirect to another table with a specific label

 -- Redirect to another table with a specific label
 COPY foo FROM 'input_file.txt' (ERROR_LOGGING, ERROR_LOGGING_SCHEMA_NAME 'error', ERROR_LOGGING_TABLE_NAME 'table1', ERROR_LOGGING_LABEL 'batch1');
 SELECT * FROM error.table1;
  key |           tupletimestamp            | label  |  targettable  | dmltype |                errmessage                | sqlerrcode |         rawdata          
 -----+-------------------------------------+--------+---------------+---------+------------------------------------------+------------+--------------------------
    2 | Thu Sep 10 07:09:17.869521 2009 PDT | batch1 | pg_temp_2.foo | C       | missing data for column "b"              | 22P04      | \x32
    3 | Thu Sep 10 07:09:17.86953 2009 PDT  | batch1 | pg_temp_2.foo | C       | extra data after last expected column    | 22P04      | \x3309746872656509313131
    4 | Thu Sep 10 07:09:17.869538 2009 PDT | batch1 | pg_temp_2.foo | C       | invalid input syntax for integer: "four" | 22P02      | \x666f75720934
 (3 rows)

Limit to 2 bad rows:

 -- Limit to 2 bad rows:  
 COPY foo FROM 'input_file.txt' (ERROR_LOGGING, ERROR_LOGGING_MAX_ERRORS 2);
 ERROR:  invalid input syntax for integer: "four"
 CONTEXT:  COPY foo, line 4, column a: "four"
 SELECT count(*) from error_logging_table;
  count 
  -------
       0
  (1 row)



I have a table that stores all my project's RSS channel URLs. I noticed that some URLs end with '/' while others don't, and in my app I have to handle both cases everywhere. So I want to store every sub-URL without the trailing '/': if a URL ends with '/', I want to delete that trailing '/'. I wrote this update command:

UPDATE rss_sub_source 
SET sub_url = SUBSTRING(sub_url, 1, CHAR_LENGTH(sub_url) - 1) 
WHERE sub_url LIKE '%/';

When I execute it, I get:

SQL Error [23505]: ERROR: duplicate key value violates unique constraint "unique_sub_url"
  Detail: Key (sub_url)=(https://physicsworld.com/feed) already exists.

The error shows that some URLs without the trailing '/' already exist: when I update a URL ending with '/', it conflicts with the existing one because of the unique constraint I added. The table contains thousands of URLs, so updating them one by one is obviously impossible. I want to skip the URLs whose update would violate the unique constraint, update only the ones that succeed, and finally delete the remaining records ending with '/'.

Is it possible to ignore the update error events in PostgreSQL? If not, what should I do to make sure no RSS URL ends with '/'?
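One possible approach (a sketch, using the table and column names from the question; best run in a single transaction): first delete the trailing-slash rows whose trimmed form already exists, then trim the rest, so the unique constraint is never violated:

```sql
-- Step 1: drop rows ending in '/' whose trimmed form already exists.
DELETE FROM rss_sub_source t
WHERE t.sub_url LIKE '%/'
  AND EXISTS (
      SELECT 1 FROM rss_sub_source o
      WHERE o.sub_url = SUBSTRING(t.sub_url, 1, CHAR_LENGTH(t.sub_url) - 1)
  );

-- Step 2: the remaining trailing-slash rows can now be trimmed safely.
UPDATE rss_sub_source
SET sub_url = SUBSTRING(sub_url, 1, CHAR_LENGTH(sub_url) - 1)
WHERE sub_url LIKE '%/';
```

After this, no sub_url ends with '/' and no duplicate-key error is raised, because every conflicting duplicate was removed before the update.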
