"Pickle data was truncated" error

I cannot send my numpy array over a socket. I use pickle, but my client crashes while unpickling with this error: pickle data was truncated

My server:
I create a numpy array and want to send it to my client with pickle (this part works):

import socket, pickle
import numpy as np
from PIL import ImageGrab
import cv2


while(True):
    HOST = 'localhost'
    PORT = 50007
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 4096)
    s.bind((HOST, PORT))
    s.listen(1)
    conn, addr = s.accept()
    print ('Connected by', addr)

    arr = np.array([[0, 1], [2, 3]])
    printscreen_pil = ImageGrab.grab(bbox=(10, 10, 500, 500))
    img = np.array(printscreen_pil)  # transform to array
    
    data_string = pickle.dumps(img)
    conn.send(data_string)

    msg_recu = conn.recv(4096)
    print(msg_recu.decode())

    conn.close()

My client:
It receives my numpy array, but I cannot load it with pickle; I get the error above.

import socket, pickle
import numpy as np

HOST = 'localhost'
PORT = 50007
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((HOST, PORT))

msg_a_envoyer = "hello".encode()
s.send(msg_a_envoyer)


while 1:
    data = s.recv(4096)
    if not data: break
    data_arr = pickle.loads(data)
    print (data_arr)
s.close()

🐛 Bug

I am trying to load my monolingual datasets for XLM and am stumped by this pickle issue. I have tried passing various encoding arguments, including ascii, latin1 and utf-8.

data = torch.load(path)
  File "\lib\site-packages\torch\serialization.py", line 358, in load
    return _load(f, map_location, pickle_module)
  File "\lib\site-packages\torch\serialization.py", line 532, in _load
    magic_number = pickle_module.load(f)
_pickle.UnpicklingError: pickle data was truncated

To Reproduce

I am working with PyTorch 0.4 on the recent [translation model](https://github.com/facebookresearch/XLM/).

Environment

PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A

OS: Microsoft Windows 10 Pro
GCC version: Could not collect
CMake version: Could not collect

Python version: 3.5
Is CUDA available: N/A
CUDA runtime version: 9.0
GPU models and configuration: GPU 0: GeForce GTX 960M
Nvidia driver version: 419.35
cuDNN version: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\bin\cudnn64_7.dll

Versions of relevant libraries:
[pip] numpy==1.16.2
[pip] torch==0.4.1
[conda] blas 1.0 mkl
[conda] cuda90 1.0 0 pytorch
[conda] mkl 2019.1 144
[conda] mkl 2019.0
[conda] mkl_fft 1.0.10 py36h14836fe_0
[conda] mkl_random 1.0.2 py36h343c172_0
[conda] pytorch 0.4.1 py36_cuda90_cudnn7he774522_1 pytorch


Solution – 1

The problem is that if the size of the pickled data is greater than 4096 bytes, a single recv(4096) only returns the first part of it (hence the "pickle data was truncated" message you're getting).

You have to accumulate the received chunks and unpickle them only when the reception is complete, for example like this:

data = b""
while True:
    packet = s.recv(4096)
    if not packet: break
    data += packet

data_arr = pickle.loads(data)
print (data_arr)
s.close()

Growing a bytes object with += is not very efficient, though; it is better to store the parts in a list and join them once at the end. Faster variant:

data = []
while True:
    packet = s.recv(4096)
    if not packet: break
    data.append(packet)
data_arr = pickle.loads(b"".join(data))
print (data_arr)
s.close()
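Both loops above rely on the server closing the connection to mark the end of the data. If the server keeps the connection open (for example, to stream many arrays), the client never sees the empty recv() that ends the loop. A common alternative, sketched here as an illustration rather than part of the original answer, is to prefix each pickled message with its length so the client knows exactly how many bytes to read:

```python
import pickle
import socket
import struct
import threading

def send_msg(sock, obj):
    """Send one pickled object, prefixed with its 4-byte big-endian length."""
    payload = pickle.dumps(obj)
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_exact(sock, n):
    """Read exactly n bytes from the socket, or raise EOFError."""
    parts = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(min(remaining, 4096))
        if not chunk:
            raise EOFError("socket closed mid-message")
        parts.append(chunk)
        remaining -= len(chunk)
    return b"".join(parts)

def recv_msg(sock):
    """Receive one length-prefixed pickled object."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return pickle.loads(recv_exact(sock, length))

# Demo over a local socket pair; the sender runs in a thread so a
# message larger than the kernel socket buffer cannot deadlock the demo.
a, b = socket.socketpair()
msg = list(range(100000))  # pickles to far more than 4096 bytes
sender = threading.Thread(target=send_msg, args=(a, msg))
sender.start()
received = recv_msg(b)
sender.join()
print(received == msg)  # True
```

With this framing, the connection can stay open and carry any number of arrays back to back.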

Solution – 2

In simple words, the file you are trying to load is not complete. Either you have not downloaded it correctly, or your pickle file is corrupt. You can recreate the pickle from the original data to solve this issue.
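For the file-based case, "recreate the pickle" simply means dumping the object again from its original source. A minimal sketch (the file name and data here are made up for illustration):

```python
import pickle

# Hypothetical data standing in for whatever originally produced the file.
data = {"vocab_size": 30000, "sentences": [[1, 2, 3], [4, 5]]}

# Re-dump the object; this replaces the truncated or corrupt file.
with open("dataset.pkl", "wb") as f:
    pickle.dump(data, f)

# A full load without errors confirms the new file is complete.
with open("dataset.pkl", "rb") as f:
    restored = pickle.load(f)
print(restored == data)  # True
```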

published on Monday, December 21, 2020

You are probably aware that pickle.load can execute arbitrary code and
must not be used for untrusted data.

This post is not about that.

In fact, pickle.load can’t even really be trusted for trusted data. To
demonstrate the issue, consider this simple program:

import os, pickle, threading

message = '0' * 65537

recv_fd, send_fd = os.pipe()
read_end = os.fdopen(recv_fd, 'rb', 0)
write_end = os.fdopen(send_fd, 'wb', 0)

thread = threading.Thread(
    target=pickle.dump,
    args=(message, write_end, -1))
thread.start()

pickle.load(read_end)

This simply transmits a pickled message over a pipe.
Looks innocuous enough, right?

Wrong! The program fails with the following traceback every time:

Traceback (most recent call last):
  File "<...>/example.py", line 16, in <module>
    pickle.load(read_end)
_pickle.UnpicklingError: pickle data was truncated

Worse: once you get this error, there is no safe way to resume listening for
messages on this channel, because you don't know how long the first message
really was, and hence at which offset to resume reading. If you try, you
invite evil into your home. A typical result of trying to continue reading
messages on the stream is _pickle.UnpicklingError: unpickling stack
underflow, but I've even seen segfaults occur.

The reason we get the error in the first place is, of course, that the
message size is above the pipe capacity, which is 65,536 bytes on my system.
The threshold at which you start getting errors may be different for you;
try increasing the message size if you don't see errors at first.

If you are using a channel other than os.pipe(), you might be safe, but I
can't give any guarantees. I can only say that I wasn't able to reproduce
the error on my system when exchanging the pipe for a socket or a regular
file.

We used a thread here to send us the data, but it doesn't matter whether the
remote end is a thread or another process. Also, this is not limited to a
specific Python version or pickle protocol version: I could reproduce the
same error with several Python versions up to 3.9, and protocols 1-5.

Workaround

So, how to fix that?

The problem empirically seems to disappear when changing the buffering policy
of the reading end, i.e. by not disabling input buffering:

- read_end = os.fdopen(recv_fd, 'rb', 0)
+ read_end = os.fdopen(recv_fd, 'rb')

I haven’t inspected the source of the pickle module, so I can’t vouch that
this is reliable.
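Applied to the demo program from above, the workaround looks like this. It is a sketch of the buffered variant, with the usual caveat from this post that there is no guarantee it is reliable in general:

```python
import os
import pickle
import threading

message = "0" * 65537  # larger than a typical 65,536-byte pipe capacity

recv_fd, send_fd = os.pipe()
# Note: no third argument of 0 here, so the reading end keeps its buffer.
# A buffered reader retries short reads instead of handing them to pickle.
read_end = os.fdopen(recv_fd, "rb")
write_end = os.fdopen(send_fd, "wb", 0)

# Send the pickled message from a separate thread, as in the original demo.
thread = threading.Thread(target=pickle.dump, args=(message, write_end, -1))
thread.start()

result = pickle.load(read_end)  # completes instead of raising
thread.join()
print(result == message)  # True
```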

What I turned out doing is to use the pickle.dumps()/pickle.loads()
combination to serialize to/from a bytes object, and manually transmit this
data along with its size over the channel. This has some overhead, but still
performs fine for my use-case:

import pickle
from struct import Struct

HEADER = Struct("!L")

def send(obj, file):
    """Send a pickled message over the given channel."""
    payload = pickle.dumps(obj, -1)
    file.write(HEADER.pack(len(payload)))
    file.write(payload)

def recv(file):
    """Receive a pickled message over the given channel."""
    header = read_file(file, HEADER.size)
    payload = read_file(file, *HEADER.unpack(header))
    return pickle.loads(payload)

def read_file(file, size):
    """Read a fixed size buffer from the file."""
    parts = []
    while size > 0:
        part = file.read(size)
        if not part:
            raise EOFError
        parts.append(part)
        size -= len(part)
    return b''.join(parts)

Technically, transmitting the size is redundant with information contained in
the pickle protocol. However, where peak performance is not required
(remember: we are using Python, after all), I prefer transmitting the size
explicitly anyway. This avoids the complexity of manually parsing pickled
frames, avoids depending on a specific pickle protocol, and would also make
it easy to swap pickle for any other serialization format here.
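The helpers can be exercised without any pipes or sockets at all. This standalone sketch repeats the definitions so it runs on its own, frames two messages back to back in an in-memory buffer, and reads them back intact, which is exactly the recovery property the unframed pickle stream lacks:

```python
import io
import pickle
from struct import Struct

HEADER = Struct("!L")

def send(obj, file):
    """Send a pickled message over the given channel."""
    payload = pickle.dumps(obj, -1)
    file.write(HEADER.pack(len(payload)))
    file.write(payload)

def recv(file):
    """Receive a pickled message from the given channel."""
    header = read_file(file, HEADER.size)
    payload = read_file(file, *HEADER.unpack(header))
    return pickle.loads(payload)

def read_file(file, size):
    """Read a fixed-size buffer from the file."""
    parts = []
    while size > 0:
        part = file.read(size)
        if not part:
            raise EOFError
        parts.append(part)
        size -= len(part)
    return b"".join(parts)

# Two messages on the same channel; the length prefixes keep them separate.
channel = io.BytesIO()
send({"id": 1}, channel)
send("0" * 65537, channel)  # larger than the pipe capacity from the demo
channel.seek(0)

first = recv(channel)
second = recv(channel)
print(first)                  # {'id': 1}
print(second == "0" * 65537)  # True
```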

Conclusion

Be careful when using pickle.dump + pickle.load for RPC. It may result
in an UnpicklingError from which there seems to be no safe way to recover
that allows you to continue transmitting further messages on the same
channel. This occurs when the message size exceeds a certain threshold.

To avoid this issue, make sure that the channel capacity and buffering policy
work with pickle.load. Alternatively, consider using pickle.dumps +
pickle.loads and handling the channel layer manually instead.

This entry was tagged bug, coding and python.
