Python Requests 404 error

Look at the r.status_code attribute:

if r.status_code == 404:
    # A 404 was issued.
    pass

Demo:

>>> import requests
>>> r = requests.get('http://httpbin.org/status/404')
>>> r.status_code
404

If you want requests to raise an exception for error codes (4xx or 5xx), call r.raise_for_status():

>>> r = requests.get('http://httpbin.org/status/404')
>>> r.raise_for_status()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "requests/models.py", line 664, in raise_for_status
    raise http_error
requests.exceptions.HTTPError: 404 Client Error: NOT FOUND
>>> r = requests.get('http://httpbin.org/status/200')
>>> r.raise_for_status()
>>> # no exception raised.

You can also test the response object in a boolean context; if the status code is not an error code (4xx or 5xx), it is considered ‘true’:

if r:
    # successful response
    pass

If you want to be more explicit, use if r.ok:.
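The three checks above can be combined in one small function. This is a minimal sketch: `describe` and `fake_response` are hypothetical helper names, and the Response objects are built by hand (an assumption made only so the example runs without network access).

```python
import requests


def describe(response):
    """Return a short label for a Response based on its status code."""
    if response.status_code == 404:
        return "not found"
    if response.ok:  # same check as `if response:`; True for codes below 400
        return "success"
    return "other error"


def fake_response(status_code):
    """Hypothetical helper: build a Response by hand so no network is needed."""
    response = requests.Response()
    response.status_code = status_code
    return response


labels = [describe(fake_response(code)) for code in (200, 404, 500)]
print(labels)
```

With a real request you would pass the object returned by requests.get() to describe() instead of a hand-built one.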


Eager to get started? This page gives a good introduction to getting started
with Requests.

First, make sure that:

Requests is installed
Requests is up-to-date

Let’s get started with some simple examples.

Make a Request

Making a request with Requests is very simple.

Begin by importing the Requests module:

>>> import requests

Now, let’s try to get a webpage. For this example, let’s get GitHub’s public
timeline:

>>> r = requests.get('https://api.github.com/events')

Now, we have a Response object called r. We can
get all the information we need from this object.

Requests’ simple API means that all forms of HTTP request are as obvious. For
example, this is how you make an HTTP POST request:

>>> r = requests.post('https://httpbin.org/post', data={'key': 'value'})

Nice, right? What about the other HTTP request types: PUT, DELETE, HEAD and
OPTIONS? These are all just as simple:

>>> r = requests.put('https://httpbin.org/put', data={'key': 'value'})
>>> r = requests.delete('https://httpbin.org/delete')
>>> r = requests.head('https://httpbin.org/get')
>>> r = requests.options('https://httpbin.org/get')

That’s all well and good, but it’s also only the start of what Requests can
do.
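To see what a call such as requests.post(..., data=...) would put on the wire, you can prepare the request without sending it. The sketch below uses requests.Request and its prepare() method; nothing is transmitted, so it runs offline.

```python
import requests

# Build the same POST as above, but only prepare it instead of sending it.
request = requests.Request(
    "POST", "https://httpbin.org/post", data={"key": "value"}
)
prepared = request.prepare()

print(prepared.method)                   # HTTP verb
print(prepared.headers["Content-Type"])  # how the dictionary was encoded
print(prepared.body)                     # the form-encoded payload
```

A dictionary passed as data is form-encoded, which is why the body is a key=value string rather than JSON.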

Passing Parameters In URLs

You often want to send some sort of data in the URL’s query string. If
you were constructing the URL by hand, this data would be given as key/value
pairs in the URL after a question mark, e.g. httpbin.org/get?key=val.
Requests allows you to provide these arguments as a dictionary of strings,
using the params keyword argument. As an example, if you wanted to pass
key1=value1 and key2=value2 to httpbin.org/get, you would use the
following code:

>>> payload = {'key1': 'value1', 'key2': 'value2'}
>>> r = requests.get('https://httpbin.org/get', params=payload)

You can see that the URL has been correctly encoded by printing the URL:

>>> print(r.url)
https://httpbin.org/get?key1=value1&key2=value2

Note that any dictionary key whose value is None will not be added to the
URL’s query string.

You can also pass a list of items as a value:

>>> payload = {'key1': 'value1', 'key2': ['value2', 'value3']}

>>> r = requests.get('https://httpbin.org/get', params=payload)
>>> print(r.url)
https://httpbin.org/get?key1=value1&key2=value2&key2=value3
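The encoding rules described above (None values dropped, list values repeated) can be checked without any network traffic by preparing the request instead of sending it. A minimal sketch; the key3 entry is added here only to illustrate the None case.

```python
import requests

# key3 is None, so it should not appear in the query string.
payload = {"key1": "value1", "key2": ["value2", "value3"], "key3": None}

prepared = requests.Request(
    "GET", "https://httpbin.org/get", params=payload
).prepare()

print(prepared.url)
```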

Response Content

We can read the content of the server’s response. Consider the GitHub timeline
again:

>>> import requests

>>> r = requests.get('https://api.github.com/events')
>>> r.text
'[{"repository":{"open_issues":0,"url":"https://github.com/...

Requests will automatically decode content from the server. Most unicode
charsets are seamlessly decoded.

When you make a request, Requests makes educated guesses about the encoding of
the response based on the HTTP headers. The text encoding guessed by Requests
is used when you access r.text. You can find out what encoding Requests is
using, and change it, using the r.encoding property:

>>> r.encoding
'utf-8'
>>> r.encoding = 'ISO-8859-1'

If you change the encoding, Requests will use the new value of r.encoding
whenever you call r.text. You might want to do this in any situation where
you can apply special logic to work out what the encoding of the content will
be. For example, HTML and XML have the ability to specify their encoding in
their body. In situations like this, you should use r.content to find the
encoding, and then set r.encoding. This will let you use r.text with
the correct encoding.

Requests will also use custom encodings in the event that you need them. If
you have created your own encoding and registered it with the codecs
module, you can simply use the codec name as the value of r.encoding and
Requests will handle the decoding for you.
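As a rough illustration of the "use r.content to find the encoding, then set r.encoding" advice, the sketch below reads the encoding from an XML declaration. The function name and the regular expression are assumptions made for this example, and the Response is built by hand through the private _content attribute purely so the code runs offline.

```python
import re

import requests


def decode_with_declared_encoding(response, default="utf-8"):
    """Set response.encoding from an XML declaration in the body, then decode."""
    # Look only at the first bytes; the declaration must come first in XML.
    match = re.search(
        rb'encoding=["\']([A-Za-z0-9._-]+)["\']', response.content[:200]
    )
    response.encoding = match.group(1).decode("ascii") if match else default
    return response.text


# Hand-built response (private attribute used only to avoid a real request).
response = requests.Response()
response.status_code = 200
response._content = (
    '<?xml version="1.0" encoding="iso-8859-1"?><t>café</t>'.encode("iso-8859-1")
)

text = decode_with_declared_encoding(response)
print(response.encoding)
print(text)
```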

Binary Response Content

You can also access the response body as bytes, for non-text requests:

>>> r.content
b'[{"repository":{"open_issues":0,"url":"https://github.com/...

The gzip and deflate transfer-encodings are automatically decoded for you.

The br transfer-encoding is automatically decoded for you if a Brotli library
like brotli or brotlicffi is installed.

For example, to create an image from binary data returned by a request, you can
use the following code:

>>> from PIL import Image
>>> from io import BytesIO

>>> i = Image.open(BytesIO(r.content))

JSON Response Content

There’s also a builtin JSON decoder, in case you’re dealing with JSON data:

>>> import requests

>>> r = requests.get('https://api.github.com/events')
>>> r.json()
[{'repository': {'open_issues': 0, 'url': 'https://github.com/...

In case the JSON decoding fails, r.json() raises an exception. For example, if
the response gets a 204 (No Content), or if the response contains invalid JSON,
attempting r.json() raises requests.exceptions.JSONDecodeError. This wrapper exception
provides interoperability for multiple exceptions that may be thrown by different
python versions and json serialization libraries.

It should be noted that the success of the call to r.json() does not
indicate the success of the response. Some servers may return a JSON object in a
failed response (e.g. error details with HTTP 500). Such JSON will be decoded
and returned. To check that a request is successful, use
r.raise_for_status() or check r.status_code is what you expect.
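The two points above (decoding can fail, and successful decoding says nothing about the HTTP status) can be handled together. This is a sketch under assumptions: read_json and fake_response are hypothetical helpers, and the responses are built by hand so the example needs no network. It catches ValueError, which requests.exceptions.JSONDecodeError inherits from.

```python
import requests


def read_json(response):
    """Return (succeeded, data); data is None when the body is not valid JSON."""
    try:
        data = response.json()
    except ValueError:  # requests.exceptions.JSONDecodeError is a ValueError
        data = None
    return response.ok, data


def fake_response(status_code, body):
    """Hypothetical helper: hand-built Response, no network involved."""
    response = requests.Response()
    response.status_code = status_code
    response._content = body  # private attribute, set only for this offline demo
    return response


# A failed request can still carry decodable JSON.
failed = read_json(fake_response(500, b'{"error": "boom"}'))
# A 204 has no body, so decoding fails even though the request succeeded.
empty = read_json(fake_response(204, b""))

print(failed)
print(empty)
```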

Raw Response Content

In the rare case that you’d like to get the raw socket response from the
server, you can access r.raw. If you want to do this, make sure you set
stream=True in your initial request. Once you do, you can do this:

>>> r = requests.get('https://api.github.com/events', stream=True)

>>> r.raw
<urllib3.response.HTTPResponse object at 0x101194810>

>>> r.raw.read(10)
b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03'

In general, however, you should use a pattern like this to save what is being
streamed to a file:

with open(filename, 'wb') as fd:
    for chunk in r.iter_content(chunk_size=128):
        fd.write(chunk)

Using Response.iter_content will handle a lot of what you would otherwise
have to handle when using Response.raw directly. When streaming a
download, the above is the preferred and recommended way to retrieve the
content. Note that chunk_size can be freely adjusted to a number that
may better fit your use cases.

Note

An important note about using Response.iter_content versus Response.raw.
Response.iter_content will automatically decode the gzip and deflate
transfer-encodings. Response.raw is a raw stream of bytes – it does not
transform the response content. If you really need access to the bytes as they
were returned, use Response.raw.
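The streaming pattern above can be wrapped in a small helper that also reports how many bytes were written. A minimal sketch: save_stream is a hypothetical name, and an in-memory io.BytesIO object stands in for the network stream so the example runs offline. With a real download you would pass the result of requests.get(url, stream=True).

```python
import io
import os
import tempfile

import requests


def save_stream(response, path, chunk_size=8192):
    """Write the response body to `path` chunk by chunk; return bytes written."""
    written = 0
    with open(path, "wb") as fd:
        for chunk in response.iter_content(chunk_size=chunk_size):
            written += fd.write(chunk)
    return written


# Stand-in for a streamed response: raw is an in-memory byte stream.
response = requests.Response()
response.status_code = 200
response.raw = io.BytesIO(b"x" * 20000)

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "download.bin")
    size = save_stream(response, target)
    size_on_disk = os.path.getsize(target)

print(size, size_on_disk)
```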

POST a Multipart-Encoded File

Requests makes it simple to upload Multipart-encoded files:

>>> url = 'https://httpbin.org/post'
>>> files = {'file': open('report.xls', 'rb')}

>>> r = requests.post(url, files=files)
>>> r.text
{
  ...
  "files": {
    "file": "<censored...binary...data>"
  },
  ...
}

You can set the filename, content_type and headers explicitly:

>>> url = 'https://httpbin.org/post'
>>> files = {'file': ('report.xls', open('report.xls', 'rb'), 'application/vnd.ms-excel', {'Expires': '0'})}

>>> r = requests.post(url, files=files)
>>> r.text
{
  ...
  "files": {
    "file": "<censored...binary...data>"
  },
  ...
}

If you want, you can send strings to be received as files:

>>> url = 'https://httpbin.org/post'
>>> files = {'file': ('report.csv', 'some,data,to,send\nanother,row,to,send\n')}

>>> r = requests.post(url, files=files)
>>> r.text
{
  ...
  "files": {
    "file": "some,data,to,send\\nanother,row,to,send\\n"
  },
  ...
}

In the event you are posting a very large file as a multipart/form-data
request, you may want to stream the request. By default, requests does not
support this, but there is a separate package which does —
requests-toolbelt. You should read the toolbelt’s documentation for more details about how to use it.

For sending multiple files in one request refer to the advanced
section.

Warning

It is strongly recommended that you open files in binary
mode. This is because Requests may attempt to provide
the Content-Length header for you, and if it does this value
will be set to the number of bytes in the file. Errors may occur
if you open the file in text mode.
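A sketch of an upload that follows the warning above: the file is opened in binary mode inside a with block so it is closed afterwards. The request is only prepared, not sent, so the example runs offline; build_upload and the temporary report.csv file are assumptions made for illustration.

```python
import os
import tempfile

import requests


def build_upload(path, url="https://httpbin.org/post"):
    """Prepare (but do not send) a multipart upload of the file at `path`."""
    # Binary mode keeps the byte count and the Content-Length header consistent.
    with open(path, "rb") as fh:
        files = {"file": (os.path.basename(path), fh, "text/csv")}
        return requests.Request("POST", url, files=files).prepare()


with tempfile.TemporaryDirectory() as tmp:
    report = os.path.join(tmp, "report.csv")
    with open(report, "wb") as out:
        out.write(b"some,data,to,send\nanother,row,to,send\n")
    prepared = build_upload(report)

print(prepared.headers["Content-Type"].split(";")[0])
print(prepared.headers["Content-Length"])
```

To actually send the file, call requests.post(url, files=files) inside the same with block.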

Response Status Codes

We can check the response status code:

>>> r = requests.get('https://httpbin.org/get')
>>> r.status_code
200

Requests also comes with a built-in status code lookup object for easy
reference:

>>> r.status_code == requests.codes.ok
True

If we made a bad request (a 4XX client error or 5XX server error response), we
can raise it with
Response.raise_for_status():

>>> bad_r = requests.get('https://httpbin.org/status/404')
>>> bad_r.status_code
404

>>> bad_r.raise_for_status()
Traceback (most recent call last):
  File "requests/models.py", line 832, in raise_for_status
    raise http_error
requests.exceptions.HTTPError: 404 Client Error

But, since our status_code for r was 200, when we call
raise_for_status() we get:

>>> r.raise_for_status()
None

All is well.
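The status code lookup object can make comparisons more readable than bare numbers. The sketch below is illustrative only: the RETRYABLE set and should_retry are hypothetical names, and the choice of which codes count as retryable is an assumption, not a recommendation from Requests.

```python
import requests

# Named status codes from the built-in lookup object.
RETRYABLE = {
    requests.codes.bad_gateway,          # 502
    requests.codes.service_unavailable,  # 503
    requests.codes.gateway_timeout,      # 504
}


def should_retry(status_code):
    """Return True for status codes this sketch treats as temporary failures."""
    return status_code in RETRYABLE


print(requests.codes.ok, requests.codes.not_found)
print(should_retry(503), should_retry(404))
```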

Cookies

If a response contains some Cookies, you can quickly access them:

>>> url = 'http://example.com/some/cookie/setting/url'
>>> r = requests.get(url)

>>> r.cookies['example_cookie_name']
'example_cookie_value'

To send your own cookies to the server, you can use the cookies
parameter:

>>> url = 'https://httpbin.org/cookies'
>>> cookies = dict(cookies_are='working')

>>> r = requests.get(url, cookies=cookies)
>>> r.text
'{"cookies": {"cookies_are": "working"}}'

Cookies are returned in a RequestsCookieJar,
which acts like a dict but also offers a more complete interface,
suitable for use over multiple domains or paths. Cookie jars can
also be passed in to requests:

>>> jar = requests.cookies.RequestsCookieJar()
>>> jar.set('tasty_cookie', 'yum', domain='httpbin.org', path='/cookies')
>>> jar.set('gross_cookie', 'blech', domain='httpbin.org', path='/elsewhere')
>>> url = 'https://httpbin.org/cookies'
>>> r = requests.get(url, cookies=jar)
>>> r.text
'{"cookies": {"tasty_cookie": "yum"}}'
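The jar itself can be inspected without sending a request. A minimal sketch using RequestsCookieJar.get_dict(); note that get_dict() filters on an exact domain and path match, which is simpler than the matching applied when cookies are attached to a real request.

```python
import requests

jar = requests.cookies.RequestsCookieJar()
jar.set("tasty_cookie", "yum", domain="httpbin.org", path="/cookies")
jar.set("gross_cookie", "blech", domain="httpbin.org", path="/elsewhere")

# Cookies stored for one specific domain and path.
for_cookies_path = jar.get_dict(domain="httpbin.org", path="/cookies")
# Every cookie in the jar, regardless of domain or path.
everything = jar.get_dict()

print(for_cookies_path)
print(everything)
```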

Redirection and History

By default Requests will perform location redirection for all verbs except
HEAD.

We can use the history property of the Response object to track redirection.

The Response.history list contains the
Response objects that were created in order to
complete the request. The list is sorted from the oldest to the most recent
response.

For example, GitHub redirects all HTTP requests to HTTPS:

>>> r = requests.get('http://github.com/')

>>> r.url
'https://github.com/'

>>> r.status_code
200

>>> r.history
[<Response [301]>]

If you’re using GET, OPTIONS, POST, PUT, PATCH or DELETE, you can disable
redirection handling with the allow_redirects parameter:

>>> r = requests.get('http://github.com/', allow_redirects=False)

>>> r.status_code
301

>>> r.history
[]

If you’re using HEAD, you can enable redirection as well:

>>> r = requests.head('http://github.com/', allow_redirects=True)

>>> r.url
'https://github.com/'

>>> r.history
[<Response [301]>]
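To print the whole redirect chain in order, you can combine Response.history with the final response. A sketch: redirect_chain and fake_response are hypothetical helpers, and the responses are assembled by hand to mirror the GitHub example so no request is made.

```python
import requests


def redirect_chain(response):
    """Return (status_code, url) pairs from the first hop to the final response."""
    hops = [(r.status_code, r.url) for r in response.history]
    hops.append((response.status_code, response.url))
    return hops


def fake_response(status_code, url):
    """Hypothetical helper: hand-built Response, no network involved."""
    response = requests.Response()
    response.status_code = status_code
    response.url = url
    return response


final = fake_response(200, "https://github.com/")
final.history = [fake_response(301, "http://github.com/")]

chain = redirect_chain(final)
print(chain)
```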

Timeouts

You can tell Requests to stop waiting for a response after a given number of
seconds with the timeout parameter. Nearly all production code should use
this parameter in nearly all requests. Failure to do so can cause your program
to hang indefinitely:

>>> requests.get('https://github.com/', timeout=0.001)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
requests.exceptions.Timeout: HTTPConnectionPool(host='github.com', port=80): Request timed out. (timeout=0.001)

Note

timeout is not a time limit on the entire response download;
rather, an exception is raised if the server has not issued a
response for timeout seconds (more precisely, if no bytes have been
received on the underlying socket for timeout seconds). If no timeout is specified explicitly, requests do
not time out.
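A common way to use the timeout parameter is to catch the Timeout exception and fall back to a default. The sketch below simulates a server that never answers by mounting a stand-in transport adapter on a Session; AlwaysTimesOut and fetch_or_none are hypothetical names, and the adapter exists only so the example runs without network access.

```python
import requests
from requests.adapters import BaseAdapter


class AlwaysTimesOut(BaseAdapter):
    """Hypothetical stand-in adapter that simulates a server that never answers."""

    def send(self, request, **kwargs):
        raise requests.exceptions.ReadTimeout("simulated timeout", request=request)

    def close(self):
        pass


def fetch_or_none(session, url, timeout=5.0):
    """Return the response, or None if the server did not answer in time."""
    try:
        return session.get(url, timeout=timeout)
    except requests.exceptions.Timeout:
        return None


session = requests.Session()
session.mount("https://", AlwaysTimesOut())  # replace the real HTTPS transport

result = fetch_or_none(session, "https://github.com/", timeout=0.001)
print(result)
```

With an unmodified Session (or plain requests.get) the same except clause catches real connect and read timeouts.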

Errors and Exceptions

In the event of a network problem (e.g. DNS failure, refused connection, etc),
Requests will raise a ConnectionError exception.

Response.raise_for_status() will
raise an HTTPError if the HTTP request
returned an unsuccessful status code.

If a request times out, a Timeout exception is
raised.

If a request exceeds the configured number of maximum redirections, a
TooManyRedirects exception is raised.

All exceptions that Requests explicitly raises inherit from
requests.exceptions.RequestException.
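Because all of these exceptions share RequestException as a base class, except clauses should go from the most specific to the most general. The sketch below shows one possible ordering; safe_get and its injectable get parameter are assumptions made so the example can be exercised with stand-in functions instead of real network calls.

```python
import requests


def safe_get(url, get=requests.get, timeout=10):
    """Return a short outcome label instead of letting exceptions propagate."""
    try:
        response = get(url, timeout=timeout)
        response.raise_for_status()
    except requests.exceptions.Timeout:
        return "timed out"
    except requests.exceptions.ConnectionError:
        return "connection failed"
    except requests.exceptions.HTTPError as exc:
        return f"HTTP error {exc.response.status_code}"
    except requests.exceptions.RequestException:
        return "request failed"  # anything else Requests raises explicitly
    return "ok"


def refuses(url, timeout):
    """Stand-in for requests.get that simulates a refused connection."""
    raise requests.exceptions.ConnectionError("simulated refused connection")


def not_found(url, timeout):
    """Stand-in for requests.get that returns a hand-built 404 response."""
    response = requests.Response()
    response.status_code = 404
    return response


outcomes = [
    safe_get("https://example.invalid/", get=refuses),
    safe_get("https://example.invalid/", get=not_found),
]
print(outcomes)
```

Timeout is checked before ConnectionError because a connect timeout inherits from both.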

Ready for more? Check out the advanced section.


I am trying to use the requests library to get content from a URL. In more detail, I do it the following way:

import requests

proxies = {'http': 'my_proxy.blabla.com/'}
r = requests.get(url, proxies=proxies)
print(r.text)

As a result I get the following:

<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
         "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
 <head>
  <title>404 - Not Found</title>
 </head>
 <body>
  <h1>404 - Not Found</h1>
 </body>
</html>

So, it looks like the proxy let me through and I reached the server. However, the web server was unable to interpret my request (wrong path or so) and did not know what content to return. Am I interpreting this correctly?

What can be the reason for that? I do get the expected content if I put the URL in one of my browsers.

ADDED

It has been suggested in the comments that the root of the problem is in the headers. So, I used this web site: http://www.procato.com/my+headers/ to find out what headers are sent by my browser. I used these values to set the headers variable given to the requests.get function. I set the values for the following keys: ‘User-Agent’, ‘Accept’, ‘Referer’, ‘Accept-Encoding’, ‘Accept-Language’, ‘X-Forwarded-For’, ‘Cache-Control’, ‘Connection’. Unfortunately, it does not resolve the problem. I am still getting the same 404 response.

ADDED 2

I have tested my function with two different URLs and got exactly the same response. So my previous assumption that the response (the XML that I see) comes from the web server is probably wrong. It is unlikely that two completely different web servers (one of them was Google) would generate the same response.

So, now I do not understand where the XML comes from. Can it be that it comes from the proxy server?
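One thing worth checking in a setup like this is which proxy entry applies to which URL. The sketch below uses select_proxy, a helper in requests.utils that Requests uses internally to pick a proxy for a URL. The explicit http:// scheme on the proxy address and the example.com URLs are assumptions for illustration; this shows how proxy keys are matched and is not a diagnosis of the problem above.

```python
from requests.utils import select_proxy

# Assumed proxy URL with an explicit scheme (the question's value has none).
proxy_url = "http://my_proxy.blabla.com/"

http_only = {"http": proxy_url}
both_schemes = {"http": proxy_url, "https": proxy_url}

# The "http" key covers only http:// URLs.
print(select_proxy("http://example.com/", http_only))
print(select_proxy("https://example.com/", http_only))

# With both keys present, https:// URLs go through the proxy as well.
print(select_proxy("https://example.com/", both_schemes))
```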


import requests

url = 'https://subtitry.ru/'
response = requests.get(url)
print(response)

The code above includes the site URL. The site works when opened directly, but the request returns 404. Can this be fixed somehow, or not?


import requests

headers = {
    'User-Agent': ('Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 '
                   'Firefox/14.0.1'),
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'ru-ru,ru;q=0.8,en-us;q=0.5,en;q=0.3',
    'Accept-Encoding': 'gzip, deflate',
    'Connection': 'keep-alive',
    'DNT': '1',
}

url = 'https://subtitry.ru/'
response = requests.post(url, headers=headers).text
print(response)

Add a header with a User-Agent:

response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/11.1.2 Safari/605.1.15'})
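To see why a User-Agent header can matter, compare what Requests sends by default with a browser-like value. The sketch below sets the header once on a Session and prepares a request without sending it, so it runs offline; whether a given site accepts the request still depends on that site.

```python
import requests

browser_agent = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/11.1.2 Safari/605.1.15"
)

session = requests.Session()
default_agent = session.headers["User-Agent"]  # what Requests sends by default

# Set the header once; every request made through this session will carry it.
session.headers.update({"User-Agent": browser_agent})
prepared = session.prepare_request(
    requests.Request("GET", "https://subtitry.ru/")
)

print(default_agent)
print(prepared.headers["User-Agent"])
```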


When we visit a website or navigate through multiple pages of a website, we often come across the 404 error page. This error is displayed when the server cannot find the requested page, and it may be due to several reasons like page deletion, broken links, or changes in URLs. In such cases, the server sends an HTTP status code 404 to the client’s browser, which displays a 404 error message, commonly known as the “Page not found” error.

To handle 404 errors effectively and provide a better user experience, websites use customized 404 pages that serve relevant information to the user, redirect to the homepage, or offer alternative links to the relevant pages. In this article, we will explore the concept of handling 404 requests using Python and compare some code examples.

What is 404 Response in Python?

In Python, as in any other HTTP client or server, a 404 response is one whose status code indicates that the requested resource was not found on the server. The status code is sent in the status line at the start of the HTTP response, ahead of the headers. To handle 404 requests effectively on the server side, we can create customized error pages that are served to the user when a 404 error occurs.

Handling 404 Requests in Python

Handling 404 requests in Python involves a series of steps that need to be followed for better error handling and user experience. Here are the steps involved in handling 404 requests:

Step 1: Create a Customized 404 Page

To handle 404 requests, we need to create a customized 404 page that tells the user the page was not found and points them to useful parts of the site. This page can be a plain HTML file or a Django template that is displayed when a 404 error occurs. For instance, we can create a base.html file that holds the template for all other HTML pages and add a specific 404.html page that gets displayed when a 404 error occurs. This page can be customized based on the website’s design and layout.

Step 2: Define the 404 View

The next step in handling 404 requests is to define a view that displays the customized 404 template when a 404 error occurs. In Django, we can add a function to views.py that returns the customized 404 template. The view function should accept the request parameter (and, when Django calls it as the 404 handler, the exception) and render the 404 template with the Django render() function, passing status=404 so that the response actually carries the 404 status code. The function can be decorated with require_http_methods from django.views.decorators.http to restrict it to HTTP GET requests, but missing pages requested with other methods would then receive a 405 response instead of a 404, so the handler is usually left undecorated.

Step 3: Map the 404 View to URLs

After defining the 404 view function, we need to tell Django to use it whenever a 404 error occurs. We do this by assigning the view to handler404 in the root URL configuration. Django calls this handler when no URL pattern matches or when a view raises Http404, but only when DEBUG is set to False. A catch-all pattern defined with re_path() and placed last in urlpatterns can additionally forward any unmatched URL to the same view.

Code Examples

Code example 1: Customized 404 view function in Django

Here is an example of a Django view function that displays a customized 404 template when a 404 error occurs:

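from django.shortcuts import render

def error_404(request, exception=None):
    # exception defaults to None so the view also works when routed directly.
    # status=404 makes the response carry the 404 status code instead of 200.
    return render(request, '404.html', {}, status=404)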

Code example 2: URL Configuration in Django

The next step is to map the view function to the website’s URLs using the Django URL configuration. Here is an example of the URL configuration in Django:

from django.urls import path, re_path
from . import views

# Used by Django only in the root URLconf and only when DEBUG = False
handler404 = views.error_404 # To handle 404 requests

urlpatterns = [
    # Define your website URL patterns here,
    # Include all URLs to be mapped
]

# Must stay last: patterns added after a catch-all are never reached
urlpatterns += [
    re_path(r'^.*', views.error_404), # Handle any unmatched URL requests
]

Conclusion

In conclusion, the 404 error page is a common occurrence on websites and can be frustrating for users if not handled well. By creating a customized 404 error page and defining a 404 view function that gets triggered when a 404 error occurs, we can provide a better user experience by explaining what happened and pointing the user to pages that do exist. The above mentioned code examples demonstrate how to handle 404 requests in Python, specifically in Django.


HTTP Response Codes

HTTP response codes are three-digit numbers that indicate the outcome of a client’s request to the server. The server sends the code in the status line at the start of the HTTP response, ahead of the headers. The first digit gives the class of the response: 1xx is informational, 2xx is success, 3xx is redirection, 4xx is a client error, and 5xx is a server error. The client uses the code to decide what to do next, for example render the body, follow a redirect, or report an error.
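The status code classes can be expressed in a few lines with the standard library. A minimal sketch: CLASSES, status_class, and describe are names chosen for this example, and http.HTTPStatus supplies the standard reason phrase.

```python
from http import HTTPStatus

# First digit of the status code -> class of the response.
CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code):
    """Return the class name for a three-digit HTTP status code."""
    return CLASSES[code // 100]


def describe(code):
    """Return the code with its standard reason phrase and class."""
    return f"{code} {HTTPStatus(code).phrase} ({status_class(code)})"


print(describe(200))
print(describe(404))
```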

Python Requests Module

Python Requests is a powerful HTTP library that is used to send HTTP requests and process the responses. This module is designed to simplify the process of interacting with web services by abstracting the underlying HTTP protocol. With Python Requests, developers can handle complex HTTP requests with ease, from a simple GET request to a more complex POST request.

Handling HTTP Errors with Python Requests Module

When working with APIs and web services, it is not uncommon to encounter HTTP errors. These errors can occur due to a variety of reasons such as incorrect URLs, server downtime, or invalid credentials. Whenever a request made with the requests module completes, a Response object is returned. This object contains the HTTP status code and additional information about the response. Requests raises exceptions on its own only for transport-level failures such as connection errors and timeouts. A 4xx or 5xx status code does not raise an exception unless you call raise_for_status() on the response.

To handle HTTP errors in your Python code, call response.raise_for_status() inside a try block and catch requests.exceptions.HTTPError in the except block. The caught exception exposes the original response through its response attribute, which gives you more information about the error, such as the HTTP status code or the reason phrase.
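A minimal sketch of that pattern follows. The helper names error_details and fake_response are hypothetical, and the responses are built by hand (an assumption made so the example runs without network access); in real code the response would come from requests.get() or a similar call.

```python
import requests


def error_details(response):
    """Return a dict describing the HTTP error, or None if there was none."""
    try:
        response.raise_for_status()
    except requests.exceptions.HTTPError as exc:
        return {
            "status": exc.response.status_code,
            "reason": exc.response.reason,
            "message": str(exc),
        }
    return None


def fake_response(status_code, reason):
    """Hypothetical helper: hand-built Response, no network involved."""
    response = requests.Response()
    response.status_code = status_code
    response.reason = reason
    return response


details = error_details(fake_response(503, "Service Unavailable"))
print(details["status"], details["reason"])
print(error_details(fake_response(200, "OK")))
```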

Customize 404 Pages

Customizing the 404 page of your website can be a great way to improve user experience. When a user navigates to a page that does not exist, the server responds with a 404 error code. This can be frustrating for the user. By creating a customized 404 page, you can guide the user to other parts of your website, show them related content, or simply apologize for the inconvenience.

To create a customized 404 page, you’ll need to create a custom HTML or Django template that will be displayed when a 404 error occurs. You can then define a 404 view function that returns this template whenever a 404 error occurs. This view function can be mapped to your website’s URL configuration so that it gets displayed whenever a 404 error occurs.

In conclusion, HTTP response codes, the Python Requests module, and customizing 404 pages are important concepts for web developers. Knowing how to handle HTTP errors and customize error pages can greatly improve the user experience of your website.
