Error 999 on a Website

The HTTP response status code 999, known as “Request Denied,” is an unofficial status code that is not defined in the official HTTP specification. It is typically used as a generic or catch-all error code by certain services or hosts, and its specific meaning and implications can vary depending on the provider.

USE CASE

When encountering the 999 Request Denied status code, the interpretation and consequences of the response depend on the specific service or host involved. For example, social media platforms like LinkedIn may use this status code to restrict or block web crawlers or automated requests. The response may be intermittent and limited to a certain duration.
The factors that determine the occurrence of the 999 Request Denied status code can include the source IP address, the user agent string, and the type of HTTP request. In the case of LinkedIn, sending an HTTP GET or HEAD request to a specific profile page may trigger the 999 status code based on factors such as the user agent. Additionally, exceeding the allowed number of HTTP requests in a single day can also result in the 999 Request Denied response, similar to the HTTP 429 Too Many Requests error.
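Since 999 behaves much like 429 in these cases, a client can treat the two codes the same way. Below is a minimal backoff sketch using Python's third-party requests library; the retry counts, delays, and the assumption that any Retry-After header holds a numeric value are illustrative choices, not documented behavior of any particular host.

import time

import requests

RATE_LIMIT_CODES = {429, 999}  # treat the unofficial 999 like the standard 429

def fetch_with_backoff(url, max_retries=3):
    """Fetch a URL, backing off exponentially on rate-limit responses."""
    delay = 1.0
    response = None
    for _ in range(max_retries + 1):
        response = requests.get(url, timeout=10)
        if response.status_code not in RATE_LIMIT_CODES:
            return response
        # Honor Retry-After when present (assumed numeric); otherwise back off.
        retry_after = response.headers.get("Retry-After")
        time.sleep(float(retry_after) if retry_after else delay)
        delay *= 2
    return response  # still rate-limited after exhausting retries

page = fetch_with_backoff("https://example.com/some-profile")
print(page.status_code)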

Example

In the provided example, when attempting to access the LinkedIn profile page of Fili (the creator of the source website) using a tool like curl, the response is the 999 Request Denied status code. However, accessing the same URL through a web browser displays the profile as expected.
It’s important to note that since the 999 status code is unofficial, its behavior and usage can vary among different services or hosts. To understand the specific implications of encountering the 999 Request Denied status code in a particular context, it is recommended to refer to the documentation or support resources provided by the service or host in question.

curl Command


curl -I --url http://www.linkedin.com/in/filwiese

Server Request


HEAD /in/filwiese HTTP/1.1
Host: www.linkedin.com
User-Agent: curl/7.74.0
Accept: */*

Server Response


HTTP/1.1 999 Request Denied
Cache-Control: no-cache, no-store
Pragma: no-cache



As mentioned in our ultimate guide to HTTP status codes, HTTP 999 is unofficial, but it still plays an important role in the flow of our online journeys in unexpected ways. Here's why.

So what is it for?

We all know that the digital highway is held together by something called the Hypertext Transfer Protocol, or HTTP.

“Status codes” are like little green or red traffic signals telling your computer if a web page request is good to go, or if there’s a problem you need to address.

But in this orderly world of codes and signals, there’s a rebel: the HTTP 999 status code.

HTTP Status Code 999: Unofficial but Significant

So, why isn’t 999 an official HTTP status code?

The reason is pretty simple — it’s not listed in the Status Code Definitions outlined by the Internet Engineering Task Force (IETF), which is the closest thing to an official rulebook for HTTP status codes.

The IETF lists a number of three-digit status codes, each with its own meaning, but HTTP 999 is not on the list.

Despite its unofficial status, the HTTP 999 status code is used by some websites as a way to control how much traffic they receive from a single source, such as an IP address.

This usage of the 999 code became more prevalent after experts figured out that rate limiting (setting a limit on how much traffic a server will accept from a single source) provides many benefits, including those below; a minimal sketch of the idea follows the list:

  • protection against data scraping or DDoS (Distributed Denial-of-Service) attacks (attackers cannot overwhelm a network with a high volume of requests)
  • improving user experience by reducing delays
  • lowering costs (you don’t have to pay for additional server capacity)
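To make the rate-limiting idea concrete, here is a minimal token-bucket sketch in Python. It is illustrative only: the refill rate, the bucket capacity, and the choice to answer with 999 rather than the standard 429 are all assumptions, not the behavior of any particular server.

import time

class TokenBucket:
    """Naive per-client token bucket: refills `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate=5.0, capacity=10):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill tokens for the elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

buckets = {}  # one bucket per client IP

def status_for(client_ip):
    """Pick an HTTP status code for a request from client_ip."""
    bucket = buckets.setdefault(client_ip, TokenBucket())
    return 200 if bucket.allow() else 999  # 999 mimics the LinkedIn-style denial

A real server would more typically answer 429 with a Retry-After header; 999 is shown here only because it is the subject of this article.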

What Does the HTTP 999 Status Code Measure?

Simply put, a server that returns HTTP 999 is measuring the frequency of requests coming from a single IP address within a certain period.

When a system detects an overwhelming amount of requests, it responds with the 999 status code, basically telling the client, “You’re overdoing it. Slow down!”

Think of it as a speed limit for data seekers.

Now, this limit isn't a one-size-fits-all scenario; it changes based on the server. As a general rule, with a single CPU core a web server can handle around 250 concurrent requests at a time, so with two CPU cores your server can handle around 500 visitors at the same time, and so on.

That might sound like a good number, but to give you an idea of the reality of big websites, Google processes approximately 63,000 search queries every second, or roughly 5.6 billion searches per day.

Small to medium-sized website servers, however, can handle a lot less before they start throwing up rate limit codes, including our 999.

You might think this can’t happen to you if you have a small website, but a large percentage of internet traffic is made up of web crawlers, both good (like those from search engines) and bad (such as scrapers used by hackers).

Given such high traffic, it’s easy to see how servers would often resort to using the 999 code to keep things running smoothly.

Because the 999 code isn't understood the same way by every website, the response to too many requests can vary: some servers might block the IP address that's firing off too many requests, while others might slow down their response times as a way of protecting themselves.

Where Can You Find the HTTP 999 Status Code?

HTTP 999 has become a standard response that some popular websites, including LinkedIn, use to protect their data.

When LinkedIn detects that a lot of requests are being made from the same IP address or if the requests seem to be automated (as would be the case with a web scraping tool or a bot), it may return an HTTP 999 response to stop these requests. It’s a measure LinkedIn uses to control access to its site, preserve its resources, and protect the data of its users.

Web crawlers that do not respect LinkedIn's robots.txt have to deal with LinkedIn's 999 HTTP response code.

Source: Excellent Web Check

However, the response you get when you’re blocked is not necessarily always 999 — LinkedIn could also return an HTTP 429 error, which is a standard status code for too many requests.

Conclusion

Though unofficial, the HTTP 999 status code serves as a unique solution to an evolving problem, helping you manage web traffic and protect server resources.

However, its usage also highlights the need for a standardized approach to handle similar scenarios.

Are you keeping track of the health of your website and the HTTP status codes it returns?


UptimeRobot can scan your website for error status codes, like 404 (Not Found) or 500 (Internal Server Error), and alert you right away so you can address any issues and minimize downtime.
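If you would rather prototype such a check yourself, a minimal polling sketch follows; the URL, the five-minute interval, and the printed "alert" are placeholders, not a recommendation for production monitoring.

import time

import requests

def monitor(url, interval_seconds=300):
    """Poll url with HEAD requests and report error-class status codes."""
    while True:
        try:
            status = requests.head(url, timeout=10).status_code
        except requests.RequestException as exc:
            print("request failed: %s" % exc)
        else:
            # Any code of 400 or above is a problem; this also catches the unofficial 999.
            if status >= 400:
                print("problem detected: HTTP %s from %s" % (status, url))
        time.sleep(interval_seconds)

# monitor("https://example.com")  # runs until interrupted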

HTTP response status code 999 Request Denied is an unofficial HTTP status code that a server returns as a generic, catch-all error code. The reason for the HTTP response varies based on the service or host.

Usage

When the 999 Request Denied status code is received, the meaning and consequences depend on the host. Some social media sites use it to limit or block web crawlers.

The HTTP response can be intermittent and is sometimes only returned for a limited time. It can vary based on the source IP address, the user agent string, and the type of HTTP request. One service that returns this HTTP status code is LinkedIn.

LinkedIn

When sending an HTTP GET or HEAD request to LinkedIn for a specific profile, such as the page for a company or an individual, the 999 Request Denied status code response is returned based on, among other things, the user agent. It will also be returned if there are too many HTTP requests in a single day. This is similar to the HTTP 429 Too Many Requests error message.

Example

In this example, the client is sending an HTTP HEAD request to the following URL:

https://www.linkedin.com/in/filwiese

Visited in a web browser, this URL returns the profile for Fili (the creator of the source website) on the LinkedIn social networking site. However, using a tool such as curl returns the 999 Request Denied status code.

curl Command

curl -I --url http://www.linkedin.com/in/filwiese

This command creates the following HTTP HEAD request (the -I flag tells curl to fetch headers only); the exact User-Agent header depends on the version of the curl utility.

Request

HEAD /in/filwiese HTTP/1.1
Host: www.linkedin.com
User-Agent: curl/7.74.0
Accept: */*

Response

HTTP/1.1 999 Request Denied
Cache-Control: no-cache, no-store
Pragma: no-cache
… <additional headers including Content-Length, Content-Type, etc>
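The same request can be reproduced from Python. The sketch below assumes the third-party requests library and simply reports whichever status codes come back; whether a browser-like User-Agent changes the outcome is host-dependent and not guaranteed.

import requests

url = "https://www.linkedin.com/in/filwiese"

# Default client identification (User-Agent: python-requests/x.y.z),
# likely to be denied just as curl is.
plain = requests.head(url, timeout=10)

# A browser-like User-Agent string (an example value, not a magic fix).
browser_like = requests.head(
    url,
    timeout=10,
    headers={"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36"},
)

print(plain.status_code, browser_like.status_code)  # e.g. 999 and 999, or 999 and 200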

Takeaway

The 999 Request Denied status code is a generic error message whose consequences differ depending on the server. The LinkedIn social media site returns this HTTP status code for several reasons, including too many HTTP requests submitted in a single day and the user agent making the HTTP request.

See also

  • 429 Too Many Requests
  • curl


We’re using a curl HEAD request in a PHP application to verify the validity of generic links. We check the status code just to make sure that the link the user has entered is valid. Links to all websites have succeeded, except LinkedIn.

While it seems to work locally (Mac), when we attempt the request from any of our Ubuntu servers, LinkedIn returns a 999 status code. Not an API request, just a simple curl like we do for every other link. We’ve tried on a few different machines and tried altering the user agent, but no dice. How do I modify our curl so that working links return a 200?

A sample HEAD request:

curl -I --url https://www.linkedin.com/company/linkedin

Sample Response on Ubuntu machine:

HTTP/1.1 999 Request denied
Date: Tue, 18 Nov 2014 23:20:48 GMT
Server: ATS
X-Li-Pop: prod-lva1
Content-Length: 956
Content-Type: text/html

To respond to @alexandru-guzinschi a little better: we've tried masking the user agents. To sum up our trials:

  • Mac machine + Mac UA => works
  • Mac machine + Windows UA => works
  • Ubuntu remote machine + (no UA change) => fails
  • Ubuntu remote machine + Mac UA => fails
  • Ubuntu remote machine + Windows UA => fails
  • Ubuntu local virtual machine (on Mac) + (no UA change) => fails
  • Ubuntu local virtual machine (on Mac) + Windows UA => works
  • Ubuntu local virtual machine (on Mac) + Mac UA => works

So now I'm thinking they block any curl requests that don't provide an alternate UA, and that they also block hosting providers?

Is there any other way I can check whether a link to LinkedIn is valid, or whether it will lead to their 404 page, from an Ubuntu machine using PHP?
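One pragmatic workaround (sketched here in Python for illustration, though the same logic ports directly to PHP's curl bindings) is to classify 999 as "reachable but blocked" rather than "broken" when validating links; whether that mapping is acceptable depends on the application.

import requests

def link_status(url):
    """Classify a link as 'ok', 'blocked', or 'broken' for validation purposes."""
    try:
        code = requests.head(url, allow_redirects=True, timeout=10).status_code
    except requests.RequestException:
        return "broken"
    if code == 999:
        # The host refused the automated request; the link itself may be fine.
        return "blocked"
    return "ok" if code < 400 else "broken"

print(link_status("https://www.linkedin.com/company/linkedin"))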

I am trying to scrape some web pages from LinkedIn using BeautifulSoup and I keep getting the error "HTTP Error 999: Request denied". Is there a way to work around this error? If you look at my code, I have tried mechanize and urllib2 and both give me the same error.

from __future__ import unicode_literals

import urllib
import urlparse

import mechanize
from bs4 import BeautifulSoup

base_url = "https://www.linkedin.com/job/analytics-%2b-data-jobs-united-kingdom/?sort=relevance&page_num=1"

for page_num in range(2, 10):
    # Rebuild the URL with the current page number in the query string.
    url_parts = list(urlparse.urlparse(base_url))
    query = dict(urlparse.parse_qsl(url_parts[4]))
    query.update({'page_num': page_num})
    url_parts[4] = urllib.urlencode(query)
    page_url = urlparse.urlunparse(url_parts)

    browser = mechanize.Browser()      # use mechanize's browser
    browser.set_handle_robots(False)   # skip fetching/obeying robots.txt
    response = browser.open(page_url)  # raises "HTTP Error 999: Request denied" here

    # Parse the fetched HTML (not the URL string) with BeautifulSoup.
    soup = BeautifulSoup(response.read(), "html.parser")
    print(soup)

asked May 17, 2015 at 15:34 by Deepayan

You should be using the LinkedIn REST API, either directly or using python-linkedin. It allows for direct access to the data, instead of attempting to scrape the JavaScript-heavy web site.

answered May 17, 2015 at 15:51 by MattDMo

Try setting a User-Agent header. Add this line after op.set_handle_robots(False):

op.addheaders = [('User-Agent', "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36")]

Edit: if you want to scrape websites, first check whether the site has an API, or a library that wraps that API.
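For context, a complete minimal version of that approach might look like the sketch below (the User-Agent string is only an example, and LinkedIn may still answer 999, since the UA is just one of the factors it weighs):

import mechanize

br = mechanize.Browser()
br.set_handle_robots(False)   # do not fetch or obey robots.txt
br.addheaders = [('User-Agent',
                  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36')]

try:
    response = br.open('https://www.linkedin.com/job/analytics-%2b-data-jobs-united-kingdom/')
    print(response.code)
except mechanize.HTTPError as err:   # a 999 denial surfaces as an HTTPError
    print(err.code)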

answered May 17, 2015 at 15:52 by f43d65
