The idea above is good, but I had a problem with it: my JSON string contained just one extra double quote.
So I made a fix to the code given above.
The jsonStr was:
{
"api_version": "1.3",
"response_code": "200",
"id": "3237490513229753",
"lon": "38.969916127827",
"lat": "45.069889625267",
"page_url": null,
"name": "ATB",
"firm_group": {
"id": "3237499103085728",
"count": "1"
},
"city_name": "Krasnodar",
"city_id": "3237585002430511",
"address": "Turgeneva, 172/1",
"create_time": "2008-07-22 10:02:04 07",
"modification_time": "2013-08-09 20:04:36 07",
"see_also": [
{
"id": "3237491513434577",
"lon": 38.973110606808,
"lat": 45.029031222211,
"name": "Advance",
"hash": "5698hn745A8IJ1H86177uvgn94521J3464he26763737242Cf6e654G62J0I7878e",
"ads": {
"sponsored_article": {
"title": "Center "ADVANCE",
"text": "Business.English."
},
"warning": null
}
}
]
}
The fix is as follows:
import json, re

def fixJSON(jsonStr):
    # Remove every backslash from the JSON string.
    jsonStr = re.sub(r'\\', '', jsonStr)
    try:
        return json.loads(jsonStr)
    except ValueError:
        while True:
            # Search the JSON string for a stray '"' between word characters or quotes.
            b = re.search(r'[\w|"]\s?(")\s?[\w|"]', jsonStr)
            # If we don't find any, we leave the loop.
            if not b:
                break
            # Get the location of the stray ".
            s, e = b.span(1)
            c = jsonStr[s:e]
            # Replace the " with '.
            c = c.replace('"', "'")
            jsonStr = jsonStr[:s] + c + jsonStr[e:]
        return json.loads(jsonStr)
This code also works for the JSON string mentioned in the problem statement.
Alternatively, you can do this:
def fixJSON(jsonStr):
    # Needs "import json, re". The patterns assume compact JSON
    # with no whitespace around the delimiters.
    # First mark every " that sits where it is supposed to be with a backtick.
    jsonStr = re.sub(r'\\', '', jsonStr)
    jsonStr = re.sub(r'{"', '{`', jsonStr)
    jsonStr = re.sub(r'"}', '`}', jsonStr)
    jsonStr = re.sub(r'":"', '`:`', jsonStr)
    jsonStr = re.sub(r'":', '`:', jsonStr)
    jsonStr = re.sub(r'","', '`,`', jsonStr)
    jsonStr = re.sub(r'",', '`,', jsonStr)
    jsonStr = re.sub(r',"', ',`', jsonStr)
    # The replacement is a plain string: a backslash there would end up in the output.
    jsonStr = re.sub(r'\["', '[`', jsonStr)
    jsonStr = re.sub(r'"\]', '`]', jsonStr)
    # Replace every remaining (unwanted) " with a space.
    jsonStr = re.sub(r'"', ' ', jsonStr)
    # Put the " back wherever a backtick marker was left.
    jsonStr = re.sub(r'`', '"', jsonStr)
    return json.loads(jsonStr)
JSON, which stands for JavaScript Object Notation, is a popular format for representing data in web applications. Python provides an easy way to work with JSON data, but parsing errors can be frustratingly difficult to debug. This guide will provide a comprehensive look at various types of JSON parsing errors in Python and how to effectively debug them.
Understanding JSON Syntax
Before diving into JSON parsing errors, it’s important to understand the syntax of a well-formed JSON object. A JSON object consists of zero or more key-value pairs, enclosed in curly braces ({}). Keys are strings enclosed in double quotes ("") followed by a colon (:) and a value. Values can be one of several types: a string (also enclosed in double quotes), a number, a boolean, null, an array (enclosed in square brackets []) or another JSON object.
Here’s an example of a simple JSON object:
{
"name": "John",
"age": 30,
"isMarried": false,
"hobbies": ["reading", "running", "cooking"],
"address": {
"street": "123 Main St",
"city": "New York",
"state": "NY"
}
}
Common JSON Parsing Errors
Syntax errors (json.JSONDecodeError)
Python’s json module raises json.JSONDecodeError, a subclass of ValueError, when the JSON text is not properly formatted. (A Python SyntaxError appears only if JSON is pasted into Python source code, not when text is parsed with json.loads.) This can occur for several reasons, such as missing quotes, misplaced commas or incorrectly nested curly braces. Here’s an example of a JSON object that fails to parse:
{
"name": "John",
"age": 30,
"isMarried": false,
"hobbies": ["reading", "running", "cooking"]
"address": {
"street": "123 Main St",
"city": "New York",
"state": "NY"
}
}
Note the missing comma after the "hobbies" array, which causes the parse to fail.
To fix a syntax error, read the line and column reported in the exception message and correct the JSON at that position.
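As a small illustration of the missing-comma case, the sketch below parses a shortened version of the document above and reads the location from the exception. The variable names `broken` and `report` are made up for this example; `msg`, `lineno` and `colno` are documented attributes of `json.JSONDecodeError`.

```python
import json

# Shortened version of the example above: the comma after the
# "hobbies" array is missing.
broken = '''{
  "name": "John",
  "hobbies": ["reading", "running", "cooking"]
  "address": {"city": "New York"}
}'''

try:
    json.loads(broken)
except json.JSONDecodeError as exc:
    # The parser reports the position where it expected the comma,
    # which is the start of the next key.
    report = (exc.msg, exc.lineno, exc.colno)
    print(f"{exc.msg} at line {exc.lineno}, column {exc.colno}")
```

The reported line is the one after the actual mistake, because the parser only notices the problem when it reaches the next key.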
ValueError
json.JSONDecodeError is itself a subclass of ValueError, so malformed text can be caught as either. A plain ValueError more often appears after parsing, when code converts a value whose type is not what it expected. Note that json.loads does not check types: "age": 30 and "age": "30" are both valid JSON. Here’s an example of a JSON object whose "age" value leads to a ValueError once the code converts it with int():
{
"name": "John",
"age": "thirty",
"isMarried": false,
"hobbies": ["reading", "running", "cooking"],
"address": {
"street": "123 Main St",
"city": "New York",
"state": "NY"
}
}
Note that the "age" value is a string ("thirty") instead of a number. The document parses without error, but int(person["age"]) then raises a ValueError.
To fix a ValueError, check that the JSON values have the types your code expects, and correct either the data or the conversion.
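To see where the error actually comes from, here is a minimal sketch using a shortened version of the document above. The names `doc`, `age_type` and `conversion_error` are illustrative only. The text parses fine; the failure happens in the later `int()` conversion.

```python
import json

doc = '{"name": "John", "age": "thirty"}'

# Parsing succeeds: "thirty" is a perfectly valid JSON string.
person = json.loads(doc)
age_type = type(person["age"]).__name__

try:
    # The conversion, not the parsing, is what raises ValueError.
    age = int(person["age"])
except ValueError as exc:
    age = None
    conversion_error = str(exc)

print(age_type, age, conversion_error)
```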
KeyError
A KeyError is raised when attempting to retrieve a non-existent key from a JSON object. This can happen when the key is misspelled or when the JSON object is not constructed as expected. Here’s an example of a JSON object that raises a KeyError:
{
"name": "John",
"age": 30,
"isMarried": false,
"hobbies": ["reading", "running", "cooking"],
"address": {
"street": "123 Main St",
"city": "New York",
"state": "NY"
}
}
import json
# my_json is assumed to hold the JSON text shown above as a string
person = json.loads(my_json)
print(person["lastname"])
Note that there is no "lastname" key in the JSON object, so attempting to retrieve it will raise a KeyError.
To fix a KeyError, ensure that the correct key is being used to access the parsed object, or handle the case where the key is absent.
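When a key may legitimately be absent, a few standard dict idioms avoid the exception altogether. The sketch below uses a shortened document; the variable names and the "unknown" fallback are assumptions for illustration.

```python
import json

my_json = '{"name": "John", "age": 30}'
person = json.loads(my_json)

# dict.get returns None (or a supplied default) instead of raising.
lastname = person.get("lastname")
lastname_or_default = person.get("lastname", "unknown")

# Membership test before access.
has_lastname = "lastname" in person

# Catching the error explicitly, when a missing key is exceptional.
try:
    person["lastname"]
except KeyError as exc:
    missing_key = exc.args[0]

print(lastname, lastname_or_default, has_lastname, missing_key)
```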
Debugging JSON Parsing Errors
Debugging JSON parsing errors can be a time-consuming process. Here are some tips to help make the process easier:
- Use a JSON validator such as JSONLint to verify that the JSON syntax is well-formed.
- Use try...except blocks to handle parsing errors gracefully and provide better user feedback.
- Use print() statements to inspect the JSON object and identify the location of parsing errors.
- Break down the JSON object into smaller components to pinpoint where the error may be occurring.
With these tips and a solid understanding of JSON syntax and common parsing errors, you’ll be well-equipped to debug JSON parsing errors in Python.
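The try...except and print() tips can be combined into one small helper. This is only a sketch: `describe_json_error` is a hypothetical name, and the caret alignment assumes the offending line contains no tab characters.

```python
import json

def describe_json_error(text):
    """Return None if text parses, else a short report pointing at the error."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        lines = text.splitlines()
        # lineno and colno are 1-based.
        bad_line = lines[exc.lineno - 1] if exc.lineno <= len(lines) else ""
        pointer = " " * (exc.colno - 1) + "^"
        return f"{exc.msg} (line {exc.lineno}, column {exc.colno})\n{bad_line}\n{pointer}"
    return None

report = describe_json_error('{"a": 1,\n "b": }')
print(report)
```

Returning the report as a string, instead of printing inside the helper, keeps it usable in logs or in user-facing error messages.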
Why do you have a list of both numbers and some other kind of object? It seems like you’re trying to compensate for a design flaw.
As a matter of fact, I want it to work this way because I want to keep data that is already encoded in JsonedData(). I would like the json module to give me some way to insert a ‘raw’ item rather than the default encoding, so that the encoded JsonedData can be reused.
Here’s the code, thanks:
import json
import io

class JsonedData():
    def __init__(self, data):
        self.data = data

def main():
    try:
        for chunk in json.JSONEncoder().iterencode([1, 2, 3, JsonedData(u'4'), 5]):
            print(chunk)
    except TypeError:
        pass  # I'd expect some method to make the printing continue
    # so that printed data is something like:
    # [1
    # ,2
    # ,3
    # ,
    # ,5]
asked Oct 18, 2011 at 2:52
tdihp
Put the try/except inside the loop, around the json.JSONEncoder().encode(item) call:
print("[", end=" ")
lst = [1, 2, 3, JsonedData(u'4'), 5]
for i, item in enumerate(lst):
    try:
        chunk = json.JSONEncoder().encode(item)
    except TypeError:
        pass
    else:
        print(chunk)
    finally:
        # don't print the ',' if this is the last item in the lst
        if i + 1 != len(lst):
            print(",")
print("]")
answered Oct 18, 2011 at 3:01
chown
The skipkeys option for JSONEncoder() only skips dict keys that are not of a basic type, so it will not skip list items that it can’t encode. Instead, create a default method that handles your JsonedData object. See the docs.
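A minimal sketch of the `default` approach, assuming the `JsonedData` class from the question. `JsonedDataEncoder` is a made-up name. Note that this re-encodes the wrapped value through the normal encoder; it does not splice in a string that was already JSON-encoded.

```python
import json

class JsonedData:
    def __init__(self, data):
        self.data = data

class JsonedDataEncoder(json.JSONEncoder):
    def default(self, o):
        # default() is only called for objects the encoder cannot handle itself.
        if isinstance(o, JsonedData):
            return o.data
        # Let the base class raise TypeError for anything else.
        return super().default(o)

encoded = json.dumps([1, 2, 3, JsonedData("4"), 5], cls=JsonedDataEncoder)
print(encoded)  # [1, 2, 3, "4", 5]
```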
answered Oct 18, 2011 at 2:59
imm
When an invalid instance is encountered, a ValidationError will be raised or returned, depending on which method or function is used.
exception jsonschema.exceptions.ValidationError(message: str, validator=<unset>, path=(), cause=None, context=(), validator_value=<unset>, instance=<unset>, schema=<unset>, schema_path=(), parent=None, type_checker=<unset>)
An instance was invalid under a provided schema.
The information carried by an error roughly breaks down into:
- What Happened: message
- Why Did It Happen: context, cause
- What Was Being Validated: instance, json_path, path, schema, schema_path, validator, validator_value
- message: A human readable message explaining the error.
- validator: The name of the failed keyword.
- validator_value: The associated value for the failed keyword in the schema.
- schema: The full schema that this error came from. This is potentially a subschema from within the schema that was passed in originally, or even an entirely different schema if a $ref was followed.
- relative_schema_path: A collections.deque containing the path to the failed keyword within the schema.
- absolute_schema_path: A collections.deque containing the path to the failed keyword within the schema, but always relative to the original schema as opposed to any subschema (i.e. the one originally passed into a validator class, not schema).
- schema_path: Same as relative_schema_path.
- relative_path: A collections.deque containing the path to the offending element within the instance. The deque can be empty if the error happened at the root of the instance.
- absolute_path: A collections.deque containing the path to the offending element within the instance. The absolute path is always relative to the original instance that was validated (i.e. the one passed into a validation method, not instance). The deque can be empty if the error happened at the root of the instance.
- json_path: A JSON path to the offending element within the instance.
- path: Same as relative_path.
- instance: The instance that was being validated. This will differ from the instance originally passed into validate if the validator object was in the process of validating a (possibly nested) element within the top-level instance. The path within the top-level instance (i.e. ValidationError.path) could be used to find this object, but it is provided for convenience.
- context: If the error was caused by errors in subschemas, the list of errors from the subschemas will be available on this property. The schema_path and path of these errors will be relative to the parent error.
- cause: If the error was caused by a non-validation error, the exception object will be here. Currently this is only used for the exception raised by a failed format checker in jsonschema.FormatChecker.check.
- parent: A validation error which this error is the context of. None if there wasn’t one.
In case an invalid schema itself is encountered, a SchemaError is raised.
exception jsonschema.exceptions.SchemaError(message: str, validator=<unset>, path=(), cause=None, context=(), validator_value=<unset>, instance=<unset>, schema=<unset>, schema_path=(), parent=None, type_checker=<unset>)
A schema was invalid under its corresponding metaschema.
The same attributes are present as for ValidationErrors.
These attributes can be clarified with a short example:
from jsonschema import Draft202012Validator

schema = {
    "items": {
        "anyOf": [
            {"type": "string", "maxLength": 2},
            {"type": "integer", "minimum": 5}
        ]
    }
}
instance = [{}, 3, "foo"]
v = Draft202012Validator(schema)
errors = sorted(v.iter_errors(instance), key=lambda e: e.path)
The error messages in this situation are not very helpful on their own.
for error in errors:
    print(error.message)
outputs:
{} is not valid under any of the given schemas
3 is not valid under any of the given schemas
'foo' is not valid under any of the given schemas
If we look at ValidationError.path on each of the errors, we can find out which elements in the instance correspond to each of the errors. In this example, ValidationError.path will have only one element, which will be the index in our list.
for error in errors:
    print(list(error.path))
Since our schema contained nested subschemas, it can be helpful to look at the specific part of the instance and subschema that caused each of the errors. This can be seen with the ValidationError.instance and ValidationError.schema attributes.
With keywords like anyOf, the ValidationError.context attribute can be used to see the sub-errors which caused the failure. Since these errors actually came from two separate subschemas, it can be helpful to look at the ValidationError.schema_path attribute as well to see where exactly in the schema each of these errors come from. In the case of sub-errors from the ValidationError.context attribute, this path will be relative to the ValidationError.schema_path of the parent error.
for error in errors:
    for suberror in sorted(error.context, key=lambda e: e.schema_path):
        print(list(suberror.schema_path), suberror.message, sep=", ")
[0, 'type'], {} is not of type 'string'
[1, 'type'], {} is not of type 'integer'
[0, 'type'], 3 is not of type 'string'
[1, 'minimum'], 3 is less than the minimum of 5
[0, 'maxLength'], 'foo' is too long
[1, 'type'], 'foo' is not of type 'integer'
The string representation of an error combines some of these attributes for easier debugging.
3 is not valid under any of the given schemas

Failed validating 'anyOf' in schema['items']:
    {'anyOf': [{'maxLength': 2, 'type': 'string'},
               {'minimum': 5, 'type': 'integer'}]}

On instance[1]:
    3
ErrorTrees
If you want to programmatically query which validation keywords failed when validating a given instance, you may want to do so using jsonschema.exceptions.ErrorTree objects.
class jsonschema.exceptions.ErrorTree(errors=())
ErrorTrees make it easier to check which validations failed.
- errors: The mapping of validation keywords to the error objects (usually jsonschema.exceptions.ValidationErrors) at this level of the tree.
- __contains__(index): Check whether instance[index] has any errors.
- __getitem__(index): Retrieve the child tree one level down at the given index. If the index is not in the instance that this tree corresponds to and is not known by this tree, whatever error would be raised by instance.__getitem__ will be propagated (usually this is some subclass of LookupError).
- __init__(errors=())
- __iter__(): Iterate (non-recursively) over the indices in the instance with errors.
- __len__(): Return the total_errors.
- __repr__(): Return repr(self).
- __setitem__(index, value): Add an error to the tree at the given index.
- property total_errors: The total number of errors in the entire tree, including children.
Consider the following example:
schema = {
    "type" : "array",
    "items" : {"type" : "number", "enum" : [1, 2, 3]},
    "minItems" : 3,
}
instance = ["spam", 2]
For clarity’s sake, the given instance has three errors under this schema:
v = Draft202012Validator(schema)
for error in sorted(v.iter_errors(["spam", 2]), key=str):
    print(error.message)
'spam' is not of type 'number'
'spam' is not one of [1, 2, 3]
['spam', 2] is too short
Let’s construct a jsonschema.exceptions.ErrorTree so that we can query the errors a bit more easily than by just iterating over the error objects.
from jsonschema.exceptions import ErrorTree
tree = ErrorTree(v.iter_errors(instance))
As you can see, jsonschema.exceptions.ErrorTree takes an iterable of ValidationErrors when constructing a tree so you can directly pass it the return value of a validator’s jsonschema.protocols.Validator.iter_errors method.
ErrorTrees support a number of useful operations. The first one we might want to perform is to check whether a given element in our instance failed validation. We do so using the in operator:
>>> 0 in tree
True
>>> 1 in tree
False
The interpretation here is that index 0 of the instance ("spam") did have an error (in fact it had 2), while index 1 (2) did not (i.e. it was valid).
If we want to see which errors a child had, we index into the tree and look at the ErrorTree.errors attribute.
>>> sorted(tree[0].errors)
['enum', 'type']
Here we see that the enum and type keywords failed for index 0. In fact ErrorTree.errors is a dict, whose values are the ValidationErrors, so we can get at those directly if we want them.
>>> print(tree[0].errors["type"].message)
'spam' is not of type 'number'
Of course this means that if we want to know if a given validation keyword failed for a given index, we check for its presence in ErrorTree.errors:
>>> "enum" in tree[0].errors
True
>>> "minimum" in tree[0].errors
False
Finally, if you were paying close enough attention, you’ll notice that we haven’t seen our minItems error appear anywhere yet. This is because minItems is an error that applies globally to the instance itself. So it appears in the root node of the tree.
>>> "minItems" in tree.errors
True
That’s all you need to know to use error trees.
To summarize, each tree contains child trees that can be accessed by indexing the tree to get the corresponding child tree for a given index into the instance. Each tree and child has an ErrorTree.errors attribute, a dict, that maps the failed validation keyword to the corresponding validation error.
best_match and relevance
The best_match function is a simple but useful function for attempting to guess the most relevant error in a given bunch.
>>> from jsonschema import Draft202012Validator
>>> from jsonschema.exceptions import best_match
>>> schema = {
...     "type": "array",
...     "minItems": 3,
... }
>>> print(best_match(Draft202012Validator(schema).iter_errors(11)).message)
11 is not of type 'array'
jsonschema.exceptions.best_match(errors, key=<function by_relevance.<locals>.relevance>)
Try to find an error that appears to be the best match among given errors.
In general, errors that are higher up in the instance (i.e. for which ValidationError.path is shorter) are considered better matches, since they indicate “more” is wrong with the instance.
If the resulting match is either oneOf or anyOf, the opposite assumption is made – i.e. the deepest error is picked, since these keywords only need to match once, and any other errors may not be relevant.
Parameters:
- errors (collections.abc.Iterable) – the errors to select from. Do not provide a mixture of errors from different validation attempts (i.e. from different instances or schemas), since it won’t produce sensical output.
- key (collections.abc.Callable) – the key to use when sorting errors. See relevance and transitively by_relevance for more details (the default is to sort with the defaults of that function). Changing the default is only useful if you want to change the function that rates errors but still want the error context descent done by this function.
Returns:
the best matching error, or None if the iterable was empty
Note
This function is a heuristic. Its return value may change for a given set of inputs from version to version if better heuristics are added.
jsonschema.exceptions.relevance(validation_error)
A key function that sorts errors based on heuristic relevance.
If you want to sort a bunch of errors entirely, you can use this function to do so. Using this function as a key to e.g. sorted or max will cause more relevant errors to be considered greater than less relevant ones.
Within the different validation keywords that can fail, this function considers anyOf and oneOf to be weak validation errors, and will sort them lower than other errors at the same level in the instance.
If you want to change the set of weak [or strong] validation keywords you can create a custom version of this function with by_relevance and provide a different set of each.
>>> from jsonschema import Draft202012Validator, exceptions
>>> schema = {
...     "properties": {
...         "name": {"type": "string"},
...         "phones": {
...             "properties": {
...                 "home": {"type": "string"}
...             },
...         },
...     },
... }
>>> instance = {"name": 123, "phones": {"home": [123]}}
>>> errors = Draft202012Validator(schema).iter_errors(instance)
>>> [
...     e.path[-1]
...     for e in sorted(errors, key=exceptions.relevance)
... ]
['home', 'name']
jsonschema.exceptions.by_relevance(weak=frozenset({'anyOf', 'oneOf'}), strong=frozenset({}))
Create a key function that can be used to sort errors by relevance.
Parameters:
- weak (set) – a collection of validation keywords to consider to be “weak”. If there are two errors at the same level of the instance and one is in the set of weak validation keywords, the other error will take priority. By default, anyOf and oneOf are considered weak keywords and will be superseded by other same-level validation errors.
- strong (set) – a collection of validation keywords to consider to be “strong”
JSON (JavaScript Object Notation) is a universal data format that is widely used to exchange data between a web server and a client. Parsing problems often come up when working with JSON in Python.
Consider a typical problem. Suppose there is a file with data in JSON format:
{
"maps": [
{
"id": "blabla",
"iscategorical": "0"
},
{
"id": "blabla",
"iscategorical": "0"
}
],
"masks": [
"id": "valore"
],
"om_points": "value",
"parameters": [
"id": "valore"
]
}
And there is a Python script that tries to read this data:
import json
from pprint import pprint

with open('data.json') as f:
    data = json.load(f)

pprint(data)
Running this script raises json.decoder.JSONDecodeError. This happens because the data in the file is not valid JSON. In JSON, key-value pairs may appear only inside an object (curly braces), but in this example the "masks" and "parameters" arrays contain bare key-value pairs that are not wrapped in an object.
To fix the error, make sure that all the data follows the JSON format. In this case, the corrected file could look like this:
{
"maps": [
{
"id": "blabla",
"iscategorical": "0"
},
{
"id": "blabla",
"iscategorical": "0"
}
],
"masks": [
{
"id": "valore"
}
],
"om_points": "value",
"parameters": [
{
"id": "valore"
}
]
}
So when errors occur while parsing JSON in Python, it is important to check carefully that the data conforms to valid JSON.
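The difference between the two files can be checked quickly with in-memory strings instead of data.json. This is a reduced sketch: `broken` and `fixed` keep only the "masks" array, and the variable names are chosen for this example.

```python
import json

# A bare key-value pair inside an array is not valid JSON.
broken = '{"masks": ["id": "valore"]}'
# Wrapping the pair in an object makes it valid.
fixed = '{"masks": [{"id": "valore"}]}'

try:
    json.loads(broken)
    broken_error = None
except json.JSONDecodeError as exc:
    # After the string "id" the parser expects ',' or ']', not ':'.
    broken_error = exc.msg

data = json.loads(fixed)
print(broken_error)
print(data["masks"][0]["id"])
```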