PyMongo And Key Order In Subdocuments

Or, "Why does my query work in the shell but not PyMongo?"

Variations on this question account for a large portion of the Stack Overflow questions I see about PyMongo, so let me explain once for all.

MongoDB stores documents in a binary format called BSON. Key-value pairs in a BSON document can have any order (except that _id is always first).

The mongo shell preserves key order when reading and writing data. Observe that "b" comes before "a" when we create the document and when it is displayed:

> // mongo shell.
> db.collection.insert( {
...     '_id' : 1,
...     'subdocument' : { 'b' : 1, 'a' : 1 }
... } )
WriteResult({ 'nInserted' : 1 })
> db.collection.find()
{ '_id' : 1, 'subdocument' : { 'b' : 1, 'a' : 1 } }

PyMongo represents BSON documents as Python dicts by default, and the order of keys in dicts is not defined. That is, a dict declared with the "a" key first is the same, to Python, as one with "b" first:

>>> print {'a': 1.0, 'b': 1.0}
{'a': 1.0, 'b': 1.0}
>>> print {'b': 1.0, 'a': 1.0}
{'a': 1.0, 'b': 1.0}

Therefore, Python dicts are not guaranteed to show keys in the order they are stored in BSON. Here, "a" is shown before "b":

>>> print collection.find_one()
{u'_id': 1.0, u'subdocument': {u'a': 1.0, u'b': 1.0}}

To preserve order when reading BSON, use the SON class, which is a dict that remembers its key order. First, get a handle to the collection, configured to use SON instead of dict. In PyMongo 3.0 do this like:

>>> from bson import CodecOptions, SON
>>> opts = CodecOptions(as_class=SON)
>>> opts
CodecOptions(as_class=<class 'bson.son.SON'>,
>>> collection_son = collection.with_options(codec_options=opts)

Now, documents and subdocuments in query results are represented with SON objects:

>>> print collection_son.find_one()
SON([(u'_id', 1.0), (u'subdocument', SON([(u'b', 1.0), (u'a', 1.0)]))])

The subdocument’s actual storage layout is now visible: "b" is before "a".

Because a dict’s key order is not defined, you cannot predict how it will be serialized to BSON. But MongoDB considers subdocuments equal only if their keys have the same order. So if you use a dict to query on a subdocument it may not match:

>>> collection.find_one({'subdocument': {'a': 1.0, 'b': 1.0}}) is None

Swapping the key order in your query makes no difference:

>>> collection.find_one({'subdocument': {'b': 1.0, 'a': 1.0}}) is None

… because, as we saw above, Python considers the two dicts the same.

There are two solutions. First, you can match the subdocument field-by-field:

>>> collection.find_one({'subdocument.a': 1.0,
...                      'subdocument.b': 1.0})
{u'_id': 1.0, u'subdocument': {u'a': 1.0, u'b': 1.0}}

The query matches any subdocument with an "a" of 1.0 and a "b" of 1.0, regardless of the order you specify them in Python or the order they are stored in BSON. Additionally, this query now matches subdocuments with additional keys besides "a" and "b", whereas the previous query required an exact match.

The second solution is to use a SON to specify the key order:

>>> query = {'subdocument': SON([('b', 1.0), ('a', 1.0)])}
>>> collection.find_one(query)
{u'_id': 1.0, u'subdocument': {u'a': 1.0, u'b': 1.0}}

The key order you use when you create a SON is preserved when it is serialized to BSON and used as a query. Thus you can create a subdocument that exactly matches the subdocument in the collection.

For more info, see the MongoDB Manual entry on subdocument matching.

Reference: PyMongo And Key Order In Subdocuments from our WCG partner Jesse Davis at the A. Jesse Jiryu Davis blog.

Jesse Davis

Jesse is a senior engineer at MongoDB in New York City. He specializes in Python, MongoDB drivers, and asynchronous frameworks.
Notify of

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Inline Feedbacks
View all comments
Back to top button