[Mongoose] Trouble with Stream Processing Only Default batchSize When Sorting by Unindexed Field
I’ll introduce a troublesome issue in Mongoose where specifying an unindexed field for sort and retrieving with stream only processes the default batchSize of documents and then completes normally.
Case of Bulk Retrieval with find()
When retrieving all documents with find(), the first argument err in the callback contained no value, but the last element of the array contained $err and code.
User.find().sort({
createdAt: -1
}).setOptions({
noCursorTimeout: true
}).exec(function(err, docs) {
console.log(err);
console.log(docs.length);
console.log(docs[docs.length - 1]);
/**
Console.log output results below:
null
1001
{ '$err': 'getMore runner error: Overflow sort stage buffered data usage of 33555763 bytes exceeds internal limit of 33554432 bytes',
code: 17406,
...
**/
});
Case of Retrieving All Documents with stream()
The same applies when retrieving with stream() - the last retrieved document contained $err and code.
var stream = User.find()
.sort({ createdAt: -1 })
.setOptions({ noCursorTimeout: true })
.stream();
stream.on('data', function(user){
/**
Only retrieves default batchSize 1000 documents, gets data containing $err and code as the 1001st document, and completes normally
{ '$err': 'getMore runner error: Overflow sort stage buffered data usage of 33555763 bytes exceeds internal limit of 33554432 bytes',
code: 17406,
...
**/
});
stream.on('error', function(err){
// Error event is not called
});
Solutions
Create an index for the field specified in sort
Since the lack of an index is the root problem, creating an index will solve it.
User.path('createdAt').index(true);
Specify a field with an existing index for sort
If you don’t want to create an index due to index size concerns, you can specify the default _id for sort.
User.find().sort({
_id: -1
}).setOptions({
noCursorTimeout: true
}).exec(function(err, docs) {});
Thoughts
Unless you notice that only 1000 documents are being retrieved, you won’t realize an error has occurred, which seems like poor error handling implementation in Mongoose…
Reference Information
That’s all from the Gemba.