Issue
We have a MongoDB collection that has duplicate revisions for the same transaction.
Our ask is to cleanup this collection and retain only the latest revision of each transaction. I wrote a script that does remove the duplicates but for some records it does not keep the latest revision. The script is available here for reference: https://gist.github.com/praveenkrjha/1f02ed675e297a2c85e9020e39320a41
Initial record (tid is the transaction id and rv is the revision number):
_id: 61b994a48d546d882462da25, tid: b1def2a8-01e0-4f8c-a859-7cd63287617c, rv: 42
_id: 61b994a48d546d882462da27, tid: b1def2a8-01e0-4f8c-a859-7cd63287617c, rv: 42
_id: 61b994a48d546d882462da29, tid: b1def2a8-01e0-4f8c-a859-7cd63287617c, rv: 42
_id: 61b994a48d546d882462da2b, tid: d9b9e197-6f91-480c-8757-2a24330bc4c5, rv: 21
_id: 61b994a48d546d882462da2d, tid: d9b9e197-6f91-480c-8757-2a24330bc4c5, rv: 22
_id: 61b994a48d546d882462da2f, tid: df0f1252-51d3-4471-be68-a81e8b9a11e2, rv: 31
_id: 61b994a48d546d882462da31, tid: df0f1252-51d3-4471-be68-a81e8b9a11e2, rv: 32
_id: 61b994a48d546d882462da33, tid: bc92c1ee-91b4-4eea-b9ad-752f8dba6c54, rv: 11
_id: 61b994a48d546d882462da35, tid: bc92c1ee-91b4-4eea-b9ad-752f8dba6c54, rv: 12
_id: 61b994a48d546d882462da37, tid: 2a131e9a-07f1-413f-a21f-4f1bbbb3d460, rv: 1
_id: 61b994a48d546d882462da39, tid: e2c7fb60-a1e2-46ea-8d6a-e94e14ded2d3, rv: 1
_id: 61b994a48d546d882462da3b, tid: 314f7cb2-1aad-4add-b2e1-17e910d97713, rv: 1
Final result after clean-up:
_id: 61b994a48d546d882462da25, tid: b1def2a8-01e0-4f8c-a859-7cd63287617c, rv: 42
_id: 61b994a48d546d882462da2b, tid: d9b9e197-6f91-480c-8757-2a24330bc4c5, rv: 21
_id: 61b994a48d546d882462da2f, tid: df0f1252-51d3-4471-be68-a81e8b9a11e2, rv: 31
_id: 61b994a48d546d882462da33, tid: bc92c1ee-91b4-4eea-b9ad-752f8dba6c54, rv: 11
_id: 61b994a48d546d882462da37, tid: 2a131e9a-07f1-413f-a21f-4f1bbbb3d460, rv: 1
_id: 61b994a48d546d882462da39, tid: e2c7fb60-a1e2-46ea-8d6a-e94e14ded2d3, rv: 1
_id: 61b994a48d546d882462da3b, tid: 314f7cb2-1aad-4add-b2e1-17e910d97713, rv: 1
As you can see from final result, it deletes the latest revision for some transaction, which should not be done. Any pointers on how that can be achieved?
Solution
Try this one:
db.collection.aggregate([
{ $sort: { rv: -1 } },
{ $group: { _id: "$tid", data: { $first: "$$ROOT" } } },
{ $replaceWith: "$data" }
])
or
db.collection.aggregate([
{ $sort: { rv: 1 } },
{ $group: { _id: "$tid", data: { $last: "$$ROOT" } } },
{ $replaceWith: "$data" }
])
Answered By - Wernfried Domscheit
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.