linermost.blogg.se

Movie collector duplicate records










  1. Movie collector duplicate records movie
  2. Movie collector duplicate records update
  3. Movie collector duplicate records free

Find, delete, or remove duplicates and more. AllDup helps you find and remove duplicate files.

Movie collector duplicate records free

Free up space on your computer, laptop, or flash drive in one click with AllDup, the automatic duplicate file remover. It finds and removes duplicate music files, MP3 files, photos, and files of any other type automatically.

The third situation: the database already contains some data, and you are not sure whether it includes duplicates. If it does, you need to delete the duplicates first and then insert the new data. You may think of distinct, as in MySQL and MongoDB, but pymongo's distinct returns all the different values of a single field. For example, if there are 3 documents whose name fields are Zhang San, Li Si, and Zhang San, distinct on name returns only Zhang San and Li Si. It returns the de-duplicated values directly (and apparently only the values of one field; I don't know whether it can return the values of all fields, that is, the complete documents), but the original duplicate data is still in the database. In addition, when the amount of data is large, distinct fails with the error "distinct too big, 16mb cap". Therefore, aggregate is more appropriate; see the pymongo and MongoDB documentation.
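The aggregate-based cleanup can be sketched roughly as follows. This is only an illustration under stated assumptions, not the original author's code: the helper names `duplicate_groups_pipeline` and `remove_duplicates` are hypothetical, `collection` is assumed to be a pymongo collection, and `name` is assumed to be the field that identifies duplicates.

```python
def duplicate_groups_pipeline(field):
    """Build a pipeline that groups documents by `field` and keeps
    only the groups that contain more than one document."""
    return [
        {"$group": {
            "_id": f"${field}",        # group key: the field's value
            "ids": {"$push": "$_id"},  # collect the _id of every member
            "count": {"$sum": 1},      # count the members of the group
        }},
        {"$match": {"count": {"$gt": 1}}},
    ]


def remove_duplicates(collection, field="name"):
    """Delete all but the first document of each duplicate group.

    `collection` is assumed to expose pymongo-compatible `aggregate`
    and `delete_many` methods. Returns the number of deleted documents.
    """
    deleted = 0
    for group in collection.aggregate(duplicate_groups_pipeline(field)):
        extra_ids = group["ids"][1:]  # keep the first _id, drop the rest
        if extra_ids:
            result = collection.delete_many({"_id": {"$in": extra_ids}})
            deleted += result.deleted_count
    return deleted
```

Unlike distinct, this removes the duplicates from the database itself. Which copy to keep (here simply the first one collected) is a design choice that depends on the data.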

Movie collector duplicate records movie

Note: in reality, different movies can share the same name, but the data crawled in this scenario contains no duplicate names. The more important point here is how to implement de-duplication in MongoDB.

Movie collector duplicate records update

Take crawling movie information as an example. The assumption here is that the md5 generated from name, categories, and score is unique, that is, no other film has the same name, categories, and score at the same time (select the appropriate fields according to the actual situation). You can then use this md5 as the _id value, which achieves de-duplication when inserting data.

If the data returned by the interface comes with an id (or the URL contains one, as on CSDN, where the article link ends in /article/details/ followed by a string of digits), you can use this id directly as _id, because it is unique. With this kind of id, however, it is better to overwrite duplicates than to ignore them: the same article always has the same id, but its content may have been updated, so the data should be overwritten when it is crawled again.

In addition, if you rely on the _id that MongoDB generates itself, later queries by _id need `from bson.objectid import ObjectId`, and the query condition must wrap the value in ObjectId.

Here we declare a save_data method that receives a data parameter, namely the movie details we just extracted. In the method, we call update_one. The first parameter is the query condition, which matches on name. The second parameter is the data object itself, that is, all of the data, wrapped in the $set operator to express an update. The third parameter is the critical one: it is the upsert parameter. If you set it to True, an existing document is updated and a missing one is inserted. Because the match is made on the name field given in the first parameter, this prevents movie data with the same name from appearing in the database.
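A minimal sketch of the save_data method described above. The original code is not shown, so the signature is an assumption: here `collection` is passed in explicitly and is assumed to be a pymongo collection.

```python
def save_data(collection, data):
    """Insert `data`, or update the existing document with the same name.

    Passing `collection` as an argument is an assumption of this sketch;
    it is expected to behave like a pymongo Collection.
    """
    return collection.update_one(
        {"name": data["name"]},  # query condition: match on name
        {"$set": data},          # update: write every field of data
        upsert=True,             # insert when no document matches
    )
```

Calling it twice with the same name leaves a single document holding the most recent values.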
The first situation: the database is new and contains no data. De-duplication then happens at insert time: for each piece of data, check whether it already exists in the database. If it exists, you can either ignore this insertion or overwrite the existing data; if it does not exist, insert it. The value of MongoDB's _id field is unique (similar to a primary key in MySQL). If it is not assigned manually, it is generated automatically during insertion. When inserting data, MongoDB uses the value of _id to decide whether the data is a duplicate, that is, whether the database already holds a document whose _id is identical to the _id of the data being inserted. If a duplicate is found, the insert operation raises DuplicateKeyError.
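One way to build a content-based _id from name, categories, and score, as in the movie example. This is a sketch: the helper names `make_id` and `with_id` and the sample movie are hypothetical, and `categories` is assumed to be a list of strings.

```python
import hashlib


def make_id(data):
    """Derive a deterministic md5 fingerprint from name, categories, score."""
    key = "|".join([
        str(data["name"]),
        ",".join(sorted(data["categories"])),  # sorted so order does not matter
        str(data["score"]),
    ])
    return hashlib.md5(key.encode("utf-8")).hexdigest()


def with_id(data):
    """Return a copy of `data` whose _id is the md5 fingerprint."""
    return {**data, "_id": make_id(data)}


# Hypothetical sample record, for illustration only.
movie = {"name": "Example Movie", "categories": ["Drama"], "score": 9.5}
doc = with_id(movie)
# Inserting `doc` with pymongo's insert_one a second time would raise
# pymongo.errors.DuplicateKeyError, because its _id already exists.
```

Catching that error lets the crawler ignore the repeat, while an update with upsert=True overwrites it instead.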


There are three approaches, each suited to a different situation.











