METHOD, DEVICE AND COMPUTER PROGRAM PRODUCT FOR CLASSIFYING AN OBSCURED OBJECT IN AN IMAGE
20220301324 · 2022-09-22
CPC classification
G06V10/768
PHYSICS
G06V30/248
PHYSICS
G06V20/647
PHYSICS
G06T2200/08
PHYSICS
International classification
G06V10/74
PHYSICS
G06V10/75
PHYSICS
Abstract
The disclosure relates to image recognition, and in particular to a method for classifying an obscured object by identifying an object in an image as an obscured object, calculating a 3D space for the image, defining a 3D coordinate for the obscured object, retrieving a plurality of 3D models from a first database, rendering a 2D representation of each of the retrieved 3D models, calculating a similarity score between each rendered 2D representation and the obscured object, and classifying the obscured object as the object of the 3D model for which the highest similarity score was determined. The disclosure further relates to a device and a computer-readable program for carrying out such a method.
Claims
1. A method for classifying an obscured object in a single 2D image, said obscured object being an object being partly hidden or not fully visible, the method comprising the steps of: identifying an obscured object in the image by: classifying objects in the image using an image search algorithm having an accuracy threshold value; and identifying the obscured object as an object falling below the accuracy threshold value; calculating a 3D coordinate space of the image; defining a 3D coordinate for the obscured object using the 3D coordinate space of the image; retrieving a plurality of 3D models of objects from a first database; for each 3D model of the plurality of 3D models: defining a value for a translation parameter and for a scale parameter for the 3D model corresponding to the 3D coordinate of the obscured object in the 3D coordinate space of the image; for a plurality of values for a rotation parameter of the 3D model: rendering a 2D representation of the 3D model having the defined values of the translation parameter and the scale parameter and the value of the rotation parameter; and calculating a similarity score between the rendered 2D representation and the obscured object; determining a highest similarity score calculated for the plurality of 3D models; and upon determining that the highest similarity score exceeds a threshold similarity score, classifying the obscured object as the object of the 3D model for which the highest similarity score was determined.
2. The method according to claim 1, further comprising the steps of: verifying (S14) the classification of the obscured object by: inputting the 2D representation of the 3D model resulting in the highest similarity score to the image search algorithm; upon the 2D representation exceeding the accuracy threshold value, verifying the classification of the obscured object; and upon the 2D representation being below the accuracy threshold value, not verifying the classification of the obscured object.
3. The method according to claim 1, further comprising the step of determining an object type of the obscured object.
4. The method according to claim 3, wherein the image depicts a scene, and wherein the method further comprises the step of determining a context for said depicted scene, and wherein the object type is determined based on the context.
5. The method according to claim 4, wherein the object type is further determined based on the 3D coordinate of the obscured object in the depicted scene.
6. The method according to claim 3, wherein the object type is determined based on the size of the obscured object, the color of the obscured object, or the shape of the obscured object.
7. The method according to claim 3, wherein the step of retrieving the plurality of 3D models comprises filtering the first database to retrieve a selected plurality of 3D models corresponding to the determined object type.
8. The method according to claim 3, further comprising: requesting input from a user pertaining to the object type of the obscured object; and receiving an input from the user, and wherein the step of determining the object type is based on the input.
9. The method according to claim 1, further comprising: receiving, from the image search algorithm, a list of one or more possible classifications for the obscured object, each possible classification having an associated accuracy below the accuracy threshold value, each possible classification having an associated object type, wherein the step of retrieving the plurality of 3D models comprises filtering the first database to retrieve a selected plurality of 3D models corresponding to one or more of the object types of the possible classifications.
10. The method according to claim 1, wherein the plurality of values for a rotation parameter of the 3D model defines a rotation of the 3D model around a single axis in the 3D coordinate space.
11. The method according to claim 10, wherein the axis is determined by calculating a plane in the 3D coordinate space of the image on which the obscured object is placed; and defining the axis as an axis being perpendicular to said plane.
12. The method according to claim 1, further comprising extracting image data corresponding to the obscured object from the image, adding the extracted image data as an image to be used by the image search algorithm, the added image being associated with the object of the 3D model for which the highest similarity score was determined.
13. The method according to claim 1, wherein the image search algorithm uses a second database comprising a plurality of 2D images, each 2D image depicting one of the objects of the 3D models comprised in the first database, wherein the image search algorithm maps image data extracted from the image and defining an object to the plurality of 2D images in the second database to classify objects in the image, each classification having an accuracy value.
14. The method according to claim 1, wherein the image search algorithm takes the image or part(s) of the image as input, and provides as output: one or more identified objects in the image or part(s) of the image, wherein each identified object is associated with metadata comprising: a list of one or more possible classifications of the identified object, and, for each possible classification, an accuracy of the classification, wherein the step of identifying the obscured object comprises identifying an object among the one or more identified objects where the highest accuracy of the possible classifications in the associated metadata is below the accuracy threshold value.
15. A device for classifying an obscured object in a single 2D image, said obscured object being an object being partly hidden or not fully visible, the device comprising one or more processors configured to: identify an obscured object in the image by: classifying objects in the image using an image search algorithm having an accuracy threshold value; and identifying the obscured object as an object falling below the accuracy threshold value; calculate a 3D coordinate space of the image; define a 3D coordinate for the obscured object using the 3D coordinate space of the image; retrieve a plurality of 3D models of objects from a first database; for each 3D model of the plurality of 3D models: define a value for a translation parameter and for a scale parameter for the 3D model corresponding to the 3D coordinate of the obscured object in the 3D coordinate space of the image; for a plurality of values for a rotation parameter of the 3D model: render a 2D representation of the 3D model having the defined values of the translation parameter and the scale parameter and the value of the rotation parameter; and calculate a similarity score between the rendered 2D representation and the obscured object; determine a highest similarity score calculated for the plurality of 3D models; and upon determining that the highest similarity score exceeds a threshold similarity score, classify the obscured object as the object of the 3D model for which the highest similarity score was determined.
16. The device of claim 15, further comprising a transceiver configured to: receive an image from a mobile device, wherein the transceiver is further configured to, upon determining, by the one or more processors, that the highest similarity score exceeds the threshold similarity score, transmit data indicating the classification of the obscured object to the mobile device, wherein the transceiver is further configured to, upon determining, by the one or more processors, that the highest similarity score does not exceed the threshold similarity score, transmit data indicating unsuccessful classification of the obscured object.
17. A computer program product comprising computer-readable program code to be executed by one or more processors when retrieved from a non-transitory computer-readable medium, the program code including instructions to: identify an obscured object, being an object that is partly hidden or not fully visible, in a single 2D image, by: classifying objects in the image using an image search algorithm having an accuracy threshold value; and identifying the obscured object as an object falling below the accuracy threshold value; calculate a 3D coordinate space of the image; define a 3D coordinate for the obscured object using the 3D coordinate space of the image; retrieve a plurality of 3D models of objects from a first database; for each 3D model of the plurality of 3D models: define a value for a translation parameter and for a scale parameter for the 3D model corresponding to the 3D coordinate of the obscured object in the 3D coordinate space of the image; and for a plurality of values for a rotation parameter of the 3D model: render a 2D representation of the 3D model having the defined values of the translation parameter and the scale parameter and the value of the rotation parameter; and calculate a similarity score between the rendered 2D representation and the obscured object; determine a highest similarity score calculated for the plurality of 3D models; and upon determining that the highest similarity score exceeds a threshold similarity score, classify the obscured object as the object of the 3D model for which the highest similarity score was determined.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0088] The above, as well as additional objects, features and advantages of the present invention, will be better understood through the following illustrative and non-limiting detailed description of preferred embodiments of the present invention, with reference to the appended drawings, where the same reference numerals will be used for similar elements.
DESCRIPTION OF EMBODIMENTS
[0097] The present invention will now be described more fully hereinafter with reference to the accompanying drawings, in which currently preferred embodiments of the invention are shown. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for thoroughness and completeness, and fully convey the scope of the invention to the skilled person.
[0098] It will be appreciated that the present invention is not limited to the embodiments shown. Several modifications and variations are thus conceivable within the scope of the invention which thus is exclusively defined by the appended claims.
[0099] Image recognition is a common tool for searching and scanning images to identify and classify objects within said images. The aim of an image recognition algorithm (sometimes called an image classification algorithm) is to return information about the different objects that are present in an image. As previously mentioned, there are limitations to the methods and programs typically used for classifying objects within an image. Some objects may not be fully visible in the image and are thus difficult to classify due to this partial visibility.
[0100] When an object in an image is not fully visible, it may be partly hidden; such an object is said to be an obscured object. Since part of the object is not visible from the viewpoint, it must be taken into consideration that the object may not look as it is perceived from the viewpoint.
[0101] The method will hereafter be described with reference to
[0102] With reference to
[0103] The image 100 further comprises a plurality of objects, both free-standing objects and obscured objects. A first obscured object 102 is placed on the table 106 behind a visible object, here a bowl 118. A vase 120 is another free-standing visible object placed on the table 106. A second obscured object 104, a chair, is placed behind the table 106. A third obscured object 103 is placed on the table 106, partly hidden behind the vase 120.
[0104] To classify the obscured objects as a specific object, a device comprising one or more processors can be used. The one or more processors may be configured to execute a computer program product comprising code sections having instructions for a method of how to classify an obscured object.
[0105] In order to classify and determine what kind of object the first, second, and third obscured objects 102, 104, 103 are, the first, second, and third obscured objects 102, 104, 103 are first to be identified as being obscured objects. Such a method will now be described in conjunction with
[0106] The step of identifying the obscured object 102, 103, 104 comprises identifying an object among the one or more identified objects 102, 106, 118, 120, 104, 103 where the highest accuracy of the possible classifications in the associated metadata is below the accuracy threshold value.
[0107] Objects that fall at or above the accuracy threshold value are considered visible objects and are classified according to the output of the image search algorithm. As described above, there are many different image search algorithms that may be used.
[0108] The objects falling below the accuracy threshold are identified as being obscured objects. Thereafter, the process of classifying the obscured object takes place.
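The identification step described above can be sketched in code. This is a minimal illustrative sketch, not the claimed implementation: the detection dictionary, the object identifiers, and the 0.8 threshold are all assumed values, and a real image search algorithm would produce the candidate classifications.

```python
# Hypothetical sketch: split detected objects into visible and obscured,
# where an obscured object is one whose best classification accuracy
# falls below the accuracy threshold value. All data here is illustrative.

ACCURACY_THRESHOLD = 0.8  # assumed value

def split_objects(detections, threshold=ACCURACY_THRESHOLD):
    """Partition detected objects into visible and obscured.

    `detections` maps an object id to a list of (classification, accuracy)
    candidates, as produced by the image search algorithm.
    """
    visible, obscured = {}, []
    for obj_id, candidates in detections.items():
        best_label, best_acc = max(candidates, key=lambda c: c[1])
        if best_acc >= threshold:
            visible[obj_id] = best_label   # classified directly
        else:
            obscured.append(obj_id)        # needs 3D-model matching
    return visible, obscured

detections = {
    "obj_118": [("bowl", 0.95)],
    "obj_102": [("cup", 0.41), ("vase", 0.33)],
}
visible, obscured = split_objects(detections)
# visible == {"obj_118": "bowl"}, obscured == ["obj_102"]
```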
[0109] Based on the identified and classified objects, in some embodiments a context of the image is determined. The context may for instance be determined to be a living room given that the identified objects are, for example, a sofa, an armchair, a rug, and a lamp. If the identified objects are a shower, a sink and a toilet, the context may be determined to be a bathroom. The context of the scene of the image may constitute the determination S07 of an object type for the obscured object. It is to be noted that there are many different options for how to determine an object type. By determining S07 an object type, the program may need less processing power in order to accurately classify the obscured object 102, 104, 103.
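One simple way the context determination of step S07 could be realised is a lookup over the classified visible objects. This sketch is an assumption for illustration only; the hint table and the overlap-counting heuristic are not part of the disclosure.

```python
# Illustrative sketch: infer a scene context from the visible, classified
# objects. The CONTEXT_HINTS table is an assumed example, mirroring the
# living-room and bathroom examples in the description.

CONTEXT_HINTS = {
    "living room": {"sofa", "armchair", "rug", "lamp"},
    "bathroom": {"shower", "sink", "toilet"},
}

def determine_context(classified_labels):
    """Pick the context whose hint set overlaps most with the labels."""
    scores = {
        context: len(hints & set(classified_labels))
        for context, hints in CONTEXT_HINTS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

context = determine_context(["sofa", "rug", "lamp", "table"])
# context == "living room"
```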
[0110] The image 100 may be provided as a 2D image. To classify S12 the identified obscured object, a 3D coordinate space for the image is calculated S04.
[0111] To obtain a high-accuracy classification of the obscured object, a 3D coordinate space of the image along the X, Y and Z directions is thus calculated S04. The 3D coordinate space may be determined S04 by applying an algorithm to the image. It is to be noted that there are many algorithms that may be suitable for calculating S04 a 3D coordinate space. By way of example, the 3D coordinate space may be calculated S04 by applying a plane detection algorithm, a RANSAC algorithm, a Hough algorithm, etc., to the image 100.
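As one example of the RANSAC family of algorithms mentioned above, a dominant plane (such as a floor or table top) can be estimated from 3D points by repeated random sampling. The sketch below is a toy standalone version under assumed data and tolerances; production plane detection would operate on real depth or structure-from-motion output.

```python
import random

# Toy RANSAC plane fit: repeatedly sample 3 points, fit a plane, and keep
# the plane with the most inliers. Point data and tolerance are assumed.

def plane_from_points(p1, p2, p3):
    """Plane (a, b, c, d) with ax + by + cz + d = 0 through three points."""
    u = [p2[i] - p1[i] for i in range(3)]
    v = [p3[i] - p1[i] for i in range(3)]
    n = (u[1]*v[2] - u[2]*v[1], u[2]*v[0] - u[0]*v[2], u[0]*v[1] - u[1]*v[0])
    norm = sum(comp * comp for comp in n) ** 0.5
    if norm == 0:
        return None  # degenerate (collinear) sample
    a, b, c = (comp / norm for comp in n)
    d = -(a * p1[0] + b * p1[1] + c * p1[2])
    return (a, b, c, d)

def ransac_plane(points, iterations=200, tol=0.05, seed=0):
    rng = random.Random(seed)
    best_plane, best_inliers = None, []
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue
        a, b, c, d = plane
        inliers = [p for p in points
                   if abs(a*p[0] + b*p[1] + c*p[2] + d) <= tol]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers

# A 5x5 grid of points on the z = 0 plane, plus one outlier:
pts = [(x * 0.1, y * 0.1, 0.0) for x in range(5) for y in range(5)]
pts.append((0.2, 0.2, 1.0))
plane, inliers = ransac_plane(pts)
# the 25 planar points are inliers; the plane normal is (0, 0, +/-1)
```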
[0112] With the use of the 3D coordinate space of the image, a 3D coordinate for the obscured object is defined S06, for example using any one of the above example algorithms. The 3D coordinate contains information regarding the location of the obscured object in the image 100. The 3D coordinate may contain information relating to which object type the obscured object is. The 3D coordinate may contain information regarding the size of the obscured object. The 3D coordinate of the obscured object may be used to determine S07 the object type. By way of example, the 3D coordinate may contain information regarding the obscured object being placed in a single plane of the 3D coordinate space of the image 100. The obscured object may be in the plane of a wall; the object type is thus an object that is suited to be placed on a wall. If the obscured object 104 is determined to be placed on a floor, the program will not consider the obscured object to be, for example, a painting or a ceiling lamp. Accordingly, the processing time of the method for classifying an obscured object may be reduced. In some embodiments, the object type is determined S07 based on the size of the obscured object, the color of the obscured object, or the shape of the obscured object. For example, if the obscured object is determined to be a large-sized object, the object type may be determined as furniture.
[0113] In some embodiments, the program requests an input from a user. The input may be requested with the intention to obtain a user input pertaining to the object type of the obscured object. Accordingly, the device may receive the input made by a user regarding the object type of the obscured object. The user input may be used to determine S07 the object type of the obscured object. By way of example, the user may input that the obscured object is of a ‘cup type’, or ‘suitable to place on a table’, etc. The input may in some embodiments pertain to the context of the depicted scene. By way of example, the user may input that the context of the image is a living room, a bedroom or a hallway.
[0114] The classification of the obscured object is done by comparing the obscured object to a first database (reference 606 in
[0115] The first database 606 may be filtered such that a plurality of 3D models corresponding to the determined object type is retrieved S08 therefrom. The first database 606 may thus be filtered based on the object type, and/or the context, and/or an input by the user. It is to be noted that the first database 606 may be filtered in many ways. For example, the first database may be filtered to only output a selected plurality of 3D models corresponding to one or more object types. The object type(s) to use for filtering may be determined as described above. In one embodiment, a list of one or more possible classifications for the obscured object is received from (outputted by) the image search algorithm, each possible classification having an associated accuracy below the accuracy threshold value. Each possible classification may have an associated object type, which then may be used for filtering.
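The retrieval-with-filtering of step S08 can be sketched as a simple query over model records. The record layout, model identifiers, and type names below are illustrative assumptions; the first database 606 could equally be a relational or object store.

```python
# Hypothetical sketch: filter the first database of 3D models by the
# determined object type(s). The records below are assumed examples.

FIRST_DATABASE = [
    {"model_id": "m1", "name": "mug-A", "object_type": "cup"},
    {"model_id": "m2", "name": "vase-B", "object_type": "vase"},
    {"model_id": "m3", "name": "mug-C", "object_type": "cup"},
    {"model_id": "m4", "name": "chair-D", "object_type": "furniture"},
]

def retrieve_models(database, object_types=None):
    """Return all 3D models, or only those matching the given types."""
    if not object_types:
        return list(database)          # no filter: retrieve everything
    wanted = set(object_types)
    return [m for m in database if m["object_type"] in wanted]

models = retrieve_models(FIRST_DATABASE, object_types={"cup"})
# models contains m1 and m3 only
```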
[0116] Thus, a selected plurality of 3D models may be retrieved S08. This may reduce the needed processing power of the program and processor executing the program code. The selected plurality of 3D models may as described above be based on the context of the image, or the 3D coordinate of the obscured object, etc.
[0117] After the plurality of 3D models is retrieved S08, for each 3D model of the plurality of 3D models, the program defines a value for a translation parameter and for a scale parameter for the 3D model corresponding to the 3D coordinate of the obscured object in the 3D coordinate space of the image. The translation parameter relates to how the 3D model can be moved around in space to match the location of the obscured object. The scale parameter relates to the size of the 3D model in relation to the obscured object. A plurality of values for a rotation parameter of each 3D model in the plurality of 3D models is further determined. The plurality of values for the rotation parameter of the 3D model may define a rotation of the 3D model around a single axis in the 3D coordinate space. The axis may be determined by calculating a plane in the 3D coordinate space of the image on which the obscured object is placed, and defining the axis as an axis perpendicular to said plane. In other embodiments, a plurality of axes is used as the basis for defining the plurality of values for the rotation parameter.
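The single-axis rotation sweep described above can be illustrated as follows. The 15-degree step is an assumed sampling density, and the z-axis is assumed to be the support-plane normal (e.g. vertical for an object standing on a table); neither value is prescribed by the disclosure.

```python
import math

# Illustrative sketch: candidate values for the rotation parameter as a
# sweep of angles about a single axis perpendicular to the supporting
# plane. The 15-degree step and the z-axis choice are assumptions.

def rotation_values(step_degrees=15):
    """Candidate rotation angles (radians) about the support-plane normal."""
    return [math.radians(a) for a in range(0, 360, step_degrees)]

def rotate_about_z(point, angle):
    """Rotate a 3D point about the z-axis (the assumed plane normal)."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

angles = rotation_values()
# 24 candidate orientations; e.g. (1, 0, 0) rotated by 90 degrees -> (0, 1, 0)
```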
[0118] Turning to
[0119] For each 3D model of the retrieved S08 plurality of 3D models, a value for a translation parameter and for a scale parameter for the 3D model corresponding to the 3D coordinate of the obscured object 102, 103, 104 in the 3D coordinate space are defined. Then, for each of the plurality of values of the rotation parameter, the program renders a 2D representation of said 3D model having the defined values of the translation parameter and the scale parameter and the value of the rotation parameter. By comparing the rendered 2D representation with the obscured object, a similarity score between the rendered 2D representation and the obscured object is calculated. For each 3D model of the plurality of 3D models, a highest similarity score is determined. A high similarity score between the obscured object and the 3D model means a better correlation between the obscured object and the 3D model, and thus improves the chance of an accurate classification according to the class/definition/product name/etc. of the 3D model. A low similarity score indicates that the 3D model does not correspond to the obscured object. The highest similarity score for each 3D model is then used for determining S10 a highest similarity score calculated for the plurality of 3D models.
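The scoring loop above can be sketched in a toy form. Here rendering is mocked as returning a set of silhouette pixels, and the similarity score is computed as an intersection-over-union of silhouettes; the disclosure does not prescribe a particular renderer or similarity metric, so both are assumptions for illustration.

```python
# Toy sketch of the render-and-score loop: for each 3D model and each
# candidate rotation, "render" a 2D representation (mocked as a silhouette
# pixel set) and score it against the obscured object's silhouette using
# intersection-over-union. All data below is illustrative.

def iou(pixels_a, pixels_b):
    """Intersection-over-union of two silhouette pixel sets."""
    inter = len(pixels_a & pixels_b)
    union = len(pixels_a | pixels_b)
    return inter / union if union else 0.0

def best_match(obscured_pixels, models, rotations, render):
    """Return (model_id, score) of the best-scoring rendered 2D view."""
    best_id, best_score = None, 0.0
    for model in models:
        for angle in rotations:
            rendered = render(model, angle)      # mocked 2D representation
            score = iou(obscured_pixels, rendered)
            if score > best_score:
                best_id, best_score = model["model_id"], score
    return best_id, best_score

# Mock data: the obscured object's visible silhouette and two models.
obscured = {(0, 0), (0, 1), (1, 0)}
models = [{"model_id": "m1"}, {"model_id": "m2"}]
views = {
    ("m1", 0): {(0, 0), (0, 1), (1, 0), (1, 1)},   # close match
    ("m1", 1): {(5, 5)},
    ("m2", 0): {(9, 9)},
    ("m2", 1): {(8, 8)},
}
render = lambda model, angle: views[(model["model_id"], angle)]
winner, score = best_match(obscured, models, [0, 1], render)
# winner == "m1", score == 0.75
```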
[0120] It should be noted that the above process of calculating a similarity score for each of the retrieved 3D models may be performed in parallel by the device, using parallel computing, or be performed in a distributed manner using a plurality of sub-devices (not shown in
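The per-model parallelism mentioned in the paragraph above could, for example, use a thread pool with one scoring task per retrieved 3D model. The scoring function here is a stand-in (a real one would render and compare as described earlier), and the data is assumed.

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative sketch: evaluate the retrieved 3D models in parallel, one
# scoring task per model. `score_model` is a stand-in for the real
# render-and-compare step; the mock scores below are assumed values.

def score_model(model):
    """Stand-in for render-and-compare; returns (model_id, score)."""
    return model["model_id"], model["mock_score"]

models = [
    {"model_id": "m1", "mock_score": 0.42},
    {"model_id": "m2", "mock_score": 0.91},
    {"model_id": "m3", "mock_score": 0.17},
]

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(score_model, models))

best_id, best_score = max(results, key=lambda r: r[1])
# best_id == "m2", best_score == 0.91
```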
[0121] By way of example using the first obscured object 102 of
[0122] By way of another example using the third obscured object 103 of
[0123] By comparing and calculating a similarity score between the obscured object and the first database 606 having a vast amount of 3D models, the classification may be done independently of the field of view of the image, and the position/rotation of the obscured object in the 3D coordinate space of the image.
[0124] Upon determining S11 that the highest similarity score for all of the retrieved 3D objects 402-408 exceeds a threshold similarity score, the obscured object is classified S12 as the object of the 3D model for which the highest similarity score was determined S10. The obscured object may for example be classified as a certain product ID or product name that corresponds to the selected 3D model. In other words, the obscured object is classified as the object of the 3D model having the highest similarity score. The threshold similarity score determines whether it is likely that the 3D model is a match to the obscured object. A similarity score below the threshold value indicates that the 3D model is not likely to correspond to the obscured object.
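The decision of steps S11 and S12 reduces to a simple threshold test, sketched below. The 0.6 threshold similarity score is an assumed value chosen purely for illustration.

```python
# Sketch of steps S11-S12: classify the obscured object as the
# best-matching 3D model only when its similarity score exceeds the
# threshold similarity score. The 0.6 threshold is an assumed value.

SIMILARITY_THRESHOLD = 0.6  # assumed

def classify(best_model_id, best_score, threshold=SIMILARITY_THRESHOLD):
    """Return the classification, or None when no model is a likely match."""
    if best_score > threshold:
        return best_model_id     # e.g. a product ID or product name
    return None                  # classification unsuccessful

result = classify("chair-D", 0.82)
# result == "chair-D"; a score of 0.3 would instead yield None
```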
[0125] Image data corresponding to the obscured object may be extracted S16 from the image. This image data may be added S18 as an image to be used by the image search algorithm. In such case, the added image may be associated with the object of the 3D model for which the highest similarity score was determined S10. The image search algorithm may use a second database 608 comprising a plurality of 2D images. Each 2D image may depict one of the objects of the 3D models comprised in the first database 606. It is preferred that for each 3D model, the second database 608 comprises at least a minimum number of different images, such as at least 100, 130, 200, etc., images. When using the second database 608 with the image search algorithm, the image search algorithm maps the image data extracted from the image and defining an object to the plurality of 2D images in the second database 608 to classify objects in the image, each classification having an accuracy value.
[0126] In some embodiments, the program comprises code segments that may verify S14 the classification of the obscured object. In such embodiments, the image search algorithm is used to verify S14 the classification of the obscured object. The 2D representation of the 3D model having the highest similarity score is input into the image search algorithm. If the 2D representation exceeds the accuracy threshold value, the object classification is verified. If the 2D representation falls below the accuracy threshold value, the classification of the obscured object is not verified. In some embodiments, the device, or classifying device, 600 comprising one or more processors 602 for performing the method described above further comprises a transceiver 604. The transceiver 604 is configured to receive an image from a mobile device capturing the image. The transceiver 604 is configured to transmit data indicating the classification of the obscured object to the mobile device. The transceiver 604 transmits such data upon determining, by the one or more processors, that the highest similarity score exceeds the threshold similarity score. When the highest similarity score does not exceed the threshold similarity score, the transceiver 604 transmits data to the mobile device indicating that the classification of the obscured object was unsuccessful. Thus, the transceiver 604 sends a message to the mobile device indicating that there was no match for the obscured object in the first database 606 and that no classification of the obscured object was achieved. It is to be noted that the transceiver 604 may transmit the data to the mobile device and the first/second databases 606, 608 through a wired or a wireless connection. The transceiver 604 may comprise a plurality of transceivers, or a plurality of separate receivers and transmitters, for communication with the different entities of the system described in
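The verification step S14 closes the loop by feeding the winning rendered 2D representation back into the image search algorithm. A minimal sketch, with the search algorithm mocked and the 0.8 accuracy threshold assumed:

```python
# Hypothetical sketch of verification step S14: the winning 2D
# representation is re-submitted to the image search algorithm, and the
# classification is verified only if the returned accuracy exceeds the
# accuracy threshold value. The search function and threshold are mocks.

ACCURACY_THRESHOLD = 0.8  # assumed value

def verify_classification(rendered_2d, image_search,
                          threshold=ACCURACY_THRESHOLD):
    """Return True if the rendered view is confidently re-recognised."""
    _, accuracy = image_search(rendered_2d)
    return accuracy > threshold

# Mock search algorithm that "recognises" the rendered view with 0.9 accuracy:
mock_search = lambda view: ("cup", 0.9)
verified = verify_classification("rendered-view-m1", mock_search)
# verified == True
```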
[0127] The person skilled in the art realizes that the present invention by no means is limited to the preferred embodiments described above. On the contrary, many modifications and variations are possible within the scope of the appended claims. For example, step S07 in
[0128] Additionally, variations to the disclosed embodiments can be understood and effected by the skilled person in practicing the claimed invention, from a study of the drawings, the disclosure, and the appended claims. In the claims, the word “comprising” does not exclude other elements or steps, and the indefinite article “a” or “an” does not exclude a plurality. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
[0129] The systems and methods disclosed hereinabove may be implemented as software, firmware, hardware or a combination thereof. In a hardware implementation, the division of tasks between functional units referred to in the above description does not necessarily correspond to the division into physical units; to the contrary, one physical component may have multiple functionalities, and one task may be carried out by several physical components in cooperation. Certain components or all components may be implemented as software executed by a digital signal processor or microprocessor, or be implemented as hardware or as an application-specific integrated circuit. Such software may be distributed on computer readable media, which may comprise computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to a person skilled in the art, the term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information, and which can be accessed by a computer. Further, it is well known to the skilled person that communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.