METHOD AND APPARATUS FOR DEPTH-MAP ESTIMATION OF A SCENE
20220005214 · 2022-01-06
Inventors
- Manu Alibay (Alfortville, FR)
- Olivier Pothier (Sceaux, FR)
- Victor Macela (Paris, FR)
- Alain Bellon (Grenoble, FR)
- Arnaud Bourge (Paris, FR)
CPC classification
G06T7/521
PHYSICS
H04N2013/0081
ELECTRICITY
H04N13/254
ELECTRICITY
H04N13/271
ELECTRICITY
International classification
G06T7/521
PHYSICS
H04N13/254
ELECTRICITY
Abstract
The method of determination of a depth map of a scene comprises generation of a distance map of the scene obtained by time of flight measurements, acquisition of two images of the scene from two different viewpoints, and stereoscopic processing of the two images taking into account the distance map. The generation of the distance map includes generation of distance histograms acquisition zone by acquisition zone of the scene, and the stereoscopic processing includes, for each region of the depth map corresponding to an acquisition zone, elementary processing taking into account the corresponding histogram.
Claims
1. A device comprising: a time of flight sensor configured to generate a distance map of a scene, the time of flight sensor being configured to generate a corresponding distance histogram for each acquisition zone of the scene; and a stereoscopic image acquisition device configured to acquire two images of the scene at two different viewpoints, wherein the device is configured to identify regions of a depth map to be generated from the two images that correspond to the distance map, generate a range of values of disparities, region by region, from extreme values of the distances of the corresponding histogram, and extrapolate distances of the scene from the disparities between the two images, wherein, for each region, the extrapolation of the distances of the scene is performed based on the range of values of the disparities for a corresponding region.
2. The device of claim 1, wherein the device comprises a stereoscopic processor and an elementary processor configured to cause the device to identify the regions, generate the range of disparity values, and extrapolate the distances.
3. The device of claim 1, wherein the depth map has a resolution that is at least one thousand times greater than a resolution of the distance map measured by time of flight, the resolution of the distance map measured by time of flight being equal to a total number of acquisition zones.
4. The device of claim 1, wherein the distance map of the scene obtained by time of flight measurements includes from ten to one thousand acquisition zones.
5. The device of claim 1, wherein the device comprises a camera, a mobile telephone, or a touchscreen tablet.
6. A method for determining a depth map of a scene, the method comprising: generating, at a time of flight sensor, a distance map of a scene, the time of flight sensor being configured to generate a corresponding distance histogram for each acquisition zone of the scene; acquiring, at a stereoscopic image acquisition device, two images of the scene from two different viewpoints; identifying, at the stereoscopic image acquisition device, regions of a depth map to be generated from the two images that correspond to the distance map; generating, at the stereoscopic image acquisition device, a range of values of disparities, region by region, from extreme values of the distances of the corresponding histogram; and extrapolating, at the stereoscopic image acquisition device, distances of the scene from the disparities between the two images, wherein, for each region, the extrapolating of the distances of the scene is performed based on the range of values of the disparities for a corresponding region.
7. The method of claim 6, further comprising identifying, using a stereoscopic processor and an elementary processor, the regions, generating the range of disparity values, and extrapolating the distances.
8. The method of claim 6, wherein the depth map has a resolution that is at least one thousand times greater than a resolution of the distance map measured by time of flight, the resolution of the distance map measured by time of flight being equal to a total number of acquisition zones.
9. The method of claim 6, wherein the distance map of the scene obtained by time of flight measurements includes from ten to one thousand acquisition zones.
10. The method of claim 6, wherein the time of flight sensor and the stereoscopic image acquisition device are integrated in a device, wherein the device comprises a camera, a mobile telephone, or a touchscreen tablet.
11. A device comprising: a time of flight sensor configured to generate a distance map of a scene, the distance map comprising a plurality of acquisition zones of the time of flight sensor; and a stereoscopic image acquisition device configured to acquire two images of the scene at two different viewpoints, wherein the device is configured to determine a depth map by extrapolating distances of the scene from disparities between the two images of the scene, for each of the plurality of acquisition zones, determine a corresponding region of the depth map, for each of the plurality of acquisition zones, compare depth information from the distance map with depth information from the corresponding region of the depth map to obtain a level of concordance, in response to determining that the level of concordance is below a threshold, determine the depth information from the corresponding region of the depth map as being unreliable, and in response to determining that the level of concordance is above the threshold, determine the depth information from the corresponding region of the depth map as being reliable.
12. The device of claim 11, wherein the device comprises a stereoscopic processor and an elementary processor configured to cause the device to determine the depth map, determine the corresponding region of the depth map, compare the depth information, determine that the depth information is unreliable, and determine that the depth information is reliable.
13. The device of claim 11, wherein the device is configured to compare depth information from the distance map with depth information from the corresponding region of the depth map by comparing a contour of a histogram of the acquisition zone with a contour of a histogram of the corresponding region of the depth map.
14. The device of claim 11, wherein the depth map has a resolution that is at least one thousand times greater than a resolution of the distance map measured by time of flight, the resolution of the distance map measured by time of flight being equal to a total number of acquisition zones.
15. The device of claim 11, wherein the distance map of the scene obtained by time of flight measurements includes from ten to one thousand acquisition zones.
16. The device of claim 11, wherein the device comprises a camera, a mobile telephone, or a touchscreen tablet.
17. A method for determining a depth map of a scene, the method comprising: generating, at a time of flight sensor, a distance map of a scene, the distance map comprising a plurality of acquisition zones of the time of flight sensor; acquiring, at a stereoscopic image acquisition device, two images of the scene at two different viewpoints; determining a depth map by extrapolating distances of the scene from disparities between the two images of the scene; for each of the plurality of acquisition zones, determining a corresponding region of the depth map; for each of the plurality of acquisition zones, comparing depth information from the distance map with depth information from the corresponding region of the depth map to obtain a level of concordance; in response to determining that the level of concordance is below a threshold, determining the depth information from the corresponding region of the depth map as being unreliable; and in response to determining that the level of concordance is above the threshold, determining the depth information from the corresponding region of the depth map as being reliable.
18. The method of claim 17, further comprising comparing depth information from the distance map with depth information from the corresponding region of the depth map by comparing a contour of a histogram of the acquisition zone with a contour of a histogram of the corresponding region of the depth map.
19. The method of claim 17, wherein the depth map has a resolution that is at least one thousand times greater than a resolution of the distance map measured by time of flight, the resolution of the distance map measured by time of flight being equal to a total number of acquisition zones.
20. The method of claim 17, wherein the distance map of the scene obtained by time of flight measurements includes from ten to one thousand acquisition zones.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0061] Other advantages and features of the invention will become apparent on examining the detailed description of non-limiting embodiments and the appended drawings.
DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS
[0065] Stereoscopic acquisition involves acquisition of two images 201, 202 of the scene 10 from two different viewpoints.
[0066] The respective projections 111 and 112 of the same object-point of the scene 10 consequently feature a parallax, i.e., a deviation between their relative positions in the images 201, 202, also termed the disparity 211.
[0067] Also, the method includes generation of a distance map 400 obtained by time of flight measurement.
[0068] The generation of the distance map 400 of the scene 10 moreover includes a time of flight measurement, i.e., a measurement of the time elapsed between the emission of a light signal onto the scene and the reception of that signal when reflected. This time measurement is proportional to the distance between an emitter-receiver 40 and the various objects of the scene 10, which distance is also referred to as the depth of the various objects of the scene.
[0069] This measurement is performed for 10 to 500 acquisition zones 420 distributed in a matrix in the field of view of the emitter-receiver 40.
[0070] Geometrical relations between the positions of the viewpoints of the two images 201, 202 and the viewpoint of the time of flight measurement on the one hand and the optical characteristics such as the field of view and the distortions of the various acquisitions on the other hand make it possible to establish precise matches of the various acquisition zones 420 with regions 120 of a depth map 100. In other words, to each acquisition zone 420 of the time of flight measurement there corresponds a region 120 of the depth map.
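The zone-to-region correspondence described above can be sketched minimally in Python. This is not the patented calibration procedure: it assumes, purely for illustration, aligned fields of view, no distortion, and a uniform grid of zones over the image (the 15×9 zone matrix and 12-megapixel image dimensions are example values consistent with the description).

```python
def zone_to_region(zone_row, zone_col, zones=(15, 9), image=(4000, 3000)):
    """Map a time-of-flight acquisition-zone index to the pixel rectangle
    of the corresponding depth-map region, under the simplifying
    assumption of aligned fields of view and a uniform grid.

    Returns (x0, y0, x1, y1) in pixels.
    """
    w = image[0] // zones[0]  # region width in pixels
    h = image[1] // zones[1]  # region height in pixels
    return (zone_col * w, zone_row * h, (zone_col + 1) * w, (zone_row + 1) * h)
```

In practice, the geometrical relations (relative viewpoint positions, fields of view, distortions) would replace this uniform grid with a calibrated per-zone mapping.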
[0071] In some time of flight measurement devices, also known as “time of flight sensors”, the distribution of the distances in each acquisition zone 420 is communicated in the form of a histogram 410 in addition to an overall measurement of the distance to the zone.
[0072] The generation of the distance map 400 therefore includes generation of a distance histogram 410 for each acquisition zone 420 of the scene.
[0073] Moreover, stereoscopic processing 300 is employed, including extrapolation 210 of the distances of the scene from the disparities 211 between the two images 201, 202.
[0074] The stereoscopic processing 300 includes elementary processing 310 before, during and/or after the distance extrapolation 210.
[0075] For each region 120 of the depth map 100 corresponding to an acquisition zone 420 the elementary processing 310 takes into account the histogram 410 of the acquisition zone 420 and notably makes it possible to limit and/or to improve the reliability of the result of the extrapolation 210 and/or to add depth information additional to that obtained by the extrapolation 210.
[0076] According to a first example, the elementary processing 310 advantageously exploits two facts: on the one hand, the maximum distance measurable by time of flight is generally greater than a stereoscopic range ceiling value 312, corresponding to the maximum identifiable distance in the extrapolation 210 of the distances of the scene; on the other hand, the minimum distance measurable by time of flight is generally less than a stereoscopic range floor value 313, corresponding to the minimum identifiable distance in that extrapolation.
[0077] As shown in
[0078] In fact, the disparity in stereoscopy is proportional to the reciprocal of the depth of the corresponding object. At a sufficiently great distance a disparity will therefore no longer be detectable, an optical object situated at infinity introducing no parallax.
[0079] Moreover, at sufficiently small distances, an object can introduce a disparity value that is undetectable because it is greater than a maximum disparity or because of optical distortion caused by the viewing angle.
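The reciprocal relation between disparity and depth invoked above can be sketched as follows. The focal length in pixels is a hypothetical value; the 12 mm deviation between viewpoints is taken from the example given later in the description.

```python
# Standard pinhole-stereo relation: disparity (px) = focal_px * baseline_m / depth_m.
# FOCAL_PX is a hypothetical value chosen only for illustration.
FOCAL_PX = 3000.0
BASELINE_M = 0.012  # 12 mm deviation between the two viewpoints

def depth_to_disparity(depth_m):
    """Disparity is proportional to the reciprocal of depth:
    an object at infinity introduces no parallax."""
    return FOCAL_PX * BASELINE_M / depth_m

def disparity_to_depth(disparity_px):
    """Inverse relation, used by the extrapolation of distances."""
    return FOCAL_PX * BASELINE_M / disparity_px
```

With these example numbers, doubling the depth halves the disparity, which is why distant objects eventually fall below the detectable-disparity floor and very near objects exceed the maximum disparity.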
[0080] For this example and hereinafter, an expression of the type "the histogram does not include any distance less than a ceiling value" must be understood as meaning "the histogram does not include any distance less than a ceiling value in a quantity sufficient to be pertinent." In fact, in time of flight sensors, uniform noise can be present at all distances, and spurious occlusions can introduce a limited number of erroneous measurements into the histograms.
[0081] For example, for a fundamental deviation of 12 mm between the respective viewpoints of the images 201 and 202 and a resolution of 12 megapixels, the stereoscopic range ceiling value 312 can be from approximately 1.5 meters to approximately 2.5 meters, whereas the maximum range of the time of flight measurement can be greater than 4 meters. For its part, the minimum range of the time of flight measurement can be equal to or very close to 0 meters.
[0082] The elementary processing 310 also involves the identification of the corresponding regions 122 termed out-of-range regions.
[0083] The extrapolation 210 is therefore not carried out in the identified out-of-range region(s) 122, where appropriate (it is in fact possible for there to be no out-of-range region in the scene 10).
[0084] Moreover, the elementary processing 310 advantageously includes assigning a default depth 314 obtained from the distances of the corresponding histogram 412 to the at least one out-of-range region 122.
[0085] For example, the default depth can be equal to or greater than the greatest distance from the corresponding histogram 412 or equal to the mean value of the distances of the corresponding histogram 412, the median value of the distances of the corresponding histogram 412 or the distance from the corresponding histogram 412 having the largest population.
[0086] Thus in this first example the elementary processing 310 is done before the extrapolation 210 of distances and makes it possible on the one hand to economize on calculation time and resources and on the other hand to add depth information to regions 122 of the depth map that would not have any such information without the time of flight measurement.
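The default-depth options listed in paragraph [0085] can be sketched as a small helper. This is an illustrative reading of the text, not the patented implementation; the histogram is assumed to arrive as bin centers (distances) with per-bin populations.

```python
import statistics

def default_depth(hist_bins, hist_counts, choice="mode"):
    """Pick a default depth for an out-of-range region from its
    time-of-flight histogram, per the options in the description:
    the largest-population distance, the mean, the median, or the
    greatest distance present in the histogram.
    """
    # Expand the histogram into individual distance samples.
    samples = [d for d, c in zip(hist_bins, hist_counts) for _ in range(c)]
    if choice == "mode":    # distance having the largest population
        return max(zip(hist_counts, hist_bins))[1]
    if choice == "mean":
        return statistics.mean(samples)
    if choice == "median":
        return statistics.median(samples)
    if choice == "max":     # greatest distance from the histogram
        return max(d for d, c in zip(hist_bins, hist_counts) if c > 0)
    raise ValueError(choice)
```

Any of these choices gives the out-of-range region usable depth information that pure stereoscopy could not provide.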
[0087] The elementary processing 310 can also include assigning a constant depth 316 to a region 126 termed a plane region. The constant depth 316 is obtained from a histogram 416 representing a measurement of a substantially plane surface substantially perpendicular to the measurement optical axis.
[0088] A histogram 416 representing a situation of this kind includes a single distance measurement or substantially a single distance measurement, i.e., it is a histogram including a single group of distances the mid-height width 417 of which is below a threshold width. The threshold width may be, for example, less than five distance ranges or five bars of the histogram.
[0089] The constant depth 316 may be chosen as the measured largest population distance or the mean value or the median value from the histogram 416.
[0090] Because of the reliability of a measurement of this kind as to the plane nature of the part of the scene corresponding to the acquisition zone 426, the extrapolation 210 is not needed. Nevertheless, if the group of distances 416 representing a plane surface is not the only measurement present in the histogram (disregarding measurements obtained in an insufficient quantity to be pertinent), then the extrapolation 210 must be carried out in the corresponding region.
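The plane-region test of paragraphs [0087]-[0088] — a single group of distances whose width at mid-height is below a threshold (for example, five bars of the histogram) — can be sketched as follows. The noise-floor handling is a simplifying assumption, not the patent's method.

```python
def is_plane_region(hist_counts, noise_floor=0, width_threshold=5):
    """Detect whether a ToF histogram plausibly represents a plane
    surface perpendicular to the optical axis: a single contiguous
    group of bins whose width at mid-height is below the threshold.
    """
    peak = max(hist_counts)
    if peak <= noise_floor:
        return False  # no pertinent measurement at all
    half = peak / 2.0
    # Bins above half the peak height approximate the mid-height group.
    above = [i for i, c in enumerate(hist_counts) if c > half]
    # A single group means those bins are contiguous.
    contiguous = (above[-1] - above[0] + 1) == len(above)
    return contiguous and len(above) < width_threshold
```

When this test passes, the region can be assigned a constant depth (largest-population, mean, or median distance of the histogram) without running the extrapolation there.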
[0092] In this second embodiment, the elementary processing 310 isolates the extreme values 324 of the distances present in the histograms 410 of the various acquisition zones 420.
[0093] These extreme distance values 324 are translated into extreme disparity values that form a possible range 322 of disparity values for any point of the corresponding region 120.
[0094] In fact, as a disparity value can be extrapolated to a depth, a depth measurement enables calculation of the equivalent disparity.
[0095] This possible range of disparity values 322 therefore enables use of the extrapolation 210 of the distances of the scene over disparity values limited to this range 322 of possible disparity values.
[0096] In other words, the elementary processing 310 includes in this example the generation of a range of disparity values 322 region by region from extreme values 324 of the distances from the corresponding histogram 410. The extrapolation 210 of the distances of the scene is then done for each region based on disparity values 211 included in the corresponding range 322.
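The per-region restriction of the disparity search described above can be sketched as follows. The focal length in pixels is a hypothetical value; the relation used is the standard pinhole-stereo one, assumed here for illustration.

```python
FOCAL_PX = 3000.0   # hypothetical focal length in pixels
BASELINE_M = 0.012  # 12 mm deviation between viewpoints

def disparity_range(hist_distances_m):
    """Translate the extreme distances of a zone's ToF histogram into
    a (min, max) disparity search range for the corresponding region,
    so the matcher only scans plausible disparity values.
    """
    d_near, d_far = min(hist_distances_m), max(hist_distances_m)
    # Disparity is inversely proportional to depth, so the farthest
    # distance gives the smallest disparity and vice versa.
    return (FOCAL_PX * BASELINE_M / d_far,
            FOCAL_PX * BASELINE_M / d_near)
```

Restricting the search to this range both reduces the computation per region and rejects false matches on texture-free or repetitive surfaces, since candidate disparities outside the range are never considered.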
[0097] This embodiment is advantageous on the one hand in terms of the amount of calculation used in each extrapolation 210 of the distances of the scene 10 per region 120.
[0098] On the other hand, this embodiment makes it possible to enhance the reliability of the determination of the depth map, notably under conditions of a surface without texture or a surface with repetitive patterns, for which the identification of the projections 111, 112 of the same object 11 of the scene is difficult and often leads to errors.
[0099] Moreover, the resolution (in the sense of the quantisation step) of a time of flight measurement is constant relative to the distance of the measured object.
[0100] On the other hand, the disparity values are proportional to the reciprocal of the corresponding distance. An elementary disparity variation will therefore induce a resolution (in the sense of the quantisation step) of the estimated distance that becomes coarser as the distance of the measured object increases.
[0101] Therefore, thanks to the range 322 of possible disparity values, it is possible to evaluate a disparity step corresponding to the accuracy of the time of flight measurement, for example, 2 cm, for the range 324 of measured distances. The match between the two projections 111, 112 of the same object 11 in the two images 201, 202 will then be achieved in accordance with this disparity step, enabling optimisation of the extrapolation calculations region by region 120 for a given order of magnitude of distance.
[0103] In this embodiment the depth information obtained by stereoscopy and by time of flight measurement is compared in order to generate a measurement evaluation criterion, in this instance a level of concordance between two histogram contours.
[0104] The elementary processing 310, here carried out after the stereoscopic extrapolation 210 of depths, therefore reconstructs region 120 by region a histogram 330 of the extrapolated distances.
[0105] For each acquisition zone 420 and corresponding region 120, the contour 334 of the histogram 330 of the extrapolated distances is compared to the contour 414 of the histogram 410 of the distances generated by time of flight measurement. Here by contour is to be understood an envelope of the histogram that can be obtained by interpolation of the values. This makes it possible to circumvent the different resolutions of the histograms 330 and 410.
[0106] This comparison can, for example, be carried out by means of a standard comparison method, such as the least squares method.
[0107] The result of the comparison yields a level of concordance that enables evaluation of the similarity between the two types of depth measurement for the same scene.
[0108] A level of concordance close to 1 indicates successful stereoscopic processing, whereas a level of concordance below a concordance threshold indicates a divergence between the two measurement methods and therefore a relatively unreliable depth map.
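One simple way to realise the contour comparison of paragraphs [0105]-[0107] is a normalised least-squares score over the two contours sampled on a common grid. The normalisation and the example threshold below are illustrative assumptions, not the patent's definition of the concordance level.

```python
def concordance(contour_tof, contour_stereo):
    """Level of concordance between the interpolated contour of the
    ToF histogram and that of the extrapolated-distance histogram,
    both sampled on a common grid (which circumvents their different
    resolutions). Returns a score where 1.0 means identical contours.
    """
    num = sum((a - b) ** 2 for a, b in zip(contour_tof, contour_stereo))
    den = sum(a ** 2 for a in contour_tof) or 1.0  # avoid division by zero
    return max(0.0, 1.0 - num / den)

CONCORDANCE_THRESHOLD = 0.9  # hypothetical reliability threshold
```

A score of 1.0 corresponds to perfect agreement between the two depth measurements; a region scoring below the threshold would be flagged as having unreliable stereoscopic depth information.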
[0110] The device DIS includes a stereoscopic image acquisition device 20 configured to implement stereoscopic acquisition as described above, a time of flight sensor 40 configured to generate a distance map 400 of the scene in a manner such as described above, a stereoscopic processor 30 configured to implement the stereoscopic processing 300 in a manner such as described above, and an elementary processor 31 configured to implement the elementary processing 310 in a manner such as described above.
[0111] For example, the stereoscopic acquisition device 20 includes two lenses with a focal length between 20 mm and 35 mm inclusive, a field of view between 60° and 80° inclusive, and parallel optical axes, as well as two 12 megapixel image sensors defining two viewpoints aligned horizontally and spaced by a fundamental deviation of 12 mm.
[0112] For example, the time of flight sensor 40 is positioned between the two viewpoints and is of the compact all-in-one sensor type. The time of flight sensor 40 can operate in the infrared spectrum at 940 nm, have a field of view compatible with that of the stereoscopic acquisition device 20, a range of 3.5 m, an accuracy of 2 cm, low energy consumption (20 mW when idle and 35 mW in operation), a matrix of 5×3 or 15×9 acquisition zones, and an autonomous 32-bit calculation unit.
[0113] The stereoscopic processor 30 and the elementary processor 31 can optionally be integrated into the same integrated circuit, for example a microcontroller-type calculation unit.
[0114] Moreover, the invention is not limited to these embodiments but encompasses all variants thereof; for example, the various embodiments described above can be adapted to the constraints of a particular stereoscopic processing and can moreover be combined with one another. The quantitative information, such as that relating to the performance of the various equipment, is given by way of example within the framework of the definition of a technological context.