SYSTEM AND METHODS FOR AGGREGATING FEATURES IN VIDEO FRAMES TO IMPROVE ACCURACY OF AI DETECTION ALGORITHMS
20210406737 · 2021-12-30
Assignee
Inventors
CPC classification
H04N19/115
ELECTRICITY
A61B1/31
HUMAN NECESSITIES
G06V20/41
PHYSICS
G06T19/00
PHYSICS
G06V10/25
PHYSICS
A61B1/0005
HUMAN NECESSITIES
G06T2219/028
PHYSICS
G06V10/62
PHYSICS
International classification
H04N19/115
ELECTRICITY
Abstract
Methods and systems are provided for aggregating features in multiple video frames to enhance tissue abnormality detection algorithms, wherein a first detection algorithm identifies an abnormality and aggregates adjacent video frames to create a more complete image for analysis by an artificial intelligence detection algorithm, the aggregation occurring in real time as the medical procedure is being performed.
Claims
1. A system for identifying tissue abnormalities in video data generated by an optical endoscopy machine, the endoscopy machine outputting real-time images of an interior of an organ as video frames, the system comprising: at least one video monitor operably coupled to the endoscopy machine to display the video frames output by the endoscopy machine; a memory for storing non-volatile programmed instructions; and a processor configured to accept the video frames output by the endoscopy machine and to store the video frames in the memory, the processor further configured to execute the non-volatile programmed instructions to: analyze a first video frame using artificial intelligence to determine if any part of a first tissue abnormality is visible within the first video frame, and if the first video frame is determined to include the first tissue abnormality, analyze adjacent video frames to locate other parts of the first tissue abnormality; generate a reconstructed image of the first tissue abnormality that spans the first video frame and adjacent video frames in which the other parts of the first tissue abnormality are located; analyze, using artificial intelligence, the reconstructed image to classify the first tissue abnormality; and display on the at least one video monitor a bounding box surrounding a portion of the reconstructed image that is visible in a current video frame.
2. The system of claim 1, wherein the programmed instructions, when executed by the processor, generate the reconstructed image of the first tissue abnormality by aggregating at least one of the following in the first video frame and the adjacent video frames: a boundary of the first tissue abnormality, a color of the first tissue abnormality, and a texture of the first tissue abnormality.
3. The system of claim 1, wherein the programmed instructions, when executed by the processor, generate and display on the at least one video monitor a textual description of a type of the first tissue abnormality.
4. The system of claim 1, wherein the programmed instructions, when executed by the processor, provide that if analysis of the adjacent video frames does not locate other parts of the first tissue abnormality, the first video frame is analyzed using artificial intelligence to classify the first tissue abnormality and a bounding box is displayed on the at least one video monitor surrounding the first tissue abnormality.
5. The system of claim 4, wherein the programmed instructions, when executed by the processor, generate and display on the at least one video monitor a textual description of a type of the first tissue abnormality.
6. The system of claim 1, wherein the processor further is configured to execute the programmed instructions to: analyze the reconstructed image to estimate a degree of completeness of the reconstructed image, and display on the at least one video monitor the estimate of the degree of completeness of the reconstructed image.
7. The system of claim 6, wherein the processor further is configured to execute the programmed instructions to: determine a direction of movement of a camera of the endoscopy machine to acquire additional video frames for use in generating the reconstructed image; and display on the at least one video monitor an indicator of the direction of movement.
8. The system of claim 1, wherein the processor further is configured to execute the programmed instructions to: if analysis of the adjacent video frames detects a second tissue abnormality different from the first tissue abnormality, analyze the adjacent video frames to locate other parts of the second tissue abnormality.
9. The system of claim 1, wherein the programmed instructions, when executed by the processor, generate a reconstructed image of the first tissue abnormality by adding adjacent features extracted from the adjacent video frames to features extracted from the first video frame.
10. The system of claim 1, wherein the programmed instructions that implement the artificial intelligence include a machine learning capability.
11. A method of identifying tissue abnormalities in video data generated by an optical endoscopy machine, the endoscopy machine outputting real-time images of an interior of an organ as video frames, the method comprising: acquiring the video frames output by the endoscopy machine; analyzing a first video frame using artificial intelligence to determine if any part of a first tissue abnormality is visible within the first video frame, and if the first video frame is determined to include the first tissue abnormality, analyzing adjacent video frames to locate other parts of the first tissue abnormality; generating a reconstructed image of the first tissue abnormality that spans the first video frame and adjacent video frames in which the other parts of the first tissue abnormality are located; analyzing, using artificial intelligence, the reconstructed image to classify the first tissue abnormality; and displaying on at least one video monitor the real time images from the endoscopy machine and a bounding box surrounding a portion of the reconstructed image that is visible in a current video frame.
12. The method of claim 11, wherein generating the reconstructed image of the first tissue abnormality comprises aggregating at least one of the following in the first video frame and the adjacent video frames: a boundary of the first tissue abnormality, a color of the first tissue abnormality, and a texture of the first tissue abnormality.
13. The method of claim 11, further comprising generating and displaying on the at least one video monitor a textual description of a type of the first tissue abnormality.
14. The method of claim 11, further comprising, if analysis of the adjacent video frames does not locate other parts of the first tissue abnormality: analyzing the first video frame using artificial intelligence to classify the first tissue abnormality; and displaying a bounding box on the at least one video monitor surrounding the first tissue abnormality.
15. The method of claim 14, further comprising generating and displaying on the at least one video monitor a textual description of a type of the first tissue abnormality.
16. The method of claim 11, further comprising: analyzing the reconstructed image to estimate a degree of completeness of the reconstructed image, and displaying on the at least one video monitor the estimate of the degree of completeness of the reconstructed image.
17. The method of claim 16, further comprising: determining a direction of movement of a camera of the endoscopy machine to acquire additional video frames for use in generating the reconstructed image; and displaying on the at least one video monitor an indicator of the direction of movement.
18. The method of claim 11, further comprising, if analysis of the adjacent video frames detects a second tissue abnormality different from the first tissue abnormality, analyzing the adjacent video frames to locate other parts of the second tissue abnormality.
19. The method of claim 11, further comprising generating a reconstructed image of the first tissue abnormality by adding adjacent features extracted from the adjacent video frames to features extracted from the first video frame.
20. The method of claim 11, further comprising implementing the artificial intelligence to include a machine learning capability.
Description
V. BRIEF DESCRIPTION OF THE DRAWINGS
[0031]
[0032]
[0033]
[0034]
[0035]
VI. DETAILED DESCRIPTION OF THE INVENTION
[0036] The present invention is directed to systems and methods for analyzing multiple video frames imaged by an endoscope with an artificial intelligence (“AI”) software module running on a general purpose or purpose-built computer to aggregate information about a potential tissue feature or abnormality, and to indicate to the endoscopist the location and extent of that feature or abnormality on a display viewed by the endoscopist. In accordance with the principles of the present invention, the AI module is programmed to make a preliminary prediction based on initially available information within a video frame, to aggregate additional information for a feature from additional frames, and preferably, to provide guidance to the endoscopist to direct him or her to move the imaging end of the endoscope to gather additional video frames that will enhance the AI module detection prediction.
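For illustration only, the per-frame aggregation loop described above may be sketched in pseudocode form. This is a minimal, hypothetical sketch, not the disclosed implementation; `detect_partial` and the `Feature` record stand in for the AI module's detectors and internal data structures.

```python
from dataclasses import dataclass, field

@dataclass
class Feature:
    # Illustrative record for one tracked tissue feature or abnormality.
    lesion_id: int
    frames: list = field(default_factory=list)   # indices of contributing frames
    pixels: set = field(default_factory=set)     # aggregated feature data

def aggregate(frames, detect_partial):
    """Scan video frames; when any part of a feature is detected, merge the
    data from adjacent frames showing the same feature into one record."""
    features = {}
    for idx, frame in enumerate(frames):
        hit = detect_partial(frame)              # None, or (lesion_id, pixel_set)
        if hit is None:
            continue
        lesion_id, pixels = hit
        rec = features.setdefault(lesion_id, Feature(lesion_id))
        rec.frames.append(idx)
        rec.pixels |= pixels                     # aggregate additional detail
    return features
```

In this sketch, each call to `detect_partial` corresponds to the preliminary per-frame prediction, and the union of pixel sets corresponds to aggregating information for the feature across frames.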
[0037] Referring to
[0038] Colonoscope 11 acquires real-time video of the interior of the patient's colon and large intestine from a camera disposed at the distal tip of the colonoscope once it is inserted in the patient. Data from colonoscope 11, including real-time video, is processed by computer to generate video output 13. As shown in
[0039] Referring now to
[0040] If at step 25 the lesion in the additional video frames is adjudged to be the same lesion identified in previous frames, features for the lesion are extracted and aggregated by combining information from the previous frame with information from the new frame at step 26. The AI module then reanalyzes the aggregated data for the lesion and updates its detection prediction analysis at step 27. Specifically, at step 26, the software extracts features from the current video frame and compares that data with previously detected features for that same lesion. If the newly extracted data from the current frame adds additional detail, that information is then combined with the data from the prior frame or frames. If the AI module determines that additional images are required, it may issue directions, via the second window, to reposition the colonoscope camera to obtain additional video frames for analysis at step 29. Further details of that process are described below with respect to
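The compare-and-combine operation of step 26 may be illustrated by the following hypothetical sketch, in which feature data for a lesion is represented as a set; the function name and representation are illustrative assumptions, not part of the disclosure.

```python
def update_lesion(prior: set, current: set) -> tuple:
    """Combine features extracted from the current frame with previously
    detected features for the same lesion; report whether the current frame
    added detail (if so, the detection prediction is rerun, per step 27)."""
    new_detail = current - prior
    if not new_detail:
        return prior, False          # nothing new; prior prediction stands
    return prior | new_detail, True  # merged feature set; reanalysis needed
```

The boolean flag models the decision of whether the new frame contributed information warranting an updated prediction.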
[0041] The foregoing process described with respect to
[0042] Still referring to
[0043] In one preferred embodiment, the AI module may use landmarks identified by a machine learning algorithm to provide registration of images between multiple frames. Such anatomical landmarks may include tissue folds, discolored areas of tissue, blood vessels, polyps, ulcers or scars. Such landmarks may be used by the feature extraction algorithms, at step 26, to help determine if the new image(s) provide additional information for analysis, or may be used at step 25 to determine whether a current lesion is the same lesion as in a previous frame or a new lesion, which is assigned a new identifier at step 28.
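One way such landmark-based identity checking could work is sketched below, using the overlap (Jaccard index) of landmark sets observed near a lesion in consecutive frames; the 0.5 threshold and the set-of-labels representation are illustrative assumptions only.

```python
def same_lesion(prev_landmarks: set, cur_landmarks: set,
                threshold: float = 0.5) -> bool:
    """Decide whether a lesion in the current frame is the one seen in a
    previous frame, based on overlap of nearby anatomical landmarks
    (tissue folds, blood vessels, scars, etc.)."""
    if not prev_landmarks and not cur_landmarks:
        return False                 # no landmarks to compare
    overlap = len(prev_landmarks & cur_landmarks)
    union = len(prev_landmarks | cur_landmarks)
    return overlap / union >= threshold
```

A lesion failing this check would be treated as new and assigned a new identifier, as at step 28.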
[0044] Referring now to
[0045] With respect to
[0046] More specifically, in
[0047] Once multiple frames of data are assembled to reconstruct a tissue feature, the reconstructed feature is analyzed by the feature detection algorithms of AI module 48 to generate a prediction and classification for the tissue feature or lesion. If the partial lesion/feature detector of the AI module indicates that additional image frames are required, the process of reconstructing and analyzing the data (now including additional image frames) is repeated, as described with respect to
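The repeat-until-complete behavior described above can be sketched as a simple loop; `detect_fraction`, `classify`, and the completeness threshold are hypothetical stand-ins for the AI module's partial-lesion detector and classifier.

```python
def reconstruct_and_classify(frames, detect_fraction, classify,
                             threshold: float = 0.9):
    """Fold frames into the reconstructed feature until it is judged
    complete enough to classify (or frames are exhausted); returns the
    classification and the final completeness estimate."""
    assembled = set()
    completeness = 0.0
    for frame in frames:
        assembled |= frame                     # add this frame's feature data
        completeness = detect_fraction(assembled)
        if completeness >= threshold:
            break                              # enough data; stop acquiring
    return classify(assembled), completeness
```

When the loop exits below threshold, the system would instead prompt for additional frames, as in the camera-direction guidance described elsewhere in the disclosure.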
[0048] Referring now to
[0049] In the alternative, or in addition, second monitor 55 may include, as an indicator of the completeness of the image acquisition, a progress bar or other visual form of progress report informing the endoscopist about the quality and quantity of data analyzed by the detection and characterization algorithms of the AI module. Second monitor 55 also may include a display including an updated textual classification of an area highlighted in bounding box 52, including a confidence level of that prediction based on the aggregated image data. For example, in
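A text rendering of such a progress report might look like the following sketch; the bar width, percentage formatting, and function name are illustrative choices, not part of the disclosure.

```python
def progress_display(completeness: float, label: str, confidence: float) -> str:
    """Render a progress bar for image-acquisition completeness together
    with the current classification and its confidence level."""
    filled = round(completeness * 10)
    bar = "#" * filled + "-" * (10 - filled)
    return f"[{bar}] {completeness:.0%}  {label} ({confidence:.0%} confidence)"
```

Such a string could be refreshed on second monitor 55 each time the aggregated prediction is updated.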
[0050] Although preferred illustrative embodiments of the present invention are described above, it will be evident to one skilled in the art that various changes and modifications may be made without departing from the invention. It is intended in the appended claims to cover all such changes and modifications that fall within the true spirit and scope of the invention.