
User-specific text record-based format prediction

A method identifies a text region in an electronic document. The method determines, based on a comparison of the text region with predetermined patterns, that the text region includes a candidate text portion that is a candidate for applying a formatting suggestion of a particular formatting type. The method identifies a stored text record that corresponds to the candidate text portion. The method confirms whether the formatting type is appropriate for the candidate text portion based on individual word matches between the candidate text portion and the stored text record. The method notifies a user of the electronic document of the formatting suggestion according to the formatting type.
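
A minimal Python sketch of the word-match confirmation step described above; the tokenization rule and the match-ratio threshold are assumptions, since the abstract does not specify how word matches are counted:

```python
import re

def confirm_formatting(candidate: str, stored_record: str,
                       threshold: float = 0.6) -> bool:
    """Confirm a formatting suggestion by counting individual word matches
    between the candidate text portion and a stored text record.
    The 0.6 match-ratio threshold is a hypothetical choice."""
    def tokenize(s: str) -> list:
        return re.findall(r"\w+", s.lower())

    candidate_words = tokenize(candidate)
    record_words = set(tokenize(stored_record))
    if not candidate_words:
        return False
    matches = sum(1 for w in candidate_words if w in record_words)
    return matches / len(candidate_words) >= threshold

# Example: a stored address record confirms an address-formatting suggestion.
print(confirm_formatting("1600 Amphitheatre Pkwy",
                         "1600 Amphitheatre Pkwy, Mountain View, CA"))
```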

Center-biased machine learning techniques to determine saliency in digital images
11663463 · 2023-05-30

A location-sensitive saliency prediction neural network generates location-sensitive saliency data for an image. The location-sensitive saliency prediction neural network includes, at least, a filter module, an inception module, and a location-bias module. The filter module extracts visual features at multiple contextual levels and generates a feature map of the image. The inception module generates a multi-scale semantic structure based on multiple scales of semantic content depicted in the image. In some cases, the inception module performs parallel analysis of the feature map, such as by multiple parallel layers, to determine the multiple scales of semantic content. The location-bias module generates a location-sensitive saliency map of location-dependent context of the image based on the multi-scale semantic structure and on a bias map. In some cases, the bias map indicates location-specific weights for one or more regions of the image.
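
A minimal PyTorch sketch of the three described modules; the layer sizes, branch kernels, and the learnable bias map are assumptions, not the patented architecture:

```python
import torch
import torch.nn as nn

class LocationSensitiveSaliency(nn.Module):
    """Sketch of a filter module, inception module, and location-bias module."""
    def __init__(self, h: int = 64, w: int = 64):
        super().__init__()
        # Filter module: extracts visual features and produces a feature map.
        self.filter = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        # Inception module: parallel branches capture multiple semantic scales.
        self.branch1 = nn.Conv2d(64, 16, 1)
        self.branch3 = nn.Conv2d(64, 16, 3, padding=1)
        self.branch5 = nn.Conv2d(64, 16, 5, padding=2)
        # Location-bias module: a learnable per-pixel map weights image regions.
        self.bias_map = nn.Parameter(torch.zeros(1, 1, h, w))
        self.head = nn.Conv2d(48, 1, 1)

    def forward(self, x):
        feats = self.filter(x)
        # Parallel analysis of the feature map at multiple scales.
        multi_scale = torch.cat(
            [self.branch1(feats), self.branch3(feats), self.branch5(feats)], dim=1)
        saliency = self.head(multi_scale)
        # Combine the multi-scale structure with the location-specific weights.
        return torch.sigmoid(saliency + self.bias_map)

model = LocationSensitiveSaliency()
out = model(torch.randn(1, 3, 64, 64))  # -> (1, 1, 64, 64) saliency map
```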

Visual-Inertial Positional Awareness for Autonomous and Non-Autonomous Tracking
20230071839 · 2023-03-09

The described positional awareness techniques employ visual-inertial sensory data gathering and analysis hardware. With reference to specific example implementations, they introduce improvements in sensor usage, processing techniques, and hardware design that enable specific embodiments to provide positional awareness to machines with improved speed and accuracy.

INFORMATION GENERATING METHOD AND APPARATUS, DEVICE, STORAGE MEDIUM, AND PROGRAM PRODUCT
20230103340 · 2023-04-06

An information generating method is performed by a computer device. The method includes: obtaining a target image; extracting a semantic feature set and a visual feature set of the target image; performing attention fusion on semantic features and visual features of the target image at n time steps, by processing the semantic feature set and the visual feature set through an attention fusion network in an information generating model, to obtain caption words of the target image at the n time steps; and generating image caption information of the target image based on the caption words of the target image at the n time steps. Through the foregoing method, the advantage of visual features in generating visual vocabulary and the advantage of semantic features in generating non-visual vocabulary are combined, thereby improving the accuracy of the image caption.
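
A hedged PyTorch sketch of per-time-step attention fusion over the two feature sets; the dot-product attention, the GRU decoder, and all dimensions are assumptions standing in for the unspecified attention fusion network:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionFusionDecoder(nn.Module):
    """At each time step, attend over visual and semantic feature sets,
    fuse the two context vectors, and emit a caption word."""
    def __init__(self, feat_dim=256, hidden=256, vocab=1000):
        super().__init__()
        self.rnn = nn.GRUCell(feat_dim, hidden)
        self.q = nn.Linear(hidden, feat_dim)
        self.fuse = nn.Linear(2 * feat_dim, feat_dim)
        self.out = nn.Linear(hidden, vocab)

    def attend(self, query, feats):
        # Dot-product attention over a feature set of shape (B, N, D).
        scores = torch.bmm(feats, query.unsqueeze(2)).squeeze(2)
        weights = F.softmax(scores, dim=1)
        return torch.bmm(weights.unsqueeze(1), feats).squeeze(1)

    def forward(self, visual, semantic, n_steps=10):
        h = visual.new_zeros(visual.size(0), self.rnn.hidden_size)
        words = []
        for _ in range(n_steps):
            q = self.q(h)
            v_ctx = self.attend(q, visual)     # visual context vector
            s_ctx = self.attend(q, semantic)   # semantic context vector
            fused = torch.tanh(self.fuse(torch.cat([v_ctx, s_ctx], dim=1)))
            h = self.rnn(fused, h)
            words.append(self.out(h).argmax(dim=1))  # caption word per step
        return torch.stack(words, dim=1)

decoder = AttentionFusionDecoder()
caption_ids = decoder(torch.randn(2, 36, 256), torch.randn(2, 20, 256))
```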

SALIENCY-BASED INPUT RESAMPLING FOR EFFICIENT OBJECT DETECTION

A processor-implemented method of video processing includes receiving, via an artificial neural network (ANN), a video including a first frame and a second frame. A saliency map is generated based on the first frame of the video. The second frame of the video is sampled based on the saliency map. A first portion of the second frame is sampled at a first resolution and a second portion of the second frame is sampled at a second resolution. The first resolution is different than the second resolution. A resampled second frame is generated based on the sampling of the second frame. The resampled second frame is processed to determine an inference associated with the video.
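
A small NumPy sketch of the resampling step: salient pixels keep the first (full) resolution while the rest are replaced by a coarser second resolution. The factor-of-4 downsampling and the 0.5 saliency threshold are assumptions:

```python
import numpy as np

def resample_frame(frame: np.ndarray, saliency: np.ndarray,
                   low_factor: int = 4, thresh: float = 0.5) -> np.ndarray:
    """Saliency-guided resampling: salient regions stay at full resolution,
    non-salient regions are replaced by a box-downsampled version."""
    h, w = frame.shape[:2]
    # Second resolution: subsample then nearest-neighbor upsample back.
    low = frame[::low_factor, ::low_factor]
    low_up = np.repeat(np.repeat(low, low_factor, axis=0),
                       low_factor, axis=1)[:h, :w]
    mask = (saliency >= thresh)[..., None]  # salient regions from frame 1
    return np.where(mask, frame, low_up)

frame2 = np.random.rand(64, 64, 3)
saliency_map = np.random.rand(64, 64)    # generated from frame 1 by the ANN
resampled = resample_frame(frame2, saliency_map)
```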

PERSONALIZED SUMMARY GENERATION OF DATA VISUALIZATIONS

Various embodiments are generally directed to systems for summarizing data visualizations (i.e., images of data visualizations), such as a graph image. Some embodiments are particularly directed to a personalized graph summarizer that analyzes a data visualization to detect pre-defined patterns within it and produces a textual summary based on the patterns detected. In various embodiments, the personalized graph summarizer may include features that adapt to the preferences of a user for generating an automated, personalized computer-generated narrative. For instance, additional pre-defined patterns may be created for detection and/or the textual summary may be tailored based on user preferences. In some such instances, one or more of the user preferences may be automatically determined by the personalized graph summarizer without requiring the user to explicitly indicate them. Embodiments may integrate machine learning and computer vision concepts.
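
A toy Python sketch of the pattern-detection and preference-tailoring idea, assuming the chart data has already been extracted from the image by a computer-vision front end; the pattern detectors and preference fields are illustrative only:

```python
from dataclasses import dataclass, field

@dataclass
class UserPreferences:
    """Hypothetical preference store; fields are illustrative assumptions."""
    verbose: bool = False
    patterns_of_interest: set = field(default_factory=lambda: {"trend", "peak"})

def detect_patterns(series):
    """Toy detectors standing in for the pre-defined visual patterns."""
    found = []
    if series[-1] > series[0]:
        found.append(("trend", "an overall upward trend"))
    peak = max(series)
    if peak > 2 * (sum(series) / len(series)):
        found.append(("peak", f"a pronounced peak at value {peak}"))
    return found

def summarize(series, prefs: UserPreferences) -> str:
    # Tailor the narrative to the patterns this user cares about.
    parts = [desc for name, desc in detect_patterns(series)
             if name in prefs.patterns_of_interest]
    if not parts:
        return "No notable patterns were detected."
    summary = "The chart shows " + " and ".join(parts) + "."
    if prefs.verbose:
        summary += f" It covers {len(series)} data points."
    return summary

print(summarize([1, 2, 9, 3, 4, 5], UserPreferences(verbose=True)))
```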

TEXTURE EVALUATION SYSTEM

The present disclosure attempts to evaluate how the texture of an object is perceived based on visual features of the topological skeleton of the object. A camera S1 obtains a color image by capturing an image of an object, which serves as an evaluation target. Within the obtained image, a visual feature area, which is likely to catch a person's eye when the person looks at the object, and an intensity of a visual stimulus at each pixel of the visual feature area are extracted. Visual skeleton features of each pixel of the image are determined within a contour region composed of the extracted visual feature areas. The determined visual skeleton features are shown on a display.
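
One plausible reading of this pipeline, sketched with scikit-image; using a gradient map as the visual-stimulus intensity and Otsu thresholding to delimit the visual feature area are assumptions:

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.filters import sobel, threshold_otsu
from skimage.morphology import skeletonize

def visual_skeleton_features(rgb_image: np.ndarray):
    """Estimate a per-pixel visual-stimulus intensity, threshold it into
    visual feature areas, and compute the topological skeleton of the
    resulting contour region."""
    gray = rgb2gray(rgb_image)
    stimulus = sobel(gray)                              # visual stimulus per pixel
    feature_area = stimulus > threshold_otsu(stimulus)  # eye-catching regions
    skeleton = skeletonize(feature_area)                # visual skeleton features
    return stimulus, feature_area, skeleton

stim, area, skel = visual_skeleton_features(np.random.rand(64, 64, 3))
```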

Automatic ground truth generation for medical image collections

Methods and arrangements for automatic ground truth generation for medical image collections. Aspects include receiving a plurality of imaging studies, wherein each imaging study includes one or more images and a textual report associated with the one or more images. Aspects also include selecting a key image from the one or more images of each of the plurality of imaging studies and extracting one or more discriminating image features from a region of interest within the key image. Aspects further include processing the textual report associated with the one or more images to detect one or more concept labels, assigning an initial label from the one or more concept labels to the one or more discriminating image features, and learning an association between each of the one or more discriminating image features and the one or more concept labels.
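
A schematic Python sketch of the described flow; the key-image heuristic, the ROI feature vector, and the concept vocabulary are all stand-in assumptions:

```python
import re
import numpy as np

CONCEPTS = {"pneumonia", "effusion", "cardiomegaly"}  # illustrative label set

def select_key_image(images):
    """Assumed heuristic: pick the highest-contrast image as the key image."""
    return max(images, key=lambda im: im.std())

def extract_features(image, roi):
    """Stand-in for discriminating image features: mean/std over the ROI."""
    y0, y1, x0, x1 = roi
    patch = image[y0:y1, x0:x1]
    return np.array([patch.mean(), patch.std()])

def detect_concept_labels(report: str):
    """Detect concept labels mentioned in the textual report."""
    words = set(re.findall(r"[a-z]+", report.lower()))
    return CONCEPTS & words

def build_ground_truth(studies):
    """Pair ROI features with report-derived initial labels, ready for
    learning an association between features and concept labels."""
    pairs = []
    for images, report, roi in studies:
        key = select_key_image(images)
        feats = extract_features(key, roi)
        for label in detect_concept_labels(report):
            pairs.append((feats, label))
    return pairs

study = ([np.random.rand(32, 32) for _ in range(3)],
         "Findings consistent with pleural effusion.", (8, 24, 8, 24))
print(build_ground_truth([study]))
```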

SYSTEM AND METHOD FOR THE FUSION OF BOTTOM-UP WHOLE-IMAGE FEATURES AND TOP-DOWN ENTITY CLASSIFICATION FOR ACCURATE IMAGE/VIDEO SCENE CLASSIFICATION

Described is a system and method for accurate image and/or video scene classification. More specifically, described is a system that makes use of a specialized convolutional neural network (hereafter CNN) based technique for the fusion of bottom-up whole-image features and top-down entity classification. When the two parallel and independent processing paths are fused, the system provides an accurate classification of the scene as depicted in the image or video.
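
A minimal PyTorch sketch of the two-path fusion; layer sizes, the entity-score input, and the late-concatenation fusion point are assumptions:

```python
import torch
import torch.nn as nn

class FusionSceneClassifier(nn.Module):
    """Bottom-up whole-image CNN path fused with a top-down
    entity-classification path before the scene prediction."""
    def __init__(self, n_entities=80, n_scenes=10):
        super().__init__()
        # Bottom-up path: whole-image features from a small CNN.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
            nn.Flatten(), nn.Linear(16 * 16, 128), nn.ReLU(),
        )
        # Top-down path: embeds detected-entity scores (e.g., "car", "tree").
        self.entity = nn.Sequential(nn.Linear(n_entities, 128), nn.ReLU())
        self.classifier = nn.Linear(256, n_scenes)

    def forward(self, image, entity_scores):
        # Fuse the two parallel, independent processing paths.
        fused = torch.cat([self.cnn(image), self.entity(entity_scores)], dim=1)
        return self.classifier(fused)

model = FusionSceneClassifier()
logits = model(torch.randn(2, 3, 64, 64), torch.rand(2, 80))
```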

Explainable artificial intelligence (AI) based image analytic, automatic damage detection and estimation system

An Artificial Intelligence (AI) based automatic damage detection and estimation system receives images of a damaged object. The images are converted into monochrome versions if needed and analyzed by an ensemble machine learning (ML) cause prediction model that includes a plurality of sub-models, each trained to identify a cause of damage to a corresponding portion of the damaged object from a plurality of causes. In addition, an explanation for the selection of the cause from the plurality of causes is also provided. The explanation includes the image portions and pixels that enabled the cause prediction model to select the cause of damage. An ML parts identification model is also employed to identify and label parts of the damaged object that are repairable and parts that are damaged and need replacement. A cost estimate for the repair and restoration of the damaged object can also be generated.
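
A toy Python sketch of the ensemble-plus-explanation idea, operating at image-portion granularity for brevity; the cause set, the grid split, and the score-based attribution are assumptions, and the sub-model is a random stand-in for a trained classifier:

```python
import numpy as np

CAUSES = ["collision", "hail", "fire", "flood"]  # illustrative cause set

def sub_model(portion: np.ndarray) -> np.ndarray:
    """Stand-in for a trained per-portion cause classifier (score per cause)."""
    rng = np.random.default_rng(int(portion.sum() * 1e6) % (2**32))
    return rng.random(len(CAUSES))

def predict_cause_with_explanation(image: np.ndarray, grid: int = 2):
    """Ensemble over image portions; the explanation reports which portions
    contributed most to the selected cause."""
    h, w = image.shape[:2]
    votes, contributions = np.zeros(len(CAUSES)), []
    for i in range(grid):
        for j in range(grid):
            portion = image[i * h // grid:(i + 1) * h // grid,
                            j * w // grid:(j + 1) * w // grid]
            scores = sub_model(portion)
            votes += scores
            contributions.append(((i, j), scores))
    cause_idx = int(votes.argmax())
    # Explanation: portions ranked by their score for the selected cause.
    ranked = sorted(contributions, key=lambda c: -c[1][cause_idx])
    return CAUSES[cause_idx], [pos for pos, _ in ranked]

cause, salient_portions = predict_cause_with_explanation(np.random.rand(64, 64))
print(cause, salient_portions)
```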