G06V10/464

IMAGE SEARCHING APPARATUS, CLASSIFIER TRAINING METHOD, AND RECORDING MEDIUM
20220129702 · 2022-04-28 ·

An image searching apparatus includes: a processor; and a memory, wherein the processor is configured to attach, to an image with a first correct label attached thereto, a second correct label, the first correct label being a correct label attached to each image included in an image dataset for use in supervised training, the second correct label being a correct label based on a degree of similarity from a predetermined standpoint; execute main training processing to train a classifier by using the images and one of the first correct label and the second correct label; fine-tune a training state of the classifier, trained by the main training processing, by using the images and the other one of the first correct label and the second correct label; and search, by using the classifier that is fine-tuned, for images similar to a query image.
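The two-label scheme above can be sketched minimally: each image keeps its original (first) label and receives a second label grouping similar classes, and training runs in two phases over the two label sets. The class names, grouping map, and `train` placeholder below are illustrative, not from the patent.

```python
# Hypothetical similarity grouping: the second correct label merges classes
# that are similar from a chosen standpoint (here, visual appearance).
SIMILARITY_GROUPS = {
    "husky": "dog-like", "wolf": "dog-like", "corgi": "dog-like",
    "tabby": "cat-like", "lynx": "cat-like",
}

def attach_second_label(dataset):
    """dataset: list of (image, first_label) -> list of (image, first, second)."""
    return [(img, lab, SIMILARITY_GROUPS[lab]) for img, lab in dataset]

def train(dataset, label_index, epochs):
    """Stand-in for main training / fine-tuning on the chosen label set."""
    labels = sorted({row[label_index] for row in dataset})
    return {"classes": labels, "epochs": epochs}

data = attach_second_label([("img0", "husky"), ("img1", "tabby"), ("img2", "wolf")])
coarse = train(data, 2, epochs=10)   # main training on the second (coarse) labels
fine = train(data, 1, epochs=3)      # fine-tune on the first (fine) labels
```

The patent allows either order: main training on one label set, fine-tuning on the other.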

Virtual user input controls in a mixed reality environment
11720223 · 2023-08-08 ·

A wearable display system can automatically recognize a physical remote, or a device that the remote serves, using computer vision techniques. The wearable system can generate a virtual remote with a virtual control panel viewable and interactable by a user of the wearable system. The virtual remote can emulate the functionality of the physical remote. The user can select a virtual remote for interaction, for example, by looking or pointing at the parent device or its remote control, or by selecting from a menu of known devices. The virtual remote may include a virtual button, which is associated with a volume in the physical space. The wearable system can detect that a virtual button is actuated by determining whether a portion of the user's body (e.g., the user's finger) has penetrated the volume associated with the virtual button.
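The actuation test described above reduces to a point-in-volume check. A minimal sketch, assuming an axis-aligned box volume and a tracked fingertip position (the bounds and coordinates below are illustrative):

```python
def inside(volume, point):
    """volume: ((xmin, ymin, zmin), (xmax, ymax, zmax)); point: (x, y, z)."""
    lo, hi = volume
    return all(l <= p <= h for l, p, h in zip(lo, point, hi))

# Hypothetical volume-up button occupying a small box in the headset frame (metres).
volume_up = ((0.10, 0.20, 0.50), (0.14, 0.24, 0.55))
fingertip = (0.12, 0.22, 0.52)   # tracked fingertip position

if inside(volume_up, fingertip):
    print("volume-up actuated")
```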

CONTENT EXTRACTION BASED ON GRAPH MODELING
20220129688 · 2022-04-28 ·

Methods and systems are presented for extracting categorizable information from an image using a graph that models data within the image. Upon receiving an image, a data extraction system identifies characters in the image. The data extraction system then generates bounding boxes that enclose adjacent characters that are related to each other in the image. The data extraction system also creates connections between the bounding boxes based on locations of the bounding boxes. A graph is generated based on the bounding boxes and the connections such that the graph can accurately represent the data in the image. The graph is provided to a graph neural network that is configured to analyze the graph and produce an output. The data extraction system may categorize the data in the image based on the output.
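The bounding-box and connection steps can be sketched as follows: horizontally adjacent character boxes merge into one box, and boxes on roughly the same line are connected. The gap and line thresholds are arbitrary placeholders, not values from the patent.

```python
def merge_chars(chars, gap=5):
    """chars: list of (x, y, w, h) per character, sorted left-to-right.
    Characters closer than `gap` pixels on the same line share one box."""
    boxes = []
    for x, y, w, h in chars:
        if boxes and x - (boxes[-1][0] + boxes[-1][2]) <= gap and abs(y - boxes[-1][1]) <= 2:
            bx, by, bw, bh = boxes[-1]
            boxes[-1] = (bx, by, (x + w) - bx, max(bh, h))   # extend previous box
        else:
            boxes.append((x, y, w, h))
    return boxes

def connect(boxes, max_dy=10):
    """Create connections between boxes whose vertical positions are close."""
    return [(i, j)
            for i, a in enumerate(boxes)
            for j, b in enumerate(boxes[i + 1:], i + 1)
            if abs(a[1] - b[1]) <= max_dy]

boxes = merge_chars([(0, 0, 8, 10), (9, 0, 8, 10), (40, 0, 8, 10)])
edges = connect(boxes)   # nodes and edges that would seed the graph
```

The resulting nodes and edges are what the patent feeds to a graph neural network for categorization.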

Diagnostic tool for deep learning similarity models

A diagnostic tool for deep learning similarity models and image classifiers provides valuable insight into neural network decision-making. A disclosed solution generates a saliency map by: receiving a baseline image and a test image; determining, with a convolutional neural network (CNN), a first similarity between the baseline image and the test image; based on at least determining the first similarity, determining, for the test image, a first activation map for at least one CNN layer; based on at least determining the first similarity, determining, for the test image, a first gradient map for the at least one CNN layer; and generating a first saliency map as an element-wise function of the first activation map and the first gradient map. Some examples further determine a region of interest (ROI) in the first saliency map, crop the test image to an area corresponding to the ROI, and determine a refined similarity score.
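The saliency-map step can be sketched with plain 2-D lists. The element-wise function chosen here, product followed by ReLU, is one common choice (as in Grad-CAM-style methods); the patent only requires some element-wise function of the activation and gradient maps.

```python
def saliency_map(activations, gradients):
    """Element-wise product of activation and gradient maps, clipped at zero."""
    return [[max(a * g, 0.0) for a, g in zip(arow, grow)]
            for arow, grow in zip(activations, gradients)]

# Toy 2x2 maps for one CNN layer.
act = [[0.5, 1.0], [0.0, 2.0]]
grad = [[2.0, -1.0], [3.0, 0.5]]
sal = saliency_map(act, grad)
```

A high value in `sal` marks a location that contributed strongly to the similarity score; the ROI-cropping refinement then re-scores only that region.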

Methods and arrangements for identifying objects

In some arrangements, product packaging is digitally watermarked over most of its extent to facilitate high-throughput item identification at retail checkouts. Imagery captured by conventional or plenoptic cameras can be processed (e.g., by GPUs) to derive several different perspective-transformed views—further minimizing the need to manually reposition items for identification. Crinkles and other deformations in product packaging can be optically sensed, allowing such surfaces to be virtually flattened to aid identification. Piles of items can be 3D-modelled and virtually segmented into geometric primitives to aid identification, and to discover locations of obscured items. Other data (e.g., including data from sensors in aisles, shelves and carts, and gaze tracking for clues about visual saliency) can be used in assessing identification hypotheses about an item. Logos may be identified and used—or ignored—in product identification. A great variety of other features and arrangements are also detailed.

Analyzing content of digital images

Methods, apparatuses, and embodiments related to analyzing the content of digital images. A computer extracts multiple sets of visual features, which can be keypoints, based on an image of a selected object. Each of the multiple sets of visual features is extracted by a different visual feature extractor. The computer further extracts a visual word count vector based on the image of the selected object. An image query is executed based on the extracted visual features and the extracted visual word count vector to identify one or more candidate template objects of which the selected object may be an instance. When multiple candidate template objects are identified, a matching algorithm compares the selected object with the candidate template objects to determine a particular candidate template of which the selected object is an instance.
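The visual word count vector can be sketched as a bag-of-visual-words step: each keypoint descriptor is assigned to its nearest "visual word" (cluster centre), and the counts per word form the query vector. The 2-D centres and descriptors below are toy values; real descriptors would be high-dimensional.

```python
def nearest(centres, desc):
    """Index of the cluster centre closest (squared Euclidean) to a descriptor."""
    return min(range(len(centres)),
               key=lambda i: sum((c - d) ** 2 for c, d in zip(centres[i], desc)))

def word_count_vector(centres, descriptors):
    """Count how many descriptors fall into each visual word's bucket."""
    counts = [0] * len(centres)
    for d in descriptors:
        counts[nearest(centres, d)] += 1
    return counts

centres = [(0.0, 0.0), (1.0, 1.0)]   # hypothetical visual vocabulary
vec = word_count_vector(centres, [(0.1, 0.0), (0.9, 1.2), (1.1, 0.8)])
```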

Extracting motion saliency features from video using a neurosynaptic system

Embodiments of the invention provide a computer-readable medium for visual saliency estimation, comprising receiving an input video of image frames. Each image frame has one or more channels, and each channel has one or more pixels. The computer-readable medium further comprises, for each channel of each image frame, generating corresponding neural spiking data based on a pixel intensity of each pixel of the channel, generating a corresponding multi-scale data structure based on the corresponding neural spiking data, and extracting a corresponding map of features from the corresponding multi-scale data structure. The multi-scale data structure comprises one or more data layers, wherein each data layer represents a spike representation of pixel intensities of a channel at a corresponding scale. The computer-readable medium further comprises encoding each map of features extracted as neural spikes.

METHODS AND SYSTEMS FOR PROCESSING DOCUMENTS WITH TASK-SPECIFIC HIGHLIGHTING
20210357634 · 2021-11-18 ·

Methods and systems for automatically processing a document may include classifying a document, such as a medical document, as one or more document types based at least in part on one or more machine learning models and one or more tokens extracted from the medical document, determining a token contribution weight of each token towards the classification, modifying the medical document based on the token contribution weights of the one or more tokens, and displaying the modified medical document on a display to a user.
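The modification step can be sketched as wrapping high-contribution tokens in highlight markers before display. The weights, the cutoff, and the `**...**` marker are illustrative; in the patent the weights come from the classification models themselves.

```python
def highlight(tokens, weights, cutoff=0.5):
    """Mark each token whose contribution weight reaches the cutoff."""
    return " ".join(f"**{t}**" if w >= cutoff else t
                    for t, w in zip(tokens, weights))

doc = ["patient", "denies", "chest", "pain"]
weights = [0.2, 0.1, 0.9, 0.8]    # hypothetical contributions to the document type
print(highlight(doc, weights))    # patient denies **chest** **pain**
```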

Adaptive pattern recognition for a sensor network

Embodiments match sensor data output by a sensor to a trained pattern. Embodiments form a plurality of windows of an identified pattern from the sensor data, each of the plurality of windows having a substantially equal window length to a length of the trained pattern. For each of the windows, embodiments generate a corresponding first Symbolic Aggregate approximation (“SAX”) word, determine a Hamming distance between the first SAX word and a second SAX word corresponding to the trained pattern, and determine a final distance score based on coefficients between the first SAX word and the second SAX word. For each of the windows, embodiments determine a number of positions in the first SAX word that do not contribute to the final distance score, update the Hamming distance after eliminating the number of positions and determine an average distance based on the final distance score and the updated Hamming distance.
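The SAX word and Hamming distance steps can be sketched as follows: a window is reduced by piecewise aggregate approximation (PAA), each segment mean is mapped to a letter via fixed breakpoints, and the Hamming distance counts mismatching letter positions. The 3-symbol breakpoints below are the standard normal thirds; the coefficient-based final distance score is omitted from this sketch.

```python
BREAKPOINTS = (-0.43, 0.43)   # standard normal thirds -> letters a, b, c

def sax_word(series, segments):
    """PAA segment means mapped to letters (assumes len divisible by segments)."""
    n = len(series)
    means = [sum(series[i * n // segments:(i + 1) * n // segments]) /
             (n // segments) for i in range(segments)]
    return "".join("abc"[sum(m > bp for bp in BREAKPOINTS)] for m in means)

def hamming(w1, w2):
    """Number of positions where the two SAX words differ."""
    return sum(c1 != c2 for c1, c2 in zip(w1, w2))

w = sax_word([-1.0, -1.0, 0.0, 0.0, 1.0, 1.0], 3)   # window vs. trained pattern
```

Each sliding window of the sensor stream gets its own SAX word, which is then compared against the trained pattern's word as described above.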

USER-SPECIFIC TEXT RECORD-BASED FORMAT PREDICTION
20210342517 · 2021-11-04 ·

A method identifies a text region in an electronic document. The method determines that the text region includes a candidate text portion that is a candidate for applying a formatting suggestion of a particular formatting type, based on a comparison of the text region with predetermined patterns. The method identifies a stored text record that corresponds to the candidate text portion. The method confirms whether the formatting type is appropriate for the candidate text portion based on individual word matches between the candidate text portion and the stored text record. The method notifies a user of the electronic document of the formatting suggestion according to the formatting type.
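The pattern-match and confirmation steps can be sketched as a regex that finds a candidate portion (here, something address-shaped) plus a word-overlap check against a stored record. The pattern and the 0.5 overlap cutoff are illustrative placeholders.

```python
import re

# Hypothetical predetermined pattern for an address-like candidate.
ADDRESS_PATTERN = re.compile(r"\d+\s+\w+\s+(?:St|Ave|Rd)\b")

def confirm(candidate, stored_record, cutoff=0.5):
    """Confirm the suggestion when enough candidate words match the record."""
    cand, stored = candidate.lower().split(), stored_record.lower().split()
    matches = sum(w in stored for w in cand)
    return matches / len(cand) >= cutoff

text = "Ship to 42 Maple St by Friday"
m = ADDRESS_PATTERN.search(text)
if m and confirm(m.group(), "42 Maple St, Springfield"):
    print("suggest address formatting for:", m.group())
```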