G06V10/464

Image recognition result culling
09830631 · 2017-11-28

Various embodiments enable an image recognition system to reduce the number of image match candidates before running a full-fledged pair-wise match on all of them. To accomplish this, each inventory image can be assigned to a group. For example, a book title sold by an electronic marketplace could be available in multiple languages and multiple bindings, and in print, audio book, or electronic book form. Each of these variations could be associated with its own similar-looking inventory image, each of which could be returned as a valid match to a query image for the book. Accordingly, the inventory images for these variations could be assigned to a group for the book and, instead of geometrically processing an image for each variation, the image match system can process a single image representing all of the variations.
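The culling step above can be sketched as a small mapping exercise: collapse every candidate into its group and keep one representative per group for geometric verification. The function and dictionary names below are hypothetical, not from the patent.

```python
def cull_candidates(candidates, image_to_group, group_representative):
    """Collapse per-variation match candidates into one representative
    image per group, so geometric verification runs once per group
    instead of once per variation. Images without a group map to
    themselves."""
    groups = set()
    for image_id in candidates:
        groups.add(image_to_group.get(image_id, image_id))
    # One representative image per group, in a stable order.
    return [group_representative.get(g, g) for g in sorted(groups)]
```

With three book variations mapped to a single group, only the representative image is passed on to the expensive pair-wise matcher.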

IMAGE RECOGNITION USING DESCRIPTOR PRUNING

The present disclosure relates to image recognition and image searching, and more precisely to pruning local descriptors extracted from image patches of an input image. A system, method, and device are proposed that prune local descriptors assigned to a codebook cell based on the relationship between the local descriptor and the assigned cell. A weight value derived from that relationship is assigned for use in pruning, and this weight value is then used when encoding the local descriptors for image searching or image recognition.
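A minimal sketch of this idea: assign each descriptor to its nearest codebook cell, derive a weight from the descriptor-to-centroid distance, and prune descriptors whose weight falls below a threshold. The exponential weighting and the threshold are illustrative assumptions, not the patent's specific formulation.

```python
import math

def assign_and_weight(descriptor, codebook):
    """Assign a descriptor to its nearest codebook cell and derive a
    weight from the distance: closer to the centroid -> larger weight."""
    best, best_d = None, float("inf")
    for idx, centroid in enumerate(codebook):
        d = math.dist(descriptor, centroid)
        if d < best_d:
            best, best_d = idx, d
    return best, math.exp(-best_d)

def prune_descriptors(descriptors, codebook, threshold):
    """Keep only descriptors whose cell-relationship weight clears the
    threshold; the kept weights can feed the later encoding step."""
    kept = []
    for desc in descriptors:
        cell, w = assign_and_weight(desc, codebook)
        if w >= threshold:
            kept.append((cell, w, desc))
    return kept
```

Descriptors far from every centroid are ambiguous matches and contribute little to retrieval, which is why distance-based weights are a plausible pruning signal here.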

MITIGATING PEOPLE DISTRACTORS IN IMAGES

Systems, methods, and software are described herein for removing people distractors from images. A distractor mitigation solution implemented in one or more computing devices detects people in an image and identifies salient regions in the image. The solution then determines a saliency cue for each person and classifies each person as wanted or as an unwanted distractor based at least on the saliency cue. An unwanted person is then removed from the image or otherwise de-emphasized as an unwanted distraction.
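One simple stand-in for the saliency cue is the fraction of a person's bounding box covered by salient regions, with a threshold deciding wanted versus distractor. This overlap ratio and threshold are assumptions for illustration; the patent does not specify the cue's exact form.

```python
def saliency_cue(person_box, salient_regions):
    """Fraction of the person's bounding box covered by salient regions.
    Boxes are (x0, y0, x1, y1) tuples."""
    px0, py0, px1, py1 = person_box
    area = max(0, px1 - px0) * max(0, py1 - py0)
    if area == 0:
        return 0.0
    covered = 0
    for sx0, sy0, sx1, sy1 in salient_regions:
        # Intersection of the person box with each salient region.
        ix0, iy0 = max(px0, sx0), max(py0, sy0)
        ix1, iy1 = min(px1, sx1), min(py1, sy1)
        covered += max(0, ix1 - ix0) * max(0, iy1 - iy0)
    return min(1.0, covered / area)

def classify_people(people, salient_regions, threshold=0.5):
    """People overlapping salient regions are 'wanted'; others are
    likely background passers-by, i.e. distractors."""
    return ["wanted" if saliency_cue(p, salient_regions) >= threshold
            else "distractor" for p in people]
```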

MULTIMODAL AND REAL-TIME METHOD FOR FILTERING SENSITIVE MEDIA

A multimodal and real-time method for filtering sensitive content, receiving as input a digital video stream, the method including: segmenting the digital video into fragments along the video timeline; extracting features containing significant information about sensitive media from the digital video input; reducing the semantic difference between each of the low-level video features and the high-level sensitive concept; classifying the video fragments, generating a high-level label (positive or negative) with a confidence score for each fragment representation; performing high-level fusion to properly match the possible high-level labels and confidence scores for each fragment; and predicting the sensitive moments by combining the labels of the fragments along the video timeline, indicating when the content becomes sensitive.
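The fusion and timeline-prediction steps can be sketched as a confidence-weighted vote per fragment, followed by mapping positive fragments back onto the timeline. The weighted-sum fusion rule is an illustrative assumption, not the patent's specific fusion scheme.

```python
def fuse_labels(fragment_results):
    """High-level fusion: for each fragment, combine (label, confidence)
    pairs from several modalities with a confidence-weighted vote."""
    fused = []
    for modality_outputs in fragment_results:
        score = sum(c if label == "positive" else -c
                    for label, c in modality_outputs)
        fused.append("positive" if score > 0 else "negative")
    return fused

def sensitive_moments(fused_labels, fragment_seconds):
    """Map positive fragments back onto the video timeline as
    (start, end) intervals in seconds."""
    return [(i * fragment_seconds, (i + 1) * fragment_seconds)
            for i, lab in enumerate(fused_labels) if lab == "positive"]
```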

METHODS AND ARRANGEMENTS FOR IDENTIFYING OBJECTS

In some arrangements, product packaging is digitally watermarked over most of its extent to facilitate high-throughput item identification at retail checkouts. Imagery captured by conventional or plenoptic cameras can be processed (e.g., by GPUs) to derive several different perspective-transformed views—further minimizing the need to manually reposition items for identification. Crinkles and other deformations in product packaging can be optically sensed, allowing such surfaces to be virtually flattened to aid identification. Piles of items can be 3D-modelled and virtually segmented into geometric primitives to aid identification, and to discover locations of obscured items. Other data (e.g., including data from sensors in aisles, shelves and carts, and gaze tracking for clues about visual saliency) can be used in assessing identification hypotheses about an item. Logos may be identified and used—or ignored—in product identification. A great variety of other features and arrangements are also detailed.

EXTRACTING SALIENT FEATURES FROM VIDEO USING A NEUROSYNAPTIC SYSTEM

Embodiments of the invention provide a method of visual saliency estimation comprising receiving an input sequence of image frames. Each image frame has one or more channels, and each channel has one or more pixels. The method further comprises, for each channel of each image frame, generating corresponding neural spiking data based on a pixel intensity of each pixel of the channel, generating a corresponding multi-scale data structure based on the corresponding neural spiking data, and extracting a corresponding map of features from the corresponding multi-scale data structure. The multi-scale data structure comprises one or more data layers, wherein each data layer represents a spike representation of pixel intensities of a channel at a corresponding scale. The method further comprises encoding each map of features extracted as neural spikes.
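A toy version of the per-channel encoding: rate-code pixel intensities into binary spikes over a few time steps (brighter pixels spike more often), and build one coarser layer of the multi-scale structure by average pooling. Both choices are simplifying assumptions; the patent's neurosynaptic substrate is far richer than this.

```python
def encode_spikes(channel, num_steps=8):
    """Rate-code pixel intensities (0-255) into binary spikes over
    num_steps time steps: a pixel at intensity v spikes on roughly
    v/255 of the steps."""
    spikes = []
    for t in range(num_steps):
        frame = [[1 if (v / 255.0) * num_steps > t else 0 for v in row]
                 for row in channel]
        spikes.append(frame)
    return spikes

def downscale(channel, factor=2):
    """One coarser data layer of the multi-scale structure, built by
    average pooling over factor x factor blocks."""
    h, w = len(channel), len(channel[0])
    return [[sum(channel[y + dy][x + dx]
                 for dy in range(factor) for dx in range(factor))
             // (factor * factor)
             for x in range(0, w, factor)]
            for y in range(0, h, factor)]
```

Feature maps at each scale would then be extracted from these spike layers and themselves re-encoded as spikes, per the method.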

IMAGE PROCESSING APPARATUS AND NON-TRANSITORY COMPUTER READABLE MEDIUM
20170243077 · 2017-08-24

An image processing apparatus includes a unifying unit, a memory, a storing unit, a setting unit, a selecting unit, an extracting unit, and a determining unit. The unifying unit unifies images of identification target regions cut out from a learning image. The memory stores a learning model. The storing unit stores identification target images converted into images of different image sizes. The setting unit sets a position and a size of a candidate region that is likely to include an identification target object of an identification target image. The selecting unit selects an identification target image of an image size at which the size of the cut-out candidate region is closest to the fixed size. The extracting unit extracts information from the image of the candidate region. The determining unit determines a target object included in the image of the candidate region.
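The selecting unit's job reduces to a nearest-size search: among the stored rescaled copies, pick the scale at which the cut-out candidate region lands closest to the model's fixed input size. The function name and the representation of scales as floating-point factors are assumptions for illustration.

```python
def select_scale(candidate_size, scale_factors, fixed_size=32):
    """Pick the scale factor at which the candidate region's size,
    after rescaling, is closest to the model's fixed input size.
    candidate_size is the region's side length in the original image."""
    return min(scale_factors,
               key=lambda s: abs(candidate_size * s - fixed_size))
```

Cutting the region from the pre-resized copy at that scale avoids resampling each candidate individually, which is the point of storing multiple image sizes up front.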

Feature interpolation

Feature interpolation techniques are described. In a training stage, features are extracted from a collection of training images and quantized into visual words. Spatial configurations of the visual words in the training images are determined and stored in a spatial configuration database. In an object detection stage, a portion of features of an image are extracted from the image and quantized into visual words. Then, a remaining portion of the features of the image are interpolated using the visual words and the spatial configurations of visual words stored in the spatial configuration database.
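The detection stage can be sketched as a lookup: for each observed visual word, consult the stored spatial configurations to predict which words should appear at the related offsets. The dictionary-based database and offset encoding below are hypothetical simplifications of the patent's spatial configuration database.

```python
def interpolate_features(observed, spatial_db):
    """Interpolate missing features from observed visual words.
    observed: {(x, y): word} for the extracted portion of features.
    spatial_db: {(word, (dx, dy)): expected_word} learned from the
    training images' spatial configurations."""
    predicted = dict(observed)
    for loc, word in list(observed.items()):
        for (w, offset), expected in spatial_db.items():
            if w == word:
                target = (loc[0] + offset[0], loc[1] + offset[1])
                # Keep observed words; only fill in missing locations.
                predicted.setdefault(target, expected)
    return predicted
```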

METHODS AND SYSTEMS FOR DETECTING TOPIC TRANSITIONS IN A MULTIMEDIA CONTENT
20170228614 · 2017-08-10 ·

According to embodiments illustrated herein, a method is provided for detecting one or more topic transitions in multimedia content. The method includes identifying one or more frames from a plurality of frames of the multimedia content based on a comparison between one or more content items in a first frame of the plurality of frames and the one or more content items in a first set of frames of the plurality of frames. The method further includes determining at least a first score and a second score for each of the one or more frames. Additionally, the method includes determining a likelihood for each of the one or more frames based at least on the first score and the second score, wherein the likelihood is indicative of a topic transition among the one or more frames.
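The two-score step can be sketched as a weighted combination thresholded into transition candidates. The weighted sum and threshold are illustrative assumptions; the abstract does not specify how the scores are combined.

```python
def transition_likelihood(first_score, second_score, w1=0.5, w2=0.5):
    """Combine the two per-frame scores into a single likelihood that
    the frame marks a topic transition."""
    return w1 * first_score + w2 * second_score

def detect_transitions(frame_scores, threshold=0.6):
    """Return indices of frames whose combined likelihood clears the
    threshold, i.e. the candidate topic-transition frames."""
    return [i for i, (s1, s2) in enumerate(frame_scores)
            if transition_likelihood(s1, s2) >= threshold]
```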

Intelligent determination of aesthetic preferences based on user history and properties

Techniques for selecting a digital image are disclosed. The techniques may include receiving a first set of digital images, analyzing the first set of digital images to extract first image features from each of the first set of digital images, accessing a user profile, comparing the extracted first image features to a preset list of image features, ranking each digital image of the first set, selecting each digital image having a ranking that exceeds a threshold, assigning a category to each selected digital image based on a comparison of each selected digital image to a category database of digital image categories, displaying each selected digital image with the assigned category, receiving an input from the user in response to the displaying, updating the user profile and the category database based on the input, and selecting at least one subsequent digital image based on the updated user profile and category database.
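The ranking and selection steps can be sketched as counting how many of an image's extracted features appear in the preset list, then keeping images above the threshold. The feature-overlap ranking is an assumed stand-in for the patent's unspecified ranking function.

```python
def rank_images(extracted_features, preferred_features, threshold=2):
    """Rank each image by how many of its extracted features appear in
    the preset (profile-derived) list; keep those above the threshold,
    best first. extracted_features: {image_id: [feature, ...]}."""
    selected = []
    for image_id, feats in extracted_features.items():
        rank = len(set(feats) & set(preferred_features))
        if rank > threshold:
            selected.append((image_id, rank))
    return sorted(selected, key=lambda x: -x[1])
```

User feedback on the displayed selections would then update both the preferred-feature list and the category database, closing the loop the abstract describes.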