Patent classifications
G01S3/8006
Voice input device and method for estimation of utterance direction
The present technology relates to a voice input device and method that facilitate estimation of an utterance direction. The voice input device includes a fixed part disposed at a predetermined position, a movable part movable with respect to the fixed part, a microphone array attached to the fixed part, an utterance direction estimation unit that estimates an utterance direction on the basis of a voice from an utterer that is input from the microphone array, and a driving unit that drives the movable part according to the estimated utterance direction. The voice input device can be used by installation in, for example, a smart speaker, a voice agent, a robot, and the like.
Detection and classification of siren signals and localization of siren signal sources
In an embodiment, a method comprises: capturing, by one or more microphone arrays of a vehicle, sound signals in an environment; extracting frequency spectrum features from the sound signals; predicting, using an acoustic scene classifier and the frequency spectrum features, one or more siren signal classifications; converting the one or more siren signal classifications into one or more siren signal event detections; computing time delay of arrival estimates for the one or more detected siren signals; estimating one or more bearing angles to one or more sources of the one or more detected siren signals using the time delay of arrival estimates and a known geometry of the microphone array; and tracking, using a Bayesian filter, the one or more bearing angles. If a siren is detected, actions are performed by the vehicle depending on the location of the emergency vehicle and whether the emergency vehicle is active or inactive.
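The last two steps of this abstract (bearing from time delay of arrival, then Bayesian tracking) can be illustrated with a minimal sketch. It assumes a single microphone pair, a far-field source, a speed of sound of 343 m/s, and a scalar Kalman filter with a random-walk model as the Bayesian filter; none of these choices is stated in the abstract, and the function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C


def bearing_from_tdoa(tdoa_s, mic_spacing_m, c=SPEED_OF_SOUND):
    """Far-field bearing in radians from broadside for one microphone pair."""
    # The path-length difference divided by the spacing equals sin(bearing).
    # Clamp to [-1, 1] because noisy delays can exceed the physical limit.
    s = max(-1.0, min(1.0, c * tdoa_s / mic_spacing_m))
    return math.asin(s)


def kalman_update(mean, var, measurement, meas_var, process_var):
    """One predict/update step of a scalar Kalman filter (random-walk model)."""
    var = var + process_var                   # predict: uncertainty grows
    gain = var / (var + meas_var)             # weight given to the new measurement
    mean = mean + gain * (measurement - mean)
    var = (1.0 - gain) * var
    return mean, var


def track_bearings(tdoas_s, mic_spacing_m, meas_var=0.01, process_var=0.001):
    """Convert a sequence of delays to bearings and smooth them over time."""
    mean, var = 0.0, 1.0                      # assumed vague prior: broadside
    track = []
    for tdoa in tdoas_s:
        z = bearing_from_tdoa(tdoa, mic_spacing_m)
        mean, var = kalman_update(mean, var, z, meas_var, process_var)
        track.append(mean)
    return track
```

A real array would combine several pairs using the known geometry to resolve the front/back ambiguity that a single pair leaves open.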
Directional infrasound sensing
A method and apparatus for determining a direction of infrasound. Infrasound is received by a directional infrasound sensor comprising a plurality of channels and a plurality of sensor devices. Each channel in the plurality of channels comprises a single opening at a first end of the channel and a closed end opposite the opening. The opening of each channel in the plurality of channels is pointed in a different direction from the opening of each other channel in the plurality of channels. The plurality of sensor devices includes a sensor device at the closed end of each channel in the plurality of channels. Each sensor device in the plurality of sensor devices is configured to generate a sensor signal in response to pressure. The sensor signals generated by the plurality of sensor devices are processed to determine the direction of the infrasound received by the directional infrasound sensor.
Electronic device for speech recognition and control method thereof
An electronic device for speech recognition includes a multi-channel microphone array required for remote speech recognition. The electronic device improves efficiency and performance of speech recognition of the electronic device in a space where noise other than speech to be recognized exists. A control method includes receiving a plurality of audio signals output from a plurality of sources through a plurality of microphones and analyzing the audio signals and obtaining information on directions in which the audio signals are input and information on input times of the audio signals. A target source for speech recognition among the plurality of sources is determined on the basis of the obtained information on the directions in which the plurality of audio signals are input, and the obtained information on the input times of the plurality of audio signals, and an audio signal obtained from the determined target source is processed.
Multiple-source tracking and voice activity detections for planar microphone arrays
Embodiments described herein provide a combined multi-source time difference of arrival (TDOA) tracking and voice activity detection (VAD) mechanism that is applicable for generic array geometries, e.g., a microphone array that lies on a plane. The combined multi-source TDOA tracking and VAD mechanism scans the azimuth and elevation angles of the microphone array in microphone pairs, based on which a planar locus of physically admissible TDOAs can be formed in the multi-dimensional TDOA space of multiple microphone pairs. In this way, the multi-dimensional TDOA tracking reduces the number of calculations that are usually involved in traditional TDOA by performing the TDOA search for each dimension separately.
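As a rough illustration of the "locus of physically admissible TDOAs", the sketch below scans a grid of azimuth and elevation angles and computes the far-field delay each direction would produce at every microphone pair of a planar array. The grid sizes, the speed of sound, the sign convention, and the function name are assumptions made for the example, not details from the abstract.

```python
import math
from itertools import combinations


def admissible_tdoas(mics_xy, az_steps=8, el_steps=3, c=343.0):
    """Map each scanned (azimuth, elevation) to its per-pair TDOA vector.

    mics_xy: list of (x, y) microphone positions in metres, all in one plane.
    Each TDOA is the delay of microphone j relative to microphone i for the
    pair (i, j), assuming a far-field source.
    """
    pairs = list(combinations(range(len(mics_xy)), 2))
    locus = {}
    for a in range(az_steps):
        az = 2.0 * math.pi * a / az_steps
        for e in range(el_steps):
            el = (math.pi / 2.0) * e / el_steps   # 0 up to, not including, 90 deg
            # In-plane components of the unit vector pointing at the source;
            # the out-of-plane component drops out for a planar array.
            ux = math.cos(el) * math.cos(az)
            uy = math.cos(el) * math.sin(az)
            locus[(az, el)] = tuple(
                ((mics_xy[i][0] - mics_xy[j][0]) * ux
                 + (mics_xy[i][1] - mics_xy[j][1]) * uy) / c
                for i, j in pairs
            )
    return pairs, locus
```

Measured per-pair delays can then be compared only against points on this locus, instead of searching every combination of delays across all pairs.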
Sound source separation system, sound source position estimation system, sound source separation method, and sound source separation program
A sound source separation system includes a controller that acquires pieces of sound collection data with microphones that collect sounds output from first and second sound sources. The first sound source is at a first position at which effective distances from the microphones are equal and the second sound source is at a different position. The controller further acquires, based on the sound collection data, frequency spectra in two dimensions of a circumferential direction of a circle and a time direction. The first position is a center of the circle and each of the effective distances is a radius of the circle. The controller further separates, from the frequency spectra, a first sound source spectrum and a second sound source spectrum.
Information processing device and information processing method
An information processing device includes an acquisition unit that acquires a sound collection result of a sound from each of one or more sound sources, obtained by a sound collection portion whose positional information, indicating at least one of a position and a direction, is changed, and an estimation unit that estimates a direction of each of the one or more sound sources on the basis of a change in a frequency of a sound collected by the sound collection portion in association with a change in the positional information of the sound collection portion.
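One way to read "a change in a frequency ... in association with a change in the positional information" is as a Doppler shift seen by a moving microphone. The sketch below is an illustration under that assumption only: a stationary source, a microphone moving at a known speed first in one direction and then in the opposite direction, and a hypothetical function name. Taking the ratio of the difference to the sum of the two observed frequencies cancels the unknown source frequency.

```python
import math


def angle_from_doppler(f_forward, f_backward, speed, c=343.0):
    """Angle (radians) between the motion axis and the source direction.

    f_forward / f_backward: frequencies observed while the microphone moves
    at +speed and -speed along the same axis (Hz).
    Model assumed: f_obs = f0 * (1 + v * cos(theta) / c), stationary source.
    """
    # (f+ - f-) / (f+ + f-) = speed * cos(theta) / c, independent of f0.
    ratio = (f_forward - f_backward) / (f_forward + f_backward)
    cos_theta = max(-1.0, min(1.0, c * ratio / speed))
    return math.acos(cos_theta)
```

A single motion axis only fixes the angle to that axis; motion along a second axis, or rotation of the sound collection portion, would be needed to resolve a full direction.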
Localization of sound sources in a given acoustic environment
Processing acoustic signals to detect sound sources in a sound scene. The method includes: obtaining a plurality of signals representative of the sound scene, captured by a plurality of microphones of predefined positions; based on the signals captured by the microphones and on the positions of the microphones, applying a quantization of directional measurements of sound intensity and establishing a corresponding acoustic activity map in a sound source localization space, the space being of dimension N; constructing at least one vector basis of dimension less than N; projecting the acoustic activity map onto at least one axis of the vector basis; and searching for at least one local peak of acoustic activity in the map projection, an identified local peak corresponding to the presence of a sound source in the scene.
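The map, projection, and peak-search steps can be sketched for the case N = 2 (azimuth and elevation). The bin counts, the input format of (azimuth, elevation, intensity) triples in degrees, the choice of the azimuth axis for the projection, and the function names are all assumptions for illustration.

```python
def activity_map(measurements, az_bins=36, el_bins=18):
    """Quantize (azimuth_deg, elevation_deg, intensity) triples into a 2-D map."""
    grid = [[0.0] * el_bins for _ in range(az_bins)]
    for az_deg, el_deg, intensity in measurements:
        a = min(az_bins - 1, int((az_deg % 360.0) / 360.0 * az_bins))
        e = min(el_bins - 1, int((el_deg + 90.0) / 180.0 * el_bins))
        grid[a][e] += intensity               # accumulate activity per cell
    return grid


def project_on_azimuth(grid):
    """Project the 2-D map onto the azimuth axis by summing over elevation."""
    return [sum(row) for row in grid]


def local_peaks(profile, threshold):
    """Indices of local maxima at or above threshold, treating bins as circular."""
    n = len(profile)
    peaks = []
    for i, value in enumerate(profile):
        left, right = profile[(i - 1) % n], profile[(i + 1) % n]
        if value >= threshold and value > left and value >= right:
            peaks.append(i)
    return peaks
```

Searching a one-dimensional projection is cheaper than searching the full map; each peak found this way can then be refined along the remaining axis.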
Facilitation of efficient signal source location employing a coarse algorithm and high-resolution computation
Facilitation of determination of detailed location of a source signal is provided. In one embodiment, a device comprises a memory that stores computer executable components; and a processor that executes computer executable components stored in the memory. The computer executable components can comprise: a low-resolution computation logic component that implements a coarse algorithm and determines an approximate direction of arrival (DOA) of a source signal of an input signal, wherein the coarse algorithm uses both a coarse spatial grid and input data received from the input signal to determine the approximate DOA; and an error estimation logic component that estimates an estimation error of the coarse algorithm, and wherein the error estimation logic component uses the estimation error and the approximate DOA to determine a spatial interval range.
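The coarse-then-fine idea can be sketched as a two-stage grid search. Here `spectrum` is a hypothetical callable returning a power-like score for a candidate angle in degrees, and the error bound is assumed to be half the coarse grid step; the abstract does not say how the estimation error is obtained, so this is only a stand-in.

```python
def coarse_to_fine_doa(spectrum, coarse_step=10.0, fine_step=0.1):
    """Return (approximate DOA, search interval, refined DOA), all in degrees."""
    # Stage 1: coarse grid over the full circle.
    coarse = [i * coarse_step for i in range(int(360.0 / coarse_step))]
    approx = max(coarse, key=spectrum)

    # Assumed error bound: the true peak lies within half a coarse step.
    err = coarse_step / 2.0
    lo, hi = approx - err, approx + err

    # Stage 2: high-resolution search restricted to the interval.
    n = int(round((hi - lo) / fine_step))
    fine = [lo + k * fine_step for k in range(n + 1)]
    best = max(fine, key=spectrum)
    return approx, (lo, hi), best % 360.0
```

With these defaults the search evaluates 36 coarse points plus 101 fine points, compared with 3600 points for a full-circle scan at the fine resolution.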