Patent classifications
G01S3/8006
Device and method for determining a sound source direction
A device for determining a sound source direction determines a direction in which a source of a reached sound exists, based on at least one of a sound pressure difference between a first sound pressure that is a sound pressure of a first frequency component of a first part of the reached sound acquired by a first microphone and a second sound pressure that is a sound pressure of the first frequency component of a second part of the reached sound acquired by a second microphone, and a phase difference between a first phase that is a phase of a second frequency component of the first part of the reached sound and a second phase that is a phase of the second frequency component of the second part of the reached sound.
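The two cues named in this abstract, a sound pressure difference and a phase difference between two microphones at a given frequency component, can be illustrated with a minimal Python sketch. This is not the patented method. The function names, the single-bin DFT, the far-field plane-wave model, and the 343 m/s speed of sound are all assumptions made for illustration. The phase-to-angle step is only unambiguous when the microphone spacing is below half a wavelength.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C (assumed)


def bin_value(samples, freq_hz, sample_rate):
    """Complex amplitude of one frequency component (single-bin DFT)."""
    step = -2.0 * math.pi * freq_hz / sample_rate
    return sum(x * cmath.exp(1j * step * n) for n, x in enumerate(samples))


def level_difference_db(sig1, sig2, freq_hz, sample_rate):
    """Sound pressure difference (mic 1 relative to mic 2) in dB."""
    a1 = abs(bin_value(sig1, freq_hz, sample_rate))
    a2 = abs(bin_value(sig2, freq_hz, sample_rate))
    return 20.0 * math.log10(a1 / a2)


def phase_difference(sig1, sig2, freq_hz, sample_rate):
    """Phase at mic 1 minus phase at mic 2, wrapped to (-pi, pi]."""
    c1 = bin_value(sig1, freq_hz, sample_rate)
    c2 = bin_value(sig2, freq_hz, sample_rate)
    return cmath.phase(c1 * c2.conjugate())


def angle_from_phase(dphi, freq_hz, mic_spacing_m):
    """Far-field angle in degrees from broadside; positive toward mic 1."""
    s = SPEED_OF_SOUND * dphi / (2.0 * math.pi * freq_hz * mic_spacing_m)
    s = max(-1.0, min(1.0, s))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.asin(s))
```

A device could, for example, rely on the level difference at frequencies where the phase is ambiguous and on the phase difference elsewhere; the abstract leaves that choice open ("at least one of").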
Sound source localization method and sound source localization apparatus based coherence-to-diffuseness ratio mask
Provided is a sound source localization method including steps of: (a) receiving a mixed signal of a target sound source signal and noise and echo signals through multiple microphones including at least two microphones; (b) generating a binarized mask based on a diffuseness by using a coherence-to-diffuseness ratio CDR, which is information on the target sound source and the noise source, by using the input signal; (c) pre-processing an input signal to multiple microphones by using the generated binarized mask; and (d) performing a predetermined algorithm such as the GCC-PHAT or the SRP-PHAT on the pre-processed input signal to estimate a direction of the target sound source.
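Step (d) names GCC-PHAT as one possible algorithm. The sketch below shows GCC-PHAT for one microphone pair with an optional binary mask applied per frequency bin, in the spirit of step (c). It is a hedged illustration only: the mask is taken as given (how the CDR is estimated and thresholded is not covered), the function names are hypothetical, and a naive DFT is used to keep the example dependency-free where a real system would use an FFT.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform (illustration only)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def gcc_phat_delay(sig1, sig2, mask=None):
    """Delay of sig2 relative to sig1 in samples (positive: sig2 lags).

    mask is an optional list of 0/1 values, one per frequency bin; bins
    with 0 are excluded from the correlation.
    """
    n = len(sig1)
    spec1, spec2 = dft(sig1), dft(sig2)
    if mask is None:
        mask = [1] * n
    cross = []
    for k in range(n):
        c = spec2[k] * spec1[k].conjugate()
        mag = abs(c)
        # PHAT weighting: keep only the phase of the cross-spectrum.
        cross.append(mask[k] * c / mag if mag > 1e-12 else 0.0)
    best_lag, best_val = 0, -math.inf
    for lag in range(-n // 2, n // 2):
        # Real part of the inverse DFT evaluated at this lag.
        val = sum(
            (cross[k] * cmath.exp(2j * math.pi * k * lag / n)).real
            for k in range(n)
        )
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag
```

The returned delay in samples can be converted to a direction once the sample rate and microphone spacing are known.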
AUDIO PROCESSING DEVICE, AUDIO PROCESSING METHOD, AND PROGRAM
An audio processing device includes: a sound source localizing unit configured to determine a localized sound source direction, which is a direction of a sound source, on the basis of audio signals of a plurality of channels acquired from M (here, M is an integer equal to or greater than 3) sound receiving units of which positions are different from each other; and a sound source position estimating unit configured to, for each set of two sound receiving units, estimate a midpoint of a segment perpendicular to both of half lines directed in estimated sound source directions, which are directions from the sound receiving units to an estimated sound source position of the sound source, as the estimated sound source position.
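The geometric step in this abstract, taking the midpoint of the segment perpendicular to both half lines, corresponds to the standard closest-approach construction for two lines in 3D. The following sketch assumes each half line is given by an origin (a sound receiving unit position) and a direction vector; the function name and the choice to return `None` for parallel lines or for a closest point behind either origin are assumptions, not details from the abstract.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def closest_approach_midpoint(p1, d1, p2, d2, eps=1e-9):
    """Midpoint of the common perpendicular of two half lines.

    p1, p2 are origins and d1, d2 are direction vectors (3-tuples).
    Returns None if the lines are (nearly) parallel or if the
    perpendicular foot falls behind either origin.
    """
    w = tuple(a - b for a, b in zip(p1, p2))
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < eps:
        return None  # parallel directions: no unique perpendicular
    s = (b * e - c * d) / denom  # parameter along the first half line
    t = (a * e - b * d) / denom  # parameter along the second half line
    if s < 0 or t < 0:
        return None  # foot lies outside the half line
    q1 = tuple(p + s * v for p, v in zip(p1, d1))
    q2 = tuple(p + t * v for p, v in zip(p2, d2))
    return tuple((x + y) / 2.0 for x, y in zip(q1, q2))
```

With M receiving units, this would be evaluated once per pair, giving one candidate position per pair.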
DEVICE AND METHOD FOR ESTIMATING DIRECTION OF ARRIVAL
A device for estimating Direction of Arrival (DOA) of sound from Q ≥ 1 sound sources is provided. The device is configured to obtain a phase difference matrix, which includes measured phase difference values, each of the measured phase difference values being a measured value of a phase difference between two microphone units for a frequency bin in a range of frequencies of the sound. The device is further configured to generate a replicated phase difference matrix by replicating the measured phase difference values to other potential sinusoidal periods, calculate a DOA value for each phase difference value in the replicated phase difference matrix, and determine, as Q DOA results, the Q most prominent peak values in a histogram generated based on the calculated DOA values.
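The replicate-then-histogram idea can be sketched for a single microphone pair as follows. Each wrapped phase difference is replicated by adding integer multiples of 2π, every physically possible replica is converted to an angle, and the angles are counted in a histogram; the true direction accumulates votes across frequency bins while aliased replicas scatter. The far-field model, the 343 m/s speed of sound, the 5-degree bin width, and all function names are assumptions for illustration.

```python
import math
from collections import Counter


def candidate_doas(phase_diff, freq_hz, spacing_m, c=343.0):
    """All far-field angles (degrees) consistent with a wrapped phase difference."""
    max_k = int(math.floor(freq_hz * spacing_m / c + 0.5)) + 1
    angles = []
    for k in range(-max_k, max_k + 1):
        # Replicate the measured phase to another possible sinusoidal period.
        unwrapped = phase_diff + 2.0 * math.pi * k
        s = c * unwrapped / (2.0 * math.pi * freq_hz * spacing_m)
        if -1.0 <= s <= 1.0:  # keep only physically admissible replicas
            angles.append(math.degrees(math.asin(s)))
    return angles


def doa_histogram_peaks(measurements, spacing_m, q=1, bin_deg=5.0):
    """Centers of the q most populated histogram bins.

    measurements is an iterable of (freq_hz, wrapped_phase_diff) pairs.
    """
    counts = Counter()
    for freq_hz, phase_diff in measurements:
        for doa in candidate_doas(phase_diff, freq_hz, spacing_m):
            counts[math.floor(doa / bin_deg)] += 1
    return [(b + 0.5) * bin_deg for b, _ in counts.most_common(q)]
```

Taking the most populated bins is the simplest reading of "most prominent peak values"; a real implementation might smooth the histogram or enforce a minimum separation between peaks.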
Multiple-source tracking and voice activity detections for planar microphone arrays
Embodiments described herein provide a combined multi-source time difference of arrival (TDOA) tracking and voice activity detection (VAD) mechanism that is applicable for generic array geometries, e.g., a microphone array that lies on a plane. The combined multi-source TDOA tracking and VAD mechanism scans the azimuth and elevation angles of the microphone array in microphone pairs, based on which a planar locus of physically admissible TDOAs can be formed in the multi-dimensional TDOA space of multiple microphone pairs. In this way, the multi-dimensional TDOA tracking reduces the number of calculations usually involved in traditional TDOA tracking by performing the TDOA search for each dimension separately.
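To make "physically admissible TDOAs" concrete, the sketch below computes, for a planar array and one scanned azimuth/elevation, the TDOA each microphone pair would observe under a far-field plane-wave model. Scanning this over a grid of angles traces out the set of TDOA vectors that a real source can produce. The plane-wave model, the coordinate convention (array in the z = 0 plane), the sign convention, and the function name are assumptions; the tracking and VAD parts of the abstract are not covered.

```python
import math


def pair_tdoas(mic_xy, pairs, az_deg, el_deg, c=343.0):
    """TDOA (seconds) for each microphone pair, source at (az, el).

    mic_xy is a list of (x, y) positions in metres in the array plane.
    For pair (i, j) the value is arrival time at j minus arrival time at i.
    """
    az, el = math.radians(az_deg), math.radians(el_deg)
    # In-plane components of the unit vector pointing toward the source.
    ux = math.cos(el) * math.cos(az)
    uy = math.cos(el) * math.sin(az)
    tdoas = []
    for i, j in pairs:
        dx = mic_xy[j][0] - mic_xy[i][0]
        dy = mic_xy[j][1] - mic_xy[i][1]
        # A microphone displaced toward the source hears the wave earlier.
        tdoas.append(-(dx * ux + dy * uy) / c)
    return tdoas
```

Because only the in-plane components of the direction matter for a planar array, the TDOA vector is a linear function of (ux, uy), which is one way to see why the admissible set is planar.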
Device control method and apparatus
Provided are a device control method and apparatus. The method is applied to an audio device, and includes: receiving an acoustic signal set, determining a propagation characteristic of an acoustic signal in the acoustic signal set, determining, according to the propagation characteristic, a device parameter associated with audio play quality to be used by the audio device, and controlling the audio device to play audio with the device parameter.
ARTIFICIAL INTELLIGENCE APPARATUS AND METHOD FOR ESTIMATING SOUND SOURCE LOCALIZATION THEREOF
An artificial intelligence (AI) apparatus including a memory and a processor configured to estimate a sound source localization based on at least one of image information, sound source information, and sensor information stored in the memory. The processor is configured to pre-process at least one of the image information, the sound source information, or the sensor information to generate test data, input the test data into a pre-trained AI model to estimate the sound source localization, calculate a sound source localization estimation evaluation score of the AI model for the test data, classify the test data into validation data based on the calculated sound source localization estimation evaluation score, change the AI model based on the classified validation data, and input the test data into the changed AI model to update the AI model.
Azimuth estimation method, device, and storage medium
Embodiments of this application disclose an azimuth estimation method performed at a computing device, the method including: obtaining, in real time, multi-channel sampling signals and buffering the multi-channel sampling signals; performing wakeup word detection on one or more sampling signals of the multi-channel sampling signals, and determining a wakeup word detection score for each channel of the one or more sampling signals; performing a spatial spectrum estimation on the buffered multi-channel sampling signals to obtain a spatial spectrum estimation result, when the wakeup word detection scores of the one or more sampling signals indicate that a wakeup word exists in the one or more sampling signals; and determining an azimuth of a target voice associated with the multi-channel sampling signals according to the spatial spectrum estimation result and a highest wakeup word detection score, thereby improving the accuracy of the azimuth estimation in a voice interaction process.
System and method for speech enhancement in multisource environments
A method, computer program product, and computer system for receiving, by a computing device, a first signal emitted from one or more sources. A second signal emitted from the one or more sources may be received. A first confidence level that a wake-up-word is included in the first signal may be determined. A second confidence level that the wake-up-word is included in the second signal may be determined. It may be identified that the wake-up-word originated from a first source of the one or more sources based upon, at least in part, the first and second confidence levels. The first source may be enabled to participate in a dialog phase.
Audio recognition method, method, apparatus for positioning target audio, and device
Embodiments of this application disclose a method and apparatus for positioning a target audio signal by an audio interaction device, and an audio interaction device. The method includes: obtaining audio signals in a plurality of directions in a space, and performing echo cancellation on the audio signal, the audio signal including a target-audio direct signal; obtaining weights of a plurality of time-frequency points in the audio signals, a weight of each time-frequency point indicating, at the time-frequency point, a relative proportion of the target-audio direct signal in the audio signals; weighting time-frequency components of the audio signal at the plurality of time-frequency points separately for each of the plurality of directions by using the weights of the plurality of time-frequency points, to obtain a weighted audio signal energy distribution; and obtaining a sound source azimuth corresponding to the target-audio direct signal in the audio signals accordingly.
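The weighting step can be sketched as follows: for each candidate direction, the energy of every time-frequency component is scaled by that point's weight and summed, giving an energy value per direction. The abstract does not say how the azimuth is then read off the distribution; picking the direction with the largest weighted energy is an assumption here, as are the function names and the dictionary-based data layout.

```python
def weighted_energy_by_direction(beams, weights):
    """Weighted signal energy for each direction.

    beams maps an azimuth (degrees) to a list of complex time-frequency
    components for that direction; weights holds one value per
    time-frequency point, in the same order.
    """
    return {
        az: sum(w * abs(x) ** 2 for w, x in zip(weights, tf_points))
        for az, tf_points in beams.items()
    }


def estimate_azimuth(beams, weights):
    """Azimuth with the largest weighted energy (assumed selection rule)."""
    energy = weighted_energy_by_direction(beams, weights)
    return max(energy, key=energy.get)
```

Time-frequency points dominated by echo or reverberation receive small weights, so they contribute little to any direction's total.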