G01S3/8006

SOUND SOURCE LOCALIZATION USING PHASE SPECTRUM

An array of microphones placed on a mobile robot provides multiple channels of audio signals. A received set of audio signals is called an audio segment, which is divided into multiple frames. A phase analysis is performed on a frame of the signals from each pair of microphones. If both microphones are in an active state during the frame, a candidate angle is generated for each such pair of microphones. The result is a list of candidate angles for the frame. This list is processed to select a final candidate angle for the frame. The list of candidate angles is tracked over time to assist in the process of selecting the final candidate angle for an audio segment.
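The per-pair phase analysis described above can be sketched as follows. This is a minimal illustration, not the patented method: it converts the phase difference at the dominant cross-spectrum bin of one microphone pair into a candidate angle via the far-field relation θ = arcsin(c·τ/d). All names and parameters are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def candidate_angle(frame_a, frame_b, mic_spacing, sample_rate):
    """Estimate one candidate arrival angle for a microphone pair from the
    phase difference at the dominant cross-spectrum bin (sketch only)."""
    window = np.hanning(len(frame_a))
    spec_a = np.fft.rfft(frame_a * window)
    spec_b = np.fft.rfft(frame_b * window)
    # Pick the strongest non-DC bin of the cross-spectrum.
    cross = spec_a * np.conj(spec_b)
    k = 1 + np.argmax(np.abs(cross[1:]))
    freq = k * sample_rate / len(frame_a)
    phase_diff = np.angle(cross[k])           # radians, in (-pi, pi]
    tdoa = phase_diff / (2 * np.pi * freq)    # time difference of arrival
    # Clamp to the physically possible range before taking arcsin.
    s = np.clip(SPEED_OF_SOUND * tdoa / mic_spacing, -1.0, 1.0)
    return np.degrees(np.arcsin(s))
```

In a full system, one such candidate is produced per active microphone pair per frame, and the resulting list is then filtered and tracked over time as the abstract describes.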

SOUND SIGNAL PROCESSING DEVICE, SOUND SIGNAL PROCESSING METHOD, AND PROGRAM
20170047079 · 2017-02-16 ·

A device and a method for determining a speech segment with a high degree of accuracy from a sound signal in which different sounds coexist are provided. Directional points indicating the direction of arrival of the sound signal are connected in the temporal direction, and a speech segment is detected. In this configuration, pattern classification is performed in accordance with directional characteristics with respect to the direction of arrival, and a directionality pattern and a null beam pattern are generated from the classification results. An average null beam pattern is also generated by averaging the null beam patterns at times when a non-speech-like signal is input. Further, a threshold set slightly lower than the average null beam pattern is calculated for use in detecting the local minimum point corresponding to the direction of arrival from each null beam pattern, and a local minimum point equal to or lower than the threshold is determined to be the point corresponding to the direction of arrival.
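The thresholded local-minimum search can be sketched in a few lines. This is an illustrative reading of the abstract, not the claimed implementation; the margin value and all names are hypothetical.

```python
import numpy as np

def doa_minima(null_beam_pattern, avg_null_pattern, margin_db=3.0):
    """Return indices of local minima of a null beam pattern that fall at or
    below a threshold set slightly under the average null beam pattern
    (patterns are per-direction values in dB; sketch only)."""
    threshold = avg_null_pattern - margin_db  # "slightly lower" threshold
    p = null_beam_pattern
    minima = []
    for i in range(1, len(p) - 1):
        if p[i] < p[i - 1] and p[i] <= p[i + 1] and p[i] <= threshold[i]:
            minima.append(i)   # deep enough null -> direction of arrival
    return minima
```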

AUDIO PROCESSING APPARATUS AND AUDIO PROCESSING METHOD
20170040030 · 2017-02-09 ·

An audio processing apparatus includes a first-section detection unit configured to detect a first section that is a section in which the power of a spatial spectrum in a sound source direction is higher than a predetermined amount of power on the basis of an audio signal of a plurality of channels, a speech state determination unit configured to determine a speech state on the basis of an audio signal within the first section, a likelihood calculation unit configured to calculate a first likelihood that a type of sound source according to an audio signal within the first section is voice and a second likelihood that the type of sound source is non-voice, and a second-section detection unit configured to determine whether or not a second section, in which power is higher than the average power of a speech section, is a voice section on the basis of the first likelihood and the second likelihood within the second section.
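The two stages of the apparatus can be sketched as a power-gated section finder followed by a likelihood comparison. This is a simplified reading under assumptions (frame-wise spatial power and per-frame log-likelihoods are given); function names are hypothetical.

```python
import numpy as np

def detect_first_sections(spatial_power, power_threshold):
    """Find contiguous frame runs where spatial-spectrum power in the source
    direction exceeds a threshold (the 'first section'). Returns a list of
    (start, end) frame index pairs, end exclusive."""
    sections, start = [], None
    for i, p in enumerate(spatial_power):
        if p > power_threshold and start is None:
            start = i
        elif p <= power_threshold and start is not None:
            sections.append((start, i))
            start = None
    if start is not None:
        sections.append((start, len(spatial_power)))
    return sections

def is_voice_section(voice_loglik, nonvoice_loglik):
    """Accept a candidate (second) section as voice when its accumulated
    voice log-likelihood exceeds the non-voice log-likelihood."""
    return float(np.sum(voice_loglik)) > float(np.sum(nonvoice_loglik))
```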

Direction of arrival estimation
12276741 · 2025-04-15 ·

A system configured to determine an estimated angle of arrival in reverberant environments. When a first device detects a calibration tone generated by a second device, the first device may generate multichannel audio representing the calibration tone and process the multichannel audio using a combination of detection filtering and subspace processing to determine a relative direction of the second device. For example, the first device may perform matched filtering to isolate a direct-path peak for the calibration tone, and then may sweep through all potential azimuth directions to identify an azimuth value corresponding to the direct-path peak. In some examples, the first device identifies a steering vector associated with a particular direction (e.g., signal subspace) that minimizes components in all other directions (e.g., noise subspace). The device may determine this steering vector independently for each frequency band and calculate the estimated angle of arrival by averaging results across frequency bands.
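The subspace sweep can be sketched with a MUSIC-style search: for each frequency band, project candidate steering vectors onto the noise subspace of the spatial covariance and keep the azimuth minimizing that projection, then average across bands with a circular mean. This sketch assumes a single source, a 2-D microphone layout, and precomputed per-band STFT snapshots; it is an illustration of the general technique, not the patent's exact pipeline.

```python
import numpy as np

def estimate_azimuth(freqs, mic_xy, snapshots, n_grid=360, c=343.0):
    """Per-band MUSIC-style azimuth estimate averaged across bands.
    `mic_xy` is (n_mics, 2) in meters; `snapshots[b]` is an
    (n_mics, n_frames) complex matrix for band `freqs[b]`."""
    az_grid = np.linspace(0.0, 2 * np.pi, n_grid, endpoint=False)
    per_band = []
    for b, f in enumerate(freqs):
        X = snapshots[b]
        R = X @ X.conj().T / X.shape[1]        # spatial covariance
        _, V = np.linalg.eigh(R)               # ascending eigenvalues
        En = V[:, :-1]                         # noise subspace (one source)
        k = 2 * np.pi * f / c
        d = np.stack([np.cos(az_grid), np.sin(az_grid)])  # (2, n_grid)
        A = np.exp(1j * k * (mic_xy @ d))                 # steering vectors
        noise_energy = np.sum(np.abs(En.conj().T @ A) ** 2, axis=0)
        per_band.append(az_grid[np.argmin(noise_energy)])
    # circular mean of the per-band estimates
    return np.angle(np.mean(np.exp(1j * np.array(per_band)))) % (2 * np.pi)
```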

SIMULTANEOUS ACOUSTIC EVENT DETECTION ACROSS MULTIPLE ASSISTANT DEVICES
20250131913 · 2025-04-24 ·

Implementations can detect respective audio data that captures an acoustic event at multiple assistant devices in an ecosystem that includes a plurality of assistant devices, process the respective audio data locally at each of the multiple assistant devices to generate respective measures that are associated with the acoustic event using respective event detection models, process the respective measures to determine whether the detected acoustic event is an actual acoustic event, and cause an action associated with the actual acoustic event to be performed in response to determining that the detected acoustic event is the actual acoustic event. In some implementations, the multiple assistant devices that detected the respective audio data are anticipated to detect the respective audio data that captures the actual acoustic event based on a plurality of historical acoustic events being detected at each of the multiple assistant devices.
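One simple fusion rule for the per-device measures is a vote: declare an actual event when enough devices independently exceed their local detection threshold. The abstract leaves the aggregation open, so this is only one plausible sketch; thresholds and names are hypothetical.

```python
def is_actual_event(device_measures, device_threshold=0.5, min_devices=2):
    """Fuse per-device event-detection measures (e.g. model confidences in
    [0, 1]): report an actual acoustic event when at least `min_devices`
    devices score above their local threshold."""
    hits = sum(1 for m in device_measures if m > device_threshold)
    return hits >= min_devices
```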

SYSTEMS AND METHODS FOR SOURCE SIGNAL SEPARATION
20170004844 · 2017-01-05 ·

A method includes receiving an input signal comprising an original domain signal and creating a first window data set and a second window data set from the signal, wherein an initiation of the second window data set is offset from an initiation of the first window data set; converting the first window data set and the second window data set to a frequency domain and storing the resulting data in a second domain different from the original domain; performing complex spectral phase evolution (CSPE) on the second-domain data to estimate component frequencies of the first and second window data sets; using the component frequencies estimated in the CSPE, sampling a set of second-domain high resolution windows to select a mathematical representation comprising a second-domain high resolution window that fits at least one of the amplitude, phase, amplitude modulation and frequency modulation of a component of an underlying signal, wherein the component comprises at least one oscillator peak; generating an output signal from the mathematical representation of the original signal as at least one of an audio file, one or more audio signal components, and one or more speech vectors; and outputting the output signal to an external system.
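The core CSPE step can be sketched directly: take two windows offset by a few samples, and read the component's exact frequency from the phase rotation between their spectra at the dominant bin, which resolves frequency well below FFT bin spacing. This is an illustration of the general CSPE idea, not the patented pipeline; names are hypothetical.

```python
import numpy as np

def cspe_frequency(signal, sample_rate, offset=1):
    """Refine the dominant component's frequency beyond FFT bin resolution
    via Complex Spectral Phase Evolution: the phase advance between two
    windows offset by `offset` samples encodes the angular frequency."""
    n = len(signal) - offset
    w = np.hanning(n)
    f1 = np.fft.rfft(signal[:n] * w)
    f2 = np.fft.rfft(signal[offset:offset + n] * w)
    k = 1 + np.argmax(np.abs(f1[1:]))           # dominant non-DC bin
    # Phase rotation over `offset` samples -> angular frequency.
    omega = np.angle(np.conj(f1[k]) * f2[k]) / offset
    return omega * sample_rate / (2 * np.pi)
```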

AUDIO PROCESSING

A method for audio focusing comprises: receiving a multi-channel audio signal that represents sounds in sound directions that correspond to respective positions in an image area of an image; receiving an indication of an audio focus direction that corresponds to a first position in the image area; selecting a primary sound direction from a plurality of different available candidate directions, wherein said different available candidate directions comprise said audio focus direction and one or more offset candidate directions and wherein each offset candidate direction corresponds to a respective candidate offset from said first position in the image area; and deriving, based on said multi-channel audio signal in dependence on the selected primary sound direction, an output audio signal where sounds in sound directions defined via the selected primary sound direction are emphasized in relation to sounds in sound directions other than those defined via the selected primary sound direction.
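The candidate-direction selection step can be sketched as picking, among the focus direction and its offsets, the direction with the highest steered power. The `steered_power` callable standing in for a beamformer evaluation is a hypothetical placeholder; the abstract does not fix the selection criterion.

```python
import numpy as np

def select_primary_direction(steered_power, focus_direction, candidate_offsets):
    """Pick the primary sound direction (in degrees, say) from the focus
    direction plus offset candidates, choosing the candidate whose
    beamformer output energy `steered_power(direction)` is largest."""
    candidates = [focus_direction] + [focus_direction + o for o in candidate_offsets]
    powers = [steered_power(d) for d in candidates]
    return candidates[int(np.argmax(powers))]
```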

System for receiving communications
12379444 · 2025-08-05 ·

Methods and systems for spatial filtering transmitters and receivers capable of simultaneous communication with one or more receivers and transmitters, respectively, the receivers capable of outputting source directions to humans or devices. The methods and systems use spherical wave field partial wave expansion (PWE) models for transmitted and received fields at antennas and for waves generated by contributing sources. The source PWE models have expansion coefficients expressed as functions of the directional coordinates of the sources. For spatial filtering receivers, a processor uses the source PWE model and the output signals from at least one sensor, outputting signals consistent with Nyquist criteria and representative of the wave field, to determine directional coordinates of sources (with a reduced number of floating point operations) and outputs the directional coordinates and communications to a reporter configured for reporting information to humans. For spatial filtering transmitters, a processor uses known receiver directions and source partial wave expansions to generate signals for transducers producing a composite total wave field conveying communications to the specified receivers. The methods and systems reduce the processing required for transmitting and receiving spatially filtered communications.

Audio recognition method, and method, apparatus, and device for positioning target audio

This application discloses a method for positioning a target audio signal by a computer device. The method includes: performing echo cancellation on the audio signals collected in a plurality of directions in a space, the audio signals comprising a target-audio direct signal; obtaining weights of a plurality of time-frequency points in the echo-canceled audio signals, a weight of each time-frequency point indicating a relative proportion of the target-audio direct signal in the echo-canceled audio signals at the time-frequency point; obtaining a weighted audio signal energy distribution of the audio signals in the plurality of directions by using the weights of the plurality of time-frequency points in the echo-canceled audio signals; and obtaining a sound source azimuth corresponding to the target-audio direct signal in the audio signals by using the weighted audio signal energy distribution of the audio signals in the plurality of directions.
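The final localization step can be sketched as a weighted energy sum over look directions: per-direction time-frequency energies are weighted by the direct-path proportion map, and the azimuth with the largest weighted energy wins. This is an illustrative reduction of the abstract; array shapes and names are assumptions.

```python
import numpy as np

def localize_target(dir_energy, weights):
    """`dir_energy` is an (n_directions, n_frames, n_bins) array of per-beam
    time-frequency energies after echo cancellation; `weights` is an
    (n_frames, n_bins) map giving the relative proportion of the
    target-audio direct signal at each time-frequency point. Returns the
    index of the direction maximizing the weighted energy."""
    weighted = np.tensordot(dir_energy, weights, axes=([1, 2], [0, 1]))
    return int(np.argmax(weighted))
```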

Sound source enumeration and direction of arrival estimation using a bayesian framework

One embodiment provides a method of sound source enumeration and direction of arrival (DoA) estimation. The method includes estimating, by an enumeration module, a number of sound sources associated with an acoustic signal. The estimating includes selecting a specific parametric model from a generalized model. The generalized model is related to a microphone array architecture used to capture the acoustic signal. The method further includes estimating, by a DoA module, a direction of arrival of each sound source of the number of sound sources based, at least in part, on the selected model. The estimating of the number of sound sources and the estimating of the DoA of each sound source are performed using a Bayesian framework.
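Model selection over source counts can be sketched with a BIC-style criterion, one concrete instance of Bayesian model selection: each candidate count k has its own maximized log-likelihood, penalized by model complexity. This is a generic sketch of the technique, not the patent's specific framework; names are hypothetical.

```python
import numpy as np

def enumerate_sources(log_likelihoods, n_params_per_source, n_obs):
    """Select the number of sources via a BIC-style score. `log_likelihoods`
    maps candidate source count k -> maximized log-likelihood of the
    k-source parametric model; the penalty grows with parameter count."""
    best_k, best_score = None, -np.inf
    for k, ll in log_likelihoods.items():
        score = ll - 0.5 * k * n_params_per_source * np.log(n_obs)
        if score > best_score:
            best_k, best_score = k, score
    return best_k
```

Once k is fixed, the DoA of each source would be estimated under the selected k-source model, as the abstract describes.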