MICROPHONE ASSEMBLY AND METHOD FOR PROVIDING HEARING ASSISTANCE
20240414484 · 2024-12-12
Inventors
Cpc classification
G01S3/802
PHYSICS
G01S3/8006
PHYSICS
H04R25/407
ELECTRICITY
International classification
Abstract
A microphone assembly includes at least three microphones to capture an input audio signal; an audio signal processing unit for processing the input audio signals to generate an output audio signal with a directivity; an audio source direction estimation unit; and a control unit. The audio source direction estimation unit includes first and second direction of arrival estimation modules for estimating first and second angles of incidence between an audio source and first and second axes defined by first and second pairs of the microphones, respectively; and an elevation estimation module for estimating an angle of elevation of the audio source with regard to a microphone plane based on the estimated first angle of incidence and the estimated second angle of incidence. The control unit uses the estimated angle of elevation to control a directivity parameter of the audio signal processing unit.
Claims
1. A microphone assembly comprising: at least three spaced apart microphones defining a microphone plane, each configured to capture an input audio signal from an audio source; an audio signal processing unit for processing the input audio signals in a manner so as to generate an output audio signal with a directivity; an audio source direction estimation unit including: a first direction of arrival estimation module for estimating a first angle of incidence between the audio source and a first axis defined by a first pair of the microphones with regard to a base point centered between the microphones of the first pair of microphones; a second direction of arrival estimation module for estimating a second angle of incidence between the audio source and a second axis defined by a second pair of the microphones with regard to a base point centered between the microphones of the second pair of microphones, the second pair of microphones being different from the first pair of microphones and the first axis and the second axis having different directions; and an elevation estimation module for estimating an angle of elevation of the audio source with regard to the microphone plane based on the estimated first angle of incidence and the estimated second angle of incidence; and a control unit for using the estimated angle of elevation to control a directivity parameter of the audio signal processing unit on which the directivity of the output audio signal depends.
2. The microphone assembly of claim 1, wherein the control unit is configured to control said directivity parameter such that the directivity of the output audio signal is reduced during times when the estimated angle of elevation is found to be above a threshold.
3. The microphone assembly of claim 2, wherein the control unit is configured to switch the audio signal processing unit to an omnidirectional mode during times when the estimated angle of elevation is found to be above a threshold.
4. The microphone assembly of claim 2, wherein: the audio signal processing unit includes an adaptive beamformer unit using a beamformer adaptation parameter varying between a lower limit and an upper limit; the beamformer adaptation parameter determines a denoising performance of the beamformer unit and an attenuation of the audio source by the beamformer unit; and the control unit is configured to adapt the lower limit and the upper limit of the beamformer adaptation parameter based on the estimated angle of elevation.
5. The microphone assembly of claim 4, wherein: the control unit is configured to define a low elevation range in which the estimated angle of elevation is below a first threshold value, a high elevation range in which the estimated angle of elevation is above a second threshold value higher than the first threshold value and a transition range in which the estimated angle of elevation is between the first threshold value and the second threshold value; in the low elevation range the lower limit and the upper limit of the beamformer adaptation parameter are kept substantially constant at values allowing optimal denoising performance of the beamformer unit; in the high elevation range the lower limit and the upper limit of the beamformer adaptation parameter are kept substantially constant at values allowing minimizing of the attenuation of the audio source signal by the beamformer unit; and in the transition range the lower limit and the upper limit of the beamformer adaptation parameter monotonically change from the value of the low elevation range to the values of the high elevation range as a function of the estimated angle of elevation.
6. The microphone assembly of claim 5, wherein the values of the lower limit and the higher limit are higher or equal in the low elevation range than in the high elevation range.
7. The microphone assembly of claim 2, wherein: the audio signal processing unit includes a postfilter unit using a postfilter adaptation parameter which determines the activity of the postfilter unit; the activity of the postfilter unit determines a denoising performance of the postfilter unit and an attenuation of the audio source; and the control unit is configured to change a weight of the postfilter adaptation parameter based on the estimated angle of elevation.
8. The microphone assembly of claim 7, wherein: the control unit is configured to define a low elevation range in which the estimated angle of elevation is below a first threshold value, a high elevation range in which the estimated angle of elevation is above a second threshold value higher than the first threshold value and a transition range in which the estimated angle of elevation is between the first threshold value and the second threshold value; in the low elevation range the postfilter adaptation parameter weight is kept substantially constant at values allowing optimal denoising performance of the postfilter unit; in the high elevation range the postfilter adaptation parameter weight is kept substantially constant at values allowing minimizing of the attenuation of the audio source signal by the postfilter unit; and in the transition range the postfilter adaptation parameter weight monotonically changes from the value of the low elevation range to the value of the high elevation range as a function of the estimated angle of elevation.
9. The microphone assembly of claim 8, wherein in the low elevation range the postfilter adaptation parameter weight is maximal and in the high elevation range the postfilter adaptation parameter weight is minimal.
10. The microphone assembly of claim 1, wherein: the audio source direction estimation unit includes a third direction of arrival estimation module for estimating a third angle of incidence between the audio source and a third axis defined by a third pair of the microphones with regard to a base point centered between the microphones of the third pair of microphones; and the elevation estimation module is configured to estimate the angle of elevation of the audio source based on the estimated first angle of incidence, the estimated second angle of incidence and the estimated third angle of incidence.
11. The microphone assembly of claim 10, wherein the elevation estimation module comprises: a first submodule, a second submodule and a third submodule, each of the submodules configured to provide for a pre-estimate of the angle of elevation of the audio source based on a different pair of the estimated first angle of incidence, the estimated second angle of incidence and the estimated third angle of incidence, and an elevation fusion submodule configured to generate the estimate of the angle of elevation of the audio source from the pre-estimates of the angle of elevation of the audio source.
12. The microphone assembly of claim 1, wherein: each direction of arrival estimation module is configured to provide its estimated angle of incidence between the audio source and its axis as an uncertainty cone around its axis; and the elevation estimation module or submodule, respectively, is configured to provide the estimate or pre-estimate of the angle of elevation, respectively, from an intersection line of the uncertainty cones of the estimated first angle of incidence and the estimated second angle of incidence or of the respective pair of the estimated first angle of incidence, the estimated second angle of incidence and the estimated third angle of incidence, respectively.
13. The microphone assembly of claim 1, wherein the microphone assembly is configured to update the estimated angle of elevation used by the control unit for controlling said directivity parameter of the audio signal processing unit only during times when voice activity is detected.
14. A hearing assistance system comprising: the microphone assembly of claim 1, further comprising a wireless interface for transmitting the output audio signal as an audio stream; and a hearing device comprising a wireless interface for receiving the audio stream from the microphone assembly and an output transducer for stimulating a user's hearing according to the audio stream.
15. A method of providing an output audio signal from an audio source, comprising: providing a microphone assembly comprising at least three spaced apart microphones defining a microphone plane; capturing, by each of the microphones, an input audio signal from the audio source; estimating a first angle of incidence between the audio source and a first axis defined by a first pair of the microphones with regard to a base point centered between the microphones of the first pair of microphones; estimating a second angle of incidence between the audio source and a second axis defined by a second pair of the microphones with regard to a base point centered between the microphones of the second pair of microphones, the second pair of microphones being different from the first pair of microphones and the first axis and the second axis having different directions; estimating an angle of elevation of the audio source with regard to the microphone plane based on the estimated first angle of incidence and the estimated second angle of incidence; and processing, by an audio signal processing unit, the input audio signals in a manner so as to generate an output audio signal with a directivity; wherein the estimated angle of elevation is used to control a directivity parameter of the signal processing on which the directivity of the output audio signal depends.
Description
[0024] Hereinafter, examples of the invention will be illustrated by reference to the attached drawings, wherein:
[0025]
[0026]
[0027]
[0028]
[0029]
[0030]
[0031]
[0032]
[0033]
[0034] A hearing device as used hereinafter is any ear level element suitable for reproducing sound by stimulating a user's hearing, such as an electroacoustic hearing aid, a bone conduction hearing aid, an active hearing protection device, a hearing prosthesis element such as a cochlear implant, a wireless headset, an earbud, an earplug, an earphone, etc.
[0035]
[0036] The table microphone assembly 10 comprises a microphone arrangement 16 for capturing input audio signals from an audio source located close to the table microphone assembly 10 (usually the voice of one of the persons at the table), an audio signal processing unit 18 for processing the captured audio signals and a transmission unit 20 comprising a transmitter 22 and an antenna 24 for transmitting an output audio signal 26 provided by the audio signal processing unit 18 as an audio stream via the wireless link 14 to the hearing device 12. The table microphone assembly 10 further comprises a directivity control unit 60 for estimating an angle of elevation of a sound source, e.g., a talking person, and controlling a directivity of the output audio signal 26 according to the estimated angle of elevation.
[0037] The hearing device 12 comprises a receiver unit 30 including an antenna 32 and a receiver 34 for receiving the audio signals transmitted via the wireless link 14 and for supplying a corresponding audio stream to an audio signal processing unit 36 which typically also receives an audio input from a microphone arrangement 38. The audio signal processing unit 36 generates an audio output which is supplied to an output transducer 40 for stimulating the user's hearing, such as a loudspeaker. According to one example, the hearing device 12 may be a hearing instrument, such as a hearing aid, or an auditory prosthesis, such as a cochlear implant. According to another example, the hearing device 12 may be a wireless earbud or a wireless headset. Typically, the hearing assistance system comprises a plurality of hearing devices 12 which may be grouped in pairs so as to implement binaural arrangements for one or more listeners, wherein each listener wears two of the devices 12.
[0038] The wireless link 14 may be a digital link which, for example, uses carrier frequencies in the 2.4 GHz ISM band. The wireless link 14 may use a standard protocol, such as a Bluetooth protocol, in particular a Bluetooth Low Energy protocol, such as Bluetooth Low Energy Audio, or it may use a proprietary protocol.
[0039] The microphone arrangement 16 of the table microphone assembly 10 comprises at least three microphones M1, M2 and M3 which are arranged in a non-linear manner (i.e., which are not arranged on a straight line) in order to enable the formation of at least two different pairs of microphones defining axes which are angled with regard to each other, thereby defining a microphone plane (indicated at 48 in
[0040] The input audio signal 50-1, 50-2, 50-3 captured by each of the microphones M1, M2 and M3 is supplied both to the audio signal processing unit 18 and to the directivity control unit 60. The directivity control unit 60 controls at least one directivity parameter of the audio signal processing unit 18 on which a directivity of the output audio signal depends according to the estimated angle of elevation of the target audio source (e.g., a talking person).
[0041] The audio signal processing in the table microphone assembly 10 is schematically illustrated in
[0042]
[0043] The output of the direction of arrival estimation modules 62-1 and 62-2 is supplied to an elevation estimation module 64 which estimates an angle of elevation φ.sub.0 of the audio source with regard to the microphone plane 48 based on the estimated first angle of incidence θ.sub.0 and the estimated second angle of incidence θ.sub.1.
[0044]
[0045] The elevation estimation module 64 determines the estimate of the angle of elevation φ.sub.0 from the upper one of the two intersection lines 49-1, 49-2 of the uncertainty cones of the estimated first angle of incidence θ.sub.0 and the estimated second angle of incidence θ.sub.1. It is noted that the elevation estimation module is not able to discriminate a positive from a negative elevation of the same magnitude; however, the target audio source is expected to be elevated above the microphone plane 48, i.e., above the table on which the microphone assembly 10 is placed, so that the upper intersection line/point 49-1 has to be selected. It is also noted that the above approach is based on the assumption that the distance between the two base points O.sub.0, O.sub.1 is small compared to the distances between each of the base points O.sub.0, O.sub.1 and the audio source S, a condition which in practice will usually be fulfilled; i.e., the approach uses a far-field assumption.
[0046] It is noted that in the particular case where the two microphone axes 42, 44 form an angle of 60° the elevation angle φ.sub.0 can be derived as follows:
wherein θ.sub.0 and θ.sub.1 are the angles of incidence.
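Purely by way of illustration, the relation between the two angles of incidence and the angle of elevation may be sketched as follows. The sketch is not taken from the original text: it assumes the far-field case, ideal (noise-free) angles of incidence measured against axis direction vectors that enclose the angle `axis_angle`, and the hypothetical function name `elevation_from_incidence`. Under these assumptions the squared length of the in-plane projection of the unit source direction is (cos²θ0 + cos²θ1 − 2·cos(axis_angle)·cosθ0·cosθ1) / sin²(axis_angle), and the elevation is the arc cosine of its square root.

```python
import math


def elevation_from_incidence(theta0, theta1, axis_angle=math.radians(60.0)):
    """Far-field elevation (radians, >= 0) above the microphone plane.

    theta0, theta1: angles of incidence (radians) relative to the two axes.
    axis_angle: angle (radians) between the two axis direction vectors
                (assumed convention; must not be 0 or 180 degrees).
    """
    c0, c1 = math.cos(theta0), math.cos(theta1)
    g = math.cos(axis_angle)
    # Squared length of the in-plane projection of the unit source direction.
    in_plane_sq = (c0 * c0 + c1 * c1 - 2.0 * g * c0 * c1) / (1.0 - g * g)
    # Noisy angle estimates can push the value slightly outside [0, 1].
    in_plane_sq = min(1.0, max(0.0, in_plane_sq))
    # The non-negative result corresponds to the upper intersection line.
    return math.acos(math.sqrt(in_plane_sq))
```

For a source directly above the assembly both angles of incidence are 90°, and the sketch returns an elevation of 90° up to rounding. If the axes are oriented such that their direction vectors enclose 120° instead of 60°, the sign of the mixed term changes accordingly.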
[0047] It is noted that the above approach can be analogously extended to more than the two pairs of microphones (for example, the microphones M1 and M3 may form a third pair of microphones for which a third angle of incidence is determined with regard to a base point centered between microphones M1, M3, which third base point then also would be considered when determining the base point O as the average of the base points of the microphone pairs).
[0048] The elevation angle estimate produced by the elevation estimation module 64 might undergo post-processing, using metadata from other sub-systems, to become more reliable and robust. In the example of
[0049] For example, the direction of arrival estimate tends to lose its pertinence when speech energy (i.e., the energy of the talker's voice) is too low. Although the direction of arrival estimation modules 62-1, 62-2 should already take this aspect into account, it might be desirable to post-process the estimated elevation angle using speech presence information.
[0050] To address this issue, one may use information provided by a Voice Activity Detector (VAD) to post-process the elevation estimate, which improves its stability, reliability and robustness.
[0051] For example, the final elevation estimate may be updated using the elevation estimate φ.sub.0 from the elevation estimation module 64 only when a VAD signal provided by a VAD module (not shown in
[0052] Another potential issue is due to the fact that the direction of incidence estimation modules 62-1, 62-2 can only consider a single audio source (single talker) by design (i.e., this is a model assumption). Consequently, a situation with multiple talkers, potentially with different elevation incidence angles, will be considered by the entity formed by direction of incidence estimation modules 62-1, 62-2 and the elevation estimation module 64 as a unique source moving quickly both in the horizontal plane and in elevation. To address this situation, the tracking ability of each direction of incidence estimation module 62-1, 62-2 would have to be increased, very likely at the cost of stability.
[0053] To avoid such a stability problem, a different option may be considered, wherein talker activity is detected in different angular regions (i.e., sectors) of the microphone plane 48 and is associated with one of the multiple audio sources. Based on this information, the elevation post-processing module 66 can store an elevation estimate per angular region (sector) and update the elevation estimate only when the angular region is marked as being active. Such a strategy may improve stability, reliability and robustness. Thus, the final elevation estimate is updated by the elevation post-processing module 66 using the estimate φ.sub.0 from the elevation estimation module 64 based on information about talker activity per angular region of the microphone plane 48, wherein the estimated angle of elevation of a certain angular region is updated only during times when voice activity is detected in that angular region. An example of detecting voice activity per angular region with a non-linear arrangement of three microphones is described in U.S. Pat. No. 10,735,870 B2.
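By way of a non-limiting illustration, the per-sector post-processing described above may be sketched as follows. The class name `SectorElevationTracker`, the number of sectors and the exponential smoothing are assumptions made for the sketch only; the description merely requires that the stored estimate of a sector is updated only while voice activity is detected in that sector.

```python
from typing import List, Optional


class SectorElevationTracker:
    """Keeps one elevation estimate (degrees) per angular sector."""

    def __init__(self, n_sectors: int = 6, smoothing: float = 0.2) -> None:
        # n_sectors and smoothing are hypothetical tuning values.
        self.smoothing = smoothing
        self.estimates: List[Optional[float]] = [None] * n_sectors

    def update(self, elevation_deg: float, active: List[bool]) -> None:
        """Update only those sectors currently flagged as voice-active."""
        for i, is_active in enumerate(active):
            if not is_active:
                continue  # inactive sector: stored estimate stays frozen
            prev = self.estimates[i]
            if prev is None:
                self.estimates[i] = elevation_deg  # first observation
            else:
                # Exponential smoothing (assumption, not from the source).
                self.estimates[i] = prev + self.smoothing * (elevation_deg - prev)
```

A frame in which no sector is flagged as active leaves all stored estimates unchanged, which is the gating behaviour described for the VAD signal.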
[0054] The estimation of the elevation angle also could be improved by exploiting an arrangement of the three microphones M1, M2, M3 with maximal symmetry, in which the microphones M1, M2, M3 form an equilateral triangle (in the example illustrated in
[0055] An example of an elevation estimation for such maximally symmetric microphone arrangement is schematically shown in
[0056] Thus, a dedicated elevation estimation submodule is provided for each pair of the direction of arrival estimation modules, namely a first elevation estimation submodule 64-1 for the pair formed by the first direction of arrival estimation module 62-1 and the second direction of arrival estimation module 62-2, a second elevation estimation submodule 64-2 for the pair formed by the second direction of arrival estimation module 62-2 and the third direction of arrival estimation module 62-3, and a third elevation estimation submodule 64-3 for the pair formed by the first direction of arrival estimation module 62-1 and the third direction of arrival estimation module 62-3.
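As a hedged illustration of how three pre-estimates might be combined, the following sketch fuses them by taking the median of the valid values. The fusion rule is an assumption chosen for its robustness against a single outlier; the present text does not prescribe a particular rule, and the function name `fuse_elevation` is hypothetical.

```python
import math
import statistics


def fuse_elevation(pre_estimates):
    """Fuse elevation pre-estimates (degrees); None or NaN entries are skipped.

    Returns None when no valid pre-estimate is available.
    """
    valid = [e for e in pre_estimates if e is not None and not math.isnan(e)]
    if not valid:
        return None
    # Median fusion (assumption): one outlying submodule cannot dominate.
    return statistics.median(valid)
```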
[0057] Each of the elevation estimation submodules 64-1, 64-2, 64-3 works in the way it was described for the elevation estimation module of
[0058] In the example of
[0059] Another aspect is the symmetry or homogeneity of the microphone arrangement. While a fully symmetric arrangement like an equilateral triangle, or a highly symmetric arrangement like an isosceles triangle, simplifies computation for estimation of the elevation angle and may make the system numerically more robust, the microphone arrangement of the example
[0060] The control unit 68 typically controls at least one directivity parameter used in the audio signal processing unit such that the directivity of the output audio signal 26 is reduced during times when the estimated angle of elevation is found to be above a threshold, so as to avoid attenuation of the audio signal from the target audio source resulting from directivity at high elevation angles.
[0061] For example, the control unit 68 may switch the signal processing unit 18 to an omnidirectional mode during times when the estimated angle of elevation is found to be above a threshold.
[0062] According to other examples, directivity may be controlled in a more sophisticated manner, as illustrated in
[0063] The control module is used to directly constrain the internal variables β and A.sub.pf of the beamformer and postfilter algorithms, respectively.
[0064] The postfilter can be viewed as a black box applying an attenuation gain to the signal. For generality, this attenuation gain is here called the activity and is labelled A.sub.pf. The control module defines an adaptive scaling factor w.sub.Apf for A.sub.pf, based on an external criterion. In this particular case, this criterion is the elevation.
[0065] The adaptive beamformer unit 52 may use a beamformer adaptation parameter β varying between a lower limit β.sub.inf and an upper limit β.sub.sup, wherein the beamformer adaptation parameter determines a denoising performance of the beamformer and an attenuation of the audio source by the beamformer. For example, the beamformer adaptation parameter β may define the directionality of the pattern of the adaptive beam Y by weighting the mixture of static beam patterns C.sub.f (front cardioid) and C.sub.b (back cardioid) according to Y = C.sub.f - β*C.sub.b. For instance, β=0 provides a front cardioid pattern for Y, while β=1 provides a dipole pattern for Y. The control unit 68 defines an adaptive range with the lower and upper limits β.sub.inf and β.sub.sup for β, based on an external criterion, such as the elevation.
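To illustrate why a large adaptation parameter attenuates an elevated source, the following sketch evaluates an idealized adaptive beam. It assumes the usual first-order mixture Y = C_f - β·C_b with frequency-independent cardioids C_f = (1 + cos a)/2 and C_b = (1 - cos a)/2, where a is the angle between the source direction and the beam axis, and it assumes that the beam axis lies in the microphone plane. The function name `adaptive_beam_gain` is hypothetical.

```python
import math


def adaptive_beam_gain(beta, angle_to_axis):
    """Idealized gain of Y = C_f - beta * C_b for a source at angle_to_axis (radians)."""
    cos_a = math.cos(angle_to_axis)
    c_front = 0.5 * (1.0 + cos_a)  # front cardioid, null towards the back
    c_back = 0.5 * (1.0 - cos_a)   # back cardioid, null towards the front
    return c_front - beta * c_back
```

Under these assumptions a source directly above the assembly (a = 90°) is passed with a gain of about 0.5 for β = 0 (front cardioid) but is cancelled for β = 1 (dipole, gain equal to cos a), which is why restricting the range of β at high elevation avoids attenuating the target.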
[0066] In other words, the control unit 68 may adapt the lower limit and the upper limit of the beamformer adaptation parameter based on the estimated angle of elevation. An example of such adaptation is shown
[0067] In the example of
[0068] In the low elevation range the lower limit and the upper limit of the beamformer adaptation parameter are kept substantially constant at values allowing optimal denoising performance of the beamformer (at low elevation angles the directivity of the beamformer does not attenuate the target audio source).
[0069] In the high elevation range the lower limit and the upper limit of the beamformer adaptation parameter are kept substantially constant at values allowing minimizing of the attenuation of the audio source signal by the beamformer.
[0070] In the transition range the lower limit and the upper limit of the beamformer adaptation parameter monotonically (e.g., linearly) change from the values of the low elevation range to the values of the high elevation range as a function of the estimated angle of elevation.
[0071] The values of both the lower limit and the upper limit of the beamformer adaptation parameter in the low elevation range are higher than or equal to the respective values in the high elevation range.
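The three elevation ranges described above may, for example, be realized with a piecewise-linear mapping, as in the following sketch. All numerical values (thresholds in degrees and limit values) are hypothetical placeholders and not values from the present text; the names `ramp`, `beta_limits` and `constrain_beta` are likewise assumptions.

```python
def ramp(elevation_deg, start_deg, end_deg, low_value, high_value):
    """Constant below start_deg and above end_deg, linear in between."""
    if elevation_deg <= start_deg:
        return low_value
    if elevation_deg >= end_deg:
        return high_value
    frac = (elevation_deg - start_deg) / (end_deg - start_deg)
    return low_value + frac * (high_value - low_value)


def beta_limits(elevation_deg):
    """Lower and upper limit of the adaptation parameter (hypothetical values)."""
    # The thresholds may differ between the lower and the upper limit.
    beta_inf = ramp(elevation_deg, 40.0, 70.0, 0.2, 0.0)
    beta_sup = ramp(elevation_deg, 30.0, 60.0, 1.0, 0.2)
    return beta_inf, beta_sup


def constrain_beta(beta, elevation_deg):
    """Clamp the value found by the adaptive beamformer to the allowed range."""
    beta_inf, beta_sup = beta_limits(elevation_deg)
    return min(beta_sup, max(beta_inf, beta))
```

With these placeholder values the beamformer may adapt freely between 0.2 and 1.0 for a talker at low elevation, whereas for a talker at high elevation the parameter is confined to the range from 0.0 to 0.2, i.e., close to the front cardioid.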
[0072] It is noted that the threshold values defining the low elevation range, the high elevation range and the transition range may be different for the lower limit and the upper limit of the beamformer adaptation parameter, as shown in the example of
[0073]
[0074] The postfilter unit 54 uses a postfilter adaptation parameter, namely an activity A.sub.pf, which determines the activity of the postfilter unit 54. The activity of the postfilter unit 54 determines a denoising performance on the output audio signal 26 and, depending on the elevation angle of the target audio source, may result in an attenuation of the audio signal from the target audio source, so that attenuation of the audio signal from the target audio source can be avoided by appropriate control of the activity of the postfilter unit 54 according to the elevation angle of the target audio source. In other words, the activity A.sub.pf may be considered as an attenuation gain applied to the audio signal.
[0075] An example of how an adaptive scaling factor, i.e., a weight w.sub.Apf, applied to the postfilter adaptation parameter, i.e., to the activity A.sub.pf, used by the postfilter unit 54 may vary as a function of the estimated angle of elevation is illustrated in
[0076] Similar to example of
[0077] In the low elevation range the postfilter adaptation parameter weight is kept substantially constant at values allowing optimal denoising performance of the postfilter unit.
[0078] In the high elevation range the postfilter adaptation parameter weight is kept substantially constant at values allowing minimizing of the attenuation of the audio source signal by the postfilter unit.
[0079] In the transition range the postfilter adaptation parameter weight monotonically (e.g., linearly) changes from the value of the low elevation range to the value of the high elevation range as a function of the estimated angle of elevation.
[0080] In the low elevation range the postfilter adaptation parameter weight is maximal and in the high elevation range the postfilter adaptation parameter weight is minimal.
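A minimal sketch of such a weight is given below. It assumes that the maximal weight is 1 and the minimal weight is 0, that the activity A_pf is expressed as an attenuation in dB, and that the weight is applied by simple multiplication; the thresholds and the names `postfilter_weight` and `weighted_attenuation_db` are hypothetical.

```python
def postfilter_weight(elevation_deg, low_deg=35.0, high_deg=65.0):
    """Weight w_Apf: 1.0 in the low range, 0.0 in the high range, linear between."""
    frac = (elevation_deg - low_deg) / (high_deg - low_deg)
    return 1.0 - min(1.0, max(0.0, frac))


def weighted_attenuation_db(activity_db, elevation_deg):
    """Attenuation actually applied after scaling the activity by the weight."""
    return postfilter_weight(elevation_deg) * activity_db
```

Under these assumptions a postfilter activity of 12 dB is applied in full for a talker at low elevation, is halved in the middle of the transition range and is suppressed entirely at high elevation.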
[0081] It is noted that the threshold values defining the low elevation range, the high elevation range and the transition range for the postfilter adaptation parameter weight, i.e., for the postfilter activity weight, may, and typically will, be different from the respective threshold values defining the low elevation range, the high elevation range and the transition range for the lower limit and the upper limit of the beamformer adaptation parameter.
[0082]
[0083] In summary, the present invention proposes a microphone assembly with a planar microphone arrangement comprising at least three microphones defining a microphone plane. The microphone assembly allows unattenuated sound pickup from a target audio source (e.g. a talking person) independently of the elevation of the target audio source while introducing only minimal denoising performance loss. To this end, the microphone assembly tracks the elevation of the target audio source and automatically applies the corresponding control to at least one directivity parameter of the processed audio signal, so as to apply, for example, the respectively required minimal beamforming and post-filtering mitigations.