POSITIONING SYSTEM AND METHOD WITH STEGANOGRAPHIC ENCODED DATA STREAMS IN AUDIBLE-FREQUENCY AUDIO
20180238994 · 2018-08-23
Inventors
- Diamantino Rui DA SILVA FREITAS (Porto, PT)
- João NEVES MOUTINHO (Porto, PT)
- Rui Manuel ESTEVES ARAÚJO (Porto, PT)
CPC classification
G10L19/00
PHYSICS
International classification
G01S5/30
PHYSICS
Abstract
A system and method for location positioning with steganographic encoded data streams in audible-frequency range audio is disclosed. The method comprises encoding, modulating and audio-hiding data streams into corresponding audible-frequency range steganographic audio signals; transmitting each audio signal by a corresponding loudspeaker, wherein each data stream includes the geographic location of the corresponding loudspeaker, a time stamp of transmission of periodic frames of the data stream and the like. In addition, a mobile device is used for: acquiring an audio signal that includes the transmitted audio signals; separating, demodulating and decoding the data streams therefrom; calculating the distance between the mobile device and each of the loudspeakers based on the time of flight between transmission and acquisition of each of the audio signals; and estimating the geographic location of the mobile device based on the distance between the mobile device and the loudspeakers and the geographic location of the loudspeakers.
Claims
1. A method for location positioning with steganographic encoded data streams in audible-frequency range audio comprising: encoding, modulating and audio-hiding a data stream into an audible-frequency range steganographic audio signal; transmitting said audio signal by a loudspeaker, wherein said data stream includes the geographic location of the loudspeaker; using a mobile device for: acquiring an audio signal from the acoustic environment that includes the transmitted audio signal; separating, demodulating and decoding the data stream from the acquired audio signal; and estimating a geographic location of the mobile device based on the geographic location of the loudspeaker.
2. The method for location positioning with steganographic encoded data streams in audible-frequency range audio according to claim 1, wherein said data stream includes the geographic location of the loudspeaker and a time stamp of transmission of periodic frames of the data stream and further comprising: calculating, using the mobile device, a distance between the mobile device and the loudspeaker based on a time of flight between transmission and acquisition of the audio signal, the time of flight being obtained from a difference between a time of acquisition and a time of transmission of the audio signal; and estimating, using the mobile device, the geographic location of the mobile device based on the calculated distance between the mobile device and the loudspeaker and on the geographic location of the loudspeaker.
3. The method for location positioning with steganographic encoded data streams in audible-frequency range audio according to claim 1, further comprising: encoding, modulating and audio-hiding two or more data streams each into a corresponding audible-frequency steganographic audio signal; transmitting each said audio signal by a corresponding loudspeaker, wherein each said data stream includes a geographic location of the corresponding loudspeaker and a time stamp of transmission of periodic frames of the data stream; using the mobile device for: acquiring the audio signal from the acoustic environment, wherein the acquired audio signal includes the transmitted audio signals; separating, demodulating and decoding the data streams from the acquired audio signal; calculating a distance between the mobile device and each of the loudspeakers based on a time of flight between transmission and acquisition of each of the audio signals, the time of flight being obtained from a difference between a time of acquisition and a time of transmission of each of the audio signals; and estimating the geographic location of the mobile device based on the calculated distance between the mobile device and each of the loudspeakers and on the geographic location of each of the loudspeakers.
4. The method according to claim 3, wherein there are at least three data streams, three corresponding audio signals and three corresponding loudspeakers, and the method further comprising: trilateration of the distances between the mobile device and each of the loudspeakers, wherein trilateration includes calculating a centroid of an intersection of circles centered on the loudspeakers and having a corresponding radius equal to the distance between the mobile device and each of the loudspeakers.
5. The method according to claim 4, wherein the loudspeakers are synchronized in their transmission such that the same corresponding frame of the data streams is transmitted simultaneously by all the loudspeakers.
6. The method according to claim 4, wherein the loudspeakers are not synchronized in their transmission and the same corresponding frame of the data streams is transmitted by the loudspeakers with a delay specific to each loudspeaker, wherein the data stream of each loudspeaker includes the corresponding specific delay and the calculation of the time of flight deducts said specific delay for each loudspeaker.
7. The method according to claim 1, wherein the encoded data stream includes error-checking data or error-correction data.
8. The method according to claim 1, wherein the audio-hiding is echo-hiding when the loudspeakers are transmitting one or more further audio signals and wherein the audio-hiding is spread-spectrum when the loudspeakers are not transmitting any further audio signal.
9. The method according to claim 1, wherein an audible-frequency range steganographic audio signal is an audible-frequency audio signal that is below human perceptual threshold.
10. The method according to claim 1, wherein the encoding is Golay encoding.
11. The method according to claim 1, wherein the data stream includes a local air temperature measured at the loudspeaker.
12. The method according to claim 1, further comprising: calculating a time of flight by deducting a variation of speed of sound calculated from received temperature data.
13. The method according to claim 1, wherein the data stream includes data of general interest to the user of the mobile device, in particular public emergency data.
14. A system for location positioning with steganographic encoded data streams in audible-frequency range audio, comprising: an encoder-modulator for audio-hiding a data stream into an audible-frequency steganographic audio signal for transmitting to a loudspeaker, wherein said data stream includes the geographic location of the loudspeaker.
15. The system according to claim 14, wherein said data stream includes periodic frames each with a time stamp.
16. The system according to claim 14, further comprising: a signal injector for injecting said audio signal into an existing audio signal line; and further in particular comprising a loudspeaker for transmitting said audio signal.
17. A mobile device for location positioning with steganographic encoded data streams in audible-frequency range audio, comprising a processor and code, wherein the processor of the mobile device is configured by the code to: acquire an audio signal from an acoustic environment that includes a transmitted audible-frequency steganographic audio signal transmitted by a loudspeaker, wherein said transmitted audio signal has an encoded, modulated and audio-hidden data stream, wherein said data stream includes a geographic location of the loudspeaker; separate, demodulate, and decode the data stream from the acquired audio signal; and estimate the geographic location of the mobile device based on the geographic location of the loudspeaker.
18. The mobile device for location positioning with steganographic encoded data streams in audible-frequency range audio according to claim 17, wherein the processor of the mobile device is further configured by the code to: acquire the audio signal from the acoustic environment that includes the transmitted audible-frequency steganographic audio signal transmitted by the loudspeaker, wherein said transmitted audio signal has the encoded, modulated and audio-hidden data stream, wherein said data stream includes the geographic location of the loudspeaker and a time stamp of transmission of periodic frames of the data stream; separate, demodulate and decode the data stream from the acquired audio signal; calculate a distance between the mobile device and the loudspeaker based on a time of flight between transmission and acquisition of the audio signal, the time of flight being obtained from a difference between a time of acquisition and a time of transmission of the audio signal; and estimate the geographic location of the mobile device based on the distance between the mobile device and the loudspeaker and on the geographic location of the loudspeaker.
19. The mobile device for location positioning with steganographic encoded data streams in audible-frequency audio according to claim 17, wherein the processor of the mobile device is further configured by the code to: acquire the audio signal from the acoustic environment, wherein the acquired audio signal includes two or more transmitted audible-frequency steganographic audio signals each transmitted by a corresponding loudspeaker, wherein each said transmitted audio signal has an encoded, modulated and audio-hidden data stream, wherein said data stream includes a geographic location of the loudspeaker and a time stamp of transmission of periodic frames of the data stream; separate, demodulate and decode the data stream from the acquired audio signal; calculate the distance between the mobile device and each of the loudspeakers based on a time of flight between transmission and acquisition of each of the audio signals, the time of flight being obtained from a difference between a time of acquisition and a time of transmission of each of the audio signals; and estimate the geographic location of the mobile device based on the distance between the mobile device and each of the loudspeakers and on the geographic location of each of the loudspeakers.
20. The system according to claim 14, wherein the data stream includes a local air temperature measured at the loudspeaker.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0011] The following figures provide preferred embodiments for illustrating the description and should not be seen as limiting the scope of invention.
DETAILED DESCRIPTION
[0031] The disclosure is characterized by a set of systems and methods to allow global absolute position estimation on a processing-enabled mobile device (such as a smartphone or tablet) by means of audio signals. These audio signals are transmitted periodically by an infrastructure without any connection to the mobile device and are present at all times, even if no mobile device is using them. Although by nature in the audible frequency range, these signals are designed to be barely perceptible, avoiding disturbance of the acoustic environment and of people who may be present in the same area. Inside these signals transmitted through the air, information relevant to global position determination is carried over a one-way, non-confirmed communication channel to the mobile device. Because the mobile device may be moving, special processing is required to compensate for the Doppler effect.
[0032] Physically, the system is composed of an infrastructure, the channel and a mobile device, as illustrated in the drawings.
[0033] The following pertains to the disclosure's infrastructure. The fixed part of the system is installed in the physical architecture of the area where localization is performed, as illustrated in the drawings.
[0035] The following pertains to the disclosure's channel. In this disclosure, the channel is the air through which audio waves travel from infrastructure anchors, typically loudspeakers, to a mobile receiving device with a microphone (possibly a smartphone). Reflections, reverberation, multipath and noise greatly affect successful communication over the acoustic channel. Nevertheless, the channel is not a controllable aspect of the disclosure, as it may assume different scenarios (indoor and outdoor, in different physical architectures). The remaining parts (the infrastructure and the mobile device) are designed to deal with every possible scenario of indoor/outdoor transmission through air.
[0036] The following pertains to the disclosure's mobile device. The mobile device, integrating a microphone and possessing processing capabilities, is responsible for acquiring signals and performing its global position estimation.
[0038] To estimate localization it is necessary to rely on a reference with known position. In this infrastructure-based approach, where fixed anchors emit signals to a mobile device, localization can be achieved by measuring the distances between these parts. These distances d_i, where d_i represents the distance to the i-th anchor, can be obtained using the time-of-arrival (TOA) technique, based on the time of reception of the signal by the mobile device. Knowing the signal arrival time and subtracting the departure time from it gives the duration the signal took to reach the receiver, the time of flight (ToF), and allows distance measurement. It is the most efficient and usual choice among the options to infer distance, as it uses the minimum number of necessary anchors (three for two-dimensional localization). If a signal propagates with a constant velocity v_0, the distance d_i can be calculated by:
d_i = ToF * v_0  (1)
[0039] It is necessary to consider that sound velocity may be influenced by air temperature and humidity. Considering temperature variation, sound velocity increases at a rate of 0.6 m/s per degree Celsius from its value of approximately 331 m/s at 0 degrees Celsius. Sound velocity will therefore be
v_0 = 331 + 0.6 * T  (2)
[0040] where T is the temperature in Celsius. Humidity has a small effect on sound speed: it increases it by about 0.1% to 0.6%, because oxygen and nitrogen molecules in the air are replaced by lighter water molecules. This is a simple mixing effect. However, high humidity causes higher sound attenuation and fading, and therefore sound travels smaller distances. Yet this consideration is not relevant in indoor spaces and can be neglected.
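Equations (1) and (2) can be combined in a short sketch (a minimal illustration; the function names are hypothetical, and the 331 m/s base value with a 0.6 m/s per degree slope is the standard linear approximation used above):

```python
def speed_of_sound(temp_c):
    """Linear approximation of the speed of sound in air, equation (2):
    ~331 m/s at 0 degrees C, increasing by 0.6 m/s per degree Celsius."""
    return 331.0 + 0.6 * temp_c

def distance_from_tof(tof_s, temp_c=20.0):
    """Distance between anchor and receiver from the time of flight, equation (1)."""
    return tof_s * speed_of_sound(temp_c)
```

For example, a 10 ms time of flight at 20 degrees Celsius corresponds to roughly 3.43 m.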
[0041] Distances d_i between the anchors and the mobile device are defined by
d_i = v_0 * (t_i - t_0) = sqrt((x - X_i)^2 + (y - Y_i)^2),  (3)
[0042] where X_i and Y_i are the i-th beacon's known and fixed coordinates, t_i the arrival time of each beacon's wave and t_0 the simultaneous emission time, common to all the anchors. The variables x and y are the unknown coordinates to be determined. The arrival times t_i of the signals may be estimated using correlation methods, as described below.
[0043] ToF measurements for range estimation are the most important information for the localization estimation. If the error is not systematic, localization errors will occur when the t_i values in equation (3) are not well determined. Therefore, emphasis should be placed on determining the best possible methodology to evaluate ToF accurately. A comparison between the sent signal and the received one allows estimation of the delay and the associated distance. Cross-correlation is the simplest tool to use. Depending on the noise and on signal similarity, it can provide a good enough peak for determining the delay, as given by the following equation:
R_(r,s)(tau) = E[r(t) * s(t + tau)]  (4)

[0044] where R_(r,s) is the cross-correlation between the received signal r and the reference signal s. The delay estimate D_cc is the lag that maximizes the cross-correlation:

D_cc = argmax_tau [R_(r,s)(tau)]

[0045] The sharper the peak of R_(r,s), the more precisely the delay can be determined. In noisy and reverberant conditions, the generalized cross-correlation with the phase transform (GCC-PHAT) normalizes the cross spectrum to sharpen the correlation peak:

R_(r,s)(tau) = integral of [G_(r,s)(f) / |G_(r,s)(f)|] * e^(j*2*pi*f*tau) df

where G_(r,s)(f) is the cross power spectral density of the received and reference signals.
[0046] The PHAT filter has the effect of removing all magnitude content from the cross spectrum, keeping only phase information. Its computational simplicity, combined with its adequacy for noisy, reverberant environments like the one in the conducted experiment, justifies its use.
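The delay estimation of paragraphs [0043]-[0046] can be sketched as follows (a minimal NumPy illustration of GCC-PHAT, not the implementation used in the experiment; function and variable names are assumptions):

```python
import numpy as np

def gcc_phat(received, reference, fs=1.0):
    """Estimate the delay of `received` relative to `reference` with GCC-PHAT.

    The cross spectrum is divided by its magnitude (the PHAT filter), keeping
    only phase information, which sharpens the correlation peak in noisy,
    reverberant conditions."""
    n = len(received) + len(reference)      # zero-pad to avoid circular wrap-around
    R = np.fft.rfft(received, n=n)
    S = np.fft.rfft(reference, n=n)
    G = R * np.conj(S)                      # cross power spectral density
    G /= np.maximum(np.abs(G), 1e-12)       # PHAT weighting: unit magnitude
    cc = np.fft.irfft(G, n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(cc)) - max_shift    # D_cc = argmax of the correlation
    return lag / fs
```

With fs set to the audio sampling rate, the returned delay is in seconds and converts to a distance through equation (1).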
[0047] If i anchors are transmitting simultaneously, the receiver must be able to identify which anchor's signal was received at which time t_i. Also, people can hear in the frequency range where the audio signals operate, and therefore acoustic annoyance should be avoided. Considering these requirements, the transmitted signals are carefully designed to be as acoustically imperceptible as possible to people while allowing good identification performance. This is achieved by using signals with high autocorrelation and low cross-correlation. With the spread-spectrum encoding technique, a pseudorandom noise (PN) sequence turns the data into a low-power signal spread across a wide frequency interval. This is different from schemes which encode their data in the time domain. Each loudspeaker's PN sequence should be statistically uncorrelated with the others so that each anchor's signal is correctly identified. Gold codes are a suitable example of a PN sequence for this purpose, as the cross-correlation between codes is low and the autocorrelation is high. Gold codes have bounded, small cross-correlations within a set. A set of Gold code sequences consists of 2^n + 1 sequences, each with a period of 2^n - 1. These are constructed by XOR-ing two maximum-length sequences of the same length with each other. Gold sequences have better cross-correlation properties than maximum-length sequences and their use is therefore more appropriate. Also, a large number of different Gold codes can be generated, which may be necessary to allow separate identification of a larger set of anchors. Each emitted signal is therefore identified by the code that spreads its data. Direct-sequence code division multiple access (DS-CDMA) is then used to transmit the unique wide-band coded signal, shaped to the acoustic channel, with a digital modulation scheme such as binary phase-shift keying (BPSK). BPSK conveys the information contained in the spread-spectrum signal by changing, or modulating, the phase between two possible values: 0 and 180 degrees. This modulation is the most robust and the easiest to demodulate at reception: the decision can only assume two possible values and is therefore less influenced by noise.
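The Gold-code construction described above can be sketched as follows (a minimal illustration; the degree-5 tap sets are a textbook choice of primitive feedback polynomials, and the actual code length and polynomials used by the system are not specified in this description):

```python
def m_sequence(taps, n):
    """Maximum-length sequence from an n-stage Fibonacci LFSR.

    `taps` lists the tapped stages (1-based). With a primitive feedback
    polynomial the output sequence has period 2**n - 1."""
    state = [1] * n                  # any nonzero seed works
    seq = []
    for _ in range(2 ** n - 1):
        seq.append(state[-1])        # output the last stage
        fb = 0
        for t in taps:
            fb ^= state[t - 1]       # XOR of the tapped stages
        state = [fb] + state[:-1]    # shift the feedback bit in
    return seq

def gold_code(shift):
    """Gold code of period 31: XOR of two m-sequences, one cyclically shifted.

    The 31 relative shifts, plus the two m-sequences themselves, give the
    2**n + 1 = 33 codes of the set described above."""
    a = m_sequence([5, 3], 5)        # taps for x^5 + x^3 + 1
    b = m_sequence([5, 4, 3, 2], 5)  # taps for x^5 + x^4 + x^3 + x^2 + 1
    b = b[shift:] + b[:shift]
    return [x ^ y for x, y in zip(a, b)]
```

Each anchor would be assigned a distinct shift, so its spreading code correlates strongly only with itself.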
[0048] Once all the distances d_i are measured, localization may be estimated as illustrated in the drawings.
[0049] Estimating localization in practice is more difficult than in the ideal case of exact, noise-free range measurements, because the emission time t_0 is generally not known at the receiver.
[0050] To illustrate this situation, consider that an unknown emission latency adds the same bias to every measured distance.
[0051] Determining t_0 is critical to correctly evaluate distance. However, not knowing t_0 is not critical if emission is simultaneous in all beacons, since any over- or underestimation in distance affects all the distance vectors d_i with the same error. In these conditions, this delay may be added or subtracted using a specially developed technique called Circle Shrinking. In this technique, distance is usually overestimated (because of latency in the emission) and one can think of the d_i values as the radii of circles centered at the beacons' positions, with radii equal to the overestimated distances.
[0052] The local search halt criterion can be a threshold, or simply a stop when there is no longer an intersection. However, performing circle shrinking can be computationally demanding: it requires calculating the intersection area at every iteration of this minimization problem. One must take the application's precision and accuracy requirements into account to evaluate what is reasonable. Sometimes a small estimation error in the distance vectors may be acceptable, and the source localization algorithm may deal with it very well. For example, a one-sample error in ToF estimation at 44.1 kHz represents less than a centimeter of error in a distance vector from a beacon, and an even smaller error in the final position estimate. Depending on the latency variation (t_0) or the application itself, one can also perform this technique only when synchronization between the emitters and the receiver is lost. This will avoid heavier processing and will increase the position refresh rate. To synchronize the infrastructure with the mobile device, a possible approach is to send the t_0 information in the radiated signal as a time stamp. In a scheme where DS-CDMA is used, the signal information can be the exact time of emission, spread with a code and interpreted at the receiver. Another possibility is to use a clock (sync) signal together with the signals at every cycle. A previous work has used a dedicated microphone in a known position to calculate the delay each time. It is a simple possible solution, but it requires additional hardware with implied additional costs. In the experiment presented ahead, a sound board with a fixed latency was used to avoid the use of such a calibration microphone. Assuring fixed latency in sound emission, and therefore a constant delay, allows circle shrinking to be used only once, for the first delay measurement. From that point onward, the delay is considered constant and is simply subtracted, resulting in t_0 = 0. This strategy avoids the need for additional hardware and does not increase computational complexity.
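The Circle Shrinking procedure can be sketched as follows (a simplified illustration: the halt criterion used here is the disappearance of a common intersection point, and the step size, names and centroid rule are illustrative choices, not the patented implementation):

```python
import math

def circle_pair_points(c1, r1, c2, r2):
    """Intersection points of two circles (empty list if they do not meet)."""
    d = math.dist(c1, c2)
    if d == 0 or d > r1 + r2 or d < abs(r1 - r2):
        return []
    a = (d * d + r1 * r1 - r2 * r2) / (2 * d)   # distance from c1 along the axis
    h = math.sqrt(max(r1 * r1 - a * a, 0.0))    # half-chord height
    ux, uy = (c2[0] - c1[0]) / d, (c2[1] - c1[1]) / d
    mx, my = c1[0] + a * ux, c1[1] + a * uy
    return [(mx - h * uy, my + h * ux), (mx + h * uy, my - h * ux)]

def common_region_points(anchors, radii, tol=1e-6):
    """Pairwise intersection points lying inside (or on) every circle."""
    pts, n = [], len(anchors)
    for i in range(n):
        for j in range(i + 1, n):
            for p in circle_pair_points(anchors[i], radii[i], anchors[j], radii[j]):
                if all(math.dist(p, anchors[k]) <= radii[k] + tol for k in range(n)):
                    pts.append(p)
    return pts

def circle_shrinking(anchors, dists, step=0.01):
    """Shrink all (overestimated) radii by a common delta until the common
    intersection is about to vanish; return the centroid of the last
    surviving intersection points and the estimated common bias delta."""
    delta = 0.0
    pts = common_region_points(anchors, dists)
    while True:
        nxt = common_region_points(anchors, [d - delta - step for d in dists])
        if not nxt:
            break
        delta += step
        pts = nxt
    cx = sum(p[0] for p in pts) / len(pts)
    cy = sum(p[1] for p in pts) / len(pts)
    return (cx, cy), delta
```

When all beacons share the same emission latency, the returned delta approximates that latency expressed as a distance, and the centroid approximates the mobile device position.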
[0053] Localization can also be determined by a source localization algorithm that considers an error minimization approach. A non-linear optimization method can be used to estimate (x,y) by minimizing the following objective function concerning the error:
min f(x, y) = sum_i [sqrt((x - X_i)^2 + (y - Y_i)^2) - v_0 * t_i]^2,  (5)
[0054] where f represents the error function, and one considers the typical constraints on the variables' domains.
[0055] Iterative nonlinear least-squares estimation methods such as Newton-Raphson, Gauss-Newton or steepest descent appear in the literature as alternatives for this problem.
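A minimal sketch of such an iterative estimator follows (a Gauss-Newton iteration on the objective of equation (5), written for the two-dimensional case; the starting point, iteration count and function name are illustrative assumptions):

```python
import math

def estimate_position(anchors, dists, start=(0.0, 0.0), iters=25):
    """Gauss-Newton minimization of equation (5):
    f(x, y) = sum_i (sqrt((x - X_i)^2 + (y - Y_i)^2) - d_i)^2,
    where d_i = v_0 * t_i is the measured range to anchor i."""
    x, y = start
    for _ in range(iters):
        a11 = a12 = a22 = g1 = g2 = 0.0      # J^T J (2x2) and J^T r (2x1)
        for (ax, ay), d in zip(anchors, dists):
            rng = math.hypot(x - ax, y - ay)
            if rng == 0.0:
                continue                      # gradient undefined exactly at an anchor
            res = rng - d                     # range residual for this anchor
            jx, jy = (x - ax) / rng, (y - ay) / rng
            a11 += jx * jx; a12 += jx * jy; a22 += jy * jy
            g1 += jx * res; g2 += jy * res
        det = a11 * a22 - a12 * a12
        if abs(det) < 1e-12:
            break
        # Solve (J^T J) delta = J^T r and step x <- x - delta
        dx = (a22 * g1 - a12 * g2) / det
        dy = (a11 * g2 - a12 * g1) / det
        x, y = x - dx, y - dy
    return x, y
```

With more than three anchors the same loop handles the overdetermined case discussed below, since the normal equations simply accumulate one residual per anchor.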
[0056] Having more anchors than the mandatory three may seem unnecessary; however, redundancy may increase robustness. It may be useful to rely on extra anchors in case some physical obstruction occurs. Thus, redundant anchors may be employed, creating an overdetermined equation system. Due to the presence of noise in the d_i measurements, the desired and unknown mobile device position (x, y) cannot be obtained just by solving the system of equations; hence the need for an algorithm that takes an error-minimization approach.
[0057] There are many advantages in using a passive localization method, the most relevant being related to security, privacy and autonomy. The typical GNSS is an example, as the satellite constellation is not aware of the activity of the receiver. A simple GNSS receiver achieves global positioning just by having line-of-sight to satellites and, similarly, an indoor mobile device may do so just by using signals already available in that space, with similar advantages. However, reliable one-way communication between anchor(s) and a mobile device through a shared, multi-use, noisy channel (often with impulsive background noise), with strong fading and multipath and populated with people, is not a simple task to achieve.
[0058] In the presented technology, in which information concerning the anchors' positions travels through the channel embedded in the signal, successful data transmission is critical. Even if the mobile device position (MDP) with respect to the anchors is precisely determined, anchor positions that are wrong due to bad reception will result in bad positioning; the localization estimate may even be attributed to the wrong indoor infrastructure. The data transmission problem must therefore be treated as one of the most important parts of this global localization system, and redundancy, error detection/correction and filtering techniques are employed to avoid significant errors.
[0059] The chosen position format to transmit global position was the Universal Transverse Mercator (UTM), typically described by a rectangular grid with easting and northing coordinates in meters. This rectangular format was chosen for being the most universally accepted by Localization-Based Applications (LBA) and one that provides faster and simpler calculations. The MDP can be estimated by the nonlinear least-squares (NLS) method just by adding the rectangular components of the range vector to the anchor's position.
[0060] Since it is not possible to generate an error signal and request retransmission, as is done in many difficult communication channels, simple error detection is not enough. It is therefore very important to employ other solutions, and the use of Forward Error Correction (FEC) is very convenient. To do so, Golay codes are used to encode the data, allowing error detection and correction to a significant extent. In this application, where processing will probably be performed by a device with limited computation/battery autonomy, Golay codes are the preferable choice among other error-correcting tools such as Reed-Solomon codes, due to their relatively small computational complexity of O(n). Golay codes handle random bit errors, tolerating three bit errors per 24-bit codeword (a 12.5% bit error rate), compensating for the fact that data retransmissions cannot be requested by receivers operating passively.
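A systematic Golay encoder can be sketched as follows (a minimal illustration using one of the standard generator polynomials of the binary Golay code; the description above refers to 24-bit codewords, which correspond to the extended code obtained by appending an overall parity bit):

```python
GOLAY_POLY = 0b101011100011   # g(x) = x^11 + x^9 + x^7 + x^6 + x^5 + x + 1

def golay23_encode(data):
    """Systematic (23,12) Golay encoding: 12 data bits followed by 11 parity bits.

    The parity bits are the remainder of data(x) * x^11 divided by g(x)
    over GF(2), computed by shift-and-XOR long division."""
    assert 0 <= data < (1 << 12)
    rem = data << 11
    for i in range(22, 10, -1):   # reduce bit positions 22..11
        if rem & (1 << i):
            rem ^= GOLAY_POLY << (i - 11)
    return (data << 11) | rem     # 23-bit codeword

def golay24_encode(data):
    """Extended (24,12) Golay code: append an overall parity bit, giving the
    three-error-correcting 24-bit codewords mentioned above."""
    cw = golay23_encode(data)
    return (cw << 1) | (bin(cw).count("1") & 1)
```

The minimum distance of the (23,12) code is 7 (and 8 for the extended code), which is what allows correcting any three bit errors per codeword.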
[0061] To avoid people's perception of the added audio signals, spread-spectrum and echo-hiding techniques are used. The audio mix is then transmitted by the loudspeakers into the channel (an indoor area). Mobile devices are responsible for receiving the signals broadcast in the acoustic environment and interpreting them to determine the localization, just as Global Navigation Satellite System receivers do.
[0062] Considering a room in a building with a pre-existing public address sound system, the only necessary addition would be an appliance between the original sound source (possibly a mixer for a music player and voice) and the sound transducers. This appliance, illustrated in the drawings, injects the steganographic audio signal into the existing audio signal line.
[0063] The so-called steganographer block is responsible for choosing the best transmission scenario depending on the current condition of the public address sound system. It performs leveling and chooses the most suitable masking technique.
[0064] In a scenario where there is no audio signal being reproduced, a spread-spectrum, noise-like transmission is used, assuring a noise power level below the environmental noise.
[0065] The use of spread spectrum allows the transmitted signal to have a low power density, because the transmitted energy is spread over a wide band; the amount of energy per specific frequency is therefore lower, as illustrated in the drawings.
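The spreading and despreading round trip can be illustrated with a toy direct-sequence sketch (hypothetical chip mapping and function names; in the real system the chips would additionally be shaped to the acoustic channel and BPSK-modulated):

```python
def spread(bits, pn):
    """Spread each data bit over one full PN-sequence period, mapping the XOR
    of the data bit and each chip to +/-1 amplitudes. The bit energy is thus
    distributed over len(pn) chips, lowering the per-frequency power density."""
    return [1.0 if (b ^ c) else -1.0 for b in bits for c in pn]

def despread(chips, pn):
    """Correlate each PN-period slice of the received chips against the PN
    sequence and decide each bit by the sign of the correlation."""
    ref = [1.0 if c else -1.0 for c in pn]
    period = len(pn)
    bits = []
    for k in range(0, len(chips), period):
        corr = sum(chips[k + i] * ref[i] for i in range(period))
        bits.append(1 if corr < 0 else 0)   # negative correlation => inverted chips => bit 1
    return bits
```

Because the decision integrates over the whole PN period, moderate noise on individual chips does not flip the recovered bit, which is what makes the low-power, noise-like transmission workable.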
[0066] The term "comprising", whenever used in this document, is intended to indicate the presence of stated features, integers, steps and components, but not to preclude the presence or addition of one or more other features, integers, steps, components or groups thereof.
[0067] It will be appreciated by those of ordinary skill in the art that unless otherwise indicated herein, the particular sequence of steps described is illustrative only and can be varied without departing from the disclosure. Thus, unless otherwise stated, the steps described are unordered, meaning that, when possible, the steps can be performed in any convenient or desirable order.
[0068] It is to be appreciated that certain embodiments of the disclosure as described herein may be incorporated as code (e.g., a software algorithm or program) residing in firmware and/or on computer useable medium having control logic for enabling execution on a computer system having a computer processor, such as any of the servers described herein. Such a computer system typically includes memory storage configured to provide output from execution of the code which configures a processor in accordance with the execution. The code can be arranged as firmware or software, and can be organized as a set of modules, including the various modules and algorithms described herein, such as discrete code modules, function calls, procedure calls or objects in an object-oriented programming environment. If implemented using modules, the code can comprise a single module or a plurality of modules that operate in cooperation with one another to configure the machine in which it is executed to perform the associated functions, as described herein.
[0069] The disclosure should not be seen in any way restricted to the embodiments described and a person with ordinary skill in the art will foresee many possibilities to modifications thereof. The above described embodiments are combinable. The following claims further set out particular embodiments of the disclosure.