UAV detection

Abstract

A system for detecting, classifying and tracking unmanned aerial vehicles (UAVs) comprising: at least one microphone array arranged to provide audio data; at least one camera arranged to provide video data; and at least one processor arranged to generate a spatial detection probability map comprising a set of spatial cells. The processor assigns a probability score to each cell as a function of: an audio analysis score generated by comparing audio data to a library of audio signatures; an audio intensity score generated by evaluating a power of at least a portion of a spectrum of the audio data; and a video analysis score generated by using an image processing algorithm to analyse the video data. The system is arranged to indicate that a UAV has been detected in one or more spatial cells if the associated probability score exceeds a predetermined detection threshold.

Claims

1. A system for detecting, classifying and tracking unmanned aerial vehicles in a zone of interest, the system comprising: at least one microphone array including a plurality of microphones, the at least one microphone array being arranged to provide audio data; at least one camera arranged to provide video data; and at least one processor arranged to process the audio data and the video data to generate a spatial detection probability map comprising a set of spatial cells, wherein the processor assigns a probability score to each cell within the set of spatial cells, said probability score being a function of: an audio analysis score generated by an audio analysis algorithm, said audio analysis algorithm comprising comparing the audio data corresponding to the spatial cell to a library of audio signatures; an audio intensity score generated by evaluating an amplitude of at least a portion of a spectrum of the audio data corresponding to the spatial cell; and a video analysis score generated by using an image processing algorithm to analyse the video data corresponding to the spatial cell, wherein the system is arranged to indicate that an unmanned aerial vehicle has been detected in one or more spatial cells within the zone of interest if the probability score assigned to said one or more spatial cells exceeds a predetermined detection threshold.

2. The system as claimed in claim 1, comprising a plurality of cameras and wherein audio data from the at least one microphone array is used to enhance depth detection carried out using the plurality of cameras.

3. The system as claimed in claim 1, comprising a plurality of microphone arrays wherein every microphone array includes a camera.

4. The system as claimed in claim 1, wherein at least two microphone arrays and/or cameras are mapped to one another using a known spatial relationship between the physical locations of the microphone array(s) and/or camera(s), such that said microphone array(s) and/or camera(s) share a common coordinate system.

5. The system as claimed in claim 1, wherein the system comprises a peripheral sensor subsystem, wherein the peripheral sensor subsystem comprises at least one from the group comprising: a global navigation satellite system sensor; a gyroscope; a magnetometer; an accelerometer; a clock; an electronic anemometer; and a thermometer.

6. The system as claimed in claim 5, wherein the peripheral sensor subsystem is integrated into one or more microphone arrays.

7. The system as claimed in claim 1, wherein the set of cells is generated automatically.

8. The system as claimed in claim 1, wherein the processor is arranged selectively to increase a number of spatial cells in at least a subset of said zone of interest if the probability score assigned to one or more spatial cells in said subset exceeds a predetermined cell density change threshold.

9. The system as claimed in claim 8, wherein the cell density change threshold is lower than the detection threshold.

10. The system as claimed in claim 1, wherein the processor is arranged selectively to refine the resolution of at least one microphone array and/or camera if the probability score assigned to said one or more spatial cells exceeds a predetermined resolution change threshold.

11. The system as claimed in claim 10, wherein the resolution change threshold is lower than the detection threshold.

12. The system as claimed in claim 1, wherein at least one camera is arranged to zoom in on an area within the zone of interest if the probability score assigned to said one or more spatial cells exceeds a predetermined zoom threshold.

13. The system as claimed in claim 12, wherein the zoom change threshold is lower than the detection threshold.

14. The system as claimed in claim 1, wherein the set of spatial cells is further mapped to calibration data comprising a plurality of global positioning system coordinates.

15. The system as claimed in claim 14, arranged to generate said calibration data by detecting a known audio and/or visual signature associated with a calibration drone.

16. The system as claimed in claim 1, the set of cells is generated automatically.

17. The system as claimed in claim 1, wherein each of the at least one microphone array(s) and/or camera(s) is time synchronised.

18. The system as claimed in claim 17, wherein the time synchronisation is achieved by sending each microphone array and/or camera a timestamp generated by a central server.

19. The system as claimed in claim 1, wherein audio data from at least one microphone array is used to guide the analysis of video data from at least one camera.

20. The system as claimed in claim 1, wherein video data from at least one camera is used to guide the analysis of audio data from at least one microphone array.

21. The system as claimed in claim 1, wherein the image processing algorithm comprises: calculating a mean frame from a subset of previously received video data frames; subtracting said mean frame from subsequently received video data frames to generate a difference image; and comparing said difference image to a threshold within each visual spatial cell to generate the video analysis score.

22. The system as claimed in claim 1, wherein the library of audio signatures comprises a plurality of audio signatures associated with unmanned aerial vehicles in a plurality of scenarios.

23. The system as claimed in claim 1, wherein the audio analysis algorithm comprises classifying the detected unmanned aerial vehicle based on the closest match to an audio signature in said library.

24. The system as claimed in claim 1, wherein the image processing algorithm comprises classifying the detected unmanned aerial vehicle.

25. The system as claimed in claim 1, wherein the audio analysis algorithm comprises compensating for a predetermined source of noise proximate to the zone of interest.

26. The system as claimed in claim 25, wherein the audio analysis algorithm comprises compensating for the predetermined source of noise automatically.

27. The system as claimed in claim 1, wherein the audio analysis algorithm comprises a gradient algorithm, wherein the gradient algorithm is arranged to measure a relative change in a spatial audio distribution across one or more of the spatial cells.

28. The system as claimed in claim 1, wherein the processor is arranged to process said audio and visual data in a series of repeating timeframes such that it processes data for every spatial cell within each timeframe.

29. The system as claimed in claim 1, wherein the processor is arranged to analyse each spatial cell in parallel.

30. The system as claimed in claim 1, wherein the probability score is a total of the audio analysis score, the audio intensity score, and the video analysis score.

31. The system as claimed in claim 1, wherein the probability score is an average of the audio analysis score, the audio intensity score, and the video analysis score.

32. The system as claimed in claim 31, wherein the probability score is a weighted average of the audio analysis score, the audio intensity score, and the video analysis score.

33. The system as claimed in claim 1, wherein the probability score function is varied dynamically during a regular operation of the system.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) Certain embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings in which:

(2) FIG. 1 shows a typical unmanned aerial vehicle to be detected by the described embodiments of the present invention;

(3) FIG. 2 shows an unmanned aerial vehicle detection system in accordance with an embodiment of the present invention;

(4) FIG. 3 shows a set of spatial cells used by the processor of the detection system of FIG. 2;

(5) FIG. 4 shows the unmanned aerial vehicle of FIG. 1 entering the zone of interest of the detection system of FIG. 2;

(6) FIG. 5 shows a set of spatial cells used by the processor of the detection system of FIG. 2 as the unmanned aerial vehicle enters;

(7) FIG. 6 shows the spatial detection probability map after analysis by the processor;

(8) FIG. 7 shows one example of an audio analysis process using an audio signature library;

(9) FIG. 8 shows the set of spatial cells of FIG. 5 having been refined after the unmanned aerial vehicle has been detected;

(10) FIG. 9 shows an unmanned aerial vehicle detection system in accordance with a further embodiment of the present invention that utilises multiple microphone arrays;

(11) FIG. 10 shows an unmanned aerial vehicle detection system in accordance with a further embodiment of the present invention that utilises multiple cameras;

(12) FIG. 11 shows the viewpoints of the cameras of FIG. 10;

(13) FIG. 12 shows co-registration of the viewpoints of FIG. 11;

(14) FIG. 13 shows the operation of a calibration drone used to map the spatial cells to real world GPS coordinates;

(15) FIG. 14 shows how the spatial cells used by the processor of FIG. 13 are calibrated using the calibration drone;

(16) FIG. 15 shows a constant noise source that can be compensated for in accordance with embodiments of the present invention;

(17) FIG. 16 shows a subset of spatial cells of FIG. 8 having been further refined in the vicinity of the detected unmanned aerial vehicle; and

(18) FIG. 17 shows a block diagram of a further example of an audio analysis process using a feature detection and classification algorithm.

DETAILED DESCRIPTION

(19) FIG. 1 shows a typical unmanned aerial vehicle 50. This particular unmanned aerial vehicle (UAV) 50 has a conventional quadcopter form factor, wherein the body of the UAV 50 is surrounded by four rotors 52A, 52B, 52C, 52D.

(20) These UAVs typically use a gyroscope for stability, using the data from the gyroscope to compensate for any unintended lateral motion. Such a quadcopter-based UAV uses the rotors 52A, 52B, 52C, 52D in two pairs. A first pair comprising rotors 52A, 52D rotate clockwise while the second pair comprising rotors 52B, 52C rotate counter-clockwise. Each rotor 52A, 52B, 52C, 52D can be controlled independently in order to control the flight of the UAV 50. Varying the speeds of each of the rotors 52A, 52B, 52C, 52D allows for the generation of thrust and torque as required for a given flight path.

(21) Such a UAV 50 possesses an audio signature (or set of audio signatures) that is characteristic thereof. For example, the sound of the rotors 52A, 52B, 52C, 52D during flight will contain peaks at specific frequencies within the frequency spectrum. These peaks may vary with particular flight manoeuvres such as: altitude adjustment (by increasing/decreasing the rotation speeds of the rotors 52A, 52B, 52C, 52D equally); pitch or roll adjustment (by increasing the rotation speed of one rotor and decreasing the rotation speed of its diametrically opposite rotor); or yaw adjustment (by increasing the rotation speed of rotors rotating in one direction and decreasing the rotation speed of the rotors rotating in the opposite direction). Different models and designs of such unmanned aerial vehicles will each have different audio signatures and can thus be identified as will be discussed further below.

(22) FIG. 2 shows an unmanned aerial vehicle detection system 2 in accordance with an embodiment of the present invention. For the sake of clarity, this system 2 has only a single microphone array 4 and a single external camera 8. The microphone array 4 and external camera 8 are connected to a processor 10, which in this example is a computer terminal.

(23) The microphone array 4 also has a built-in camera 6. This built-in camera 6 is positioned at the centre of the microphone array 4 and provides video data that corresponds to the same viewpoint as the audio data provided by the microphone array 4. However, it will be appreciated that the built-in camera 6 does not necessarily have to be positioned at the centre of the microphone array 4 and could instead be positioned at any other fixed point on the microphone array 4 or in close proximity to it.

(24) The external camera 8 provides a separate viewpoint of the zone of interest (both due to physical location and different camera properties such as resolution, opening or viewing angles, focal lengths etc.), and does not have any directly related audio data associated with it. However, it should be noted that given the microphone array 4 has a built-in camera 6 (as described in further detail below), the external camera 8 is not strictly necessary, but enhances and augments the capabilities provided by the built-in camera 6.

(25) The microphone array 4 is composed of a two-dimensional grid of microphones (though it will be appreciated that a three-dimensional array of microphones can also be used). Each microphone within the array 4 provides an individual audio channel, the audio produced on which differs slightly from every other microphone within the array 4. For example, because of their different positions, each microphone may receive a sound signal from a sound source (such as a UAV) at a slightly different time and with different phases due to the variation in distance that the sound signal has had to travel from the source to the microphone.

(26) The audio data from the microphone array can then be analysed using beamforming.

(27) Beamforming is used to create a series of audio channels or beams which the processor 10 analyses in order to determine the presence and origin of a received audio signal of interest. If audio data from a particular beam is of interesti.e. a particular sound such as the sound of a drone is detected within the data corresponding to the beam, the angles that form that beam then provide an indication of the direction from which the sound originated, because the beam angles are known a priori for a given spatial cell. The processor is then able to determine that the sound originated from somewhere along the beam in 3D space, i.e. within the region of the zone of interest mapped to the spatial cell corresponding to the beam. It should be noted that beamforming itself provides only the direction from which the sound originated and not the distance, although the distance can be determined by embodiments of the present invention using other techniques as will be described further below.

(28) FIG. 3 shows a set of spatial cells 12 used by the processor 10 of the detection system 2 of FIG. 2. As can be seen from the Figure, the processor 10 divides the zone of interest into a set of spatial cells 12, which in this particular embodiment are triangular cells that tessellate to form a mesh.

(29) Each individual cell within the set 12 corresponds to a beam formed by the microphone array 4, and thus the processor is able to determine whether a UAV is present in any given area to a resolution as fine as the size of the mesh permits. While the mesh that forms the set 12 in this particular embodiment is composed of triangular elements, it will be appreciated that the mesh could be formed from other shapes and such meshes are known in the art per se.

(30) Each cell within the set 12 has an associated probability score corresponding to the likelihood of a drone being present in that cell as determined by the processor 10. This probability score is a function of three component scores as will be described below.

(31) The first component score that the probability score is dependent on is an audio analysis score. The audio analysis score is generated by an audio analysis algorithm which compares the audio data corresponding to each spatial cell (and by extension, one microphone array beam) to a library of audio signatures. One possible algorithm is discussed in greater detail with reference to FIG. 7 below, however it will be appreciated that there are a number of such algorithms e.g. feature extraction and selection as outlined in FR2923043 (Orelia SAS), incorporated herein by reference and discussed with reference to FIG. 17 below, which can readily be applied in accordance with the present invention. Cells with sound signals that have a close match in the library of audio signatures will be given a higher audio analysis score than cells that do not produce a close match to any signature in the library.

(32) An audio intensity score is used as a second component score by the processor 10 in determining the probability scores for each cell within the set 12. The audio intensity score is generated by comparing the amplitude of a portion of the spectrum of the audio data corresponding to each spatial cell to a predetermined threshold. Unmanned aerial vehicles have a tendency to produce sounds of relatively high volume, particularly at certain frequencies. This thresholding operation acts to filter out background sound sources that will likely be of lower amplitude in the relevant spectral region than the sound from a UAV that is to be detected. Cells with higher relevant spectral amplitude signals are given a higher audio intensity score than cells with lower relevant spectral amplitude signals. Cells with a higher audio intensity score can be given a high priority during audio analysis, meaning that the these high-scoring cells are analysed for signatures corresponding to a drone before lower-scoring cells.

(33) Each cell within the set 12 is also given a video analysis score which is generated using an image processing algorithm. An image processing or machine vision algorithm is applied to the video data corresponding to each spatial cell and analysed for characteristic properties associated with UAVs. For example, the image processing algorithm might include: colour analysis; texture analysis; image segmentation or clustering; edge detection; corner detection; or any combination of these and/or other image processing techniques that are well documented in the art.

(34) The image processing algorithm in this particular embodiment also includes motion detection. There are a number of motion detection algorithms, such as those that use motion templates, are well documented within the art per se. Exemplary algorithms particularly suitable for this invention include OpenCV and Optical Flow.

(35) A probability score is then calculated for each of the cells based on the individual audio analysis, audio intensity, and video analysis scores, and the probability score is updated after each iteration of audio analysis and classification. There are many different ways in which this probability score might be calculated. For example, the probability score may be a total of the multiple component scores, or may be an average thereof. Alternatively the probability score could be a weighted average where the different component scores are given different weightings which may be set by the designer or varied dynamically by the processor 10.

(36) The set of cells 12 forms a probability heat map, wherein the probability of a UAV being present at any given point within the 2D projection of the 3D zone of interest is represented as a map.

(37) FIG. 4 shows the unmanned aerial vehicle 50 of FIG. 1 having entered the zone of interest of the detection system 2 of FIG. 2. The UAV 50 is thus now visible to the microphone array 4, its associated built-in camera 6 and the external camera 8. As can be seen from FIG. 5, the UAV 50 occupies several of the cells 12.

(38) FIG. 6 shows the spatial detection probability map after analysis by the processor 10. A subset of cells 14 that the UAV 50 occupies is shaded to indicate that their respective probability scores are high in comparison with the remainder of the cells 12. This shading indicates that the processor 10, having carried out the audio and video analysis described above, has calculated the probability scores in this subset 14 is greater than the surrounding cells 12.

(39) In this particular example, the probability scores in each cell within the subset 14 is greater than the detection threshold which is applied by the processor 10. Thus the detection system 2 determines that the UAV 50 is located in the airspace that corresponds to the real locations to which the subset of cells 14 are mapped. The detection system 2 may then raise an alarm to alert a user that the UAV 50 has been detected. The detection system 2 might also begin tracking the movements of the UAV 50.

(40) FIG. 7 shows one example of an audio analysis process using an audio signature library 80. The processor 10 analyses the data from the microphone array 4 to determine whether the sounds that are being received correspond to a UAV and if so, which model of UAV it is likely to be.

(41) The audio data from the microphone array 4 is Fourier transformed in order to produce a frequency spectrum 70 corresponding to the received audio data for a given cell within the set of cells 12 (i.e. the audio corresponding to a particular beam). This frequency spectrum 70 shows the magnitude |A| for each frequency f within a given range. In this particular example, the range is from 100 Hz to 10 kHz. While the frequency spectrum 70 shown here appears to be continuous, the spectra will typically be discrete in real applications due to the finite quantisation levels utilised by the processor 10. It will be understood that other domain transforms related to the Fourier transform known in the art per se such as a discrete cosine transform (DCT) or modified discrete cosine transform (MDCT) could also be readily applied to produce a suitable frequency spectrum.

(42) This frequency spectrum 70 is then compared to a library of audio signatures 80 in order to look for a match. For the sake of clarity, only three stored audio signatures 72, 74, 76 are shown on the Figure; a practical system however will of course have a far more extensive library. The processor 10 determines that the spectrum 70 is not a close match for the spectra associated with two of the audio signatures 72, 76 but does indeed match the spectra of the middle audio signature 74, shown in the Figure by the checkmark. Thus the processor determines through the audio analysis that the spectrum 70 from the associated cell corresponds not only to the presence of the UAV 50 but also indicates what type of UAV it is.

(43) FIG. 8 shows the set of spatial cells 12 of FIG. 5 having been refined after the unmanned aerial vehicle 50 has been detected. While it was described above with reference to FIG. 5 that the cells 14 had an associated probability score that exceeded the detection threshold, it may be the case that while the score was higher than usual, it was not sufficient to state with reasonable certainty that the UAV 50 was present in the zone of interest.

(44) Alternatively, the processor 10 may be reasonably certain that the UAV 50 is in the zone of interest and now wishes to obtain a better estimate of its position and dimensions.

(45) In either case, it may be that the probability score associated with these cells 14 exceeds a resolution change threshold. Once this occurs, the processor can decide to increase the resolution of the mesh, thus producing a refined set of cells 12. As can be seen by comparing the set of cells 12 in FIG. 8 to the set of cells 12 in FIG. 5, the triangular cells have been made smaller and more numerous, i.e. the cell density has been increased. For example, the beams formed using the microphone array 4 may have been separated by 10 angular spacings, but are now spaced only by 1. This may be done across the whole zone of interest or, preferably, only in the vicinity of the increased probability score.

(46) Now that the individual cells are smaller, which of course increases the processing power requirements, the subset of cells 14 which correspond to the position of the UAV 50 provide a tighter fit to the shape of the UAV 50. The increase in shading density also indicates that the probability score associated with each of the cells within the subset 14 is higher than was previously the case in FIG. 5, i.e. the processor 10 is now more certain that the UAV 50 is indeed present in that area.

(47) FIG. 9 shows an unmanned aerial vehicle detection system 2 in accordance with a further embodiment of the present invention that utilises multiple microphone arrays 4, 16. In this embodiment, the system 2 as previously described is provided with an additional microphone array 16. This particular microphone array 16 does not possess a built-in camera like the original array 4, but it will be appreciated by those skilled in the art that any combination of arrays with or without built-in cameras can be added to the system 2 as required by a given application.

(48) In this case, the two microphone arrays 4, 16 can each be used in a beamforming process and each provides audio data to the processor 10. The microphone arrays 4, 16 can provide different viewpoints of the zone of interest. This allows different subzones within the zone of interest to be monitored by each array 4, 16, since each array can only provide a view of a finite area.

(49) Alternatively, if the two arrays 4, 16 are positioned sufficiently close together, they can be combined to provide the functionality of a single, bigger superarray. This superarray then has a greater resolution than a single array.

(50) FIG. 10 shows an unmanned aerial vehicle detection system 2 in accordance with a further embodiment of the present invention that utilises multiple external cameras 8, 18. Similarly to the embodiment described with reference to FIG. 9, those skilled in the art will appreciate that any combination of external cameras, microphone arrays with built-in cameras, and microphone arrays without built-in cameras is contemplated.

(51) The two external cameras 8, 18 are positioned at different locations and each provides a different view of the zone of interest as will be described with reference to FIG. 11 below. The two cameras 8, 18 may have different properties, such as different focal lengths, zoom capabilities, ability to pan and/or tilt etc. or they could be identical, depending on the requirements of the application.

(52) Each camera can be represented by its intrinsic parameters as shown below with reference to Eqn. 1:

(53) $A_{n} = [\begin{matrix} _{x, n} & _{n} & u_{0, n} \\ 0 & _{y, n} & v_{0, n} \\ 0 & 0 & 1 \end{matrix}]$

(54) Eqn. 1: Intrinsic camera parameters

(55) wherein: A.sub.n is the intrinsic camera parameter matrix of the n.sup.th camera; .sub.x,n is the focal length multipled by a scaling factor in the x-direction for the n.sup.th camera; .sub.y,n is the focal length multipled by a scaling factor in the y-direction for the n.sup.th camera; .sub.n is a skew parameter of the n.sup.th camera; and u.sub.0,n, v.sub.0,n is the principle point of the image produced by the n.sup.th camera which is typically but not always the centre of the image in pixel coordinates. It will be appreciated that this is one model of the intrinsic parameters of the camera, and other parameters may be included within the intrinsic parameter matrix such as optical distortionproviding for e.g. barrel distortion, pincushion distortion, mustache distortion, etc.

(56) FIG. 11 shows the viewpoints 20, 22 of the cameras 8, 18 respectively as described above with reference to FIG. 10. The first camera 8 provides a first viewpoint 20 of the zone of interest, which has a certain rotation and skew associated with it due to the position and angle at which the camera 8 is installed. Similarly, the second camera 18 provides a second viewpoint 22 of the zone of interest which has a different rotation and skew to the first viewpoint 20. Each camera 8, 18 therefore has a slightly different view of the zone of interest (e.g. the second camera 18 cannot see the leftmost cloud but the first camera 8 can).

(57) FIG. 12 shows co-registration of the viewpoints 20, 22 as described previously with reference to FIG. 11. As can be seen from FIG. 12, there is an area 21 within the first viewpoint 20 that has a strong correspondence with an area 23 within the second viewpoint 22. Since the positions and properties of the cameras 8, 18 are known, these viewpoints 20, 22 can be directly compared by mapping one to the other. In fact, even if the relative camera positions were not known a priori, there are numerous image processing techniques known in the art per se that could determine the camera-to-camera mapping.

(58) With this knowledge, the two viewpoints 20, 22 can be co-registered and can also be translated to a real world image having depth. The two areas 21, 23 for example can be mapped back to a real world area 24 that looks at the zone of interest face on.

(59) This is achieved by having a matrix C that represents the position or pose of the camera as given in Eqn. 2 below:
C.sub.n=[R.sub.n.sup.TT.sub.n]

(60) Eqn. 2: Extrinsic camera parameters

(61) wherein: C.sub.n is the camera pose matrix of the n.sup.th camera; R.sub.n is a rotation matrix for the n.sup.th camera that translates the rotation of the camera to the common coordinates; and T.sub.n is a translation matrix for the n.sup.th camera that translates the position of the camera to the common coordinates, where the general form of the rotation matrix R.sub.n and translation matrix T.sub.n are known in the art per se.

(62) Mapping a camera's local coordinates to the common coordinate system can be achieved using Euler angles or Tait-Bryan angles to rotate the local coordinates to the common coordinate system, wherein the rotations are around the x-, y- and z-axes. In an example, a right-handed coordinate system is used, e.g. x-axis is positive on the right side, y-axis is positive in the downwards direction, and z-axis is positive along the line of sight. This involves carrying out four distinct rotations, each of which can be represented as a separate rotation matrix, and these four rotation matrices can be combined into a single rotation matrix that provides:

(63) A fixed rotation of 270 around camera's x-axis;

(64) Pan: rotation around camera's y-axis;

(65) Tilt: rotation around camera's x-axis; and

(66) Roll: rotation around camera's z-axis.

(67) The camera coordinate system can therefore be aligned with the common real world coordinate system. In the case of UTM this implies that the camera x-axis is aligned with east, the camera y-axis is aligned with north and the camera z-axis is aligned with height.

(68) The positions and angles corresponding to the microphone array(s) can be mapped to the common coordinates in a similar way and thus all of the audio and video data sources can use a common coordinate system, which is also used by the processor 10 as the basis for the probability map comprising the set of cells 12, 12.

(69) Since there are multiple cameras 8, 18 with an overlapping area 24, and the relationship between said cameras 8, 18 is known, it is possible to determine the depth of an object such as the UAV 50 within said area 24 by comparing the pixels in each image corresponding to the UAV 50 in the two viewpoints 20, 22 using stereoscopy techniques that are known in the art per se. A similar pairing may be made between the built-in camera 6 and either or both of the external cameras 8, 18 to provide further depth information. This depth information may also be augmented by the audio data from the microphone array 4.

(70) FIG. 13 shows the operation of a calibration drone 90 used to map the spatial cells to real world GPS coordinates. The calibration drone 90 is flown throughout the zone of interest that is to be monitored by the detection system 2. The calibration drone is flown by a user (either manually or using a predetermined, automatic flight path) along the path 94.

(71) The calibration drone is also fitted with a global positioning system (GPS) sensor 92. The GPS sensor 92 is used to log the real world coordinates of the calibration drone as it travels along the path 94. The processor 10 has a shared common timestamp with the GPS sensor 92, and thus the GPS data logged by the calibration drone 90 can be compared directly to the audio and video data provided by the microphone array 4, built-in camera 6 and external camera 8. This enables a correspondence between the spatial cells and GPS coordinates to be established as will be described below.

(72) FIG. 14 shows how the spatial cells 12 used by the processor of FIG. 13 are calibrated using the calibration drone 90. Since the GPS sensor 92 and the processor 10 are time synchronised, the processor can compare the times at which the calibration drone 90 traversed each cell with the GPS data from the GPS sensor 92 and obtain a one-to-one calibration mapping from the spatial cells 12 to real world GPS coordinates. Then, during regular operation, a detected UAV such as the UAV 50 can be pinpointed on a real world map since the cells it is detected within have a known position. This can be achieved by translating the coordinates into the correct Universal Transverse Mercator (UTM) coordinates. The coordinates could, of course, be translated into other coordinate systems as required by the end-user.

(73) FIG. 15 shows a constant noise source that can be compensated for by the detection system 2. In this Figure, the detection system 2 has been installed proximate to a wind turbine 100. The wind turbine 100, when in use, produces a relatively constant noise, which may cause difficulty in detecting unmanned aerial vehicle via sound. However, the processor 10 is arranged such that it can be calibrated to ignore such sources of constant noise. This can be achieved by calibrating the system when no drones are in the area, such that any sounds heard during calibration that are later heard during runtime can be subtracted from the runtime sound. This filtering procedure could involve spatial cancellation using beamforming algorithms, time-frequency domain filtering procedures, or a combination of the two. Additionally or alternatively, the processor 10 may be calibrated to ignore certain frequencies of sound that are known to be noise sources e.g. the wind turbine 100 producing a constant 50 Hz noise, or to spatially band-stop the known and stationary position of the unwanted noise.

(74) FIG. 16 shows a subset of spatial cells 14 of FIG. 8 having been further refined in the vicinity of the detected unmanned aerial vehicle. In this particular example, the processor has decided to further increase the resolution of the mesh only in the vicinity of the UAV 50, thus producing a refined set of cells 14. As can be seen by comparing the set of cells 14 in FIG. 8 to the set of cells 14 in FIG. 16, the triangular cells have been made even smaller, i.e. the cell density has been further increased. This new subset of cells 14 provides an even tighter fit to the shape of the UAV 50. It will be appreciated that there may not be an intermediate step of increasing the resolution globally before increasing it only in the vicinity of the drone, and the resolution may only be increased locally depending on processing requirements.

(75) FIG. 17 shows a block diagram of a further example of an audio analysis process using a feature detection and classification algorithm 200. In this algorithm 200, the audio data 202 corresponding to a particular beam is passed through a feature extraction block 204, a feature selection block 206, and a classifier block 208 in order to determine the classification 212 of the audio data 202.

(76) The feature extraction block 204 implements temporal analysis, using the waveform of the audio signal 202 and/or spectral analysis using a spectral representation of the audio signal 202 for analysis. The feature extraction block 204 analyses small segments of the audio signal 202 at a time and looks for certain features such as pitch, timbre, roll-off, number of zero crossings, centroid, flux, beat strength, rhythmic regularity, harmonic ratio etc.

(77) The set of features 205 extracted by the feature extraction block 204 are then input to the feature selection block 206. The feature selection block 206 then selects a specific subset of features 207 that are chosen to be those most indicative of the noise source (e.g. a drone) to be looked for. The subset of features 207 is chosen to provide an acceptable level of performance and high degree of accuracy for classification (e.g. does not provide too many false positives and false negatives) and reduces computational complexity by ensuring the chosen features are not redundanti.e. each chosen feature within the subset 207 provides additional information useful for classification that is not already provided by another feature within the subset 207.

(78) The chosen subset of features 207 is then passed to the classifier block 208. The classifier block 208 then uses a classifier algorithm such as a k-nearest neighbour classifier or a Gaussian mixture classifier. The classifier block 208 may also take statistical models 210 as an input. These statistical models 210 may have been built up based on training data wherein the classification labels (e.g. a specific model of drone) are assigned manually to corresponding audio data and can aid the classifier block 208 in making its determination of what is present within the audio signal 202. The classifier block 208 then outputs a classification label 212 such as drone present, drone not present or it might name a specific model of drone.

(79) Thus it will be seen that distributed, collaborative system of microphone arrays and cameras that uses various statistical analysis, spatial filtering and time-frequency filtering algorithms to detect, classify and track unmanned aerial vehicles over a potentially large area in a number of different environments has been described herein. Although particular embodiments have been described in detail, it will be appreciated by those skilled in the art that many variations and modifications are possible using the principles of the invention set out herein.

UAV detection

Assignee

Inventors

Cpc classification

Classification Explorer

G01S5/0264

PHYSICS

Classification Explorer

G06T2207/10021

PHYSICS

Classification Explorer

G06T2207/30232

PHYSICS

Classification Explorer

G06T2207/10012

PHYSICS

Classification Explorer

G01S5/20

PHYSICS

Classification Explorer

G06V20/52

PHYSICS

Classification Explorer

G06V10/245

PHYSICS

Classification Explorer

G06T7/73

PHYSICS

Classification Explorer

G06V10/764

PHYSICS

Classification Explorer

G06F18/2415

PHYSICS

International classification

Classification Explorer

G10L15/02

PHYSICS

Classification Explorer

G01S5/02

PHYSICS

Classification Explorer

G06K9/00

PHYSICS

Classification Explorer

G01S5/20

PHYSICS

Classification Explorer

G06K9/32

PHYSICS

Classification Explorer

G06T7/73

PHYSICS

Classification Explorer

G06K9/62

PHYSICS

Abstract

Claims

Description