METHOD FOR JITTER COMPENSATION DURING RECEIPT OF VOICE CONTENT OVER IP-BASED NETWORKS AND RECEIVER FOR THAT AND METHOD AND DEVICE FOR SENDING AND RECEIVING VOICE CONTENT WITH JITTER COMPENSATION

Abstract

A jitter compensation during receipt of voice content (1, 2, 3, 4) via IP-based networks in voice packets (110, 120; 210) with a header (H) and payload (PL) is described, wherein a piece of time information of a transmitter time of a transmitter, which indicates the time of sending (ts) of the voice packet (110, 120; 210), is contained in the voice packet (110, 120; 210). It is provided that the receiver initializes a receiver clock using a transmitter time by means of the time information in the initial packet, that the receiver determines a minimum relative packet transmission duration (delta) after the initialization of the receiver clock, adjusts the receiver time as a function of the minimum relative packet transmission duration (delta), during the receipt of a first voice packet (111) with voice content, determines the actual time of receipt of this voice packet and determines the buffer (DJB) as a function of the actual time of receipt (FIG. 4).

Claims

1. A method for jitter compensation of random temporal fluctuations, which are termed jitter (JIT), between the times of receipt (te) of voice packets (110, 120; 210) when receiving voice content (1, 2, 3, 4) via IP-based networks, in which a sequence of digital voice packets (110, 120; 210) with or without voice content (1, 2, 3, 4) is received by a receiver for processing voice content (1, 2, 3, 4) contained in the voice packets (110, 120; 210), a voice packet (110; 210) with voice content having a section, termed the header (H), with data for communication control and a section, termed the payload (PL), with digital voice data from part of the voice content (1, 2, 3, 4), a voice packet (120) without voice content having the header (H) without the payload, at least one piece of time information of a transmitter time of a transmitter, which indicates the time of sending (ts) of the voice packet (110, 120; 210), being contained in the voice packet (110, 120; 210), and the receiver waiting to process the voice content (1, 2, 3, 4) for a wait time, termed a buffer (DJB), after receiving a first voice packet (111) with voice content (1, 2, 3, 4), characterized in that, after the receipt of a first voice packet (110, 120; 210), termed the initial packet, during a connection period (30) on a transmission channel (10, 20) of the voice packet (110, 120; 210), the receiver initializes a receiver clock using a transmitter time by means of the time information in the initial packet, so that the receiver time correlates with the transmitter time when sending out the voice packet (110, 120; 210), in that the receiver determines a minimum relative packet transmission duration (delta) after the receiver clock has been initialized, in that during the receipt of subsequent voice packets (110, 120; 210) in each case, the receiver time is compared with the time information, which is contained in the header (H) of the subsequent voice packet (110, 120; 210), at the transmitter time when the voice packet was sent out and a relative packet transmission duration (delta) is determined relative to the receiver time, a resultant minimum relative packet transmission duration (delta) is temporarily stored, and the receiver time dependent on the temporarily stored minimum relative packet transmission duration (delta) is adjusted in such a manner that the receiver time correlates with the transmitter time when sending out the voice packet (110, 120; 210) with the minimum relative packet transmission duration, in that, when receiving a first voice packet (111) with voice content after at least one voice packet (120) without voice content, the receiver determines the actual time of receipt of this voice packet by comparing (deltaJB) the time information in the received voice packet (111) with the receiver time at receipt and determines the buffer (DJB) as a function of the actual time of receipt.

2. The method according to claim 1, characterized in that during the initialization of the receiver clock and during the determination of the relative packet transmission duration (delta), a different length of voice packets (110; 210) with voice content and voice packets (120) without voice content is taken into account.

3. The method according to claim 1 or 2, characterized in that the determination of the minimum packet transmission duration (delta) takes place at multiple intervals during the connection period (30).

4. The method according to claim 1, characterized in that the adjustment of the receiver time as a function of the temporarily stored minimum relative packet transmission duration (delta) takes place in such a manner that the receiver time is advanced by the minimum relative packet transmission duration (delta), if this indicates that a voice packet (110, 120; 210) with a shorter packet transmission time was received.

5. The method according to claim 1, characterized in that the adjustment of the receiver time as a function of the temporarily stored minimum relative packet transmission duration (delta) takes place in such a manner that the receiver time is set back if the temporarily stored minimum relative packet transmission duration (delta) indicates that only voice packets with a longer packet transmission duration were received, wherein the receiver time is set back by a defined or definable duration.

6. The method according to claim 1, characterized in that the determination of the minimum relative packet transmission duration (delta) takes place for voice packets (120) without voice content and voice packets (110; 210) with voice content.

7. The method according to claim 1, characterized in that the buffer (DJB), starting from a maximum buffer, adjusts a maximum jitter to be taken into account on the basis of a delay to receipt (deltaJB) resulting from the comparison of the time information in the first voice packet (111) with voice content and the receiver time when this first voice packet (111) is received.

8. The method according to claim 1, characterized in that the voice packet (110; 210) with identical voice content is received by the receiver via a plurality of different transmission channels (10, 20), wherein a correction of the delay (DDC) of the voice packets (110, 210) with identical voice content via the various transmission channels is applied by the receiver.

9. A receiver for receiving digital voice packets (110, 120; 210) via IP-based networks and for processing voice content (1, 2, 3, 4) contained in the voice packets (110; 210) having a receiving unit, which can be connected to the IP-based network and is set up for receiving voice packets (110, 120; 210) transmitted via the IP-based network, and having an arithmetic logic unit with receiver clock, which is set up to process the received voice packets (110, 120; 210), characterized in that the arithmetic logic unit is set up to execute the method for jitter compensation according to claim 1, when receiving and processing the voice content (1, 2, 3, 4).

10. The receiver according to claim 9, characterized in that the receiver is set up to receive the voice packet (110; 210) with the identical voice content via a plurality of different transmission channels (10, 20) and to apply the method according to claim 8 in the process, wherein the arithmetic logic unit is set up to process the voice content (1, 2, 3, 4) from the received voice packets (110, 120; 210) with the identical voice content from a plurality of the various transmission channels (10, 20).

11. A method for sending and receiving voice content, in which a transmitter converts voice content (1, 2, 3, 4) into a sequence of digital voice packets (110, 120; 210) with or without voice content, a voice packet (110; 210) with voice content having a section, termed the header (H), with data for communication control and a section, termed the payload (PL), with voice data digitized from part of the voice content (1, 2, 3, 4) and a voice packet (120) without voice content having the header (H) without the payload, and each voice packet (110, 120; 210), particularly the header (H) of the voice packet (110, 120; 210), containing at least one piece of time information of a transmitter time, which indicates the time of sending (ts) of the voice packet (110, 120; 210), and sends out the sequence of the voice packets (110, 120; 210) via at least one transmission channel (10, 20) of an IP-based network, and in which a receiver receives the sequence of digital voice packets (110, 120; 210) with or without voice content and processes the voice content (1, 2, 3, 4), characterized in that the method according to claim 1 is applied in the receiver.

12. The method according to claim 11, characterized in that a permanent communication connection is set up in the IP-based network for a connection period (30) between the transmitter and the receiver via at least one communication channel (10, 20), wherein, during the connection period (30), at times when no voice content (1, 2, 3, 4) should be transmitted, voice packets (120) without voice content are exchanged to maintain the connection between the transmitter and receiver, and wherein, at times during the connection period (30) when voice content (1, 2, 3, 4) should be transmitted, the sequence of digital voice packets (110; 210) with the voice content is transmitted from the transmitter to the receiver.

13. The method according to claim 11, characterized in that the transmitter sends out a voice packet (110, 120; 210) via a plurality of different transmission channels (10, 20) and the receiver receives the voice packets (110, 120; 210) sent out via various transmission channels (10, 20).

14. The method according to claim 11, characterized in that the method is used in radio communication, particularly for the transmission of operational ground-to-air communication in CLIMAX operation, in which a plurality of different transmission channels (10, 20) are used for transmitting a radio message as voice content (1, 2, 3, 4).

15. A device for sending and receiving digital voice packets (110, 120; 210) via IP-based networks with voice content (1, 2, 3, 4) contained in the voice packets (110; 210) having a transmitter and a receiver, the transmitter having a voice recording unit for recording the voice content (1, 2, 3, 4), and an arithmetic logic unit with transmitter clock, the arithmetic logic unit being set up for processing the voice content (1, 2, 3, 4) according to the method according to one of claims 11 to 14, and the receiver being constructed according to the features according to claim 9.

Description

[0050] In the figures:

[0051] FIG. 1 schematically shows the temporal sequence for the transmission of analogue voice content via an IP-based network (voice over IP-VoIP);

[0052] FIG. 2 schematically shows the temporal sequence for sending out voice packets with identical voice content on two different transmission channels without delay correction;

[0053] FIG. 3 schematically shows the temporal sequence for sending out voice packets with identical voice content on two different transmission channels corresponding to FIG. 2, extended by a delay correction;

[0054] FIG. 4 schematically shows the temporal sequence for sending out voice packets with identical voice content on two different transmission channels corresponding to FIG. 3 with a dynamic jitter compensation according to the invention;

[0055] FIG. 5 schematically shows the temporal sequence for sending out voice packets with identical voice content on two different transmission channels corresponding to FIG. 4 with a dynamic jitter compensation according to the invention, wherein several different voice contents are transmitted consecutively in various voice packet sequences;

[0056] FIG. 6 schematically shows a connection period between transmitter and receiver with the voice packets with and without voice information, which are received by the receiver;

[0057] FIG. 7 shows flow charts for the initialization according to the invention of the receiver clock and the determination according to the invention of a minimum relative packet transmission duration;

[0058] FIG. 8 shows an extract of a log file for the execution of a method according to FIG. 7;

[0059] FIG. 9 shows flow charts for delaying the first voice packet of a radio message by dynamic adjustment of the buffer;

[0060] FIG. 10 shows an extract of a log file for the execution of a method according to FIG. 9.

[0061] Before describing actual embodiments of the invention, the principles of known voice transmission via IP-based networks (voice over IP or VoIP applications) and the terms used in this text should be explained.

[0062] FIG. 1 schematically illustrates the temporal sequence of a method for sending and receiving analogue and/or digital voice content, as is already known in principle in the prior art and is also applied in principle in the context of the method according to the invention. It is noted that the time scale is only to be understood schematically and does not make any statement about the actual time ratios. In particular, the actual transmission times of the voice packets are substantially shorter than illustrated schematically in the sketch.

[0063] When transmitting voice via IP-based networks (Voice over IP/VoIP), analogue voice content 1, for example recorded spoken text, is digitized in the known manner, for example by means of pulse code modulation (PCM), so that the voice content 1 is present as digital voice data at the latest after the digitization. The digital voice data are also termed voice content 1.

[0064] The digital voice data are combined in data packets (voice packets 110), wherein only a portion of the entire voice content 1 is contained in one voice packet 110. This is clarified by the vertical lines in the voice content 1. Each of the digital voice packets 110 is also termed a voice sample.

[0065] Thus, the voice content 1 is usually divided into a sequence 119 made up of several consecutive voice packets 110. A sequence 119 with four voice packets is shown by way of example in FIG. 1.

[0066] Each voice packet 110 contains digital voice data, which comprise a section of, e.g. several milliseconds of the voice content 1 (that is to say a portion of the voice content 1). Such a section of the (digital or digitized analogue) voice data, which is contained in a voice packet 110, is also termed a payload PL. The voice packet 110 also contains a header H, which contains data for communication control, as an important constituent. For example, the sampling instant of the first voice sample is entered in the header H as timestamp (in the sense of time information specifying the time of sending ts). This is adopted as the time of sending ts of the voice packet 110. Sampling of the voice content 1 is understood to mean sampling analogue voice content 1 with a sampling frequency, for example 8 kHz, during which a digital value is respectively assigned to the voice content 1, that is to say the digitization of the voice content as voice data and the division into packets (payload PL) take place.

[0067] Voice packets 110 are created by the transmitter, which is also termed the source (of the voice packet) in technical language, and then transmitted to the receiver, which is also termed the data sink (of the voice packet) in technical language. The transmitter inserts the time of sending ts into the header and sends the voice packet 110. If a voice packet 120 without voice content 1 (not illustrated in FIG. 1 and explained further below) is sent, in which the payload PL is omitted, the time of sending ts corresponds to the time of sending out ta, at which the voice packet 120 is sent. If a voice packet 110 with voice content 1 (illustrated in FIG. 1) is sent, the time of sending out is delayed by the sampling of the voice packet 110. This takes the time period TPS, so that the time of actual sending out of the voice packet 110 (actual transmission time ta) is shifted by the time period TPS, that is to say ta=ts+TPS. This time period TPS, which is required for sampling the payload PL, is also described as the length of the payload and can be considered to be constant (a correction for this is explained further below). Sending out the voice packets 110 is illustrated with reference to the timeline 50 and takes place by means of the transmitter (in the sense of the source of the voice packet).
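The relationship ta=ts+TPS can be illustrated with a minimal sketch. The class name, the field names and the choice of 80 samples per packet (a 10 ms payload at the 8 kHz sampling frequency given as an example above) are assumptions made for illustration only and are not taken from the description.

```python
from dataclasses import dataclass

SAMPLE_RATE_HZ = 8000      # example sampling frequency named in the text
SAMPLES_PER_PACKET = 80    # assumption: 80 samples, i.e. a 10 ms payload


@dataclass
class VoicePacket:
    ts_ms: float           # timestamp in the header H (time of sending ts)
    has_payload: bool      # True for packets 110, False for packets 120

    @property
    def tps_ms(self) -> float:
        """Time TPS needed to sample the payload; zero without a payload."""
        if not self.has_payload:
            return 0.0
        return 1000.0 * SAMPLES_PER_PACKET / SAMPLE_RATE_HZ

    @property
    def ta_ms(self) -> float:
        """Actual transmission time ta = ts + TPS."""
        return self.ts_ms + self.tps_ms


with_voice = VoicePacket(ts_ms=100.0, has_payload=True)
keep_alive = VoicePacket(ts_ms=100.0, has_payload=False)
print(with_voice.ta_ms)   # 110.0
print(keep_alive.ta_ms)   # 100.0
```

For a packet without payload the sketch returns ta equal to ts, which corresponds to the case of the voice packet 120 described above.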

[0068] As illustrated in FIG. 1, there is a certain time after the transmission until a voice packet 110 is received, at the time of receipt te, and it would be possible to start with the output of the payload PL in principle. This is illustrated in FIG. 1 in the transmission channel 10 of the IP-based network for the four voice packets 110, wherein each voice packet 110 is illustrated on one line one below the other, with reference to the timeline 51.

[0069] Delay DEL of the voice packet 110 and jitter JIT of the network represent two important influencing variables during the transmission of the voice packet 110 via the transmission channel 10.

[0070] Delay DEL or latency essentially refers to the sum of the (physical) signal delays on the transmission channels. Voice transmission via IP networks is accompanied by further delays due to temporary storage and, if necessary, data reduction, compression and decompression of the data. This is not considered further here, because these delays can be considered to be constant over a short time horizon (as can be assumed for the application of the invention). These constant delays should also be included in the delay DEL in the context of this text.

[0071] The random temporal fluctuation between the receipt of two voice packets 110, which is technically unavoidable, is termed jitter JIT. This jitter JIT has the consequence that voice packets 110 of the same length, although they are sent at exactly the same time intervals with respect to one another, arrive at the receiver at different time intervals. This is shown in FIG. 1 by boxes JIT of different width for each of the voice packets 110, of which only the payload PL (which is essential for the invention) is illustrated, and the header H has been omitted for the sake of clarity.

[0072] It is noted once more that the temporal length of the individual components (H, PL, DEL, JIT, SJB) in the illustration neither reproduces the actual time duration of the components relative to one another nor illustrates the timeline with a duration uniform over its length. The illustration is used solely to understand the sequence of the method according to the invention, without using a uniform (absolute or relative) timescale.

[0073] These different time intervals between the receipt of consecutive voice packets 110 may lead during further processing (e.g. playback or use) of the transmitted voice information 1 to the effect that it may not be possible to connect the payload PL of the individual voice packets 110 to a continuous output stream, as outlined for the first variant of voice output 90 (as an example of processing the received voice data) in FIG. 1. There, the output of the voice information is started immediately after receiving the payload PL of the first voice packet 110 at the time of receipt te1. The payload PL of the second voice packet is received at time te2 before the processing, e.g. playback, of the voice information from the payload PL of the first voice packet is completed. The payload PL of the second voice packet 110 is therefore temporarily stored in the receiver until processing of the payload PL of the first voice packet 110 is completed. The processing of the payload PL of the second voice packet 110 can therefore directly follow the processing of the payload PL of the first voice packet 110. The same is true for the payload PL of the third voice packet 110, so that up to this point for example, a continuous output of the voice information (in the sense of a possible processing of the payload PL) is possible.

[0074] However, owing to a particularly long jitter JIT, the payload PL of the fourth voice packet 110 is only received at the time of receipt te4, at which the processing of the payload PL of the third voice packet 110 is already complete.

[0075] Any possible desired playback (or other processing) must therefore wait until the time of receipt te4, which leads to a pause or interruption 92 in the voice output (processing the payload). Continuous processing is therefore not possible.

[0076] What is known as a buffer (jitter buffer) is used in the prior art to compensate for these different time intervals (jitter JIT) when receiving consecutive voice packets 110 and to enable continuous voice output. With respect to voice transmission, the receiver therefore waits for a defined time (wait time) after the payload PL of the first voice packet 110 from the sequence 119 of voice packets 110 was received at time of receipt te1 before further processing or voice output. This is illustrated in FIG. 1 in a second variant 91 of the voice output, in which the output of the voice information of the payload PL of the first voice packet 110 starts at time tw, that is to say after the receipt thereof at time te1 and after a static, fixed wait time, which is termed a (static) buffer SJB (static jitter buffer). After the time of receipt te1 of the first voice packet 110, there is therefore a wait for the fixed buffer SJB (in the sense of a period of time) before playback/processing of the voice message is started. During the buffer SJB, subsequent voice packets 110 continue to be received and stored temporarily. This means that after the playback of the voice information (payload PL) from the first voice packet 110, the payload PL of the subsequent voice packet 110 has been received already at time of receipt te2 and is stored temporarily, so that this subsequent voice packet can be played back directly following the payload PL from the first voice packet 110. The same is true for the further following voice packets 110, so that a continuous playback of the whole transmitted voice content 1 by the receiver is possible. The term playback is to be understood to be synonymous with processing in the sense of the invention and should also comprise processing of the received payload PL from the voice packets 110 other than an audible playback. Examples for this are described below.

[0077] The (static) buffer SJB therefore effects an additional deliberate delay of the processing (in the example described: voice output), in order to subsequently output the voice data isochronously. Voice packets 110 which arrive later than the delay time of the static buffer SJB can no longer be incorporated into the output data stream. The size of the buffer SJB is added to the delay DEL. The size of the buffer SJB therefore allows the choice between more delay (and lower packet loss rate) or less delay (and a higher packet loss rate). If a subsequent voice packet 110 is lost in the sense that its payload PL is not yet present for processing/output after processing/output of the payload PL of the preceding voice packet, there is an interruption in the continuous processing/voice output.
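The effect of the static buffer SJB on continuous output can be sketched as follows. This is a simplified model under stated assumptions: the function name `playout`, the receipt times and the payload length of 10 ms are hypothetical, and a packet that misses its output slot is merely reported, without modelling how the receiver continues afterwards.

```python
def playout(te_ms, tps_ms, sjb_ms):
    """Return (start time tw, indices of packets that miss their output slot).

    te_ms  -- times of receipt te of consecutive voice packets, in ms
    tps_ms -- payload length TPS of each packet, in ms
    sjb_ms -- static jitter buffer SJB in ms (0 means immediate output)
    """
    tw = te_ms[0] + sjb_ms              # output starts after the wait time
    late = []
    for i, te in enumerate(te_ms):
        slot = tw + i * tps_ms          # isochronous output slot of packet i
        if te > slot:                   # payload not yet present: interruption
            late.append(i)
    return tw, late


# Hypothetical receipt times; the fourth packet has a particularly long jitter.
te = [23.0, 31.0, 42.0, 61.0]

print(playout(te, tps_ms=10.0, sjb_ms=0.0))    # (23.0, [3])
print(playout(te, tps_ms=10.0, sjb_ms=10.0))   # (33.0, [])
```

Without a buffer, the fourth packet arrives after the output of the third payload has finished, which corresponds to the interruption 92. With a buffer of 10 ms, the start of the output is delayed by the same amount and all four payloads are available in time.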

[0078] The following method is therefore known from the prior art:

[0079] Voice packets 110 with a defined portion of the entire voice content 1 (digital or digitized voice content) as payload PL are created (in practice e.g. with a payload length of 10 ms) in the transmitter (data source) and sent at the same time interval, i.e. with the same transmission frequency. This is shown in the timeline 50 by equidistant times of sending ts1 to ts4. The voice packets 110 arrive delayed at the receiver (data sink) due to the delay DEL and jitter (JIT) on the transmission channel 10. The delay DEL over the transmission channel 10 is to be considered constant for the short period under consideration, which is applied in the context of the invention for consecutive voice packets 110 (e.g. a radio message in a digital radio connection), whereas the jitter JIT per voice packet 110 may be different and is also usually within a certain jitter bandwidth. This leads to the voice packets 110 no longer arriving at the receiver with a constant time interval with respect to one another. This is shown in the timeline 51 by non-equidistant times of receipt te.

[0080] In order to eliminate these temporal differences in the further processing of the packets, audio output (voice output 90) does not take place immediately after the arrival of the first voice packet 110 at time of receipt te1. The voice output 90 is additionally delayed further by the buffer SJB (time of the jitter buffer) to time tw.

[0081] If one were not to do this and instead begin with the further processing immediately upon arrival of the first voice packet, this would lead, in the above example according to FIG. 1, to a voice gap (interruption 92 during the voice output) between the consecutive third and fourth voice packets 110, as the fourth voice packet 110 would not yet have arrived after the voice output of the payload PL of the third voice packet 110.

[0082] It can clearly be seen from FIG. 1 that the start of the voice output tw depends on when the first voice packet 110 arrives at the receiver (time te1) and how large the buffer SJB is. The start of the voice output is therefore dependent on the delay DEL, the jitter JIT of the first voice packet of a sequence of voice packets 110 and the length of the buffer SJB (jitter buffer). The delay DEL and the buffer SJB can be considered to be constant in the context of normal voice transmission (e.g. as a radio message). Jitter JIT remains as a variable influencing variable.
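The dependency described above can be written as a short worked relation. The second line is an assumption of this sketch: the description does not give an explicit formula for the time of receipt, so an additive model of delay and jitter is used here, with the index 1 denoting the first voice packet of the sequence.

```latex
\begin{align*}
t_{a1} &= t_{s1} + T_{PS} \\
t_{e1} &= t_{a1} + \mathit{DEL} + \mathit{JIT}_1 \\
t_w    &= t_{e1} + \mathit{SJB}
        = t_{s1} + T_{PS} + \mathit{DEL} + \mathit{JIT}_1 + \mathit{SJB}
\end{align*}
```

With TPS, DEL and SJB regarded as constant, only the jitter of the first voice packet varies the start of the voice output relative to the time of sending.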

[0083] With respect to the use of voice over IP for the transmission of e.g. operational ground-to-air communication between an aircraft and air traffic control, there is an additional factor which is directly connected to jitter JIT and the delay DEL of the voice packets: the echo.

[0084] Irrespective of the specific example, this is always true if a radio message should be sent to an aircraft which is moving in airspace, e.g. from air traffic control, which is monitoring the relevant airspace. The airspace that is actually affected cannot always be covered by means of a single radio transmission route by a single radio transmitter. Rather, in many cases, several radio transmitters are provided (as ground stations) in a spatially distributed manner, which send out a radio message to an aircraft in the airspace from various locations, so that the radio receiver in the aircraft can receive the radio message in the whole airspace. Thus, safety-relevant communication can be realized, as is necessary e.g. in the case of ground-to-air communication in radio operation.

[0085] The radio transmission by the radio transmitters on the ground to the radio receivers in the aircraft essentially takes place at the speed of light. In the case of the radio links in question here, delay differences during transmission can be disregarded. The radio messages received by the radio receivers (which are formed by the transmitted voice content 1) are then generally output by the radio receivers directly. This is common practice and is not the subject of the invention. Usually, this radio link between radio transmitters and radio receivers is also not realized as voice over IP transmission. With respect to the particularly preferred embodiment of ground-to-air communication described here, the invention relates to jitter compensation via IP-based networks, which are for transmitting a voice message from a central source (termed a transmitter in the sense of a transmitter in voice over IP transmission in this text) to a radio transmitter (termed a receiver in the sense of a receiver in voice over IP transmission in this text). Several radio transmitters then send the voice message into the airspace by radio, for receipt by a radio receiver on-board the aircraft.

[0086] A radio transmitter therefore receives the digital voice packets 110 via a voice over IP connection, combines the payloads PL thereof to form the voice message 1, which is then sent out by radio in a modulated manner using conventional radio technology. This is what is meant with processing of voice content contained in the voice packets and ultimately also output of the voice content.

[0087] Even if relevant delay fluctuations between the various transmissions of the same radio message, which would lead to an echo during playback of the same radio message via various radio transmitters (CLIMAX operation), generally do not occur during the actual radio transmission between radio transmitters and radio receivers, the delays of the radio messages via the IP-based network by means of voice over IP from the transmitter (source of the radio message) to the receiver (sink of the radio message for the voice over IP transmission, which is the radio transmitter for sending out the radio message via radio) are different. This leads to different times of sending out for the same radio message by the various radio transmitters and creates an echo during the playback of the radio message in the radio receiver. The entire transmission path of the radio message from the source of the radio message (e.g. air traffic control) to the playback after the receipt of the radio transmission by the radio receiver is termed the radio channel, wherein the delay differences that cause the echo usually occur during the voice over IP transmission by means of the IP-based network. In this respect, radio channel in the sense of the invention always also means transmission channel in the general sense. The receipt (and sending) of a contiguous voice content 1 is also termed a radio message in the described example. Accordingly, radio message also always means voice content in the general sense.

[0088] If the voice packet 110 with identical voice content 1 is successfully transmitted by the transmitter via two or more radio channels, the voice packet 110 is transmitted by the transmitter as voice packet 110 via the first channel 10 and as voice packet 210 via the second radio channel 20. Accordingly, the receiver also receives the first voice packet 110 and the second voice packet 210 via the respective radio channels 10, 20 with identical voice content 1 (FIG. 2). The structure of the voice packets 110 and 210 and the temporal sequence corresponds to the preceding description with respect to FIG. 1 with the voice packets 110. The payload of the voice packets 110 and 210 is identical.

[0089] Such a transmission of a voice packet 110, 210 with identical voice content 1, on radio channel 10 and on radio channel 20 (also termed voice packets 110, 210 to differentiate the radio channels 10, 20) is also termed CLIMAX operation. This is not limited to two radio channels 10, 20, but rather applies generally for several radio channels.

[0090] CLIMAX operation (in German Überdeckung/in English overlap) is understood to mean the parallel and fundamentally simultaneous sending (broadcasting) of a voice packet 110, 210 by the transmitter from several transmission locations (radio transmitters) on the same frequency. The transmitter (as source of the radio message) therefore makes use of several different radio transmitters at various locations, which send out the same voice packet 110, 210 on the same radio frequency as simultaneously as possible, so that the voice contents overlap at the radio receiver, insofar as the radio message sent out by radio by the various radio transmitters is received at a (the same) receiver. This use case is a particularly preferred application of the invention.

[0091] CLIMAX operation of this type is therefore used for example in ground-to-air communication, in order to cover very large airspaces or regions with spatially difficult coverage, caused for example by mountains, using one transmission frequency. Upon receipt of the radio message 1 (i.e. of the transmitted voice content), the pilot may be located in a region, in which they receive the voice packets 110, 210 with the same voice content 1 from more than one radio transmitter location.

[0092] Therefore, it must be ensured by means of suitable technical methods that the sending out of a radio message takes place as simultaneously as possible at all radio transmitters (according to the regulation ED-137, within 10 ms), in order to prevent the occurrence of an echo of the radio message for the pilot.

[0093] FIG. 2 specifically shows a situation in which a radio message starting from a central transmitter (as source of the radio message) is sent out by sending out four voice packets 110 on a first radio channel 10 and four voice packets 210 on a second radio channel 20, in each case via a voice over IP connection, wherein the four voice packets 110 and the four voice packets 210 have identical voice content 1. The voice packets should be processed simultaneously in the receiver. In the particularly preferred use case of CLIMAX operation described here, this means that several receivers are provided, which respectively have one radio transmitter, which broadcasts the voice packets 110, 210 received via the voice over IP connection by radio for receipt by one radio receiver. So that no echoes occur at the radio receiver, the voice packets 110, 210 received via the voice over IP connection should be sent out simultaneously by each radio transmitter (which corresponds to one receiver of the voice over IP connection in each case) at the various radio transmitter locations.

[0094] It can be seen in this example that the delay DEL from the transmitter to the receiver on the radio channel 20 is longer than to the receiver on radio channel 10. The differences in the delay occur owing to different interferences or properties in the radio channels 10, 20 (or more generally transmission channels 10, 20). This (different) delay DEL in the respective radio channel 10, 20 is to be understood as constant for the given short period under consideration of a radio message, so these differences in the delay can be termed static delay differences. In principle, these can also be determined, for example from the current positions of transmitter and receiver on the respective radio channel 10, 20 (delay differences owing to different distance) or by determining interferences, e.g. in the context of connection set-up on the radio channel 10, 20.

[0095] Thus, solely owing to the different delay DEL of the voice packets 110, 210 on the various radio channels 10, 20, there are different times of output tw of the identical voice content 1 at the receiver owing to the transmission of the same voice packet 110, 210 via the various radio channels 10, 20, namely the time of output tw(10) on radio channel 10 and the time of output tw(20) on the radio channel 20.

[0096] In the preferred use case, the time of output tw at the receiver means the sending out of the voice content by the radio transmitter. The voice content 1 is then only output at the radio receiver as analogue speech. Sending out the voice content via radio then leads to an undesirable echo in the (analogue) playback of the voice information by the radio receiver. Therefore, the entire system for processing the voice packets during and after the receipt via the IP-based network connection (voice over IP) can be understood as a receiver in the sense of the invention. In the example case described, this is therefore the system made up of the radio transmitter (as receiver of the voice packets via the voice over IP connection) and the output (in the sense of further processing), here by sending out the voice information via radio and receiving by the radio receiver, which then outputs the voice content in an audible manner. The radio link (in the sense of a further processing of the received voice packets for output at the end of the radio link), which is downstream of the receiver, in practice no longer leads to (new or further) delay differences on the various radio channels. In the following, only the behaviour during voice over IP transmission via the IP-based network is considered, irrespective of whether a direct (analogue) voice output takes place directly after the voice over IP transmission or only after subsequent radio transmission between radio transmitter and radio receiver.

[0097] A function named Dynamic Delay Compensation (DDC) is described in the standard ED-137 in order to compensate static delay differences from the transmitter to the receiver via various transmission or radio channels 10, 20. Here, the different values of the delay DEL are adjusted in such a manner for the various channels 10, 20 by an additional delay correction DDC, that the sending out on the fast radio channels 10 is delayed by the delay correction DDC, until it corresponds to the delay DEL on the slow radio channels 20. This delay due to the delay correction DDC is termed dynamic delay compensation. This delay correction can be realized by a delayed sending out at the transmitter location of the faster radio channel 10, that is to say in the region of the transmitter, or by a wait time during the processing of the voice information by the receiver, analogous to the buffer. The delay difference may be known and/or calculable and correspondingly applied in the transmitter or receiver. The delay difference may even be determined and applied during a connection period by the receiver in the previously described manner.
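By way of illustration only, the static delay correction described above can be sketched as follows. The function name `delay_corrections` and the delay values (in milliseconds) are hypothetical and are not taken from the ED-137 standard; the sketch merely shows that each faster channel is delayed until it matches the slowest channel.

```python
def delay_corrections(delays):
    """Return, per channel, the extra delay DDC that aligns it with the slowest channel.

    `delays` maps a channel number to its (assumed known) static delay DEL.
    """
    slowest = max(delays.values())
    return {channel: slowest - delay for channel, delay in delays.items()}


# Hypothetical static delays DEL in ms for radio channels 10 and 20.
delays = {10: 20, 20: 35}
ddc = delay_corrections(delays)

# After correction, DEL + DDC is identical on every channel.
totals = {channel: delays[channel] + ddc[channel] for channel in delays}
print(ddc)     # correction per channel
print(totals)  # equalized total delay per channel
```

Only the fast channel 10 receives a non-zero correction; the slowest channel 20 is left unchanged.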

[0098] The result of applying the delay correction is shown in FIG. 3 for the case according to FIG. 2. As a result, the sum of the actual delay of the voice packet 110, 210 and, if present, the delay correction DDC is the same for each radio channel 10, 20. The occurrence of the echo is then no longer dependent on the different delay DEL.

[0099] However, FIG. 3 makes it clear that the jitter JIT, i.e. the fluctuation of the arrival time of the voice packets 110, 210, also varies for each radio channel 10, 20 (transmission channel). It is not difficult to see that, during CLIMAX operation of radio channels 10 and 20, owing to the delay differences, there would also be problems with echo, for which only the different jitter JIT is responsible. This also leads to different times of output tw(10) via radio channel 10 and tw(20) via radio channel 20 for the identical voice content 1.

[0100] As before, the time of receipt te1 of the first voice packet 110, 210 from a sequence 119 (cf. FIG. 1) determines the start of the voice output tw. If the first voice packet 210 arrives at an early time when considered relative to the voice packet 110 (i.e. the current jitter is smaller for this voice packet 210), the voice playback begins earlier. If the first voice packet 110 arrives at a late time when considered relatively (i.e. the current jitter is larger for this voice packet 110), the voice playback begins later, wherein voice playback or voice output also means re-sending out by radio.

[0101] This problem is not taken into consideration in the ED-137 standard by the delay correction when using several radio or transmission channels.

[0102] However, this problem is overcome by means of the proposed method for dynamic jitter compensation by means of a dynamic buffer. It is achieved that the start of the voice output tw of a radio message (or a sequence 119 of voice packets 110, 210) is no longer dependent on the jitter of the first voice packet 110, 210 of the sequence 119 of voice packets.

[0103] This is achieved according to the invention, in that the actual time of receipt (synonymous with the actual jitter) is determined for the first voice packet 110, 210 of a sequence 119 of voice packets 110, 210 representing the voice message (i.e. the voice information 1) and the buffer is fixed (in the sense of determined or specified) as a dynamic buffer DJB as a function of the actual time of receipt (synonymous with the actual jitter). The temporal length of the dynamic buffer is fixed to be dependent on the actual jitter in such a manner that the time period made up of jitter JIT and dynamic buffer DJB is constant.

[0104] This is illustrated in FIG. 4 for the temporal sequences described in FIGS. 2 and 3. Instead of a fixed buffer SJB according to FIG. 2 or 3, according to the invention there is the dynamic buffer DJB, which is orientated dynamically on the jitter (jitter value, actual time of receipt) of the first voice packet 110, 210 of the radio message or the voice content 1. If the first voice packet 110, 210 comes early when considered relatively (low jitter value), this leads to a temporally longer buffer DJB. The sum of the actual jitter JIT (jitter value in the sense of the random time of receipt with respect to no jitter or the shortest jitter that occurs) and the dynamic buffer DJB is therefore constant according to the invention. Receipt here means the receipt in the voice over IP transmission.
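The relationship between jitter JIT and dynamic buffer DJB can be illustrated with a minimal sketch. The constant `TOTAL` and the function name `dynamic_buffer` are assumptions chosen for illustration; clamping the buffer at zero for a jitter larger than `TOTAL` is likewise an assumption and is not stated in the description.

```python
TOTAL = 40  # hypothetical constant sum JIT + DJB, in arbitrary time units


def dynamic_buffer(jitter, total=TOTAL):
    """Return the buffer DJB such that jitter + DJB equals `total`.

    A first packet that arrives early (low jitter) gets a longer buffer,
    a late one (high jitter) gets a shorter buffer.
    """
    return max(total - jitter, 0)  # clamping at zero is an assumption


for jitter in (5, 30):
    djb = dynamic_buffer(jitter)
    print(jitter, djb, jitter + djb)
```

In both cases the start of the voice output lies the same time after the earliest possible arrival, independently of the jitter of the first voice packet.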

[0105] If one then considers several radio messages 1, 2, 3, 4 on various radio channels 10, 20, one can see in FIG. 5 that the actual jitter JIT for each radio message 1, 2, 3, 4 and each radio channel 10, 20, is different.

[0106] Nonetheless, the start of the voice output tw is the same for all radio messages 1, 2, 3, 4 within a radio channel 10, 20 and for all radio messages 1, 2, 3, 4 on all radio channels 10, 20. The further processing of the voice packets therefore takes place simultaneously for all radio messages and for all of the transmission channels.

[0107] Upon arrival of the first voice packet 111, 211 of a sequence 119 of voice packets 110, 210 (that is to say e.g. one of the radio messages 1, 2, 3, 4), the challenge when implementing the invention consists in detecting how high the actual jitter of this first voice packet 111, 211 actually is, or expressed differently, how large the relative packet transmission duration of this first voice packet 111, 211 is with respect to the total occurring random temporal fluctuations between the times of receipt of voice packets (jitter) during receipt of voice content via IP-based networks. Therefore, it must be determined quantitatively, whether the first voice packet 111, 211, which fixes the length of the buffer DJB, arrived early or late, when considered relatively.

[0108] This functionality is not known in a normal voice over IP application and is also not described in the ED-137 standard. Whenever a voice over IP connection (in the sense of a transmission of analogue voice contents 1, 2, 3, 4 via an IP-based network) between a source (transmitter) and a sink (receiver) is not set up for the first time when a sequence 119 of voice packets 110, 210 with the voice content representing digitized voice data (payload PL) should be broadcast, but rather a connection exists over a certain connection period, in which several (separate) voice contents (radio messages) 1, 2, 3, 4 should be transmitted, it is possible to determine the actual jitter of a first voice packet 111, 211 of the sequence 119 of voice packets 110, 210. This is possible for example in the case of telephone calls by means of voice over IP connections, if a connection is set up (permanently) for the duration of the telephone call, but in pauses in the conversation, no voice packets with voice content 110, 210, but rather voice packets without voice content 120 used for maintaining the connection are transmitted. As soon as new voice content 1, 2, 3, 4 is transmitted, a new dynamic buffer DJB in the sense of the invention is established.

[0109] A further use case is radio connections, for example in ground-to-air communication, in which it is important that the connection exists permanently, because messages have to be transmitted fast. A new connection set-up for each radio message 1, 2, 3, 4 would take too long and holds too many risks. A radio connection of this type may for example be developed according to the ED-137 standard. An application of this type is described in detail below. The invention is not limited to this however, but rather can be used in all applications in which a connection for voice communication exists over a relatively long time (permanently).

[0110] In the context of a use according to the ED-137 standard, a permanent voice over IP connection exists in the IP-based network, during which voice packets 110, 120, 210 are constantly transmitted in RTP format for each radio channel 10, 20, which are used both for voice transmission and maintaining the connection. This is illustrated in FIG. 6 for a connection period 30 and the radio channel 10, in which three radio messages 1, 2, 3 (with voice content in digitized form, that is to say voice data) are illustrated by way of example, wherein the three points indicate that not all radio messages 1, 2, 3, 4 and voice packets 110, 111, 120 are illustrated. The radio messages 1, 2, 3 in each case contain voice packets 110 with voice content, i.e. with header H and payload PL and form a sequence 119 of voice packets. The same applies analogously for the further radio channels 20.

[0111] If no radio message 1, 2, 3 is currently being transmitted, empty voice packets 120 without voice content are transmitted, which do not have any voice data (payload PL) and only have a header H. The sending out of the voice packets 110, 120 preferably takes place at a predetermined sending clock rate.

[0112] The first voice packet 111 of a sequence of voice packets 110, which form a radio message 1, 2, 3, is in each case a distinguished voice packet 111, which is distinguished in that this voice packet is the first voice packet 110 with voice content, which follows one or more voice packets 120 without voice content and therefore indicates the start of a new radio message 1, 2, 3. For this first voice packet 111 with voice content of a sequence of voice packets, the actual time of receipt (i.e. the actual jitter) of this voice packet 111 is determined by comparing the time information in the received voice packet 111 with the receiver time upon receipt. The buffer DJB (dynamic buffer/jitter buffer) according to the invention is determined as a function of the actual time of receipt.

[0113] With the aid of this permanent connection during the connection period 30 and the voice packets 110, 120 exchanged during the connection period 30, it is possible, in a simple manner, to determine the shortest packet delay, when considered relatively, and thus to calculate the actual jitter in the receiver. If, upon arrival of the first voice packet 111 to be played back of a sequence 119 of voice packets 110, it is then known what actual jitter (jitter value) or time of receipt this first voice packet 111 has, the delay time until the start of the playback of the voice packet (i.e. the buffer DJB according to the invention) can be dynamically adjusted accordingly.

[0114] The procedure according to the invention therefore consists of two functional parts:

[0115] Part A:

[0116] Determining the minimum packet delay between transmitter (data source) and receiver (data sink) during the transmission via the IP-based network (voice over IP)

[0117] Part B:

[0118] Delaying the first voice packet 111 of a radio message 1, 2, 3 (sequence 119 of voice packets) by dynamically adjusting the buffer DJB (in the sense of the size of the buffer)

[0119] These two parts operate independently of one another and are both executed constantly. The procedure is presented in detail in the following:

Part A

[0120] The voice packets 110, 120 are created by the transmitter (data source) according to the ED-137 standard and labelled as RTP packets. The sampling instant of the first voice sample is entered in the RTP header H as a timestamp (T.sub.RTP). This is adopted as the time of sending ts of the voice packet 110 (also RTP packet). Sampling of the voice content 1 is understood to mean sampling analogue voice content 1 with a sampling frequency, for example 8 kHz, during which a digital value is respectively assigned to the voice content 1, that is to say the digitization of the voice content as voice data and the division into packets (payload PL) take place.

[0121] After the sampling, the voice packet 110 is transmitted via the IP-based network to the data sink by means of voice over IP, wherein the payload PL is also appended to the header H. The relative or actual time of sending out for the voice packet 110 at the transmitter is therefore given by the sum of the timestamp T.sub.RTP (time of sending ts) and the length or duration TPS of the payload PL (that is to say the temporal length or size of the voice data contained). For an RTP packet, the length or duration of the payload PL may also be termed RTP.sub.PL and may in practice be e.g. 10 ms.

[0122] For voice packets 120 without voice content (also termed R2S packets), the duration of the sampling is omitted, i.e. the R2S packet is shorter by the length or duration TPS of the payload PL (termed RTP.sub.PL for RTP packets). In voice packets 120 without voice content, the timestamp T.sub.RTP therefore directly corresponds to the time of sending ts. This difference between the two types of voice packets 110, 120 is taken into account in that the receiver time is standardized to voice packets 120 without voice content, or correlates with the same, and the expected time of receipt is correspondingly corrected by the duration of the payload RTP.sub.PL.

[0123] A timer (receiver clock T.sub.RECEIVER, termed T.sub.SINK in the following) runs in the receiver (data sink) with the same nominal frequency as a timer of the transmitter (transmitter clock of the data source). In other words, the clocks (timers) of transmitter and receiver run at the same speed for the time periods considered in the context of the invention of the order of magnitude of a radio message. Over longer times, design-mediated deviations of the timers of source and sink result, which are corrected automatically according to preferred embodiments of the invention. This has already been described and is also shown once more in the algorithms according to FIGS. 7 and 9, which are described below.

[0124] After the set-up of a session (in the sense of a connection of transmitter and receiver, which is maintained for the connection period 30), the timer T.sub.SINK is set to the time value T.sub.RTP+RTP.sub.PL of the first arrived voice packet 110 with voice content (RTP packet) or the first arrived voice packet 120 without voice content (R2S packet), where RTP.sub.PL=0 applies for a voice packet 120 without voice content. Thus, the time which is required for the sampling of the voice packet 110 for the payload PL, i.e. the duration RTP.sub.PL is included. This in fact corresponds to a standardization of the receiver clock to the transmitter clock with respect to the delay of voice packets 120 without voice content.
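The initialization of the timer T.sub.SINK can be sketched as follows. The class `SinkClock` and the free-running counter `local_ticks` are hypothetical names; the sketch assumes that the receiver has a local counter running at the nominal frequency of the transmitter clock and models T.sub.SINK as that counter plus an offset.

```python
class SinkClock:
    """Receiver clock T_SINK modelled as a local tick counter plus an offset."""

    def __init__(self):
        self.offset = None  # unknown until the initial packet has arrived

    def initialize(self, local_ticks, t_rtp, rtp_pl=0):
        # Set T_SINK = T_RTP + RTP_PL at the moment the initial packet arrives.
        # rtp_pl is 0 for a packet without voice content (R2S packet).
        self.offset = (t_rtp + rtp_pl) - local_ticks

    def now(self, local_ticks):
        # Current receiver time in transmitter time units.
        return local_ticks + self.offset


clock = SinkClock()
# Hypothetical initial RTP packet: timestamp 5000, payload of 80 units (10 ms at 125 us).
clock.initialize(local_ticks=1000, t_rtp=5000, rtp_pl=80)
print(clock.now(1000))  # receiver time at receipt of the initial packet
print(clock.now(1160))  # receiver time 160 ticks later
```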

[0125] As a result, after the receipt of a first voice packet, termed the initial packet, during a connection period on a transmission channel of the voice packet, the receiver therefore initializes a receiver clock using a transmitter time by means of the time information in the initial packet, so that the receiver time correlates with the transmitter time when sending out the voice packet.

[0126] From then on, the times (timer values) T.sub.RTP+RTP.sub.PL of the arriving RTP and R2S packets, i.e. the timestamp T.sub.RTP of the voice packet 110, 120 (if appropriate corrected by the duration of the payload RTP.sub.PL for voice packets 110 with voice content) are compared with the receiver time (timer value of the timer in the sink) upon receipt of the voice packet, in order to determine a minimum relative packet transmission duration.

[0127] Algorithmically, this can be achieved as follows:

[0128] The start 300 of the algorithm shown in FIG. 7 begins after the establishment of the set-up 301 of a session and the receipt 302 of a first voice packet 110, 120 after the set-up of the session with the setting 303 of the receiver time T.sub.SINK=T.sub.RTP+RTP.sub.PL. Thus, the receiver time T.sub.SINK is correlated with the transmitter time T.sub.SOURCE at the time of sending ts of the voice packet 110, 120, which is entered in the header H of the voice packet 110, 120, i.e. T.sub.SOURCE at the time ts is T.sub.RTP. A possible duration of the payload RTP.sub.PL for a voice packet 110 with voice content 1 is corrected.

[0129] Within an adjustable interval T.sub.i (e.g. i.sub.max is equal to 100 or 200 voice packets) during the connection duration 30, the following processing is carried out after every receipt 304 of a voice packet 110, 120 (RTP or R2S packet).

[0130] At the start of each interval T.sub.i, a variable T.sub.MIN is set to the value −MAX.sub.JITTER, e.g. −200, wherein MAX.sub.JITTER specifies a maximum delay of the time of receipt, which must be taken into account. This corresponds to the maximum jitter that occurs. The time is specified as values without a unit of measure in this case. For example, a unit of measure may be 125 µs (1/(8 kHz)), i.e. the sampling time. However, the invention also functions with any desired other sampling times.
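As a brief worked example of the unitless time values, the following sketch converts between timer units and milliseconds, assuming the unit of 125 µs mentioned by way of example. The helper names `units_to_ms` and `ms_to_units` are hypothetical.

```python
UNIT_US = 125  # assumed duration of one timer unit in microseconds (8 kHz sampling)


def units_to_ms(units):
    """Convert unitless timer values to milliseconds."""
    return units * UNIT_US / 1000


def ms_to_units(ms):
    """Convert milliseconds to unitless timer values."""
    return round(ms * 1000 / UNIT_US)


print(units_to_ms(200))  # duration covered by MAX_JITTER = 200
print(ms_to_units(10))   # timer units of a 10 ms payload RTP_PL
```

Under this assumption, MAX.sub.JITTER = 200 corresponds to 25 ms, and a payload of 10 ms corresponds to 80 timer units.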

First Step 305:

[0131] Within an interval T.sub.i, the deviation of the actual time of receipt at the receiver (sink) is then calculated for each voice packet 110, 120 (RTP packet, R2S packet). This takes place in principle by means of the formation of the difference delta of the actual time of transmission (for voice packets 120 without voice content, R2S packets: timestamp T.sub.RTP, as RTP.sub.PL=0; for voice packets 110 with voice content, RTP packets: timestamp T.sub.RTP+RTP.sub.PL) and the receiver time (timer T.sub.SINK):

[00001] $delta = T_{RTP} + {RTP}_{PL} - T_{SINK}$

where RTP.sub.PL=0 applies for voice packets 120 without voice content (R2S packets).

[0132] This is based on the following idea. After the described initialization of the receiver time, the difference delta is equal to zero, if the jitter upon receipt of the voice packet 110, 120 for initializing the receiver time and upon receipt of the currently considered voice packet 110, 120 was equal, as the correlation of the receiver time corresponds exactly to the transmitter time plus the delay and the jitter.

[0133] If delta is greater than zero, the current voice packet 110, 120 was faster than the hitherto fastest voice packet, which formed the basis of the initialization or later adjustment of the receiver time. If, on the other hand, delta is less than zero, the current voice packet 110, 120 was slower than the voice packet which forms the basis of the receiver time.
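The calculation and interpretation of delta can be illustrated as follows. The function name `relative_delta` and all numeric values are hypothetical; the receiver time is assumed to have been initialized from an earlier packet as described above.

```python
def relative_delta(t_rtp, rtp_pl, t_sink):
    """delta = T_RTP + RTP_PL - T_SINK, with rtp_pl = 0 for R2S packets."""
    return t_rtp + rtp_pl - t_sink


# Hypothetical R2S packet: receiver time at receipt is behind the sending time,
# so this packet travelled faster than the packet the clock is based on.
faster = relative_delta(t_rtp=5160, rtp_pl=0, t_sink=5150)

# Hypothetical RTP packet with 80 units of payload that arrives later than expected.
slower = relative_delta(t_rtp=5240, rtp_pl=80, t_sink=5340)

# A packet with exactly the same jitter as the reference packet.
equal = relative_delta(t_rtp=5480, rtp_pl=0, t_sink=5480)

print(faster, slower, equal)
```

A positive value marks a faster packet, a negative value a slower packet, and zero a packet with the same relative transmission duration as the packet underlying the receiver time.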

[0134] In this respect, the receiver time (here: T.sub.SINK) is therefore in each case compared with the time information, contained in the header H of the subsequent voice packet 110, 120, for the transmitter time (here: T.sub.RTP plus, if appropriate, +RTP.sub.PL) when sending out the voice packet 110, 120, and a relative packet transmission duration is determined relative to the receiver time, which corresponds to the value delta.

Second Step 306:

[0135] Therefore, the fastest relative voice packet 110, 120 during the connection period can be determined in principle as a result.

[0136] According to a specific embodiment, it is additionally proposed that initially, in each interval T.sub.i, the voice packet 110, 120 with the shortest relative delay (that is to say with the lowest relative jitter) is found by comparing the variable T.sub.MIN with delta. To this end, the variable T.sub.MIN is set at the start of each interval T.sub.i to T.sub.MIN=−MAX.sub.JITTER, i.e. to the highest relative jitter value that is conceivable at all and is in principle never reached in practice, and is then set to the value of delta whenever a voice packet 110, 120 with a shorter relative delay is found in the interval T.sub.i, that is to say whenever T.sub.MIN<delta applies.

[0137] When the measurement interval T.sub.i ends, T.sub.MIN therefore specifies the shortest relative delay of a voice packet 110, 120 that has occurred in the measurement interval T.sub.i.

[0138] A resultant minimum relative packet transmission duration is stored temporarily as T.sub.MIN, until the interval Ti ends.
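The search for the fastest voice packet within one interval can be sketched as follows. The function name `interval_minimum` and the sample delta values are hypothetical; MAX.sub.JITTER is set to 200 as in the example given in the description.

```python
MAX_JITTER = 200  # example value from the description, in unitless timer values


def interval_minimum(deltas, max_jitter=MAX_JITTER):
    """Return T_MIN for one interval: the largest delta, i.e. the shortest relative delay."""
    t_min = -max_jitter          # start value at the beginning of the interval
    for delta in deltas:
        if t_min < delta:        # a packet with a shorter relative delay was found
            t_min = delta
    return t_min


print(interval_minimum([-12, -3, -40]))  # only slower packets in the interval
print(interval_minimum([-5, 4, 0]))      # one packet faster than the reference
```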

Third Step 308:

[0139] After the or each interval T.sub.i ends 307, the receiver time T.sub.SINK is adjusted in such a manner as a function of the temporarily stored minimum relative packet transmission duration T.sub.MIN that the receiver time T.sub.SINK correlates with the transmitter time T.sub.SOURCE when sending out the voice packet with the minimum relative packet transmission duration.

[0140] This can be realized algorithmically using the following steps:

[0141] If the query 309 shows that T.sub.MIN is greater than zero, this means that within the last interval T.sub.i, at least one voice packet 110, 120 with shorter jitter JIT (i.e. shorter packet transmission time, delay) than hitherto has arrived. In this case, the receiver time (timer of the data sink, T.sub.SINK) is corrected as follows (correction 312 of the receiver time): T.sub.SINK=T.sub.SINK+T.sub.MIN. As a result, the receiver time is therefore advanced by the value T.sub.MIN. The time or timer correction described here is used to synchronize the receiver clock (timer of the sink) to the fastest voice packet 110, 120 within the interval T.sub.i or the entire connection period 30, i.e. the transmitter time and the receiver time are equal for the time of sending and the time of receipt for a voice packet with the shortest jitter JIT that occurs, i.e. the shortest occurring packet transmission time until the receipt.

[0142] In the example presented, the adjustment of the receiver time as a function of the temporarily stored minimum relative packet transmission duration therefore takes place in such a manner that the receiver time is advanced by the minimum relative packet transmission duration, if this indicates that a voice packet with a shorter packet transmission time was received. This takes place in a simple and reliable manner using the embodiment described here. However, other implementations are also conceivable, for example, that at least a certain number of faster packets must occur in the measurement interval before a correction of the receiver time is undertaken. Random measurement errors can be compensated as a result.

[0143] If T.sub.MIN is equal to 0, this means that within the last measurement interval T.sub.i, at least one voice packet 110, 120 with the same (shortest) jitter JIT (i.e. shortest packet transmission time, delay) as hitherto has arrived. In this case, the receiver time (timer of the data sink, T.sub.SINK) is not corrected, but rather remains unchanged.

[0144] If the query 310 shows that T.sub.MIN is less than 0, this means that within the last measurement interval T.sub.i, no voice packet with a shorter or the same jitter (packet transmission time, delay) as hitherto has arrived during the connection period 30. In this case, the receiver time (timer of the data sink, T.sub.SINK) is corrected as follows:

[00002] $T_{SINK} = T_{SINK} + (T_{MIN} / G),$

where G is a weighting factor.

[0145] Due to the weighting 311 of T.sub.MIN, the receiver time is corrected as a function of the temporarily stored minimum relative packet transmission duration T.sub.MIN in such a manner that the receiver time is set back if the temporarily stored minimum relative packet transmission duration indicates that only voice packets with a longer packet transmission duration were received, wherein the receiver time is set back by a defined or definable duration. The receiver time or timer correction described here is used such that the receiver clock is not corrected exclusively in the direction of faster voice packets 110, 120 (with shorter jitter JIT). A correction is also sensible for the opposite direction (only slower voice packets 110, 120 are received over a relatively long time). Thus, the fact may be taken into account, for example that the clocks at the source and sink are not running at exactly the same speed (particularly in the case of a relatively long connection period 30).

[0146] Also, it is automatically taken into account that due to a change in the transmission route (i.e. in particular changes in the transmission channel 10) all voice packets 110, 120 arrive later and the previously achieved value of the fastest voice packet 110, 120 (with the shortest jitter JIT) is therefore no longer valid.

[0147] The correction method described here takes this into account, in that the receiver time is corrected with a weighting factor (for example a weighting factor 3 is set), if slower packets occur exclusively in the measurement interval. Other implementation variants are also conceivable and possible, e.g. the correction by a fixed value or a different weighting factor, preferably in a range between 1 and 10, but also beyond that, if necessary.
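The three cases of the correction at the end of an interval can be summarized in a short sketch. The function name `corrected_sink_time` is hypothetical, the weighting factor defaults to the example value 3, and the use of a floating-point division for T.sub.MIN/G is an assumption, since the description does not specify the rounding.

```python
def corrected_sink_time(t_sink, t_min, g=3):
    """Return the adjusted receiver time T_SINK after one measurement interval."""
    if t_min > 0:
        # A faster packet than hitherto arrived: advance the clock fully.
        return t_sink + t_min
    if t_min < 0:
        # Only slower packets arrived: set the clock back by the weighted value.
        return t_sink + t_min / g
    # At least one packet was as fast as the fastest so far: no correction.
    return t_sink


print(corrected_sink_time(10000, 4))
print(corrected_sink_time(10000, 0))
print(corrected_sink_time(10000, -6))
```

The asymmetry is deliberate in the described method: a faster packet is adopted immediately, whereas exclusively slower packets move the receiver time only by a fraction per interval.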

[0148] This determination of the minimum relative packet transmission duration takes place for voice packets without voice content and voice packets with voice content, wherein the different length of the various voice packets is preferably corrected according to the invention.

[0149] After these preceding steps have been carried out, the counter i for the measurement interval T.sub.i is reset and T.sub.MIN is set back to −MAX.sub.JITTER in a re-initialization 313 of the interval T.sub.i.

[0150] Using the described method and algorithm, it is possible to determine the relative minimum packet delay of a voice packet by means of the receiver time T.sub.SINK. This can be carried out completely in the receiver.
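The steps of Part A can be combined into one per-packet routine, sketched here under stated assumptions. The class `MinDelayTracker`, the local tick counter `local_ticks`, the short interval of three packets and all packet values are hypothetical; the reference numerals in the comments refer to the steps named in the description.

```python
class MinDelayTracker:
    """Sketch of Part A: tracks T_SINK as local tick counter plus offset."""

    def __init__(self, interval_len=100, max_jitter=200, g=3):
        self.interval_len = interval_len
        self.max_jitter = max_jitter
        self.g = g
        self.offset = None          # T_SINK = local_ticks + offset
        self.i = 0                  # packet counter within the interval
        self.t_min = -max_jitter

    def on_packet(self, local_ticks, t_rtp, rtp_pl=0):
        send_time = t_rtp + rtp_pl
        if self.offset is None:                 # setting 303 on the initial packet
            self.offset = send_time - local_ticks
            return 0
        delta = send_time - (local_ticks + self.offset)   # first step 305
        self.t_min = max(self.t_min, delta)               # second step 306
        self.i += 1
        if self.i >= self.interval_len:                   # interval ends 307
            if self.t_min > 0:                            # correction 312
                self.offset += self.t_min
            elif self.t_min < 0:                          # weighting 311
                self.offset += self.t_min / self.g
            self.i = 0                                    # re-initialization 313
            self.t_min = -self.max_jitter
        return delta


tracker = MinDelayTracker(interval_len=3)
tracker.on_packet(1000, 5000)           # initial packet
d1 = tracker.on_packet(1150, 5160)      # arrives 10 units early
d2 = tracker.on_packet(1325, 5320)      # arrives 5 units late
d3 = tracker.on_packet(1480, 5480)      # arrives on time, interval ends
d4 = tracker.on_packet(1630, 5640)      # as fast as the fastest packet so far
print(d1, d2, d3, tracker.offset, d4)
```

After the first interval, the receiver time is aligned with the fastest packet, so that a later packet with the same transmission duration yields delta equal to zero.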

[0151] The various previously described steps can also be read on the basis of the flow chart for the initialization of the receiver clock and the determination of a minimum relative packet transmission duration according to FIG. 7. The individual steps of the described method procedure can be understood on the basis of the example log file, an extract of which is illustrated in FIG. 8. For the sake of clarity, the interval T.sub.i for this log file has 10 voice packets and will be considerably larger in real practice, for example 100 or 200 voice packets, and can be fixed by the person skilled in the art in a suitable manner. The weighting factor applied by way of example in the second part of the log file was 3. The measurement interval according to FIG. 8 is synonymous with the previously described interval T.sub.i.

Part B

[0152] After the initialization and adjustment of the receiver time by determining the minimum packet delay between transmitter (data source) and receiver (data sink), which preferably takes place continuously during a connection period 30 after the set-up of a connection between transmitter and receiver by evaluating the received voice packets 110, 120 with and without voice content according to FIG. 7, voice packets 110, 120 are exchanged permanently according to the ED-137 standard until the connection between transmitter and receiver ends.

[0153] According to the method described in FIG. 9, the start 400 of which also takes place with the set-up of a session 401 and which according to the invention can in principle run in parallel with the method illustrated in FIG. 7, the buffer DJB proposed according to the invention is determined as a dynamic jitter buffer as a function of the actual time of receipt of a voice packet.

[0154] Normally, during the connection period 30, voice packets 110 with voice content are then transmitted if a radio message 1, 2, 3, 4 should be sent out. The voice packets 120 without voice content are used for line monitoring of the radio channel 10, 20, for the time in which, during the connection period 30, no active radio transmission takes place (cf. FIG. 6). The field PT (Payload Type), which indicates the type of the voice packet 110, 120, is located in the header H of the voice packets 110 with voice content (RTP packet) and the voice packets 120 without voice content (R2S packet), as is defined as a header extension in the ED-137 Standard. A further PTT type field determines whether the voice contents should be broadcast via the flight radio transmitter. If this is set to PTT≠0, the receiver is notified that voice content is transmitted with this voice packet 110. If, by contrast, the PTT type field is set to PTT=0, the receiver is notified that this voice packet 120 is transmitted without voice content. During normal operation, therefore, voice packets 120 without voice content are sent with PTToff (PTT=0) and voice packets 110 with voice content are sent with PTTon (PTT≠0). A change from PTToff to PTTon therefore indicates the start of a radio message 1, 2, 3, 4. A change from PTTon to PTToff therefore indicates the end of a radio message 1, 2, 3, 4.

[0155] In order to detect such a change, according to FIG. 9, the PTT type fields of two consecutive voice packets 110, 120 are evaluated in each case, wherein a variable PTT.sub.OLD indicates which PTT type the preceding voice packet 110, 120 had, and the variable PTT indicates which PTT type the current voice packet 110, 120 (in other words, the voice packet following the preceding voice packet) has.

[0156] Therefore, if voice packets 120 with PTToff are received initially and, at some point, a voice packet 110 with PTTon is received, then this voice packet is the first voice packet 111 of a new radio message 1, 2, 3, 4. This voice packet 111 is then delayed by a dynamic buffer DJB until the receiver starts playing back the voice content 1, 2, 3, 4 (or the radio message). The buffer DJB is a function of the relative time of receipt, that is, of the receiver time at receipt relative to the time information in the header H of the voice packet 111, which indicates when the voice packet 111 was sent out.

[0157] After the start 400 and once a session has been set up 401, the variable PTT.sub.OLD is first initialized 402 to the value PTT.sub.OLD=PTT.sub.off. According to the invention, the receiver determines the value of the variable PTT as PTToff or PTTon upon receipt 403 of each voice packet 110, 120. By querying 404 the variables PTT.sub.OLD and PTT, it is determined whether the voice packet is a first voice packet 111 with voice content after at least one voice packet 120 without voice content, for example by means of the query PTT=PTT.sub.on && PTT.sub.OLD=PTT.sub.off. If this is not the case, the value of the variable PTT.sub.OLD is set to the value of the variable PTT (step 410) and the receipt of a new voice packet 110, 120 is awaited.
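
Steps 402, 403, 404 and 410 can be sketched as a small state holder in Python. This is an illustration only: the class name `PttEdgeDetector`, the method `on_packet` and the constant `PTT_OFF` are hypothetical names, and the sketch assumes that any non-zero PTT value counts as PTTon.

```python
from dataclasses import dataclass

PTT_OFF = 0  # assumed encoding: PTT == 0 is PTToff, any other value is PTTon


@dataclass
class PttEdgeDetector:
    """Remembers PTT_OLD and flags the first voice packet of a radio message."""

    ptt_old: int = PTT_OFF  # step 402: initialize PTT_OLD to PTT_off

    def on_packet(self, ptt: int) -> bool:
        """Evaluate one received packet (step 403) and return the result of query 404."""
        is_first = ptt != PTT_OFF and self.ptt_old == PTT_OFF  # query 404
        self.ptt_old = ptt  # step 410: PTT_OLD takes the value of PTT
        return is_first
```

A caller would create one detector per session and invoke `on_packet` for every received voice packet; only when it returns `True` does the buffer DJB need to be determined.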

[0158] If this is actually a first voice packet 111 with voice content after a preceding voice packet 120 without voice content, the time of receipt of this first voice packet 111 is determined by comparing the time information in the received voice packet 111 with the receiver time upon receipt. Subsequently, the (dynamic) buffer DJB according to the invention is determined as a function of the actual time of receipt.

[0159] According to a preferred embodiment, this functions for example using the procedure described below:

[0160] In the case of a voice packet 111, the difference deltaJB between the actual time of receipt T.sub.RTP+RTP.sub.PL and the receiver time T.sub.SINK is determined 405. The receiver time T.sub.SINK indicates the expected time of receipt for a voice packet 110 with voice content with the shortest jitter that has occurred so far:

[00003] $deltaJB = T_{RTP} + {RTP}_{PL} - T_{SINK}$

[0161] In the case of a voice packet 120 without voice content, which should not occur at all in this method step, the duration of the payload would be RTP.sub.PL=0.

[0162] This is in accordance with the determination of the relative packet transmission duration 305 by the difference delta according to FIG. 7.
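
The determination 405 of the difference can be written as a one-line Python function. The sketch assumes that all three quantities are integers in the same time unit; the function and parameter names are illustrative, and the numeric values in the usage note are hypothetical.

```python
def delta_jb(t_rtp: int, rtp_pl: int, t_sink: int) -> int:
    """deltaJB = T_RTP + RTP_PL - T_SINK (step 405).

    t_rtp  -- time information carried in the voice packet
    rtp_pl -- duration of the payload (0 for a packet without voice content)
    t_sink -- receiver time at receipt
    """
    return t_rtp + rtp_pl - t_sink
```

With the hypothetical values T.sub.RTP=1000, RTP.sub.PL=160 and T.sub.SINK=1167, the function returns -7, that is, the packet arrives 7 time units later than the best case observed so far.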

[0163] Subsequently, a querying 406 of the difference deltaJB takes place. If deltaJB is greater than zero (deltaJB>0; query 406 j), this means that the voice packet 111 has a shorter relative packet transmission time (that is to say with lower jitter) than had hitherto been found during the adjustment of the receiver time according to Part A. Then, the playback of the voice content 1, 2, 3, 4 (radio message) should be delayed by the full length of the predefined maximum buffer (Defaultjitterbuffersize; e.g. 160), which was fixed for the voice packets with the shortest possible jitter JIT beforehand by the person skilled in the art in a suitable manner (setting 407 the buffer DJB to the maximum size MAX=Defaultjitterbuffersize):

DJB=MAX

[0164] DJB in this case describes the buffer, i.e. the size or temporal length of the buffer.

[0165] If deltaJB is less than zero (that is to say deltaJB<0; query 406 n), this means that the voice packet 111 has a longer relative packet transmission time (that is to say with higher jitter) than had hitherto been found during the adjustment of the receiver time according to Part A. Then, the playback of the voice content 1, 2, 3, 4 (radio message) should not be delayed by the full length of the predefined maximum buffer (Defaultjitterbuffersize=MAX). The dynamic buffer DJB according to the invention is reduced by the deviation from the shortest delay (that is to say deltaJB: setting 408 of the buffer DJB to a reduced size RED):

[00004] $DJB = Defaultjitterbuffersize - |deltaJB|$

[0166] |deltaJB| here denotes the absolute (positive) value of the variable deltaJB, which is negative in this case.

[0167] Should the size of the buffer DJB become less than zero, JB.sub.SIZE=0 is set (zeroing 409 the buffer DJB).
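
The sizing rule of steps 406 to 409 can be summarized in a short Python sketch. The names `dynamic_jitter_buffer` and `DEFAULT_JITTER_BUFFER_SIZE` are illustrative; the value 160 is the example value of Defaultjitterbuffersize given above. The case deltaJB=0 is not addressed explicitly in the description; the sketch treats it like the positive case, which yields the same result as the reduction formula.

```python
DEFAULT_JITTER_BUFFER_SIZE = 160  # example value of Defaultjitterbuffersize


def dynamic_jitter_buffer(delta_jb: int,
                          max_size: int = DEFAULT_JITTER_BUFFER_SIZE) -> int:
    """Return the buffer DJB for a first voice packet, given deltaJB."""
    if delta_jb >= 0:
        # Query 406 "yes": shortest delay seen so far, use the full buffer (step 407).
        return max_size
    # Query 406 "no": reduce the buffer by the deviation |deltaJB| (step 408).
    reduced = max_size - abs(delta_jb)
    # A negative size is set to zero (step 409).
    return max(reduced, 0)
```

A packet that is late by more than the maximum buffer size is therefore played back without any additional delay.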

[0168] After that, too, the value of the variable PTT.sub.OLD is set to the value of the variable PTT (step 410) and the receipt of a new voice packet 110, 120 is awaited.

[0169] The various steps described above can also be followed in the flow chart of FIG. 9 for delaying the first voice packet 111 of a radio message 1, 2, 3, 4 (sequence 119 of voice packets) by dynamically adjusting the buffer DJB. The individual steps of the described method can be understood from the example log file, an extract of which is illustrated in FIG. 10. The third voice packet illustrated is the first voice packet 110 with voice content (PTTon=1). As this packet is, in relative terms, too late by 7 time units (deltaJB=delta=-7), the predefined maximum buffer (Defaultjitterbuffersize=160) is reduced to 153 according to the calculation 160-7=153.

REFERENCE LIST

[0170] 1 Voice content, radio message
[0171] 2 Voice content, radio message
[0172] 3 Voice content, radio message
[0173] 4 Voice content, radio message
[0174] 10 First transmission channel of the IP-based network
[0175] 20 Second transmission channel of the IP-based network
[0176] 30 Connection period
[0177] 50 Timeline during sending
[0178] 51 Timeline on the transmission channel
[0179] 90 Second variant of the voice output
[0180] 91 First variant of the voice output
[0181] 92 Interruption during the voice output
[0182] 110 Voice packet with voice content on the first transmission channel
[0183] 111 First voice packet with voice content
[0184] 119 Sequence of voice packets
[0185] 120 Voice packet without voice content
[0186] 210 Voice packet with voice content on the second transmission channel
[0187] 211 First voice packet with voice content
[0188] 230 Packet transmission time or duration
[0189] 300 Start of the algorithm for determining the minimum packet delay
[0190] 301 Set-up of a session
[0191] 302 Receipt of a first voice packet after the set-up of the session
[0192] 303 Setting the receiver time
[0193] 304 Receipt of a voice packet
[0194] 305 First step: Determining the relative packet transmission duration
[0195] 306 Second step: Determining the voice packet with minimum relative packet transmission duration
[0196] 307 Checking for the end of the interval
[0197] 308 Third step: Adjusting the receiver time
[0198] 309 Query if T.sub.MIN greater than zero
[0199] 310 Query if T.sub.MIN less than zero
[0200] 311 Weighting of T.sub.MIN
[0201] 312 Correction of the receiver time
[0202] 313 Re-initialization of the interval
[0203] 400 Start of the algorithm for fixing the dynamic buffer
[0204] 401 Set-up of a session
[0205] 402 Initializing the variable PTT.sub.OLD
[0206] 403 Receipt of a voice packet
[0207] 404 Querying the variables PTT.sub.OLD and PTT
[0208] 407 Setting the buffer DJB to the maximum size
[0209] 408 Setting the buffer DJB to a reduced size
[0210] 409 Zeroing the buffer DJB
[0211] H Header of a voice packet (communication control)
[0212] PL Payload (voice content as digitized voice data)
[0213] DEL Delay of the voice packet on the transmission channel (delay)
[0214] JIT Jitter (random temporal fluctuations between times of receipt of voice packets)
[0215] SJB Static buffer (jitter buffer) (prior art)
[0216] DJB Dynamic buffer (jitter buffer) (invention)
[0217] TPS Length of the payload (duration of sending out the payload)
[0218] DDC Additional delay correction for various transmission channels
[0219] TWD Time period from the sending out of the first voice packet until the start of the processing
[0220] PTT PTT field of the current voice packet
[0221] PTT.sub.OLD PTT of the preceding voice packet
[0222] T.sub.SINK Receiver time
[0223] T.sub.SOURCE Transmitter time
[0224] delta Difference of the actual time of receipt and the receiver time
[0225] deltaJB Difference of the actual time of receipt and the receiver time
[0226] ts Time of sending of the voice packet
[0227] ta Actual time of transmission (time of sending out)
[0228] te Time of receipt of the voice packet
[0229] tw Start of the voice output (time of output or time of processing)
