What are the lossless compression algorithms

Definition of lossless compression

A lossless compression format, as the name implies, is an audio format that compresses the sound signal without losing any of it. Common formats such as MP3 and WMA are lossy: compared with the source WAV file they discard a considerable part of the signal, which is what lets them shrink a file to roughly 10% of its original size. A lossless format works more like running the audio through an archiver such as Zip or RAR: when the compressed file is restored to WAV, the result is exactly the same as the source WAV file. The difference is that a WAV file compressed with Zip or RAR must be unpacked before it can be played, whereas a lossless audio format can be played in real time directly by playback software, just like a lossy format such as MP3. In short, a lossless compression format reduces the size of a WAV file without sacrificing any of the audio signal.

Commonly used lossless compression algorithms include Shannon-Fano encoding, Huffman encoding, Run-length encoding, LZW (Lempel-Ziv-Welch) encoding, and arithmetic encoding.

Huffman encoding

This method builds a prefix code (no codeword is the prefix of another) with the shortest average codeword length, based on the probability with which each character occurs. It is sometimes called optimal coding and is generally known as Huffman encoding. For a source whose symbols are statistically independent, it reaches the minimum average code length of any code that assigns a separate codeword to each symbol, so its coding efficiency is high.

Basic principle:

The code is constructed according to the probability of occurrence of the source characters. Characters with a higher probability of occurrence are given shorter codewords, while characters with a lower probability are given longer codewords. As a result, the average codeword length of the encoded data is the shortest possible.

Coding steps:

(1) Initialization: sort the symbols in descending order of probability.

(2) Combine the two symbols with the smallest probabilities into a node whose probability is the sum of the two.

(3) Repeat step 2, treating each new node as a symbol, until only one node (the root) remains.

(4) Starting from the root node and moving toward the "leaf" corresponding to each symbol, label each pair of branches from top to bottom with "0" (upper branch) and "1" (lower branch). Which branch is "1" and which is "0" does not matter: the assigned codes will differ, but the average code length stays the same.

(5) Read off the code of each symbol by following the branches from the root node to that symbol's leaf.
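
The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not a reference implementation: the function name huffman_codes and the four-symbol probability table are assumptions made for the example, and a priority queue stands in for the explicit sorting in step 1.

```python
import heapq
from itertools import count

def huffman_codes(probabilities):
    """Return {symbol: bit string} for a {symbol: probability} mapping."""
    if len(probabilities) == 1:
        # A lone symbol still needs one bit to be written at all.
        return {next(iter(probabilities)): "0"}
    tiebreak = count()  # keeps heap entries comparable when probabilities tie
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Step 2: take the two least probable nodes and merge them.
        p0, _, codes0 = heapq.heappop(heap)
        p1, _, codes1 = heapq.heappop(heap)
        # Step 4: one branch gets "0", the other "1" (the choice is arbitrary).
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        # Step 3: the merged node goes back into the pool.
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

# Hypothetical source with four symbols.
probs = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
codes = huffman_codes(probs)
avg_len = sum(probs[s] * len(codes[s]) for s in probs)
print(codes, avg_len)
```

For this table the most probable symbol receives a 1-bit code and the two least probable symbols receive 3-bit codes, giving an average of about 1.9 bits per symbol instead of the 2 bits a fixed-length code would need.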

Points to note about Huffman coding:

Huffman coding has no error protection function. If there is an error in the code, it may cause a series of subsequent decoding errors.

Huffman encoding is a variable-length encoding, so it is difficult to seek to or access an arbitrary part of the encoded file.

Huffman coding depends on the statistical characteristics of the source. Each Huffman codeword has a whole number of bits, so in practice the average code length rarely reaches the information entropy of the source; it equals the entropy only when every symbol probability is a negative power of two.

Huffman coding and decoding must have a code table. If the number of messages is large, the code table to be stored is also large, which will affect the storage capacity of the system and the speed of coding and decoding.
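
The gap between average code length and entropy mentioned above can be checked numerically. The sketch below assumes a hypothetical two-symbol source with probabilities 0.9 and 0.1; any prefix code for two symbols must spend one whole bit on each, even though the entropy is much lower.

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits per symbol."""
    return -sum(p * math.log2(p) for p in probabilities.values() if p > 0)

# Hypothetical, heavily skewed two-symbol source.
probs = {"a": 0.9, "b": 0.1}
# The only possible Huffman code here is "0" and "1": one bit per symbol.
code_lengths = {"a": 1, "b": 1}

avg_len = sum(probs[s] * code_lengths[s] for s in probs)
h = entropy(probs)
print(f"entropy = {h:.3f} bits, Huffman average = {avg_len:.3f} bits")
```

Here the entropy is about 0.47 bits per symbol while the Huffman code uses 1 bit per symbol.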

Arithmetic coding

Arithmetic coding represents a sequence of source symbols as an interval between 0 and 1 on the real number line. Each symbol in the sequence narrows this interval. The more symbols the sequence contains, the smaller the resulting interval, and the smaller the interval, the more digits are needed to represent it. This is the principle of using the interval as the code. Arithmetic coding first assumes a probability model of the source, and then uses these probabilities to narrow the interval representing the sequence.

Two basic parameters are used in arithmetic coding:

Symbol probability and its coding interval

The probability of each source symbol determines the efficiency of the compression coding and also determines the interval assigned to that symbol during encoding. All of these intervals lie between 0 and 1.
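
The interval narrowing can be illustrated with a small sketch. It uses exact fractions to sidestep the precision issue entirely, which a practical coder cannot do; the function names, the three-symbol probability model, and the convention that the decoder is told the message length are all assumptions made for the example.

```python
from fractions import Fraction

def build_intervals(probabilities):
    """Give each symbol a sub-interval [low, high) of [0, 1)."""
    intervals, low = {}, Fraction(0)
    for sym, p in probabilities.items():
        intervals[sym] = (low, low + p)
        low += p
    return intervals

def encode(message, intervals):
    """Narrow [0, 1) once per symbol and return the final interval."""
    low, high = Fraction(0), Fraction(1)
    for sym in message:
        width = high - low
        sym_low, sym_high = intervals[sym]
        high = low + width * sym_high
        low = low + width * sym_low
    return low, high

def decode(value, length, intervals):
    """Recover `length` symbols from any value inside the final interval."""
    out = []
    low, high = Fraction(0), Fraction(1)
    for _ in range(length):
        width = high - low
        scaled = (value - low) / width
        for sym, (sym_low, sym_high) in intervals.items():
            if sym_low <= scaled < sym_high:
                out.append(sym)
                high = low + width * sym_high
                low = low + width * sym_low
                break
    return "".join(out)

# Hypothetical probability model.
probs = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
intervals = build_intervals(probs)
low, high = encode("abac", intervals)
print(low, high)                   # the final interval
print(decode(low, 4, intervals))   # any value in [low, high) decodes the same
```

The width of the final interval equals the product of the symbol probabilities (here 1/64), which is why likely messages end up with wide intervals that need few digits to identify.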

Several issues that need to be noted in arithmetic coding:

1. Because a real computer has finite precision, overflow in the calculation is an obvious problem. Most machines work with 16-bit, 32-bit, or 64-bit precision, so in practice the problem is solved by rescaling the interval.

2. The arithmetic encoder generates only one code word for the entire message. This code word is a real number in the interval [0, 1), so the decoder cannot decode until it receives all the bits representing this real number.

3. Arithmetic coding is also very sensitive to errors. If a single bit is wrong, the entire message will be decoded incorrectly.

Run-length coding

RLE (Run-Length Encoding) is a compression scheme for data that contains many consecutive repetitions. The principle is to replace a series of repeated values with a single value plus a count, where the run length is the number of consecutive repeated units. To get the original data back, simply expand this code.
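
A minimal sketch of this idea for strings, assuming each run is stored as a (value, count) pair; the function names and the sample input are chosen only for illustration.

```python
from itertools import groupby

def rle_encode(data):
    """Collapse each run of equal characters into a (value, count) pair."""
    return [(value, len(list(run))) for value, run in groupby(data)]

def rle_decode(pairs):
    """Expand (value, count) pairs back into the original string."""
    return "".join(value * count for value, count in pairs)

encoded = rle_encode("AAAABBBCCD")
print(encoded)
print(rle_decode(encoded))
```

Decoding the pairs reproduces the input exactly, which is what makes the scheme lossless.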

Comparing the number of codes before and after RLE encoding shows that 73 codes are needed to represent the data in this line before encoding, while only 10 codes are needed after encoding. The ratio of the amount of data before and after compression is therefore about 7:1, i.e. the compression ratio is 7:1. This shows that RLE is indeed a practical compression technique.

The performance of RLE depends on the characteristics of the image itself. RLE is especially suitable for computer-generated images, where it is very effective at reducing the storage space of image files. Natural images with rich colors, however, rarely have many consecutive pixels of the same color on one line, and runs of identical lines are rarer still. If RLE is applied to such images anyway, it may not only fail to compress the data but actually make it larger than the original.
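
The dependence on image content can be seen by counting stored values. The sketch below assumes each run costs two values (a pixel value and a count) and uses two made-up 73-pixel rows, one with long runs and one in which no two neighbouring pixels are equal.

```python
from itertools import groupby

def rle_size(pixels):
    """Number of stored values when every run becomes a (value, count) pair."""
    runs = sum(1 for _ in groupby(pixels))
    return 2 * runs

# Hypothetical rows of 73 pixels each.
flat_row = [255] * 60 + [0] * 13          # two long runs
noisy_row = [i % 7 for i in range(73)]    # neighbouring pixels always differ

print(len(flat_row), "->", rle_size(flat_row))
print(len(noisy_row), "->", rle_size(noisy_row))
```

The row with long runs shrinks from 73 values to 4, while the row without runs doubles to 146 values.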

The decoding is carried out according to the same rules as used in the encoding, and the data obtained after the restoration is exactly the same as the data before compression.

Therefore, RLE is a lossless compression technique. It is used in formats such as BMP, JPEG/MPEG, TIFF, and PDF, and is also used in fax machines.

LZW encoding

LZW achieves compression by building a string table and using shorter codes to represent longer strings. The correspondence between strings and codes is generated dynamically during compression and is implicitly contained in the compressed data, so the decompressor can rebuild the table and restore the data from it. LZW is a lossless compression algorithm; its full name is Lempel-Ziv-Welch encoding.

Basic principle

Extract the distinct characters from the original text data, build a translation table from them, and then replace the strings in the original data with their indexes in the translation table to reduce the size of the data. This looks similar to how a palette image works, but note that the translation table is not created in advance. It is built dynamically from the original data, and during decoding the same table is rebuilt from the encoded data.

The specific implementation steps of the LZW coding algorithm are as follows:

1. At the start, the dictionary contains all possible roots (single characters), and the current prefix P is empty;

2. The current character C := the next character in the character stream;

3. Determine whether the string P + C is in the dictionary

(1) If "Yes": P := P + C // (extend P with C)

(2) If "No"

① Output the codeword representing the current prefix P to the codeword stream;

② Add the string P + C to the dictionary;

③ Let P := C // (P now contains only the single character C)

4. Determine whether there are still characters to be encoded in the character stream

(1) If "Yes", return to step 2

(2) If "No"

① Output the codeword representing the current prefix P to the codeword stream;

② End
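
The encoding steps above can be sketched in Python as follows, together with a matching decoder that rebuilds the dictionary from the codeword stream. This is an illustration under stated assumptions: the roots are taken to be the 256 characters with code points 0 to 255, the dictionary is allowed to grow without limit, and codewords are kept as plain integers rather than packed into bits.

```python
def lzw_encode(text):
    """Encode a string whose characters all have code points below 256."""
    dictionary = {chr(i): i for i in range(256)}  # step 1: all roots
    prefix = ""
    codes = []
    for char in text:                             # step 2: next character C
        candidate = prefix + char
        if candidate in dictionary:               # step 3 (1): extend P with C
            prefix = candidate
        else:                                     # step 3 (2)
            codes.append(dictionary[prefix])      # output the code for P
            dictionary[candidate] = len(dictionary)  # add P + C
            prefix = char                         # P := C
    if prefix:                                    # step 4 (2): flush the last P
        codes.append(dictionary[prefix])
    return codes

def lzw_decode(codes):
    """Rebuild the text, recreating the dictionary on the fly."""
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    previous = dictionary[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:
            # The code refers to the entry the encoder had only just created.
            entry = previous + previous[0]
        out.append(entry)
        dictionary[len(dictionary)] = previous + entry[0]
        previous = entry
    return "".join(out)

codes = lzw_encode("ABABABA")
print(codes)
print(lzw_decode(codes))
```

For the input "ABABABA" the seven characters are represented by four codewords, and decoding them restores the original string without the dictionary ever being transmitted.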
