What is Video encoding?
Video encoding is the process of converting original video into a digital file so that it is stored not as a series of individual images but as a fluid video stream. In other words, it is the process of compressing and converting an analog source into a digital one. Generally, a coder/decoder (codec) is used for video file compression.
Before encoding, video files were all in raw formats: essentially collections of still photos. A video recorded at 30 frames per second (fps) uses 30 photos for every second of footage, so a single minute of video requires 1,800 images, resulting in a massive file size. Compressing these videos is the only way to overcome this issue, but naive compression loses quality. Engineers therefore developed video encoding techniques that compress these files without compromising quality.
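The arithmetic above can be sketched in a few lines. This is a back-of-the-envelope estimate assuming uncompressed 8-bit RGB frames (3 bytes per pixel); real raw formats vary.

```python
# Back-of-the-envelope raw video size, assuming 8-bit RGB frames
# (3 bytes per pixel). Illustrative figures, not from any spec.
def raw_video_bytes(width, height, fps, seconds, bytes_per_pixel=3):
    """Size of uncompressed video: one full image per frame."""
    return width * height * bytes_per_pixel * fps * seconds

# One minute of 1080p at 30 fps = 30 * 60 = 1800 frames
size = raw_video_bytes(1920, 1080, 30, 60)
print(size / 1e9)  # ~11.2 GB for a single minute of raw video
```

A single minute of raw 1080p footage already runs to roughly 11 GB, which is why compression is unavoidable for storage and streaming.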

Video encoding targets:
Reducing file size
Reducing buffering for streaming video
Changing video resolution or aspect ratio
Changing audio format or quality
Converting obsolete files to modern formats
Making a video compatible with a certain device like a computer, tablet, smartphone, smart TV, etc.
What are codecs?
A codec is software (or hardware) that encodes or decodes a digital data stream or signal. It converts raw video and audio between analog and digital formats and compresses them into smaller files. The block diagram of a codec is shown below.

Encoder – handles compression.
Decoder – decompresses the file and prepares it for viewing.
What is H.264/AVC?
H.264 is also called Advanced Video Coding (AVC) or MPEG-4 Part 10. It is a video compression technology developed jointly by the International Telecommunication Union (as H.264) and the International Organization for Standardization/International Electrotechnical Commission Moving Picture Experts Group (as MPEG-4 Part 10, Advanced Video Coding, or AVC).
Nowadays, H.264 codec is most commonly used in video streaming. This codec is an industry standard for video compression that helps creators record, compress, and distribute their online videos. It delivers good video quality at lower bitrates compared to previous standards. Hence, it is widely used in cable TV broadcasting and Blu-ray disks.
How does H.264/AVC work?
The H.264 video encoder performs prediction, transformation, and encoding processes to produce a compressed H.264 bitstream. It is a block-oriented standard that uses motion compensation to process frames of video content. Each frame is divided into macroblocks, with block sizes as large as 16×16 pixels.
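To make the macroblock partitioning concrete, the sketch below counts how many 16×16 macroblocks cover a 1080p frame. Note that 1080 is not a multiple of 16, so encoders pad the frame height up (to 1088 lines), which the ceiling division models.

```python
import math

# Sketch: how many 16x16 macroblocks cover a 1920x1080 frame.
# Dimensions that are not multiples of 16 are padded up, which is why
# 1080-line video is internally coded as 1088 lines in H.264.
def macroblock_grid(width, height, mb_size=16):
    cols = math.ceil(width / mb_size)
    rows = math.ceil(height / mb_size)
    return cols, rows, cols * rows

print(macroblock_grid(1920, 1080))  # (120, 68, 8160)
```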
The H.264 video decoder performs the complementary processes of decoding, inverse transform, and reconstruction to produce a decoded video sequence. It receives the compressed H.264 bitstream, decodes each syntax element, and extracts information such as quantized transform coefficients and prediction information. This information is then used to reverse the coding process and recreate the sequence of video images. The H.264 video coding and decoding process is shown below.
Advantages of H.264
Lower bandwidth usage and higher resolution monitoring – It provides high-quality transmission of full-motion video with lower bandwidth requirements and lower latency than traditional video standards like MPEG-2. H.264 uses an efficient codec that provides high-quality images and uses minimal bandwidth.
Lower H.264 bitrate than other formats – It has an 80% lower bitrate than Motion JPEG video. It is estimated that the bitrate savings can be 50% or more compared to MPEG-2. For example, H.264 can provide a better image quality at the same compressed bitrate. At a lower bitrate, it provides the same image quality.
Reduced demand for video storage – It reduces the size of digital video content by about 50% compared to other standards, so it needs less storage space and makes video transmission over IP networks easier.
Incredible video quality – It delivers clear, high-quality video content at a fraction of the data rate required by older video formats.
More efficient – It is roughly twice as efficient as MPEG-2, and the resulting files can be up to three times smaller, which translates into lower transmission bandwidth for video content.
Suitable for low-motion video content – It is extremely efficient for low-motion video, such as footage from megapixel cameras.
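The storage saving from halving the bitrate can be illustrated directly. The bitrates below are hypothetical example values (8 Mbit/s for MPEG-2, 4 Mbit/s for H.264 at comparable quality), not figures mandated by either standard.

```python
# Illustration of the ~50% bitrate saving claim, using hypothetical
# example bitrates rather than values from either standard.
def storage_gb(bitrate_mbps, hours):
    """Storage needed for a constant-bitrate stream, in gigabytes."""
    return bitrate_mbps * 1e6 / 8 * hours * 3600 / 1e9

mpeg2 = storage_gb(8, 1)   # e.g. MPEG-2 at 8 Mbit/s
h264  = storage_gb(4, 1)   # comparable quality at half the bitrate
print(mpeg2, h264)  # 3.6 GB vs 1.8 GB per hour of footage
```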
Decoded picture buffering
Previously encoded pictures are used by H.264/AVC encoders to provide predictions of the values of samples in other pictures. This allows the encoder to make efficient decisions on the best way to encode a given picture. At the decoder, such pictures are stored in a virtual decoded picture buffer (DPB). Per the H.264 specification, the maximum capacity of the DPB, in units of frames (or pairs of fields), can be computed as MaxDpbFrames = min(MaxDpbMbs / (PicWidthInMbs × FrameHeightInMbs), 16), where MaxDpbMbs is a level-dependent limit and the picture dimensions are measured in macroblocks.
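The DPB capacity formula can be evaluated numerically. The sketch below uses MaxDpbMbs = 32768, the Level 4.0 limit from the H.264 level table; other levels have different limits.

```python
# DPB capacity sketch for H.264, using MaxDpbMbs = 32768
# (the Level 4.0 limit from the H.264 level table).
def max_dpb_frames(max_dpb_mbs, width, height, mb=16):
    pic_width_in_mbs = -(-width // mb)      # ceiling division
    frame_height_in_mbs = -(-height // mb)  # 1080 -> 68 macroblock rows
    return min(max_dpb_mbs // (pic_width_in_mbs * frame_height_in_mbs), 16)

print(max_dpb_frames(32768, 1920, 1080))  # 4 frames at 1080p, Level 4.0
```

At 1080p a Level 4.0 decoder is thus only required to buffer 4 frames, while at lower resolutions the cap of 16 frames applies.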
Hardware
Because H.264 encoding and decoding require significant computing power for specific types of arithmetic operations, software implementations that run on general-purpose CPUs are typically less power efficient. However, the latest quad-core general-purpose x86 CPUs have sufficient computation power to perform real-time SD and HD encoding. Compression efficiency depends on the video algorithmic implementation, not on whether a hardware or software implementation is used. Therefore, the difference between hardware- and software-based implementations lies more in power efficiency, flexibility, and cost. To improve power efficiency and reduce hardware form factor, special-purpose hardware may be employed, either for the complete encoding or decoding process or for acceleration assistance within a CPU-controlled environment.
CPU-based solutions are known to be much more flexible, particularly when encoding must be done concurrently in multiple formats, multiple bit rates and resolutions (multi-screen video), and possibly with additional features such as container format support and advanced integrated advertising features. A CPU-based software solution also generally makes it much easier to load-balance multiple concurrent encoding sessions within the same CPU.
The 2nd generation Intel “Sandy Bridge” Core i3/i5/i7 processors introduced at the January 2011 CES (Consumer Electronics Show) offer an on-chip hardware full HD H.264 encoder, known as Intel Quick Sync Video.
A hardware H.264 encoder can be an ASIC or an FPGA.
ASIC encoders with H.264 encoder functionality are available from many different semiconductor companies, but the core design used in the ASIC is typically licensed from one of a few companies such as Chips&Media, Allegro DVT, On2 (formerly Hantro, acquired by Google), Imagination Technologies, and NGCodec. Some companies have both FPGA and ASIC product offerings.
Texas Instruments manufactures a line of ARM + DSP cores that perform DSP H.264 BP encoding 1080p at 30fps. This permits flexibility with respect to codecs (which are implemented as highly optimized DSP code) while being more efficient than software on a generic CPU.
H.265 Standard
The High Efficiency Video Coding (HEVC) protocol, also known as H.265, was developed as a successor to H.264 by the Joint Collaborative Team on Video Coding (JCT-VC). The H.265 protocol reduces the bit rate required for streaming video when compared to H.264 while maintaining comparable video quality.
Due to the complexity of this new encoding/decoding protocol, more advanced processing is required. However, advantages such as lower latency can be realized with the H.265 protocol.
Some manufacturers, such as RGB Spectrum, take advantage of both H.264 and H.265 streaming protocols. Zio decoders, for example, can decode both H.264 and H.265 streams. This allows the customer to use whichever protocol is appropriate for the application.
High Efficiency Video Coding (HEVC), also known as H.265 and MPEG-H Part 2, is a video compression standard designed as part of the MPEG-H project as a successor to the widely used Advanced Video Coding (AVC, H.264, or MPEG-4 Part 10). In comparison to AVC, HEVC offers from 25% to 50% better data compression at the same level of video quality, or substantially improved video quality at the same bit rate. It supports resolutions up to 8192×4320, including 8K UHD, and unlike the primarily 8-bit AVC, HEVC’s higher fidelity Main 10 profile has been incorporated into nearly all supporting hardware.
While AVC uses the integer discrete cosine transform (DCT) with 4×4 and 8×8 block sizes, HEVC uses both integer DCT and discrete sine transform (DST) with block sizes varying between 4×4 and 32×32. The High Efficiency Image Format (HEIF) is based on HEVC. As of 2019, HEVC is used by 43% of video developers and is the second most widely used video coding format after AVC.
HEVC was designed to substantially improve coding efficiency compared with H.264/MPEG-4 AVC HP, i.e. to reduce bitrate requirements by half with comparable image quality, at the expense of increased computational complexity. HEVC was designed with the goal of allowing video content to have a data compression ratio of up to 1000:1. Depending on the application requirements, HEVC encoders can trade off computational complexity, compression rate, robustness to errors, and encoding delay time. Two of the key areas where HEVC improved on H.264/MPEG-4 AVC were support for higher-resolution video and improved parallel processing methods.
HEVC is targeted at next-generation HDTV displays and content capture systems featuring progressive scanned frame rates and display resolutions from QVGA (320×240) to 4320p (7680×4320), as well as improved picture quality in terms of noise level, color spaces, and dynamic range.
Video coding layer
The HEVC video coding layer uses the same “hybrid” approach used in all modern video standards since H.261: inter-/intra-picture prediction combined with 2D transform coding. An HEVC encoder first splits a picture into block-shaped regions. For the first picture, or the first picture of a random access point, intra-picture prediction is used, in which the prediction of the blocks in the picture is based only on information in that picture. For all other pictures, inter-picture prediction is used, in which prediction information is taken from other pictures. After the prediction methods are finished and the picture has passed through the loop filters, the final picture representation is stored in the decoded picture buffer. Pictures stored in the decoded picture buffer can be used for the prediction of other pictures.
HEVC was designed on the assumption that progressive scan video would be used, and no coding tools were added specifically for interlaced video. Interlace-specific coding tools, such as MBAFF and PAFF, are not supported in HEVC. HEVC instead sends metadata describing how the interlaced video was sent. Interlaced video may be sent either by coding each frame as a separate picture or by coding each field as a separate picture. For interlaced video, HEVC can change between frame coding and field coding using Sequence Adaptive Frame Field (SAFF), which allows the coding mode to be changed for each video sequence. This allows interlaced video to be sent with HEVC without needing special interlaced decoding processes to be added to HEVC decoders.
HEVC replaces the 16×16 pixel macroblocks used in previous standards with coding tree units (CTUs), which can use larger block structures of up to 64×64 samples and can better sub-partition the picture into variable-sized structures. HEVC initially divides the picture into CTUs, which can be 64×64, 32×32, or 16×16, with a larger pixel block size usually increasing coding efficiency.
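The effect of larger root blocks is easy to quantify: the sketch below compares the number of 16×16 macroblocks versus 64×64 CTUs needed to tile a 4K frame.

```python
import math

# Sketch: 64x64 CTUs vs 16x16 macroblocks over a 4K (3840x2160) frame.
# Larger root blocks mean far fewer top-level units to signal.
def block_count(width, height, size):
    return math.ceil(width / size) * math.ceil(height / size)

print(block_count(3840, 2160, 16))  # 240 * 135 = 32400 macroblocks
print(block_count(3840, 2160, 64))  # 60 * 34  = 2040 CTUs
```

Fewer top-level units mean less per-block overhead on large smooth areas, while the quadtree sub-partitioning still allows small blocks where detail demands them.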

Parallel processing tools
Tiles allow the picture to be divided into a grid of rectangular regions that can be independently encoded and decoded. The main purpose of tiles is to enable parallel processing. Tiles can be independently decoded and can even allow random access to specific regions of a picture in a video stream.
Wavefront parallel processing (WPP) divides a slice into rows of CTUs; the first row is decoded normally, but each additional row can only begin once certain decisions have been made in the previous row. WPP has the entropy encoder use information from the preceding row of CTUs and provides a method of parallel processing that may allow better compression than tiles.
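A toy schedule makes the wavefront idea concrete. The sketch below assumes the common two-CTU lag between consecutive rows (a typical WPP dependency, not a normative constant here) and computes at which parallel "step" each CTU becomes decodable.

```python
# Toy wavefront schedule: each CTU row starts two CTUs behind the row
# above it (an assumed lag). Entry [r][c] is the parallel step at which
# CTU (r, c) can be processed.
def wpp_schedule(rows, cols, lag=2):
    return [[r * lag + c for c in range(cols)] for r in range(rows)]

for row in wpp_schedule(3, 6):
    print(row)
# step numbers stagger by 2 per row:
# [0, 1, 2, 3, 4, 5]
# [2, 3, 4, 5, 6, 7]
# [4, 5, 6, 7, 8, 9]
```

Once the wavefront is established, one thread per row can run concurrently, which is where the parallel speed-up comes from.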
Tiles and WPP are allowed, but are optional. If tiles are present, they must be at least 64 pixels high and 256 pixels wide with a level specific limit on the number of tiles allowed.
Slices can, for the most part, be decoded independently of each other, their main purpose being re-synchronization in case of data loss in the video stream. Slices can be defined as self-contained in that prediction is not made across slice boundaries, although when in-loop filtering is applied to a picture, information across slice boundaries may be required. Slices consist of CTUs decoded in raster-scan order, and different coding types can be used for slices, such as I, P, or B types.
Dependent slices allow data related to tiles or WPP to be accessed by the system more quickly than if the entire slice had to be decoded. Their main purpose is to enable low-delay video encoding due to their lower latency.
HEVC has 33 intra prediction modes
HEVC specifies 33 directional modes for intra prediction compared with the 8 directional modes for intra prediction specified by H.264/MPEG-4 AVC. HEVC also specifies DC intra prediction and planar prediction modes. The DC intra prediction mode generates a mean value by averaging reference samples and can be used for flat surfaces. The planar prediction mode in HEVC supports all block sizes defined in HEVC while the planar prediction mode in H.264/MPEG-4 AVC is limited to a block size of 16×16 pixels. The intra prediction modes use data from neighboring prediction blocks that have been previously decoded from within the same picture.
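The DC mode described above can be sketched in a few lines. This is a simplified illustration (the real HEVC DC mode adds boundary filtering on some edge samples, which is omitted here).

```python
# Sketch of DC intra prediction: the block is filled with the mean of
# the already-decoded reference samples above and to the left.
# Simplified -- real HEVC adds boundary filtering for some samples.
def dc_predict(top, left):
    refs = list(top) + list(left)
    dc = (sum(refs) + len(refs) // 2) // len(refs)  # rounded integer mean
    n = len(top)
    return [[dc] * n for _ in range(n)]

block = dc_predict(top=[100, 102, 98, 100], left=[101, 99, 100, 100])
print(block[0][0])  # every sample gets the rounded mean, 100
```

This is why the mode suits flat surfaces: the whole block is predicted as one constant value, so only the (small) residual needs to be coded.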
Tiers and levels with maximum property values[25]. The number in parentheses after each example resolution is the maximum DPB size in frames.

| Level | Max luma sample rate (samples/s) | Max luma picture size (samples) | Max bit rate, Main tier (kbit/s) | Max bit rate, High tier (kbit/s) | Example picture resolution @ highest frame rate (max DPB size) |
|-------|----------------------------------|----------------------------------|----------------------------------|----------------------------------|----------------------------------------------------------------|
| 1     | 552,960                          | 36,864                           | 128                              | –                                | 128×96@33.7 (6); 176×144@15 (6)                                 |
| 2     | 3,686,400                        | 122,880                          | 1,500                            | –                                | 176×144@100 (16); 352×288@30 (6)                                |
| 2.1   | 7,372,800                        | 245,760                          | 3,000                            | –                                | 352×288@60 (12); 640×360@30 (6)                                 |
| 3     | 16,588,800                       | 552,960                          | 6,000                            | –                                | 640×360@67.5 (12)                                               |
| 3.1   | 33,177,600                       | 983,040                          | 10,000                           | –                                | 720×576@75 (12)                                                 |
| 4     | 66,846,720                       | 2,228,224                        | 12,000                           | 30,000                           | 1,280×720@68 (12)                                               |
| 4.1   | 133,693,440                      | 2,228,224                        | 20,000                           | 50,000                           | 1,280×720@136 (12)                                              |
| 5     | 267,386,880                      | 8,912,896                        | 25,000                           | 100,000                          | 1,920×1,080@128 (16)                                            |
| 5.1   | 534,773,760                      | 8,912,896                        | 40,000                           | 160,000                          | 1,920×1,080@256 (16)                                            |
| 5.2   | 1,069,547,520                    | 8,912,896                        | 60,000                           | 240,000                          | 1,920×1,080@300 (16)                                            |
| 6     | 1,069,547,520                    | 35,651,584                       | 60,000                           | 240,000                          | 3,840×2,160@128 (16)                                            |
| 6.1   | 2,139,095,040                    | 35,651,584                       | 120,000                          | 480,000                          | 3,840×2,160@256 (16)                                            |
| 6.2   | 4,278,190,080                    | 35,651,584                       | 240,000                          | 800,000                          | 3,840×2,160@300 (16)                                            |
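One practical use of the table above is to pick the minimum level for a given resolution and frame rate. The sketch below checks only the max luma sample rate column; a real level selection must also satisfy the picture-size and bitrate limits.

```python
# Minimum HEVC level whose max luma sample rate covers a given
# resolution and frame rate, using the sample-rate column of the
# levels table. Only the sample-rate constraint is checked here.
LEVELS = [
    ("1", 552_960), ("2", 3_686_400), ("2.1", 7_372_800),
    ("3", 16_588_800), ("3.1", 33_177_600), ("4", 66_846_720),
    ("4.1", 133_693_440), ("5", 267_386_880), ("5.1", 534_773_760),
    ("5.2", 1_069_547_520), ("6", 1_069_547_520),
    ("6.1", 2_139_095_040), ("6.2", 4_278_190_080),
]

def min_level(width, height, fps):
    rate = width * height * fps  # luma samples per second
    for name, max_rate in LEVELS:
        if rate <= max_rate:
            return name
    raise ValueError("exceeds Level 6.2")

print(min_level(1920, 1080, 60))  # 4.1 (1080p60 = 124,416,000 samples/s)
print(min_level(3840, 2160, 30))  # 5 (4K30 = 248,832,000 samples/s)
```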
H.264
Pros:
High-quality coding
Most widely used codec
Good device, browser, and container compatibility
Uses less computing power
Cons:
Uses more bandwidth
Not the highest quality on the market
Lossier than HEVC
H.265
Pros:
Higher-quality and more efficient coding
Requires half the bandwidth
Almost lossless encoding
Better motion prediction and compensation
Cons:
Not as widely used
Limited compatibility with devices and browsers
Requires more powerful equipment