The VC-5 Codec

Simple website describing the history of the SMPTE VC-5 codec and the status of work underway at SMPTE.

History

The VC-5 Codec was originally developed by CineForm, a company founded to create software solutions for video editing. GoPro later acquired CineForm to build its own software development capability.

The motivation for creating a new codec was to have a very simple codec sufficient for compressing video over USB as part of an effort to create a video capture dongle. That project was never completed, but the codec proved to be so efficient that it was used in CineForm solutions for video processing and editing. The codec allowed real-time editing of video without requiring the installation of special-purpose hardware.

The codec was never given an official name but internally was designated “Cedoc” and that name lives on in the codebase maintained by GoPro.

Characteristics

The CineForm codec and its VC-5 standard use 32-bit tag-value pairs to represent information in the bitstream. The first 16 bits (in big-endian order) specify the tag (for example, image width or height) and the second 16 bits carry the value.

To represent information such as codeblocks that are longer than 16 bits, certain tag-value pairs use the tag to represent the type of data and the value to represent the size of the data chunk that follows the tag-value pair.
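The layout described above can be sketched in a few lines of Python. This is only an illustration, not the parsing procedure from the standard: the tag numbers are hypothetical, both fields are read as unsigned integers, and the chunk size is assumed to be counted in units of `unit_bytes` bytes (the text above does not state the unit).

```python
import struct

# Big-endian layout: a 16-bit tag followed by a 16-bit value.
# Reading both fields as unsigned is an assumption made for this sketch.
PAIR = struct.Struct(">HH")


def iter_pairs(data, chunk_tags=frozenset(), unit_bytes=4):
    """Yield (tag, value, payload) for each tag-value pair in data.

    payload is None for ordinary pairs. For tags listed in chunk_tags the
    value is treated as the payload size in units of unit_bytes (an
    assumption) and the bytes that follow the pair are returned.
    """
    offset = 0
    while offset + PAIR.size <= len(data):
        tag, value = PAIR.unpack_from(data, offset)
        offset += PAIR.size
        payload = None
        if tag in chunk_tags:
            size = value * unit_bytes
            payload = data[offset:offset + size]
            offset += size  # skip over the chunk payload
        yield tag, value, payload


# Hypothetical tag numbers, chosen for illustration only.
TAG_WIDTH, TAG_HEIGHT, TAG_CODEBLOCK = 0x0014, 0x0015, 0x6000

stream = (
    PAIR.pack(TAG_WIDTH, 1920)
    + PAIR.pack(TAG_HEIGHT, 1080)
    + PAIR.pack(TAG_CODEBLOCK, 2) + bytes(range(8))  # 2 units of 4 bytes
)
pairs = list(iter_pairs(stream, chunk_tags={TAG_CODEBLOCK}))
```

Because every chunk announces its own size, a reader that does not understand a chunk can still step over it and continue with the next tag-value pair.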

Key advantages to the CineForm/VC-5 codec:

Differences

There are some differences between the CineForm/GoPro codec and the VC-5 standard:

  1. The tag numbers for a few tag-value pairs were changed and new tag-value pairs were introduced in the VC-5 standard.

  2. Most of the CineForm/GoPro bitstream was big-endian but some portions were little-endian, so the VC-5 standard made the entire bitstream big-endian (network order).

  3. Blocks of entropy-encoded wavelet coefficients (codeblocks) have an explicit representation in the VC-5 bitstream using chunks.

Codeblocks are signaled by a tag-value pair with the length of the codeblock carried as the value.

Standardization

GoPro started the process of standardizing the codec through SMPTE with the goal of getting the codec into professional post-production. To that end, it was necessary to create an IMF application, which required standardizing the bitstream and defining how to embed the bitstream into an MXF file.

SMPTE standard video codecs are numbered, and the CineForm codec was the fifth standardization project undertaken through SMPTE, so the name “VC-5” (video codec number 5) was assigned.

The CineForm codec has been standardized by SMPTE under the name VC-5 with document number 2073.

The VC-5 standards documents have multiple parts. Collectively, the set of VC-5 standards documents is called the VC-5 Standards Suite.

SMPTE standards names have the format “SMPTE type number-part:year” where the year is the year of publication. In the case of VC-5, the number will always be “2073” with the part designating a particular document in the VC-5 Standards Suite. All of the documents in the VC-5 Standards Suite are type “ST” meaning the document is a standard, with two exceptions: the overview document SMPTE OV 2073-0 is type “OV” and the conformance specification SMPTE RP 2073-2 is type “RP”.

The sample encoder, reference decoder, other software described in RP 2073-2 and the bitstream vectors and test images are collectively called the “test materials”.
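The naming format “SMPTE type number-part:year” described above is regular enough to split mechanically. The following sketch uses a hypothetical helper, `parse_designation`, and assumes the designation is written exactly in that form, with a four-digit year.

```python
import re
from typing import NamedTuple


class Designation(NamedTuple):
    doc_type: str  # for example "ST", "RP" or "OV"
    number: int    # always 2073 for the VC-5 Standards Suite
    part: int
    year: int


# Matches strings such as "SMPTE ST 2073-1:2017".
_PATTERN = re.compile(r"^SMPTE\s+([A-Z]+)\s+(\d+)-(\d+):(\d{4})$")


def parse_designation(name):
    """Split a designation of the form 'SMPTE type number-part:year'."""
    match = _PATTERN.match(name.strip())
    if match is None:
        raise ValueError(f"not a SMPTE designation: {name!r}")
    doc_type, number, part, year = match.groups()
    return Designation(doc_type, int(number), int(part), int(year))


bitstream = parse_designation("SMPTE ST 2073-1:2017")
conformance = parse_designation("SMPTE RP 2073-2:2022")
```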

Overview

An overview of each of the VC-5 standards is provided in the overview document SMPTE OV 2073-0.

Elementary Bitstream

The VC-5 bitstream is specified by SMPTE ST 2073-1 Elementary Bitstream. This document defines the syntax and semantics of all VC-5 bitstreams.

A VC-5 bitstream can contain one or more rectangular arrays of integer components with a precision of at most 16 bits. The elementary bitstream standard does not explicitly specify how to encode an image into a VC-5 bitstream. It only provides the framework for specifying how images and other rectangular arrays can be encoded using the VC-5 standards.

Conformance Specification

The SMPTE RP 2073-2 Conformance Specification defines how to verify the compliance of an encoder or decoder implementation with the VC-5 standards. This document is updated whenever a part is added to the VC-5 Standards Suite so that it specifies how to verify compliance with new and revised parts.

Image Formats

SMPTE ST 2073-3 Image Formats specifies how to encode common image formats such as RGB(A) and YCbCr(A) without color difference component subsampling into a VC-5 bitstream. This standard adds tag-value pairs to encode image-specific information such as the pixel format of the source image.

SMPTE ST 2073-3 introduces the concept of a pattern element: a rectangular subset of component samples in an image corresponding to a single pixel. For example, an RGB image would comprise three component arrays, one for each color component, and each pattern element comprises a single component sample. The concept of a pattern element is very useful for describing Bayer images. For example, a pattern element in a typical Bayer image might be a 2 by 2 block containing R, G, G, and B color components.
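To make the Bayer example concrete, the sketch below groups the samples of a small mosaic by their position inside each 2 by 2 pattern element. It only illustrates the grouping idea; it is not the procedure defined in SMPTE ST 2073-3, and the function name and the R, G, G, B ordering of positions are assumptions.

```python
def split_bayer(mosaic):
    """Group the samples of a Bayer mosaic by position in the 2x2 pattern element.

    mosaic is a list of equal-length rows. The result maps (dy, dx), the
    position inside the pattern element, to a component array.
    """
    height, width = len(mosaic), len(mosaic[0])
    if height % 2 or width % 2:
        raise ValueError("mosaic dimensions must be multiples of 2")
    components = {}
    for dy in (0, 1):
        for dx in (0, 1):
            # Every second row starting at dy, every second column starting at dx.
            components[(dy, dx)] = [row[dx::2] for row in mosaic[dy::2]]
    return components


# Assumed layout: (0,0)=R, (0,1)=G, (1,0)=G, (1,1)=B.
mosaic = [
    [10, 20, 11, 21],
    [30, 40, 31, 41],
    [12, 22, 13, 23],
    [32, 42, 33, 43],
]
components = split_bayer(mosaic)
red, blue = components[(0, 0)], components[(1, 1)]
```

Each of the four resulting arrays has half the width and half the height of the mosaic, and each pattern element contributes exactly one sample to each array.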

Subsampled Color Difference Components

Images can be represented using YCbCr color components. The Cb and Cr components may be subsampled. SMPTE ST 2073-4 Subsampled Color Difference Components extends SMPTE ST 2073-3 to describe subsampled color difference components using an extension of the pattern element concept. SMPTE ST 2073-4 adds tag-value pairs that describe the subsampling scheme.
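As a rough illustration of what subsampling means for the component arrays, the sketch below computes the array dimensions for a given subsampling factor. The function and its parameters are hypothetical, it does not reproduce the tag-value pairs of SMPTE ST 2073-4, and rounding odd dimensions up is an assumption.

```python
def component_dimensions(width, height, h_sub=1, v_sub=1):
    """Return (width, height) of the Y, Cb and Cr component arrays.

    h_sub and v_sub are the horizontal and vertical subsampling factors of
    the color difference components. Odd sizes are rounded up (assumption).
    """
    chroma_width = -(-width // h_sub)    # ceiling division
    chroma_height = -(-height // v_sub)
    return {
        "Y": (width, height),
        "Cb": (chroma_width, chroma_height),
        "Cr": (chroma_width, chroma_height),
    }


dims_422 = component_dimensions(1920, 1080, h_sub=2)           # 4:2:2
dims_420 = component_dimensions(1920, 1080, h_sub=2, v_sub=2)  # 4:2:0
```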

Layers

Some images logically comprise multiple images with the same dimensions and pixel format. For example, a stereo pair consists of two images, one for the left view and one for the right view, and both images have the same dimensions and format.

SMPTE ST 2073-5 Layers adds the capability to represent multiple images in the bitstream, each image having the same dimensions and format. Each individual image is called a layer.

Applications of layers include stereo pairs, multiple image exposures for HDR, and the top and bottom images in interlaced video.

Sections

A VC-5 bitstream is a sequence of tag-value pairs. The model decoder is a simple state machine that transitions to the next image component or wavelet transform within an image component. Nothing in the VC-5 bitstream explicitly identifies the structure in the sequence of tag-value pairs.

SMPTE ST 2073-6 Sections adds tag-value pairs that can be used to delineate semantically relevant portions of the bitstream. For example, section tags can identify each image component within the bitstream or each wavelet transform within a component.

Sections enable additional capabilities including:

If image component arrays are delineated using sections, then the decoder can skip components that do not have to be decoded. For example, if the image represented in the bitstream contains Y, Cb, and Cr components and the output image is monochrome, then it is not necessary to decode the Cb and Cr components.

Wavelet transforms are present in the bitstream in order from small (lower resolution) to large (higher resolution). If wavelet transforms are delineated using sections and the output image has reduced resolution, then the larger (higher resolution) transforms can be skipped.

Sections also allow multiple images with different dimensions, formats, and other characteristics to be represented in a single VC-5 bitstream.
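The skipping behavior described above can be sketched as a simple filter over section records. The `Section` record, its field names, and the level numbering (0 for the smallest wavelet transform) are hypothetical; the actual section tags are defined in SMPTE ST 2073-6 and are not reproduced here.

```python
from typing import NamedTuple


class Section(NamedTuple):
    component: str  # hypothetical label such as "Y", "Cb" or "Cr"
    level: int      # wavelet transform index, 0 = lowest resolution
    size: int       # size of the section in bytes


def select_sections(sections, components, max_level):
    """Return the sections to decode and the number of bytes skipped."""
    keep, skipped = [], 0
    for section in sections:
        if section.component in components and section.level <= max_level:
            keep.append(section)
        else:
            skipped += section.size  # the decoder can jump over this section
    return keep, skipped


# Three components, each with three wavelet transforms of growing size.
sections = [
    Section(component, level, size)
    for component in ("Y", "Cb", "Cr")
    for level, size in enumerate((100, 400, 1600))
]

# Monochrome output at reduced resolution: only Y, without the largest transform.
keep, skipped = select_sections(sections, components={"Y"}, max_level=1)
```

In this made-up example only 500 of the 6300 bytes have to be decoded, which shows why delineating components and wavelet transforms is useful for previews and proxies.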

Metadata

The CineForm codec was created for video editing and incorporated many capabilities that are more often associated with a specific application, not the bitstream. Adding image processing capabilities to the bitstream itself allows the processing to be performed regardless of the application that is calling the codec. For example, when an application requests a video frame, color correction can be applied to the decoded image before the frame is delivered to the caller.

ST 2073-7 Metadata specifies the method for embedding metadata in a VC-5 bitstream.

There are four types of metadata supported by the VC-5 codec:

  1. Intrinsic metadata that assists in decoding the images represented by a VC-5 bitstream,
  2. Extrinsic metadata defined by other standards,
  3. Streaming data, and
  4. Dark metadata.

Examples of extrinsic metadata include Adobe XMP metadata. The XML representation can be embedded in the VC-5 bitstream and extracted during decoding.

Streaming data is used for time series measurements associated with camera applications such as GPS coordinates and accelerometer readings and is based on the GoPro Metadata Format (GPMF).

Source code and sample data for GPMF are available as open source on GitHub.

Dark metadata is intended for metadata that does not have a published standard such as vendor-specific metadata.

Landing page for Metadata Extraction at GoPro Labs.

MXF Wrapper

SMPTE uses the Material Exchange Format (MXF) as the container for video and audio tracks.

SMPTE ST 2073-10 specifies how to embed a VC-5 bitstream as a video track in an MXF generic container.

Current and Upcoming Projects

IMF Application VC-5

A draft of ST 2067-72 for specifying how to use the VC-5 codec in an IMF Application is nearly complete and ready for pre-FCD and Public CD.

VC-5 MXF Wrapper Revision

The SMPTE standard ST 2073-10 specifies how to embed a VC-5 bitstream as a video track in an MXF file. The document was approved and published before the VC-5 standards for layers, sections, and metadata were drafted. It might be useful to add features from layers, sections, and metadata to IMF Application VC-5 which depends on ST 2073-10 MXF Wrapper. A new project for revising ST 2073-10 has been approved. Work is pending completion of the first version of IMF Application VC-5 and is expected to begin shortly after ST 2067-72 enters Public CD.

IMF Application VC-5 Revision

After the VC-5 MXF wrapper has been revised, features from layers, sections, and metadata can be added to IMF Application VC-5 as deemed useful. A project proposal for the revision of ST 2067-72 has not been drafted or submitted for review.

Availability

SMPTE standards are available from the SMPTE Store.

The SMPTE website contains a list of all published documents with a link for each document to its location in the SMPTE Store.

Each version of a standard (designated by the year suffix) is a separate document, so it is not possible to provide up-to-date links to the latest version of each document in the VC-5 Standards Suite. However, since all VC-5 standards have the same number, the documents can be found by searching the standards for “2073”.

The overview document SMPTE OV 2073-0 can be downloaded from the SMPTE Store for free.

References

SMPTE OV 2073-0:2023 VC-5 Video Essence – Overview for the SMPTE 2073 Document Suite
https://doi.org/10.5594/SMPTE.OV2073-0.2023

SMPTE ST 2073-1:2017 VC-5 Video Essence – Part 1. Elementary Bitstream
https://doi.org/10.5594/SMPTE.ST2073-1.2017

SMPTE RP 2073-2:2022 VC-5 Video Essence – Part 2. Conformance Specification
https://doi.org/10.5594/SMPTE.RP2073-2.2022

SMPTE ST 2073-3:2015 VC-5 Video Essence – Part 3. Image Formats
https://doi.org/10.5594/SMPTE.ST2073-3.2015

SMPTE ST 2073-4:2015 VC-5 Video Essence – Part 4. Subsampled Color Difference Components
https://doi.org/10.5594/SMPTE.ST2073-4.2015

SMPTE ST 2073-5:2015 VC-5 Video Essence – Part 5. Layers
https://doi.org/10.5594/SMPTE.ST2073-5.2015

SMPTE ST 2073-6:2015 VC-5 Video Essence – Part 6. Sections
https://doi.org/10.5594/SMPTE.ST2073-6.2015

SMPTE ST 2073-7:2022 VC-5 Video Essence – Part 7. Metadata
https://doi.org/10.5594/smpte.st2073-7.2022

SMPTE ST 2073-10:2017 MXF – Mapping VC-5 Video Essence into the MXF Generic Container
https://doi.org/10.5594/SMPTE.ST2073-10.2017

SMPTE ST 377-1:2019 Material Exchange Format (MXF) – File Format Specification
https://doi.org/10.5594/smpte.st377-1.2019

An Introduction to the GoPro Metadata Format (GPMF)
https://www.trekview.org/blog/2020/metadata-exif-xmp-360-video-files-gopro-gpmd/