186 lines
6.3 KiB
TeX
Executable File
186 lines
6.3 KiB
TeX
Executable File
% -*- mode: latex; TeX-master: "Vorbis_I_spec"; -*-
|
|
%!TEX root = Vorbis_I_spec.tex
|
|
% $Id$
|
|
\section{Embedding Vorbis into an Ogg stream} \label{vorbis:over:ogg}
|
|
|
|
\subsection{Overview}
|
|
|
|
This document describes using Ogg logical and physical transport
|
|
streams to encapsulate Vorbis compressed audio packet data into file
|
|
form.
|
|
|
|
The \xref{vorbis:spec:intro} provides an overview of the construction
|
|
of Vorbis audio packets.
|
|
|
|
The \href{oggstream.html}{Ogg
|
|
bitstream overview} and \href{framing.html}{Ogg logical
|
|
bitstream and framing spec} provide detailed descriptions of Ogg
|
|
transport streams. This specification document assumes a working
|
|
knowledge of the concepts covered in these named backround
|
|
documents. Please read them first.
|
|
|
|
\subsubsection{Restrictions}
|
|
|
|
The Ogg/Vorbis I specification currently dictates that Ogg/Vorbis
|
|
streams use Ogg transport streams in degenerate, unmultiplexed
|
|
form only. That is:
|
|
|
|
\begin{itemize}
|
|
\item
|
|
A meta-headerless Ogg file encapsulates the Vorbis I packets
|
|
|
|
\item
|
|
The Ogg stream may be chained, i.e., contain multiple, contigous logical streams (links).
|
|
|
|
\item
|
|
The Ogg stream must be unmultiplexed (only one stream, a Vorbis audio stream, per link)
|
|
|
|
\end{itemize}
|
|
|
|
|
|
This is not to say that it is not currently possible to multiplex
|
|
Vorbis with other media types into a multi-stream Ogg file. At the
|
|
time this document was written, Ogg was becoming a popular container
|
|
for low-bitrate movies consisting of DivX video and Vorbis audio.
|
|
However, a 'Vorbis I audio file' is taken to imply Vorbis audio
|
|
existing alone within a degenerate Ogg stream. A compliant 'Vorbis
|
|
audio player' is not required to implement Ogg support beyond the
|
|
specific support of Vorbis within a degenrate Ogg stream (naturally,
|
|
application authors are encouraged to support full multiplexed Ogg
|
|
handling).
|
|
|
|
|
|
|
|
|
|
\subsubsection{MIME type}
|
|
|
|
The MIME type of Ogg files depend on the context. Specifically, complex
|
|
multimedia and applications should use \literal{application/ogg},
|
|
while visual media should use \literal{video/ogg}, and audio
|
|
\literal{audio/ogg}. Vorbis data encapsulated in Ogg may appear
|
|
in any of those types. RTP encapsulated Vorbis should use
|
|
\literal{audio/vorbis} + \literal{audio/vorbis-config}.
|
|
|
|
|
|
\subsection{Encapsulation}
|
|
|
|
Ogg encapsulation of a Vorbis packet stream is straightforward.
|
|
|
|
\begin{itemize}
|
|
|
|
\item
|
|
The first Vorbis packet (the identification header), which
|
|
uniquely identifies a stream as Vorbis audio, is placed alone in the
|
|
first page of the logical Ogg stream. This results in a first Ogg
|
|
page of exactly 58 bytes at the very beginning of the logical stream.
|
|
|
|
|
|
\item
|
|
This first page is marked 'beginning of stream' in the page flags.
|
|
|
|
|
|
\item
|
|
The second and third vorbis packets (comment and setup
|
|
headers) may span one or more pages beginning on the second page of
|
|
the logical stream. However many pages they span, the third header
|
|
packet finishes the page on which it ends. The next (first audio) packet
|
|
must begin on a fresh page.
|
|
|
|
|
|
\item
|
|
The granule position of these first pages containing only headers is zero.
|
|
|
|
|
|
\item
|
|
The first audio packet of the logical stream begins a fresh Ogg page.
|
|
|
|
|
|
\item
|
|
Packets are placed into ogg pages in order until the end of stream.
|
|
|
|
|
|
\item
|
|
The last page is marked 'end of stream' in the page flags.
|
|
|
|
|
|
\item
|
|
Vorbis packets may span page boundaries.
|
|
|
|
|
|
\item
|
|
The granule position of pages containing Vorbis audio is in units
|
|
of PCM audio samples (per channel; a stereo stream's granule position
|
|
does not increment at twice the speed of a mono stream).
|
|
|
|
|
|
\item
|
|
The granule position of a page represents the end PCM sample
|
|
position of the last packet \emph{completed} on that
|
|
page. The 'last PCM sample' is the last complete sample returned by
|
|
decode, not an internal sample awaiting lapping with a
|
|
subsequent block. A page that is entirely spanned by a single
|
|
packet (that completes on a subsequent page) has no granule
|
|
position, and the granule position is set to '-1'.
|
|
|
|
|
|
Note that the last decoded (fully lapped) PCM sample from a packet
|
|
is not necessarily the middle sample from that block. If, eg, the
|
|
current Vorbis packet encodes a "long block" and the next Vorbis
|
|
packet encodes a "short block", the last decodable sample from the
|
|
current packet be at position (3*long\_block\_length/4) -
|
|
(short\_block\_length/4).
|
|
|
|
|
|
\item
|
|
The granule (PCM) position of the first page need not indicate
|
|
that the stream started at position zero. Although the granule
|
|
position belongs to the last completed packet on the page and a
|
|
valid granule position must be positive, by
|
|
inference it may indicate that the PCM position of the beginning
|
|
of audio is positive or negative.
|
|
|
|
|
|
\begin{itemize}
|
|
\item
|
|
A positive starting value simply indicates that this stream begins at
|
|
some positive time offset, potentially within a larger
|
|
program. This is a common case when connecting to the middle
|
|
of broadcast stream.
|
|
|
|
\item
|
|
A negative value indicates that
|
|
output samples preceeding time zero should be discarded during
|
|
decoding; this technique is used to allow sample-granularity
|
|
editing of the stream start time of already-encoded Vorbis
|
|
streams. The number of samples to be discarded must not exceed
|
|
the overlap-add span of the first two audio packets.
|
|
|
|
\end{itemize}
|
|
|
|
|
|
In both of these cases in which the initial audio PCM starting
|
|
offset is nonzero, the second finished audio packet must flush the
|
|
page on which it appears and the third packet begin a fresh page.
|
|
This allows the decoder to always be able to perform PCM position
|
|
adjustments before needing to return any PCM data from synthesis,
|
|
resulting in correct positioning information without any aditional
|
|
seeking logic.
|
|
|
|
|
|
\begin{note}
|
|
Failure to do so should, at worst, cause a
|
|
decoder implementation to return incorrect positioning information
|
|
for seeking operations at the very beginning of the stream.
|
|
\end{note}
|
|
|
|
|
|
\item
|
|
A granule position on the final page in a stream that indicates
|
|
less audio data than the final packet would normally return is used to
|
|
end the stream on other than even frame boundaries. The difference
|
|
between the actual available data returned and the declared amount
|
|
indicates how many trailing samples to discard from the decoding
|
|
process.
|
|
|
|
\end{itemize}
|