362 lines
16 KiB
TeX
362 lines
16 KiB
TeX
\section{The RS03 codec}
|
|
|
|
\begin{figure}
|
|
\begin{center}
|
|
\includegraphics[width=10cm]{spiral-rs03.eps}
|
|
\caption{Physical RS03 layout}
|
|
\label{layout-phy}
|
|
\end{center}
|
|
\end{figure}
|
|
|
|
This section describes the data format of the dvdisaster RS03 Reed-Solomon codec.
|
|
RS03 can store parity data either in a separate file or append it to the .iso
|
|
image on the same medium. In contrast to its predecessors RS03 is fully multi-threadable.
|
|
RS03 is expected to become the default codec soon after its introduction in
|
|
dvdisaster 0.80.
|
|
|
|
\subsection{Physical layout}
|
|
|
|
Optical media are recorded as a single long spiral\footnote{Multiple layered
|
|
media contain one spiral for each physical layer, but are otherwise conceptually
|
|
identical.} of sectors which are indexed beginning with 0.
|
|
The first sector lies at the innermost position of the spiral and
|
|
numbering continues onward to the outside of the spiral.
|
|
|
|
Reed-Solomon encoding works best when errors are evenly distributed over
|
|
all ecc blocks. Therefore we must strive to spread our ecc blocks evenly over
|
|
the media surface. To facilitate such distribution, dvdisaster logically
|
|
divides the medium into 255 units which are called ``layers'' for historical reasons.
|
|
Figure \ref{layout-phy} illustrates how a medium is divided into $n$ data layers,
|
|
one CRC layer, and $m$ ecc layers, with $n+m+1 = 255$. Ecc blocks are comprised
|
|
by taking one byte from each layer as shown in fig. \ref{layout-logical} on
|
|
the following page.
|
|
This distributes the ecc block reasonably good
|
|
over the medium surface.
|
|
|
|
Layer size is measured in numbers of sectors which are 2048 bytes in size. All
|
|
255 layers have the same size. The data layers map exactly
|
|
to the iso image which is to be protected by dvdisaster;
|
|
e.g. the number and sequence of sectors in $Data_1,\ldots,Data_n$
|
|
is the same as in the iso image. Two extra sectors are appended to the ISO image
|
|
holding the ecc header ``EH''; these are logically treated as a part of the ISO image.
|
|
If the ISO image size plus the two EH headers is not an integer multiple
|
|
of the layer size, the last (n-th) data layer will be padded accordingly.
|
|
|
|
The data layers are followed by a CRC layer. Each CRC layer sector contains a
|
|
data structure holding CRC32 checksums for the data sectors plus additional
|
|
parameters which were used during the RS03 encoding process.
|
|
The data and CRC layers are protected by the
|
|
Reed-Solomon parity which is stored in the remaining layers ($ECC_1,\ldots,ECC_m$).
|
|
Since the medium capacity is not necessarily an integer multiple of 255, some
|
|
unused sectors remain at the end of the medium. These are neither written nor
|
|
referenced in any way.
|
|
|
|
\smallskip
|
|
|
|
The RS03 data can be stored either directly on the medium or into a separate file.
|
|
Figure \ref{layout-phy} shows the first case where ecc data is embedded into the
|
|
image; this is also called a ``RS03 augmented ISO image'' in dvdisaster
|
|
terminology.
|
|
In the other case a separate error correction file will be created containing
|
|
the sectors starting with the EH header. The following discussion is based
|
|
on the augmented image case; see section \ref{eccfile} for handling the
|
|
file based format.
|
|
|
|
\subsection{Logical layout}
|
|
|
|
\begin{figure}
|
|
\begin{center}
|
|
\includegraphics[width=\textwidth]{layer.eps}
|
|
\caption{Logical RS03 layout}
|
|
\label{layout-logical}
|
|
\end{center}
|
|
\end{figure}
|
|
|
|
|
|
The relationship between layers and ecc blocks is stated again in the
|
|
logical view presented in figure \ref{layout-logical}. Here the 255 layers are
|
|
shown in stacked order. From each layer the $i-th$ byte corresponds to the
|
|
$i-th$ error correction block. The parity is calculated using $n$ data bytes
|
|
$d_{i,1},\ldots,d_{i,n}$ plus the crc byte $c_i$. The resulting $m$ roots of
|
|
the completed Reed-Solomon code are then stored in the
|
|
ecc bytes $e_{i,1},\ldots, e_{i,m}$.
|
|
|
|
Since the input iso image plus the two EH sectors is usually not an integer
|
|
multiple of the layer size, the last data layer $data_n$ may contain padding
|
|
sectors containing a special signature. The content of the padding sectors
|
|
is used in the ecc byte calculation and is written into the augmented image.
|
|
|
|
\subsection{Calculating the layout for encoding}
|
|
|
|
\begin{table}
|
|
\begin{center}
|
|
\begin{tabular}{|lrr|}
|
|
\hline
|
|
Medium type & Maximum size & Layer size \\
|
|
\hline
|
|
CD & 359.424 & 1.409 \\
|
|
DVD 1 layer & 2.295.104 & 9.000 \\
|
|
DVD 2 layers & 4.171.712 & 16.359 \\
|
|
BD 1 layer & 11.826.176 & 46.377 \\
|
|
BD 2 layers & 23.652.352 & 92.754 \\
|
|
\hline
|
|
\end{tabular}
|
|
\caption{Sector size parameters for several media types}
|
|
\label{layout-size-table}
|
|
\end{center}
|
|
\end{table}
|
|
|
|
When encoding with RS03 the layout of the augmented image is fully specified
|
|
by two values: the maximum medium size and the iso image size
|
|
(from now on, ``size'' always means ``number of 2048K sectors''). \bigskip
|
|
|
|
Media sizes are hard coded and taken from table \ref{layout-size-table}. Since we need to divide the medium into 255 layers, the layer size
|
|
is:
|
|
|
|
\[layer\ size = \left\lfloor\frac{medium\ size}{255}\right\rfloor\]
|
|
|
|
This allows us to compute the number of data layers needed to cover the iso image plus the ecc header:
|
|
|
|
\[data\ layers = \left\lceil\frac{iso\ image\ size + 2}{layer\ size}\right\rceil\]
|
|
|
|
The number of padding sectors in the last data layer is:
|
|
|
|
\[padding\ sectors = layer\ size * data\ layers - iso\ image\ size - 2\]
|
|
|
|
For Reed-Solomon encoding, we will have to encode
|
|
|
|
\[n\ data\ bytes = data\ layers + 1\]
|
|
|
|
and produce the following number of parity bytes:
|
|
|
|
\[m\ roots = 255 - n\ data\ bytes\]
|
|
|
|
The RS03 augmented image must fill the medium completely (except for the
|
|
$medium\ size \bmod 255$ sectors at the end). However for performance
|
|
reasons the maximum redundancy is capped to 200\%, or 170 roots. This means
|
|
that the ISO image must at least span the first 255-170=85 layers, otherwise
|
|
additional padding will be added to fill up the 85 data layers. This situation
|
|
is not reflected in the calculations and figure shown above.
|
|
|
|
\subsection{Re-calculating the layout from defective media}
|
|
\label{recover}
|
|
|
|
In order to recover a defective medium, the values of {\em layer size}
|
|
and {\em data layers} need to be determined. The RS03 format allows
|
|
for three heuristics with increasing complexity for learning about these
|
|
values:
|
|
|
|
\subsubsection{Using the Ecc Header}
|
|
|
|
All required information can be obtained from the data structures of the
|
|
Ecc Header which is described in appendix \ref{eh}.
|
|
If ecc data is stored in a separate error correction
|
|
file, the first 4096 bytes of the ecc file yield the Ecc Header.
|
|
Otherwise, let $n$ be the size of the ISO file system which
|
|
can be obtained from the ISO file system master block. Then the
|
|
Ecc Header is typically found in the RS03-augmented image at sectors $n,n+1$
|
|
or at $n+150,n+151$ (due to padding inserted by some popular CD-R mastering
|
|
software). \smallskip
|
|
|
|
If the ISO file system master block is unreadable, the Ecc Header
|
|
can be identified by its characteristic signature and checksum.
|
|
If the Ecc Header is encountered during reading of the defective medium
|
|
it might be worthwhile to generate a tentative ISO master block in the image
|
|
file. This would speed up future processing of the image; however current
|
|
implementations of dvdisaster do not yet implement this feature.
|
|
|
|
\subsubsection{Using the CRC layer}
|
|
|
|
Each CRC layer sector contains a data structure which not only holds the
|
|
CRC32 checksums but also a copy of important parameters from the Ecc header
|
|
(see section \ref{crc} for details). CRC sectors can be easily recognized
|
|
by looking for their signature and checksum while scanning the medium image.
|
|
If dvdisaster finds a valid CRC sector and the Ecc header is defective,
|
|
a tentative Ecc header is written to the image to speed up further
|
|
operations on the image file.
|
|
|
|
However it should be noted that since all CRC sectors are stored
|
|
consecutively on the medium, they can easily be wiped out by a large
|
|
defective region on the medium. Therefore, another heuristic exists
|
|
for learning about the RS03 layout.
|
|
|
|
\subsubsection{Evaluating the Reed-Solomon code}
|
|
\label{recover-rs}
|
|
|
|
If neither the Ecc Header nor any CRC sectors are readable
|
|
the RS03 layout can be determined by the following heuristic.
|
|
|
|
First, the medium size is determined from table \ref{layout-size-table}.
|
|
This is always possible as long as the drive will recognize the medium at all.
|
|
Since the layer size is $\left\lfloor\frac{medium\ size}{255}\right\rfloor$,
|
|
the location of the 255 layers on the medium is now known. The remaining task is to find
|
|
out the redundancy of the Reed-Solomon code, e.g. how many layers contain
|
|
roots for the RS code.
|
|
|
|
\bigskip
|
|
|
|
Taking the {\em i-th} sector from each layer will produce a valid error
|
|
correction block, but with unknown redundancy. As RS03 will create redundancies
|
|
using 8 to 170 roots, we employ a brute-force approach by evaluating
|
|
the Reed-Solomon code for $8..170$ roots. If the error correction
|
|
is successful for $n$ roots and the sector from layer {\em 255-1-n} yields
|
|
the CRC data structure, the correct number of roots has been found.
|
|
|
|
\medskip
|
|
|
|
In reality, not all 162 combinations of roots need to tested since additional
|
|
information can be exploited:
|
|
|
|
\begin{enumerate}
|
|
\item If the sector from layer {\em 255-1-n} is present/readable,
|
|
we do not need to test for $n$ roots any further: Encoding with $n$ roots would
|
|
have produced a CRC sector in this place.
|
|
\item If the number of erasures (as indicated by unreadable sectors) is higher
|
|
than $n$, we can trivially skip the RS decoding. We might have to test another
|
|
set of 255 sectors though if testing for all other numbers of roots fails as well.
|
|
\end{enumerate}
|
|
|
|
Criterion 1) should quickly narrow down the possible numbers of roots
|
|
in the average case, e.g. when enough redundancy is available for recovering the medium.
|
|
Worst case behaviour of trying each ecc block for 8..170 roots is likely to appear only
|
|
when the medium is unrecoverable, e.g. when more sectors are damaged than the
|
|
Reed-Solomon code can correct.
|
|
|
|
\subsection{Contents of the CRC layer}
|
|
\label{crc}
|
|
|
|
Each sector of the CRC layer contains the data structure shown in
|
|
appendix \ref{crc-block}. Following the numbering from figure \ref{layout-logical},
|
|
CRC sector $c_i$ contains the CRC32 checksums for data sectors $d_{j,1},\ldots,d_{j,n}$
|
|
with $j = (i+1)\bmod\ layer\ size$. The purpose of this offset is to have
|
|
the error correction of ECC block $i$ recover the CRC checksum for the next
|
|
ECC block $i+1$. In case of readable but corrupted sectors this will keep the
|
|
error correction in erasure mode and therefore save precious redundancy (the
|
|
RS code can recover twice as much errors when the location of defective data
|
|
is known). \medskip
|
|
|
|
Checksums for data sector $d_{j,k}$ are stored in array element
|
|
CrcBlock-$>$crc[k]. Unused array elements are set to zero.
|
|
The remaining contents of the CRC sector structure provide
|
|
configuration and layout information; see appendix \ref{crc-block} for details.
|
|
|
|
\subsection{Encoding the ecc layers}
|
|
|
|
Encoding the error correction information
|
|
requires reading and buffering of at least
|
|
255 sectors comprising the ecc block
|
|
(see fig. \ref{layout-logical} for a definition of the ecc block).
|
|
A possible encoding algorithm
|
|
might process each ecc block at a time. For each ecc block $i$
|
|
it would do the following: \smallskip
|
|
|
|
First, the $n$ data sectors $d_{i,1},\ldots,d_{i,n}$ of the ecc block
|
|
are read in. The CRC layer
|
|
sector $c_i$ is initialized, filled in with checksums
|
|
generated by processing the previous ecc block, and completed
|
|
by calculating its own checksum {\em selfCRC}. Unused portions of $c_i$
|
|
remain zero. Afterwards the
|
|
CRC32 checksums of $d_{i,1},\ldots,d_{i,n}$ are calculated
|
|
and stored away using the same buffering mechanism. Since the hand-over of
|
|
CRC checksums between ecc blocks is the only place where RS03 does
|
|
not fully parallelize, data sector I/O and CRC32 caching needs to be
|
|
carefully thought out in multithreaded implementations. \medskip
|
|
|
|
|
|
Once $d_{i,1},\ldots,d_{i,n}$ and $c_i$ have been prepared,
|
|
2048 sets of a RS(255,k) code (with $k=255-n-1$) are calculated by
|
|
looping over the 2048 bytes of the ecc sectors. If $l$ denotes
|
|
a certain byte position between $0,\ldots,2047$ in the ecc block
|
|
sectors, then the {\em l-th} byte from $d_{i,1},\ldots,d_{i,n},c_i$
|
|
is retrieved and fed into the RS(255,k) encoder. The resulting
|
|
parity bytes $p_1,\ldots,p_m$ are stored in byte position $l$ of the
|
|
ecc layer sectors $e_{i,1},\ldots,e_{i,m}$. When all 2048 bytes
|
|
of the ecc block sectors have been processed the ecc layer sectors
|
|
can be written out; either into the error correction file or into
|
|
the RS03 augmented image. \medskip
|
|
|
|
The RS(255,k) encoder is the same for RS01, RS02 and RS03. See
|
|
appendix \ref{rs} for the parameters used in the encoder.
|
|
|
|
\subsection{Encoding as a separate error correction file}
|
|
\label{eccfile}
|
|
|
|
If the image size is too close to the medium capacity, not enough
|
|
space is left for augmenting the image with redundancy. dvdisaster
|
|
will refuse to augment images when there is insufficient space
|
|
for at least 8 roots. Creating images with less than 43 roots
|
|
(20\% of redundancy) will trigger a warning that the error correction
|
|
capacity may be too low. In those cases, storing the error correction
|
|
information in a separate file comes as an alternative.
|
|
|
|
\medskip
|
|
|
|
RS03 error correction files (``ecc files'')
|
|
contain the same error correction information
|
|
and layout as in the augmented image case, with the following differences:
|
|
|
|
\paragraph{Omittance of data padding sectors.} While the image format
|
|
shown in figures \ref{layout-phy} and \ref{layout-logical}
|
|
may contain padding sectors between the ecc header and the CRC layer,
|
|
those sectors are not written into the ecc file.
|
|
The padding sectors are however required during encoding and
|
|
decoding, e.g. they need to be virtually created in memory
|
|
when processing the respective ecc blocks.
|
|
Therefore an ecc file providing {\em nroots} of redundancy
|
|
will contain {\em 2 + (nroots+1) * layer size} sectors.
|
|
Physically it will contain the ecc header, then the CRC layer
|
|
and finally the {\em nroots} ecc layers. In contrast to the
|
|
augmented image case, the ecc header {\em is not part of an ecc
|
|
block} and can therefore not be recovered
|
|
by the error correction. If the ecc header is lost in an ecc file,
|
|
its contents can be reconstructed from a still existing block
|
|
in the CRC layer and then be rewritten accordingly.
|
|
|
|
\paragraph{Freely chooseable redundancy.} In the augmented image case
|
|
the redundancy is always chosen to fill up the medium completely.
|
|
For ecc files the redundancy can be freely chosen by the user between
|
|
8 roots (3.2\%) and 170 roots (200\%). Encoding with more than 170 roots
|
|
is technically possible, but run-time requirements get out of proportion;
|
|
hence the selectable redundancy is capped at 200\%.
|
|
|
|
\medskip
|
|
|
|
As a consequence of the variable redundancy the ecc file layout can only
|
|
be determined by looking at the ecc header or CRC sectors. The strategy
|
|
of experimentally evaluating the Reed-Solomon code (see sub
|
|
section \ref{recover-rs}) however can not be applied to ecc files since
|
|
neither the size of the padding area nor the original size of the
|
|
possibly truncated image and ecc files can be determined. \medskip
|
|
|
|
To see whether this is really a limiting factor we look
|
|
at the typical outcome of recovering a single file from a defective
|
|
medium:
|
|
|
|
\begin{itemize}
|
|
\item The ecc file is fully read, but random sectors are damaged.
|
|
\item The ecc file is truncated to the position of the first read error.
|
|
\end{itemize}
|
|
|
|
In both scenarios it is highly likely that at least one CRC sector survives
|
|
at the beginning of the file; in that case the error correction will not
|
|
only recover the image but also repair the ecc file into its original state.
|
|
|
|
\bigskip
|
|
|
|
Although this gives RS03 ecc files good chances to remain functional even
|
|
when being partly damaged, it is highly recommended to store ecc files
|
|
only on media which are themself being protected by dvdisaster.
|
|
ISO and UDF file systems
|
|
do not have sufficient redundancy for their meta data (e.g. directory
|
|
structures). If such meta data becomes unreadable a significant
|
|
number of files may become completely inaccessible.
|
|
Please note that this is a
|
|
general weakness of file-based data protection and recovery: The
|
|
meta data is not part of any file and can therefore not be protected by
|
|
any error correction data put inside the file(s).
|
|
\smallskip
|
|
This is the also the simple reason why we did not use tools like PAR2
|
|
and developed dvdisaster instead;
|
|
the image-based approach of dvdisaster protects
|
|
both files and meta data.
|