\newpage
\section{The RS01 codec}
\label{rs01}
This section describes the dvdisaster RS01 Reed-Solomon codec.
It was conceived during the summer of 2004 for creating
error correction files in the first dvdisaster versions.
At that time, CD media was still predominant.
Typical machines were based on Pentium 4 (tm) processors.
Measured by today's standards, physical RAM and hard disk
space were scarce, and hard disk random I/O in particular
was extremely slow.
\smallskip
In order to work efficiently with the available technology,
RS01 was designed to be as space efficient as possible
and to minimize hard disk random access.
Optimizing the data layout for random access efficiency
led to a parity byte distribution which left the error correction
file vulnerable to damage. RS01 was
occasionally criticized for not being able to recover
from damaged error correction files, but this criticism
was not really fair. RS01 error correction
files were never designed to be stored on fragile
media. They are supposed to
be stored either on hard disk or on optical
media which are themselves protected by dvdisaster error
correction. This has the following consequences:
Unlike optical media, hard disks do not degrade
gradually. Hard disks are usually either 100\% readable or
completely dead, so we can assume that error correction
files on hard disk are either completely readable or fully lost.
Storing error correction files on optical media is a different
story. While an error correction file could protect itself to some
degree against lost sectors (as RS03 ecc files do), it is still
prone to the shortcomings of a file level error correction.
The biggest disadvantage of file level error correction is
that there is no protection of file system meta data.
If meta data like a directory node becomes damaged, all files
in the directory are lost regardless of the redundancy contained
within the files. Therefore any medium containing error
correction files must be protected with an image level
error correction layer (by using RS01, RS02 or RS03 on the medium),
since only image level error correction prevents meta
data sectors from becoming a single point of failure. See the
discussion at \url{https://web.archive.org/web/20180428070843/http://dvdisaster.net/en/qa32.html} for
more information on the advantages of image level data protection
over file level approaches.
\smallskip
Nevertheless, the time has come to phase out the RS01 codec.
Consider creating an error correction file with 32 roots
for a 650MiB sized image using both codecs\footnote{The benchmark was
done using the GNU/Linux version
of dvdisaster 0.79.4 on an AMD Athlon(tm) II X4 615e
processor. RS03 used all 4 cores of the machine.
Both image and ecc files were stored in {\tt /dev/shm}
to rule out I/O effects.}:
\begin{center}
\begin{tabular}{|l|r|r|}
\hline
codec & ecc file size & encoding time \\
\hline
RS01 & 94.58MiB & 46.2s \\
RS03 & 96.68MiB & 2.4s \\
\hline
\end{tabular}
\end{center}
RS03 is about 2.2\% less storage efficient than RS01 since
its data layout has been rearranged for better parallelization.
But this is made up for by a roughly 19-fold speed improvement, as
RS03 can use multiple cores and SSE2 extensions
(of course the speed improvement varies depending on the
hardware used).
Since all other properties of RS03 do at least match those
of RS01, it's fair to begin phasing out RS01 in dvdisaster.
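\smallskip
As a quick check, both figures follow directly from the values in the table:
\[ \frac{96.68}{94.58} \approx 1.022, \qquad \frac{46.2}{2.4} \approx 19.3 \]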
%\smallskip
dvdisaster V0.80 will be the first and only version
featuring all three codecs. In version 0.82, users
will be presented with a note that RS01 has become deprecated.
In subsequent releases, support for encoding RS01 will
be removed. Of course, the ability to use and decode
RS01 will remain in dvdisaster indefinitely.
Existing RS01 error correction files should remain in use;
there is no need to replace them with RS03 ones.
\subsection{Physical layout}
\begin{figure}
\begin{center}
\includegraphics[width=67mm]{spiral-rs01.eps}
\caption{Interpretation of physical layout in the .iso image}
\label{layout-phy-one}
\end{center}
\end{figure}
RS01 is meant to protect data which has already been written to an optical
medium, so the parity data cannot be appended to the medium and must instead
be kept in a separate error correction file. Like all dvdisaster
codecs, RS01 is based on an RS(255,k) Reed-Solomon code with each
ecc block consisting of $n$ data bytes and $k$ parity bytes, and
$n+k=255$.
The $n$ data bytes are taken from an iso image generated from the medium.
Reading data directly from the optical drive during encoding would slow down the
process tremendously due to massive random access over the medium, and
quickly wear out the drive mechanics. However, producing the .iso image
takes one fast linear read, accesses the drive in a way it is designed to be used,
and puts the data on hard disk which can sustain the needed random access I/O.
Reed-Solomon codes
work best when errors are evenly distributed over all ecc blocks.
Therefore the $n$ data bytes used for creating an ecc block must be picked from
locations which are evenly distributed over the medium with a maximum
distance between each data byte pair. To obtain a suitable data distribution,
it is taken into account that optical media are recorded as a single long
spiral\footnote{Multiple layered
media contain one spiral for each physical layer, but are otherwise conceptually
identical.} of sectors each containing 2048 bytes.
The first sector lies at the innermost position of the spiral and is indexed with 0;
numbering continues onward to the outside of the spiral. The .iso image
contains a 1:1 mapping of this storage scheme, with the first 2048 bytes
holding the contents of sector 0, the next 2048 bytes holding those of sector 1, and so on.
When encoding with $n$ data bytes per ecc block, the iso image is divided into
$n$ layers which physically map to the medium as shown in fig.~\ref{layout-phy-one}.
This distributes the ecc block reasonably well over the medium surface.
However, since the image size does not need
to be a multiple of the layer size, the $n$-th layer may be physically shorter
than the layer size. For encoding purposes, the non-existent sectors in layer
$n$ are treated as sectors filled with 2048 zero bytes.
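\smallskip
As an illustrative sketch, and assuming that the layers are consecutive, equally sized
ranges of sectors as suggested by fig.~\ref{layout-phy-one}, a sector with index $s$
(counting from 0) can be located as follows. The symbol $\ell$ for the layer size in
sectors is introduced only for this example:
\[ layer(s) = \left\lfloor \frac{s}{\ell} \right\rfloor + 1, \qquad position(s) = s \bmod \ell \]
Under this assumption, two neighbouring data bytes of an ecc block lie $2048 \cdot \ell$
bytes apart in the image, which is what spreads each ecc block over the medium.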
\subsection{Logical ecc file layout}
\begin{figure}
\begin{center}
\includegraphics[width=\textwidth]{rs01-layout.eps}
\caption{Logical RS01 layout}
\label{layout-logical-one}
\end{center}
\end{figure}
The ecc file layout, and therefore the relationship between the iso image
contents and the ecc file, is shown in
figure \ref{layout-logical-one}. The first 4096 bytes of the ecc file
contain the ecc header whose format is described in appendix \ref{eh}.
For RS01, only the data fields marked with ``all'' or ``RS01'' are
relevant; all other fields should be set to zero.
Next to the ecc header comes the CRC section of the ecc file. If the
iso image contains $s$ sectors, the next $4*s$ bytes in the ecc file
contain the CRC32 sums of the sectors from the iso image: Let $b_1,\dots,b_{2048}$ denote
the bytes of the first data sector; $b_{2049},\dots,b_{4096}$ those of the
second data sector and so on. Then $c_1 = CRC32(b_1,\dots,b_{2048})$,
$c_2 = CRC32(b_{2049},\dots,b_{4096})$ etc. Note that in contrast to
RS02 and RS03, bytes from the CRC section are not included in the ecc block
calculation and are therefore not protected by ecc.
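As a sketch of the resulting file offsets, and assuming that the checksums are stored
in sector order directly behind the header as described above, the checksum $c_i$
of the $i$-th sector (counting from 1) starts at byte offset
\[ 4096 + 4 \cdot (i-1) \]
in the ecc file, so that the CRC section ends just before byte offset $4096 + 4 \cdot s$.
The byte order of the individual CRC32 values is not covered by this sketch.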
\smallskip
The remainder of the ecc file contains the parity bytes of the
ecc blocks. For an ecc file built with $k$ roots,
the iso image is logically divided into
$n = 255-k$ layers as shown in figure \ref{layout-logical-one}.
The $d_{i,j}$ denote the $i$-th byte in the $j$-th layer.
In order to create the first ecc block, bytes $d_{1,1}$ to $d_{1,n}$ are taken from the
$n$ layers. Then the RS(255,k) code is calculated (see appendix \ref{rs} for its parameters)
and the
resulting $k$ parity bytes $e_{1,1}$ up to $e_{k,1}$ are stored
in the ecc file. The resulting ecc block is marked grey in the
figure. The next ecc blocks are calculated and stored accordingly.
In total, the ecc section contains $k*ls$ bytes of parity information,
where $ls$ denotes the layer size in bytes,
with the $k$ parity bytes of each ecc block being stored consecutively.
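\smallskip
The description above can be condensed into byte offsets. This is a sketch under the
assumptions that the layers are consecutive byte ranges of the image and that the
parity section directly follows the CRC section, with $s$ taken as the number of image
sectors and $ls$ as the layer size in bytes. The data byte $d_{i,j}$ is then read
from the image at byte offset
\[ (j-1) \cdot ls + (i-1) \]
and the $k$ parity bytes of the $i$-th ecc block occupy the ecc file byte offsets
\[ 4096 + 4 \cdot s + k \cdot (i-1) \quad\mbox{to}\quad 4096 + 4 \cdot s + k \cdot i - 1 . \]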
\subsection{Calculating the layout for encoding}
The RS01 layout is fully determined by the number of roots for the error correction code
and the iso image size in sectors (from now on, ``size'' always means ``number of
2048 byte sectors''). The number of roots can be freely chosen by the user from the
range of $[8 \dots 100]$. The iso image size is directly measured
from the iso image file.
\smallskip
The number of data layers is simply calculated from the number of roots, $k$:
\[ data\ layers = 255 - k\]
The size of each layer is:
\[ layer\ size = \left\lceil\frac{medium\ size}{data\ layers}\right\rceil\]
At the end of the last layer, $data\ layers * layer\ size - medium\ size$
zero filled padding sectors are used in the encoding process.
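\smallskip
As a worked example, consider the configuration from the benchmark at the beginning of
section \ref{rs01}, under the assumption that the image is exactly 650MiB in size.
With $k = 32$ roots and a medium size of $650 \cdot 2^{20} / 2048 = 332800$ sectors we get
\[ data\ layers = 255 - 32 = 223 \]
\[ layer\ size = \left\lceil\frac{332800}{223}\right\rceil = 1493 \]
\[ padding\ sectors = 223 * 1493 - 332800 = 139 \]
Adding up the 4096 byte header, 4 CRC bytes per sector and $k$ parity bytes for each
of the $1493 * 2048$ ecc blocks gives an expected ecc file size of
\[ 4096 + 4 * 332800 + 32 * 1493 * 2048 = 99180544\ \mbox{bytes} \approx 94.59\,\mbox{MiB} \]
which closely matches the RS01 file size of 94.58MiB reported in the benchmark.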
\subsection{Getting the layout when recovering defective media}
The required parameters are taken from the ecc header stored in
the error correction file (see appendix \ref{eh}). In particular,
the number of roots is taken from the {\em eccBytes} field and
the medium size is recorded in the {\em sectors} field.
\subsection{md5 checksums}
RS01 provides two md5 checksums for integrity checking.
The md5 sum of the iso image is calculated and stored in the
{\em mediumSum} field of the ecc header.
Another md5 sum is calculated over the ecc file, excluding the
first 4096 bytes, and stored in the {\em eccSum} field of
the ecc header. It can be used to verify the integrity of the
ecc file itself. The ecc header is protected by its own CRC
checksum which is stored in the {\em selfCRC} field.
\smallskip
The md5 checksum generation is the major obstacle for parallelizing
the encoder. In RS03, md5sum generation has been made optional since
the RS03 layout allows sufficient consistency checks
by doing a quick error syndrome check using the Reed-Solomon code.
\subsection{Special cases}
Error correction files can be created for any type of input files, not just iso files,
as long as the input files are ``reasonably'' long\footnote{Input files should contain
at least 2048*(255-k) bytes, so that there is at least one sector for each data
layer.}. Since input files are processed in units of 2048 byte sectors,
files whose byte size is not an integer multiple of 2048 are virtually padded
with zeroes. In that case, the {\em inLast} field of the ecc header
contains the real byte size of the last file ``sector'' so that recovering the
last file sector does not write out the padding bytes. A size of zero in the
{\em inLast} field means that the last sector contains 2048 bytes.
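\smallskip
For illustration, take a hypothetical input file of 1000000 bytes:
\[ size = \left\lceil\frac{1000000}{2048}\right\rceil = 489 \]
\[ inLast = 1000000 - 488 * 2048 = 576 \]
The last sector is therefore virtually padded with $2048 - 576 = 1472$ zero bytes,
and only its first 576 bytes are written out during recovery. With $k = 32$ roots,
the minimum input size from the footnote is $2048 * (255 - 32) = 456704$ bytes,
which this file satisfies.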