New upstream version 0.79.5
This commit is contained in:
843
documentation/codec-specs/rs02.tex
Normal file
843
documentation/codec-specs/rs02.tex
Normal file
@@ -0,0 +1,843 @@
|
||||
\newpage
|
||||
\section{The RS02 codec}
|
||||
|
||||
This section describes the dvdisaster RS02 Reed-Solomon codec.
|
||||
It was developed during the winter of 2005/2006 in order to facilitate
|
||||
augmenting iso images directly with error correction data.
|
||||
|
||||
RS02 is based on the Reed-Solomon encoders and decoders
|
||||
introduced with RS01, but focuses exclusively on augmenting
|
||||
iso images. The allocation of data sectors within an ecc block
|
||||
follows a similar scheme as in RS01. However the layout of the
|
||||
parity bytes is vastly different between RS01 and RS02, as the codec must
|
||||
cope with any parity sector being damaged or unreadable.
|
||||
Consequently a RS02 image can lose as many sectors as
|
||||
allowed by the redundancy of
|
||||
the error correction data, and the lost sectors can be any
|
||||
combination of data and parity sectors, as it is expected from
|
||||
a Reed-Solomon scheme.
|
||||
|
||||
\smallskip
|
||||
|
||||
Unlike RS01, which will be completely superseded by RS03 soon,
|
||||
the case of RS02 vs. RS03 still remains open, as both codecs
|
||||
have their individual strengths. RS02 is slightly more space
|
||||
efficient than RS03, so on CD media RS02 might provide
|
||||
slightly more redundancy (typically one additional root) than RS03.
|
||||
This effect will be less
|
||||
pronounced on larger media like DVD and BD.
|
||||
RS02 images can be augmented to an arbitrary size which may
|
||||
be smaller than the maximum medium size, while RS03 requires
|
||||
augmenting the image to the full medium size.
|
||||
This might favour RS02 for working on images which are only
|
||||
30\% or less of the medium size, as they can be encoded with
|
||||
less than the maximum of 170 roots
|
||||
(the maximum redundancy requires lots of time to compute, producing
|
||||
a three-fold redundancy which may not be needed in all cases).
|
||||
On the other hand RS03 will counter
|
||||
the performance argument since it can encode at least
|
||||
20 times faster than RS02 on multi-core architectures,
|
||||
because RS02 encoding can not be parallelized.
|
||||
See the end of section \ref{rs01} for a speed comparison of RS01 vs. RS03;
|
||||
RS01 and RS02 are very similar performance-wise.
|
||||
Finally, the data layout of RS03 does not depend on interspersed
|
||||
ecc headers which gives it a better robustness over RS02;
|
||||
see subsection \ref{layout-logical-two} for details.
|
||||
|
||||
\subsection{Physical layout}
|
||||
|
||||
\begin{figure}
|
||||
\begin{center}
|
||||
\includegraphics[width=10cm]{spiral-rs02.eps}
|
||||
\caption{Physical RS02 layout}
|
||||
\label{layout-phy-two}
|
||||
\end{center}
|
||||
\end{figure}
|
||||
|
||||
RS02 must be applied to the .iso image before it is written to
|
||||
the medium. Additional sectors are appended to the .iso image
|
||||
containing the parity data. The data structures of the .iso image
|
||||
are not changed to reflect the new image size, so the original
|
||||
part of the augmented .iso image remains untouched. The parity
|
||||
sectors can be removed from the augmented image by simply
|
||||
truncating the .iso image to its original sector size; the resulting
|
||||
image file will have the same contents as prior to the augmentation.
|
||||
As a side effect, the parity data is invisible to applications reading
|
||||
the medium at the filesystem level, including most hardware media
|
||||
players. If you find a player which gets confused by media containing
|
||||
RS02 (or RS03) parity, please consider telling the dvdisaster project about it. As of this writing,
|
||||
not a single device has been reported to run into problems with
|
||||
the RS02 data scheme. The RS02 augmented image might conflict with
|
||||
optical media writing software, though. If the writing software
|
||||
decides the image length by looking at the iso filesystem structures,
|
||||
the parity data portion of the image might not be written to the medium.
|
||||
Most current writing programs do however measure the .iso image by examining
|
||||
its file size, and will transfer the parity data correctly. To be sure you
|
||||
should follow the steps described under ``Testing image compatibility''
|
||||
at the dvdisaster site (\url{http://dvdisaster.net/en/howtos92.html}) once
|
||||
before using each version of your optical media authoring software.
|
||||
|
||||
Like the other dvdisaster codecs, RS02 is based on a RS(255,k) Reed-Solomon code
|
||||
with each ecc block being comprised of $n$ data bytes and $k$ parity bytes, and
|
||||
$n+k=255$. The $n$ data bytes comprise the .iso image which will be
|
||||
written to the medium, and the additional ecc header and CRC checksums added
|
||||
by dvdisaster. Reed-Solomon encoding works best when errors are
|
||||
distributed evenly over all ecc blocks. Therefore we must strive to distribute
|
||||
the ecc blocks evenly over the medium surface. To facilitate such mapping,
|
||||
dvdisaster logically divides the medium into 255 logical units which are called
|
||||
``layers'' for historical reasons.
|
||||
Figure \ref{layout-phy-two} shows how the medium is divided into $n$ data layers
|
||||
and $k$ ecc layers, with $n + k = 255$. Ecc blocks are
|
||||
created by taking on byte from each layer as shown in fig. \ref{layout-logical-two}
|
||||
on the following page. This distributes the ecc block reasonably good over
|
||||
the medium surface.
|
||||
All layers have the same length in bytes, with the possible exception of
|
||||
data layer $n$. As the .iso image size plus the size of one ecc header and the CRC data
|
||||
is usually not a multiple of the layer size, the $n$-th data layer may be shorter
|
||||
than the layer size and considered to be filled up with a virtual zero padding.
|
||||
The zero padding is not written out to the augmented image (note that data layer $n$
|
||||
is intentionally drawn shorter in fig. \ref{layout-phy-two}), but it is used in the
|
||||
calculation of the respective parity bytes.
|
||||
|
||||
\newpage
|
||||
|
||||
\subsection{Logical layout}
|
||||
\label{sec-layout-logical-two}
|
||||
|
||||
\begin{figure}
|
||||
\begin{center}
|
||||
\includegraphics[width=\textwidth]{rs02-layout.eps}
|
||||
\caption{Logical RS02 layout}
|
||||
\label{layout-logical-two}
|
||||
\end{center}
|
||||
\end{figure}
|
||||
|
||||
The data layout in the augmented image is shown in figure \ref{layout-logical-two}.
|
||||
Note that in this figure the data is byte-indexed; e.g. $d_{1,1}$ denotes the
|
||||
first data byte in the augmented image. Each layer has a length of
|
||||
$ls$ bytes, with the exception of data layer $n$ which may be shortened (see subsection \ref{calc-two} for an exact calculation of its size).
|
||||
Some ecc layers my be interleaved with redundant copies of the ecc header.
|
||||
The ecc header size is not included in the respective ecc layer size.
|
||||
|
||||
\paragraph{Data layers.} A data layer index $d_{i,j}$ refers to the $i$-th byte in the $j$-th data layer.
|
||||
The $n$ data layers are mapped in a linear fashion to the original iso image.
|
||||
$d_{1,1}$ maps to the first .iso image byte, and $d_{ns,n}$ maps to the last .iso image
|
||||
byte ($ns$ is the number of remaining iso image bytes in the $n$-th data layer).
|
||||
|
||||
The last data layer is special because it does not only contain the rest of the iso image,
|
||||
but also the ecc header and the CRC checksums. These extensions
|
||||
are logically treated as a part of the iso image; their contents are used in the
|
||||
ecc data calculation and are therefore protected by the ecc data.
|
||||
The ecc header follows immediately after byte $d_{ns,n}$ and is 4096 bytes long.
|
||||
Its format is described in appendix \ref{eh}. For RS02, only the data fields
|
||||
marked with ``all'' or ``RS02'' are relevant; all other fields should be set to zero.
|
||||
|
||||
Data layer $n$ does also contain the CRC32 checksums of each data sector
|
||||
upto the ecc header. If the .iso image contains $s$ sectors,
|
||||
then the CRC field contains $4s$ bytes, rounded up
|
||||
to the nearest multiple of 2048.
|
||||
CRC32 checksums are calculated over a whole CD sector comprising 2048 bytes.
|
||||
Let $c_{y,j}$ be the 4-byte checksum of the $y$-th sector in the $j$-th layer
|
||||
and $lss$ be the number of sectors in each layer.
|
||||
Then $c_{y,j}$ = CRC32($d_{2048*y,j}$, $d_{2048*y+1,j}$, \dots, $d_{2048*y+2047,j}$).
|
||||
|
||||
$y_1$ is usually not the first sector in the layer, but a later sector.
|
||||
In general, $y_i$ = $(i+offset)$ mod $lss$. The offset is introduced to restore
|
||||
the CRC32 sums of ecc block $i+1$ during the correction of ecc block $i$.
|
||||
This helps if the data portion of the image is corrupted with wrong byte values and
|
||||
the sectors containing the CRC32 sums have been lost.
|
||||
The error correction will start at the ecc block $i$ which is determined
|
||||
by the offset, and whose CRC32 checksums are stored in the ecc header (at least one
|
||||
ecc header will be recovered before any error correction can begin). Correcting
|
||||
ecc block $i$ will recover the CRC32 checksums for ecc block $i+1$ in the image
|
||||
(and possibly some more in advance, as less than 2048 bytes are required for
|
||||
one set of checksums). This makes it possible to detect corrupted bytes by the
|
||||
checksums and flag them as erasures which effectively doubles the error correction
|
||||
capabilities of the Reed-Solomon code.
|
||||
|
||||
\paragraph{Ecc layers.} For an image augmented with $k$ roots, the parity bytes
|
||||
will be spread over $k$ ecc layers. In order to calculate the first ecc block,
|
||||
bytes $d_{1,1}$ to $d_{1,n}$ are taken from the $n$ data layers. The RS(255,k) code
|
||||
is calculated (see appendix \ref{rs} for its parameters) and the resulting $k$
|
||||
parity bytes $e_{1,1}$ to $e_{1,k}$ are stored in the $k$ ecc layers.
|
||||
The next ecc blocks are calculated and stored accordingly; ecc block $i$
|
||||
is marked grey in the figure.
|
||||
Care must be taken to honour the non-linear mapping of
|
||||
ecc layer bytes as the ecc area is interleaved with 20-40 copies of the ecc header.
|
||||
The ecc header copies are placed at sector addresses whose numbers
|
||||
are large powers of two. This makes it possible to heuristically search for them
|
||||
during the decoding boot-strap process
|
||||
when no other information (image size, layer size, etc.) is yet available.
|
||||
See section \ref{search-two} on the search heuristics and section \ref{addressing-two}
|
||||
on calculating ecc bytes positions from the non-linear mapping.
|
||||
|
||||
\subsection{Calculating the image layout for encoding}
|
||||
|
||||
The image layout can be either computed automatically to fill up
|
||||
the medium as much as possible, or by user selected criteria such
|
||||
as a maximum image size or a specified redundancy.
|
||||
|
||||
\subsubsection{Automatic layout calculation}
|
||||
\label{calc-two}
|
||||
|
||||
The only available inputs to automatic layout detection are
|
||||
the .iso image size and a table of maximum media sizes (see
|
||||
tab. \ref{layout-size-table} in the RS03 section for the
|
||||
respective values). From the media size table the smallest
|
||||
possible medium is chosen which can contain the .iso image.
|
||||
In some border cases, with e.g. the .iso image being only
|
||||
100 sectors smaller than the medium capacity, the automatic
|
||||
layout calculation will fail later due to insufficient space
|
||||
on the medium. In such cases, the user must decide between
|
||||
choosing the next larger medium size or splitting the image
|
||||
contents onto two media by himself (splitting a 700MiB CD image
|
||||
onto two CDs may be better than writing it to a DVD).
|
||||
|
||||
\smallskip
|
||||
|
||||
From now on, all calculations are given in numbers of 2048K
|
||||
sectors or sectors addresses
|
||||
unless noted otherwise. The number of sectors required for the
|
||||
CRC checksums can be directly computed from the .iso image size:
|
||||
|
||||
\[
|
||||
crc\; sectors = \left\lceil\frac{4 * iso\; image\; sectors}{2048}\right\rceil
|
||||
\]
|
||||
|
||||
The total accumulated size of all data layers is the sum of the
|
||||
.iso image size, the number of crc sectors and the two sectors required
|
||||
for the ecc header. Since these sectors are protected by the parity,
|
||||
they are called {\em protected sectors}:
|
||||
|
||||
\[protected\; sectors = iso\; image\; size + 2 + crc\; sectors\]
|
||||
|
||||
These calculations also produce two important sector addresses within
|
||||
the augmented image:
|
||||
|
||||
\begin{itemize}
|
||||
\item The sector with address {\em iso image size} marks the location
|
||||
of the ecc header; and
|
||||
\item The sector at address $iso\; image\; size + 2$ marks the beginning
|
||||
of the CRC checksum data.
|
||||
\end{itemize}
|
||||
|
||||
The next step is to partition the {\em protected sectors} and the remaining
|
||||
medium space into an optimal layer size. It is carried out iteratively.
|
||||
|
||||
\smallskip
|
||||
|
||||
For an approximate start, we determine the free space on the medium:
|
||||
|
||||
\[free\; space = medium\; capacity - protected\; sectors\]
|
||||
|
||||
and estimate a preliminary value for the number of roots and data layers:
|
||||
|
||||
\[k\; roots = min\left(170, \quad\left\lfloor\frac{255 * free\; space}{medium\; capacity}\right\rfloor\right)\]
|
||||
|
||||
\[n\; data = 255 - k\; roots\]
|
||||
|
||||
The maximum number of roots is capped at 170 which is approximately a three-fold
|
||||
redundancy. Larger values would get too computationally expensive.
|
||||
|
||||
\smallskip
|
||||
|
||||
The preliminary layer size is then:
|
||||
|
||||
\[preliminary\; layer\; size = \left\lceil\frac{protected\; sectors}{n\; data}\right\rceil\]
|
||||
|
||||
and the expected size of the parity layers is:
|
||||
|
||||
\[preliminary\; ecc\; size = k\; roots * preliminary\; layer\; size\]
|
||||
|
||||
\smallskip
|
||||
|
||||
From these values we iteratively compute a $2^p$ which has about 20-40 multiples
|
||||
in the {\em preliminary ecc size} address space. This value will be used
|
||||
for interleaving the ecc header copies with the ecc layers:
|
||||
|
||||
\smallskip
|
||||
|
||||
{\tt
|
||||
p := 5
|
||||
|
||||
while($\frac{preliminary\;ecc\;size}{2^p} > 40$)
|
||||
|
||||
\quad p := p + 1}
|
||||
|
||||
\smallskip
|
||||
|
||||
Now the chosen values might be actually too big since we haven't taken
|
||||
the ecc header copies into account.
|
||||
So the final task is to add up the number pf parity sectors and ecc header
|
||||
copies. If these fit into the free medium space, we are done; otherwise
|
||||
the calculations are done again with one root less.
|
||||
|
||||
\newpage
|
||||
|
||||
\bigskip
|
||||
|
||||
while($n\; roots > 7$)
|
||||
|
||||
\smallskip
|
||||
|
||||
\quad $layer\; size := \left\lceil\frac{protected\; sectors}{n\; data}\right\rceil$
|
||||
|
||||
\smallskip
|
||||
|
||||
\quad $ecc\; size := n\; roots * layer\; size$
|
||||
|
||||
\smallskip
|
||||
|
||||
\quad $first\; ecc\; header\; repeat\; addr := \left\lceil\frac{protected\; sectors}{2^p}\right\rceil * 2^p$
|
||||
|
||||
\smallskip
|
||||
|
||||
\quad $space\; for\; interleaved\; sectors := protected\; sectors + ecc\; size - first\; ecc\; header\; repeat\; addr$
|
||||
|
||||
\smallskip
|
||||
|
||||
\quad $number\; of\; ecc\; copies := \left\lfloor\frac{space\; for\; interleaved\; sectors}{2^p - 2}\right\rfloor + 1$
|
||||
|
||||
\smallskip
|
||||
|
||||
\quad $total\; added\; sectors := 2 + crc\; sectors + ecc\; size + 2 * number\; of\; ecc\; copies$
|
||||
|
||||
\medskip
|
||||
|
||||
\quad If $iso\; image\; sectors + total\; added\; sectors < medium\; size$,
|
||||
we have a valid layout: STOP.
|
||||
|
||||
\smallskip
|
||||
|
||||
\quad Otherwise, set $n\; roots := n\; roots - 1$ and $n\; data := 255 - n\; roots$
|
||||
and do another iteration.
|
||||
|
||||
\medskip
|
||||
|
||||
The iteration will either terminate with a valid layout or fail when
|
||||
$n\; roots$ drops below the minimum redundancy of 8 roots.
|
||||
|
||||
\subsubsection{Layout calculation by user selected criteria}
|
||||
|
||||
The user has several means of specifying a certain redundancy:
|
||||
|
||||
\paragraph{Specifying the maximum number of sectors for the augmented image.}
|
||||
|
||||
This case is simply handled by setting {\em medium capacity} to the user
|
||||
selected sector size rather than using the maximum medium size from the
|
||||
built-in table. Afterwards, calculations continue as described in
|
||||
section \ref{calc-two}.
|
||||
|
||||
\paragraph{Specifying the number of roots to use.}
|
||||
|
||||
In this case we can skip the calculations for {\em free space} and
|
||||
{\em k roots} as described in section \ref{calc-two}, and instead
|
||||
set {\em k roots} directly to the user selected value. Then the
|
||||
layout calculation proceeds as usual.
|
||||
|
||||
\paragraph{Specifying the percentage of redundancy to use.}
|
||||
|
||||
For a given number of {\em k roots}, the resulting redundancy in percent is:
|
||||
|
||||
\[\frac{k\; roots \cdot 100}{255 - k\; roots}\]
|
||||
|
||||
Pick a suitable value for {\em k roots} so that the user selected value
|
||||
is met or slightly exceeded. Proceed with the given number of roots
|
||||
as described in the previous paragraph.
|
||||
|
||||
\subsubsection{Layout calculation from ecc header information}
|
||||
\label{recalc-layout-header-two}
|
||||
|
||||
In a given ecc header struct {\em eh}, the number of sectors in the .iso
|
||||
image is recorded as \mbox{\em eh-$>$sectors}
|
||||
and the number of roots is contained in \mbox{\em eh-$>$eccBytes}.
|
||||
Calculation of the layout is done as shown in section \ref{calc-two},
|
||||
with the exception of omitting the calculation for {\em free space} and
|
||||
setting {\em k roots} directly to \mbox{\em eh-$>$eccBytes}.
|
||||
|
||||
\subsection{Automatic layout calculation example}
|
||||
\label{example-two}
|
||||
|
||||
Let's assume we are going to encode an .iso image of 295.000 sectors.
|
||||
This is well below the CD medium capacity of 359.424 sectors, so we
|
||||
start with:
|
||||
|
||||
\smallskip
|
||||
|
||||
$medium\; capacity = 359.424\; sectors$
|
||||
|
||||
$iso\; image\; size = 295.000\; sectors$
|
||||
|
||||
\medskip
|
||||
|
||||
The number of CRC sectors will be:
|
||||
|
||||
\smallskip
|
||||
|
||||
$crc\; sectors = \left\lfloor\frac{4 * 295.000}{2.048}\right\rfloor = 577\; sectors$
|
||||
|
||||
\medskip
|
||||
|
||||
The total size of all data layers is:
|
||||
|
||||
$protected\; sectors = 295.000 + 2 + 577 = 295.579\; sectors$
|
||||
|
||||
\bigskip
|
||||
|
||||
The next step is creating some preliminary starting values:
|
||||
|
||||
\smallskip
|
||||
|
||||
$free\; space = 359.424 - 295.579 = 63.845\; sectors$
|
||||
|
||||
\smallskip
|
||||
|
||||
$k\; roots = min\left(170, \left\lfloor\frac{255* 63.845}{359.424}\right\rfloor\right) = min(170, 45) = 45\; roots\; (or\; layers)$
|
||||
|
||||
\smallskip
|
||||
|
||||
$n\; data = 255 - 45 = 210\; layers$
|
||||
|
||||
\bigskip
|
||||
|
||||
Now some more preliminary values can be computed:
|
||||
|
||||
\smallskip
|
||||
|
||||
$preliminary\; layer\; size = \left\lceil\frac{295.579}{210}\right\rceil = 1.408\; sectors$
|
||||
|
||||
\smallskip
|
||||
|
||||
$preliminary\; ecc\; size = 45 * 1.408 = 63.360\; sectors$
|
||||
|
||||
\bigskip
|
||||
|
||||
Finally, we compute $p = 11$ since $\frac{63360}{2^{11}} = 30,9$.
|
||||
|
||||
\bigskip
|
||||
|
||||
Now the chosen values must be verified to produce a layout which is
|
||||
still smaller than the image size. We compute (the first two values are already known):
|
||||
|
||||
\medskip
|
||||
|
||||
$layer\; size = \left\lceil\frac{295.579}{210}\right\rceil = 1.408\; sectors$
|
||||
|
||||
\smallskip
|
||||
|
||||
$ecc\; size = 45 * 1.408 = 63.360\; sectors$
|
||||
|
||||
\smallskip
|
||||
|
||||
$first\; ecc\; header\; repeat\; addr = \left\lceil\frac{295.579}{2048}\right\rceil * 2048 = 296.960$
|
||||
|
||||
\smallskip
|
||||
|
||||
$space\; for\; interleaved\; sectors = 295.579 + 63.360 - 296.960 = 61.979\; sectors$
|
||||
|
||||
\smallskip
|
||||
|
||||
$number\; of\; ecc\; copies = \left\lfloor\frac{61.979}{2048-2}\right\rfloor + 1 = 31\; header\; repeats$
|
||||
|
||||
\smallskip
|
||||
|
||||
$total\; added\; sectors = 2 + 577 + 63.360 + 2 * 31 = 64.001\; sectors$
|
||||
|
||||
\medskip
|
||||
|
||||
This layout will generate an augmented image containing
|
||||
$295.000 + 64.001 = 359.001$ sectors which is
|
||||
less than the medium capacity of 359.424 sectors
|
||||
and therefore accepted.
|
||||
|
||||
\subsection{Re-calculating the layout from defective media}
|
||||
\label{search-two}
|
||||
|
||||
In order to recover a defective medium, at least one ecc header must
|
||||
remain readable and be located by the following heuristic. This
|
||||
is a major difference to RS03, which has more and different means
|
||||
for bootstrapping the recovery (see section \ref{recover} for details).
|
||||
Once one ecc header has been recovered, the ecc data layout can be
|
||||
calculated as described in section \ref{recalc-layout-header-two}.
|
||||
From this point, the error correction is done using the parameters
|
||||
and data described in section \ref{encoding-two}.
|
||||
|
||||
\smallskip
|
||||
|
||||
If the medium is not damaged or only slightly damaged, the following
|
||||
short cut might work: The size of the .iso image can be determined
|
||||
from the iso file system header. Then the ecc header immediately
|
||||
following the .iso image part of the augmented image is either
|
||||
located at sector number $iso\;image\;size$ or $iso\;image\;size + 150$.
|
||||
The latter case arises because some popular CD authoring software
|
||||
appends 150 padding sectors to any .iso image it creates.
|
||||
|
||||
\smallskip
|
||||
|
||||
If the short cut does not work due to the required sectors being damaged,
|
||||
the following strategy is employed. The size of the augmented image
|
||||
can always be determined; it can either be queried from the drive or
|
||||
it is the file size of a file-based image. Then apply the following
|
||||
algorithm:
|
||||
|
||||
\bigskip
|
||||
|
||||
$p = \left\lfloor log_2(image\; size)\right\rfloor$
|
||||
|
||||
while $p > 32$
|
||||
|
||||
\quad $pos = \left\lfloor\frac{image\; size}{2^p}\right\rfloor \cdot 2^p$
|
||||
|
||||
\smallskip
|
||||
|
||||
\quad while $pos > 0$
|
||||
|
||||
\qquad if {\em sector at pos} is a valid ecc header: STOP.
|
||||
|
||||
\qquad if {\em sector at pos} is unreadable, set $pos := pos - 2^p$ .
|
||||
|
||||
\hspace*{13mm} Continue with inner while loop.
|
||||
|
||||
\qquad if {\em sector at pos} is readable and not a ecc header, set $p := p -1$ .
|
||||
|
||||
\hspace*{13mm} Continue with outer while loop.
|
||||
|
||||
\bigskip
|
||||
|
||||
In order to test for a valid ecc header, check that {\em ec-$>$cookie}
|
||||
equals the 16-byte string ``*dvdisaster*RS02''. Then check that the
|
||||
CRC32 sum of the ecc header matches the value recorded in {\em eh-$>$selfCRC},
|
||||
with {\em eh-$>$selfCRC} set to the byte sequence 0x47,0x50,0x4c,0x00
|
||||
for the purpose of calculating the CRC32 sum.
|
||||
|
||||
\medskip
|
||||
|
||||
Please notice that during testing of the sectors at multiples of $2^{(p-1)}$,
|
||||
all sectors previously tested for $2^p$ will be examined again. It is therefore
|
||||
highly recommended to cache results from previous iterations of the outer
|
||||
while loop, especially when reading sectors from the optical medium.
|
||||
|
||||
|
||||
|
||||
\subsection{Sector addressing and initialization scheme}
|
||||
\label{addressing-two}
|
||||
|
||||
For encoding and decoding purposes it is required to retrieve the
|
||||
{\em i-th} sector from the {\em j-th} data or ecc layers, e.g. to calculate
|
||||
the corresponding sector number in the augmented image.
|
||||
The reverse calculation is also needed, e.g. to calculate the
|
||||
corresponding layer and sector index for a given sector number
|
||||
in the augmented image.
|
||||
|
||||
\smallskip
|
||||
|
||||
Bear in mind that as shown in figure \ref{layout-logical-two}, an augmented image
|
||||
is divided into two logical parts. There is a data area containing
|
||||
the .iso image contents, the first ecc header and the CRC checksums.
|
||||
The data area is protected by the parity in the ecc area, which contains
|
||||
the parity data interleaved with copies of the ecc header.
|
||||
|
||||
\smallskip
|
||||
|
||||
In order to carry out the calculations described below, the following
|
||||
values from the layout calculation (see section \ref{calc-two} are required:
|
||||
|
||||
\bigskip
|
||||
|
||||
\begin{tabular}{lp{10cm}}
|
||||
{\em protected sectors} & the size of the data part in sectors \\
|
||||
{\em layer size} & the number of sectors per layer \\
|
||||
{\em $2^p$} & the modulo value for locating ecc header copies \\
|
||||
\end{tabular}
|
||||
|
||||
\paragraph{Converting (layer, sector index) pairs into image sector numbers.}
|
||||
|
||||
The {\em i-th} sector of data layer {\em j} has the following address $s$ in the image:
|
||||
|
||||
\[s = j \cdot layer\; size + i\]
|
||||
|
||||
If $s >= protected\; sectors$, $s$ is a padding sector which must not be read
|
||||
from the image file, but created in memory (see the paragraph on initialization below).
|
||||
|
||||
\bigskip
|
||||
|
||||
To calculate the sector address $es$ of the {\em i-th} sector from the {\em j-th}
|
||||
ecc layer, the non-linear mapping of the ecc sectors has to be taken into account.
|
||||
The index of the first interleaved ecc header is:
|
||||
|
||||
\[ first\; interleaved = \left\lceil\frac{protected\; sectors}{2^p}\right\rceil \cdot 2^p\]
|
||||
|
||||
Since {\em protected sectors} is equal to the address of the first ecc sector in the image,
|
||||
the amount of ecc sectors before the first interleaved ecc header is:
|
||||
|
||||
\[ base\; ecc\; sectors = first\; interleaved - protected\; sectors\]
|
||||
|
||||
The ecc sector we are looking for would have the following index if ecc
|
||||
sectors were linearly mapped:
|
||||
|
||||
\[ ecc\; index = j \cdot layer\; size + i\]
|
||||
|
||||
If {\em ecc index $<$ base ecc sectors}, $es$ = {\em protected sectors} + {\em ecc index}.
|
||||
Otherwise, the non-linear mapping must be taken into account. The number of interleaved
|
||||
ecc headers before the (currently unknown) sector position $es$ is:
|
||||
|
||||
\[ interleaved\; headers = \left\lfloor\frac{ecc\; index - base\; ecc\; sectors}{2^p - 2}\right\rfloor \]
|
||||
|
||||
Therfore the position of the ecc sector in the augmented image is:
|
||||
|
||||
\[ es = protected\; sectors + ecc\; index + 2 \cdot interleaved\; headers + 2 \]
|
||||
|
||||
\paragraph{Example.} To continue the example from section \ref{example-two}, the
|
||||
position of the 17th ecc sector in the 3rd ecc layer shall be computed. The relevant
|
||||
layout values are:
|
||||
|
||||
\smallskip
|
||||
|
||||
\begin{center}
|
||||
\begin{tabular}{rll}
|
||||
{\em protected sectors} & = & 295.579 \\
|
||||
{\em layer size} & = & 1.408 \\
|
||||
{\em $2^p$} & = & 2.048 \\
|
||||
\end{tabular}
|
||||
\end{center}
|
||||
|
||||
The first interleaved ecc header is at position:
|
||||
|
||||
\[ first\; interleaved = \left\lceil\frac{295.579}{2.048}\right\rceil \cdot 2.048 = 296.960 \]
|
||||
|
||||
Before the first interleaved ecc header,
|
||||
|
||||
\[ base\; ecc\; sectors = 296.960 - 295.579 = 1.381 \]
|
||||
|
||||
ecc sectors have been stored. The linear index of the sought ecc sector is:
|
||||
|
||||
\[ ecc\; index = 3 \cdot 1.408 + 17 = 4.241 \]
|
||||
|
||||
Since 4.241 $\ge$ 1.381, the embedded ecc headers must be taken into account. There are
|
||||
|
||||
\[ interleaved \; headers = \left\lfloor\frac{4.241 - 1.381}{2.048-2}\right\rfloor = 1 \]
|
||||
|
||||
interleaved ecc headers, each containing 2 physical sectors. Therefore the position
|
||||
of the sought ecc sector in the image is:
|
||||
|
||||
\[ es = 295.579 + 4.241 + 2 + 2 = 299.824 \]
|
||||
|
||||
\bigskip
|
||||
|
||||
{\bf Converting image sector numbers into (layer, sector index pairs).}
|
||||
|
||||
\smallskip
|
||||
|
||||
If the sector number $s$ $<$ {\em protected sectors}, the sector will map to the data part
|
||||
as follows:
|
||||
|
||||
\[layer = \left\lfloor s\; /\; layer\;size \right\rfloor\]
|
||||
\[i = s \bmod layer\;size\]
|
||||
|
||||
Otherwise, the mapping to the ecc part is calculated as follows. The index of the first interleaved ecc header is:
|
||||
|
||||
\[ first\; interleaved = \left\lceil\frac{protected\; sectors}{2^p}\right\rceil \cdot 2^p\]
|
||||
|
||||
If $s\; mod\; 2^p \le 1$, the sector maps to the {\em n-th} interleaved ecc header, with:
|
||||
|
||||
\[n = \left\lfloor\frac{s - first\; interleaved}{2^p}\right\rfloor\]
|
||||
|
||||
If $s < first\; interleaved$, the sector is an ecc parity sector with the following mapping:
|
||||
|
||||
\[ layer = \left\lfloor(s - protected\; sectors)\; /\; layer\; size\right\rfloor\]
|
||||
\[ i = (s - protected\; sectors)\; mod\; layer\;size\]
|
||||
|
||||
If $s \ge first\; interleaved$, the mapping of the ecc parity sector is calculated as follows:
|
||||
|
||||
The amount of ecc sectors before the first interleaved ecc header is:
|
||||
|
||||
\[ base\; ecc\; sectors = first\; interleaved - protected\; sectors\]
|
||||
|
||||
The number of interleaved ecc headers before sector $s$ is:
|
||||
|
||||
\[ interleaved\; headers = \left\lfloor\frac{s - first\; interleaved - 2}{2^p}\right\rfloor \]
|
||||
|
||||
If ecc sectors were mapped linearly, then $s$ had the linear index:
|
||||
|
||||
\[ ecc\; index = s - protected\; sectors - 2 \cdot interleaved\; headers - 2\]
|
||||
|
||||
Finally, this means that $s$ maps to the following parity sector:
|
||||
|
||||
\[ layer = \left\lfloor ecc\; index\; /\; layer\; size\right\rfloor \]
|
||||
\[ i = ecc\;index \bmod layer\; size \]
|
||||
|
||||
\paragraph{Padding sectors.} Let {\em iso image size} be the size of the
|
||||
.iso image prior to augmenting it with error correction data. In order
|
||||
to augment the image with error correction sectors, the following
|
||||
sectors are treated as padding sectors which are filled with zeroes:
|
||||
|
||||
\begin{itemize}
|
||||
\item All sectors $s$ $>$ {\em protected sectors}.
|
||||
\item The first ecc header (sectors $iso\; image\; size$ and $iso\; image\; size+1$).
|
||||
\end{itemize}
|
||||
|
||||
The first ecc header sectors must be treated as padding to break a circular
|
||||
dependency with the parity bytes; as the ecc header contains a md5 sum over
|
||||
all parity bytes, it can not be used as input for the parity generation.
|
||||
|
||||
\subsection{Encoding the checksums}
|
||||
\label{crc-two}
|
||||
|
||||
For each sector of the .iso image a CRC32 checksum is calculated and stored in the
|
||||
data part of the augmented image (see fig. \ref{layout-logical-two}). By using the
|
||||
conventions of section \ref{sec-layout-logical-two}, let $d_{i,j}$ be the $i$-th byte
|
||||
of the $j$-th data layer and $c(y,j)$ the 4-byte checksum of the $y$-th sector
|
||||
in the $j$-th data layer. Then $c(y,j)$ = CRC32($d_{2048*y,j}$, $d_{2048*y+1,j}$, \dots, $d_{2048*y+2047,j}$).
|
||||
|
||||
\smallskip
|
||||
|
||||
Let $first\; layer\; crc\; idx = (iso\; image\; size + 2) \bmod layer\; size$.
|
||||
|
||||
$n$ is the number of data layers.
|
||||
|
||||
\smallskip
|
||||
|
||||
A total of $\left\lceil\frac{iso\; image\; size}{512}\right\rceil$ sectors holding the
|
||||
CRC32 checksums must be generated. The checksums are sorted by the layer sector $y$ first,
|
||||
then by layer number $i$. So for each layer sector $y$, there is a block of $n$ checksums generated,
|
||||
and there are $layer\; size$ blocks of checksums total. Checksum generation does not start with
|
||||
layer sector $0$, but rather with layer sector $first\; layer\; crc\; idx$. Subsequent blocks
|
||||
are generated in ascending layer sector order {\em modulo layer size} so that all
|
||||
{\em layer size} layer sector positions are eventually covered.
|
||||
This scheme produces the following
|
||||
sequence of checksums:
|
||||
|
||||
\medskip
|
||||
|
||||
\begin{tabular}{l}
|
||||
$c((first\; layer\; crc\; idx + 1) \bmod layer\; size, \quad 1)$\\
|
||||
$c((first\; layer\; crc\; idx + 1) \bmod layer\; size, \quad 2)$\\
|
||||
\dots\\
|
||||
$c((first\; layer\; crc\; idx + 1) \bmod layer\; size, \quad n)$\\
|
||||
\hline
|
||||
\end{tabular}
|
||||
|
||||
\begin{tabular}{l}
|
||||
|
||||
$c((first\; layer\; crc\; idx + 2) \bmod layer\; size, \quad 1)$\\
|
||||
$c((first\; layer\; crc\; idx + 2) \bmod layer\; size, \quad 2)$\\
|
||||
\dots\\
|
||||
$c((first\; layer\; crc\; idx + 2) \bmod layer\; size, \quad n)$\\
|
||||
\hline
|
||||
\end{tabular}
|
||||
|
||||
\begin{tabular}{l}
|
||||
\dots\hspace*{81mm}\\[1mm]
|
||||
\hline
|
||||
\end{tabular}
|
||||
|
||||
\begin{tabular}{l}
|
||||
$c((first\; layer\; crc\; idx + layer\; size -1) \bmod layer\; size, \quad 1)$\\
|
||||
$c((first\; layer\; crc\; idx + layer\; size -1) \bmod layer\; size, \quad 2)$\\
|
||||
\dots\\
|
||||
$c(first\; layer\; crc\; idx \bmod layer\; size, \quad n-1^*)$\\
|
||||
\hline
|
||||
\end{tabular}
|
||||
|
||||
\begin{tabular}{l}
|
||||
$c(first\; layer\; crc\; idx \bmod layer\; size, \quad 1)$\\
|
||||
$c(first\; layer\; crc\; idx \bmod layer\; size, \quad 2)$\\
|
||||
\dots\\
|
||||
$c(first\; layer\; crc\; idx \bmod layer\; size, \quad n-1^*)$\\
|
||||
\end{tabular}
|
||||
|
||||
\bigskip
|
||||
|
||||
$^{*)}$ The last sectors of each data layer may be padding sectors. For those padding
|
||||
sectors, {\em no} CRC32 checksums are generated and stored (e.g. the number of
|
||||
generated checksums is always exactly {\em iso image size}).
|
||||
|
||||
Since {\em iso image size} is usually not a multiple of 512, the last sector in
|
||||
the data part may only be partially filled with checksum data. The remaining
|
||||
bytes of this sector must be filled with the repeated byte sequence
|
||||
0x47,0x50,0x4c,0x00 which is the ASCII string representation of the text ``GPL''.
|
||||
|
||||
\smallskip
|
||||
|
||||
A copy of the CRC32 sums for the layer sectors at position ($first\; layer\; crc\; idx \bmod layer\; size$)
|
||||
is stored in the ecc header, starting there at byte position 2048. This has the advantage that
|
||||
the CRC checksums are already available for the {\em first layer crc}-th sectors
|
||||
of data layers $1,\ldots,n$. Any corrupted bytes in those sectors are
|
||||
detected by the CRC32 and can be handled by the error correction in erasure mode,
|
||||
saving precious parity bytes. When the error correction has restored all sectors
|
||||
of the {\em first layer crc}-th ecc block, note that the {\em first layer crc}-th
|
||||
sector of data layer $n$ will contain the CRC32 checksums for the data sectors
|
||||
in the next ecc block ({\em first layer crc + 1}). Therefore the layout is robust
|
||||
against loss of CRC sectors as they are restored by the error correction just
|
||||
before they are actually needed.
|
||||
|
||||
\subsection{Encoding the ecc layers}
|
||||
\label{encoding-two}
|
||||
|
||||
Encoding the ecc layers requires the following steps:
|
||||
|
||||
\medskip
|
||||
|
||||
First the image must be examined whether it does already contain
|
||||
augmented ecc data (either RS02 or RS03). If ecc data is found, the
|
||||
image must be stripped to the original size of the .iso image.
|
||||
Nesting ecc data is not supported by the current codecs and it
|
||||
might derail the heuristics for detecting the augmented data
|
||||
properly. From a technical point, nesting ecc data does not
|
||||
make sense either.
|
||||
|
||||
\medskip
|
||||
|
||||
Next the image must be checked for missing sectors, and be rejected
|
||||
if it is incomplete. Producing and
|
||||
writing images with missing sectors to a medium is
|
||||
confusing to the user as dvdisaster will always report
|
||||
the medium as partially readable even though it does not contain
|
||||
any physical defects. Also the error correction will never
|
||||
succeed for such media as it is just restoring the sector
|
||||
in its missing state.
|
||||
During the check for missing sectors the CRC32 checksums
|
||||
of each sector can be computed as described in section \ref{crc-two} and,
|
||||
after writing a placeholder for the first ecc header, be appended to the image.
|
||||
Also, the MD5 sum of the .iso image can be calculated at this time and
|
||||
kept for insertion into the ecc header field {\em ec-$>$mediumSum}.
|
||||
As another step of preparation, enough space should be appended
|
||||
to the image to store the ecc layer sectors. This makes sure
|
||||
that the encoder does not run out of disk space during
|
||||
its potentially lenghty work, and minimizes the impact of
|
||||
fragmentation due to random writes into the appended
|
||||
image area under most file systems.
|
||||
|
||||
\medskip
|
||||
|
||||
Finally, the error correction information needs to be encoded.
|
||||
Please refer to fig. \ref{layout-logical-two} on the location
|
||||
of the bytes comprising an error correction block.
|
||||
Although the ecc blocks could be encoded by a byte-wise scheme,
|
||||
a possible encoding algorithm would preferably buffer at least the
|
||||
255 sectors holding the required data for 2048 subsequent ecc
|
||||
blocks, and process those in bulk. From the first $n$ data layers,
|
||||
the required bytes are retrieved and fed into the RS(255,k)
|
||||
Reed-Solomon encoder, with $k = 255 -n$. The RS(255,k) encoder
|
||||
is the same for RS01, RS02 and RS03. See
|
||||
appendix \ref{rs} for the parameters used in the encoder.
|
||||
|
||||
Please refer to
|
||||
the previous section on information about zeroed-out and
|
||||
zero-padded data sectors. The resulting $k$ parity bytes
|
||||
are distributed into the $k$ ecc layers. When writing out
|
||||
the ecc data into the image, free gaps must be left for
|
||||
the interleaved ecc headers; see section
|
||||
\ref{addressing-two} for information on calculating the
|
||||
interleaved ecc header positions. At this time, the MD5
|
||||
sums of each ecc layer can be updated incrementally.
|
||||
|
||||
\medskip
|
||||
|
||||
When all parity sectors have been calculated, the ecc headers
|
||||
can be completed by filling in their {\em eh-$>$eccSum} field.
|
||||
This field contains the MD5 sum calculated over the MD5 sums
|
||||
over each of the $k$ ecc layers. In contrast to a single MD5
|
||||
sum spanning the ecc layers in a linear fashion, this
|
||||
approach allows for an incremental calculation of the MD5 sum
|
||||
while the ecc data is generated and written out.
|
||||
Reference in New Issue
Block a user