Files
dvdisaster/documentation/user-manual/background.tex
Stéphane Lesimple 38defdf7aa Apply (most) debian patches
Apply debian patch 02-encryption
Apply debian patch 03-dvdrom
Apply a modified version of patch 05-help-dialog
Apply debian patch 08-fix-gnu-make-detection
Apply debian patch 10-use-non-size-specific-icon-and-add-keywords-to-desktop-file
Apply debian patch 12-fix-spelling-of-up-to
Apply debian patch 13-fix-missing-language-field-in-po-files
Apply a modified version of debian patch 14-make-builds-reproducible
Apply debian patch 17-fix-all-but-deprecated-api-warnings
Apply a modified version of debian patch 18-update-copyright-in-about-dialog
Apply debian patch 19-show-text-files-with-abs-path
Apply debian patch 22-fix-hurd-i386-ftbfs
Apply debian patch 23-add-bdrom-support
Apply debian patch 25-fix-man-pages
Apply debian patch 27-allow-opening-in-browser-again
Apply debian patch 28-pdftex-reproducibility
Apply debian patch 29-fix-more-typos
Apply debian patch 30-hurd-kfreebsd-ftbfs
Apply debian patch 31-improve-hurd-and-kfreebsd-support
Apply debian patch 33-honour-LDFLAGS
Apply debian patch 34-gcc8-format-security.patch
Apply debian patch 35-archived-homepage
Apply debian patch 36-fix-parallelism
2020-08-20 14:17:35 +02:00

610 lines
28 KiB
TeX

\section{Background information}
\label{background}
The information in this sub section is not required for operating
dvdisaster. However it is helpful for understanding how dvdisaster works
and may help you getting the most out of the program according to your needs.
\subsection{Properties of the error correction}
\label{background-properties}
This sub section outlines the basic ideas behind dvdisaster so that you can
see for yourself whether it meets your demands on data safety. If in doubt,
you should not use dvdisaster as a single means of protecting your data
and deploy additional data backup strategies.
\paragraph{Method of error correction.} dvdisaster
uses a \href{http://en.wikipedia.org/wiki/Reed-Solomon_error_correction}{Reed-Solomon code } together
with an error correction algorithm optimized for the treatment of erasures. The
implementation draws a lot of inspiration and program code from the
excellent \href{http://www.ka9q.net/code/fec/}{Reed-Solomon} library written
\href{http://www.ka9q.net/}{by Phil Karn}.
Using the \tlnk{howto-eccfile-basic-settings-ecc}{standard settings} of
an \tlnk{howto-eccfile}{error correction} file 223 medium sectors
are combined into one error correction code (``ECC'') block.
Medium read errors are regarded as ``erasures''; therefore a maximum
of 32 bad medium sectors\footnote{The given threshold of 32 correctable errors per ECC
block is the standard setting. It is possible to select \tlnk{howto-eccfile-basic-settings-ecc}{other values}
for higher or lower error correction capabilities.
When \tlnk{howto-augment}{error correction data is put directly on the medium},
the number of correctable errors depends on the free space on the medium.} are
correctable per ECC block.
The 223 sectors are selected so that they are evenly distributed over the medium
surface. Therefore large contigous areas of defective sectors can be corrected before
the threshold of 32 defects per ECC block is reached. This kind of error
pattern is especially common for an aging medium where the outer area is starting
to degenerate, and for scratches along the data spiral.
On the other hand, radial or diagonal scratches are expected to be correctable
in the optical disc drive itself. If not, the employed error correction
strategy performs rather neutral in these cases (neither especially good
nor extraordinary bad).
\paragraph{Limits of error correction.} In the wost case, 33 defective sectors (for the
standard setting) are sufficient to prevent a full data restoration. However
to achieve this effect, the errors have to be distributed over the medium in such
a way that they occur in the same ECC block - such patterns are very unlikely.
Empirical tests have shown that on aging media about 10\% of the overall
sector count may become defective before the threshold of 33 defects per ECC block is reached.
Scratches will cause the threshold to be reached earlier, therefore it is
recommended to visually check the media in regular intervals. Media with read errors
caused by scratches should be replaced immediately.
\paragraph{Hardware limits.} Most drives will not recognize media when the
lead-in area before the first sector (near the center hole) is damaged.
In such cases, dvdisaster will not be able to recover any content from the media.
It is not feasible to enhance the reliability of poor quality media by using
dvdisaster. Cheap media can decay within a few days to an extent which will
exceed the capabilities of the error correction code.
\subsection{Image level data recovery}
\label{background-image-level}
Media recovery with error correcting codes takes place in two steps:
\begin{enumerate}
\item First as much data as possible is read from the defective medium.
\item Then the still missing data is recovered using the error correction code.
\end{enumerate}
The amount of readable data (step 1) does not only depend on the reading
capabilities of the drive, but also on which logical level the reading process
takes place. This page discusses the logical levels and explains why dvdisaster
uses image level reading. Please see also \tlnk{qa-image-level}{Q1.5} for
a visualization of this concept.
\paragraph{Logical levels of a medium.} Optical discs are organized into
{\em data sectors} containing 2048 bytes each. Consecutively reading and storing
these sectors produces a medium {\em image}.
But working with single data sectors is unwieldy from a users perspective. Therefore
media contain {\em file systems} which combine data sectors into {\em files}.
This requires accurate book-keeping about which data sectors each file is
comprised of, and also further attributes like file names and access permissions.
For this book-keeping some data sectors are reserved on the medium and filled
with respective data structures.
As a consequence media contain different {\em logical levels}: Viewing the
medium as a sequence of data sectors means working at the {\em image level}.
However looking at the medium as a collection of files is the perspective of
the {\em file(system) level}.
\smallskip
The two levels have different properties when it comes to data recovery:
\paragraph{Shortcomings of reading a medium at the file level.} Reading a defective
medium at the file level means trying to read as much data as possible from each file.
But a problem arises when data sectors are damaged which have book-keeping
functions in the file system. The list of files on the medium may be truncated.
Or the mapping of data sectors to files is incomplete. Therefore files or parts
from files may be lost even though the respective data sectors would still
be readable by the hardware. This is very bad since even small readable portions
of damaged files are valuable for the error correcting code.
An extremely bad case occurs when the error correction data is also stored in
files. Then the error correction data is required to repair the file system,
but the defective file system prevents access to the error correction data.
That means total data loss and rises some issues on
the treatment of error correction files (more on that later in this sub section).
A similar problem arises when files are augmented with PAR2 error correction
data and then all these files are stored on the same medium. If the internal
bookkeeping of the file system becomes damaged, all files and therefore the
error correction data become inaccessible and are therefore lost.
\smallskip
However the situation improves greatly when employing an image-based approach:
\paragraph{Advantages of reading at the image level.} Reading at the image level
uses direct communication with the drive firmware to access the data sectors.
The number of readable sectors depends only on the reading capabilities of the drive,
but not on the state of the file system. A read error in one sector does not
block access to other data sectors. Since {\em all} sectors are recovered
which are still readable by the hardware, this method provides the best foundation
for the error correction.
Another consideration is that the image contains all data sectors of
the medium. When the image is completely
recovered the file system stored within is also fully repaired. A protection
at the image level is therefore more extensive than error correction at the file level.
dvdisaster is working exclusively at the image level to take advantage of these properties.
The \tlnk{background-methods}{RS02 and RS03 methods} even allow
for \tlnk{howto-augment}{storing the error correction data at the same medium}.
This is possible since reading the error correction information at the image level
can not be blocked by errors at other medium locations (damaged sectors with error
correction data will reduce the error recovering capacity, but not make recovery impossible).
\smallskip
The \tlnk{background-methods}{RS01 method} protects media at the image level, too,
but stores the error correction data in files. The next section hints at some pitfalls
arising from that.
\paragraph{Consequences for error correction file storage.} \label{background-image-level-consequences}The error correction data
created by dvdisaster protects media at the image level. But how are the error correction
files protected?
\smallskip
Since error correction files are read at the file level they are subject to the
problems discussed above. If the medium containing the error correction files
becomes damaged it may not be possible to access or read them completely.
Therefore it is important to protect error correction files at the image level as
well: \tlnk{background-eccfile-storage}{Media containing error correction files} must be
protected with dvdisaster, too.
Since image level protection is assumed there is {\em no further damage protection}
contained in the RS01 error correction files! This would not help much, anyways:
Error correction files could be created in a way that allows them to provide a
reduced data recovering capacity even when being damaged\footnote{Error correction files
created with the new RS03 codec have exactly these properties, e.g. they are robust
against damage as long as there is enough error correction capacity left. This is
possible as there is sufficient computing power and fast hard drive access available
now to create a suitable file format. At the time RS01 was developed these resources
were not yet available. But it must be stressed that even for RS03 error correction
files the disadvantages of reading at the file level hold!}. But
however such internal protection would be designed, the error correction file would
still be only protected at the file system level with all the disadvantages discussed above!
In addition, the computing time and redundancy used for internal protection is
better spent at the image level: The Reed-Solomon error correction works best
when error correction information is spread over huge amounts of data. It is
better to protect the image as a whole than individually protecting each file within.
\subsection{The RS01, RS02 and RS03 methods}
\label{background-methods}
dvdisaster contains three error correction methods.
RS01 and RS02 have been available right from the start of the project.
RS03 has been added recently to take advantage of
recent hardware, and to implement improvements
learned from evaluating its predecessors.
\paragraph{Comparison of the methods.} All three methods build on the
same \tlnk{background-properties}{Reed-Solomon} error correction. They calculate error
correction information for ISO images which is used to recover unreadable sectors if
the disc becomes damaged afterwards.
\smallskip
The methods differ in the way the error correction information is stored:
\begin{itemize}
\item RS01 creates {\bf error correction files} which are stored separately
from the image they belong to. Since data protection at
the \tlnk{background-image-level}{file level} is difficult,
error correction files must be stored on media which are protected against data loss by dvdisaster, too.
\item To apply the RS02 method an image is first created on hard disk using a
optical disc authoring (or writing) software. Before the image is written on the medium,
dvdisaster is used to {\bf augment the image} with error correction data.
Therefore the data to be protected and the error correction information are located at
the same medium. Damaged sectors in the error correction information reduce the data recovering
capacity, but do not make recovery impossible - a second medium for keeping or protecting the
error correction information is not required.
\end{itemize}
RS03 is a further development of RS01 and RS02.
It can create both error correction files and augmented images:
\begin{itemize}
\item RS03 can distribute work over multiple processor cores and is
therefore much faster than RS01/RS02 on modern hardware.
\item RS03 {\bf error correction files} are - contrary to RS01 - robust
against damage. This should not delude you into careless handling of your error
correction files though - the disadvantages of \tlnk{background-image-level}{reading at the filesystem level} are
still valid.
\item RS03 {\bf augmented images} do not require so-called master blocks
holding important information. This makes RS03 a bit more robust, but also more restrictive:
The augmented image must completely fill the medium now while the size of augmented images
can be freely chosen in RS02.
\end{itemize}
The changes for parallel computation and higher robustness make RS03 a bit less space efficient,
e.g. RS03 error correction data has slighly less error correction capacity than
its RS01/RS02 counterparts on images with equal size.
\newpage
\paragraph{Comparison of error correction storage.} The following table summarizes the differences
between error correction files (RS01, RS03) and augmented images (RS02, RS03):
\smallskip
\begin{tabular}{|p{0.473\textwidth}|p{0.473\textwidth}|}
\hline
{\em Error correction files} &
{\em Image augmented with error correction data} \\
\hline
any possible redundancy can be chosen &
redundancy is limited by free space on the medium (= medium capacity - size of data image) \\
\hline
already effective at 15\% redundancy since error correction files are required to be free of damage &
requires more redundancy (recommended: 20-30\%) to compensate defects in the error correction data \\
\hline
medium can be completely filled with data &
usable medium capacity is reduced by amount of error correction data \\
\hline
can be created for already existing media &
only applicable before writing the new medium since the image must be augmented with error correction information in advance \\
\hline
separately storing the error correction file from user data strengthens data protection &
common storage of user data and error correction data may reduce error correction capacity \\
\hline
mapping between error correction files and media must be kept; error correction files must be
protected against damage &
easy one-medium solution; error correction information needs not to be cataloged or explicitly protected \\
\hline
no compatibility issues with play-back units &
media with augmented images may not play correctly on all units \\
\hline
\end{tabular}
\paragraph{Further reading.} The specification for the dvdisaster codecs
has been specified in a separate document called \href{https://web.archive.org/web/20180428070843/http://dvdisaster.net/downloads/codecs.pdf}{codecs.pdf}.
Good knowledge in coding theory and programming is required.
\subsection{The linear reading strategy}
\label{background-linear}
dvdisaster contains two different reading strategies.
\medskip
{\bf The linear reading strategy is recommended for:}
\begin{itemize}
\item Creating images from undamaged media,
e.g. to \tlnk{howto-eccfile}{generate the error correction file}; and
\item \tlnk{howto-scan}{Scanning the medium} for reading speed and read errors.
\end{itemize}
\medskip
{\bf The} \tlnk{background-adaptive}{adaptive reading strategy} {\bf is recommended for:}
\begin{itemize}
\item \tlnk{howto-recover}{Extracting data from damaged media.}
\end{itemize}
\paragraph{Properties of the linear reading strategy.}\quad
\medskip
Optical media are
organized into sectors which are continously numbered beginning with zero.
Each sector contains 2048 bytes of data.
\begin{figure}
\centerline{\includegraphics[width=1.0\textwidth]{screenshots/weak-cd-scan.png}}
\caption{Linear reading of a weak but fully readable CD.}
\label{background-linear-weak-scan}
\end{figure}
The linear reading strategy reads the medium from the start (sector 0)
until the end (last sector). The reading speed is shown graphically to
provide information about the medium quality (see
figure \ref{background-linear-weak-scan} for an example and the explanations
in the following paragraphs).
\paragraph{Configuration.}\quad
\medskip
\paragraph{Number of sectors to skip after a read error.} Reading
attempts for defective sectors are slow and may wear out the drive
mechanics under unfavourable conditions. A series of read errors,
spread over a continous sector range, is more common than single spot defects.
Therefore a \tlnk{howto-scan-prefs-read-attempts}{configuration option} exists
so that a certain number of sectors
will be skipped after the occurance of a read error. The skipped sectors are
assumed to be defective without further reading attempts. Some guide lines for
selecting the number of skipped sectors are:
\begin{itemize}
\item Skipping a large number of sectors (e.g. {\bf 1024}) gives a quick
overview of the degree of damage, but will usually not collect enough
data for repairing the medium image.
\item Smaller values like {\bf 16, 32 or 64} are a good trade-off: The processing
time will be considerably shortened, but still enough data for repairing
the image is usually collected.
\item On DVD and BD media read errors do usually extend over at least 16 sectors
for technical reasons. Therefore a sector skip less than 16 is not recommended
for such media.
\end{itemize}
\paragraph{Limiting the reading range.}
\medskip
Reading can
be \tlnk{howto-scan-prefs-image}{limited to a given range of sectors} in
the ``Image'' preferences tab. This is helpful during
multiple read attempts of damaged media.
\paragraph{Estimating media quality.}\quad
\medskip
\paragraph{The speed curve.} Most drives
will slow down while reading medium areas which are in bad condition:
\begin{itemize}
\item Drops in the reading speed can be a warning for an imminent medium failure.
\item However some drives will read with full speed until the bitter end.
With such drives media deterioration may not show up in the reading
curve until actual read errors occur.
\end{itemize}
The reading curve is most accurate when using
the ``\tlnk{howto-scan}{Scan}'' function.
During the ``\tlnk{howto-eccfile-create}{Read}'' operation the
read data will be written to the hard
drive at the same time, which may cause irregularities in the reading curve
depending on the operating system and hardware used.
\paragraph{Read errors.} Read errors cause \tlnk{howto-interpret-defective-cd}{red markings} in
the spiral or respective messages at the command line. This means
that the medium could not be read at these places during the current reading pass:
\begin{itemize}
\item The medium is most likely defective.
\item The image should be \tlnk{howto-recover}{repaired} as
soon as possible and then be transferred to a new medium.
\end{itemize}
\subsection{The adaptive reading strategy}
\label{background-adaptive}
Please note: Adaptive reading is unavailable in the current
version of dvdisaster as it requires rework for RS03. It
will be re-introuced soon. It is documented here for completeness:
%dvdisaster contains two different reading strategies.
\medskip
{\bf The adaptive reading strategy is recommended for:}
\begin{itemize}
\item \tlnk{howto-recover-read}{Extracting data} from damaged media,
\end{itemize}
\medskip
{\bf The} \tlnk{background-linear}{linear reading strategy} {\bf is recommended for:}
\begin{itemize}
\item \tlnk{howto-eccfile-create}{Creating images} from undamaged media,
e.g. to generate the error correction file.
\item \tlnk{howto-scan}{Scanning the medium} for reading speed and read errors.
\end{itemize}
\paragraph{Properties of the adaptive reading strategy.}\quad
\medskip
The adaptive reading strategy uses a ``divide and conquer'' approach for locating still readable portions of a damaged medium. The strategy is based upon an article published by Harald Bögeholz in c't-Magazin 16/2005 where it was published together with the program {\em h2cdimage}:
\begin{enumerate}
\item At the beginning the medium is considered as a single unread interval.
Reading begins with sector zero.
\item The reading process continues sequentially unless either the end of the
current interval or a read error is encountered.
\item The reading process is terminated if either (3a) sufficient sectors for
a successful error correction have been read or (3b) no unreadable intervals
exceeding a given size remain.
\item Otherwise the largest remaining unread interval will be determined. Reading
continues in the middle (e.g. second half) of this interval; the first half of
this interval is kept for a later reading pass.
\end{enumerate}
\begin{figure}
\centerline{\includegraphics[width=1.0\textwidth]{screenshots/adaptive-progress.png}}
\caption{Progress of th adaptive reading.}
\label{background-adaptive-progress}
\end{figure}
The termination criterium (3a) is especially efficient: Reading
will stop as soon as enough sectors have been collected for a
successful image recovery using the error correction file. This
can reduce the reading time by as much as 90 percent compared
with a full read attempt, but does of course only work when
error correction data is available.
\paragraph{Configuration}\quad
\medskip
\paragraph{Error correction file.} Adaptive reading works best when error
correction data is available. Obviously the \tlnk{howto-ecc}{error correction data} must have
been created at a time where the medium was
still fully readable. To use a \tlnk{howto-eccfile-create}{error correction file} during
adaptive reading,
\tlnk{howto-recover-enter-eccfile}{enter its name} before starting the reading process.
\paragraph{Limiting the adaptive reading range.} Reading can
be \tlnk{howto-eccfile-prefs-image}{limited} using the ``Image'' preferences
to a part of the medium. This is not recommended
when error correction data is used since the limit might prevent sectors
from being read which are required for a succesful error correction. If
no error correction data is available, limiting the reading range might
be helpful during multiple reading attempts.
\paragraph{Early reading termination.} If no error correction
data is available, adaptive reading will stop when no unread intervals
\tlnk{howto-recover-prefs-read-attempts}{larger than a selectable size} remain.
The termination value should not be smaller than 128. Otherwise the laser
head will have to carry out lots of repositioning moves during the final
phase of the reading process. This will negatively affect both the life
expectancy of the drive and its reading capability. A better approach is
to stop adaptive reading earlier and then try reading the remaining sectors
with an additional \tlnk{background-linear}{linear reading} pass.
\subsection{Remarks on read errors}
\label{background-read-errors}
Optical media have their own error correction code which protects the data against
small manufacturing errors and inaccuracies during writing. If the writer and medium
are compatible and of high quality, the error correction built into the medium will
at first be mainly idle. This leaves enough reserves to compensate normal wear and
aging effects during many years of the medium usage.
\smallskip
When the capacity of the built-in error correction is finally exhausted, read errors
will start to appear on the medium. These will be reported by
the ``\tlnk{howto-scan}{Scan}''-operation
of dvdisaster. Depending on the time of first occurrence, two types of read errors
are of particular interest:
\paragraph{Read errors appearing right after writing the medium.} This is a sign of:
\begin{itemize}
\item media from a faulty production run, or
\item media which are not compatible with the writer.
\end{itemize}
A prudential choice is to dispose of the faulty media and to write the data
on error-free media, possibly switching to a different producer.
\smallskip
Please withstand the temptation of trying to preserve the faulty media
by means of an error correction file - this will most likely end with data loss.
\paragraph{Read errors after a few months/years.} The built-in error correction
of the medium will be increasingly loaded during its life time until it finally
fails and read errors show up. This happens for mechanical reasons (scratches,
warping of the plastic material) as well as for chemical causes (decaying dye
and/or reflective layer).
\smallskip
These effects typically occur while the medium is stored away for a few months,
and it may not be possible to read in all sectors afterwards.
\smallskip
Therefore it is crucial to \tlnk{howto-ecc}{create the error correction data} in
time. The ecc data contains information for recalculating the contents of missing
sectors (\tlnk{background-properties}{within certain limits}).
Therefore with the help of the ecc data dvdisaster can recover images even
if not all sectors could actually be read by the drive.
\smallskip
Since the error correction can reconstruct missing sectors up to a certain number,
it is not necessary to squeeze out a defective medium for every readable sector.
The \tlnk{background-adaptive}{adaptive reading strategy} checks during reading
whether enough data for error correction has been collected. As soon as this is
the case, reading stops and still unread sectors will be recovered using the ecc data.
\paragraph{Some hints for effectively reading damaged media}\quad
\medskip
The outcome from reading damaged media depends on several factors:
\begin{itemize}
\item {\bf Not all drives are built the same.}
Different drives have different reading capabilities. Take advantage of dvdisaster's
function for completing an image with several reading passes and use different drives
for each pass. Transfer the image file between computers using a network or removable
media in order to use drives installed in different machines.
\item {\bf Eject and insert the medium again.}
Sometimes it makes a difference to eject the medium, turn it about a quarter,
and then load it again for another reading pass.
\item {\bf Some drives read better while being cold.}
Turn off the computer over night and perform another reading attempt in the next morning.
But note: ``Cold'' refers to normal living room conditions - putting hardware or
media into the fridge can be very unhealthy for them.
\end{itemize}
\subsection{Hints for storing the error correction files}
\label{background-eccfile-storage}
Currently there are few (if any) exchangeable media technologies which can be a
cost-effective alternative to the various optical disc formats. So you will
probably not only use optical discs for data archival, but store the respective
error correction files on CD, DVD or BD as well.
\smallskip
There is nothing wrong with that, but bear in mind that your archived data and
the error correction files are stored on media with the same degree of reliability.
When read errors occur on the archived data, be prepared that the disc with the
respective error correction file might have aged beyond full readability, too.
\smallskip
Therefore it is important to protect your error correction files with the same
care as your other data\footnote{You might also choose
an \tlnk{howto-augment}{augmented image} using RS02 or RS03 instead
of creating an error correction file.}.
This is best achieved by integrating the error correction files into
your normal data backup scheme. Here are two ideas:
\paragraph{1. Storing the error correction files on dedicated media:}\quad
\medskip
If you decide to store error correction files on separate media, it
is \tlnk{background-image-level-consequences}{important} to protect those media
with dvdisaster as well. To avoid a never-ending chain (error correction
files for media of error correction files for ...), try the following:
\smallskip
Lets assume that five error correction files can be stored at each medium.
Write the first five error correction files to the first medium and create
another error correction file for that medium. Now save that error correction
file together with four other error correction files on the second medium.
If you continue that way, all error correction files except for those from the
last medium (which may still be kept on hard disk) are protected by dvdisaster.
\paragraph{2. Putting the error correction file on the next medium of a series:}\quad
\medskip
If you do not fill your media to the max (e.g. with less than 4GiB for
a single layered DVD), you can store the error correction file of one
medium on the succeeding medium within a series.