Compare commits
31 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| | bf5b8d60aa | | |
| | 8d6c1cd0ea | | |
| | efe921aff5 | | |
| | de3a853228 | | |
| | 77e20ad55c | | |
| | ffa5c8bf64 | | |
| | ba47ccb680 | | |
| | e97b1bee52 | | |
| | 24f43183fd | | |
| | 0628cf9249 | | |
| | da879a098f | | |
| | eb010ec7f5 | | |
| | 70fd411418 | | |
| | e2f1251c70 | | |
| | 06ad11340d | | |
| | 5c34457f2d | | |
| | 7cdcba4a35 | | |
| | b30be14b3e | | |
| | 3b8d510aeb | | |
| | 9132c437fc | | |
| | 7159bbb1fd | | |
| | ae9643f2ce | | |
| | 6559e62276 | | |
| | 1552fb8ec8 | | |
| | 79d0b184b8 | | |
| | 2793349268 | | |
| | 4a7845dc7a | | |
| | 978bbe4b40 | | |
| | 0bd2b92237 | | |
| | be51d4c842 | | |
| | af2ac4b113 | | |
75
README.md
@@ -1,64 +1,55 @@
|
||||
# MultiPar
|
||||
|
||||
### v1.3.3.1 is public
|
||||
### v1.3.3.3 is public
|
||||
|
||||
This is a testing version to improve speed of PAR2 calculation.
|
||||
Because the new method isn't tested so much, there may be a bug, failure, or mistake.
|
||||
Be careful to use this non-stable version.
|
||||
When you don't want to test by yourself, you should not use this yet.
|
||||
If you see a problem, please report the incident.
|
||||
I will try to solve it as well as I can.
|
||||
I fixed a few rare bugs in this version.
|
||||
While most users were not affected by those problems,
|
||||
those who did see them should use the new version.
|
||||
If a problem remains, I will fix it as well as I can.
|
||||
I updated some help documents about Batch script.
|
||||
I mentioned the location of help files in ReadMe text.
|
||||
|
||||
CPU's L3 cache optimization depends on hardware environment.
|
||||
It's difficult to guess the best setting for unknown type.
|
||||
It seems to work well on most Intel and AMD CPUs.
|
||||
Thanks Anime Tosho and MikeSW17 for long tests.
|
||||
But I'm not sure about the performance of rare or unusual CPUs.
|
||||
If you want to compare speed of different settings on your CPU,
|
||||
you may try samples (TestBlock_2023-08-31.zip) in "MultiPar_sample" folder
|
||||
on [OneDrive](https://1drv.ms/u/s!AtGhNMUyvbWOg0cF2UHcs709Icv4).
|
||||
New version supports a PC with max 8 OpenCL devices.
|
||||
Thanks [Yi Gu for reporting bug in a rare environment](https://github.com/Yutaka-Sawada/MultiPar/issues/110).
|
||||
I didn't expect a user to have so many OpenCL devices in one PC.
|
||||
It will detect a Graphics board correctly.
|
||||
|
||||
I improved GPU implementation very much.
|
||||
Thanks [Slava46 and K2M74 for many tests](https://github.com/Yutaka-Sawada/MultiPar/issues/99).
|
||||
I had almost given up on increasing the speed, but their effort encouraged me to try many approaches.
|
||||
Without their aid, I could not implement this GPU function.
|
||||
OpenCL performance varies between graphics boards.
|
||||
If you have a fast graphics board, enabling "GPU acceleration" would be faster.
|
||||
If it's not so fast (or is slow) on your PC, just un-check the feature.
|
||||
I improved source file splitting feature at creating PAR2 files.
|
||||
Thanks [AreteOne for reporting the bug and suggesting an improvement](https://github.com/Yutaka-Sawada/MultiPar/issues/117).
|
||||
When a file extension was a number, it wasn't handled properly.
|
||||
If you saw strange behavior when splitting files before, it should be fixed in this version.
|
||||
|
||||
I saw a new feature of Inno Setup 6, which changes install mode.
|
||||
It shows a dialog to ask which install mode.
|
||||
Then, a user can install MultiPar in "Program Files" directory by selecting "Install for all users".
|
||||
This method may be easier than starting installer by "Run as administrator".
|
||||
I am testing the selection dialog in this version.
|
||||
If there are no problems or complaints from users, I will use this style in later versions, too.
|
||||
I fixed a bug in verifying external files.
|
||||
It might not find the last slice in a source file, when the file data is redundant.
|
||||
Thanks [dle-fr for reporting bug and testing many times](https://github.com/Yutaka-Sawada/MultiPar/issues/130).
|
||||
This solution may improve verification of damaged files, too.
|
||||
When source files are mostly random data, like a compressed archive, there was no problem.
|
||||
|
||||
|
||||
[ Changes from 1.3.3.0 to 1.3.3.1 ]
|
||||
[ Changes from 1.3.3.2 to 1.3.3.3 ]
|
||||
|
||||
Installer update
|
||||
- It shows dialog to select "per user" or "per machine" installation.
|
||||
- Inno Setup was updated from v6.2.2 to v6.3.1.
|
||||
|
||||
PAR2 client update
|
||||
- Change
|
||||
- Max number of threads to read files on SSD was increased to 6.
|
||||
|
||||
- Improvement
|
||||
- GPU acceleration would become faster.
|
||||
- Bug fix
|
||||
- Fixed a bug in GPU acceleration, when there are many OpenCL devices.
|
||||
- Failure of splitting source files with numerical extension was fixed.
|
||||
- Faulty prediction of the last block in a file with repeated data was fixed.
|
||||
|
||||
|
||||
[ Hash value ]
|
||||
|
||||
MultiPar1331.zip
|
||||
MD5: ECFC1570C839DD30A2492A7B05C2AD6E
|
||||
SHA1: 5E0E4CC38DAA995294A93ECA10AEB3AE84596170
|
||||
MultiPar1333.zip
|
||||
MD5: 01A201CA340C33053E6D7D2604D54019
|
||||
SHA1: F7C30A7BDEB4152820C9CFF8D0E3DA719F69D7C6
|
||||
|
||||
MultiPar1331_setup.exe
|
||||
MD5: A55E6FA5A6853CB42E3410F35706BAD9
|
||||
SHA1: 8D46BD6702E82ABA9ACCFA5223B2763B4DCEFE9E
|
||||
MultiPar1333_setup.exe
|
||||
MD5: 33F9E441F5C1B2C00040E9BAFA7CC1A9
|
||||
SHA1: 6CEBED8CECC9AAC5E8070CD5E8D1EDF7BBBC523A
|
||||
To install under "Program Files" or "Program Files (x86)" directory,
|
||||
you must select "Install for all users" at the first dialog.
|
||||
|
||||
Old versions and source code packages are available at
|
||||
[GitHub](https://github.com/Yutaka-Sawada/MultiPar/releases) or
|
||||
[OneDrive](https://1drv.ms/u/s!AtGhNMUyvbWOaSo1n_R8awJ_hg0).
|
||||
[OneDrive](https://1drv.ms/f/c/8eb5bd32c534a1d1/QtGhNMUyvbUggI5pAAAAAAAAKjWf9HxrAn-GDQ).
|
||||
|
||||
Binary file not shown.
@@ -25,7 +25,13 @@ Be careful to use those special features.
|
||||
|
||||
[ System requirement ]
|
||||
|
||||
MultiPar requires a PC with Windows Vista or later (Windows 7, 8, 10).
|
||||
MultiPar requires a PC with Windows Vista or later (Windows 7, 8, 10, 11).
|
||||
|
||||
[ Usage manual or Help documents ]
|
||||
|
||||
There are some usage manual or help documents in "help" folder.
|
||||
English pages exist in the "help/0409" folder.
|
||||
You may open the manual by pushing "F1-key", while using MultiPar.
|
||||
|
||||
|
||||
/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */
|
||||
|
||||
@@ -54,7 +54,13 @@ QuickPar など他の PAR クライアントはコメント機能に対応して
|
||||
|
||||
[ System requirement ]
|
||||
|
||||
A PC with Windows Vista or later (Windows 7, 8, 10) is required.
|
||||
A PC with Windows Vista or later (Windows 7, 8, 10, 11) is required.
|
||||
|
||||
[ Usage and explanatory documents ]
|
||||
|
||||
Usage guides and explanatory documents are in the help folder, though most of them are in English.
|
||||
Japanese pages are in the help\0411 folder.
|
||||
Pressing the F1 key while using MultiPar opens the manual.
|
||||
|
||||
|
||||
/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */
|
||||
|
||||
@@ -1,5 +1,31 @@
|
||||
Release note of v1.3.3 tree
|
||||
|
||||
par2j's "lc" option was changed to support more threads.
|
||||
Windows Vista will be removed from supported OS.
|
||||
|
||||
/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */
|
||||
|
||||
[ Changes from 1.3.3.2 to 1.3.3.3 ] (2024/06/15)
|
||||
|
||||
Installer update
|
||||
Inno Setup was updated from v6.2.2 to v6.3.1.
|
||||
|
||||
PAR2 client update
|
||||
Bug fix
|
||||
Fixed a bug in GPU acceleration, when there are many OpenCL devices.
|
||||
Failure of splitting source files with numerical extension was fixed.
|
||||
Faulty prediction of the last block in a file with repeated data was fixed.
|
||||
|
||||
/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */
|
||||
|
||||
[ Changes from 1.3.3.1 to 1.3.3.2 ] (2024/01/10)
|
||||
|
||||
PAR2 client update
|
||||
Improvement
|
||||
GPU acceleration will work well on AMD graphics boards.
|
||||
|
||||
/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */
|
||||
|
||||
[ Changes from 1.3.3.0 to 1.3.3.1 ] (2023/11/11)
|
||||
|
||||
Installer update
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
Update information for v1.3.3 (2023/11/11)
|
||||
Update information for v1.3.3 (2024/06/15)
|
||||
|
||||
This version is still being tested, so if you are concerned, please use the previous version.
|
||||
Windows Vista is planned to be removed from the supported OSes.
|
||||
|
||||
[ Changes from 1.3.2 to 1.3.3 ]
|
||||
|
||||
|
||||
@@ -16,7 +16,7 @@ textarea{width:100%;}
|
||||
|
||||
<p> 
|
||||
Because MultiPar consists of PAR clients and GUI,
|
||||
it is possible to use a PAR client on Command prompt.
|
||||
it is possible to use a PAR client on Command Prompt.
|
||||
Read a manual of command line for the details of command and option.
|
||||
It's available by batch file (or command script).
|
||||
</p>
|
||||
@@ -27,13 +27,57 @@ Modify the options for PAR2 files, and set the path of <tt>par2j.exe</tt>.
|
||||
Write absolute path like;<br>
|
||||
<code>SET par2_path="C:\something directory\MultiPar\par2j.exe"</code><br>
|
||||
Read manuals (<tt>Command_***.txt</tt>) to know the detail of options.
|
||||
Read Windows OS 's help for Command prompt's usage.
|
||||
Read Windows OS 's help for Command Prompt's usage.
|
||||
</p>
|
||||
<p> 
|
||||
Save a sample script to a file like <tt>batch.bat</tt> or <tt>batch.cmd</tt>.
|
||||
To specify path of files or folders, Drag & Drop them on the batch file.
|
||||
When you put a shortcut icon of the batch file in "SendTo" menu,
|
||||
you can call the batch file by selecting files then Right-Click & SendTo.
|
||||
There are some ways to specify a file or folder.<br>
|
||||
<ol>
|
||||
<li>Type the path each time on the keyboard at the Command Prompt
|
||||
<p> 
|
||||
If you use the batch file at Command Prompt, change directory to the batch file at first.
|
||||
Next, type <tt>batch.bat "path of the file or folder"</tt>.
|
||||
If there is no space in the path, there is no need to enclose it in <tt>""</tt>.
|
||||
The path may be absolute path or relative path from the batch file.
|
||||
Normally absolute path would be safe.
|
||||
</p>
|
||||
<p> 
|
||||
If you want to specify multiple files or folders,
|
||||
type <tt>batch.bat "path of the first file or folder" "path of second file or folder" "path of third file or folder"</tt>.
|
||||
You may add several paths after the script filename.
|
||||
</p>
|
||||
</li>
|
||||
<li>Drag & Drop a file or folder by mouse
|
||||
<p> 
|
||||
On Windows Explorer, drag a file or folder and drop it on your batch file.
|
||||
If you put the batch file on Desktop, it will be easy to Drag & Drop.
|
||||
</p>
|
||||
<p> 
|
||||
If you want to specify multiple files or folders,
|
||||
select them at once on Windows Explorer, and Drag & Drop the group on the batch file.
|
||||
While dragging, number of items may be shown.
|
||||
</p>
|
||||
</li>
|
||||
<li>Select SendTo at Windows Explorer's right click menu
|
||||
<p> 
|
||||
At first, create Short-cut icon of your batch file.
|
||||
Name it to be something easy to understand like "Create PAR2".
|
||||
Then, put the Short-cut icon in your <tt>SendTo</tt> folder.
|
||||
You can open the <tt>SendTo</tt> folder by typing <tt>shell:sendto</tt> on path-box of Windows Explorer.
|
||||
</p>
|
||||
<p> 
|
||||
After you select a file or folder on Windows Explorer, click mouse's right button.
|
||||
Select "SendTo" item on the right-click menu.
|
||||
The shortcut you created appears in the SendTo sub-menu, so select it there.
|
||||
</p>
|
||||
<p> 
|
||||
If you want to specify multiple files or folders,
|
||||
select them at once on Windows Explorer.
|
||||
With the mouse cursor over the selection, click the right mouse button.
|
||||
The remaining steps are the same as for a single item.
|
||||
</p>
|
||||
</li>
|
||||
</ol>
|
||||
</p>
|
||||
<p> 
|
||||
If you want to confirm the result of scripting,
|
||||
|
||||
@@ -51,7 +51,7 @@ There are command-line manuals in "<tt>help</tt>" folder.
|
||||
</table>
|
||||
|
||||
<hr>
|
||||
<small>last update 2023/11/11 for version 1.3.3.1</small>
|
||||
<small>last update 2024/04/13 for version 1.3.3.3</small>
|
||||
|
||||
</body>
|
||||
</html>
|
||||
|
||||
@@ -173,7 +173,8 @@ It will verify each PAR2 set continuously.
|
||||
It's possible to stop queue on GUI.
|
||||
<tr><td>Script file<td><tt>queue_verify.py</tt>
|
||||
<tr><td>Caution<td>You must select MultiPar Option: "Re-use verification result" to be "For 3 days" or longer,
|
||||
and you should check "Don't search subfolders" in "Verification and Repair options" section on "Client behavior" tab.
|
||||
and you should check "Don't search subfolders" in "Verification and Repair options" section on "Client behavior" tab.
|
||||
Furthermore, you must set proper <tt>save_path</tt> in the script.
|
||||
</table>
|
||||
</p>
|
||||
<p> 
|
||||
|
||||
@@ -31,7 +31,14 @@ Be careful to use those special features.
|
||||
|
||||
<h3>System requirement</h3>
|
||||
<p> 
|
||||
MultiPar requires a PC with Windows Vista or later (Windows 7, 8, 10).
|
||||
MultiPar requires a PC with Windows Vista or later (Windows 7, 8, 10, 11).
|
||||
</p>
|
||||
|
||||
<h3>Usage manual or Help documents</h3>
|
||||
<p> 
|
||||
There are some usage manual or help documents in "help" folder.
|
||||
English pages exist in the "help/0409" folder.
|
||||
You may open the manual by pushing "F1-key", while using MultiPar.
|
||||
</p>
|
||||
|
||||
</body>
|
||||
|
||||
@@ -51,7 +51,7 @@
|
||||
</table>
|
||||
|
||||
<hr>
|
||||
<small>last update 2023/11/11 (version 1.3.3.1)</small>
|
||||
<small>last update 2024/04/13 (version 1.3.3.3)</small>
|
||||
|
||||
</body>
|
||||
</html>
|
||||
|
||||
@@ -25,7 +25,14 @@ QuickPar
|
||||
|
||||
<h3>System requirement</h3>
|
||||
<p> 
|
||||
A PC with Windows Vista or later (Windows 7, 8, 10) is required.
|
||||
A PC with Windows Vista or later (Windows 7, 8, 10, 11) is required.
|
||||
</p>
|
||||
|
||||
<h3>Usage and explanatory documents</h3>
|
||||
<p> 
|
||||
Usage guides and explanatory documents are in the help folder, though most of them are in English.
|
||||
Japanese pages are in the help\0411 folder.
|
||||
Pressing the F1 key while using MultiPar opens the manual.
|
||||
</p>
|
||||
|
||||
</body>
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
[ MultiPar GUI - version 1.3.3.0 or later ]
|
||||
[ MultiPar GUI - version 1.3.3.3 or later ]
|
||||
|
||||
Usage: MultiPar.exe [command] [/base path] [/list path] [files]
|
||||
|
||||
@@ -99,6 +99,11 @@ write a line of "RecoveryFileLimit=1" under "[Option]" section.
|
||||
It's same as an option:
|
||||
"Variable (limited to size of largest data file)" on QuickPar.
|
||||
|
||||
If you want to limit size of Recovery Files by a specific value,
|
||||
write a line of "RecoveryFileLimit=2" under "[Option]" section.
|
||||
"Limit Size to" value is enabled, even when "Split Files" isn't checked.
|
||||
This setting is useful, if you don't want to split source files.
|
||||
|
||||
If you want to enable "Most Recently Used List",
|
||||
write this line "MRUMax=5" under "[Path]" section.
|
||||
You may change the number of items after "MRUMax=".
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
[ par2j.exe - version 1.3.3.0 or later ]
|
||||
[ par2j.exe - version 1.3.3.2 or later ]
|
||||
|
||||
Type "par2j.exe" to see version, test integrity, and show usage below.
|
||||
|
||||
@@ -367,16 +367,24 @@ The format is "/lc#" (# is from 1 to 32 as the number of using threads).
|
||||
253: It uses 3/4 number of physical Cores.
|
||||
254: It uses one less threads than number of physical Cores.
|
||||
0: It uses the number of physical Cores.
|
||||
255: It uses one more threads than number of physical Cores.
|
||||
255: It tries to use more threads than number of physical Cores.
|
||||
|
||||
You may set additional combinations;
|
||||
+1024 to disable CLMUL (and use old SSSE3 code),
|
||||
+2048 to disable JIT (for SSE2),
|
||||
+4096 to disable SSSE3,
|
||||
+8192 to disable AVX2,
|
||||
+256 or +512 (slower device) to enable GPU acceleration.
|
||||
You may set additional combinations for CPU feature;
|
||||
+1024 to disable CLMUL (and use slower SSSE3 code)
|
||||
+2048 to disable JIT (for SSE2)
|
||||
+4096 to disable SSSE3
|
||||
+8192 to disable AVX2
|
||||
|
||||
for example, /lc1 to use single Core, /lc45 to use half Cores and GPU
|
||||
You may set additional combinations for GPU control;
|
||||
+256 or +512 (slower device) to enable GPU acceleration
|
||||
+65536 for classic method
|
||||
+131072 for 16-byte memory access
|
||||
+262144 for 4-byte memory access and calculate 2 blocks at once
|
||||
+524288 for 16-byte memory access and calculate 2 blocks at once
|
||||
+1048576 for CL_MEM_COPY_HOST_PTR or +2097152 for CL_MEM_USE_HOST_PTR
|
||||
(When you set exclusive bits, larger value will be used.)
|
||||
|
||||
for example, /lc1 to use single Core, /lc508 to use half Cores and GPU
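The additive values in the option list above behave like bit flags, so a combined /lc value can be taken apart with masks. The following is a minimal sketch under stated assumptions: the macro names, the `lc_option` struct, and the idea that the thread selector occupies the low 8 bits are illustrative and are not taken from par2j's source. Only the numeric values come from the list above. Note that 508 = 252 + 256, so the documented example decodes to selector 252 with the GPU bit set.

```c
#include <assert.h>
#include <stdio.h>

/* Flag values are copied from the option list. The names and the low-byte
   thread selector are assumptions made for this illustration. */
#define LC_THREAD_MASK 0x00FFu   /* assumed: 1-32 or a special selector */
#define LC_GPU_FAST    0x0100u   /* +256  enable GPU acceleration */
#define LC_GPU_SLOW    0x0200u   /* +512  enable GPU on the slower device */
#define LC_NO_CLMUL    0x0400u   /* +1024 disable CLMUL */
#define LC_NO_JIT      0x0800u   /* +2048 disable JIT (for SSE2) */
#define LC_NO_SSSE3    0x1000u   /* +4096 disable SSSE3 */
#define LC_NO_AVX2     0x2000u   /* +8192 disable AVX2 */

typedef struct {
    unsigned int threads;        /* raw selector from the low byte */
    int gpu_fast, gpu_slow;
    int no_clmul, no_jit, no_ssse3, no_avx2;
} lc_option;

/* Split a combined /lc value into its selector and flag bits. */
lc_option decode_lc(unsigned int value)
{
    lc_option o;
    o.threads  = value & LC_THREAD_MASK;
    o.gpu_fast = (value & LC_GPU_FAST) != 0;
    o.gpu_slow = (value & LC_GPU_SLOW) != 0;
    o.no_clmul = (value & LC_NO_CLMUL) != 0;
    o.no_jit   = (value & LC_NO_JIT) != 0;
    o.no_ssse3 = (value & LC_NO_SSSE3) != 0;
    o.no_avx2  = (value & LC_NO_AVX2) != 0;
    return o;
}

/* Print the decoded fields of one value. */
void print_lc(unsigned int value)
{
    lc_option o = decode_lc(value);
    printf("/lc%u: selector=%u gpu_fast=%d gpu_slow=%d "
           "no_clmul=%d no_jit=%d no_ssse3=%d no_avx2=%d\n",
           value, o.threads, o.gpu_fast, o.gpu_slow,
           o.no_clmul, o.no_jit, o.no_ssse3, o.no_avx2);
}

/* Check the documented example /lc508 against the decoder. */
void check_documented_example(void)
{
    lc_option o = decode_lc(508);
    assert(o.threads == 252);
    assert(o.gpu_fast == 1);
}
```

Calling `print_lc(508)` from a small `main` prints the decoded fields. The GPU control values from +65536 upward could be handled with further masks in the same way.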
|
||||
|
||||
/m :
|
||||
Set this, if you want to set memory usage.
|
||||
|
||||
BIN
alpha/par2j.exe
Binary file not shown.
@@ -1,4 +1,4 @@
|
||||
[ par2j.exe - version 1.3.3.1 or later ]
|
||||
[ par2j.exe - version 1.3.3.2 or later ]
|
||||
|
||||
Type "par2j.exe" to see version, test integrity, and show usage below.
|
||||
|
||||
@@ -369,14 +369,22 @@ The format is "/lc#" (# is from 1 to 32 as the number of using threads).
|
||||
0: It uses the number of physical Cores.
|
||||
255: It tries to use more threads than number of physical Cores.
|
||||
|
||||
You may set additional combinations;
|
||||
+1024 to disable CLMUL (and use slower SSSE3 code),
|
||||
+2048 to disable JIT (for SSE2),
|
||||
+4096 to disable SSSE3,
|
||||
+8192 to disable AVX2,
|
||||
+256 or +512 (slower device) to enable GPU acceleration.
|
||||
You may set additional combinations for CPU feature;
|
||||
+1024 to disable CLMUL (and use slower SSSE3 code)
|
||||
+2048 to disable JIT (for SSE2)
|
||||
+4096 to disable SSSE3
|
||||
+8192 to disable AVX2
|
||||
|
||||
for example, /lc1 to use single Core, /lc45 to use half Cores and GPU
|
||||
You may set additional combinations for GPU control;
|
||||
+256 or +512 (slower device) to enable GPU acceleration
|
||||
+65536 for classic method
|
||||
+131072 for 16-byte memory access
|
||||
+262144 for 4-byte memory access and calculate 2 blocks at once
|
||||
+524288 for 16-byte memory access and calculate 2 blocks at once
|
||||
+1048576 for CL_MEM_COPY_HOST_PTR or +2097152 for CL_MEM_USE_HOST_PTR
|
||||
(When you set exclusive bits, larger value will be used.)
|
||||
|
||||
for example, /lc1 to use single Core, /lc508 to use half Cores and GPU
|
||||
|
||||
/m :
|
||||
Set this, if you want to set memory usage.
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
// create.c
|
||||
// Copyright : 2023-10-22 Yutaka Sawada
|
||||
// Copyright : 2024-02-09 Yutaka Sawada
|
||||
// License : GPL
|
||||
|
||||
#ifndef _UNICODE
|
||||
@@ -26,6 +26,11 @@
|
||||
|
||||
//#define TIMER // for experiments
|
||||
|
||||
#ifdef TIMER
|
||||
#include <time.h>
|
||||
static double time_sec, time_speed;
|
||||
#endif
|
||||
|
||||
// Compare items when sorting
|
||||
static int sort_cmp(const void *elem1, const void *elem2)
|
||||
{
|
||||
@@ -196,7 +201,7 @@ int set_common_packet(
|
||||
__int64 prog_now = 0;
|
||||
|
||||
#ifdef TIMER
|
||||
unsigned int time_start = GetTickCount();
|
||||
clock_t time_start = clock();
|
||||
#endif
|
||||
print_progress_text(0, "Computing file hash");
|
||||
|
||||
@@ -305,14 +310,14 @@ unsigned int time_start = GetTickCount();
|
||||
off += (64 + main_packet_size);
|
||||
|
||||
#ifdef TIMER
|
||||
time_start = GetTickCount() - time_start;
|
||||
printf("hash %d.%03d sec", time_start / 1000, time_start % 1000);
|
||||
if (time_start > 0){
|
||||
time_start = (int)((total_file_size * 125) / ((__int64)time_start * 131072));
|
||||
printf(", %d MB/s\n", time_start);
|
||||
time_start = clock() - time_start;
|
||||
time_sec = (double)time_start / CLOCKS_PER_SEC;
|
||||
if (time_sec > 0){
|
||||
time_speed = (double)total_file_size / (time_sec * 1048576);
|
||||
} else {
|
||||
printf("\n");
|
||||
time_speed = 0;
|
||||
}
|
||||
printf("hash %.3f sec, %.0f MB/s\n", time_sec, time_speed);
|
||||
#endif
|
||||
|
||||
error_end:
|
||||
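The timing change in this hunk replaces integer millisecond arithmetic with floating-point seconds derived from `clock()`. A minimal sketch of that calculation follows, assuming 1 MB = 1048576 bytes as in the new code. The helper names are hypothetical. One point worth keeping in mind: ISO C defines `clock()` as processor time, while Microsoft's C runtime documents it as elapsed wall-clock time, so the printed figures depend on the platform.

```c
#include <assert.h>
#include <time.h>

/* Seconds between two clock() readings. */
double elapsed_sec(clock_t start, clock_t end)
{
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Throughput in MB/s (1 MB = 1048576 bytes).
   Returns 0 when the elapsed time is not positive, which mirrors the
   "time_speed = 0" branch of the new code and avoids dividing by zero. */
double speed_mb_per_sec(long long total_bytes, double seconds)
{
    assert(total_bytes >= 0);
    if (seconds > 0)
        return (double)total_bytes / (seconds * 1048576);
    return 0;
}
```

A caller would take `clock()` before and after the work, pass both readings to `elapsed_sec`, and print the result with `%.3f` as the diff does.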
@@ -341,7 +346,7 @@ int set_common_packet_multi(
|
||||
FILE_HASH_TH th[MAX_MULTI_READ];
|
||||
|
||||
#ifdef TIMER
|
||||
unsigned int time_start = GetTickCount();
|
||||
clock_t time_start = clock();
|
||||
#endif
|
||||
|
||||
memset(hSub, 0, sizeof(HANDLE) * MAX_MULTI_READ);
|
||||
@@ -545,14 +550,14 @@ unsigned int time_start = GetTickCount();
|
||||
}
|
||||
print_progress_done(); // print a newline and return to the start of the line
|
||||
#ifdef TIMER
|
||||
time_start = GetTickCount() - time_start;
|
||||
printf("hash %d.%03d sec", time_start / 1000, time_start % 1000);
|
||||
if (time_start > 0){
|
||||
time_start = (int)((total_file_size * 125) / ((__int64)time_start * 131072));
|
||||
printf(", %d MB/s\n", time_start);
|
||||
time_start = clock() - time_start;
|
||||
time_sec = (double)time_start / CLOCKS_PER_SEC;
|
||||
if (time_sec > 0){
|
||||
time_speed = (double)total_file_size / (time_sec * 1048576);
|
||||
} else {
|
||||
printf("\n");
|
||||
time_speed = 0;
|
||||
}
|
||||
printf("hash %.3f sec, %.0f MB/s\n", time_sec, time_speed);
|
||||
#endif
|
||||
|
||||
error_end:
|
||||
@@ -700,7 +705,7 @@ int set_common_packet_hash(
|
||||
__int64 prog_now = 0;
|
||||
|
||||
#ifdef TIMER
|
||||
unsigned int time_start = GetTickCount();
|
||||
clock_t time_start = clock();
|
||||
#endif
|
||||
print_progress_text(0, "Computing file hash");
|
||||
|
||||
@@ -740,8 +745,8 @@ unsigned int time_start = GetTickCount();
|
||||
print_progress_done(); // print a newline and return to the start of the line
|
||||
|
||||
#ifdef TIMER
|
||||
time_start = GetTickCount() - time_start;
|
||||
printf("hash %d.%03d sec\n", time_start / 1000, time_start % 1000);
|
||||
time_start = clock() - time_start;
|
||||
printf("hash %.3f sec\n", (double)time_start / CLOCKS_PER_SEC);
|
||||
#endif
|
||||
return 0;
|
||||
}
|
||||
@@ -1065,7 +1070,7 @@ int create_recovery_file(
|
||||
#endif
|
||||
|
||||
#ifdef TIMER
|
||||
unsigned int time_start = GetTickCount();
|
||||
clock_t time_start = clock();
|
||||
#endif
|
||||
print_progress_text(0, "Constructing recovery file");
|
||||
time_last = GetTickCount();
|
||||
@@ -1258,8 +1263,8 @@ unsigned int time_start = GetTickCount();
|
||||
print_progress_done(); // print a newline and return to the start of the line
|
||||
|
||||
#ifdef TIMER
|
||||
time_start = GetTickCount() - time_start;
|
||||
printf("write %d.%03d sec\n", time_start / 1000, time_start % 1000);
|
||||
time_start = clock() - time_start;
|
||||
printf("write %.3f sec\n", (double)time_start / CLOCKS_PER_SEC);
|
||||
#endif
|
||||
|
||||
return 0;
|
||||
@@ -1824,10 +1829,12 @@ int split_files(
|
||||
}
|
||||
if (ext_len > 0){ // if there is a source file whose extension is all digits
|
||||
//printf_cp("\n risky name = %s \n", file_name);
|
||||
wcscpy(file_path, file_name); // remove the extension for comparison
|
||||
file_path[name_len] = 0;
|
||||
for (num2 = 0; num2 < file_num; num2++){
|
||||
if (num2 == num)
|
||||
continue;
|
||||
if (_wcsnicmp(list_buf + files[num2].name, file_name, name_len) == 0){
|
||||
if (_wcsicmp(list_buf + files[num2].name, file_path) == 0){
|
||||
//printf_cp(" match name = %s \n", list_buf + files[num2].name);
|
||||
num8 = (files[num2].size + (__int64)split_size - 1) / split_size;
|
||||
split_max = (int)num8;
|
||||
@@ -1838,7 +1845,7 @@ int split_files(
|
||||
if (((split_max < 1000) && (ext_len >= 5)) || ((split_max < 10000) && (ext_len >= 6)))
|
||||
continue; // the extension has a different number of digits
|
||||
// treat as an error because there is a risk of overwriting
|
||||
printf_cp("split bad file, %s\n", file_name);
|
||||
printf_cp("split bad file, %s\n", list_buf + files[num2].name);
|
||||
*cur_num = -1;
|
||||
*cur_id = 0;
|
||||
return 1;
|
||||
|
||||
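The change from `_wcsnicmp` (prefix comparison) to `_wcsicmp` (whole-name comparison against the name with its extension removed) narrows which files count as a collision. The sketch below illustrates the difference with portable helpers. The function names, `MAX_NAME`, and the sample file names are hypothetical, and the helpers stand in for the MSVC functions rather than reproducing them.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

#define MAX_NAME 260

/* 1 when the first n characters match, ignoring case (prefix test). */
int prefix_equal_ci(const wchar_t *a, const wchar_t *b, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++){
        wint_t ca = towlower((wint_t)a[i]);
        wint_t cb = towlower((wint_t)b[i]);
        if (ca != cb)
            return 0;
        if (ca == 0)
            break; /* both strings ended before n characters */
    }
    return 1;
}

/* 1 when the whole strings match, ignoring case. */
int name_equal_ci(const wchar_t *a, const wchar_t *b)
{
    for (;; a++, b++){
        wint_t ca = towlower((wint_t)*a);
        wint_t cb = towlower((wint_t)*b);
        if (ca != cb)
            return 0;
        if (ca == 0)
            return 1;
    }
}

/* numbered: a name such as "data.001"; name_len: its length without the
   numeric extension. Returns 1 only when `other` is exactly that base name. */
int base_name_matches(const wchar_t *numbered, size_t name_len,
                      const wchar_t *other)
{
    wchar_t base[MAX_NAME];
    assert(name_len < MAX_NAME);
    wmemcpy(base, numbered, name_len);
    base[name_len] = 0;
    return name_equal_ci(other, base);
}
```

With a source file `data.001`, the prefix test also matches `database.bin`, while the whole-name test matches only a file called `data`, the one whose split pieces could overwrite `data.001`.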
@@ -2795,7 +2795,7 @@ void galois_align_xor(
|
||||
void galois_align16_multiply(
|
||||
unsigned char *r1, // Region to multiply (must be aligned by 16)
|
||||
unsigned char *r2, // Products go here
|
||||
unsigned int len, // Byte length (must be multiple of 32)
|
||||
unsigned int len, // Byte length (must be multiple of 16)
|
||||
int factor) // Number to multiply by
|
||||
{
|
||||
if (factor <= 1){
|
||||
|
||||
@@ -6,7 +6,7 @@ extern "C" {
|
||||
#endif
|
||||
|
||||
|
||||
extern unsigned short *galois_log_table;
|
||||
//extern unsigned short *galois_log_table;
|
||||
extern unsigned int cpu_flag;
|
||||
|
||||
int galois_create_table(void); // Returns 0 on success, -1 on failure
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
// lib_opencl.c
|
||||
// Copyright : 2023-10-22 Yutaka Sawada
|
||||
// Copyright : 2024-01-21 Yutaka Sawada
|
||||
// License : GPL
|
||||
|
||||
#ifndef _WIN32_WINNT
|
||||
@@ -75,7 +75,7 @@ typedef cl_int (CL_API_CALL *API_clEnqueueNDRangeKernel)(cl_command_queue, cl_ke
|
||||
extern unsigned int cpu_flag; // declared in common2.h
|
||||
extern int cpu_num;
|
||||
|
||||
#define MAX_DEVICE 3
|
||||
#define MAX_DEVICE 8
|
||||
|
||||
HMODULE hLibOpenCL = NULL;
|
||||
|
||||
@@ -84,7 +84,7 @@ cl_command_queue OpenCL_command = NULL;
|
||||
cl_kernel OpenCL_kernel = NULL;
|
||||
cl_mem OpenCL_src = NULL, OpenCL_dst = NULL, OpenCL_buf = NULL;
|
||||
size_t OpenCL_group_num;
|
||||
int OpenCL_method = 0; // positive = choose the faster device, negative = choose the slower device
|
||||
int OpenCL_method = 0; // by default the GPU is not used and the behavior is selected automatically
|
||||
|
||||
API_clCreateBuffer gfn_clCreateBuffer;
|
||||
API_clReleaseMemObject gfn_clReleaseMemObject;
|
||||
@@ -100,7 +100,11 @@ API_clEnqueueNDRangeKernel gfn_clEnqueueNDRangeKernel;
|
||||
|
||||
/*
|
||||
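Input
|
||||
OpenCL_method : which device to choose
|
||||
OpenCL_method : which device and function to choose
|
||||
0x100 = choose the faster device, 0x200 = choose the slower device
|
||||
0x10000 = calculate one block at a time, 0x20000 = try to calculate two blocks at a time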
|
||||
0x40000 = 4-byte memory access, 0x80000 = try 16-byte memory access
|
||||
0x100000 = CL_MEM_COPY_HOST_PTR, 0x200000 = CL_MEM_USE_HOST_PTR
|
||||
unit_size : unit size of a block
|
||||
src_max : number of source blocks
|
||||
|
||||
@@ -111,11 +115,12 @@ OpenCL_method : various behavior flags
|
||||
*/
|
||||
|
||||
// 0 = success, 1 or greater = error number
|
||||
int init_OpenCL(int unit_size, int *src_max)
|
||||
int init_OpenCL(unsigned int unit_size, int *src_max)
|
||||
{
|
||||
char buf[2048], *p_source;
|
||||
int err = 0, i, j;
|
||||
int gpu_power, count;
|
||||
int unified_memory; // non zero = Integrated GPU
|
||||
size_t data_size, alloc_max;
|
||||
//FILE *fp;
|
||||
HRSRC res;
|
||||
@@ -136,6 +141,7 @@ int init_OpenCL(int unit_size, int *src_max)
|
||||
API_clReleaseProgram fn_clReleaseProgram;
|
||||
API_clCreateKernel fn_clCreateKernel;
|
||||
API_clGetKernelWorkGroupInfo fn_clGetKernelWorkGroupInfo;
|
||||
API_clReleaseKernel fn_clReleaseKernel;
|
||||
cl_int ret;
|
||||
cl_uint num_platforms = 0, num_devices = 0, num_groups, param_value;
|
||||
cl_ulong param_value8;
|
||||
@@ -215,6 +221,9 @@ int init_OpenCL(int unit_size, int *src_max)
|
||||
fn_clGetKernelWorkGroupInfo = (API_clGetKernelWorkGroupInfo)GetProcAddress(hLibOpenCL, "clGetKernelWorkGroupInfo");
|
||||
if (fn_clGetKernelWorkGroupInfo == NULL)
|
||||
return err;
|
||||
fn_clReleaseKernel = (API_clReleaseKernel)GetProcAddress(hLibOpenCL, "clReleaseKernel");
|
||||
if (fn_clReleaseKernel == NULL)
|
||||
return err;
|
||||
gfn_clFinish = (API_clFinish)GetProcAddress(hLibOpenCL, "clFinish");
|
||||
if (gfn_clFinish == NULL)
|
||||
return err;
|
||||
@@ -226,12 +235,12 @@ int init_OpenCL(int unit_size, int *src_max)
|
||||
ret = fn_clGetPlatformIDs(MAX_DEVICE, platform_id, &num_platforms);
|
||||
if (ret != CL_SUCCESS)
|
||||
return (ret << 8) | 10;
|
||||
if (OpenCL_method >= 0){ // change the selection order and the initial value
|
||||
OpenCL_method = 1;
|
||||
gpu_power = 0;
|
||||
} else {
|
||||
OpenCL_method = -1;
|
||||
if (num_platforms > MAX_DEVICE)
|
||||
num_platforms = MAX_DEVICE;
|
||||
if (OpenCL_method & 0x200){ // change the selection order and the initial value
|
||||
gpu_power = INT_MIN;
|
||||
} else {
|
||||
gpu_power = 0;
|
||||
}
|
||||
alloc_max = 0;
|
||||
|
||||
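The `num_platforms > MAX_DEVICE` clamp added in this hunk matters because OpenCL enumeration calls such as `clGetPlatformIDs` report the number of entries available, which can be larger than the array capacity passed in. The sketch below shows the pattern with a stand-in function. `fake_get_ids`, `FAKE_TOTAL`, and `get_ids_clamped` are hypothetical and only imitate that reporting behavior; they are not OpenCL calls.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEVICE 8
#define FAKE_TOTAL 11u   /* pretend the system exposes 11 entries */

/* Stand-in for an OpenCL-style enumeration call: it writes at most
   num_entries IDs but reports the total number available. */
static int fake_get_ids(unsigned int num_entries, int *ids,
                        unsigned int *num_available)
{
    unsigned int i;
    for (i = 0; i < num_entries && i < FAKE_TOTAL; i++)
        ids[i] = 100 + (int)i;
    *num_available = FAKE_TOTAL;
    return 0;
}

/* Fills ids and returns how many entries are safe to read. */
unsigned int get_ids_clamped(int ids[MAX_DEVICE])
{
    unsigned int count = 0;
    assert(ids != NULL);
    if (fake_get_ids(MAX_DEVICE, ids, &count) != 0)
        return 0;
    if (count > MAX_DEVICE)   /* without this, a later loop would read past the array */
        count = MAX_DEVICE;
    return count;
}
```

Without the clamp, a loop bounded by the reported count would index past the end of the fixed-size array on a machine with more entries than `MAX_DEVICE`.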
@@ -247,6 +256,8 @@ int init_OpenCL(int unit_size, int *src_max)
|
||||
// number of OpenCL-capable devices in the environment
|
||||
if (fn_clGetDeviceIDs(platform_id[i], CL_DEVICE_TYPE_GPU, MAX_DEVICE, device_id, &num_devices) != CL_SUCCESS)
|
||||
continue;
|
||||
if (num_devices > MAX_DEVICE)
|
||||
num_devices = MAX_DEVICE;
|
||||
|
||||
for (j = 0; j < (int)num_devices; j++){
|
||||
// check whether the device is available
|
||||
@@ -265,45 +276,42 @@ int init_OpenCL(int unit_size, int *src_max)
|
||||
ret = fn_clGetDeviceInfo(device_id[j], CL_DEVICE_VERSION, sizeof(buf), buf, NULL);
|
||||
if (ret == CL_SUCCESS)
|
||||
printf("Device version = %s\n", buf);
|
||||
ret = fn_clGetDeviceInfo(device_id[j], CL_DEVICE_LOCAL_MEM_SIZE, sizeof(cl_ulong), &param_value8, NULL);
|
||||
if (ret == CL_SUCCESS)
|
||||
printf("LOCAL_MEM_SIZE = %I64d KB\n", param_value8 >> 10);
|
||||
|
||||
// probably not supported, but check anyway
|
||||
//#define CL_DEVICE_SVM_CAPABILITIES 0x1053
|
||||
//#define CL_DEVICE_SVM_COARSE_GRAIN_BUFFER (1 << 0)
|
||||
//#define CL_DEVICE_SVM_FINE_GRAIN_BUFFER (1 << 1)
|
||||
//#define CL_DEVICE_SVM_FINE_GRAIN_SYSTEM (1 << 2)
|
||||
//#define CL_DEVICE_SVM_ATOMICS (1 << 3)
|
||||
// ret = fn_clGetDeviceInfo(device_id[j], CL_DEVICE_SVM_CAPABILITIES, sizeof(cl_ulong), &param_value8, NULL);
|
||||
// if (ret == CL_INVALID_VALUE)
|
||||
// printf("Shared Virtual Memory is not supported\n");
|
||||
// if (ret == CL_SUCCESS)
|
||||
// printf("Shared Virtual Memory = 0x%I64X\n", param_value8);
|
||||
#endif
|
||||
|
||||
// not an error even if it cannot be obtained
|
||||
ret = fn_clGetDeviceInfo(device_id[j], CL_DEVICE_HOST_UNIFIED_MEMORY, sizeof(cl_uint), &param_value, NULL);
|
||||
if (ret == CL_SUCCESS){
|
||||
if (param_value != 0){
|
||||
#ifdef DEBUG_OUTPUT
|
||||
printf("HOST_UNIFIED_MEMORY = %d\n", param_value);
|
||||
#endif
|
||||
param_value = 1;
|
||||
}
|
||||
} else { // CL_DEVICE_HOST_UNIFIED_MEMORY was deprecated in OpenCL 2.0 and later
|
||||
param_value = 0;
|
||||
}
|
||||
|
||||
// treat as an error if it cannot be obtained
|
||||
ret = fn_clGetDeviceInfo(device_id[j], CL_DEVICE_MAX_MEM_ALLOC_SIZE, sizeof(cl_ulong), &param_value8, NULL);
|
||||
if (ret != CL_SUCCESS)
|
||||
continue;
|
||||
#ifdef DEBUG_OUTPUT
|
||||
printf("MAX_MEM_ALLOC_SIZE = %I64d MB\n", param_value8 >> 20);
|
||||
#endif
|
||||
|
||||
ret = fn_clGetDeviceInfo(device_id[j], CL_DEVICE_MAX_COMPUTE_UNITS, sizeof(cl_uint), &num_groups, NULL);
|
||||
if (ret != CL_SUCCESS)
|
||||
continue;
|
||||
ret = fn_clGetDeviceInfo(device_id[j], CL_DEVICE_MAX_WORK_GROUP_SIZE, sizeof(size_t), &data_size, NULL);
|
||||
if (ret != CL_SUCCESS)
|
||||
continue;
|
||||
// CL_DEVICE_HOST_UNIFIED_MEMORY was deprecated in OpenCL 2.0 and later, so it is not referenced
|
||||
|
||||
#ifdef DEBUG_OUTPUT
|
||||
printf("MAX_COMPUTE_UNITS = %d\n", num_groups);
|
||||
printf("MAX_WORK_GROUP_SIZE = %zd\n", data_size);
|
||||
#endif
|
||||
// estimate computing power by MAX_COMPUTE_UNITS * MAX_WORK_GROUP_SIZE
|
||||
count = (int)data_size * num_groups;
|
||||
count *= OpenCL_method; // flip the sign
|
||||
// estimate computing power by MAX_COMPUTE_UNITS * MAX_WORK_GROUP_SIZE, doubling the value for a discrete GPU
|
||||
count = (2 - param_value) * (int)data_size * num_groups;
|
||||
if (OpenCL_method & 0x200) // Prefer slower device
|
||||
count *= -1; // flip the sign
|
||||
//printf("prev = %d, now = %d\n", gpu_power, count);
|
||||
if ((count > gpu_power) && (data_size >= 256) && // the table cannot be built with fewer than 256
|
||||
(param_value8 / 8 > (cl_ulong)unit_size)){ // does it fit within CL_DEVICE_MAX_MEM_ALLOC_SIZE?
|
||||
@@ -312,6 +320,7 @@ int init_OpenCL(int unit_size, int *src_max)
|
||||
selected_platform = platform_id[i];
|
||||
OpenCL_group_num = num_groups; // set the number of work groups to the number of compute units
|
||||
alloc_max = (size_t)param_value8;
|
||||
unified_memory = param_value; // 0 = discrete GPU, 1 = integrated GPU
|
||||
|
||||
// On AMD and Intel GPUs, the allocatable region is not necessarily 1/4 of the total memory
|
||||
ret = fn_clGetDeviceInfo(device_id[j], CL_DEVICE_GLOBAL_MEM_SIZE, sizeof(cl_ulong), &param_value8, NULL);
|
||||
@@ -353,31 +362,6 @@ int init_OpenCL(int unit_size, int *src_max)
|
||||
if (ret != CL_SUCCESS)
|
||||
return (ret << 8) | 12;
|
||||
|
||||
// Select the calculation method
|
||||
if ((((cpu_flag & 0x101) == 1) || ((cpu_flag & 0x110) == 0x10)) && (sse_unit == 32)){
|
||||
OpenCL_method = 2; // with SSSE3 & ALTMAP or AVX2, use the version that handles rearranged data
|
||||
} else if (((cpu_flag & 128) != 0) && (sse_unit == 256)){
|
||||
OpenCL_method = 4; // JIT(SSE2) arranges data in 16-byte units per bit, starting from the highest bit
|
||||
// Note that the local table size differs
|
||||
// Methods other than XOR use 2 KB (4 bytes * 256 entries * 2 tables)
|
||||
// XOR (JIT) uses 64 bytes (4 bytes * 16 entries)
|
||||
} else {
|
||||
OpenCL_method = 1; // for data that has not been rearranged
|
||||
}
|
||||
|
||||
// Reduce the number of work groups when there are more than needed
|
||||
if (OpenCL_method == 2){
|
||||
// Each work item processes 8 bytes at a time; with 256 items, each work group covers 2 KB
|
||||
data_size = unit_size / 2048;
|
||||
} else {
|
||||
// Each work item processes 4 bytes at a time; with 256 items, each work group covers 1 KB
|
||||
data_size = unit_size / 1024;
|
||||
}
|
||||
if (OpenCL_group_num > data_size){
|
||||
OpenCL_group_num = data_size;
|
||||
printf("Number of work groups is reduced to %zd\n", OpenCL_group_num);
|
||||
}
|
||||
|
||||
// How many blocks' worth of memory can be held at most (nothing is allocated here yet)
|
||||
// The amount actually allocated later will be smaller than this
|
||||
count = (int)(alloc_max / unit_size); // work backwards from the amount of memory that can be allocated
|
||||
@@ -389,25 +373,6 @@ int init_OpenCL(int unit_size, int *src_max)
|
||||
printf("src buf : %zd KB (%d blocks), possible\n", data_size >> 10, count);
|
||||
#endif
|
||||
|
||||
// The output buffer only needs room for one block
|
||||
// Using CL_MEM_ALLOC_HOST_PTR reportedly yields pinned memory
|
||||
data_size = unit_size;
|
||||
OpenCL_dst = gfn_clCreateBuffer(OpenCL_context, CL_MEM_WRITE_ONLY | CL_MEM_ALLOC_HOST_PTR, data_size, NULL, &ret);
|
||||
if (ret != CL_SUCCESS)
|
||||
return (ret << 8) | 13;
|
||||
#ifdef DEBUG_OUTPUT
|
||||
printf("dst buf : %zd KB (%zd Bytes), OK\n", data_size >> 10, data_size);
|
||||
#endif
|
||||
|
||||
// Factors: room for the maximum count (src_max entries)
|
||||
data_size = sizeof(unsigned short) * (*src_max);
|
||||
OpenCL_buf = gfn_clCreateBuffer(OpenCL_context, CL_MEM_READ_ONLY, data_size, NULL, &ret);
|
||||
if (ret != CL_SUCCESS)
|
||||
return (ret << 8) | 14;
|
||||
#ifdef DEBUG_OUTPUT
|
||||
printf("factor buf : %zd Bytes (%d factors), OK\n", data_size, (*src_max));
|
||||
#endif
|
||||
|
||||
/*
|
||||
// Load the OpenCL C source code in text form
|
||||
err = 4;
|
||||
@@ -508,18 +473,208 @@ int init_OpenCL(int unit_size, int *src_max)
|
||||
return (ret << 8) | 21;
|
||||
}
|
||||
|
||||
// Extract the kernel function
|
||||
wsprintfA(buf, "method%d", OpenCL_method & 7);
|
||||
OpenCL_kernel = fn_clCreateKernel(program, buf, &ret);
|
||||
if (ret != CL_SUCCESS)
|
||||
return (ret << 8) | 22;
|
||||
// Select the calculation method
|
||||
if ((((cpu_flag & 0x101) == 1) || ((cpu_flag & 0x110) == 0x10)) && (sse_unit == 32)){
|
||||
int select_method; // with SSSE3 & ALTMAP or AVX2, use the version that handles rearranged data
|
||||
if (OpenCL_method & 0x80000){ // 16-byte and 2 blocks
|
||||
select_method = 12;
|
||||
} else if (OpenCL_method & 0x40000){ // 4-byte and 2 blocks
|
||||
select_method = 10;
|
||||
} else if (OpenCL_method & 0x20000){ // 16-byte
|
||||
select_method = 4;
|
||||
} else if (OpenCL_method & 0x10000){ // 4-byte
|
||||
select_method = 2;
|
||||
} else { // create the kernels and check their details
|
||||
size_t item2, item4, item10, item12;
|
||||
cl_kernel kernel2, kernel4, kernel10, kernel12;
|
||||
item2 = item4 = item10 = item12 = 0;
|
||||
// First, check the heaviest and fastest one
|
||||
wsprintfA(buf, "method%d", 12);
|
||||
kernel12 = fn_clCreateKernel(program, buf, &ret);
|
||||
if (ret == CL_SUCCESS){
|
||||
ret = fn_clGetKernelWorkGroupInfo(kernel12, selected_device, CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE, sizeof(size_t), &item12, NULL);
|
||||
if (ret == CL_SUCCESS){
|
||||
#ifdef DEBUG_OUTPUT
|
||||
printf("CreateKernel : %s\n", buf);
|
||||
printf("\nTesting %s\n", buf);
|
||||
printf("PREFERRED_WORK_GROUP_SIZE_MULTIPLE = %zu\n", item12);
|
||||
#endif
|
||||
}
|
||||
}
|
||||
if (item12 >= 32){ // 32 or more is assumed to run comfortably
|
||||
select_method = 12;
|
||||
OpenCL_kernel = kernel12;
|
||||
#ifdef DEBUG_OUTPUT
|
||||
printf("\nSelected method%d\n", select_method);
|
||||
#endif
|
||||
} else { // compare against the others
|
||||
wsprintfA(buf, "method%d", 2);
|
||||
kernel2 = fn_clCreateKernel(program, buf, &ret);
|
||||
if (ret == CL_SUCCESS){
|
||||
ret = fn_clGetKernelWorkGroupInfo(kernel2, selected_device, CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE, sizeof(size_t), &item2, NULL);
|
||||
if (ret == CL_SUCCESS){
|
||||
#ifdef DEBUG_OUTPUT
|
||||
printf("\nTesting %s\n", buf);
|
||||
printf("PREFERRED_WORK_GROUP_SIZE_MULTIPLE = %zu\n", item2);
|
||||
#endif
|
||||
}
|
||||
}
|
||||
if (item12 >= item2){
|
||||
select_method = 12;
|
||||
OpenCL_kernel = kernel12;
|
||||
ret = fn_clReleaseKernel(kernel2);
|
||||
#ifdef DEBUG_OUTPUT
|
||||
if (ret != CL_SUCCESS)
|
||||
printf("clReleaseKernel : Failed\n");
|
||||
printf("\nSelected method%d\n", select_method);
|
||||
#endif
|
||||
} else {
|
||||
ret = fn_clReleaseKernel(kernel12);
|
||||
#ifdef DEBUG_OUTPUT
|
||||
if (ret != CL_SUCCESS)
|
||||
printf("clReleaseKernel : Failed\n");
|
||||
#endif
|
||||
wsprintfA(buf, "method%d", 10);
|
||||
kernel10 = fn_clCreateKernel(program, buf, &ret);
|
||||
if (ret == CL_SUCCESS){
|
||||
ret = fn_clGetKernelWorkGroupInfo(kernel10, selected_device, CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE, sizeof(size_t), &item10, NULL);
|
||||
if (ret == CL_SUCCESS){
|
||||
#ifdef DEBUG_OUTPUT
|
||||
printf("\nTesting %s\n", buf);
|
||||
printf("PREFERRED_WORK_GROUP_SIZE_MULTIPLE = %zu\n", item10);
|
||||
#endif
|
||||
}
|
||||
}
|
||||
if (item10 >= item2){
|
||||
select_method = 10;
|
||||
OpenCL_kernel = kernel10;
|
||||
ret = fn_clReleaseKernel(kernel2);
|
||||
#ifdef DEBUG_OUTPUT
|
||||
if (ret != CL_SUCCESS)
|
||||
printf("clReleaseKernel : Failed\n");
|
||||
printf("\nSelected method%d\n", select_method);
|
||||
#endif
|
||||
} else {
|
||||
wsprintfA(buf, "method%d", 4);
|
||||
kernel4 = fn_clCreateKernel(program, buf, &ret);
|
||||
if (ret == CL_SUCCESS){
|
||||
ret = fn_clGetKernelWorkGroupInfo(kernel4, selected_device, CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE, sizeof(size_t), &item4, NULL);
|
||||
if (ret == CL_SUCCESS){
|
||||
#ifdef DEBUG_OUTPUT
|
||||
printf("\nTesting %s\n", buf);
|
||||
printf("PREFERRED_WORK_GROUP_SIZE_MULTIPLE = %zu\n", item4);
|
||||
#endif
|
||||
}
|
||||
}
|
||||
if (item4 >= item2){
|
||||
select_method = 4;
|
||||
OpenCL_kernel = kernel4;
|
||||
ret = fn_clReleaseKernel(kernel2);
|
||||
#ifdef DEBUG_OUTPUT
|
||||
if (ret != CL_SUCCESS)
|
||||
printf("clReleaseKernel : Failed\n");
|
||||
printf("\nSelected method%d\n", select_method);
|
||||
#endif
|
||||
} else {
|
||||
select_method = 2;
|
||||
OpenCL_kernel = kernel2;
|
||||
ret = fn_clReleaseKernel(kernel4);
|
||||
#ifdef DEBUG_OUTPUT
|
||||
if (ret != CL_SUCCESS)
|
||||
printf("clReleaseKernel : Failed\n");
|
||||
printf("\nSelected method%d\n", select_method);
|
||||
#endif
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
OpenCL_method |= select_method;
|
||||
} else if (((cpu_flag & 128) != 0) && (sse_unit == 256)){
|
||||
OpenCL_method |= 16; // JIT(SSE2) arranges data in 16-byte units per bit, starting from the highest bit
|
||||
// Note that the local table size differs
|
||||
// Methods other than XOR use 2 KB (4 bytes * 256 entries * 2 tables)
|
||||
// XOR (JIT) uses 64 bytes (4 bytes * 16 entries)
|
||||
} else {
|
||||
int select_method; // for data that has not been rearranged
|
||||
if (OpenCL_method & 0x40000){ // 4-byte and 2 blocks
|
||||
select_method = 9;
|
||||
} else if (OpenCL_method & 0x10000){ // 4-byte
|
||||
select_method = 1;
|
||||
} else { // create the kernels and check their details
|
||||
size_t item1, item9;
|
||||
cl_kernel kernel1, kernel9;
|
||||
item1 = item9 = 0;
|
||||
// First, check the heaviest and fastest one
|
||||
wsprintfA(buf, "method%d", 9);
|
||||
kernel9 = fn_clCreateKernel(program, buf, &ret);
|
||||
if (ret == CL_SUCCESS){
|
||||
ret = fn_clGetKernelWorkGroupInfo(kernel9, selected_device, CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE, sizeof(size_t), &item9, NULL);
|
||||
if (ret == CL_SUCCESS){
|
||||
#ifdef DEBUG_OUTPUT
|
||||
printf("\nTesting %s\n", buf);
|
||||
printf("PREFERRED_WORK_GROUP_SIZE_MULTIPLE = %zu\n", item9);
|
||||
#endif
|
||||
}
|
||||
}
|
||||
if (item9 >= 32){ // 32 or more is assumed to run comfortably
|
||||
select_method = 9;
|
||||
OpenCL_kernel = kernel9;
|
||||
#ifdef DEBUG_OUTPUT
|
||||
printf("\nSelected method%d\n", select_method);
|
||||
#endif
|
||||
} else { // compare against the others
|
||||
wsprintfA(buf, "method%d", 1);
|
||||
kernel1 = fn_clCreateKernel(program, buf, &ret);
|
||||
if (ret == CL_SUCCESS){
|
||||
ret = fn_clGetKernelWorkGroupInfo(kernel1, selected_device, CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE, sizeof(size_t), &item1, NULL);
|
||||
if (ret == CL_SUCCESS){
|
||||
#ifdef DEBUG_OUTPUT
|
||||
printf("\nTesting %s\n", buf);
|
||||
printf("PREFERRED_WORK_GROUP_SIZE_MULTIPLE = %zu\n", item1);
|
||||
#endif
|
||||
}
|
||||
}
|
||||
if (item9 >= item1){
|
||||
select_method = 9;
|
||||
OpenCL_kernel = kernel9;
|
||||
ret = fn_clReleaseKernel(kernel1);
|
||||
#ifdef DEBUG_OUTPUT
|
||||
if (ret != CL_SUCCESS)
|
||||
printf("clReleaseKernel : Failed\n");
|
||||
printf("\nSelected method%d\n", select_method);
|
||||
#endif
|
||||
} else {
|
||||
select_method = 1;
|
||||
OpenCL_kernel = kernel1;
|
||||
ret = fn_clReleaseKernel(kernel9);
|
||||
#ifdef DEBUG_OUTPUT
|
||||
if (ret != CL_SUCCESS)
|
||||
printf("clReleaseKernel : Failed\n");
|
||||
printf("\nSelected method%d\n", select_method);
|
||||
#endif
|
||||
}
|
||||
}
|
||||
}
|
||||
OpenCL_method |= select_method;
|
||||
}
|
||||
|
||||
// Extract the kernel function
|
||||
if (OpenCL_kernel == NULL){
|
||||
wsprintfA(buf, "method%d", OpenCL_method & 31);
|
||||
OpenCL_kernel = fn_clCreateKernel(program, buf, &ret);
|
||||
if (ret != CL_SUCCESS)
|
||||
return (ret << 8) | 22;
|
||||
#ifdef DEBUG_OUTPUT
|
||||
printf("CreateKernel : %s\n", buf);
|
||||
ret = fn_clGetKernelWorkGroupInfo(OpenCL_kernel, selected_device, CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE, sizeof(size_t), &data_size, NULL);
|
||||
if (ret == CL_SUCCESS)
|
||||
printf("PREFERRED_WORK_GROUP_SIZE_MULTIPLE = %zu\n", data_size);
|
||||
#endif
|
||||
}
|
||||
|
||||
// Check how many work items the kernel can run
|
||||
ret = fn_clGetKernelWorkGroupInfo(OpenCL_kernel, NULL, CL_KERNEL_WORK_GROUP_SIZE, sizeof(size_t), &data_size, NULL);
|
||||
if ((ret == CL_SUCCESS) && (data_size < 256)){ // at least 256 are required
|
||||
ret = fn_clGetKernelWorkGroupInfo(OpenCL_kernel, selected_device, CL_KERNEL_WORK_GROUP_SIZE, sizeof(size_t), &data_size, NULL);
|
||||
if ((ret == CL_SUCCESS) && (data_size < 256)){ // at least 256 are required
|
||||
#ifdef DEBUG_OUTPUT
|
||||
printf("KERNEL_WORK_GROUP_SIZE = %zd\n", data_size);
|
||||
#endif
|
||||
@@ -538,6 +693,60 @@ int init_OpenCL(int unit_size, int *src_max)
|
||||
fn_clUnloadCompiler();
|
||||
}
|
||||
|
||||
// Reduce the number of work groups when there are more than needed
|
||||
if (OpenCL_method & 4){
|
||||
// Each work item processes 32 bytes at a time; with 256 items, each work group covers 8 KB
|
||||
data_size = unit_size / 8192;
|
||||
} else if (OpenCL_method & 2){
|
||||
// Each work item processes 8 bytes at a time; with 256 items, each work group covers 2 KB
|
||||
data_size = unit_size / 2048;
|
||||
} else {
|
||||
// Each work item processes 4 bytes at a time; with 256 items, each work group covers 1 KB
|
||||
data_size = unit_size / 1024;
|
||||
}
|
||||
if (OpenCL_group_num > data_size){
|
||||
OpenCL_group_num = data_size;
|
||||
printf("Number of work groups is reduced to %zd\n", OpenCL_group_num);
|
||||
}
|
||||
|
||||
// Change how the data is accessed depending on the device
|
||||
if (OpenCL_method & 0x200000){
|
||||
OpenCL_method |= 32;
|
||||
} else if ((OpenCL_method & 0x100000) == 0){
|
||||
if (unified_memory){
|
||||
OpenCL_method |= 32; // use CL_MEM_USE_HOST_PTR on an integrated GPU
|
||||
} else { // even among discrete GPUs, NVIDIA ones are handled differently
|
||||
ret = fn_clGetDeviceInfo(selected_device, CL_DEVICE_VERSION, sizeof(buf), buf, NULL);
|
||||
if (ret == CL_SUCCESS){
|
||||
if (strstr(buf, "CUDA") != NULL)
|
||||
OpenCL_method |= 32; // use CL_MEM_USE_HOST_PTR on an NVIDIA GPU
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// The output buffer only needs room for one block
|
||||
// Using CL_MEM_ALLOC_HOST_PTR reportedly yields pinned memory
|
||||
data_size = unit_size;
|
||||
if (OpenCL_method & 8)
|
||||
data_size *= 2; // allocate twice as much so that two blocks can be calculated at once
|
||||
OpenCL_dst = gfn_clCreateBuffer(OpenCL_context, CL_MEM_ALLOC_HOST_PTR, data_size, NULL, &ret);
|
||||
if (ret != CL_SUCCESS)
|
||||
return (ret << 8) | 13;
|
||||
#ifdef DEBUG_OUTPUT
|
||||
printf("dst buf : %zd KB (%zd Bytes), OK\n", data_size >> 10, data_size);
|
||||
#endif
|
||||
|
||||
// Factors: room for the maximum count (src_max entries)
|
||||
data_size = sizeof(unsigned short) * (*src_max);
|
||||
if (OpenCL_method & 8)
|
||||
data_size *= 2; // allocate twice as much so that two blocks can be calculated at once
|
||||
OpenCL_buf = gfn_clCreateBuffer(OpenCL_context, CL_MEM_READ_ONLY, data_size, NULL, &ret);
|
||||
if (ret != CL_SUCCESS)
|
||||
return (ret << 8) | 14;
|
||||
#ifdef DEBUG_OUTPUT
|
||||
printf("factor buf : %zd Bytes (%d factors), OK\n", data_size, (*src_max));
|
||||
#endif
|
||||
|
||||
// Set the kernel arguments
|
||||
ret = gfn_clSetKernelArg(OpenCL_kernel, 1, sizeof(cl_mem), &OpenCL_dst);
|
||||
if (ret != CL_SUCCESS)
|
||||
@@ -545,13 +754,12 @@ int init_OpenCL(int unit_size, int *src_max)
|
||||
ret = gfn_clSetKernelArg(OpenCL_kernel, 2, sizeof(cl_mem), &OpenCL_buf);
|
||||
if (ret != CL_SUCCESS)
|
||||
return (ret << 8) | 102;
|
||||
if (ret != CL_SUCCESS)
|
||||
return (ret << 8) | 103;
|
||||
|
||||
#ifdef DEBUG_OUTPUT
|
||||
// Number of work items
|
||||
printf("\nMax number of work items = %zd (256 * %zd)\n", OpenCL_group_num * 256, OpenCL_group_num);
|
||||
#endif
|
||||
OpenCL_method &= 0xFF; // finally, clear the selection settings
|
||||
|
||||
return 0;
|
||||
}
|
||||
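The device ranking and the work-group clamp in `init_OpenCL` above can be sketched as two small pure functions. This is only an illustration of the arithmetic visible in the diff, not the project's actual code: `score_device` and `clamp_group_num` are hypothetical names, and the sketch assumes that `unified_memory` is 1 for an integrated GPU and 0 for a discrete one, as the comment on `unified_memory` states.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: rank a device the way the hunk above appears to.
   A discrete GPU (unified_memory == 0) gets twice the score of an
   integrated one (unified_memory == 1). */
static int score_device(int unified_memory, size_t max_work_group_size,
                        unsigned int compute_units, int prefer_slower)
{
    int score;
    assert(unified_memory == 0 || unified_memory == 1);
    score = (2 - unified_memory) * (int)max_work_group_size * (int)compute_units;
    if (prefer_slower)   /* stands in for the 0x200 flag in the diff */
        score = -score;  /* flip the sign so a weaker device compares as larger */
    return score;
}

/* Hypothetical helper: cap the work-group count so that no group is left
   without data. With 256 work items per group, one group covers 8 KB,
   2 KB or 1 KB depending on the method bits (4: 32 bytes per item,
   2: 8 bytes per item, otherwise 4 bytes per item). */
static size_t clamp_group_num(size_t group_num, unsigned int unit_size, int method)
{
    size_t needed;
    if (method & 4)
        needed = unit_size / 8192;
    else if (method & 2)
        needed = unit_size / 2048;
    else
        needed = unit_size / 1024;
    return (group_num > needed) ? needed : group_num;
}
```

For a 64 KB `unit_size` and a method with bit 4 set, at most 8 work groups are useful, so a device reporting 64 compute units would be clamped to 8.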
@@ -663,16 +871,24 @@ void info_OpenCL(char *buf, int buf_size)
|
||||
// Copy the source blocks to the device
|
||||
int gpu_copy_blocks(
|
||||
unsigned char *data, // block buffer (must be aligned to 4096)
|
||||
int unit_size, // must be a multiple of 4096
|
||||
unsigned int unit_size, // must be a multiple of 4096
|
||||
int src_num) // number of blocks to copy
|
||||
{
|
||||
size_t data_size;
|
||||
cl_int ret;
|
||||
cl_mem_flags flags;
|
||||
|
||||
// Regardless of integrated vs. discrete GPU, allocate only as much memory as is needed each time
|
||||
data_size = (size_t)unit_size * src_num;
|
||||
// Intel GPUs can do zero-copy; on GeForce GPUs too, memory use is lower and copying is fast
|
||||
OpenCL_src = gfn_clCreateBuffer(OpenCL_context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR, data_size, data, &ret);
|
||||
if (OpenCL_method & 32){ // zero-copy on AMD APUs and integrated GPUs
|
||||
// In actual comparisons, GeForce GPUs also used less memory and copied faster this way
|
||||
// NVIDIA GPUs reportedly cache the data in VRAM even with CL_MEM_USE_HOST_PTR, so it is fast
|
||||
flags = CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR;
|
||||
} else { // on a discrete GPU, copy the data to VRAM
|
||||
// AMD GPUs must be told explicitly to copy
|
||||
flags = CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR;
|
||||
}
|
||||
OpenCL_src = gfn_clCreateBuffer(OpenCL_context, flags, data_size, data, &ret);
|
||||
if (ret != CL_SUCCESS)
|
||||
return (ret << 8) | 1;
|
||||
#ifdef DEBUG_OUTPUT
|
||||
@@ -691,17 +907,31 @@ int gpu_copy_blocks(
|
||||
int gpu_multiply_blocks(
|
||||
int src_num, // Number of multiplying source blocks
|
||||
unsigned short *mat, // Matrix of numbers to multiply by
|
||||
unsigned short *mat2, // Set to calculate 2 blocks at once
|
||||
unsigned char *buf, // Products go here
|
||||
int len) // Byte length
|
||||
unsigned int len) // Byte length
|
||||
{
|
||||
unsigned __int64 *vram, *src, *dst;
|
||||
size_t global_size, local_size;
|
||||
cl_int ret;
|
||||
|
||||
// Write the factor array to the device
|
||||
ret = gfn_clEnqueueWriteBuffer(OpenCL_command, OpenCL_buf, CL_FALSE, 0, sizeof(short) * src_num, mat, 0, NULL, NULL);
|
||||
if (mat2 == NULL){ // copy the factors for one block only
|
||||
ret = gfn_clEnqueueWriteBuffer(OpenCL_command, OpenCL_buf, CL_FALSE, 0, sizeof(short) * src_num, mat, 0, NULL, NULL);
|
||||
} else { // when calculating two blocks at a time, the array size doubles as well
|
||||
if ((size_t)mat2 == 1){ // a real address can never be 1, so this value works as a marker
|
||||
ret = gfn_clEnqueueWriteBuffer(OpenCL_command, OpenCL_buf, CL_FALSE, 0, sizeof(short) * src_num * 2, mat, 0, NULL, NULL);
|
||||
} else { // copy twice
|
||||
size_t data_size = sizeof(short) * src_num;
|
||||
ret = gfn_clEnqueueWriteBuffer(OpenCL_command, OpenCL_buf, CL_FALSE, 0, data_size, mat, 0, NULL, NULL);
|
||||
if (ret != CL_SUCCESS)
|
||||
return (ret << 8) | 10;
|
||||
// The second array is copied from a different location
|
||||
ret = gfn_clEnqueueWriteBuffer(OpenCL_command, OpenCL_buf, CL_FALSE, data_size, data_size, mat2, 0, NULL, NULL);
|
||||
}
|
||||
}
|
||||
if (ret != CL_SUCCESS)
|
||||
return (ret << 8) | 10;
|
||||
return (ret << 8) | 11;
|
||||
|
||||
// Set the argument
|
||||
ret = gfn_clSetKernelArg(OpenCL_kernel, 3, sizeof(int), &src_num);
|
||||
@@ -709,17 +939,17 @@ int gpu_multiply_blocks(
|
||||
return (ret << 8) | 103;
|
||||
|
||||
// Run the kernel in parallel
|
||||
local_size = 256; // the work item count is fixed at 256 because of the tables and cache
|
||||
local_size = 256; // the work item count is fixed at 256 because of the tables and cache
|
||||
global_size = OpenCL_group_num * 256;
|
||||
//printf("group num = %d, global size = %d, local size = 256 \n", OpenCL_group_num, global_size);
|
||||
//printf("group num = %d, global size = %d, local size = %d \n", OpenCL_group_num, global_size, local_size);
|
||||
ret = gfn_clEnqueueNDRangeKernel(OpenCL_command, OpenCL_kernel, 1, NULL, &global_size, &local_size, 0, NULL, NULL);
|
||||
if (ret != CL_SUCCESS)
|
||||
return (ret << 8) | 11;
|
||||
return (ret << 8) | 12;
|
||||
|
||||
// Make the output visible on the host side
|
||||
vram = gfn_clEnqueueMapBuffer(OpenCL_command, OpenCL_dst, CL_TRUE, CL_MAP_READ, 0, len, 0, NULL, NULL, &ret);
|
||||
if (ret != CL_SUCCESS)
|
||||
return (ret << 8) | 12;
|
||||
return (ret << 8) | 13;
|
||||
|
||||
// XOR 8 bytes at a time (doing the XOR with SSE2 was no faster)
|
||||
src = vram;
|
||||
@@ -734,7 +964,7 @@ int gpu_multiply_blocks(
|
||||
// Must be called as a pair with clEnqueueMapBuffer even if the host does not modify the data
|
||||
ret = gfn_clEnqueueUnmapMemObject(OpenCL_command, OpenCL_dst, vram, 0, NULL, NULL);
|
||||
if (ret != CL_SUCCESS)
|
||||
return (ret << 8) | 13;
|
||||
return (ret << 8) | 14;
|
||||
|
||||
return 0;
|
||||
}
|
||||
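The three ways `gpu_multiply_blocks` fills the factor buffer can be mirrored on the host with plain `memcpy`. The following is a minimal sketch under stated assumptions: `pack_factors` is a hypothetical name, the destination is ordinary memory rather than an OpenCL buffer, and the sentinel value 1 for `mat2` is taken from the comment in the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical host-side stand-in for the clEnqueueWriteBuffer calls:
   fills dst with the factors for one or two output blocks and returns
   the number of bytes written. dst must have room for 2 * src_num entries. */
static size_t pack_factors(unsigned short *dst, const unsigned short *mat,
                           const unsigned short *mat2, int src_num)
{
    size_t one = sizeof(unsigned short) * (size_t)src_num;
    assert(dst != NULL && mat != NULL && src_num > 0);

    if (mat2 == NULL) {          /* one block: a single row of factors */
        memcpy(dst, mat, one);
        return one;
    }
    if ((size_t)mat2 == 1) {     /* marker: mat already holds both rows back to back */
        memcpy(dst, mat, one * 2);
        return one * 2;
    }
    memcpy(dst, mat, one);                          /* first row at offset 0 */
    memcpy((unsigned char *)dst + one, mat2, one);  /* second row right after it */
    return one * 2;
}
```

Either two-block path leaves the same layout in the buffer: `src_num` factors for the first output block followed by `src_num` factors for the second, which is why the factor buffer is allocated at twice the size when bit 8 of `OpenCL_method` is set.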
@@ -747,12 +977,12 @@ int gpu_finish(void)
|
||||
// Wait for all processing to finish
|
||||
ret = gfn_clFinish(OpenCL_command);
|
||||
if (ret != CL_SUCCESS)
|
||||
return (ret << 8) | 20;
|
||||
return (ret << 8) | 30;
|
||||
|
||||
if (OpenCL_src != NULL){ // release it if it is allocated
|
||||
ret = gfn_clReleaseMemObject(OpenCL_src);
|
||||
if (ret != CL_SUCCESS)
|
||||
return (ret << 8) | 21;
|
||||
return (ret << 8) | 31;
|
||||
OpenCL_src = NULL;
|
||||
}
|
||||
|
||||
|
||||
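The kernel selection added to `init_OpenCL` earlier in this diff (the branch for rearranged data) reduces to a short decision tree over the `CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE` values of the candidate kernels. The sketch below is an assumption-laden restatement of that tree: `select_rearranged_method` is a hypothetical name, and all four values are passed in up front, whereas the diff queries them lazily and leaves a value at 0 when kernel creation or the query fails.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: pick method 12, 10, 4 or 2 from the preferred
   work-group size multiples reported for each candidate kernel. */
static int select_rearranged_method(size_t item2, size_t item4,
                                    size_t item10, size_t item12)
{
    if (item12 >= 32)      /* 32 or more is assumed to run comfortably */
        return 12;
    if (item12 >= item2)   /* no worse than the lightest kernel */
        return 12;
    if (item10 >= item2)
        return 10;
    if (item4 >= item2)
        return 4;
    return 2;              /* fall back to the lightest kernel */
}
```

The heavier kernels are tried first and kept as long as the device does not report a smaller preferred multiple for them than for `method2`.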
@@ -10,20 +10,21 @@ extern "C" {
|
||||
|
||||
extern int OpenCL_method;
|
||||
|
||||
int init_OpenCL(int unit_size, int *src_max);
|
||||
int init_OpenCL(unsigned int unit_size, int *src_max);
|
||||
int free_OpenCL(void);
|
||||
void info_OpenCL(char *buf, int buf_size);
|
||||
|
||||
int gpu_copy_blocks(
|
||||
unsigned char *data,
|
||||
int unit_size,
|
||||
unsigned int unit_size,
|
||||
int src_num);
|
||||
|
||||
int gpu_multiply_blocks(
|
||||
int src_num, // Number of multiplying source blocks
|
||||
unsigned short *mat, // Matrix of numbers to multiply by
|
||||
unsigned short *mat2, // Set to calculate 2 blocks at once
|
||||
unsigned char *buf, // Products go here
|
||||
int len); // Byte length
|
||||
unsigned int len); // Byte length
|
||||
|
||||
int gpu_finish(void);
|
||||
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
// list.c
|
||||
// Copyright : 2023-10-15 Yutaka Sawada
|
||||
// Copyright : 2023-12-12 Yutaka Sawada
|
||||
// License : GPL
|
||||
|
||||
#ifndef _UNICODE
|
||||
@@ -26,6 +26,11 @@
|
||||
|
||||
//#define TIMER // for experiments
|
||||
|
||||
#ifdef TIMER
|
||||
#include <time.h>
|
||||
static double time_sec, time_speed;
|
||||
#endif
|
||||
|
||||
// Check the hash values of the files in the recovery set (empty files are excluded)
|
||||
// 0x00 = the file exists and is complete
|
||||
// 0x01 = the file does not exist
|
||||
@@ -296,7 +301,7 @@ int check_file_complete(
|
||||
{
|
||||
int i, rv;
|
||||
#ifdef TIMER
|
||||
unsigned int time_start = GetTickCount();
|
||||
clock_t time_start = clock();
|
||||
#endif
|
||||
|
||||
printf("\nVerifying Input File :\n");
|
||||
@@ -332,14 +337,14 @@ unsigned int time_start = GetTickCount();
|
||||
}
|
||||
|
||||
#ifdef TIMER
|
||||
time_start = GetTickCount() - time_start;
|
||||
printf("\n hash %d.%03d sec", time_start / 1000, time_start % 1000);
|
||||
if (time_start > 0){
|
||||
time_start = (int)((total_file_size * 125) / ((__int64)time_start * 131072));
|
||||
printf(", %d MB/s\n", time_start);
|
||||
time_start = clock() - time_start;
|
||||
time_sec = (double)time_start / CLOCKS_PER_SEC;
|
||||
if (time_sec > 0){
|
||||
time_speed = (double)total_file_size / (time_sec * 1048576);
|
||||
} else {
|
||||
printf("\n");
|
||||
time_speed = 0;
|
||||
}
|
||||
printf("\n hash %.3f sec, %.0f MB/s\n", time_sec, time_speed);
|
||||
#endif
|
||||
return 0;
|
||||
}
|
||||
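The TIMER change above replaces millisecond integer arithmetic on `GetTickCount()` with `clock()` ticks converted to seconds as a `double`. A minimal sketch of that conversion, assuming 1 MB = 1048576 bytes as in the diff; `throughput_mb_per_sec` is a hypothetical helper name, not a function from the source:

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper mirroring the new TIMER code: convert a clock()
   tick difference and a byte count into MB/s. Returns 0 when no
   measurable time has elapsed, which avoids dividing by zero. */
static double throughput_mb_per_sec(long long total_bytes, clock_t elapsed_ticks)
{
    double sec;
    assert(total_bytes >= 0);
    sec = (double)elapsed_ticks / CLOCKS_PER_SEC;
    if (sec > 0)
        return (double)total_bytes / (sec * 1048576);
    return 0;
}
```

Note that `clock()` reports elapsed wall-clock time with the Microsoft C runtime but processor time on POSIX systems, so a port of this timing code to another platform would probably need a different clock source.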
@@ -364,7 +369,7 @@ int check_file_complete_multi(
|
||||
HANDLE hSub[MAX_READ_NUM];
|
||||
FILE_CHECK_TH th[MAX_READ_NUM];
|
||||
#ifdef TIMER
|
||||
unsigned int time_start = GetTickCount();
|
||||
clock_t time_start = clock();
|
||||
#endif
|
||||
|
||||
memset(hSub, 0, sizeof(HANDLE) * MAX_READ_NUM);
|
||||
@@ -630,14 +635,14 @@ unsigned int time_start = GetTickCount();
|
||||
}
|
||||
|
||||
#ifdef TIMER
|
||||
time_start = GetTickCount() - time_start;
|
||||
printf("\n hash %d.%03d sec", time_start / 1000, time_start % 1000);
|
||||
if (time_start > 0){
|
||||
time_start = (int)((total_file_size * 125) / ((__int64)time_start * 131072));
|
||||
printf(", %d MB/s\n", time_start);
|
||||
time_start = clock() - time_start;
|
||||
time_sec = (double)time_start / CLOCKS_PER_SEC;
|
||||
if (time_sec > 0){
|
||||
time_speed = (double)total_file_size / (time_sec * 1048576);
|
||||
} else {
|
||||
printf("\n");
|
||||
time_speed = 0;
|
||||
}
|
||||
printf("\n hash %.3f sec, %.0f MB/s\n", time_sec, time_speed);
|
||||
#endif
|
||||
|
||||
error_end:
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
// md5_crc.c
|
||||
// Copyright : 2023-10-29 Yutaka Sawada
|
||||
// Copyright : 2023-12-12 Yutaka Sawada
|
||||
// License : GPL
|
||||
|
||||
#ifndef _UNICODE
|
||||
@@ -21,7 +21,6 @@
|
||||
#include "phmd5.h"
|
||||
#include "md5_crc.h"
|
||||
|
||||
|
||||
/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */
|
||||
|
||||
// Compute the MD5 hash of a byte array
|
||||
@@ -200,8 +199,10 @@ int file_md5_crc32_block(
|
||||
//#define TIMER // for experiments
|
||||
|
||||
#ifdef TIMER
|
||||
static unsigned int time_start, time1_start;
|
||||
static unsigned int time_total = 0, time2_total = 0, time3_total = 0;
|
||||
#include <time.h>
|
||||
static double time_sec, time_speed;
|
||||
static clock_t time_start, time1_start;
|
||||
static clock_t time_total = 0, time2_total = 0, time3_total = 0;
|
||||
#endif
|
||||
|
||||
#define MAX_BUF_SIZE 2097152 // maximum size when the heap is used
|
||||
@@ -224,7 +225,7 @@ int file_hash_crc(
|
||||
HANDLE hFile;
|
||||
OVERLAPPED ol;
|
||||
#ifdef TIMER
|
||||
time1_start = GetTickCount();
|
||||
time1_start = clock();
|
||||
#endif
|
||||
|
||||
// Open the source file
|
||||
@@ -251,11 +252,11 @@ time1_start = GetTickCount();
|
||||
if (file_left < IO_SIZE)
|
||||
read_size = (unsigned int)file_left;
|
||||
#ifdef TIMER
|
||||
time_start = GetTickCount();
|
||||
time_start = clock();
|
||||
#endif
|
||||
off = ReadFile(hFile, buf1, read_size, NULL, &ol);
|
||||
#ifdef TIMER
|
||||
time2_total += GetTickCount() - time_start;
|
||||
time2_total += clock() - time_start;
|
||||
#endif
|
||||
if ((off == 0) && (GetLastError() != ERROR_IO_PENDING)){
|
||||
print_win32_err();
|
||||
@@ -281,11 +282,11 @@ time2_total += GetTickCount() - time_start;
|
||||
ol.OffsetHigh = (unsigned int)(file_off >> 32);
|
||||
file_off += IO_SIZE;
|
||||
#ifdef TIMER
|
||||
time_start = GetTickCount();
|
||||
time_start = clock();
|
||||
#endif
|
||||
off = ReadFile(hFile, buf, read_size, NULL, &ol);
|
||||
#ifdef TIMER
|
||||
time2_total += GetTickCount() - time_start;
|
||||
time2_total += clock() - time_start;
|
||||
#endif
|
||||
if ((off == 0) && (GetLastError() != ERROR_IO_PENDING)){
|
||||
print_win32_err();
|
||||
@@ -301,7 +302,7 @@ time2_total += GetTickCount() - time_start;
|
||||
}
|
||||
|
||||
#ifdef TIMER
|
||||
time_start = GetTickCount();
|
||||
time_start = clock();
|
||||
#endif
|
||||
off = 0; // checksum calculation
|
||||
if (block_left > 0){ // add the part that was missing last time
|
||||
@@ -338,7 +339,7 @@ time_start = GetTickCount();
|
||||
}
|
||||
}
|
||||
#ifdef TIMER
|
||||
time3_total += GetTickCount() - time_start;
|
||||
time3_total += clock() - time_start;
|
||||
#endif
|
||||
|
||||
// Show progress
|
||||
@@ -369,16 +370,17 @@ error_end:
|
||||
CloseHandle(ol.hEvent);
|
||||
|
||||
#ifdef TIMER
|
||||
time_total += GetTickCount() - time1_start;
|
||||
time_total += clock() - time1_start;
|
||||
if (*prog_now == total_file_size){
|
||||
printf("\nread %d.%03d sec\n", time2_total / 1000, time2_total % 1000);
|
||||
printf("main %d.%03d sec\n", time3_total / 1000, time3_total % 1000);
|
||||
if (time_total > 0){
|
||||
time_start = (int)((total_file_size * 125) / ((__int64)time_total * 131072));
|
||||
printf("\nread %.3f sec\n", (double)time2_total / CLOCKS_PER_SEC);
|
||||
printf("main %.3f sec\n", (double)time3_total / CLOCKS_PER_SEC);
|
||||
time_sec = (double)time_total / CLOCKS_PER_SEC;
|
||||
if (time_sec > 0){
|
||||
time_speed = (double)total_file_size / (time_sec * 1048576);
|
||||
} else {
|
||||
time_start = 0;
|
||||
time_speed = 0;
|
||||
}
|
||||
printf("total %d.%03d sec, %d MB/s\n", time_total / 1000, time_total % 1000, time_start);
|
||||
printf("total %.3f sec, %.0f MB/s\n", time_sec, time_speed);
|
||||
}
|
||||
#endif
|
||||
return err;
|
||||
@@ -403,7 +405,7 @@ int file_hash_crc(
|
||||
HANDLE hFile;
|
||||
OVERLAPPED ol;
|
||||
#ifdef TIMER
|
||||
time1_start = GetTickCount();
|
||||
time1_start = clock();
|
||||
#endif
|
||||
|
||||
// Open the source file
|
||||
@@ -442,11 +444,11 @@ error_retry_read:
|
||||
if (file_left < IO_SIZE)
|
||||
read_size = (unsigned int)file_left;
|
||||
#ifdef TIMER
|
||||
time_start = GetTickCount();
|
||||
time_start = clock();
|
||||
#endif
|
||||
off = ReadFile(hFile, buf1, read_size, NULL, &ol);
|
||||
#ifdef TIMER
|
||||
time2_total += GetTickCount() - time_start;
|
||||
time2_total += clock() - time_start;
|
||||
#endif
|
||||
if ((off == 0) && (GetLastError() != ERROR_IO_PENDING)){
|
||||
print_win32_err();
|
||||
@@ -536,11 +538,11 @@ error_retry_pause:
|
||||
ol.OffsetHigh = (unsigned int)(file_off >> 32);
|
||||
file_off += IO_SIZE;
|
||||
#ifdef TIMER
|
||||
time_start = GetTickCount();
|
||||
time_start = clock();
|
||||
#endif
|
||||
off = ReadFile(hFile, buf, read_size, NULL, &ol);
|
||||
#ifdef TIMER
|
||||
time2_total += GetTickCount() - time_start;
|
||||
time2_total += clock() - time_start;
|
||||
#endif
|
||||
if ((off == 0) && (GetLastError() != ERROR_IO_PENDING)){
|
||||
print_win32_err();
|
||||
@@ -557,7 +559,7 @@ time2_total += GetTickCount() - time_start;
|
||||
}
|
||||
|
||||
#ifdef TIMER
|
||||
time_start = GetTickCount();
|
||||
time_start = clock();
|
||||
#endif
|
||||
off = 0; // checksum calculation
|
||||
if (block_left > 0){ // add the part that was missing last time
|
||||
@@ -594,7 +596,7 @@ time_start = GetTickCount();
|
||||
}
|
||||
}
|
||||
#ifdef TIMER
|
||||
time3_total += GetTickCount() - time_start;
|
||||
time3_total += clock() - time_start;
|
||||
#endif
|
||||
|
||||
// Show progress
|
||||
@@ -625,16 +627,17 @@ error_end:
|
||||
CloseHandle(ol.hEvent);
|
||||
|
||||
#ifdef TIMER
|
||||
time_total += GetTickCount() - time1_start;
|
||||
time_total += clock() - time1_start;
|
||||
if (*prog_now == total_file_size){
|
||||
printf("\nread %d.%03d sec\n", time2_total / 1000, time2_total % 1000);
|
||||
printf("main %d.%03d sec\n", time3_total / 1000, time3_total % 1000);
|
||||
if (time_total > 0){
|
||||
time_start = (int)((total_file_size * 125) / ((__int64)time_total * 131072));
|
||||
printf("\nread %.3f sec\n", (double)time2_total / CLOCKS_PER_SEC);
|
||||
printf("main %.3f sec\n", (double)time3_total / CLOCKS_PER_SEC);
|
||||
time_sec = (double)time_total / CLOCKS_PER_SEC;
|
||||
if (time_sec > 0){
|
||||
time_speed = (double)total_file_size / (time_sec * 1048576);
|
||||
} else {
|
||||
time_start = 0;
|
||||
time_speed = 0;
|
||||
}
|
||||
printf("total %d.%03d sec, %d MB/s\n", time_total / 1000, time_total % 1000, time_start);
|
||||
printf("total %.3f sec, %.0f MB/s\n", time_sec, time_speed);
|
||||
}
|
||||
#endif
|
||||
return err;
|
||||
@@ -660,7 +663,7 @@ int file_hash_crc(
|
||||
HANDLE hFile;
|
||||
OVERLAPPED ol;
|
||||
#ifdef TIMER
|
||||
time1_start = GetTickCount();
|
||||
time1_start = clock();
|
||||
#endif
|
||||
|
||||
// Open the source file
|
||||
@@ -699,11 +702,11 @@ time1_start = GetTickCount();
|
||||
if (file_left < io_size)
|
||||
read_size = (unsigned int)file_left;
|
||||
#ifdef TIMER
|
||||
time_start = GetTickCount();
|
||||
time_start = clock();
|
||||
#endif
|
||||
off = ReadFile(hFile, buf1, read_size, NULL, &ol);
|
||||
#ifdef TIMER
|
||||
time2_total += GetTickCount() - time_start;
|
||||
time2_total += clock() - time_start;
|
||||
#endif
|
||||
if ((off == 0) && (GetLastError() != ERROR_IO_PENDING)){
|
||||
print_win32_err();
|
||||
@@ -729,11 +732,11 @@ time2_total += GetTickCount() - time_start;
|
||||
ol.OffsetHigh = (unsigned int)(file_off >> 32);
|
||||
file_off += io_size;
|
||||
#ifdef TIMER
|
||||
time_start = GetTickCount();
|
||||
time_start = clock();
|
||||
#endif
|
||||
off = ReadFile(hFile, buf, read_size, NULL, &ol);
|
||||
#ifdef TIMER
|
||||
time2_total += GetTickCount() - time_start;
|
||||
time2_total += clock() - time_start;
|
||||
#endif
|
||||
if ((off == 0) && (GetLastError() != ERROR_IO_PENDING)){
|
||||
print_win32_err();
|
||||
@@ -749,7 +752,7 @@ time2_total += GetTickCount() - time_start;
|
||||
}
|
||||
|
||||
#ifdef TIMER
|
||||
time_start = GetTickCount();
|
||||
time_start = clock();
|
||||
#endif
|
||||
off = 0; // checksum calculation
|
||||
if (block_left > 0){ // add the part that was missing last time
|
||||
@@ -786,7 +789,7 @@ time_start = GetTickCount();
|
||||
}
|
||||
}
|
||||
#ifdef TIMER
|
||||
time3_total += GetTickCount() - time_start;
|
||||
time3_total += clock() - time_start;
|
||||
#endif
|
||||
|
||||
// Show progress
|
||||
@@ -819,16 +822,17 @@ error_end:
|
||||
_aligned_free(buf1);
|
||||
|
||||
#ifdef TIMER
|
||||
time_total += GetTickCount() - time1_start;
|
||||
time_total += clock() - time1_start;
|
||||
if (*prog_now == total_file_size){
|
||||
printf("\nread %d.%03d sec\n", time2_total / 1000, time2_total % 1000);
|
||||
printf("main %d.%03d sec\n", time3_total / 1000, time3_total % 1000);
|
||||
if (time_total > 0){
|
||||
time_start = (int)((total_file_size * 125) / ((__int64)time_total * 131072));
|
||||
printf("\nread %.3f sec\n", (double)time2_total / CLOCKS_PER_SEC);
|
||||
printf("main %.3f sec\n", (double)time3_total / CLOCKS_PER_SEC);
|
||||
time_sec = (double)time_total / CLOCKS_PER_SEC;
|
||||
if (time_sec > 0){
|
||||
time_speed = (double)total_file_size / (time_sec * 1048576);
|
||||
} else {
|
||||
time_start = 0;
|
||||
time_speed = 0;
|
||||
}
|
||||
printf("total %d.%03d sec, %d MB/s\n", time_total / 1000, time_total % 1000, time_start);
|
||||
printf("total %.3f sec, %.0f MB/s\n", time_sec, time_speed);
|
||||
}
|
||||
#endif
|
||||
return err;
|
||||
@@ -1038,7 +1042,7 @@ int file_hash_check(
|
||||
PHMD5 hash_ctx, block_ctx;
|
||||
OVERLAPPED ol;
|
||||
#ifdef TIMER
|
||||
time1_start = GetTickCount();
|
||||
time1_start = clock();
|
||||
#endif
|
||||
|
||||
prog_last = -1; // show the name of the file being verified every time
|
||||
@@ -1062,11 +1066,11 @@ time1_start = GetTickCount();
|
||||
file_left = file_size - 16384; // only check up to the original file size
|
||||
}
|
||||
#ifdef TIMER
|
||||
time_start = GetTickCount();
|
||||
time_start = clock();
|
||||
#endif
|
||||
off = ReadFile(hFile, buf, len, NULL, &ol);
|
||||
#ifdef TIMER
|
||||
time2_total += GetTickCount() - time_start;
|
||||
time2_total += clock() - time_start;
|
||||
#endif
|
||||
if ((off == 0) && (GetLastError() != ERROR_IO_PENDING)){
|
||||
print_win32_err();
|
||||
@@ -1141,11 +1145,11 @@ time2_total += GetTickCount() - time_start;
|
||||
if (file_left < IO_SIZE)
|
||||
read_size = (unsigned int)file_left;
|
||||
#ifdef TIMER
|
||||
time_start = GetTickCount();
|
||||
time_start = clock();
|
||||
#endif
|
||||
off = ReadFile(hFile, buf1, read_size, NULL, &ol);
|
||||
#ifdef TIMER
|
||||
time2_total += GetTickCount() - time_start;
|
||||
time2_total += clock() - time_start;
|
||||
#endif
|
||||
if ((off == 0) && (GetLastError() != ERROR_IO_PENDING)){
|
||||
print_win32_err();
|
||||
@@ -1168,11 +1172,11 @@ time2_total += GetTickCount() - time_start;
|
||||
ol.OffsetHigh = (unsigned int)(file_off >> 32);
|
||||
file_off += IO_SIZE;
|
||||
#ifdef TIMER
|
||||
time_start = GetTickCount();
|
||||
time_start = clock();
|
||||
#endif
|
||||
off = ReadFile(hFile, buf, read_size, NULL, &ol);
|
||||
#ifdef TIMER
|
||||
time2_total += GetTickCount() - time_start;
|
||||
time2_total += clock() - time_start;
|
||||
#endif
|
||||
if ((off == 0) && (GetLastError() != ERROR_IO_PENDING)){
|
||||
print_win32_err();
|
||||
@@ -1187,7 +1191,7 @@ time2_total += GetTickCount() - time_start;
|
||||
}
|
||||
|
||||
#ifdef TIMER
|
||||
time_start = GetTickCount();
|
||||
time_start = clock();
|
||||
#endif
|
||||
if (s_blk != NULL){
|
||||
off = 0;
|
||||
@@ -1230,7 +1234,7 @@ time_start = GetTickCount();
|
||||
Phmd5Process(&hash_ctx, buf, len); // MD5 calculation
|
||||
}
|
||||
#ifdef TIMER
|
||||
time3_total += GetTickCount() - time_start;
|
||||
time3_total += clock() - time_start;
|
||||
#endif
|
||||
|
||||
// show progress
|
||||
@@ -1267,15 +1271,16 @@ error_end:
|
||||
CloseHandle(ol.hEvent);
|
||||
|
||||
#ifdef TIMER
|
||||
time_total += GetTickCount() - time1_start;
|
||||
printf("\nread %d.%03d sec\n", time2_total / 1000, time2_total % 1000);
|
||||
printf("main %d.%03d sec\n", time3_total / 1000, time3_total % 1000);
|
||||
if (time_total > 0){
|
||||
time_start = (int)((file_size * 125) / ((__int64)time_total * 131072));
|
||||
time_total += clock() - time1_start;
|
||||
printf("\nread %.3f sec\n", (double)time2_total / CLOCKS_PER_SEC);
|
||||
printf("main %.3f sec\n", (double)time3_total / CLOCKS_PER_SEC);
|
||||
time_sec = (double)time_total / CLOCKS_PER_SEC;
|
||||
if (time_sec > 0){
|
||||
time_speed = (double)file_size / (time_sec * 1048576);
|
||||
} else {
|
||||
time_start = 0;
|
||||
time_speed = 0;
|
||||
}
|
||||
printf("total %d.%03d sec, %d MB/s\n", time_total / 1000, time_total % 1000, time_start);
|
||||
printf("total %.3f sec, %.0f MB/s\n", time_sec, time_speed);
|
||||
#endif
|
||||
return comp_num;
|
||||
}
|
||||
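The removed lines computed MB/s in integer arithmetic from milliseconds, while the added lines use doubles and `CLOCKS_PER_SEC`. The sketch below puts both forms side by side to show they agree; the function names are hypothetical. The constants follow from bytes / ms * 1000 / 1048576 = bytes * 125 / (ms * 131072).

```c
#include <time.h>

/* Old style: integer MB/s from a byte count and elapsed milliseconds. */
int speed_from_ms(long long bytes, unsigned int ms)
{
    if (ms > 0)
        return (int)((bytes * 125) / ((long long)ms * 131072));
    return 0;
}

/* New style: floating-point MB/s from a byte count and clock() ticks. */
double speed_from_ticks(long long bytes, clock_t ticks)
{
    double sec = (double)ticks / CLOCKS_PER_SEC;
    if (sec > 0)
        return (double)bytes / (sec * 1048576);
    return 0; /* no measurable time elapsed */
}
```

For 10 MiB processed in one second, both functions report 10 MB/s.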
@@ -1536,7 +1541,7 @@ int file_hash_direct(
|
||||
HANDLE hFile;
|
||||
OVERLAPPED ol;
|
||||
#ifdef TIMER
|
||||
time1_start = GetTickCount();
|
||||
time1_start = clock();
|
||||
#endif
|
||||
|
||||
prog_last = -1; // always show the name of the file being verified
|
||||
@@ -1592,11 +1597,11 @@ time1_start = GetTickCount();
|
||||
file_left = file_size - 16384; // check only up to the original file size
|
||||
}
|
||||
#ifdef TIMER
|
||||
time_start = GetTickCount();
|
||||
time_start = clock();
|
||||
#endif
|
||||
off = ReadFile(hFile, buf, read_size, NULL, &ol);
|
||||
#ifdef TIMER
|
||||
time2_total += GetTickCount() - time_start;
|
||||
time2_total += clock() - time_start;
|
||||
#endif
|
||||
if ((off == 0) && (GetLastError() != ERROR_IO_PENDING)){
|
||||
comp_num = -1;
|
||||
@@ -1679,11 +1684,11 @@ time2_total += GetTickCount() - time_start;
|
||||
read_size = (read_size + 4095) & ~4095; // round up to a multiple of 4 KB
|
||||
}
|
||||
#ifdef TIMER
|
||||
time_start = GetTickCount();
|
||||
time_start = clock();
|
||||
#endif
|
||||
off = ReadFile(hFile, buf1, read_size, NULL, &ol);
|
||||
#ifdef TIMER
|
||||
time2_total += GetTickCount() - time_start;
|
||||
time2_total += clock() - time_start;
|
||||
#endif
|
||||
if ((off == 0) && (GetLastError() != ERROR_IO_PENDING)){
|
||||
print_win32_err();
|
||||
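The context line `read_size = (read_size + 4095) & ~4095;` in the hunk above rounds a size up to the next multiple of 4 KB, presumably because the direct read path needs aligned sizes. A small sketch of the same bit trick, with a hypothetical helper name:

```c
/* Round size up to the next multiple of 4096.
   Works because 4096 is a power of two: adding 4095 carries into the
   next 4 KB unit unless size is already aligned, and the mask clears
   the low 12 bits. */
unsigned int round_up_4k(unsigned int size)
{
    return (size + 4095u) & ~4095u;
}
```

Already aligned values are returned unchanged, and 0 stays 0.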
@@ -1710,11 +1715,11 @@ time2_total += GetTickCount() - time_start;
|
||||
ol.OffsetHigh = (unsigned int)(file_off >> 32);
|
||||
file_off += IO_SIZE;
|
||||
#ifdef TIMER
|
||||
time_start = GetTickCount();
|
||||
time_start = clock();
|
||||
#endif
|
||||
off = ReadFile(hFile, buf, read_size, NULL, &ol);
|
||||
#ifdef TIMER
|
||||
time2_total += GetTickCount() - time_start;
|
||||
time2_total += clock() - time_start;
|
||||
#endif
|
||||
if ((off == 0) && (GetLastError() != ERROR_IO_PENDING)){
|
||||
print_win32_err();
|
||||
@@ -1729,7 +1734,7 @@ time2_total += GetTickCount() - time_start;
|
||||
}
|
||||
|
||||
#ifdef TIMER
|
||||
time_start = GetTickCount();
|
||||
time_start = clock();
|
||||
#endif
|
||||
if (s_blk != NULL){
|
||||
off = 0;
|
||||
@@ -1771,7 +1776,7 @@ time_start = GetTickCount();
|
||||
Phmd5Process(&hash_ctx, buf, len); // MD5 calculation
|
||||
}
|
||||
#ifdef TIMER
|
||||
time3_total += GetTickCount() - time_start;
|
||||
time3_total += clock() - time_start;
|
||||
#endif
|
||||
|
||||
// show progress
|
||||
@@ -1812,10 +1817,16 @@ error_end:
|
||||
_aligned_free(buf1);
|
||||
|
||||
#ifdef TIMER
|
||||
time_total += GetTickCount() - time1_start;
|
||||
printf("\nread %d.%03d sec\n", time2_total / 1000, time2_total % 1000);
|
||||
printf("main %d.%03d sec\n", time3_total / 1000, time3_total % 1000);
|
||||
printf("total %d.%03d sec\n", time_total / 1000, time_total % 1000);
|
||||
time_total += clock() - time1_start;
|
||||
printf("\nread %.3f sec\n", (double)time2_total / CLOCKS_PER_SEC);
|
||||
printf("main %.3f sec\n", (double)time3_total / CLOCKS_PER_SEC);
|
||||
time_sec = (double)time_total / CLOCKS_PER_SEC;
|
||||
if (time_sec > 0){
|
||||
time_speed = (double)file_size / (time_sec * 1048576);
|
||||
} else {
|
||||
time_speed = 0;
|
||||
}
|
||||
printf("total %.3f sec, %.0f MB/s\n", time_sec, time_speed);
|
||||
#endif
|
||||
return comp_num;
|
||||
}
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
// par2_cmd.c
|
||||
// Copyright : 2023-10-15 Yutaka Sawada
|
||||
// Copyright : 2023-12-09 Yutaka Sawada
|
||||
// License : GPL
|
||||
|
||||
#ifndef _UNICODE
|
||||
@@ -1479,14 +1479,12 @@ ri= switch_set & 0x00040000
|
||||
} else if (wcsncmp(tmp_p, L"lc", 2) == 0){
|
||||
k = 0;
|
||||
j = 2;
|
||||
while ((j < 2 + 5) && (tmp_p[j] >= '0') && (tmp_p[j] <= '9')){
|
||||
while ((j < 2 + 7) && (tmp_p[j] >= '0') && (tmp_p[j] <= '9')){
|
||||
k = (k * 10) + (tmp_p[j] - '0');
|
||||
j++;
|
||||
}
|
||||
if (k & 256){ // use GPU
|
||||
OpenCL_method = 1; // Faster GPU
|
||||
} else if (k & 512){
|
||||
OpenCL_method = -1; // Slower GPU
|
||||
if (k & 0x300){ // use GPU
|
||||
OpenCL_method = k & 0x003F0300;
|
||||
}
|
||||
if (k & 1024) // do not use CLMUL and ALTMAP
|
||||
cpu_flag = (cpu_flag & 0xFFFFFFF7) | 256;
|
||||
|
||||
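The `lc` option hunk above raises the digit limit from 5 to 7 and stores the masked value in `OpenCL_method` instead of a plain 1 or -1. The mask `0x003F0300` keeps bits 8, 9 and 16 to 21, and its largest value (4129536) needs seven decimal digits. A sketch of the parsing and masking, assuming hypothetical helper names and a return of 0 when no GPU bit is set:

```c
#include <wchar.h>

/* Parse at most max_digits decimal digits after the 2-character prefix. */
unsigned int parse_lc_value(const wchar_t *opt, int max_digits)
{
    unsigned int k = 0;
    int j = 2; /* skip "lc" */
    while ((j < 2 + max_digits) && (opt[j] >= L'0') && (opt[j] <= L'9')){
        k = (k * 10) + (unsigned int)(opt[j] - L'0');
        j++;
    }
    return k;
}

/* Keep the GPU-related bits only when bit 256 or bit 512 is set.
   Returning 0 otherwise is an assumption about the default value. */
unsigned int opencl_bits(unsigned int k)
{
    if (k & 0x300)
        return k & 0x003F0300;
    return 0;
}
```

With a 5-digit limit, an argument such as `lc1234567` would be cut to 12345; with 7 digits the whole number is read.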
@@ -1,5 +1,5 @@
|
||||
// reedsolomon.c
|
||||
// Copyright : 2023-10-26 Yutaka Sawada
|
||||
// Copyright : 2023-12-12 Yutaka Sawada
|
||||
// License : GPL
|
||||
|
||||
#ifndef _UNICODE
|
||||
@@ -27,6 +27,9 @@
|
||||
#include "rs_decode.h"
|
||||
#include "reedsolomon.h"
|
||||
|
||||
#ifdef TIMER
|
||||
#include <time.h>
|
||||
#endif
|
||||
|
||||
// minimum data size for using the GPU (in MB)
|
||||
// starting the GPU takes time, so it gets slower when the data is too small
|
||||
@@ -739,7 +742,7 @@ int rs_encode(
|
||||
int err = 0;
|
||||
unsigned int len;
|
||||
#ifdef TIMER
|
||||
unsigned int time_total = GetTickCount();
|
||||
clock_t time_total = clock();
|
||||
#endif
|
||||
|
||||
if (galois_create_table()){
|
||||
@@ -755,7 +758,7 @@ unsigned int time_total = GetTickCount();
|
||||
// prepare the matrix operation for parity calculation
|
||||
len = sizeof(unsigned short) * source_num;
|
||||
if (OpenCL_method != 0)
|
||||
len *= 2; // also allocate a work area for the GPU
|
||||
len *= 3; // also allocate a work area for the GPU
|
||||
constant = malloc(len);
|
||||
if (constant == NULL){
|
||||
printf("malloc, %d\n", len);
|
||||
@@ -799,8 +802,8 @@ unsigned int time_total = GetTickCount();
|
||||
err = encode_method2(file_path, header_buf, rcv_hFile, files, s_blk, p_blk, constant);
|
||||
#ifdef TIMER
|
||||
if (err != 1){
|
||||
time_total = GetTickCount() - time_total;
|
||||
printf("total %d.%03d sec\n", time_total / 1000, time_total % 1000);
|
||||
time_total = clock() - time_total;
|
||||
printf("total %.3f sec\n", (double)time_total / CLOCKS_PER_SEC);
|
||||
}
|
||||
#endif
|
||||
|
||||
@@ -830,7 +833,7 @@ int rs_encode_1pass(
|
||||
int err = 0;
|
||||
unsigned int len;
|
||||
#ifdef TIMER
|
||||
unsigned int time_total = GetTickCount();
|
||||
clock_t time_total = clock();
|
||||
#endif
|
||||
|
||||
if (galois_create_table()){
|
||||
@@ -841,7 +844,7 @@ unsigned int time_total = GetTickCount();
|
||||
// prepare the matrix operation for parity calculation
|
||||
len = sizeof(unsigned short) * source_num;
|
||||
if (OpenCL_method != 0)
|
||||
len *= 2; // also allocate a work area for the GPU
|
||||
len *= 3; // also allocate a work area for the GPU
|
||||
constant = malloc(len);
|
||||
if (constant == NULL){
|
||||
printf("malloc, %d\n", len);
|
||||
@@ -888,8 +891,8 @@ unsigned int time_total = GetTickCount();
|
||||
if (err < 0){
|
||||
printf("switching to 2-pass processing, %d\n", err);
|
||||
} else if (err != 1){
|
||||
time_total = GetTickCount() - time_total;
|
||||
printf("total %d.%03d sec\n", time_total / 1000, time_total % 1000);
|
||||
time_total = clock() - time_total;
|
||||
printf("total %.3f sec\n", (double)time_total / CLOCKS_PER_SEC);
|
||||
}
|
||||
#endif
|
||||
|
||||
@@ -913,7 +916,7 @@ int rs_decode(
|
||||
int err = 0, i, j, k;
|
||||
unsigned int len;
|
||||
#ifdef TIMER
|
||||
unsigned int time_matrix = 0, time_total = GetTickCount();
|
||||
clock_t time_matrix = 0, time_total = clock();
|
||||
#endif
|
||||
|
||||
if (galois_create_table()){
|
||||
@@ -948,7 +951,7 @@ unsigned int time_matrix = 0, time_total = GetTickCount();
|
||||
id = mat + (block_lost * source_num);
|
||||
|
||||
#ifdef TIMER
|
||||
time_matrix = GetTickCount();
|
||||
time_matrix = clock();
|
||||
#endif
|
||||
// compute the matrix for recovery
|
||||
print_progress_text(0, "Computing matrix");
|
||||
@@ -989,7 +992,7 @@ time_matrix = GetTickCount();
|
||||
//for (i = 0; i < block_lost; i++)
|
||||
// printf("id[%d] = %d\n", i, id[i]);
|
||||
#ifdef TIMER
|
||||
time_matrix = GetTickCount() - time_matrix;
|
||||
time_matrix = clock() - time_matrix;
|
||||
#endif
|
||||
|
||||
#ifdef TIMER
|
||||
@@ -1032,9 +1035,9 @@ time_matrix = GetTickCount() - time_matrix;
|
||||
err = decode_method2(file_path, block_lost, rcv_hFile, files, s_blk, p_blk, mat);
|
||||
#ifdef TIMER
|
||||
if (err != 1){
|
||||
time_total = GetTickCount() - time_total;
|
||||
printf("total %d.%03d sec\n", time_total / 1000, time_total % 1000);
|
||||
printf("matrix %d.%03d sec\n", time_matrix / 1000, time_matrix % 1000);
|
||||
time_total = clock() - time_total;
|
||||
printf("total %.3f sec\n", (double)time_total / CLOCKS_PER_SEC);
|
||||
printf("matrix %.3f sec\n", (double)time_matrix / CLOCKS_PER_SEC);
|
||||
}
|
||||
#endif
|
||||
|
||||
|
||||
@@ -1,7 +1,7 @@
|
||||
1 RT_STRING ".\\source.cl"
|
||||
|
||||
1 VERSIONINFO
|
||||
FILEVERSION 1,3,3,1
|
||||
FILEVERSION 1,3,3,3
|
||||
PRODUCTVERSION 1,3,3,0
|
||||
FILEOS 0x40004
|
||||
FILETYPE 0x1
|
||||
@@ -11,9 +11,9 @@ BLOCK "StringFileInfo"
|
||||
BLOCK "040904B0"
|
||||
{
|
||||
VALUE "FileDescription", "PAR2 client"
|
||||
VALUE "LegalCopyright", "Copyright (C) 2023 Yutaka Sawada"
|
||||
VALUE "LegalCopyright", "Copyright (C) 2024 Yutaka Sawada"
|
||||
VALUE "ProductName", "par2j"
|
||||
VALUE "FileVersion", "1.3.3.1"
|
||||
VALUE "FileVersion", "1.3.3.3"
|
||||
VALUE "ProductVersion", "1.3.3.0"
|
||||
}
|
||||
}
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
// rs_decode.c
|
||||
// Copyright : 2023-10-29 Yutaka Sawada
|
||||
// Copyright : 2023-12-13 Yutaka Sawada
|
||||
// License : GPL
|
||||
|
||||
#ifndef _UNICODE
|
||||
@@ -28,7 +28,9 @@
|
||||
|
||||
|
||||
#ifdef TIMER
|
||||
static unsigned int time_start, time_read = 0, time_write = 0, time_calc = 0;
|
||||
#include <time.h>
|
||||
static double time_sec, time_speed;
|
||||
static clock_t time_start, time_read = 0, time_write = 0, time_calc = 0;
|
||||
static unsigned int read_count, write_count = 0, skip_count;
|
||||
#endif
|
||||
|
||||
@@ -60,7 +62,7 @@ static DWORD WINAPI thread_decode2(LPVOID lpParameter)
|
||||
RS_TH *th;
|
||||
#ifdef TIMER
|
||||
unsigned int loop_count2a = 0, loop_count2b = 0;
|
||||
unsigned int time_start2, time_encode2a = 0, time_encode2b = 0;
|
||||
clock_t time_start2, time_encode2a = 0, time_encode2b = 0;
|
||||
#endif
|
||||
|
||||
th = (RS_TH *)lpParameter;
|
||||
@@ -78,7 +80,7 @@ unsigned int time_start2, time_encode2a = 0, time_encode2b = 0;
|
||||
WaitForSingleObject(hRun, INFINITE); // wait for the signal to start calculation
|
||||
while (th->now < INT_MAX / 2){
|
||||
#ifdef TIMER
|
||||
time_start2 = GetTickCount();
|
||||
time_start2 = clock();
|
||||
#endif
|
||||
s_buf = th->buf;
|
||||
factor = th->mat;
|
||||
@@ -95,7 +97,7 @@ loop_count2a++;
|
||||
#endif
|
||||
}
|
||||
#ifdef TIMER
|
||||
time_encode2a += GetTickCount() - time_start2;
|
||||
time_encode2a += clock() - time_start2;
|
||||
#endif
|
||||
} else { // when lost blocks are held partially
|
||||
// each thread restores a different chunk of the lost blocks
|
||||
@@ -136,7 +138,7 @@ loop_count2b += src_num;
|
||||
#endif
|
||||
}
|
||||
#ifdef TIMER
|
||||
time_encode2b += GetTickCount() - time_start2;
|
||||
time_encode2b += clock() - time_start2;
|
||||
#endif
|
||||
}
|
||||
//_mm_sfence(); // complete the writes to memory
|
||||
@@ -146,19 +148,21 @@ time_encode2b += GetTickCount() - time_start2;
|
||||
#ifdef TIMER
|
||||
loop_count2b /= chunk_num; // divide by the number of chunks to get the number of blocks
|
||||
printf("sub-thread : total loop = %d\n", loop_count2a + loop_count2b);
|
||||
if (time_encode2a > 0){
|
||||
i = (int)((__int64)loop_count2a * unit_size * 125 / ((__int64)time_encode2a * 131072));
|
||||
time_sec = (double)time_encode2a / CLOCKS_PER_SEC;
|
||||
if (time_sec > 0){
|
||||
time_speed = ((double)loop_count2a * unit_size) / (time_sec * 1048576);
|
||||
} else {
|
||||
i = 0;
|
||||
time_speed = 0;
|
||||
}
|
||||
if (loop_count2a > 0)
|
||||
printf(" 1st decode %d.%03d sec, %d loop, %d MB/s\n", time_encode2a / 1000, time_encode2a % 1000, loop_count2a, i);
|
||||
if (time_encode2b > 0){
|
||||
i = (int)((__int64)loop_count2b * unit_size * 125 / ((__int64)time_encode2b * 131072));
|
||||
printf(" 1st decode %.3f sec, %d loop, %.0f MB/s\n", time_sec, loop_count2a, time_speed);
|
||||
time_sec = (double)time_encode2b / CLOCKS_PER_SEC;
|
||||
if (time_sec > 0){
|
||||
time_speed = ((double)loop_count2b * unit_size) / (time_sec * 1048576);
|
||||
} else {
|
||||
i = 0;
|
||||
time_speed = 0;
|
||||
}
|
||||
printf(" 2nd decode %d.%03d sec, %d loop, %d MB/s\n", time_encode2b / 1000, time_encode2b % 1000, loop_count2b, i);
|
||||
printf(" 2nd decode %.3f sec, %d loop, %.0f MB/s\n", time_sec, loop_count2b, time_speed);
|
||||
#endif
|
||||
|
||||
// cleanup
|
||||
@@ -178,7 +182,7 @@ static DWORD WINAPI thread_decode3(LPVOID lpParameter)
|
||||
RS_TH *th;
|
||||
#ifdef TIMER
|
||||
unsigned int loop_count2a = 0, loop_count2b = 0;
|
||||
unsigned int time_start2, time_encode2a = 0, time_encode2b = 0;
|
||||
clock_t time_start2, time_encode2a = 0, time_encode2b = 0;
|
||||
#endif
|
||||
|
||||
th = (RS_TH *)lpParameter;
|
||||
@@ -197,7 +201,7 @@ unsigned int time_start2, time_encode2a = 0, time_encode2b = 0;
|
||||
WaitForSingleObject(hRun, INFINITE); // wait for the signal to start calculation
|
||||
while (th->now < INT_MAX / 2){
|
||||
#ifdef TIMER
|
||||
time_start2 = GetTickCount();
|
||||
time_start2 = clock();
|
||||
#endif
|
||||
s_buf = th->buf;
|
||||
factor = th->mat;
|
||||
@@ -214,7 +218,7 @@ loop_count2a++;
|
||||
#endif
|
||||
}
|
||||
#ifdef TIMER
|
||||
time_encode2a += GetTickCount() - time_start2;
|
||||
time_encode2a += clock() - time_start2;
|
||||
#endif
|
||||
} else { // when all lost blocks are held
|
||||
// each thread restores a different chunk of the lost blocks
|
||||
@@ -250,7 +254,7 @@ loop_count2b += src_num;
|
||||
#endif
|
||||
}
|
||||
#ifdef TIMER
|
||||
time_encode2b += GetTickCount() - time_start2;
|
||||
time_encode2b += clock() - time_start2;
|
||||
#endif
|
||||
}
|
||||
//_mm_sfence(); // complete the writes to memory
|
||||
@@ -260,19 +264,21 @@ time_encode2b += GetTickCount() - time_start2;
|
||||
#ifdef TIMER
|
||||
loop_count2b /= chunk_num; // divide by the number of chunks to get the number of blocks
|
||||
printf("sub-thread : total loop = %d\n", loop_count2a + loop_count2b);
|
||||
if (time_encode2a > 0){
|
||||
i = (int)((__int64)loop_count2a * unit_size * 125 / ((__int64)time_encode2a * 131072));
|
||||
time_sec = (double)time_encode2a / CLOCKS_PER_SEC;
|
||||
if (time_sec > 0){
|
||||
time_speed = ((double)loop_count2a * unit_size) / (time_sec * 1048576);
|
||||
} else {
|
||||
i = 0;
|
||||
time_speed = 0;
|
||||
}
|
||||
if (loop_count2a > 0)
|
||||
printf(" 1st decode %d.%03d sec, %d loop, %d MB/s\n", time_encode2a / 1000, time_encode2a % 1000, loop_count2a, i);
|
||||
if (time_encode2b > 0){
|
||||
i = (int)((__int64)loop_count2b * unit_size * 125 / ((__int64)time_encode2b * 131072));
|
||||
printf(" 1st decode %.3f sec, %d loop, %.0f MB/s\n", time_sec, loop_count2a, time_speed);
|
||||
time_sec = (double)time_encode2b / CLOCKS_PER_SEC;
|
||||
if (time_sec > 0){
|
||||
time_speed = ((double)loop_count2b * unit_size) / (time_sec * 1048576);
|
||||
} else {
|
||||
i = 0;
|
||||
time_speed = 0;
|
||||
}
|
||||
printf(" 2nd decode %d.%03d sec, %d loop, %d MB/s\n", time_encode2b / 1000, time_encode2b % 1000, loop_count2b, i);
|
||||
printf(" 2nd decode %.3f sec, %d loop, %.0f MB/s\n", time_sec, loop_count2b, time_speed);
|
||||
#endif
|
||||
|
||||
// cleanup
|
||||
@@ -292,7 +298,8 @@ static DWORD WINAPI thread_decode_gpu(LPVOID lpParameter)
|
||||
HANDLE hRun, hEnd;
|
||||
RS_TH *th;
|
||||
#ifdef TIMER
|
||||
unsigned int time_start2, time_encode2 = 0, loop_count2 = 0;
|
||||
unsigned int loop_count2 = 0;
|
||||
clock_t time_start2, time_encode2 = 0;
|
||||
#endif
|
||||
|
||||
th = (RS_TH *)lpParameter;
|
||||
@@ -307,7 +314,7 @@ unsigned int time_start2, time_encode2 = 0, loop_count2 = 0;
|
||||
WaitForSingleObject(hRun, INFINITE); // wait for the signal to start calculation
|
||||
while (th->now < INT_MAX / 2){
|
||||
#ifdef TIMER
|
||||
time_start2 = GetTickCount();
|
||||
time_start2 = clock();
|
||||
#endif
|
||||
// the GPU is not called while source blocks are being read
|
||||
s_buf = th->buf;
|
||||
@@ -321,22 +328,58 @@ time_start2 = GetTickCount();
|
||||
InterlockedExchange(&(th->now), INT_MAX / 3); // interrupt the sub-threads' calculation
|
||||
}
|
||||
|
||||
// each thread restores different lost blocks
|
||||
while ((j = InterlockedIncrement(&(th->now))) < block_lost){ // j = ++th_now
|
||||
// the factors are partially copied from the inverse matrix
|
||||
i = gpu_multiply_blocks(src_num, factor + source_num * j, g_buf + (size_t)unit_size * j, unit_size);
|
||||
if (i != 0){
|
||||
th->len = i;
|
||||
InterlockedExchange(&(th->now), INT_MAX / 3); // interrupt the sub-threads' calculation
|
||||
break;
|
||||
}
|
||||
// a single GPU thread processes all the lost blocks
|
||||
if (OpenCL_method & 8){ // calculate 2 blocks at a time
|
||||
// if the number of lost blocks is odd, calculate only the first one separately
|
||||
if (block_lost & 1){
|
||||
InterlockedIncrement(&(th->now)); // j is always 0 here
|
||||
|
||||
// the factors are partially copied from the inverse matrix
|
||||
i = gpu_multiply_blocks(src_num, factor, NULL, g_buf, unit_size);
|
||||
if (i != 0){
|
||||
th->len = i;
|
||||
InterlockedExchange(&(th->now), INT_MAX / 3); // interrupt the sub-threads' calculation
|
||||
break;
|
||||
}
|
||||
|
||||
#ifdef TIMER
|
||||
loop_count2 += src_num;
|
||||
#endif
|
||||
}
|
||||
|
||||
// the remaining blocks are calculated two at a time
|
||||
while ((j = InterlockedAdd(&(th->now), 2)) < block_lost){ // th_now += 2, j = th_now
|
||||
j--; // 2 was added, so subtract 1 to get the first block of the pair
|
||||
|
||||
// the factors are partially copied from the inverse matrix twice
|
||||
i = gpu_multiply_blocks(src_num, factor + source_num * j, factor + source_num * (j + 1), g_buf + (size_t)unit_size * j, unit_size * 2);
|
||||
if (i != 0){
|
||||
th->len = i;
|
||||
InterlockedExchange(&(th->now), INT_MAX / 3); // interrupt the sub-threads' calculation
|
||||
break;
|
||||
}
|
||||
#ifdef TIMER
|
||||
loop_count2 += src_num * 2;
|
||||
#endif
|
||||
}
|
||||
|
||||
} else { // the previous method that calculates one block at a time
|
||||
while ((j = InterlockedIncrement(&(th->now))) < block_lost){ // j = ++th_now
|
||||
// the factors are partially copied from the inverse matrix (for 2 blocks at a time, the number of blocks is made negative)
|
||||
i = gpu_multiply_blocks(src_num, factor + source_num * j, NULL, g_buf + (size_t)unit_size * j, unit_size);
|
||||
if (i != 0){
|
||||
th->len = i;
|
||||
InterlockedExchange(&(th->now), INT_MAX / 3); // interrupt the sub-threads' calculation
|
||||
break;
|
||||
}
|
||||
|
||||
#ifdef TIMER
|
||||
loop_count2 += src_num;
|
||||
#endif
|
||||
}
|
||||
}
|
||||
#ifdef TIMER
|
||||
time_encode2 += GetTickCount() - time_start2;
|
||||
time_encode2 += clock() - time_start2;
|
||||
#endif
|
||||
// finally release the VRAM
|
||||
i = gpu_finish();
|
||||
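The new GPU loop above handles lost blocks two at a time: when `block_lost` is odd, block 0 is computed alone, then the counter advances by 2 and `j--` yields the first block of each pair. The sketch below replays only that index arithmetic with a plain `int` instead of the interlocked counter. It assumes the counter starts at -1, which the "j is always 0" comment suggests, and `plan_gpu_calls` is a hypothetical name.

```c
#include <assert.h>

/* Record the first block index and block count of each GPU call.
   Returns the number of calls. Arrays must hold block_lost entries. */
int plan_gpu_calls(int block_lost, int *starts, int *sizes)
{
    int now = -1; /* assumption: shared counter starts at -1 */
    int calls = 0, j;

    assert(block_lost >= 0);
    if (block_lost & 1){ /* odd count: first block goes alone */
        now++; /* now == 0 */
        starts[calls] = 0;
        sizes[calls] = 1;
        calls++;
    }
    while ((j = (now += 2)) < block_lost){ /* like InterlockedAdd(&now, 2) */
        j--; /* first block of the pair */
        starts[calls] = j;
        sizes[calls] = 2;
        calls++;
    }
    return calls;
}
```

For 5 lost blocks this gives the calls (0), (1, 2), (3, 4); for 4 lost blocks it gives (0, 1), (2, 3).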
@@ -349,12 +392,13 @@ time_encode2 += GetTickCount() - time_start2;
|
||||
}
|
||||
#ifdef TIMER
|
||||
printf("gpu-thread :\n");
|
||||
if (time_encode2 > 0){
|
||||
i = (int)((__int64)loop_count2 * unit_size * 125 / ((__int64)time_encode2 * 131072));
|
||||
time_sec = (double)time_encode2 / CLOCKS_PER_SEC;
|
||||
if (time_sec > 0){
|
||||
time_speed = ((double)loop_count2 * unit_size) / (time_sec * 1048576);
|
||||
} else {
|
||||
i = 0;
|
||||
time_speed = 0;
|
||||
}
|
||||
printf(" 2nd decode %d.%03d sec, %d loop, %d MB/s\n", time_encode2 / 1000, time_encode2 % 1000, loop_count2, i);
|
||||
printf(" 2nd decode %.3f sec, %d loop, %.0f MB/s\n", time_sec, loop_count2, time_speed);
|
||||
#endif
|
||||
|
||||
// cleanup
|
||||
@@ -430,7 +474,7 @@ int decode_method1( // when there is only one source block
|
||||
block_off = 0;
|
||||
while (block_off < block_size){
|
||||
#ifdef TIMER
|
||||
time_start = GetTickCount();
|
||||
time_start = clock();
|
||||
#endif
|
||||
// read the parity block
|
||||
len = block_size - block_off;
|
||||
@@ -447,18 +491,18 @@ time_start = GetTickCount();
|
||||
// calculate the checksum of the parity block
|
||||
checksum16_altmap(buf, buf + io_size, io_size);
|
||||
#ifdef TIMER
|
||||
time_read += GetTickCount() - time_start;
|
||||
time_read += clock() - time_start;
|
||||
#endif
|
||||
|
||||
#ifdef TIMER
|
||||
time_start = GetTickCount();
|
||||
time_start = clock();
|
||||
#endif
|
||||
// restore the lost source block
|
||||
memset(work_buf, 0, unit_size);
|
||||
// dividing by factor restores the original
|
||||
galois_align_multiply(buf, work_buf, unit_size, galois_divide(1, galois_power(2, id)));
|
||||
#ifdef TIMER
|
||||
time_calc += GetTickCount() - time_start;
|
||||
time_calc += clock() - time_start;
|
||||
#endif
|
||||
|
||||
// show progress
|
||||
@@ -472,7 +516,7 @@ time_calc += GetTickCount() - time_start;
|
||||
}
|
||||
|
||||
#ifdef TIMER
|
||||
time_start = GetTickCount();
|
||||
time_start = clock();
|
||||
#endif
|
||||
// verify the checksum of the restored source block
|
||||
checksum16_return(work_buf, hash, io_size);
|
||||
@@ -491,7 +535,7 @@ time_start = GetTickCount();
|
||||
goto error_end;
|
||||
}
|
||||
#ifdef TIMER
|
||||
time_write += GetTickCount() - time_start;
|
||||
time_write += clock() - time_start;
|
||||
#endif
|
||||
|
||||
block_off += io_size;
|
||||
@@ -499,9 +543,9 @@ time_write += GetTickCount() - time_start;
|
||||
print_progress_done(); // depending on fragmentation of the last block, it may not finish at 100%
|
||||
|
||||
#ifdef TIMER
|
||||
printf("read %d.%03d sec\n", time_read / 1000, time_read % 1000);
|
||||
printf("write %d.%03d sec\n", time_write / 1000, time_write % 1000);
|
||||
printf("decode %d.%03d sec\n", time_calc / 1000, time_calc % 1000);
|
||||
printf("read %.3f sec\n", (double)time_read / CLOCKS_PER_SEC);
|
||||
printf("write %.3f sec\n", (double)time_write / CLOCKS_PER_SEC);
|
||||
printf("decode %.3f sec\n", (double)time_calc / CLOCKS_PER_SEC);
|
||||
#endif
|
||||
|
||||
error_end:
|
||||
@@ -623,7 +667,7 @@ int decode_method2( // when all source data is read
|
||||
#ifdef TIMER
|
||||
read_count = 0;
|
||||
skip_count = 0;
|
||||
time_start = GetTickCount();
|
||||
time_start = clock();
|
||||
#endif
|
||||
last_file = -1;
|
||||
recv_now = 0; // index of the current substitute block
|
||||
@@ -760,7 +804,7 @@ skip_count++;
|
||||
hFile = NULL;
|
||||
}
|
||||
#ifdef TIMER
|
||||
time_read += GetTickCount() - time_start;
|
||||
time_read += clock() - time_start;
|
||||
#endif
|
||||
|
||||
WaitForMultipleObjects(cpu_num1, hEnd, TRUE, INFINITE); // wait for the sub-threads to signal that calculation has finished
|
||||
@@ -845,7 +889,7 @@ skip_count++;
|
||||
}
|
||||
|
||||
#ifdef TIMER
|
||||
time_start = GetTickCount();
|
||||
time_start = clock();
|
||||
#endif
|
||||
// write the restored blocks
|
||||
work_buf = p_buf;
|
||||
@@ -916,7 +960,7 @@ write_count++;
|
||||
}
|
||||
}
|
||||
#ifdef TIMER
|
||||
time_write += GetTickCount() - time_start;
|
||||
time_write += clock() - time_start;
|
||||
#endif
|
||||
|
||||
part_off += part_num; // move to the next lost block position
|
||||
@@ -930,9 +974,9 @@ time_write += GetTickCount() - time_start;
|
||||
print_progress_done();
|
||||
|
||||
#ifdef TIMER
|
||||
printf("read %d.%03d sec\n", time_read / 1000, time_read % 1000);
|
||||
printf("read %.3f sec\n", (double)time_read / CLOCKS_PER_SEC);
|
||||
j = ((block_size + io_size - 1) / io_size) * block_lost;
|
||||
printf("write %d.%03d sec, count = %d/%d\n", time_write / 1000, time_write % 1000, write_count, j);
|
||||
printf("write %.3f sec, count = %d/%d\n", (double)time_write / CLOCKS_PER_SEC, write_count, j);
|
||||
if (prog_num != prog_base)
|
||||
printf(" prog_num = %I64d, prog_base = %I64d\n", prog_num, prog_base);
|
||||
#endif
|
||||
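The `count = %d/%d` output above compares `write_count` with an expected total `j`, computed as the number of `io_size` pieces per block (rounded up) times `block_lost`. A sketch of that ceiling division, using a hypothetical helper name:

```c
/* Expected number of write operations: each lost block is written in
   ceil(block_size / io_size) pieces. Adding io_size - 1 before the
   integer division rounds the quotient up. */
unsigned int expected_writes(unsigned int block_size, unsigned int io_size,
                             unsigned int block_lost)
{
    return ((block_size + io_size - 1) / io_size) * block_lost;
}
```

For example, a 10-byte block written in 4-byte pieces takes 3 writes, so 3 lost blocks give 9.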
@@ -1063,7 +1107,7 @@ int decode_method3( // when all blocks to restore can be held
|
||||
|
||||
#ifdef TIMER
|
||||
read_count = 0;
|
||||
time_start = GetTickCount();
|
||||
time_start = clock();
|
||||
#endif
|
||||
last_file = -1;
|
||||
for (i = 0; i < read_num; i++){ // read slices one at a time and place them in memory
|
||||
@@ -1173,7 +1217,7 @@ read_count++;
|
||||
hFile = NULL;
|
||||
}
|
||||
#ifdef TIMER
|
||||
time_read += GetTickCount() - time_start;
|
||||
time_read += clock() - time_start;
|
||||
#endif
|
||||
|
||||
WaitForMultipleObjects(cpu_num1, hEnd, TRUE, INFINITE); // wait for the sub-threads to signal that calculation has finished
|
||||
@@ -1238,7 +1282,7 @@ time_read += GetTickCount() - time_start;
|
||||
}
|
||||
|
||||
#ifdef TIMER
|
||||
time_start = GetTickCount();
|
||||
time_start = clock();
|
||||
#endif
|
||||
// write the restored blocks
|
||||
work_buf = p_buf;
|
||||
@@ -1297,7 +1341,7 @@ time_start = GetTickCount();
|
||||
}
|
||||
}
|
||||
#ifdef TIMER
|
||||
time_write += GetTickCount() - time_start;
|
||||
time_write += clock() - time_start;
|
||||
#endif
|
||||
// close the last written file
|
||||
CloseHandle(hFile);
|
||||
@@ -1305,8 +1349,8 @@ time_write += GetTickCount() - time_start;
|
||||
print_progress_done();
|
||||
|
||||
#ifdef TIMER
|
||||
printf("read %d.%03d sec\n", time_read / 1000, time_read % 1000);
|
||||
printf("write %d.%03d sec\n", time_write / 1000, time_write % 1000);
|
||||
printf("read %.3f sec\n", (double)time_read / CLOCKS_PER_SEC);
|
||||
printf("write %.3f sec\n", (double)time_write / CLOCKS_PER_SEC);
|
||||
if (prog_num != prog_base)
|
||||
printf(" prog_num = %I64d, prog_base = %I64d\n", prog_num, prog_base);
|
||||
#endif
|
||||
@@ -1463,7 +1507,7 @@ int decode_method4( // when all blocks are held in fragments (G
|
||||
#ifdef TIMER
|
||||
read_count = 0;
|
||||
skip_count = 0;
|
||||
time_start = GetTickCount();
|
||||
time_start = clock();
|
||||
#endif
|
||||
last_file = -1;
|
||||
recv_now = 0; // index of the current substitute block
|
||||
@@ -1600,7 +1644,7 @@ skip_count++;
|
||||
hFile = NULL;
|
||||
}
|
||||
#ifdef TIMER
|
||||
time_read += GetTickCount() - time_start;
|
||||
time_read += clock() - time_start;
|
||||
#endif
|
||||
|
||||
memset(g_buf, 0, (size_t)unit_size * block_lost); // zero-fill the GPU area while waiting
|
||||
@@ -1731,10 +1775,21 @@ skip_count++;
|
||||
#endif
|
||||
} else if (src_off + src_num + src_max > source_num){
|
||||
src_num = source_num - src_off - src_max;
|
||||
if ((src_num < src_max) && (src_num + src_max <= vram_max) && (gpu_end * 2 > cpu_end)){
|
||||
src_num += src_max; // when the GPU's share is small and there is room, hand it all the rest too
|
||||
if (src_num < src_max){
|
||||
if ((src_num + src_max <= vram_max) && (gpu_end * 2 > cpu_end)){
|
||||
src_num += src_max; // when the GPU's share is small and there is room, hand it all the rest too
|
||||
#ifdef TIMER
|
||||
printf("GPU last +: src_off = %d, src_num = %d + %d\n", src_off, src_num - src_max, src_max);
|
||||
#endif
|
||||
} else if (src_num < src_max / 4){
|
||||
src_num = src_max / 4; // keep src_num from becoming too small
|
||||
#ifdef TIMER
|
||||
printf("GPU last ?: src_off = %d, src_num = %d\n", src_off, src_num);
|
||||
} else {
|
||||
printf("GPU last -: src_off = %d, src_num = %d\n", src_off, src_num);
|
||||
#endif
|
||||
}
|
||||
#ifdef TIMER
|
||||
printf("GPU last +: src_off = %d, src_num = %d + %d\n", src_off, src_num - src_max, src_max);
|
||||
} else {
|
||||
printf("GPU last 2: src_off = %d, src_num = %d\n", src_off, src_num);
|
||||
#endif
|
||||
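The reworked branch above decides how many source blocks the GPU takes in its last round: extend the share by `src_max` when VRAM and timing allow, otherwise raise a very small share to `src_max / 4`, otherwise keep it. The sketch below isolates that decision. The function name is hypothetical, and the flag `gpu_keeps_up` stands in for the `gpu_end * 2 > cpu_end` test, whose inputs are not shown in this hunk.

```c
/* Return the adjusted src_num for the GPU's last round.
   gpu_keeps_up is an assumed stand-in for (gpu_end * 2 > cpu_end). */
int last_gpu_src_num(int src_num, int src_max, int vram_max, int gpu_keeps_up)
{
    if (src_num < src_max){
        if ((src_num + src_max <= vram_max) && gpu_keeps_up){
            src_num += src_max; /* "GPU last +": take the rest as well */
        } else if (src_num < src_max / 4){
            src_num = src_max / 4; /* "GPU last ?": avoid a tiny share */
        }
        /* otherwise "GPU last -": keep src_num as it is */
    }
    /* src_num >= src_max corresponds to "GPU last 2": unchanged */
    return src_num;
}
```

The labels in the comments match the TIMER debug messages printed by the new code.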
@@ -1834,7 +1889,7 @@ skip_count++;
|
||||
prog_num += th->size * block_lost;
|
||||
|
||||
#ifdef TIMER
|
||||
time_start = GetTickCount();
|
||||
time_start = clock();
|
||||
#endif
|
||||
// write the restored blocks
|
||||
work_buf = p_buf;
|
||||
@@ -1907,7 +1962,7 @@ write_count++;
|
||||
}
|
||||
}
|
||||
#ifdef TIMER
|
||||
time_write += GetTickCount() - time_start;
|
||||
time_write += clock() - time_start;
|
||||
#endif
|
||||
|
||||
block_off += io_size;
|
||||
@@ -1918,9 +1973,9 @@ time_write += GetTickCount() - time_start;
|
||||
print_progress_done();
|
||||
|
||||
#ifdef TIMER
|
||||
printf("read %d.%03d sec\n", time_read / 1000, time_read % 1000);
|
||||
printf("read %.3f sec\n", (double)time_read / CLOCKS_PER_SEC);
|
||||
j = ((block_size + io_size - 1) / io_size) * block_lost;
|
||||
printf("write %d.%03d sec, count = %d/%d\n", time_write / 1000, time_write % 1000, write_count, j);
|
||||
printf("write %.3f sec, count = %d/%d\n", (double)time_write / CLOCKS_PER_SEC, write_count, j);
|
||||
if (prog_num != prog_base)
|
||||
printf(" prog_num = %I64d, prog_base = %I64d\n", prog_num, prog_base);
|
||||
#endif
|
||||
@@ -2085,7 +2140,7 @@ int decode_method5( // when only the blocks to restore are held (GPU対
|
||||
|
||||
#ifdef TIMER
|
||||
read_count = 0;
|
||||
time_start = GetTickCount();
|
||||
time_start = clock();
|
||||
#endif
|
||||
last_file = -1;
|
||||
for (i = 0; i < read_num; i++){ // read slices one at a time and place them in memory
|
||||
@@ -2195,7 +2250,7 @@ read_count++;
|
||||
hFile = NULL;
|
||||
}
|
||||
#ifdef TIMER
|
||||
time_read += GetTickCount() - time_start;
|
||||
time_read += clock() - time_start;
|
||||
#endif
|
||||
|
||||
if (source_off == 0)
|
||||
@@ -2318,10 +2373,21 @@ time_read += GetTickCount() - time_start;
|
||||
#endif
|
||||
} else if (src_off + src_num + src_max > read_num){
|
||||
src_num = read_num - src_off - src_max;
|
||||
if ((src_num < src_max) && (src_num + src_max <= vram_max) && (gpu_end * 2 > cpu_end)){
|
||||
src_num += src_max; // when the GPU's share is small and there is room, hand it all the rest too
|
||||
if (src_num < src_max){
|
||||
if ((src_num + src_max <= vram_max) && (gpu_end * 2 > cpu_end)){
|
||||
src_num += src_max; // when the GPU's share is small and there is room, hand it all the rest too
|
||||
#ifdef TIMER
|
||||
printf("GPU last +: src_off = %d, src_num = %d + %d\n", src_off, src_num - src_max, src_max);
|
||||
#endif
|
||||
} else if (src_num < src_max / 4){
|
||||
src_num = src_max / 4; // keep src_num from becoming too small
|
||||
#ifdef TIMER
|
||||
printf("GPU last ?: src_off = %d, src_num = %d\n", src_off, src_num);
|
||||
} else {
|
||||
printf("GPU last -: src_off = %d, src_num = %d\n", src_off, src_num);
|
||||
#endif
|
||||
}
|
||||
#ifdef TIMER
|
||||
printf("GPU last +: src_off = %d, src_num = %d + %d\n", src_off, src_num - src_max, src_max);
|
||||
} else {
|
||||
printf("GPU last 2: src_off = %d, src_num = %d\n", src_off, src_num);
|
||||
#endif
|
||||
@@ -2424,7 +2490,7 @@ time_read += GetTickCount() - time_start;
|
||||
}
|
||||
|
||||
#ifdef TIMER
|
||||
time_start = GetTickCount();
|
||||
time_start = clock();
|
||||
#endif
|
||||
// write the restored blocks
|
||||
work_buf = p_buf;
|
||||
@@ -2485,7 +2551,7 @@ time_start = GetTickCount();
|
||||
}
|
||||
}
|
||||
#ifdef TIMER
|
||||
time_write += GetTickCount() - time_start;
|
||||
time_write += clock() - time_start;
|
||||
#endif
|
||||
// close the last written file
|
||||
CloseHandle(hFile);
|
||||
@@ -2493,8 +2559,8 @@ time_write += GetTickCount() - time_start;
|
||||
print_progress_done();
|
||||
|
||||
#ifdef TIMER
|
||||
printf("read %d.%03d sec\n", time_read / 1000, time_read % 1000);
|
||||
printf("write %d.%03d sec\n", time_write / 1000, time_write % 1000);
|
||||
printf("read %.3f sec\n", (double)time_read / CLOCKS_PER_SEC);
|
||||
printf("write %.3f sec\n", (double)time_write / CLOCKS_PER_SEC);
|
||||
if (prog_num != prog_base)
|
||||
printf(" prog_num = %I64d, prog_base = %I64d\n", prog_num, prog_base);
|
||||
#endif
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
// rs_encode.c
|
||||
// Copyright : 2023-10-29 Yutaka Sawada
|
||||
// Copyright : 2023-12-18 Yutaka Sawada
|
||||
// License : GPL
|
||||
|
||||
#ifndef _UNICODE
|
||||
@@ -29,7 +29,9 @@
|
||||
|
||||
|
||||
#ifdef TIMER
|
||||
static unsigned int time_start, time_read = 0, time_write = 0, time_calc = 0;
|
||||
#include <time.h>
|
||||
static double time_sec, time_speed;
|
||||
static clock_t time_start, time_read = 0, time_write = 0, time_calc = 0;
|
||||
static unsigned int read_count, skip_count;
|
||||
#endif
|
||||
|
||||
@@ -61,7 +63,7 @@ static DWORD WINAPI thread_encode2(LPVOID lpParameter)
|
||||
RS_TH *th;
|
||||
#ifdef TIMER
|
||||
unsigned int loop_count2a = 0, loop_count2b = 0;
|
||||
unsigned int time_start2, time_encode2a = 0, time_encode2b = 0;
|
||||
clock_t time_start2, time_encode2a = 0, time_encode2b = 0;
|
||||
#endif
|
||||
|
||||
th = (RS_TH *)lpParameter;
|
||||
@@ -80,7 +82,7 @@ unsigned int time_start2, time_encode2a = 0, time_encode2b = 0;
|
||||
WaitForSingleObject(hRun, INFINITE); // wait for the signal to start calculating
|
||||
while (th->now < INT_MAX / 2){
|
||||
#ifdef TIMER
|
||||
time_start2 = GetTickCount();
|
||||
time_start2 = clock();
|
||||
#endif
|
||||
s_buf = th->buf;
|
||||
src_off = th->off; // source block number
|
||||
@@ -98,7 +100,7 @@ loop_count2a++;
|
||||
}
|
||||
|
||||
#ifdef TIMER
|
||||
time_encode2a += GetTickCount() - time_start2;
|
||||
time_encode2a += clock() - time_start2;
|
||||
#endif
|
||||
} else { // when parity blocks are held partially
|
||||
// each thread works on a different chunk of the parity blocks
|
||||
@@ -143,7 +145,7 @@ loop_count2b += src_num;
|
||||
#endif
|
||||
}
|
||||
#ifdef TIMER
|
||||
time_encode2b += GetTickCount() - time_start2;
|
||||
time_encode2b += clock() - time_start2;
|
||||
#endif
|
||||
}
|
||||
//_mm_sfence(); // complete the writes to memory
|
||||
@@ -153,19 +155,21 @@ time_encode2b += GetTickCount() - time_start2;
|
||||
#ifdef TIMER
|
||||
loop_count2b /= chunk_num; // divide by the number of chunks to get the block count
|
||||
printf("sub-thread : total loop = %d\n", loop_count2a + loop_count2b);
|
||||
if (time_encode2a > 0){
|
||||
i = (int)((__int64)loop_count2a * unit_size * 125 / ((__int64)time_encode2a * 131072));
|
||||
time_sec = (double)time_encode2a / CLOCKS_PER_SEC;
|
||||
if (time_sec > 0){
|
||||
time_speed = ((double)loop_count2a * unit_size) / (time_sec * 1048576);
|
||||
} else {
|
||||
i = 0;
|
||||
time_speed = 0;
|
||||
}
|
||||
if (loop_count2a > 0)
|
||||
printf(" 1st encode %d.%03d sec, %d loop, %d MB/s\n", time_encode2a / 1000, time_encode2a % 1000, loop_count2a, i);
|
||||
if (time_encode2b > 0){
|
||||
i = (int)((__int64)loop_count2b * unit_size * 125 / ((__int64)time_encode2b * 131072));
|
||||
printf(" 1st encode %.3f sec, %d loop, %.0f MB/s\n", time_sec, loop_count2a, time_speed);
|
||||
time_sec = (double)time_encode2b / CLOCKS_PER_SEC;
|
||||
if (time_sec > 0){
|
||||
time_speed = ((double)loop_count2b * unit_size) / (time_sec * 1048576);
|
||||
} else {
|
||||
i = 0;
|
||||
time_speed = 0;
|
||||
}
|
||||
printf(" 2nd encode %d.%03d sec, %d loop, %d MB/s\n", time_encode2b / 1000, time_encode2b % 1000, loop_count2b, i);
|
||||
printf(" 2nd encode %.3f sec, %d loop, %.0f MB/s\n", time_sec, loop_count2b, time_speed);
|
||||
#endif
|
||||
|
||||
// cleanup
|
||||
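The hunks above replace the millisecond arithmetic on `GetTickCount()` with `clock()` ticks divided by `CLOCKS_PER_SEC`, and compute throughput in floating point. A minimal sketch of that conversion follows; the helper names `ticks_to_seconds` and `throughput_mb_per_sec` are hypothetical and do not appear in the source. Note that `clock()` reports elapsed wall-clock time in the Microsoft C runtime but processor time on POSIX systems, so numbers may not be comparable across platforms.

```c
#include <time.h>

/* Hypothetical helper (not in the source): convert clock() ticks to seconds. */
static double ticks_to_seconds(clock_t ticks)
{
    return (double)ticks / CLOCKS_PER_SEC;
}

/* Hypothetical helper (not in the source): throughput in MB/s for `loops`
   blocks of `unit_size` bytes. Returns 0 when no time elapsed, mirroring
   the `if (time_sec > 0)` guard in the diff. */
static double throughput_mb_per_sec(unsigned int loops, unsigned int unit_size,
                                    clock_t ticks)
{
    double sec = ticks_to_seconds(ticks);

    if (sec > 0)
        return ((double)loops * unit_size) / (sec * 1048576);
    return 0;
}
```

The removed integer expression `loops * unit_size * 125 / (ms * 131072)` is the same ratio with 1000 / 1048576 reduced to 125 / 131072; the floating-point form avoids truncation and no longer depends on millisecond units.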
@@ -185,7 +189,7 @@ static DWORD WINAPI thread_encode3(LPVOID lpParameter)
|
||||
RS_TH *th;
|
||||
#ifdef TIMER
|
||||
unsigned int loop_count2a = 0, loop_count2b = 0;
|
||||
unsigned int time_start2, time_encode2a = 0, time_encode2b = 0;
|
||||
clock_t time_start2, time_encode2a = 0, time_encode2b = 0;
|
||||
#endif
|
||||
|
||||
th = (RS_TH *)lpParameter;
|
||||
@@ -204,7 +208,7 @@ unsigned int time_start2, time_encode2a = 0, time_encode2b = 0;
|
||||
WaitForSingleObject(hRun, INFINITE); // wait for the signal to start calculating
|
||||
while (th->now < INT_MAX / 2){
|
||||
#ifdef TIMER
|
||||
time_start2 = GetTickCount();
|
||||
time_start2 = clock();
|
||||
#endif
|
||||
s_buf = th->buf;
|
||||
src_off = th->off; // source block number
|
||||
@@ -221,7 +225,7 @@ loop_count2a++;
|
||||
#endif
|
||||
}
|
||||
#ifdef TIMER
|
||||
time_encode2a += GetTickCount() - time_start2;
|
||||
time_encode2a += clock() - time_start2;
|
||||
#endif
|
||||
} else { // when all parity blocks are held
|
||||
// each thread works on a different chunk of the parity blocks
|
||||
@@ -261,7 +265,7 @@ loop_count2b += src_num;
|
||||
#endif
|
||||
}
|
||||
#ifdef TIMER
|
||||
time_encode2b += GetTickCount() - time_start2;
|
||||
time_encode2b += clock() - time_start2;
|
||||
#endif
|
||||
}
|
||||
//_mm_sfence(); // complete the writes to memory
|
||||
@@ -271,19 +275,21 @@ time_encode2b += GetTickCount() - time_start2;
|
||||
#ifdef TIMER
|
||||
loop_count2b /= chunk_num; // divide by the number of chunks to get the block count
|
||||
printf("sub-thread : total loop = %d\n", loop_count2a + loop_count2b);
|
||||
if (time_encode2a > 0){
|
||||
i = (int)((__int64)loop_count2a * unit_size * 125 / ((__int64)time_encode2a * 131072));
|
||||
time_sec = (double)time_encode2a / CLOCKS_PER_SEC;
|
||||
if (time_sec > 0){
|
||||
time_speed = ((double)loop_count2a * unit_size) / (time_sec * 1048576);
|
||||
} else {
|
||||
i = 0;
|
||||
time_speed = 0;
|
||||
}
|
||||
if (loop_count2a > 0)
|
||||
printf(" 1st encode %d.%03d sec, %d loop, %d MB/s\n", time_encode2a / 1000, time_encode2a % 1000, loop_count2a, i);
|
||||
if (time_encode2b > 0){
|
||||
i = (int)((__int64)loop_count2b * unit_size * 125 / ((__int64)time_encode2b * 131072));
|
||||
printf(" 1st encode %.3f sec, %d loop, %.0f MB/s\n", time_sec, loop_count2a, time_speed);
|
||||
time_sec = (double)time_encode2b / CLOCKS_PER_SEC;
|
||||
if (time_sec > 0){
|
||||
time_speed = ((double)loop_count2b * unit_size) / (time_sec * 1048576);
|
||||
} else {
|
||||
i = 0;
|
||||
time_speed = 0;
|
||||
}
|
||||
printf(" 2nd encode %d.%03d sec, %d loop, %d MB/s\n", time_encode2b / 1000, time_encode2b % 1000, loop_count2b, i);
|
||||
printf(" 2nd encode %.3f sec, %d loop, %.0f MB/s\n", time_sec, loop_count2b, time_speed);
|
||||
#endif
|
||||
|
||||
// cleanup
|
||||
@@ -303,7 +309,8 @@ static DWORD WINAPI thread_encode_gpu(LPVOID lpParameter)
|
||||
HANDLE hRun, hEnd;
|
||||
RS_TH *th;
|
||||
#ifdef TIMER
|
||||
unsigned int time_start2, time_encode2 = 0, loop_count2 = 0;
|
||||
unsigned int loop_count2 = 0;
|
||||
clock_t time_start2, time_encode2 = 0;
|
||||
#endif
|
||||
|
||||
th = (RS_TH *)lpParameter;
|
||||
@@ -320,7 +327,7 @@ unsigned int time_start2, time_encode2 = 0, loop_count2 = 0;
|
||||
WaitForSingleObject(hRun, INFINITE); // wait for the signal to start calculating
|
||||
while (th->now < INT_MAX / 2){
|
||||
#ifdef TIMER
|
||||
time_start2 = GetTickCount();
|
||||
time_start2 = clock();
|
||||
#endif
|
||||
// the GPU is not called while source blocks are being read
|
||||
s_buf = th->buf;
|
||||
@@ -335,24 +342,71 @@ time_start2 = GetTickCount();
|
||||
}
|
||||
|
||||
// a single GPU thread processes all parity blocks
|
||||
while ((j = InterlockedIncrement(&(th->now))) < parity_num){ // j = ++th_now
|
||||
// factor is a power of the matrix constant
|
||||
for (i = 0; i < src_num; i++)
|
||||
factor[i] = galois_power(constant[src_off + i], first_num + j);
|
||||
if (OpenCL_method & 8){ // calculate two blocks at a time
|
||||
// if the number of parity blocks is odd, calculate just the first one separately
|
||||
if (parity_num & 1){
|
||||
InterlockedIncrement(&(th->now)); // j is always 0 here
|
||||
|
||||
// accumulate parity from each source block held in VRAM
|
||||
i = gpu_multiply_blocks(src_num, factor, g_buf + (size_t)unit_size * j, unit_size);
|
||||
if (i != 0){
|
||||
th->len = i;
|
||||
InterlockedExchange(&(th->now), INT_MAX / 3); // abort the sub-threads' calculation
|
||||
break;
|
||||
}
|
||||
// factor is a power of the matrix constant
|
||||
for (i = 0; i < src_num; i++)
|
||||
factor[i] = galois_power(constant[src_off + i], first_num);
|
||||
|
||||
// accumulate parity from each source block held in VRAM
|
||||
i = gpu_multiply_blocks(src_num, factor, NULL, g_buf, unit_size);
|
||||
if (i != 0){
|
||||
th->len = i;
|
||||
InterlockedExchange(&(th->now), INT_MAX / 3); // abort the sub-threads' calculation
|
||||
break;
|
||||
}
|
||||
#ifdef TIMER
|
||||
loop_count2 += src_num;
|
||||
#endif
|
||||
}
|
||||
}
|
||||
|
||||
// calculate the remaining blocks two at a time
|
||||
while ((j = InterlockedAdd(&(th->now), 2)) < parity_num){ // th_now += 2, j = th_now
|
||||
j--; // we added 2, so subtract 1 to get the first block of the pair
|
||||
|
||||
// factor is a power of the matrix constant
|
||||
for (i = 0; i < src_num; i++){
|
||||
int c = constant[src_off + i]; // same constant, but raised to a different power
|
||||
factor[i] = galois_power(c, first_num + j);
|
||||
factor[src_num + i] = galois_power(c, first_num + j + 1);
|
||||
}
|
||||
|
||||
// accumulate parity from each source block held in VRAM
|
||||
i = gpu_multiply_blocks(src_num, factor, (void *)1, g_buf + (size_t)unit_size * j, unit_size * 2);
|
||||
if (i != 0){
|
||||
th->len = i;
|
||||
InterlockedExchange(&(th->now), INT_MAX / 3); // abort the sub-threads' calculation
|
||||
break;
|
||||
}
|
||||
#ifdef TIMER
|
||||
time_encode2 += GetTickCount() - time_start2;
|
||||
loop_count2 += src_num * 2;
|
||||
#endif
|
||||
}
|
||||
|
||||
} else { // the earlier method, calculating one block at a time
|
||||
while ((j = InterlockedIncrement(&(th->now))) < parity_num){ // j = ++th_now
|
||||
// factor is a power of the matrix constant
|
||||
for (i = 0; i < src_num; i++)
|
||||
factor[i] = galois_power(constant[src_off + i], first_num + j);
|
||||
|
||||
// accumulate parity from each source block held in VRAM
|
||||
i = gpu_multiply_blocks(src_num, factor, NULL, g_buf + (size_t)unit_size * j, unit_size);
|
||||
if (i != 0){
|
||||
th->len = i;
|
||||
InterlockedExchange(&(th->now), INT_MAX / 3); // abort the sub-threads' calculation
|
||||
break;
|
||||
}
|
||||
#ifdef TIMER
|
||||
loop_count2 += src_num;
|
||||
#endif
|
||||
}
|
||||
}
|
||||
|
||||
#ifdef TIMER
|
||||
time_encode2 += clock() - time_start2;
|
||||
#endif
|
||||
// finally, release the VRAM
|
||||
i = gpu_finish();
|
||||
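The hunk above makes the GPU thread claim parity blocks in pairs when `OpenCL_method & 8` is set: an odd block count is handled by computing block 0 alone, after which the shared counter is advanced by 2 per dispatch. The index arithmetic can be sketched as below. This is a single-threaded illustration only: `counter_add` and `plan_dispatches` are hypothetical names, `counter_add` stands in for `InterlockedIncrement` / `InterlockedAdd`, and the starting value of -1 for the counter is an assumption inferred from the `j = ++th_now` comment, not something shown in the diff.

```c
#include <assert.h>

/* Hypothetical stand-in for InterlockedIncrement / InterlockedAdd on th->now:
   adds `delta` and returns the new value (no real atomicity here). */
static int counter_add(int *counter, int delta)
{
    *counter += delta;
    return *counter;
}

/* Stores the first block index of each dispatch in first[] and the number of
   blocks it covers in count[]; returns the number of dispatches.
   Assumption: the counter starts at -1 (not shown in the diff). */
static int plan_dispatches(int parity_num, int *first, int *count)
{
    int now = -1, n = 0, j;

    assert(parity_num > 0);

    if (parity_num & 1) {             /* odd count: block 0 is computed alone */
        j = counter_add(&now, 1);     /* always 0 */
        first[n] = j;
        count[n] = 1;
        n++;
    }
    while ((j = counter_add(&now, 2)) < parity_num) {
        j--;                          /* counter points at the pair's 2nd block */
        first[n] = j;
        count[n] = 2;
        n++;
    }
    return n;
}
```

With five parity blocks this yields dispatches starting at 0 (one block), 1 and 3 (two blocks each); with four it yields pairs starting at 0 and 2.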
@@ -365,12 +419,13 @@ time_encode2 += GetTickCount() - time_start2;
|
||||
}
|
||||
#ifdef TIMER
|
||||
printf("gpu-thread :\n");
|
||||
if (time_encode2 > 0){
|
||||
i = (int)((__int64)loop_count2 * unit_size * 125 / ((__int64)time_encode2 * 131072));
|
||||
time_sec = (double)time_encode2 / CLOCKS_PER_SEC;
|
||||
if (time_sec > 0){
|
||||
time_speed = ((double)loop_count2 * unit_size) / (time_sec * 1048576);
|
||||
} else {
|
||||
i = 0;
|
||||
time_speed = 0;
|
||||
}
|
||||
printf(" 2nd encode %d.%03d sec, %d loop, %d MB/s\n", time_encode2 / 1000, time_encode2 % 1000, loop_count2, i);
|
||||
printf(" 2nd encode %.3f sec, %d loop, %.0f MB/s\n", time_sec, loop_count2, time_speed);
|
||||
#endif
|
||||
|
||||
// cleanup
|
||||
@@ -452,7 +507,7 @@ int encode_method1( // when there is only one source block
|
||||
block_off = 0;
|
||||
while (block_off < block_size){
|
||||
#ifdef TIMER
|
||||
time_start = GetTickCount();
|
||||
time_start = clock();
|
||||
#endif
|
||||
// read the source block
|
||||
len = s_blk[0].size - block_off;
|
||||
@@ -469,7 +524,7 @@ time_start = GetTickCount();
|
||||
s_blk[0].crc = crc_update(s_blk[0].crc, buf, len); // without pad
|
||||
checksum16_altmap(buf, buf + io_size, io_size);
|
||||
#ifdef TIMER
|
||||
time_read += GetTickCount() - time_start;
|
||||
time_read += clock() - time_start;
|
||||
#endif
|
||||
|
||||
// size to write to the recovery file
|
||||
@@ -482,13 +537,13 @@ time_read += GetTickCount() - time_start;
|
||||
// for each parity block
|
||||
for (i = 0; i < parity_num; i++){
|
||||
#ifdef TIMER
|
||||
time_start = GetTickCount();
|
||||
time_start = clock();
|
||||
#endif
|
||||
memset(work_buf, 0, unit_size);
|
||||
// factor is a power of 2
|
||||
galois_align_multiply(buf, work_buf, unit_size, galois_power(2, first_num + i));
|
||||
#ifdef TIMER
|
||||
time_calc += GetTickCount() - time_start;
|
||||
time_calc += clock() - time_start;
|
||||
#endif
|
||||
|
||||
// show progress
|
||||
@@ -502,7 +557,7 @@ time_calc += GetTickCount() - time_start;
|
||||
}
|
||||
|
||||
#ifdef TIMER
|
||||
time_start = GetTickCount();
|
||||
time_start = clock();
|
||||
#endif
|
||||
// verify the checksum of the parity block
|
||||
checksum16_return(work_buf, hash, io_size);
|
||||
@@ -535,7 +590,7 @@ time_start = GetTickCount();
|
||||
goto error_end;
|
||||
}
|
||||
#ifdef TIMER
|
||||
time_write += GetTickCount() - time_start;
|
||||
time_write += clock() - time_start;
|
||||
#endif
|
||||
}
|
||||
|
||||
@@ -565,7 +620,7 @@ time_write += GetTickCount() - time_start;
|
||||
}
|
||||
|
||||
#ifdef TIMER
|
||||
time_start = GetTickCount();
|
||||
time_start = clock();
|
||||
#endif
|
||||
// finally, write the headers of the Recovery Slice packets
|
||||
for (i = 0; i < parity_num; i++){
|
||||
@@ -581,14 +636,14 @@ time_start = GetTickCount();
|
||||
}
|
||||
}
|
||||
#ifdef TIMER
|
||||
time_write += GetTickCount() - time_start;
|
||||
time_write += clock() - time_start;
|
||||
#endif
|
||||
}
|
||||
|
||||
#ifdef TIMER
|
||||
printf("read %d.%03d sec\n", time_read / 1000, time_read % 1000);
|
||||
printf("write %d.%03d sec\n", time_write / 1000, time_write % 1000);
|
||||
printf("encode %d.%03d sec\n", time_calc / 1000, time_calc % 1000);
|
||||
printf("read %.3f sec\n", (double)time_read / CLOCKS_PER_SEC);
|
||||
printf("write %.3f sec\n", (double)time_write / CLOCKS_PER_SEC);
|
||||
printf("encode %.3f sec\n", (double)time_calc / CLOCKS_PER_SEC);
|
||||
#endif
|
||||
|
||||
error_end:
|
||||
@@ -729,7 +784,7 @@ int encode_method2( // when all source data is read
|
||||
#ifdef TIMER
|
||||
read_count = 0;
|
||||
skip_count = 0;
|
||||
time_start = GetTickCount();
|
||||
time_start = clock();
|
||||
#endif
|
||||
last_file = -1;
|
||||
for (i = 0; i < source_num; i++){
|
||||
@@ -830,7 +885,7 @@ skip_count++;
|
||||
CloseHandle(hFile);
|
||||
hFile = NULL;
|
||||
#ifdef TIMER
|
||||
time_read += GetTickCount() - time_start;
|
||||
time_read += clock() - time_start;
|
||||
#endif
|
||||
|
||||
WaitForMultipleObjects(cpu_num1, hEnd, TRUE, INFINITE); // wait for the sub-threads to signal that calculation has finished
|
||||
@@ -930,7 +985,7 @@ skip_count++;
|
||||
}
|
||||
|
||||
#ifdef TIMER
|
||||
time_start = GetTickCount();
|
||||
time_start = clock();
|
||||
#endif
|
||||
// write the parity blocks
|
||||
work_buf = p_buf;
|
||||
@@ -979,7 +1034,7 @@ time_start = GetTickCount();
|
||||
}
|
||||
}
|
||||
#ifdef TIMER
|
||||
time_write += GetTickCount() - time_start;
|
||||
time_write += clock() - time_start;
|
||||
#endif
|
||||
|
||||
part_off += part_num; // advance to the next parity position
|
||||
@@ -1025,7 +1080,7 @@ time_write += GetTickCount() - time_start;
|
||||
}
|
||||
|
||||
#ifdef TIMER
|
||||
time_start = GetTickCount();
|
||||
time_start = clock();
|
||||
#endif
|
||||
// finally, write the headers of the Recovery Slice packets
|
||||
for (i = 0; i < parity_num; i++){
|
||||
@@ -1041,13 +1096,13 @@ time_start = GetTickCount();
|
||||
}
|
||||
}
|
||||
#ifdef TIMER
|
||||
time_write += GetTickCount() - time_start;
|
||||
time_write += clock() - time_start;
|
||||
#endif
|
||||
}
|
||||
|
||||
#ifdef TIMER
|
||||
printf("read %d.%03d sec\n", time_read / 1000, time_read % 1000);
|
||||
printf("write %d.%03d sec\n", time_write / 1000, time_write % 1000);
|
||||
printf("read %.3f sec\n", (double)time_read / CLOCKS_PER_SEC);
|
||||
printf("write %.3f sec\n", (double)time_write / CLOCKS_PER_SEC);
|
||||
if (prog_num != prog_base)
|
||||
printf(" prog_num = %I64d, prog_base = %I64d\n", prog_num, prog_base);
|
||||
#endif
|
||||
@@ -1186,7 +1241,7 @@ int encode_method3( // hold all parity blocks, and in one
|
||||
src_off = source_off - 1; // marker meaning "not calculated yet"
|
||||
|
||||
#ifdef TIMER
|
||||
time_start = GetTickCount();
|
||||
time_start = clock();
|
||||
#endif
|
||||
for (i = 0; i < read_num; i++){ // read slices one at a time and place them in memory
|
||||
// read the source block
|
||||
@@ -1318,7 +1373,7 @@ time_start = GetTickCount();
|
||||
memcpy(common_buf + packet_off + 16, file_md_ctx.hash, 16);
|
||||
}
|
||||
#ifdef TIMER
|
||||
time_read += GetTickCount() - time_start;
|
||||
time_read += clock() - time_start;
|
||||
#endif
|
||||
|
||||
WaitForMultipleObjects(cpu_num1, hEnd, TRUE, INFINITE); // wait for the sub-threads to signal that calculation has finished
|
||||
@@ -1393,19 +1448,19 @@ time_read += GetTickCount() - time_start;
|
||||
}
|
||||
|
||||
#ifdef TIMER
|
||||
time_start = GetTickCount();
|
||||
time_start = clock();
|
||||
#endif
|
||||
memcpy(common_buf + common_size, common_buf, common_size); // copy the first half into the second half
|
||||
// finally, verify the checksums of the parity blocks and write them to the recovery files
|
||||
err = create_recovery_file_1pass(file_path, recovery_path, packet_limit, block_distri,
|
||||
packet_num, common_buf, common_size, footer_buf, footer_size, rcv_hFile, p_buf, NULL, unit_size);
|
||||
#ifdef TIMER
|
||||
time_write = GetTickCount() - time_start;
|
||||
time_write = clock() - time_start;
|
||||
#endif
|
||||
|
||||
#ifdef TIMER
|
||||
printf("read %d.%03d sec\n", time_read / 1000, time_read % 1000);
|
||||
printf("write %d.%03d sec\n", time_write / 1000, time_write % 1000);
|
||||
printf("read %.3f sec\n", (double)time_read / CLOCKS_PER_SEC);
|
||||
printf("write %.3f sec\n", (double)time_write / CLOCKS_PER_SEC);
|
||||
if (prog_num != prog_base - prog_write * parity_num)
|
||||
printf(" prog_num = %I64d != %I64d\n", prog_num, prog_base - prog_write * parity_num);
|
||||
#endif
|
||||
@@ -1577,7 +1632,7 @@ int encode_method4( // when all blocks are held in fragments (G
|
||||
#ifdef TIMER
|
||||
read_count = 0;
|
||||
skip_count = 0;
|
||||
time_start = GetTickCount();
|
||||
time_start = clock();
|
||||
#endif
|
||||
last_file = -1;
|
||||
for (i = 0; i < source_num; i++){
|
||||
@@ -1678,7 +1733,7 @@ skip_count++;
|
||||
CloseHandle(hFile);
|
||||
hFile = NULL;
|
||||
#ifdef TIMER
|
||||
time_read += GetTickCount() - time_start;
|
||||
time_read += clock() - time_start;
|
||||
#endif
|
||||
|
||||
memset(g_buf, 0, (size_t)unit_size * parity_num); // zero-fill the GPU area while waiting
|
||||
@@ -1817,11 +1872,21 @@ skip_count++;
|
||||
#endif
|
||||
} else if (src_off + src_num + src_max > source_num){
|
||||
src_num = source_num - src_off - src_max;
|
||||
// so that src_num never becomes 0, the case src_num == src_max is handled by last1 above
|
||||
if ((src_num < src_max) && (src_num + src_max <= vram_max) && (gpu_end * 2 > cpu_end)){
|
||||
src_num += src_max; // when the GPU's share is small and there is room, hand it all the rest too
|
||||
if (src_num < src_max){
|
||||
if ((src_num + src_max <= vram_max) && (gpu_end * 2 > cpu_end)){
|
||||
src_num += src_max; // when the GPU's share is small and there is room, hand it all the rest too
|
||||
#ifdef TIMER
|
||||
printf("GPU last +: src_off = %d, src_num = %d + %d\n", src_off, src_num - src_max, src_max);
|
||||
#endif
|
||||
} else if (src_num < src_max / 4){
|
||||
src_num = src_max / 4; // keep src_num from becoming too small
|
||||
#ifdef TIMER
|
||||
printf("GPU last ?: src_off = %d, src_num = %d\n", src_off, src_num);
|
||||
} else {
|
||||
printf("GPU last -: src_off = %d, src_num = %d\n", src_off, src_num);
|
||||
#endif
|
||||
}
|
||||
#ifdef TIMER
|
||||
printf("GPU last +: src_off = %d, src_num = %d + %d\n", src_off, src_num - src_max, src_max);
|
||||
} else {
|
||||
printf("GPU last 2: src_off = %d, src_num = %d\n", src_off, src_num);
|
||||
#endif
|
||||
@@ -1921,7 +1986,7 @@ skip_count++;
|
||||
prog_num += th->size * parity_num;
|
||||
|
||||
#ifdef TIMER
|
||||
time_start = GetTickCount();
|
||||
time_start = clock();
|
||||
#endif
|
||||
// write the parity blocks
|
||||
work_buf = p_buf;
|
||||
@@ -1972,7 +2037,7 @@ time_start = GetTickCount();
|
||||
}
|
||||
}
|
||||
#ifdef TIMER
|
||||
time_write += GetTickCount() - time_start;
|
||||
time_write += clock() - time_start;
|
||||
#endif
|
||||
|
||||
block_off += io_size;
|
||||
@@ -2015,7 +2080,7 @@ time_write += GetTickCount() - time_start;
|
||||
}
|
||||
|
||||
#ifdef TIMER
|
||||
time_start = GetTickCount();
|
||||
time_start = clock();
|
||||
#endif
|
||||
// finally, write the headers of the Recovery Slice packets
|
||||
for (i = 0; i < parity_num; i++){
|
||||
@@ -2031,13 +2096,13 @@ time_start = GetTickCount();
|
||||
}
|
||||
}
|
||||
#ifdef TIMER
|
||||
time_write += GetTickCount() - time_start;
|
||||
time_write += clock() - time_start;
|
||||
#endif
|
||||
}
|
||||
|
||||
#ifdef TIMER
|
||||
printf("read %d.%03d sec\n", time_read / 1000, time_read % 1000);
|
||||
printf("write %d.%03d sec\n", time_write / 1000, time_write % 1000);
|
||||
printf("read %.3f sec\n", (double)time_read / CLOCKS_PER_SEC);
|
||||
printf("write %.3f sec\n", (double)time_write / CLOCKS_PER_SEC);
|
||||
if (prog_num != prog_base)
|
||||
printf(" prog_num = %I64d, prog_base = %I64d\n", prog_num, prog_base);
|
||||
#endif
|
||||
@@ -2210,7 +2275,7 @@ int encode_method5( // part of the source blocks and the parity blocks
|
||||
src_off = source_off - 1; // marker meaning "not calculated yet"
|
||||
|
||||
#ifdef TIMER
|
||||
time_start = GetTickCount();
|
||||
time_start = clock();
|
||||
#endif
|
||||
for (i = 0; i < read_num; i++){ // read slices one at a time and place them in memory
|
||||
// read the source block
|
||||
@@ -2341,7 +2406,7 @@ time_start = GetTickCount();
|
||||
memcpy(common_buf + packet_off + 16, file_md_ctx.hash, 16);
|
||||
}
|
||||
#ifdef TIMER
|
||||
time_read += GetTickCount() - time_start;
|
||||
time_read += clock() - time_start;
|
||||
#endif
|
||||
|
||||
if (source_off == 0)
|
||||
@@ -2462,10 +2527,21 @@ time_read += GetTickCount() - time_start;
|
||||
#endif
|
||||
} else if (src_off + src_num + src_max > read_num){
|
||||
src_num = read_num - src_off - src_max;
|
||||
if ((src_num < src_max) && (src_num + src_max <= vram_max) && (gpu_end * 2 > cpu_end)){
|
||||
src_num += src_max; // when the GPU's share is small and there is room, hand it all the rest too
|
||||
if (src_num < src_max){
|
||||
if ((src_num + src_max <= vram_max) && (gpu_end * 2 > cpu_end)){
|
||||
src_num += src_max; // when the GPU's share is small and there is room, hand it all the rest too
|
||||
#ifdef TIMER
|
||||
printf("GPU last +: src_off = %d, src_num = %d + %d\n", src_off, src_num - src_max, src_max);
|
||||
#endif
|
||||
} else if (src_num < src_max / 4){
|
||||
src_num = src_max / 4; // keep src_num from becoming too small
|
||||
#ifdef TIMER
|
||||
printf("GPU last ?: src_off = %d, src_num = %d\n", src_off, src_num);
|
||||
} else {
|
||||
printf("GPU last -: src_off = %d, src_num = %d\n", src_off, src_num);
|
||||
#endif
|
||||
}
|
||||
#ifdef TIMER
|
||||
printf("GPU last +: src_off = %d, src_num = %d + %d\n", src_off, src_num - src_max, src_max);
|
||||
} else {
|
||||
printf("GPU last 2: src_off = %d, src_num = %d\n", src_off, src_num);
|
||||
#endif
|
||||
@@ -2568,19 +2644,19 @@ time_read += GetTickCount() - time_start;
|
||||
}
|
||||
|
||||
#ifdef TIMER
|
||||
time_start = GetTickCount();
|
||||
time_start = clock();
|
||||
#endif
|
||||
memcpy(common_buf + common_size, common_buf, common_size); // copy the first half into the second half
|
||||
// finally, verify the checksums of the parity blocks and write them to the recovery files
|
||||
err = create_recovery_file_1pass(file_path, recovery_path, packet_limit, block_distri,
|
||||
packet_num, common_buf, common_size, footer_buf, footer_size, rcv_hFile, p_buf, g_buf, unit_size);
|
||||
#ifdef TIMER
|
||||
time_write = GetTickCount() - time_start;
|
||||
time_write = clock() - time_start;
|
||||
#endif
|
||||
|
||||
#ifdef TIMER
|
||||
printf("read %d.%03d sec\n", time_read / 1000, time_read % 1000);
|
||||
printf("write %d.%03d sec\n", time_write / 1000, time_write % 1000);
|
||||
printf("read %.3f sec\n", (double)time_read / CLOCKS_PER_SEC);
|
||||
printf("write %.3f sec\n", (double)time_write / CLOCKS_PER_SEC);
|
||||
if (prog_num != prog_base - prog_write * parity_num)
|
||||
printf(" prog_num = %I64d != %I64d\n", prog_num, prog_base - prog_write * parity_num);
|
||||
#endif
|
||||
|
||||
@@ -1,10 +1,11 @@
|
||||
void calc_table(__local uint *mtab, int id, int factor)
|
||||
{
|
||||
int i, sum = 0;
|
||||
int i, sum;
|
||||
|
||||
for (i = 0; i < 8; i++){
|
||||
sum = (id & (1 << i)) ? (sum ^ factor) : sum;
|
||||
factor = (factor & 0x8000) ? ((factor << 1) ^ 0x1100B) : (factor << 1);
|
||||
sum = ((id << 31) >> 31) & factor;
|
||||
for (i = 1; i < 8; i++){
|
||||
factor = (factor << 1) ^ (((factor << 16) >> 31) & 0x1100B);
|
||||
sum ^= ((id << (31 - i)) >> 31) & factor;
|
||||
}
|
||||
mtab[id] = sum;
|
||||
|
||||
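The `calc_table` hunk above replaces the conditional (`?:`) form of the GF(2^16) table computation with a mask-based form, where each tested bit is stretched into an all-ones or all-zeros mask. The sketch below restates both forms in plain C so they can be compared on a CPU. It is an illustration under assumptions: the function names are hypothetical, and the masks are built with unsigned negation (`0u - bit`) instead of the kernel's signed `(x << n) >> 31` shifts, because that shift idiom is not portable in standard C.

```c
#include <assert.h>
#include <stdint.h>

#define GF16_POLY 0x1100Bu  /* reduction polynomial used by the kernel above */

/* Branching form, following the removed lines: table entry for byte `id`. */
static uint32_t table_entry_branch(uint32_t id, uint32_t factor)
{
    uint32_t sum = 0;

    assert(id < 256u && factor < 0x10000u);
    for (int i = 0; i < 8; i++) {
        if (id & (1u << i))
            sum ^= factor;
        factor = (factor & 0x8000u) ? ((factor << 1) ^ GF16_POLY) : (factor << 1);
    }
    return sum;
}

/* Branchless form, following the added lines: every test becomes a mask. */
static uint32_t table_entry_mask(uint32_t id, uint32_t factor)
{
    uint32_t sum;

    assert(id < 256u && factor < 0x10000u);
    sum = (0u - (id & 1u)) & factor;
    for (int i = 1; i < 8; i++) {
        /* multiply factor by 2, reducing when bit 15 was set */
        factor = (factor << 1) ^ ((0u - ((factor >> 15) & 1u)) & GF16_POLY);
        sum ^= (0u - ((id >> i) & 1u)) & factor;
    }
    return sum;
}

/* Returns 1 if both forms agree for every byte index over a sample of factors. */
static int tables_agree(void)
{
    for (uint32_t factor = 1; factor < 0x10000u; factor += 0x101u)
        for (uint32_t id = 0; id < 256u; id++)
            if (table_entry_branch(id, factor) != table_entry_mask(id, factor))
                return 0;
    return 1;
}
```

Calling `tables_agree()` checks the two forms against each other; avoiding data-dependent branches is the likely motivation on a GPU, where work-items in a group execute in lockstep.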
@@ -14,6 +15,30 @@ void calc_table(__local uint *mtab, int id, int factor)
|
||||
mtab[id + 256] = sum;
|
||||
}
|
||||
|
||||
void calc_table2(__local uint *mtab, int id, int factor, int factor2)
|
||||
{
|
||||
int i, sum, sum2, mask;
|
||||
|
||||
mask = (id << 31) >> 31;
|
||||
sum = mask & factor;
|
||||
sum2 = mask & factor2;
|
||||
for (i = 1; i < 8; i++){
|
||||
factor = (factor << 1) ^ (((factor << 16) >> 31) & 0x1100B);
|
||||
factor2 = (factor2 << 1) ^ (((factor2 << 16) >> 31) & 0x1100B);
|
||||
mask = (id << (31 - i)) >> 31;
|
||||
sum ^= mask & factor;
|
||||
sum2 ^= mask & factor2;
|
||||
}
|
||||
mtab[id] = sum | (sum2 << 16);
|
||||
|
||||
sum = (sum << 4) ^ (((sum << 16) >> 31) & 0x88058) ^ (((sum << 17) >> 31) & 0x4402C) ^ (((sum << 18) >> 31) & 0x22016) ^ (((sum << 19) >> 31) & 0x1100B);
|
||||
sum = (sum << 4) ^ (((sum << 16) >> 31) & 0x88058) ^ (((sum << 17) >> 31) & 0x4402C) ^ (((sum << 18) >> 31) & 0x22016) ^ (((sum << 19) >> 31) & 0x1100B);
|
||||
sum2 = (sum2 << 4) ^ (((sum2 << 16) >> 31) & 0x88058) ^ (((sum2 << 17) >> 31) & 0x4402C) ^ (((sum2 << 18) >> 31) & 0x22016) ^ (((sum2 << 19) >> 31) & 0x1100B);
|
||||
sum2 = (sum2 << 4) ^ (((sum2 << 16) >> 31) & 0x88058) ^ (((sum2 << 17) >> 31) & 0x4402C) ^ (((sum2 << 18) >> 31) & 0x22016) ^ (((sum2 << 19) >> 31) & 0x1100B);
|
||||
|
||||
mtab[id + 256] = sum | (sum2 << 16);
|
||||
}
|
||||
|
||||
__kernel void method1(
|
||||
__global uint *src,
|
||||
__global uint *dst,
|
||||
@@ -31,6 +56,7 @@ __kernel void method1(
|
||||
dst[i] = 0;
|
||||
|
||||
for (blk = 0; blk < blk_num; blk++){
|
||||
barrier(CLK_LOCAL_MEM_FENCE);
|
||||
calc_table(mtab, table_id, factors[blk]);
|
||||
barrier(CLK_LOCAL_MEM_FENCE);
|
||||
|
||||
@@ -42,7 +68,6 @@ __kernel void method1(
|
||||
dst[i] ^= sum;
|
||||
}
|
||||
src += BLK_SIZE;
|
||||
barrier(CLK_LOCAL_MEM_FENCE);
|
||||
}
|
||||
}
|
||||
|
||||
@@ -65,6 +90,7 @@ __kernel void method2(
|
||||
}
|
||||
|
||||
for (blk = 0; blk < blk_num; blk++){
|
||||
barrier(CLK_LOCAL_MEM_FENCE);
|
||||
calc_table(mtab, table_id, factors[blk]);
|
||||
barrier(CLK_LOCAL_MEM_FENCE);
|
||||
|
||||
@@ -82,11 +108,182 @@ __kernel void method2(
|
||||
dst[pos + 4] ^= ((sum1 & 0xFF00FF00) >> 8) | (sum2 & 0xFF00FF00);
|
||||
}
|
||||
src += BLK_SIZE;
|
||||
barrier(CLK_LOCAL_MEM_FENCE);
|
||||
}
|
||||
}
|
||||
|
||||
__kernel void method4(
|
||||
__global uint4 *src,
|
||||
__global uint4 *dst,
|
||||
__global ushort *factors,
|
||||
int blk_num)
|
||||
{
|
||||
__local uint mtab[512];
|
||||
int i, blk;
|
||||
uchar4 r0, r1, r2, r3, r4, r5, r6, r7;
|
||||
uchar16 lo, hi;
|
||||
const int work_id = get_global_id(0) * 2;
|
||||
const int work_size = get_global_size(0) * 2;
|
||||
const int table_id = get_local_id(0);
|
||||
|
||||
for (i = work_id; i < BLK_SIZE / 4; i += work_size){
|
||||
dst[i ] = 0;
|
||||
dst[i + 1] = 0;
|
||||
}
|
||||
|
||||
for (blk = 0; blk < blk_num; blk++){
|
||||
barrier(CLK_LOCAL_MEM_FENCE);
|
||||
calc_table(mtab, table_id, factors[blk]);
|
||||
barrier(CLK_LOCAL_MEM_FENCE);
|
||||
|
||||
for (i = work_id; i < BLK_SIZE / 4; i += work_size){
|
||||
lo = as_uchar16(src[i ]);
|
||||
hi = as_uchar16(src[i + 1]);
|
||||
r0 = (uchar4)(as_uchar2((ushort)(mtab[lo.s0] ^ mtab[256 + hi.s0])), as_uchar2((ushort)(mtab[lo.s1] ^ mtab[256 + hi.s1])));
|
||||
r1 = (uchar4)(as_uchar2((ushort)(mtab[lo.s2] ^ mtab[256 + hi.s2])), as_uchar2((ushort)(mtab[lo.s3] ^ mtab[256 + hi.s3])));
|
||||
r2 = (uchar4)(as_uchar2((ushort)(mtab[lo.s4] ^ mtab[256 + hi.s4])), as_uchar2((ushort)(mtab[lo.s5] ^ mtab[256 + hi.s5])));
|
||||
r3 = (uchar4)(as_uchar2((ushort)(mtab[lo.s6] ^ mtab[256 + hi.s6])), as_uchar2((ushort)(mtab[lo.s7] ^ mtab[256 + hi.s7])));
|
||||
r4 = (uchar4)(as_uchar2((ushort)(mtab[lo.s8] ^ mtab[256 + hi.s8])), as_uchar2((ushort)(mtab[lo.s9] ^ mtab[256 + hi.s9])));
|
||||
r5 = (uchar4)(as_uchar2((ushort)(mtab[lo.sa] ^ mtab[256 + hi.sa])), as_uchar2((ushort)(mtab[lo.sb] ^ mtab[256 + hi.sb])));
|
||||
r6 = (uchar4)(as_uchar2((ushort)(mtab[lo.sc] ^ mtab[256 + hi.sc])), as_uchar2((ushort)(mtab[lo.sd] ^ mtab[256 + hi.sd])));
|
||||
r7 = (uchar4)(as_uchar2((ushort)(mtab[lo.se] ^ mtab[256 + hi.se])), as_uchar2((ushort)(mtab[lo.sf] ^ mtab[256 + hi.sf])));
|
||||
dst[i ] ^= as_uint4((uchar16)(r0.x, r0.z, r1.x, r1.z, r2.x, r2.z, r3.x, r3.z, r4.x, r4.z, r5.x, r5.z, r6.x, r6.z, r7.x, r7.z));
|
||||
dst[i + 1] ^= as_uint4((uchar16)(r0.y, r0.w, r1.y, r1.w, r2.y, r2.w, r3.y, r3.w, r4.y, r4.w, r5.y, r5.w, r6.y, r6.w, r7.y, r7.w));
|
||||
}
|
||||
src += BLK_SIZE / 4;
|
||||
}
|
||||
}
|
||||
|
||||
__kernel void method9(
|
||||
__global uint *src,
|
||||
__global uint *dst,
|
||||
__global ushort *factors,
|
||||
int blk_num)
|
||||
{
|
||||
__local uint mtab[512];
|
||||
int i, blk;
|
||||
uint v, sum, sum2;
|
||||
const int work_id = get_global_id(0);
|
||||
const int work_size = get_global_size(0);
|
||||
const int table_id = get_local_id(0);
|
||||
|
||||
for (i = work_id; i < BLK_SIZE; i += work_size){
|
||||
dst[i] = 0;
|
||||
dst[i + BLK_SIZE] = 0;
|
||||
}
|
||||
|
||||
for (blk = 0; blk < blk_num; blk++){
|
||||
barrier(CLK_LOCAL_MEM_FENCE);
|
||||
calc_table2(mtab, table_id, factors[blk], factors[blk_num + blk]);
|
||||
barrier(CLK_LOCAL_MEM_FENCE);
|
||||
|
||||
for (i = work_id; i < BLK_SIZE; i += work_size){
|
||||
v = src[i];
|
||||
sum = mtab[(uchar)v] ^ mtab[256 + (uchar)(v >> 8)];
|
||||
sum2 = mtab[(uchar)(v >> 16)] ^ mtab[256 + (v >> 24)];
|
||||
dst[i] ^= (sum & 0xFFFF) | (sum2 << 16);
|
||||
dst[i + BLK_SIZE] ^= (sum >> 16) | (sum2 & 0xFFFF0000);
|
||||
}
|
||||
src += BLK_SIZE;
|
||||
}
|
||||
}
|
||||
|
||||
__kernel void method10(
|
||||
__global uint *src,
|
||||
__global uint *dst,
|
||||
__global ushort *factors,
|
||||
int blk_num)
|
||||
{
|
||||
__local uint mtab[512];
|
||||
int i, blk, pos;
|
||||
uint lo, hi, t0, t1, t2, t3;
|
||||
const int work_id = get_global_id(0) * 2;
|
||||
const int work_size = get_global_size(0) * 2;
|
||||
const int table_id = get_local_id(0);
|
||||
|
||||
for (i = work_id; i < BLK_SIZE; i += work_size){
|
||||
dst[i ] = 0;
|
||||
dst[i + 1] = 0;
|
||||
dst[i + BLK_SIZE ] = 0;
|
||||
dst[i + BLK_SIZE + 1] = 0;
|
||||
}
|
||||
|
||||
for (blk = 0; blk < blk_num; blk++){
|
||||
barrier(CLK_LOCAL_MEM_FENCE);
|
||||
calc_table2(mtab, table_id, factors[blk], factors[blk_num + blk]);
|
||||
barrier(CLK_LOCAL_MEM_FENCE);
|
||||
|
||||
for (i = work_id; i < BLK_SIZE; i += work_size){
|
||||
pos = (i & ~7) + ((i & 7) >> 1);
|
||||
lo = src[pos ];
|
||||
hi = src[pos + 4];
|
||||
t0 = mtab[(uchar)lo] ^ mtab[256 + (uchar)hi];
|
||||
t1 = mtab[(uchar)(lo >> 8)] ^ mtab[256 + (uchar)(hi >> 8)];
|
||||
t2 = mtab[(uchar)(lo >> 16)] ^ mtab[256 + (uchar)(hi >> 16)];
|
||||
t3 = mtab[lo >> 24] ^ mtab[256 + (hi >> 24)];
|
||||
dst[pos ] ^= (uchar)t0 | ((t1 << 8) & 0xFF00) | ((t2 << 16) & 0xFF0000) | (t3 << 24);
|
||||
dst[pos + 4] ^= (uchar)(t0 >> 8) | (t1 & 0xFF00) | ((t2 << 8) & 0xFF0000) | ((t3 << 16) & 0xFF000000);
|
||||
dst[pos + BLK_SIZE ] ^= (uchar)(t0 >> 16) | ((t1 >> 8) & 0xFF00) | (t2 & 0xFF0000) | ((t3 << 8) & 0xFF000000);
|
||||
dst[pos + BLK_SIZE + 4] ^= (t0 >> 24) | ((t1 >> 16) & 0xFF00) | ((t2 >> 8) & 0xFF0000) | (t3 & 0xFF000000);
|
||||
}
|
||||
src += BLK_SIZE;
|
||||
}
|
||||
}
|
||||
|
||||
__kernel void method12(
|
||||
__global uint4 *src,
|
||||
__global uint4 *dst,
|
||||
__global ushort *factors,
|
||||
int blk_num)
|
||||
{
|
||||
__local uint mtab[512];
|
||||
int i, blk;
|
||||
uchar4 r0, r1, r2, r3, r4, r5, r6, r7, r8, r9, rA, rB, rC, rD, rE, rF;
|
||||
uchar16 lo, hi;
|
||||
const int work_id = get_global_id(0) * 2;
|
||||
const int work_size = get_global_size(0) * 2;
|
||||
const int table_id = get_local_id(0);
|
||||
|
||||
for (i = work_id; i < BLK_SIZE / 4; i += work_size){
|
||||
dst[i ] = 0;
|
||||
dst[i + 1] = 0;
|
||||
dst[i + BLK_SIZE / 4 ] = 0;
|
||||
dst[i + BLK_SIZE / 4 + 1] = 0;
|
||||
}
|
||||
|
||||
for (blk = 0; blk < blk_num; blk++){
|
||||
barrier(CLK_LOCAL_MEM_FENCE);
|
||||
calc_table2(mtab, table_id, factors[blk], factors[blk_num + blk]);
|
||||
barrier(CLK_LOCAL_MEM_FENCE);
|
||||
|
||||
for (i = work_id; i < BLK_SIZE / 4; i += work_size){
|
||||
lo = as_uchar16(src[i ]);
|
||||
hi = as_uchar16(src[i + 1]);
|
||||
r0 = as_uchar4(mtab[lo.s0] ^ mtab[256 + hi.s0]);
|
||||
r1 = as_uchar4(mtab[lo.s1] ^ mtab[256 + hi.s1]);
|
||||
r2 = as_uchar4(mtab[lo.s2] ^ mtab[256 + hi.s2]);
|
||||
r3 = as_uchar4(mtab[lo.s3] ^ mtab[256 + hi.s3]);
|
||||
r4 = as_uchar4(mtab[lo.s4] ^ mtab[256 + hi.s4]);
|
||||
r5 = as_uchar4(mtab[lo.s5] ^ mtab[256 + hi.s5]);
|
||||
r6 = as_uchar4(mtab[lo.s6] ^ mtab[256 + hi.s6]);
|
||||
r7 = as_uchar4(mtab[lo.s7] ^ mtab[256 + hi.s7]);
|
||||
r8 = as_uchar4(mtab[lo.s8] ^ mtab[256 + hi.s8]);
|
||||
r9 = as_uchar4(mtab[lo.s9] ^ mtab[256 + hi.s9]);
|
||||
rA = as_uchar4(mtab[lo.sa] ^ mtab[256 + hi.sa]);
|
||||
rB = as_uchar4(mtab[lo.sb] ^ mtab[256 + hi.sb]);
|
||||
rC = as_uchar4(mtab[lo.sc] ^ mtab[256 + hi.sc]);
|
||||
rD = as_uchar4(mtab[lo.sd] ^ mtab[256 + hi.sd]);
|
||||
rE = as_uchar4(mtab[lo.se] ^ mtab[256 + hi.se]);
|
||||
rF = as_uchar4(mtab[lo.sf] ^ mtab[256 + hi.sf]);
|
||||
dst[i ] ^= as_uint4((uchar16)(r0.x, r1.x, r2.x, r3.x, r4.x, r5.x, r6.x, r7.x, r8.x, r9.x, rA.x, rB.x, rC.x, rD.x, rE.x, rF.x));
|
||||
dst[i + 1] ^= as_uint4((uchar16)(r0.y, r1.y, r2.y, r3.y, r4.y, r5.y, r6.y, r7.y, r8.y, r9.y, rA.y, rB.y, rC.y, rD.y, rE.y, rF.y));
|
||||
dst[i + BLK_SIZE / 4 ] ^= as_uint4((uchar16)(r0.z, r1.z, r2.z, r3.z, r4.z, r5.z, r6.z, r7.z, r8.z, r9.z, rA.z, rB.z, rC.z, rD.z, rE.z, rF.z));
|
||||
dst[i + BLK_SIZE / 4 + 1] ^= as_uint4((uchar16)(r0.w, r1.w, r2.w, r3.w, r4.w, r5.w, r6.w, r7.w, r8.w, r9.w, rA.w, rB.w, rC.w, rD.w, rE.w, rF.w));
|
||||
}
|
||||
src += BLK_SIZE / 4;
|
||||
}
|
||||
}
|
||||
|
||||
__kernel void method16(
|
||||
__global uint *src,
|
||||
__global uint *dst,
|
||||
__global ushort *factors,
|
||||
@@ -94,7 +291,7 @@ __kernel void method4(
|
||||
{
|
||||
__local int table[16];
|
||||
__local uint cache[256];
|
||||
int i, j, blk, pos, sht, mask;
|
||||
int i, j, blk, pos, mask, tmp;
|
||||
uint sum;
|
||||
const int work_id = get_global_id(0);
|
||||
const int work_size = get_global_size(0);
|
||||
@@ -104,11 +301,12 @@ __kernel void method4(
|
||||
|
||||
for (blk = 0; blk < blk_num; blk++){
|
||||
if (get_local_id(0) == 0){
|
||||
pos = factors[blk] << 16;
|
||||
table[0] = pos;
|
||||
tmp = factors[blk];
|
||||
table[0] = tmp;
|
||||
for (j = 1; j < 16; j++){
|
||||
pos = (pos << 1) ^ ((pos >> 31) & 0x100B0000);
|
||||
table[j] = pos;
|
||||
mask = (tmp & 0x8000) ? 0x1100B : 0;
|
||||
tmp = (tmp << 1) ^ mask;
|
||||
table[j] = tmp;
|
||||
}
|
||||
}
|
||||
barrier(CLK_LOCAL_MEM_FENCE);
|
||||
@@ -119,10 +317,11 @@ __kernel void method4(
|
||||
barrier(CLK_LOCAL_MEM_FENCE);
|
||||
|
||||
sum = 0;
|
||||
sht = (i & 60) >> 2;
|
||||
tmp = (i & 60) >> 2;
|
||||
tmp = 0x8000 >> tmp;
|
||||
pos &= ~60;
|
||||
for (j = 15; j >= 0; j--){
|
||||
mask = (table[j] << sht) >> 31;
|
||||
mask = (table[j] & tmp) ? 0xFFFFFFFF : 0;
|
||||
sum ^= mask & cache[pos];
|
||||
pos += 4;
|
||||
}
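Review note on the `method4` hunk above: the new table setup drops the 32-bit shifted form (`factors[blk] << 16` with mask `0x100B0000`) in favour of a plain 16-bit doubling that reduces by `0x1100B`, the GF(2^16) generator polynomial used by PAR2. The host-side C sketch below only illustrates that doubling step and how a 16-entry table of `factor * 2^j` can drive a shift-and-add multiply. It is not the kernel's own data layout, and the names `gf16_mul2`, `gf16_build_table`, `gf16_mul`, and `gf16_selfcheck` are hypothetical and do not appear in the MultiPar source.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply a GF(2^16) element by 2 (i.e. by x), reducing modulo 0x1100B.
   Mirrors the "mask = (tmp & 0x8000) ? 0x1100B : 0" step in the hunk. */
static uint32_t gf16_mul2(uint32_t v)
{
    uint32_t mask = (v & 0x8000u) ? 0x1100Bu : 0u;
    /* When bit 15 was set, the XOR clears bit 16, so the result fits in 16 bits. */
    return (v << 1) ^ mask;
}

/* Hypothetical helper: table[j] = factor * 2^j for j = 0..15. */
static void gf16_build_table(uint32_t factor, uint32_t table[16])
{
    int j;
    table[0] = factor;
    for (j = 1; j < 16; j++)
        table[j] = gf16_mul2(table[j - 1]);
}

/* Shift-and-add multiply: XOR together table[j] for every set bit j of b.
   The all-ones / zero mask avoids a branch, as the kernel does. */
static uint32_t gf16_mul(uint32_t a, uint32_t b)
{
    uint32_t table[16], sum = 0;
    int j;
    gf16_build_table(a, table);
    for (j = 0; j < 16; j++) {
        uint32_t mask = ((b >> j) & 1u) ? 0xFFFFFFFFu : 0u;
        sum ^= mask & table[j];
    }
    return sum;
}

/* Small sanity checks for the sketch. */
static void gf16_selfcheck(void)
{
    assert(gf16_mul2(0x0001u) == 0x0002u);    /* no reduction needed */
    assert(gf16_mul2(0x8000u) == 0x100Bu);    /* x^16 reduces to 0x100B */
    assert(gf16_mul(3u, 5u) == 0x000Fu);      /* (x+1)(x^2+1) = x^3+x^2+x+1 */
    assert(gf16_mul(0x1234u, 1u) == 0x1234u); /* 1 is the identity */
}
```

Calling `gf16_selfcheck()` from a test harness exercises the doubling both with and without reduction.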
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
// verify.c
|
||||
// Copyright : 2022-10-14 Yutaka Sawada
|
||||
// Copyright : 2024-06-09 Yutaka Sawada
|
||||
// License : GPL
|
||||
|
||||
#ifndef _UNICODE
|
||||
@@ -1253,21 +1253,22 @@ static int search_block_slide(
|
||||
slice_ctx *sc)
|
||||
{
|
||||
unsigned char *buf, hash[16], hash2[16], err_mag, *short_use;
|
||||
int i, j, find_num, find_flag, find_next, find_last, short_next;
|
||||
int i, j, find_num, find_flag, find_next, find_last, short_next, short2_next, tmp_next;
|
||||
int block_count, short_count, tiny_count, tiny_skip, num, i1, i2, i3, i4;
|
||||
int *order, *index, index_shift;
|
||||
unsigned int len, off, end_off, err_off;
|
||||
unsigned int prev_crc, fail_count, rear_off, overlap_count;
|
||||
unsigned int crc, *crcs, *short_crcs;
|
||||
unsigned int time_last, time_slide;
|
||||
__int64 file_off, file_next, short_off, fail_off;
|
||||
__int64 file_off, file_next, short_off, short2_off, tmp_off, fail_off;
|
||||
|
||||
if (file_size + 1 < last_off + (__int64)(sc->min_size))
|
||||
return 0; // do not examine files that are too small
|
||||
find_num = 0; // number of blocks found in this file
|
||||
find_next = -1; // index of the block expected to be found next
|
||||
find_last = -1; // index of the last block found (-1 = unknown)
|
||||
short_next = -1;
|
||||
short_next = -1; // index of the expected tail block
|
||||
short2_next = -1;
|
||||
fail_count = 0; // number of times the CRC matched but the MD5 differed
|
||||
fail_off = 0;
|
||||
rear_off = 0;
|
||||
@@ -1278,24 +1279,32 @@ static int search_block_slide(
|
||||
find_last = find_next - 1; // index of the last block found
|
||||
if ((last_off >= files[num1].size) || (last_off + block_size > file_size + 1))
|
||||
find_next = -1; // invalid if the expected position exceeds the file size
|
||||
if ((last_size < block_size) && (files[num1].b_num >= 2) && // index and expected position of the partial tail block
|
||||
(last_off < files[num1].size) && (files[num1].size <= file_size + 1)){
|
||||
short_next = files[num1].b_off + files[num1].b_num - 1; // index of the tail block
|
||||
if (find_next == short_next)
|
||||
if ((last_size < block_size) && (last_off < files[num1].size)){ // index and expected position of the partial tail block
|
||||
tmp_next = files[num1].b_off + files[num1].b_num - 1; // index of the tail block
|
||||
if (find_next == tmp_next)
|
||||
find_next = -1; // if the predictions overlap, search for it as the tail block
|
||||
short_off = files[num1].size - last_size;
|
||||
// even if the file is smaller than one block, try error correction when the size is the same
|
||||
} else if ((last_off == 0) && (file_size == files[num1].size) && (file_size < (__int64)block_size)){
|
||||
short_off = 0;
|
||||
short_next = files[num1].b_off;
|
||||
if ((files[num1].b_num >= 2) && (files[num1].size <= file_size + 1)){ // check the original position
|
||||
short_next = tmp_next;
|
||||
short_off = files[num1].size - last_size;
|
||||
} else if ((last_off == 0) && (file_size == last_size)){ // even if the file is smaller than one block, try error correction when the size is the same
|
||||
short_next = tmp_next;
|
||||
short_off = 0;
|
||||
}
|
||||
if (last_size < file_size){ // check the end of the file
|
||||
short2_next = tmp_next;
|
||||
short2_off = file_size - last_size;
|
||||
}
|
||||
}
|
||||
if (file_size > files[num1].size){
|
||||
rear_off = (unsigned int)((file_size - files[num1].size) % (__int64)block_size);
|
||||
} else if (file_size < files[num1].size){
|
||||
rear_off = block_size - (unsigned int)((files[num1].size - file_size) % (__int64)block_size);
|
||||
}
|
||||
//printf("file = %d, find_next = %d, find_last = %d\n", num1, find_next, find_last);
|
||||
//printf("short_off = %I64d, short_next = %d, rear_off = %d\n", short_off, short_next, rear_off);
|
||||
/* printf("file = %d, find_next = %d, find_last = %d, rear_off = %d\n", num1, find_next, find_last, rear_off);
|
||||
if (short_next >= 0)
|
||||
printf("short_off = %I64d, short_next = %d\n", short_off, short_next);
|
||||
if (short2_next >= 0)
|
||||
printf("short2_off = %I64d, short2_next = %d\n", short2_off, short2_next);*/
|
||||
}
|
||||
file_off = last_off; // start checking from the scan start position
|
||||
buf = sc->buf;
|
||||
@@ -1374,8 +1383,11 @@ static int search_block_slide(
|
||||
if (last_off < file_off + last_size)
|
||||
last_off = file_off + last_size; // end of the largest partial block
|
||||
find_next = -2; // a small file was found = the block detection prediction was wrong
|
||||
if (i == short_next)
|
||||
short_next = -1; // the tail block has already been detected
|
||||
if (i == short_next){ // this tail block has already been detected
|
||||
short_next = -1;
|
||||
} else if (i == short2_next){
|
||||
short2_next = -1;
|
||||
}
|
||||
|
||||
// progress display
|
||||
if (GetTickCount() - time_last >= UPDATE_TIME){
|
||||
@@ -1416,7 +1428,7 @@ static int search_block_slide(
|
||||
|
||||
// search in units of the block size
|
||||
if (((block_count > 0) && ((file_off + (__int64)block_size <= file_size)
|
||||
|| (find_next >= 0))) || (short_next >= 0)){ // predict the block position and search
|
||||
|| (find_next >= 0))) || (short_next >= 0) || (short2_next >= 0)){ // predict the block position and search
|
||||
// compare checksums while sliding from the front
|
||||
//printf("slide search from %I64d, file %d, next = %d\n", file_off, num1, find_next);
|
||||
off = 0; // offset within buf
|
||||
@@ -1445,8 +1457,13 @@ static int search_block_slide(
|
||||
while (off < end_off){
|
||||
find_flag = -2;
|
||||
// first check whether the next-numbered block is at that position (even if already found)
|
||||
if ((short_next >= 0) && (file_off + off == short_off)){ // if it is a partial block
|
||||
i = short_next;
|
||||
if (((short_next >= 0) && (file_off + off == short_off)) ||
|
||||
((short2_next >= 0) && (file_off + off == short2_off))){ // if it is a partial block
|
||||
if ((short_next >= 0) && (file_off + off == short_off)){
|
||||
i = short_next;
|
||||
} else {
|
||||
i = short2_next;
|
||||
}
|
||||
num = s_blk[i].file;
|
||||
if ((short_use[num] & 4) == 0){ // back-calculate the CRC-32 with the padding removed
|
||||
short_crcs[num] = crc_reverse_zero(s_blk[i].crc, block_size - s_blk[i].size);
|
||||
@@ -1456,7 +1473,8 @@ static int search_block_slide(
|
||||
find_flag = correct_error(buf + off, s_blk[i].size, s_blk[i].hash, short_crcs[num], &err_off, &err_mag);
|
||||
if (find_flag == 0)
|
||||
find_flag = 2;
|
||||
} else if ((find_next >= 0) && (file_off + off == last_off)){ // if it is a full-size block
|
||||
}
|
||||
if ((find_flag < 0) && (find_next >= 0) && (file_off + off == last_off)){ // if it is a full-size block
|
||||
i = find_next;
|
||||
if (crc == s_blk[i].crc){
|
||||
data_md5(buf + off, block_size, hash);
|
||||
@@ -1661,20 +1679,81 @@ static int search_block_slide(
|
||||
find_next = i + 1;
|
||||
if ((find_next >= source_num) || (s_blk[find_next].file != num)){
|
||||
// reached the end, or the file is different
|
||||
short_next = -1;
|
||||
find_next = -1;
|
||||
if ((short_next >= 0) && ((s_blk[short_next].exist & 0x1000) != 0))
|
||||
short_next = -1;
|
||||
if ((short2_next >= 0) && ((s_blk[short2_next].exist & 0x1000) != 0))
|
||||
short2_next = -1;
|
||||
} else if (s_blk[find_next].size < block_size){ // partial blocks are checked separately
|
||||
short_next = find_next;
|
||||
short_off = file_off + off + block_size;
|
||||
//printf("short_off = %I64d, short_next = %d, file = %d\n", short_off, short_next, num);
|
||||
if (file_off + off + block_size + s_blk[find_next].size <= file_size){ // only when it fits within the file
|
||||
tmp_next = find_next;
|
||||
tmp_off = file_off + off + block_size;
|
||||
if (find_flag <= 3){ // found at the expected position
|
||||
if ((tmp_next == short_next) && (tmp_off == short_off)){
|
||||
// do nothing if it matches the existing prediction
|
||||
} else if ((short_next >= 0) && (short2_next < 0)){ // differs from the prediction, but the other slot is free, so record it there
|
||||
//printf("short2_off = %I64d, short2_next = %d, file = %d\n", tmp_off, tmp_next, num);
|
||||
short2_next = tmp_next;
|
||||
short2_off = tmp_off;
|
||||
} else {
|
||||
if ((short_next >= 0) && (tmp_next == short2_next) && (tmp_off == short2_off)){ // swap if it matches the one already predicted
|
||||
short2_next = short_next;
|
||||
short2_off = short_off;
|
||||
//printf("exchange short2_off = %I64d, short2_next = %d\n", short2_off, short2_next);
|
||||
}
|
||||
//printf("short_off = %I64d, short_next = %d, file = %d\n", tmp_off, tmp_next, num);
|
||||
short_next = tmp_next;
|
||||
short_off = tmp_off;
|
||||
}
|
||||
} else if ((short_next < 0) &&
|
||||
(((__int64)block_size * (__int64)(tmp_next - files[num].b_off) == tmp_off) ||
|
||||
(tmp_off + s_blk[tmp_next].size == file_size))){
|
||||
// even if the detected block is out of sequence, accept it when the tail block's start position or its end matches the file size
|
||||
//printf("short_off = %I64d, short_next = %d, file = %d\n", tmp_off, tmp_next, num);
|
||||
short_next = tmp_next;
|
||||
short_off = tmp_off;
|
||||
} else {
|
||||
//printf("short2_off = %I64d, short2_next = %d, file = %d\n", tmp_off, tmp_next, num);
|
||||
short2_next = tmp_next;
|
||||
short2_off = tmp_off;
|
||||
}
|
||||
}
|
||||
find_next = -1;
|
||||
} else {
|
||||
short_next = files[num].b_off + files[num].b_num - 1; // index of the tail block
|
||||
if (s_blk[short_next].size < block_size){ // partial blocks are checked separately
|
||||
short_off = file_off + off + (__int64)(short_next - i) * (__int64)block_size;
|
||||
//printf("short_off = %I64d, short_next = %d, file = %d\n", short_off, short_next, num);
|
||||
} else {
|
||||
short_next = -1;
|
||||
tmp_next = files[num].b_off + files[num].b_num - 1; // index of the tail block
|
||||
if (s_blk[tmp_next].size < block_size){ // partial blocks are checked separately
|
||||
tmp_off = file_off + off + (__int64)(tmp_next - i) * (__int64)block_size;
|
||||
if (tmp_off + s_blk[tmp_next].size <= file_size){ // only when it fits within the file
|
||||
if (find_flag <= 3){ // found at the expected position
|
||||
if ((tmp_next == short_next) && (tmp_off == short_off)){
|
||||
// do nothing if it matches the existing prediction
|
||||
} else if ((short_next >= 0) && (short2_next < 0)){ // differs from the prediction, but the other slot is free, so record it there
|
||||
//printf("far short2_off = %I64d, short2_next = %d, file = %d\n", tmp_off, tmp_next, num);
|
||||
short2_next = tmp_next;
|
||||
short2_off = tmp_off;
|
||||
} else {
|
||||
if ((short_next >= 0) && (tmp_next == short2_next) && (tmp_off == short2_off)){ // swap if it matches the one already predicted
|
||||
short2_next = short_next;
|
||||
short2_off = short_off;
|
||||
//printf("exchange short2_off = %I64d, short2_next = %d\n", short2_off, short2_next);
|
||||
}
|
||||
//printf("far short_off = %I64d, short_next = %d, file = %d\n", tmp_off, tmp_next, num);
|
||||
short_next = tmp_next;
|
||||
short_off = tmp_off;
|
||||
}
|
||||
} else if ((short_next < 0) &&
|
||||
(((__int64)block_size * (__int64)(tmp_next - files[num].b_off) == tmp_off) ||
|
||||
(tmp_off + s_blk[tmp_next].size == file_size))){
|
||||
// even if the detected block is out of sequence, accept it when the tail block's start position or its end matches the file size
|
||||
//printf("far short_off = %I64d, short_next = %d, file = %d\n", tmp_off, tmp_next, num);
|
||||
short_next = tmp_next;
|
||||
short_off = tmp_off;
|
||||
} else if ((short2_next != tmp_next) || (short2_off != tmp_off)){
|
||||
//printf("far short2_off = %I64d, short2_next = %d, file = %d\n", tmp_off, tmp_next, num);
|
||||
short2_next = tmp_next;
|
||||
short2_off = tmp_off;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
tiny_skip = 0; // search for small files only once, right after a block
|
||||
|
||||
@@ -1,2 +1,2 @@
|
||||
#define FILE_VERSION "1.3.3.1" // file version number
|
||||
#define FILE_VERSION "1.3.3.3" // file version number
|
||||
#define PRODUCT_VERSION "1.3.3" // product version number
|
||||
|
||||