Release note of version 1.3.3.3

Update release day
Fixed bug in verifying external files
2024-06-15 13:50:52 +09:00 · 2024-06-15 13:42:41 +09:00 · 2024-06-09 12:48:50 +09:00 · 2024-06-09 12:45:37 +09:00 · 2024-04-13 11:45:37 +09:00 · 2024-04-13 11:44:29 +09:00
32 changed files with 1315 additions and 531 deletions
--- a/README.md
+++ b/README.md
@@ -1,64 +1,55 @@
 # MultiPar

-### v1.3.3.1 is public
+### v1.3.3.3 is public

-&nbsp; This is a testing version to improve speed of PAR2 calculation. 
-Because the new method isn't tested so much, there may be a bug, failure, or mistake. 
-Be careful to use this non-stable version. 
-When you don't want to test by yourself, you should not use this yet. 
-If you see a problem, please report the incident. 
-I will try to solve as possible as I can.
+&nbsp; I fixed a few rare bugs in this version. 
+While most users were not affected by those problems, 
+those who saw the matter would better use new version. 
+If there is a problem still, I will fix as possible as I can. 
+I updated some help documents about Batch script. 
+I mentioned the location of help files in ReadMe text.

-&nbsp; CPU's L3 cache optimization depends on hardware environment. 
-It's difficult to guess the best setting for unknown type. 
-It seems to work well on Intel and AMD 's most CPUs. 
-Thanks Anime Tosho and MikeSW17 for long tests. 
-But, I'm not sure the perfomance of rare strange kind CPUs. 
-If you want to compare speed of different settings on your CPU, 
-you may try samples (TestBlock_2023-08-31.zip) in "MultiPar_sample" folder 
-on [OneDrive](https://1drv.ms/u/s!AtGhNMUyvbWOg0cF2UHcs709Icv4).
+&nbsp; New version supports a PC with max 8 OpenCL devices. 
+Thanks [Yi Gu for reporting bug in a rare environment](https://github.com/Yutaka-Sawada/MultiPar/issues/110). 
+I didn't think a user put so many OpenCL devices on a PC. 
+It will detect a Graphics board correctly.

-&nbsp; I improved GPU implementation very much. 
-Thanks [Slava46 and K2M74 for many tests](https://github.com/Yutaka-Sawada/MultiPar/issues/99). 
-While I almost gave up to increase speed, their effort encouraged me to try many ways. 
-Without their aid, I could not implement this GPU function. 
-OpenCL perfomance is varied in every graphics boards. 
-If you have a fast graphics board, enabling "GPU acceleration" would be faster. 
-If it's not so fast (or is slow) on your PC, just un-check the feature.
+&nbsp; I improved source file splitting feature at creating PAR2 files. 
+Thanks [AreteOne for reporting bug and suggestion of improvment](https://github.com/Yutaka-Sawada/MultiPar/issues/117). 
+When file extension is a number, it didn't handle properly. 
+If someone saw strange behavior at file splitting ago, it should have been solved in this version.

-&nbsp; I saw a new feature of Inno Setup 6, which changes install mode. 
-It shows a dialog to ask which install mode. 
-Then, a user can install MultiPar in "Program Files" directory by selecting "Install for all users". 
-This method may be easier than starting installer by "Run as administrator". 
-I test the selection dialog at this version. 
-If there is no problem nor complaint from users, I use this style in later versions, too.
+&nbsp; I fixed a bug in verifying external files. 
+It might not find the last slice in a source file, when the file data is redundant. 
+Thanks [dle-fr for reporting bug and testing many times](https://github.com/Yutaka-Sawada/MultiPar/issues/130). 
+This solution may improve verification of damaged files, too. 
+When source files are mostly random data like commpressed archive, there was no problem.


-[ Changes from 1.3.3.0 to 1.3.3.1 ]  
+[ Changes from 1.3.3.2 to 1.3.3.3 ]  

 Installer update
- It shows dialog to select "per user" or "per machine" installation.
+- Inno Setup was updated from v6.2.2 to v6.3.1.

 PAR2 client update
- Change
-  - Max number of threads to read files on SSD was increased to 6.
-
- Improvement
-  - GPU acceleration would become faster.
+- Bug fix
+  - Fixed a bug in GPU acceleration, when there are many OpenCL devices.
+  - Failure of splitting source files with numerical extension was fixed.
+  - Faulty prediction of the last block in a file with repeated data was fixed.


 [ Hash value ]  

-MultiPar1331.zip  
-MD5: ECFC1570C839DD30A2492A7B05C2AD6E  
-SHA1: 5E0E4CC38DAA995294A93ECA10AEB3AE84596170  
+MultiPar1333.zip  
+MD5: 01A201CA340C33053E6D7D2604D54019  
+SHA1: F7C30A7BDEB4152820C9CFF8D0E3DA719F69D7C6  

-MultiPar1331_setup.exe  
-MD5: A55E6FA5A6853CB42E3410F35706BAD9  
-SHA1: 8D46BD6702E82ABA9ACCFA5223B2763B4DCEFE9E  
+MultiPar1333_setup.exe  
+MD5: 33F9E441F5C1B2C00040E9BAFA7CC1A9  
+SHA1: 6CEBED8CECC9AAC5E8070CD5E8D1EDF7BBBC523A  
 &nbsp; To install under "Program Files" or "Program Files (x86)" directory, 
 you must select "Install for all users" at the first dialog.

 &nbsp; Old versions and source code packages are available at 
 [GitHub](https://github.com/Yutaka-Sawada/MultiPar/releases) or 
-[OneDrive](https://1drv.ms/u/s!AtGhNMUyvbWOaSo1n_R8awJ_hg0).
+[OneDrive](https://1drv.ms/f/c/8eb5bd32c534a1d1/QtGhNMUyvbUggI5pAAAAAAAAKjWf9HxrAn-GDQ).
--- a/alpha/MultiPar.exe
+++ b/alpha/MultiPar.exe
--- a/alpha/ReadMe_English.txt
+++ b/alpha/ReadMe_English.txt
@@ -25,7 +25,13 @@ Be careful to use those special features.

 [ System requirement ]

- MultiPar requires a PC with Windows Vista or later (Windows 7, 8, 10).
+ MultiPar requires a PC with Windows Vista or later (Windows 7, 8, 10, 11).
+
+[ Usage manual or Help documents ]
+
+ There are some usage manual or help documents in "help" folder.
+English pages exists in "help/0409" folder.
+You may open the manual by pushing "F1-key", while using MultiPar.


 /* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */
--- a/alpha/ReadMe_Japanese.txt
+++ b/alpha/ReadMe_Japanese.txt
@@ -54,7 +54,13 @@ QuickPar など他の PAR クライアントはコメント機能に対応して

 [ 動作環境 ]

-　Windows Vista かそれ以降 (Windows 7, 8, 10) のパソコンが必要です。
+　Windows Vista かそれ以降 (Windows 7, 8, 10, 11) のパソコンが必要です。
+
+[ 使い方や解説文章 ]
+
+　ほとんど英語ですが、help フォルダーの中に使い方や解説文章が入ってます。
+日本語のページは help\0411 フォルダーの中にあります。
+MultiPar の使用中に、F1-key を押すと、マニュアルが表示されます。


 /* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */
--- a/alpha/Update_English.txt
+++ b/alpha/Update_English.txt
@@ -1,5 +1,31 @@
 Release note of v1.3.3 tree

+ par2j's "lc" option was changed to support more threads.
+ Windows Vista will be removed from supported OS.
+
+/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */
+
+[ Changes from 1.3.3.2 to 1.3.3.3 ] (2024/06/15)
+
+Installer update
+ Inno Setup was updated from v6.2.2 to v6.3.1.
+
+PAR2 client update
+ Bug fix
+  Fixed a bug in GPU acceleration, when there are many OpenCL devices.
+  Failure of splitting source files with numerical extension was fixed.
+  Faulty prediction of the last block in a file with repeated data was fixed.
+
+/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */
+
+[ Changes from 1.3.3.1 to 1.3.3.2 ] (2024/01/10)
+
+PAR2 client update
+ Improvement
+  GPU acceleration will work well on AMD graphics boards.
+
+/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */
+
 [ Changes from 1.3.3.0 to 1.3.3.1 ] (2023/11/11)

 Installer update
--- a/alpha/Update_Japanese.txt
+++ b/alpha/Update_Japanese.txt
@@ -1,6 +1,6 @@
-v1.3.3 の更新情報 (2023/11/11)
+v1.3.3 の更新情報 (2024/06/15)

-　まだ動作実験中ですので、不安な人は前のバージョンを使ってください。
+　Windows Vista を動作対象の OS から外す予定です。

 [ 1.3.2 から 1.3.3 への変更点 ]

--- a/alpha/help/0409/batch.htm
+++ b/alpha/help/0409/batch.htm
@@ -16,7 +16,7 @@ textarea{width:100%;}

 <p>&nbsp
 Because MultiPar consists of PAR clients and GUI, 
-it is possible to use a PAR client on Command prompt. 
+it is possible to use a PAR client on Command Prompt. 
 Read a manual of command line for the details of command and option. 
 It's available by batch file (or command script).
 </p>
@@ -27,13 +27,57 @@ Modify the options for PAR2 files, and set the path of <tt>par2j.exe</tt>.
 Write absolute path like;<br>
 <code>SET par2_path="C:\something directory\MultiPar\par2j.exe"</code><br>
 Read manuals (<tt>Command_***.txt</tt>) to know the detail of options. 
-Read Windows OS 's help for Command prompt's usage.
+Read Windows OS 's help for Command Prompt's usage.
 </p>
 <p>&nbsp
 Save a sample script to a file like <tt>batch.bat</tt> or <tt>batch.cmd</tt>. 
-To specify path of files or folders, Drag & Drop them on the batch file. 
-When you put a shortcut icon of the batch file in "SendTo" menu, 
-you can call the batch file by selecting files then Righ-Click & SendTo.
+There are some ways to specify a file or folder.<br>
+<ol>
+<li>Type everytime by keyboard on Command Prompt
+<p>&nbsp
+If you use the batch file at Command Prompt, change directory to the batch file at first. 
+Next, type <tt>batch.bat "path of the file or folder"</tt>. 
+If there is no space in the path, no need to cover by <tt>""</tt>. 
+The path may be absolute path or relative path from the batch file. 
+Normally absolute path would be safe.
+</p>
+<p>&nbsp
+If you want to specify multiple files or folders, 
+type <tt>batch.bat "path of the first file or folder" "path of second file or folder" "path of third file or folder"</tt>. 
+You may add some pathes after a script filename.
+</p>
+</li>
+<li>Drag & Drop a file or folder by mouse
+<p>&nbsp
+On Windows Explorer, drag a file or folder and drop it on your batch file. 
+If you put the batch file on Desktop, it will be easy to Drag & Drop.
+</p>
+<p>&nbsp
+If you want to specify multiple files or folders, 
+select them at once on Windows Explorer, and Drag & Drop the group on the batch file. 
+While dragging, number of items may be shown.
+</p>
+</li>
+<li>Select SendTo at Windows Explorer's right click menu
+<p>&nbsp
+At first, create Short-cut icon of your batch file. 
+Name it to be something easy to understand like "Create PAR2". 
+Then, put the Short-cut icon in your <tt>SendTo</tt> folder. 
+You can open the <tt>SendTo</tt> folder by typing <tt>shell:sendto</tt> on path-box of Windows Explorer.
+</p>
+<p>&nbsp
+After you select a file or folder on Windows Explorer, click mouse's right button. 
+Select "SendTo" item on the right-click menu. 
+As your created Short-cut item will exist in the SendTo sub-menu, select the Short-cut icon.
+</p>
+<p>&nbsp
+If you want to specify multiple files or folders, 
+select them at once on Windows Explorer. 
+While mouse cursor over there, click mouse's right button. 
+Later action is same as single item.
+</p>
+</li>
+</ol>
 </p>
 <p>&nbsp
 If you want to confirm the result of scripting, 
--- a/alpha/help/0409/index.htm
+++ b/alpha/help/0409/index.htm
@@ -51,7 +51,7 @@ There are command-line manuals in "<tt>help</tt>" folder.
 </table>

 <hr>
-<small>last update 2023/11/11 for version 1.3.3.1</small>
+<small>last update 2024/04/13 for version 1.3.3.3</small>

 </body>
 </html>
--- a/alpha/help/0409/python.htm
+++ b/alpha/help/0409/python.htm
@@ -173,7 +173,8 @@ It will verify each PAR2 set continuously.
 It's possible to stop queue on GUI.
 <tr><td>Script file<td><tt>queue_verify.py</tt>
 <tr><td>Caution<td>You must select MultiPar Option: "Re-use verification result" to be "For 3 days" or longer, 
-and you should check "Don't search subfolders" in "Verification and Repair options" section on "Client behavior" tab.
+and you should check "Don't search subfolders" in "Verification and Repair options" section on "Client behavior" tab. 
+Furthermore, you must set proper <tt>save_path</tt> in the script.
 </table>
 </p>
 <p>&nbsp
--- a/alpha/help/0409/readme1.htm
+++ b/alpha/help/0409/readme1.htm
@@ -31,7 +31,14 @@ Be careful to use those special features.

 <h3>System requirement</h3>
 <p>&nbsp
-MultiPar requires a PC with Windows Vista or later (Windows 7, 8, 10).
+MultiPar requires a PC with Windows Vista or later (Windows 7, 8, 10, 11).
+</p>
+
+<h3>Usage manual or Help documents</h3>
+<p>&nbsp
+There are some usage manual or help documents in "help" folder.
+English pages exists in "help/0409" folder.
+You may open the manual by pushing "F1-key", while using MultiPar.
 </p>

 </body>
--- a/alpha/help/0411/index.htm
+++ b/alpha/help/0411/index.htm
@@ -51,7 +51,7 @@
 </table>

 <hr>
-<small><EFBFBD>ŏI<EFBFBD>X<EFBFBD>V 2023/11/11 (<28>o<EFBFBD>[<5B>W<EFBFBD><57><EFBFBD><EFBFBD> 1.3.3.1)</small>
+<small><EFBFBD>ŏI<EFBFBD>X<EFBFBD>V 2024/04/13 (<28>o<EFBFBD>[<5B>W<EFBFBD><57><EFBFBD><EFBFBD> 1.3.3.3)</small>

 </body>
 </html>
--- a/alpha/help/0411/readme1.htm
+++ b/alpha/help/0411/readme1.htm
@@ -25,7 +25,14 @@ QuickPar

 <h3><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD></h3>
 <p>&nbsp
-Windows Vista <20><><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD>ȍ~ (Windows 7, 8, 10) <20>̃p<CC83>\<5C>R<EFBFBD><52><EFBFBD><EFBFBD><EFBFBD>K<EFBFBD>v<EFBFBD>ł<EFBFBD><C582>B
+Windows Vista <20><><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD>ȍ~ (Windows 7, 8, 10, 11) <20>̃p<CC83>\<5C>R<EFBFBD><52><EFBFBD><EFBFBD><EFBFBD>K<EFBFBD>v<EFBFBD>ł<EFBFBD><C582>B
+</p>
+
+<h3><EFBFBD>g<EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD></h3>
+<p>&nbsp
+<EFBFBD>قƂ<EFBFBD><EFBFBD>ǉp<EFBFBD><EFBFBD><EFBFBD>ł<EFBFBD><EFBFBD><EFBFBD><EFBFBD>Ahelp <20>t<EFBFBD>H<EFBFBD><48><EFBFBD>_<EFBFBD>[<5B>̒<EFBFBD><CC92>Ɏg<C98E><67><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD>͂<EFBFBD><CD82><EFBFBD><EFBFBD><EFBFBD><EFBFBD>Ă܂<C482><DC82>B
+<EFBFBD><EFBFBD><EFBFBD>{<7B><><EFBFBD>̃y<CC83>[<5B>W<EFBFBD><57> help\0411 <20>t<EFBFBD>H<EFBFBD><48><EFBFBD>_<EFBFBD>[<5B>̒<EFBFBD><CC92>ɂ<EFBFBD><C982><EFBFBD><EFBFBD>܂<EFBFBD><DC82>B
+MultiPar <20>̎g<CC8E>p<EFBFBD><70><EFBFBD>ɁAF1-key <20><><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD>ƁA<C681>}<7D>j<EFBFBD><6A><EFBFBD>A<EFBFBD><41><EFBFBD><EFBFBD><EFBFBD>\<5C><><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD>܂<EFBFBD><DC82>B
 </p>

 </body>
--- a/alpha/help/Command_GUI.txt
+++ b/alpha/help/Command_GUI.txt
@@ -1,4 +1,4 @@
-[ MultiPar GUI - version 1.3.3.0 or later ]
+[ MultiPar GUI - version 1.3.3.3 or later ]

 Usage: MultiPar.exe [command] [/base path] [/list path] [files]

@@ -99,6 +99,11 @@ write a line of "RecoveryFileLimit=1" under "[Option]" section.
 It's same as an option:
 "Variable (limited to size of largest data file)" on QuickPar.

+ If you want to limit size of Recovery Files by a specific value,
+write a line of "RecoveryFileLimit=2" under "[Option]" section.
+"Limit Size to" value is enabled, even when "Split Files" isn't checked.
+This setting is useful, if you don't want to split source files.
+
 If you want to enable "Most Resent Used List",
 write this line "MRUMax=5" under "[Path]" section.
 You may change the number of items after "MRUMax=".
--- a/alpha/help/Command_par2j.txt
+++ b/alpha/help/Command_par2j.txt
@@ -1,4 +1,4 @@
-[ par2j.exe - version 1.3.3.0 or later ]
+[ par2j.exe - version 1.3.3.2 or later ]

 Type "par2j.exe" to see version, test integrity, and show usage below.

@@ -367,16 +367,24 @@ The format is "/lc#" (# is from 1 to 32 as the number of using threads).
 253: It uses 3/4 number of physical Cores.
 254: It uses one less threads than number of physical Cores.
  0: It uses the number of physical Cores.
-255: It uses one more threads than number of physical Cores.
+255: It tries to use more threads than number of physical Cores.

- You may set additional combinations;
-+1024 to disable CLMUL (and use old SSSE3 code), 
-+2048 to disable JIT (for SSE2), 
-+4096 to disable SSSE3,
-+8192 to disable AVX2,
-+256 or +512 (slower device) to enable GPU acceleration.
+ You may set additional combinations for CPU feature;
+1024 to disable CLMUL (and use slower SSSE3 code)
+2048 to disable JIT (for SSE2)
+4096 to disable SSSE3
+8192 to disable AVX2

- for example,  /lc1 to use single Core, /lc45 to use half Cores and GPU
+ You may set additional combinations for GPU control;
+256 or +512 (slower device) to enable GPU acceleration
+65536 for classic method
+131072 for 16-byte memory access
+262144 for 4-byte memory access and calculate 2 blocks at once
+524288 for 16-byte memory access and calculate 2 blocks at once
+1048576 for CL_MEM_COPY_HOST_PTR or +2097152 for CL_MEM_USE_HOST_PTR
+(When you set exclusive bits, larger value will be used.)
+
+ for example,  /lc1 to use single Core, /lc508 to use half Cores and GPU

 /m :
 Set this, if you want to set memory usage.
--- a/alpha/par2j.exe
+++ b/alpha/par2j.exe
--- a/alpha/par2j64.exe
+++ b/alpha/par2j64.exe
--- a/source/par2j/Command_par2j.txt
+++ b/source/par2j/Command_par2j.txt
@@ -1,4 +1,4 @@
-[ par2j.exe - version 1.3.3.1 or later ]
+[ par2j.exe - version 1.3.3.2 or later ]

 Type "par2j.exe" to see version, test integrity, and show usage below.

@@ -369,14 +369,22 @@ The format is "/lc#" (# is from 1 to 32 as the number of using threads).
  0: It uses the number of physical Cores.
 255: It tries to use more threads than number of physical Cores.

- You may set additional combinations;
-+1024 to disable CLMUL (and use slower SSSE3 code), 
-+2048 to disable JIT (for SSE2), 
-+4096 to disable SSSE3,
-+8192 to disable AVX2,
-+256 or +512 (slower device) to enable GPU acceleration.
+ You may set additional combinations for CPU feature;
+1024 to disable CLMUL (and use slower SSSE3 code)
+2048 to disable JIT (for SSE2)
+4096 to disable SSSE3
+8192 to disable AVX2

- for example,  /lc1 to use single Core, /lc45 to use half Cores and GPU
+ You may set additional combinations for GPU control;
+256 or +512 (slower device) to enable GPU acceleration
+65536 for classic method
+131072 for 16-byte memory access
+262144 for 4-byte memory access and calculate 2 blocks at once
+524288 for 16-byte memory access and calculate 2 blocks at once
+1048576 for CL_MEM_COPY_HOST_PTR or +2097152 for CL_MEM_USE_HOST_PTR
+(When you set exclusive bits, larger value will be used.)
+
+ for example,  /lc1 to use single Core, /lc508 to use half Cores and GPU

 /m :
 Set this, if you want to set memory usage.
--- a/source/par2j/create.c
+++ b/source/par2j/create.c
@@ -1,5 +1,5 @@
 // create.c
-// Copyright : 2023-10-22 Yutaka Sawada
+// Copyright : 2024-02-09 Yutaka Sawada
 // License : GPL

 #ifndef _UNICODE
@@ -26,6 +26,11 @@

 //#define TIMER // 実験用

+#ifdef TIMER
+#include <time.h>
+static double time_sec, time_speed;
+#endif
+
 // ソート時に項目を比較する
 static int sort_cmp(const void *elem1, const void *elem2)
 {
@@ -196,7 +201,7 @@ int set_common_packet(
 	__int64 prog_now = 0;

 #ifdef TIMER
-unsigned int time_start = GetTickCount();
+clock_t time_start = clock();
 #endif
 	print_progress_text(0, "Computing file hash");

@@ -305,14 +310,14 @@ unsigned int time_start = GetTickCount();
 	off += (64 + main_packet_size);

 #ifdef TIMER
-time_start = GetTickCount() - time_start;
-printf("hash %d.%03d sec", time_start / 1000, time_start % 1000);
-if (time_start > 0){
-	time_start = (int)((total_file_size * 125) / ((__int64)time_start * 131072));
-	printf(", %d MB/s\n", time_start);
+time_start = clock() - time_start;
+time_sec = (double)time_start / CLOCKS_PER_SEC;
+if (time_sec > 0){
+	time_speed = (double)total_file_size / (time_sec * 1048576);
 } else {
-	printf("\n");
+	time_speed = 0;
 }
+printf("hash %.3f sec, %.0f MB/s\n", time_sec, time_speed);
 #endif

 error_end:
@@ -341,7 +346,7 @@ int set_common_packet_multi(
 	FILE_HASH_TH th[MAX_MULTI_READ];

 #ifdef TIMER
-unsigned int time_start = GetTickCount();
+clock_t time_start = clock();
 #endif

 	memset(hSub, 0, sizeof(HANDLE) * MAX_MULTI_READ);
@@ -545,14 +550,14 @@ unsigned int time_start = GetTickCount();
 	}
 	print_progress_done();	// 改行して行の先頭に戻しておく
 #ifdef TIMER
-time_start = GetTickCount() - time_start;
-printf("hash %d.%03d sec", time_start / 1000, time_start % 1000);
-if (time_start > 0){
-	time_start = (int)((total_file_size * 125) / ((__int64)time_start * 131072));
-	printf(", %d MB/s\n", time_start);
+time_start = clock() - time_start;
+time_sec = (double)time_start / CLOCKS_PER_SEC;
+if (time_sec > 0){
+	time_speed = (double)total_file_size / (time_sec * 1048576);
 } else {
-	printf("\n");
+	time_speed = 0;
 }
+printf("hash %.3f sec, %.0f MB/s\n", time_sec, time_speed);
 #endif

 error_end:
@@ -700,7 +705,7 @@ int set_common_packet_hash(
 	__int64 prog_now = 0;

 #ifdef TIMER
-unsigned int time_start = GetTickCount();
+clock_t time_start = clock();
 #endif
 	print_progress_text(0, "Computing file hash");

@@ -740,8 +745,8 @@ unsigned int time_start = GetTickCount();
 	print_progress_done();	// 改行して行の先頭に戻しておく

 #ifdef TIMER
-time_start = GetTickCount() - time_start;
-printf("hash %d.%03d sec\n", time_start / 1000, time_start % 1000);
+time_start = clock() - time_start;
+printf("hash %.3f sec\n", (double)time_start / CLOCKS_PER_SEC);
 #endif
 	return 0;
 }
@@ -1065,7 +1070,7 @@ int create_recovery_file(
 #endif

 #ifdef TIMER
-unsigned int time_start = GetTickCount();
+clock_t time_start = clock();
 #endif
 	print_progress_text(0, "Constructing recovery file");
 	time_last = GetTickCount();
@@ -1258,8 +1263,8 @@ unsigned int time_start = GetTickCount();
 	print_progress_done();	// 改行して行の先頭に戻しておく

 #ifdef TIMER
-time_start = GetTickCount() - time_start;
-printf("write %d.%03d sec\n", time_start / 1000, time_start % 1000);
+time_start = clock() - time_start;
+printf("write %.3f sec\n", (double)time_start / CLOCKS_PER_SEC);
 #endif

 	return 0;
@@ -1824,10 +1829,12 @@ int split_files(
 		}
 		if (ext_len > 0){	// 全て数字の拡張子を持つソース・ファイルがあるなら
 			//printf_cp("\n risky name = %s \n", file_name);
+			wcscpy(file_path, file_name);	// 比較用に拡張子を取り除く
+			file_path[name_len] = 0;
 			for (num2 = 0; num2 < file_num; num2++){
 				if (num2 == num)
 					continue;
-				if (_wcsnicmp(list_buf + files[num2].name, file_name, name_len) == 0){
+				if (_wcsicmp(list_buf + files[num2].name, file_path) == 0){
 					//printf_cp(" match name = %s \n", list_buf + files[num2].name);
 					num8 = (files[num2].size + (__int64)split_size - 1) / split_size;
 					split_max = (int)num8;
@@ -1838,7 +1845,7 @@ int split_files(
 					if (((split_max < 1000) && (ext_len >= 5)) || ((split_max < 10000) && (ext_len >= 6)))
 						continue;	// 拡張子の桁数が異なる
 					// 上書きする危険性があるのでエラーにする
-					printf_cp("split bad file, %s\n", file_name);
+					printf_cp("split bad file, %s\n", list_buf + files[num2].name);
 					*cur_num = -1;
 					*cur_id = 0;
 					return 1;
--- a/source/par2j/gf16.c
+++ b/source/par2j/gf16.c
@@ -2795,7 +2795,7 @@ void galois_align_xor(
 void galois_align16_multiply(
 	unsigned char *r1,	// Region to multiply (must be aligned by 16)
 	unsigned char *r2,	// Products go here
-	unsigned int len,	// Byte length (must be multiple of 32)
+	unsigned int len,	// Byte length (must be multiple of 16)
 	int factor)			// Number to multiply by
 {
 	if (factor <= 1){
--- a/source/par2j/gf16.h
+++ b/source/par2j/gf16.h
@@ -6,7 +6,7 @@ extern "C" {
 #endif


-extern unsigned short *galois_log_table;
+//extern unsigned short *galois_log_table;
 extern unsigned int cpu_flag;

 int galois_create_table(void);	// Returns 0 on success, -1 on failure
--- a/source/par2j/lib_opencl.c
+++ b/source/par2j/lib_opencl.c
@@ -1,5 +1,5 @@
 // lib_opencl.c
-// Copyright : 2023-10-22 Yutaka Sawada
+// Copyright : 2024-01-21 Yutaka Sawada
 // License : GPL

 #ifndef _WIN32_WINNT
@@ -75,7 +75,7 @@ typedef cl_int (CL_API_CALL *API_clEnqueueNDRangeKernel)(cl_command_queue, cl_ke
 extern unsigned int cpu_flag;	// declared in common2.h
 extern int cpu_num;

-#define MAX_DEVICE 3
+#define MAX_DEVICE 8

 HMODULE hLibOpenCL = NULL;

@@ -84,7 +84,7 @@ cl_command_queue OpenCL_command = NULL;
 cl_kernel OpenCL_kernel = NULL;
 cl_mem OpenCL_src = NULL, OpenCL_dst = NULL, OpenCL_buf = NULL;
 size_t OpenCL_group_num;
-int OpenCL_method = 0;	// 正=速い機器を選ぶ, 負=遅い機器を選ぶ
+int OpenCL_method = 0;	// 標準では GPU を使わず、動作は自動選択される

 API_clCreateBuffer gfn_clCreateBuffer;
 API_clReleaseMemObject gfn_clReleaseMemObject;
@@ -100,7 +100,11 @@ API_clEnqueueNDRangeKernel gfn_clEnqueueNDRangeKernel;

 /*
 入力
-OpenCL_method : どのデバイスを選ぶか
+OpenCL_method : どのデバイスや関数を選ぶか
+    0x100 = 速い機器を選ぶ,          0x200 = 遅い機器を選ぶ
+  0x10000 = 1ブロックずつ計算する, 0x20000 = 2ブロックずつ計算しようとする
+  0x40000 = 4-byte memory access,  0x80000 = try 16-byte memory access
+ 0x100000 = CL_MEM_COPY_HOST_PTR, 0x200000 = CL_MEM_USE_HOST_PTR
 unit_size : ブロックの単位サイズ
 src_max : ソース・ブロック個数

@@ -111,11 +115,12 @@ OpenCL_method : 動作フラグいろいろ
 */

 // 0=成功, 1～エラー番号
-int init_OpenCL(int unit_size, int *src_max)
+int init_OpenCL(unsigned int unit_size, int *src_max)
 {
 	char buf[2048], *p_source;
 	int err = 0, i, j;
 	int gpu_power, count;
+	int unified_memory;	// non zero = Integrated GPU
 	size_t data_size, alloc_max;
 	//FILE *fp;
 	HRSRC res;
@@ -136,6 +141,7 @@ int init_OpenCL(int unit_size, int *src_max)
 	API_clReleaseProgram fn_clReleaseProgram;
 	API_clCreateKernel fn_clCreateKernel;
 	API_clGetKernelWorkGroupInfo fn_clGetKernelWorkGroupInfo;
+	API_clReleaseKernel fn_clReleaseKernel;
 	cl_int ret;
 	cl_uint num_platforms = 0, num_devices = 0, num_groups, param_value;
 	cl_ulong param_value8;
@@ -215,6 +221,9 @@ int init_OpenCL(int unit_size, int *src_max)
 	fn_clGetKernelWorkGroupInfo = (API_clGetKernelWorkGroupInfo)GetProcAddress(hLibOpenCL, "clGetKernelWorkGroupInfo");
 	if (fn_clGetKernelWorkGroupInfo == NULL)
 		return err;
+	fn_clReleaseKernel = (API_clReleaseKernel)GetProcAddress(hLibOpenCL, "clReleaseKernel");
+	if (fn_clReleaseKernel == NULL)
+		return err;
 	gfn_clFinish = (API_clFinish)GetProcAddress(hLibOpenCL, "clFinish");
 	if (gfn_clFinish == NULL)
 		return err;
@@ -226,12 +235,12 @@ int init_OpenCL(int unit_size, int *src_max)
 	ret = fn_clGetPlatformIDs(MAX_DEVICE, platform_id, &num_platforms);
 	if (ret != CL_SUCCESS)
 		return (ret << 8) | 10;
-	if (OpenCL_method >= 0){	// 選択する順序と初期値を変える
-		OpenCL_method = 1;
-		gpu_power = 0;
-	} else {
-		OpenCL_method = -1;
+	if (num_platforms > MAX_DEVICE)
+		num_platforms = MAX_DEVICE;
+	if (OpenCL_method & 0x200){	// 選択する順序と初期値を変える
 		gpu_power = INT_MIN;
+	} else {
+		gpu_power = 0;
 	}
 	alloc_max = 0;

@@ -247,6 +256,8 @@ int init_OpenCL(int unit_size, int *src_max)
 		// 環境内の OpenCL 対応機器の数
 		if (fn_clGetDeviceIDs(platform_id[i], CL_DEVICE_TYPE_GPU, MAX_DEVICE, device_id, &num_devices) != CL_SUCCESS)
 			continue;
+		if (num_devices > MAX_DEVICE)
+			num_devices = MAX_DEVICE;

 		for (j = 0; j < (int)num_devices; j++){
 			// デバイスが利用可能か確かめる
@@ -265,45 +276,42 @@ int init_OpenCL(int unit_size, int *src_max)
 			ret = fn_clGetDeviceInfo(device_id[j], CL_DEVICE_VERSION, sizeof(buf), buf, NULL);
 			if (ret == CL_SUCCESS)
 				printf("Device version = %s\n", buf);
-			ret = fn_clGetDeviceInfo(device_id[j], CL_DEVICE_LOCAL_MEM_SIZE, sizeof(cl_ulong), &param_value8, NULL);
-			if (ret == CL_SUCCESS)
-				printf("LOCAL_MEM_SIZE = %I64d KB\n", param_value8 >> 10);
-
-			// 無理とは思うけど、一応チェックする
-//#define CL_DEVICE_SVM_CAPABILITIES                  0x1053
-//#define CL_DEVICE_SVM_COARSE_GRAIN_BUFFER           (1 << 0)
-//#define CL_DEVICE_SVM_FINE_GRAIN_BUFFER             (1 << 1)
-//#define CL_DEVICE_SVM_FINE_GRAIN_SYSTEM             (1 << 2)
-//#define CL_DEVICE_SVM_ATOMICS                       (1 << 3)
-//			ret = fn_clGetDeviceInfo(device_id[j], CL_DEVICE_SVM_CAPABILITIES, sizeof(cl_ulong), &param_value8, NULL);
-//			if (ret == CL_INVALID_VALUE)
-//				printf("Shared Virtual Memory is not supported\n");
-//			if (ret == CL_SUCCESS)
-//				printf("Shared Virtual Memory = 0x%I64X\n", param_value8);
 #endif

+			// 取得できなくてもエラーにしない
+			ret = fn_clGetDeviceInfo(device_id[j], CL_DEVICE_HOST_UNIFIED_MEMORY, sizeof(cl_uint), &param_value, NULL);
+			if (ret == CL_SUCCESS){
+				if (param_value != 0){
+#ifdef DEBUG_OUTPUT
+					printf("HOST_UNIFIED_MEMORY = %d\n", param_value);
+#endif
+					param_value = 1;
+				}
+			} else {	// CL_DEVICE_HOST_UNIFIED_MEMORY は OpenCL 2.0 以降で非推奨になった
+				param_value = 0;
+			}
+
+			// 取得できない場合はエラーにする
 			ret = fn_clGetDeviceInfo(device_id[j], CL_DEVICE_MAX_MEM_ALLOC_SIZE, sizeof(cl_ulong), &param_value8, NULL);
 			if (ret != CL_SUCCESS)
 				continue;
 #ifdef DEBUG_OUTPUT
 			printf("MAX_MEM_ALLOC_SIZE = %I64d MB\n", param_value8 >> 20);
 #endif
-
 			ret = fn_clGetDeviceInfo(device_id[j], CL_DEVICE_MAX_COMPUTE_UNITS, sizeof(cl_uint), &num_groups, NULL);
 			if (ret != CL_SUCCESS)
 				continue;
 			ret = fn_clGetDeviceInfo(device_id[j], CL_DEVICE_MAX_WORK_GROUP_SIZE, sizeof(size_t), &data_size, NULL);
 			if (ret != CL_SUCCESS)
 				continue;
-			// CL_DEVICE_HOST_UNIFIED_MEMORY は OpenCL 2.0 以降で非推奨になったので、参照しない
-
 #ifdef DEBUG_OUTPUT
 			printf("MAX_COMPUTE_UNITS = %d\n", num_groups);
 			printf("MAX_WORK_GROUP_SIZE = %zd\n", data_size);
 #endif
-			// MAX_COMPUTE_UNITS * MAX_WORK_GROUP_SIZE で計算力を測る
-			count = (int)data_size * num_groups;
-			count *= OpenCL_method;	// 符号を変える
+			// MAX_COMPUTE_UNITS * MAX_WORK_GROUP_SIZE で計算力を測る、外付けGPUなら値を倍にする
+			count = (2 - param_value) * (int)data_size * num_groups;
+			if (OpenCL_method & 0x200)	// Prefer slower device
+				count *= -1;	// 符号を変える
 			//printf("prev = %d, now = %d\n", gpu_power, count);
 			if ((count > gpu_power) && (data_size >= 256) &&	// 256以上ないとテーブルを作れない
 					(param_value8 / 8 > (cl_ulong)unit_size)){	// CL_DEVICE_MAX_MEM_ALLOC_SIZE に収まるか
@@ -312,6 +320,7 @@ int init_OpenCL(int unit_size, int *src_max)
 				selected_platform = platform_id[i];
 				OpenCL_group_num = num_groups;	// ワークグループ数は COMPUTE_UNITS 数にする
 				alloc_max = (size_t)param_value8;
+				unified_memory = param_value;	// 0 = discrete GPU, 1 = integrated GPU

 				// AMD や Intel の GPU ではメモリー領域が全体の 1/4 とは限らない
 				ret = fn_clGetDeviceInfo(device_id[j], CL_DEVICE_GLOBAL_MEM_SIZE, sizeof(cl_ulong), &param_value8, NULL);
@@ -353,31 +362,6 @@ int init_OpenCL(int unit_size, int *src_max)
 	if (ret != CL_SUCCESS)
 		return (ret << 8) | 12;

-	// 計算方式を選択する
-	if ((((cpu_flag & 0x101) == 1) || ((cpu_flag & 0x110) == 0x10)) && (sse_unit == 32)){
-		OpenCL_method = 2;	// SSSE3 & ALTMAP または AVX2 ならデータの並び替え対応版を使う
-	} else if (((cpu_flag & 128) != 0) && (sse_unit == 256)){
-		OpenCL_method = 4;	// JIT(SSE2) は bit ごとに上位から 16バイトずつ並ぶ
-		// ローカルのテーブルサイズが異なることに注意
-		// XOR 方式以外は 2KB (4バイト * 256項目 * 2個) 使う
-		// XOR (JIT) は 64バイト (4バイト * 16項目) 使う
-	} else {
-		OpenCL_method = 1;	// 並び替えられてないデータ用
-	}
-
-	// work group 数が必要以上に多い場合は減らす
-	if (OpenCL_method == 2){
-		// work item 一個が 8バイトずつ計算する、256個なら work group ごとに 2KB 担当する
-		data_size = unit_size / 2048;
-	} else {
-		// work item 一個が 4バイトずつ計算する、256個なら work group ごとに 1KB 担当する
-		data_size = unit_size / 1024;
-	}
-	if (OpenCL_group_num > data_size){
-		OpenCL_group_num = data_size;
-		printf("Number of work groups is reduced to %zd\n", OpenCL_group_num);
-	}
-
 	// 最大で何ブロック分のメモリー領域を保持できるのか（ここではまだ確保しない）
 	// 後で実際に確保する量はこれよりも少なくなる
 	count = (int)(alloc_max / unit_size);	// 確保できるメモリー量から逆算する
@@ -389,25 +373,6 @@ int init_OpenCL(int unit_size, int *src_max)
 	printf("src buf : %zd KB (%d blocks), possible\n", data_size >> 10, count);
 #endif

-	// 出力先は1ブロック分だけあればいい
-	// CL_MEM_ALLOC_HOST_PTRを使えばpinned memoryになるらしい
-	data_size = unit_size;
-	OpenCL_dst = gfn_clCreateBuffer(OpenCL_context, CL_MEM_WRITE_ONLY | CL_MEM_ALLOC_HOST_PTR, data_size, NULL, &ret);
-	if (ret != CL_SUCCESS)
-		return (ret << 8) | 13;
-#ifdef DEBUG_OUTPUT
-	printf("dst buf : %zd KB (%zd Bytes), OK\n", data_size >> 10, data_size);
-#endif
-
-	// factor は最大個数分 (src_max個)
-	data_size = sizeof(unsigned short) * (*src_max);
-	OpenCL_buf = gfn_clCreateBuffer(OpenCL_context, CL_MEM_READ_ONLY, data_size, NULL, &ret);
-	if (ret != CL_SUCCESS)
-		return (ret << 8) | 14;
-#ifdef DEBUG_OUTPUT
-	printf("factor buf : %zd Bytes (%d factors), OK\n", data_size, (*src_max));
-#endif
-
 /*
 	// テキスト形式の OpenCL C ソース・コードを読み込む
 	err = 4;
@@ -508,18 +473,208 @@ int init_OpenCL(int unit_size, int *src_max)
 		return (ret << 8) | 21;
 	}

-	// カーネル関数を抽出する
-	wsprintfA(buf, "method%d", OpenCL_method & 7);
-	OpenCL_kernel = fn_clCreateKernel(program, buf, &ret);
-	if (ret != CL_SUCCESS)
-		return (ret << 8) | 22;
+	// 計算方式を選択する
+	if ((((cpu_flag & 0x101) == 1) || ((cpu_flag & 0x110) == 0x10)) && (sse_unit == 32)){
+		int select_method;	// SSSE3 & ALTMAP または AVX2 ならデータの並び替え対応版を使う
+		if (OpenCL_method & 0x80000){	// 16-byte and 2 blocks
+			select_method = 12;
+		} else if (OpenCL_method & 0x40000){	// 4-byte and 2 blocks
+			select_method = 10;
+		} else if (OpenCL_method & 0x20000){	// 16-byte
+			select_method = 4;
+		} else if (OpenCL_method & 0x10000){	// 4-byte
+			select_method = 2;
+		} else {	// kernel を作って詳細を確かめる
+			size_t item2, item4, item10, item12;
+			cl_kernel kernel2, kernel4, kernel10, kernel12;
+			item2 = item4 = item10 = item12 = 0;
+			// まずは一番重くて速い奴を調べる
+			wsprintfA(buf, "method%d", 12);
+			kernel12 = fn_clCreateKernel(program, buf, &ret);
+			if (ret == CL_SUCCESS){
+				ret = fn_clGetKernelWorkGroupInfo(kernel12, selected_device, CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE, sizeof(size_t), &item12, NULL);
+				if (ret == CL_SUCCESS){
 #ifdef DEBUG_OUTPUT
-	printf("CreateKernel : %s\n", buf);
+					printf("\nTesting %s\n", buf);
+					printf("PREFERRED_WORK_GROUP_SIZE_MULTIPLE = %zu\n", item12);
 #endif
+				}
+			}
+			if (item12 >= 32){	// 32以上あれば余裕で動くとみなす
+				select_method = 12;
+				OpenCL_kernel = kernel12;
+#ifdef DEBUG_OUTPUT
+				printf("\nSelected method%d\n", select_method);
+#endif
+			} else {	// 他の奴と比較する
+				wsprintfA(buf, "method%d", 2);
+				kernel2 = fn_clCreateKernel(program, buf, &ret);
+				if (ret == CL_SUCCESS){
+					ret = fn_clGetKernelWorkGroupInfo(kernel2, selected_device, CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE, sizeof(size_t), &item2, NULL);
+					if (ret == CL_SUCCESS){
+#ifdef DEBUG_OUTPUT
+						printf("\nTesting %s\n", buf);
+						printf("PREFERRED_WORK_GROUP_SIZE_MULTIPLE = %zu\n", item2);
+#endif
+					}
+				}
+				if (item12 >= item2){
+					select_method = 12;
+					OpenCL_kernel = kernel12;
+					ret = fn_clReleaseKernel(kernel2);
+#ifdef DEBUG_OUTPUT
+					if (ret != CL_SUCCESS)
+						printf("clReleaseKernel : Failed\n");
+					printf("\nSelected method%d\n", select_method);
+#endif
+				} else {
+					ret = fn_clReleaseKernel(kernel12);
+#ifdef DEBUG_OUTPUT
+					if (ret != CL_SUCCESS)
+						printf("clReleaseKernel : Failed\n");
+#endif
+					wsprintfA(buf, "method%d", 10);
+					kernel10 = fn_clCreateKernel(program, buf, &ret);
+					if (ret == CL_SUCCESS){
+						ret = fn_clGetKernelWorkGroupInfo(kernel10, selected_device, CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE, sizeof(size_t), &item10, NULL);
+						if (ret == CL_SUCCESS){
+#ifdef DEBUG_OUTPUT
+							printf("\nTesting %s\n", buf);
+							printf("PREFERRED_WORK_GROUP_SIZE_MULTIPLE = %zu\n", item10);
+#endif
+						}
+					}
+					if (item10 >= item2){
+						select_method = 10;
+						OpenCL_kernel = kernel10;
+						ret = fn_clReleaseKernel(kernel2);
+#ifdef DEBUG_OUTPUT
+						if (ret != CL_SUCCESS)
+							printf("clReleaseKernel : Failed\n");
+						printf("\nSelected method%d\n", select_method);
+#endif
+					} else {
+						wsprintfA(buf, "method%d", 4);
+						kernel4 = fn_clCreateKernel(program, buf, &ret);
+						if (ret == CL_SUCCESS){
+							ret = fn_clGetKernelWorkGroupInfo(kernel4, selected_device, CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE, sizeof(size_t), &item4, NULL);
+							if (ret == CL_SUCCESS){
+#ifdef DEBUG_OUTPUT
+								printf("\nTesting %s\n", buf);
+								printf("PREFERRED_WORK_GROUP_SIZE_MULTIPLE = %zu\n", item4);
+#endif
+							}
+						}
+						if (item4 >= item2){
+							select_method = 4;
+							OpenCL_kernel = kernel4;
+							ret = fn_clReleaseKernel(kernel2);
+#ifdef DEBUG_OUTPUT
+							if (ret != CL_SUCCESS)
+								printf("clReleaseKernel : Failed\n");
+							printf("\nSelected method%d\n", select_method);
+#endif
+						} else {
+							select_method = 2;
+							OpenCL_kernel = kernel2;
+							ret = fn_clReleaseKernel(kernel4);
+#ifdef DEBUG_OUTPUT
+							if (ret != CL_SUCCESS)
+								printf("clReleaseKernel : Failed\n");
+							printf("\nSelected method%d\n", select_method);
+#endif
+						}
+					}
+				}
+			}
+		}
+		OpenCL_method |= select_method;
+	} else if (((cpu_flag & 128) != 0) && (sse_unit == 256)){
+		OpenCL_method |= 16;	// JIT(SSE2) は bit ごとに上位から 16バイトずつ並ぶ
+		// ローカルのテーブルサイズが異なることに注意
+		// XOR 方式以外は 2KB (4バイト * 256項目 * 2個) 使う
+		// XOR (JIT) は 64バイト (4バイト * 16項目) 使う
+	} else {
+		int select_method;	// 並び替えられてないデータ用
+		if (OpenCL_method & 0x40000){	// 4-byte and 2 blocks
+			select_method = 9;
+		} else if (OpenCL_method & 0x10000){	// 4-byte
+			select_method = 1;
+		} else {	// kernel を作って詳細を確かめる
+			size_t item1, item9;
+			cl_kernel kernel1, kernel9;
+			item1 = item9 = 0;
+			// まずは一番重くて速い奴を調べる
+			wsprintfA(buf, "method%d", 9);
+			kernel9 = fn_clCreateKernel(program, buf, &ret);
+			if (ret == CL_SUCCESS){
+				ret = fn_clGetKernelWorkGroupInfo(kernel9, selected_device, CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE, sizeof(size_t), &item9, NULL);
+				if (ret == CL_SUCCESS){
+#ifdef DEBUG_OUTPUT
+					printf("\nTesting %s\n", buf);
+					printf("PREFERRED_WORK_GROUP_SIZE_MULTIPLE = %zu\n", item9);
+#endif
+				}
+			}
+			if (item9 >= 32){	// 32以上あれば余裕で動くとみなす
+				select_method = 9;
+				OpenCL_kernel = kernel9;
+#ifdef DEBUG_OUTPUT
+				printf("\nSelected method%d\n", select_method);
+#endif
+			} else {	// 他の奴と比較する
+				wsprintfA(buf, "method%d", 1);
+				kernel1 = fn_clCreateKernel(program, buf, &ret);
+				if (ret == CL_SUCCESS){
+					ret = fn_clGetKernelWorkGroupInfo(kernel1, selected_device, CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE, sizeof(size_t), &item1, NULL);
+					if (ret == CL_SUCCESS){
+#ifdef DEBUG_OUTPUT
+						printf("\nTesting %s\n", buf);
+						printf("PREFERRED_WORK_GROUP_SIZE_MULTIPLE = %zu\n", item1);
+#endif
+					}
+				}
+				if (item9 >= item1){
+					select_method = 9;
+					OpenCL_kernel = kernel9;
+					ret = fn_clReleaseKernel(kernel1);
+#ifdef DEBUG_OUTPUT
+					if (ret != CL_SUCCESS)
+						printf("clReleaseKernel : Failed\n");
+					printf("\nSelected method%d\n", select_method);
+#endif
+				} else {
+					select_method = 1;
+					OpenCL_kernel = kernel1;
+					ret = fn_clReleaseKernel(kernel9);
+#ifdef DEBUG_OUTPUT
+					if (ret != CL_SUCCESS)
+						printf("clReleaseKernel : Failed\n");
+					printf("\nSelected method%d\n", select_method);
+#endif
+				}
+			}
+		}
+		OpenCL_method |= select_method;
+	}
+
+	// カーネル関数を抽出する
+	if (OpenCL_kernel == NULL){
+		wsprintfA(buf, "method%d", OpenCL_method & 31);
+		OpenCL_kernel = fn_clCreateKernel(program, buf, &ret);
+		if (ret != CL_SUCCESS)
+			return (ret << 8) | 22;
+#ifdef DEBUG_OUTPUT
+		printf("CreateKernel : %s\n", buf);
+		ret = fn_clGetKernelWorkGroupInfo(OpenCL_kernel, selected_device, CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE, sizeof(size_t), &data_size, NULL);
+		if (ret == CL_SUCCESS)
+			printf("PREFERRED_WORK_GROUP_SIZE_MULTIPLE = %zu\n", data_size);
+#endif
+	}

 	// カーネルが実行できる work item 数を調べる
-	ret = fn_clGetKernelWorkGroupInfo(OpenCL_kernel, NULL, CL_KERNEL_WORK_GROUP_SIZE, sizeof(size_t), &data_size, NULL);
-	if ((ret == CL_SUCCESS) && (data_size < 256)){	// 最低でも 256以上は必要
+	ret = fn_clGetKernelWorkGroupInfo(OpenCL_kernel, selected_device, CL_KERNEL_WORK_GROUP_SIZE, sizeof(size_t), &data_size, NULL);
+	if ((ret == CL_SUCCESS) && (data_size < 256)){	// 最低でも 256 以上は必要
 #ifdef DEBUG_OUTPUT
 		printf("KERNEL_WORK_GROUP_SIZE = %zd\n", data_size);
 #endif
@@ -538,6 +693,60 @@ int init_OpenCL(int unit_size, int *src_max)
 		fn_clUnloadCompiler();
 	}

+	// work group 数が必要以上に多い場合は減らす
+	if (OpenCL_method & 4){
+		// work item 一個が 32バイトずつ計算する、256個なら work group ごとに 8KB 担当する
+		data_size = unit_size / 8192;
+	} else if (OpenCL_method & 2){
+		// work item 一個が 8バイトずつ計算する、256個なら work group ごとに 2KB 担当する
+		data_size = unit_size / 2048;
+	} else {
+		// work item 一個が 4バイトずつ計算する、256個なら work group ごとに 1KB 担当する
+		data_size = unit_size / 1024;
+	}
+	if (OpenCL_group_num > data_size){
+		OpenCL_group_num = data_size;
+		printf("Number of work groups is reduced to %zd\n", OpenCL_group_num);
+	}
+
+	// データへのアクセス方法をデバイスによって変える
+	if (OpenCL_method & 0x200000){
+		OpenCL_method |= 32;
+	} else if ((OpenCL_method & 0x100000) == 0){
+		if (unified_memory){
+			OpenCL_method |= 32;	// Integrated GPU なら CL_MEM_USE_HOST_PTR を使う
+		} else {	// Discrete GPU でも Nvidia のは動作を変える
+			ret = fn_clGetDeviceInfo(selected_device, CL_DEVICE_VERSION, sizeof(buf), buf, NULL);
+			if (ret == CL_SUCCESS){
+				if (strstr(buf, "CUDA") != NULL)
+					OpenCL_method |= 32;	// NVIDIA GPU なら CL_MEM_USE_HOST_PTR を使う
+			}
+		}
+	}
+
+	// 出力先は1ブロック分だけあればいい
+	// CL_MEM_ALLOC_HOST_PTRを使えばpinned memoryになるらしい
+	data_size = unit_size;
+	if (OpenCL_method & 8)
+		data_size *= 2;	// 2ブロックずつ計算できるように、2倍確保しておく
+	OpenCL_dst = gfn_clCreateBuffer(OpenCL_context, CL_MEM_ALLOC_HOST_PTR, data_size, NULL, &ret);
+	if (ret != CL_SUCCESS)
+		return (ret << 8) | 13;
+#ifdef DEBUG_OUTPUT
+	printf("dst buf : %zd KB (%zd Bytes), OK\n", data_size >> 10, data_size);
+#endif
+
+	// factor は最大個数分 (src_max個)
+	data_size = sizeof(unsigned short) * (*src_max);
+	if (OpenCL_method & 8)
+		data_size *= 2;	// 2ブロックずつ計算できるように、2倍確保しておく
+	OpenCL_buf = gfn_clCreateBuffer(OpenCL_context, CL_MEM_READ_ONLY, data_size, NULL, &ret);
+	if (ret != CL_SUCCESS)
+		return (ret << 8) | 14;
+#ifdef DEBUG_OUTPUT
+	printf("factor buf : %zd Bytes (%d factors), OK\n", data_size, (*src_max));
+#endif
+
 	// カーネル引数を指定する
 	ret = gfn_clSetKernelArg(OpenCL_kernel, 1, sizeof(cl_mem), &OpenCL_dst);
 	if (ret != CL_SUCCESS)
@@ -545,13 +754,12 @@ int init_OpenCL(int unit_size, int *src_max)
 	ret = gfn_clSetKernelArg(OpenCL_kernel, 2, sizeof(cl_mem), &OpenCL_buf);
 	if (ret != CL_SUCCESS)
 		return (ret << 8) | 102;
-	if (ret != CL_SUCCESS)
-		return (ret << 8) | 103;

 #ifdef DEBUG_OUTPUT
 	// ワークアイテム数
 	printf("\nMax number of work items = %zd (256 * %zd)\n", OpenCL_group_num * 256, OpenCL_group_num);
 #endif
+	OpenCL_method &= 0xFF;	// 最後に選択設定を消去する

 	return 0;
 }
@@ -663,16 +871,24 @@ void info_OpenCL(char *buf, int buf_size)
 // ソース・ブロックをデバイス側にコピーする
 int gpu_copy_blocks(
 	unsigned char *data,	// ブロックのバッファー (境界は 4096にすること)
-	int unit_size,			// 4096の倍数にすること
+	unsigned int unit_size,	// 4096の倍数にすること
 	int src_num)			// 何ブロックをコピーするのか
 {
 	size_t data_size;
 	cl_int ret;
+	cl_mem_flags flags;

 	// Integrated GPU と Discrete GPU の違いに関係なく、使う分だけ毎回メモリー領域を確保する
 	data_size = (size_t)unit_size * src_num;
-	// Intel GPUならZeroCopyできる、GeForce GPUでもメモリー消費量が少なくてコピーが速い
-	OpenCL_src = gfn_clCreateBuffer(OpenCL_context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR, data_size, data, &ret);
+	if (OpenCL_method & 32){	// AMD's APU や Integrated GPU なら ZeroCopy する
+		// 実際に比較してみると GeForce GPU でもメモリー消費量が少なくてコピーが速い
+		// NVIDIA GPU は CL_MEM_USE_HOST_PTR でも VRAM 上にキャッシュするので速いらしい
+		flags = CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR;
+	} else {	// Discrete GPU ならデータを VRAM にコピーする
+		// AMD GPU は明示的にコピーするよう指定しないといけない
+		flags = CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR;
+	}
+	OpenCL_src = gfn_clCreateBuffer(OpenCL_context, flags, data_size, data, &ret);
 	if (ret != CL_SUCCESS)
 		return (ret << 8) | 1;
 #ifdef DEBUG_OUTPUT
@@ -691,17 +907,31 @@ int gpu_copy_blocks(
 int gpu_multiply_blocks(
 	int src_num,			// Number of multiplying source blocks
 	unsigned short *mat,	// Matrix of numbers to multiply by
+	unsigned short *mat2,	// Set to calculate 2 blocks at once
 	unsigned char *buf,		// Products go here
-	int len)				// Byte length
+	unsigned int len)		// Byte length
 {
 	unsigned __int64 *vram, *src, *dst;
 	size_t global_size, local_size;
 	cl_int ret;

 	// 倍率の配列をデバイス側に書き込む
-	ret = gfn_clEnqueueWriteBuffer(OpenCL_command, OpenCL_buf, CL_FALSE, 0, sizeof(short) * src_num, mat, 0, NULL, NULL);
+	if (mat2 == NULL){	// 1ブロック分だけコピーする
+		ret = gfn_clEnqueueWriteBuffer(OpenCL_command, OpenCL_buf, CL_FALSE, 0, sizeof(short) * src_num, mat, 0, NULL, NULL);
+	} else {	// 2ブロックずつ計算する場合は、配列のサイズも２倍になる
+		if ((size_t)mat2 == 1){	// アドレスが 1 になることはあり得ないので、識別できる
+			ret = gfn_clEnqueueWriteBuffer(OpenCL_command, OpenCL_buf, CL_FALSE, 0, sizeof(short) * src_num * 2, mat, 0, NULL, NULL);
+		} else {	// 2回コピーする
+			size_t data_size = sizeof(short) * src_num;
+			ret = gfn_clEnqueueWriteBuffer(OpenCL_command, OpenCL_buf, CL_FALSE, 0, data_size, mat, 0, NULL, NULL);
+			if (ret != CL_SUCCESS)
+				return (ret << 8) | 10;
+			// もう一つの配列は違う場所からコピーする
+			ret = gfn_clEnqueueWriteBuffer(OpenCL_command, OpenCL_buf, CL_FALSE, data_size, data_size, mat2, 0, NULL, NULL);
+		}
+	}
 	if (ret != CL_SUCCESS)
-		return (ret << 8) | 10;
+		return (ret << 8) | 11;

 	// 引数を指定する
 	ret = gfn_clSetKernelArg(OpenCL_kernel, 3, sizeof(int), &src_num);
@@ -709,17 +939,17 @@ int gpu_multiply_blocks(
 		return (ret << 8) | 103;

 	// カーネル並列実行
-	local_size = 256;	// テーブルやキャッシュのため、work item 数は 256に固定する
+	local_size = 256;	// テーブルやキャッシュのため、work item 数は 256 に固定する
 	global_size = OpenCL_group_num * 256;
-	//printf("group num = %d, global size = %d, local size = 256 \n", OpenCL_group_num, global_size);
+	//printf("group num = %d, global size = %d, local size = %d \n", OpenCL_group_num, global_size, local_size);
 	ret = gfn_clEnqueueNDRangeKernel(OpenCL_command, OpenCL_kernel, 1, NULL, &global_size, &local_size, 0, NULL, NULL);
 	if (ret != CL_SUCCESS)
-		return (ret << 8) | 11;
+		return (ret << 8) | 12;

 	// 出力内容をホスト側に反映させる
 	vram = gfn_clEnqueueMapBuffer(OpenCL_command, OpenCL_dst, CL_TRUE, CL_MAP_READ, 0, len, 0, NULL, NULL, &ret);
 	if (ret != CL_SUCCESS)
-		return (ret << 8) | 12;
+		return (ret << 8) | 13;

 	// 8バイトごとに XOR する (SSE2 で XOR しても速くならず)
 	src = vram;
@@ -734,7 +964,7 @@ int gpu_multiply_blocks(
 	// ホスト側でデータを変更しなくても、clEnqueueMapBufferと対で呼び出さないといけない
 	ret = gfn_clEnqueueUnmapMemObject(OpenCL_command, OpenCL_dst, vram, 0, NULL, NULL);
 	if (ret != CL_SUCCESS)
-		return (ret << 8) | 13;
+		return (ret << 8) | 14;

 	return 0;
 }
@@ -747,12 +977,12 @@ int gpu_finish(void)
 	// 全ての処理が終わるのを待つ
 	ret = gfn_clFinish(OpenCL_command);
 	if (ret != CL_SUCCESS)
-		return (ret << 8) | 20;
+		return (ret << 8) | 30;

 	if (OpenCL_src != NULL){	// 確保されてる場合は解除する
 		ret = gfn_clReleaseMemObject(OpenCL_src);
 		if (ret != CL_SUCCESS)
-			return (ret << 8) | 21;
+			return (ret << 8) | 31;
 		OpenCL_src = NULL;
 	}

--- a/source/par2j/lib_opencl.h
+++ b/source/par2j/lib_opencl.h
@@ -10,20 +10,21 @@ extern "C" {

 extern int OpenCL_method;

-int init_OpenCL(int unit_size, int *src_max);
+int init_OpenCL(unsigned int unit_size, int *src_max);
 int free_OpenCL(void);
 void info_OpenCL(char *buf, int buf_size);

 int gpu_copy_blocks(
 	unsigned char *data,
-	int unit_size,
+	unsigned int unit_size,
 	int src_num);

 int gpu_multiply_blocks(
 	int src_num,			// Number of multiplying source blocks
 	unsigned short *mat,	// Matrix of numbers to multiply by
+	unsigned short *mat2,	// Set to calculate 2 blocks at once
 	unsigned char *buf,		// Products go here
-	int len);				// Byte length
+	unsigned int len);		// Byte length

 int gpu_finish(void);

--- a/source/par2j/list.c
+++ b/source/par2j/list.c
@@ -1,5 +1,5 @@
 // list.c
-// Copyright : 2023-10-15 Yutaka Sawada
+// Copyright : 2023-12-12 Yutaka Sawada
 // License : GPL

 #ifndef _UNICODE
@@ -26,6 +26,11 @@

 //#define TIMER // 実験用

+#ifdef TIMER
+#include <time.h>
+static double time_sec, time_speed;
+#endif
+
 // recovery set のファイルのハッシュ値を調べる (空のファイルは除く)
 // 0x00 = ファイルが存在して完全である
 // 0x01 = ファイルが存在しない
@@ -296,7 +301,7 @@ int check_file_complete(
 {
 	int i, rv;
 #ifdef TIMER
-unsigned int time_start = GetTickCount();
+clock_t time_start = clock();
 #endif

 	printf("\nVerifying Input File   :\n");
@@ -332,14 +337,14 @@ unsigned int time_start = GetTickCount();
 	}

 #ifdef TIMER
-time_start = GetTickCount() - time_start;
-printf("\n hash %d.%03d sec", time_start / 1000, time_start % 1000);
-if (time_start > 0){
-	time_start = (int)((total_file_size * 125) / ((__int64)time_start * 131072));
-	printf(", %d MB/s\n", time_start);
+time_start = clock() - time_start;
+time_sec = (double)time_start / CLOCKS_PER_SEC;
+if (time_sec > 0){
+	time_speed = (double)total_file_size / (time_sec * 1048576);
 } else {
-	printf("\n");
+	time_speed = 0;
 }
+printf("\n hash %.3f sec, %.0f MB/s\n", time_sec, time_speed);
 #endif
 	return 0;
 }
@@ -364,7 +369,7 @@ int check_file_complete_multi(
 	HANDLE hSub[MAX_READ_NUM];
 	FILE_CHECK_TH th[MAX_READ_NUM];
 #ifdef TIMER
-unsigned int time_start = GetTickCount();
+clock_t time_start = clock();
 #endif

 	memset(hSub, 0, sizeof(HANDLE) * MAX_READ_NUM);
@@ -630,14 +635,14 @@ unsigned int time_start = GetTickCount();
 	}

 #ifdef TIMER
-time_start = GetTickCount() - time_start;
-printf("\n hash %d.%03d sec", time_start / 1000, time_start % 1000);
-if (time_start > 0){
-	time_start = (int)((total_file_size * 125) / ((__int64)time_start * 131072));
-	printf(", %d MB/s\n", time_start);
+time_start = clock() - time_start;
+time_sec = (double)time_start / CLOCKS_PER_SEC;
+if (time_sec > 0){
+	time_speed = (double)total_file_size / (time_sec * 1048576);
 } else {
-	printf("\n");
+	time_speed = 0;
 }
+printf("\n hash %.3f sec, %.0f MB/s\n", time_sec, time_speed);
 #endif

 error_end:
--- a/source/par2j/md5_crc.c
+++ b/source/par2j/md5_crc.c
@@ -1,5 +1,5 @@
 // md5_crc.c
-// Copyright : 2023-10-29 Yutaka Sawada
+// Copyright : 2023-12-12 Yutaka Sawada
 // License : GPL

 #ifndef _UNICODE
@@ -21,7 +21,6 @@
 #include "phmd5.h"
 #include "md5_crc.h"

-
 /* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */

 // バイト配列の MD5 ハッシュ値を求める
@@ -200,8 +199,10 @@ int file_md5_crc32_block(
 //#define TIMER // 実験用

 #ifdef TIMER
-static unsigned int time_start, time1_start;
-static unsigned int time_total = 0, time2_total = 0, time3_total = 0;
+#include <time.h>
+static double time_sec, time_speed;
+static clock_t time_start, time1_start;
+static clock_t time_total = 0, time2_total = 0, time3_total = 0;
 #endif

 #define MAX_BUF_SIZE	2097152	// ヒープ領域を使う場合の最大サイズ
@@ -224,7 +225,7 @@ int file_hash_crc(
 	HANDLE hFile;
 	OVERLAPPED ol;
 #ifdef TIMER
-time1_start = GetTickCount();
+time1_start = clock();
 #endif

 	// ソース・ファイルを開く
@@ -251,11 +252,11 @@ time1_start = GetTickCount();
 	if (file_left < IO_SIZE)
 		read_size = (unsigned int)file_left;
 #ifdef TIMER
-time_start = GetTickCount();
+time_start = clock();
 #endif
 	off = ReadFile(hFile, buf1, read_size, NULL, &ol);
 #ifdef TIMER
-time2_total += GetTickCount() - time_start;
+time2_total += clock() - time_start;
 #endif
 	if ((off == 0) && (GetLastError() != ERROR_IO_PENDING)){
 		print_win32_err();
@@ -281,11 +282,11 @@ time2_total += GetTickCount() - time_start;
 			ol.OffsetHigh = (unsigned int)(file_off >> 32);
 			file_off += IO_SIZE;
 #ifdef TIMER
-time_start = GetTickCount();
+time_start = clock();
 #endif
 			off = ReadFile(hFile, buf, read_size, NULL, &ol);
 #ifdef TIMER
-time2_total += GetTickCount() - time_start;
+time2_total += clock() - time_start;
 #endif
 			if ((off == 0) && (GetLastError() != ERROR_IO_PENDING)){
 				print_win32_err();
@@ -301,7 +302,7 @@ time2_total += GetTickCount() - time_start;
 		}

 #ifdef TIMER
-time_start = GetTickCount();
+time_start = clock();
 #endif
 		off = 0;	// チェックサム計算
 		if (block_left > 0){	// 前回足りなかった分を追加する
@@ -338,7 +339,7 @@ time_start = GetTickCount();
 			}
 		}
 #ifdef TIMER
-time3_total += GetTickCount() - time_start;
+time3_total += clock() - time_start;
 #endif

 		// 経過表示
@@ -369,16 +370,17 @@ error_end:
 		CloseHandle(ol.hEvent);

 #ifdef TIMER
-time_total += GetTickCount() - time1_start;
+time_total += clock() - time1_start;
 if (*prog_now == total_file_size){
-	printf("\nread  %d.%03d sec\n", time2_total / 1000, time2_total % 1000);
-	printf("main  %d.%03d sec\n", time3_total / 1000, time3_total % 1000);
-	if (time_total > 0){
-		time_start = (int)((total_file_size * 125) / ((__int64)time_total * 131072));
+	printf("\nread  %.3f sec\n", (double)time2_total / CLOCKS_PER_SEC);
+	printf("main  %.3f sec\n", (double)time3_total / CLOCKS_PER_SEC);
+	time_sec = (double)time_total / CLOCKS_PER_SEC;
+	if (time_sec > 0){
+		time_speed = (double)total_file_size / (time_sec * 1048576);
 	} else {
-		time_start = 0;
+		time_speed = 0;
 	}
-	printf("total %d.%03d sec, %d MB/s\n", time_total / 1000, time_total % 1000, time_start);
+	printf("total %.3f sec, %.0f MB/s\n", time_sec, time_speed);
 }
 #endif
 	return err;
@@ -403,7 +405,7 @@ int file_hash_crc(
 	HANDLE hFile;
 	OVERLAPPED ol;
 #ifdef TIMER
-time1_start = GetTickCount();
+time1_start = clock();
 #endif

 	// ソース・ファイルを開く
@@ -442,11 +444,11 @@ error_retry_read:
 	if (file_left < IO_SIZE)
 		read_size = (unsigned int)file_left;
 #ifdef TIMER
-time_start = GetTickCount();
+time_start = clock();
 #endif
 	off = ReadFile(hFile, buf1, read_size, NULL, &ol);
 #ifdef TIMER
-time2_total += GetTickCount() - time_start;
+time2_total += clock() - time_start;
 #endif
 	if ((off == 0) && (GetLastError() != ERROR_IO_PENDING)){
 		print_win32_err();
@@ -536,11 +538,11 @@ error_retry_pause:
 			ol.OffsetHigh = (unsigned int)(file_off >> 32);
 			file_off += IO_SIZE;
 #ifdef TIMER
-time_start = GetTickCount();
+time_start = clock();
 #endif
 			off = ReadFile(hFile, buf, read_size, NULL, &ol);
 #ifdef TIMER
-time2_total += GetTickCount() - time_start;
+time2_total += clock() - time_start;
 #endif
 			if ((off == 0) && (GetLastError() != ERROR_IO_PENDING)){
 				print_win32_err();
@@ -557,7 +559,7 @@ time2_total += GetTickCount() - time_start;
 		}

 #ifdef TIMER
-time_start = GetTickCount();
+time_start = clock();
 #endif
 		off = 0;	// チェックサム計算
 		if (block_left > 0){	// 前回足りなかった分を追加する
@@ -594,7 +596,7 @@ time_start = GetTickCount();
 			}
 		}
 #ifdef TIMER
-time3_total += GetTickCount() - time_start;
+time3_total += clock() - time_start;
 #endif

 		// 経過表示
@@ -625,16 +627,17 @@ error_end:
 		CloseHandle(ol.hEvent);

 #ifdef TIMER
-time_total += GetTickCount() - time1_start;
+time_total += clock() - time1_start;
 if (*prog_now == total_file_size){
-	printf("\nread  %d.%03d sec\n", time2_total / 1000, time2_total % 1000);
-	printf("main  %d.%03d sec\n", time3_total / 1000, time3_total % 1000);
-	if (time_total > 0){
-		time_start = (int)((total_file_size * 125) / ((__int64)time_total * 131072));
+	printf("\nread  %.3f sec\n", (double)time2_total / CLOCKS_PER_SEC);
+	printf("main  %.3f sec\n", (double)time3_total / CLOCKS_PER_SEC);
+	time_sec = (double)time_total / CLOCKS_PER_SEC;
+	if (time_sec > 0){
+		time_speed = (double)total_file_size / (time_sec * 1048576);
 	} else {
-		time_start = 0;
+		time_speed = 0;
 	}
-	printf("total %d.%03d sec, %d MB/s\n", time_total / 1000, time_total % 1000, time_start);
+	printf("total %.3f sec, %.0f MB/s\n", time_sec, time_speed);
 }
 #endif
 	return err;
@@ -660,7 +663,7 @@ int file_hash_crc(
 	HANDLE hFile;
 	OVERLAPPED ol;
 #ifdef TIMER
-time1_start = GetTickCount();
+time1_start = clock();
 #endif

 	// ソース・ファイルを開く
@@ -699,11 +702,11 @@ time1_start = GetTickCount();
 	if (file_left < io_size)
 		read_size = (unsigned int)file_left;
 #ifdef TIMER
-time_start = GetTickCount();
+time_start = clock();
 #endif
 	off = ReadFile(hFile, buf1, read_size, NULL, &ol);
 #ifdef TIMER
-time2_total += GetTickCount() - time_start;
+time2_total += clock() - time_start;
 #endif
 	if ((off == 0) && (GetLastError() != ERROR_IO_PENDING)){
 		print_win32_err();
@@ -729,11 +732,11 @@ time2_total += GetTickCount() - time_start;
 			ol.OffsetHigh = (unsigned int)(file_off >> 32);
 			file_off += io_size;
 #ifdef TIMER
-time_start = GetTickCount();
+time_start = clock();
 #endif
 			off = ReadFile(hFile, buf, read_size, NULL, &ol);
 #ifdef TIMER
-time2_total += GetTickCount() - time_start;
+time2_total += clock() - time_start;
 #endif
 			if ((off == 0) && (GetLastError() != ERROR_IO_PENDING)){
 				print_win32_err();
@@ -749,7 +752,7 @@ time2_total += GetTickCount() - time_start;
 		}

 #ifdef TIMER
-time_start = GetTickCount();
+time_start = clock();
 #endif
 		off = 0;	// チェックサム計算
 		if (block_left > 0){	// 前回足りなかった分を追加する
@@ -786,7 +789,7 @@ time_start = GetTickCount();
 			}
 		}
 #ifdef TIMER
-time3_total += GetTickCount() - time_start;
+time3_total += clock() - time_start;
 #endif

 		// 経過表示
@@ -819,16 +822,17 @@ error_end:
 		_aligned_free(buf1);

 #ifdef TIMER
-time_total += GetTickCount() - time1_start;
+time_total += clock() - time1_start;
 if (*prog_now == total_file_size){
-	printf("\nread  %d.%03d sec\n", time2_total / 1000, time2_total % 1000);
-	printf("main  %d.%03d sec\n", time3_total / 1000, time3_total % 1000);
-	if (time_total > 0){
-		time_start = (int)((total_file_size * 125) / ((__int64)time_total * 131072));
+	printf("\nread  %.3f sec\n", (double)time2_total / CLOCKS_PER_SEC);
+	printf("main  %.3f sec\n", (double)time3_total / CLOCKS_PER_SEC);
+	time_sec = (double)time_total / CLOCKS_PER_SEC;
+	if (time_sec > 0){
+		time_speed = (double)total_file_size / (time_sec * 1048576);
 	} else {
-		time_start = 0;
+		time_speed = 0;
 	}
-	printf("total %d.%03d sec, %d MB/s\n", time_total / 1000, time_total % 1000, time_start);
+	printf("total %.3f sec, %.0f MB/s\n", time_sec, time_speed);
 }
 #endif
 	return err;
@@ -1038,7 +1042,7 @@ int file_hash_check(
 	PHMD5 hash_ctx, block_ctx;
 	OVERLAPPED ol;
 #ifdef TIMER
-time1_start = GetTickCount();
+time1_start = clock();
 #endif

 	prog_last = -1;	// 検証中のファイル名を毎回表示する
@@ -1062,11 +1066,11 @@ time1_start = GetTickCount();
 		file_left = file_size - 16384;	// 本来のファイル・サイズまでしか検査しない
 	}
 #ifdef TIMER
-time_start = GetTickCount();
+time_start = clock();
 #endif
 	off = ReadFile(hFile, buf, len, NULL, &ol);
 #ifdef TIMER
-time2_total += GetTickCount() - time_start;
+time2_total += clock() - time_start;
 #endif
 	if ((off == 0) && (GetLastError() != ERROR_IO_PENDING)){
 		print_win32_err();
@@ -1141,11 +1145,11 @@ time2_total += GetTickCount() - time_start;
 	if (file_left < IO_SIZE)
 		read_size = (unsigned int)file_left;
 #ifdef TIMER
-time_start = GetTickCount();
+time_start = clock();
 #endif
 	off = ReadFile(hFile, buf1, read_size, NULL, &ol);
 #ifdef TIMER
-time2_total += GetTickCount() - time_start;
+time2_total += clock() - time_start;
 #endif
 	if ((off == 0) && (GetLastError() != ERROR_IO_PENDING)){
 		print_win32_err();
@@ -1168,11 +1172,11 @@ time2_total += GetTickCount() - time_start;
 			ol.OffsetHigh = (unsigned int)(file_off >> 32);
 			file_off += IO_SIZE;
 #ifdef TIMER
-time_start = GetTickCount();
+time_start = clock();
 #endif
 			off = ReadFile(hFile, buf, read_size, NULL, &ol);
 #ifdef TIMER
-time2_total += GetTickCount() - time_start;
+time2_total += clock() - time_start;
 #endif
 			if ((off == 0) && (GetLastError() != ERROR_IO_PENDING)){
 				print_win32_err();
@@ -1187,7 +1191,7 @@ time2_total += GetTickCount() - time_start;
 		}

 #ifdef TIMER
-time_start = GetTickCount();
+time_start = clock();
 #endif
 		if (s_blk != NULL){
 			off = 0;
@@ -1230,7 +1234,7 @@ time_start = GetTickCount();
 			Phmd5Process(&hash_ctx, buf, len);	// MD5 計算
 		}
 #ifdef TIMER
-time3_total += GetTickCount() - time_start;
+time3_total += clock() - time_start;
 #endif

 		// 経過表示
@@ -1267,15 +1271,16 @@ error_end:
 		CloseHandle(ol.hEvent);

 #ifdef TIMER
-time_total += GetTickCount() - time1_start;
-	printf("\nread  %d.%03d sec\n", time2_total / 1000, time2_total % 1000);
-	printf("main  %d.%03d sec\n", time3_total / 1000, time3_total % 1000);
-	if (time_total > 0){
-		time_start = (int)((file_size * 125) / ((__int64)time_total * 131072));
+time_total += clock() - time1_start;
+	printf("\nread  %.3f sec\n", (double)time2_total / CLOCKS_PER_SEC);
+	printf("main  %.3f sec\n", (double)time3_total / CLOCKS_PER_SEC);
+	time_sec = (double)time_total / CLOCKS_PER_SEC;
+	if (time_sec > 0){
+		time_speed = (double)file_size / (time_sec * 1048576);
 	} else {
-		time_start = 0;
+		time_speed = 0;
 	}
-	printf("total %d.%03d sec, %d MB/s\n", time_total / 1000, time_total % 1000, time_start);
+	printf("total %.3f sec, %.0f MB/s\n", time_sec, time_speed);
 #endif
 	return comp_num;
 }
@@ -1536,7 +1541,7 @@ int file_hash_direct(
 	HANDLE hFile;
 	OVERLAPPED ol;
 #ifdef TIMER
-time1_start = GetTickCount();
+time1_start = clock();
 #endif

 	prog_last = -1;	// 検証中のファイル名を毎回表示する
@@ -1592,11 +1597,11 @@ time1_start = GetTickCount();
 		file_left = file_size - 16384;	// 本来のファイル・サイズまでしか検査しない
 	}
 #ifdef TIMER
-time_start = GetTickCount();
+time_start = clock();
 #endif
 	off = ReadFile(hFile, buf, read_size, NULL, &ol);
 #ifdef TIMER
-time2_total += GetTickCount() - time_start;
+time2_total += clock() - time_start;
 #endif
 	if ((off == 0) && (GetLastError() != ERROR_IO_PENDING)){
 		comp_num = -1;
@@ -1679,11 +1684,11 @@ time2_total += GetTickCount() - time_start;
 		read_size = (read_size + 4095) & ~4095;	// 4KB の倍数にする
 	}
 #ifdef TIMER
-time_start = GetTickCount();
+time_start = clock();
 #endif
 	off = ReadFile(hFile, buf1, read_size, NULL, &ol);
 #ifdef TIMER
-time2_total += GetTickCount() - time_start;
+time2_total += clock() - time_start;
 #endif
 	if ((off == 0) && (GetLastError() != ERROR_IO_PENDING)){
 		print_win32_err();
@@ -1710,11 +1715,11 @@ time2_total += GetTickCount() - time_start;
 			ol.OffsetHigh = (unsigned int)(file_off >> 32);
 			file_off += IO_SIZE;
 #ifdef TIMER
-time_start = GetTickCount();
+time_start = clock();
 #endif
 			off = ReadFile(hFile, buf, read_size, NULL, &ol);
 #ifdef TIMER
-time2_total += GetTickCount() - time_start;
+time2_total += clock() - time_start;
 #endif
 			if ((off == 0) && (GetLastError() != ERROR_IO_PENDING)){
 				print_win32_err();
@@ -1729,7 +1734,7 @@ time2_total += GetTickCount() - time_start;
 		}

 #ifdef TIMER
-time_start = GetTickCount();
+time_start = clock();
 #endif
 		if (s_blk != NULL){
 			off = 0;
@@ -1771,7 +1776,7 @@ time_start = GetTickCount();
 			Phmd5Process(&hash_ctx, buf, len);	// MD5 計算
 		}
 #ifdef TIMER
-time3_total += GetTickCount() - time_start;
+time3_total += clock() - time_start;
 #endif

 		// 経過表示
@@ -1812,10 +1817,16 @@ error_end:
 		_aligned_free(buf1);

 #ifdef TIMER
-time_total += GetTickCount() - time1_start;
-	printf("\nread  %d.%03d sec\n", time2_total / 1000, time2_total % 1000);
-	printf("main  %d.%03d sec\n", time3_total / 1000, time3_total % 1000);
-	printf("total %d.%03d sec\n", time_total / 1000, time_total % 1000);
+time_total += clock() - time1_start;
+	printf("\nread  %.3f sec\n", (double)time2_total / CLOCKS_PER_SEC);
+	printf("main  %.3f sec\n", (double)time3_total / CLOCKS_PER_SEC);
+	time_sec = (double)time_total / CLOCKS_PER_SEC;
+	if (time_sec > 0){
+		time_speed = (double)file_size / (time_sec * 1048576);
+	} else {
+		time_speed = 0;
+	}
+	printf("total %.3f sec, %.0f MB/s\n", time_sec, time_speed);
 #endif
 	return comp_num;
 }
--- a/source/par2j/par2_cmd.c
+++ b/source/par2j/par2_cmd.c
@@ -1,5 +1,5 @@
 // par2_cmd.c
-// Copyright : 2023-10-15 Yutaka Sawada
+// Copyright : 2023-12-09 Yutaka Sawada
 // License : GPL

 #ifndef _UNICODE
@@ -1479,14 +1479,12 @@ ri= switch_set & 0x00040000
 			} else if (wcsncmp(tmp_p, L"lc", 2) == 0){
 				k = 0;
 				j = 2;
-				while ((j < 2 + 5) && (tmp_p[j] >= '0') && (tmp_p[j] <= '9')){
+				while ((j < 2 + 7) && (tmp_p[j] >= '0') && (tmp_p[j] <= '9')){
 					k = (k * 10) + (tmp_p[j] - '0');
 					j++;
 				}
-				if (k & 256){	// GPU を使う
-					OpenCL_method = 1;	// Faster GPU
-				} else if (k & 512){
-					OpenCL_method = -1;	// Slower GPU
+				if (k & 0x300){	// GPU を使う
+					OpenCL_method = k & 0x003F0300;
 				}
 				if (k & 1024)	// CLMUL と ALTMAP を使わない
 					cpu_flag = (cpu_flag & 0xFFFFFFF7) | 256;
--- a/source/par2j/reedsolomon.c
+++ b/source/par2j/reedsolomon.c
@@ -1,5 +1,5 @@
 // reedsolomon.c
-// Copyright : 2023-10-26 Yutaka Sawada
+// Copyright : 2023-12-12 Yutaka Sawada
 // License : GPL

 #ifndef _UNICODE
@@ -27,6 +27,9 @@
 #include "rs_decode.h"
 #include "reedsolomon.h"

+#ifdef TIMER
+#include <time.h>
+#endif

 // GPU を使う最小データサイズ (MB 単位)
 // GPU の起動には時間がかかるので、データが小さすぎると逆に遅くなる
@@ -739,7 +742,7 @@ int rs_encode(
 	int err = 0;
 	unsigned int len;
 #ifdef TIMER
-unsigned int time_total = GetTickCount();
+clock_t time_total = clock();
 #endif

 	if (galois_create_table()){
@@ -755,7 +758,7 @@ unsigned int time_total = GetTickCount();
 	// パリティ計算用の行列演算の準備をする
 	len = sizeof(unsigned short) * source_num;
 	if (OpenCL_method != 0)
-		len *= 2;	// GPU の作業領域も確保しておく
+		len *= 3;	// GPU の作業領域も確保しておく
 	constant = malloc(len);
 	if (constant == NULL){
 		printf("malloc, %d\n", len);
@@ -799,8 +802,8 @@ unsigned int time_total = GetTickCount();
 		err = encode_method2(file_path, header_buf, rcv_hFile, files, s_blk, p_blk, constant);
 #ifdef TIMER
 	if (err != 1){
-		time_total = GetTickCount() - time_total;
-		printf("total  %d.%03d sec\n", time_total / 1000, time_total % 1000);
+		time_total = clock() - time_total;
+		printf("total  %.3f sec\n", (double)time_total / CLOCKS_PER_SEC);
 	}
 #endif

@@ -830,7 +833,7 @@ int rs_encode_1pass(
 	int err = 0;
 	unsigned int len;
 #ifdef TIMER
-unsigned int time_total = GetTickCount();
+clock_t time_total = clock();
 #endif

 	if (galois_create_table()){
@@ -841,7 +844,7 @@ unsigned int time_total = GetTickCount();
 	// パリティ計算用の行列演算の準備をする
 	len = sizeof(unsigned short) * source_num;
 	if (OpenCL_method != 0)
-		len *= 2;	// GPU の作業領域も確保しておく
+		len *= 3;	// GPU の作業領域も確保しておく
 	constant = malloc(len);
 	if (constant == NULL){
 		printf("malloc, %d\n", len);
@@ -888,8 +891,8 @@ unsigned int time_total = GetTickCount();
 	if (err < 0){
 		printf("switching to 2-pass processing, %d\n", err);
 	} else if (err != 1){
-		time_total = GetTickCount() - time_total;
-		printf("total  %d.%03d sec\n", time_total / 1000, time_total % 1000);
+		time_total = clock() - time_total;
+		printf("total  %.3f sec\n", (double)time_total / CLOCKS_PER_SEC);
 	}
 #endif

@@ -913,7 +916,7 @@ int rs_decode(
 	int err = 0, i, j, k;
 	unsigned int len;
 #ifdef TIMER
-unsigned int time_matrix = 0, time_total = GetTickCount();
+clock_t time_matrix = 0, time_total = clock();
 #endif

 	if (galois_create_table()){
@@ -948,7 +951,7 @@ unsigned int time_matrix = 0, time_total = GetTickCount();
 	id = mat + (block_lost * source_num);

 #ifdef TIMER
-time_matrix = GetTickCount();
+time_matrix = clock();
 #endif
 	// 復元用の行列を計算する
 	print_progress_text(0, "Computing matrix");
@@ -989,7 +992,7 @@ time_matrix = GetTickCount();
 	//for (i = 0; i < block_lost; i++)
 	//	printf("id[%d] = %d\n", i, id[i]);
 #ifdef TIMER
-time_matrix = GetTickCount() - time_matrix;
+time_matrix = clock() - time_matrix;
 #endif

 #ifdef TIMER
@@ -1032,9 +1035,9 @@ time_matrix = GetTickCount() - time_matrix;
 		err = decode_method2(file_path, block_lost, rcv_hFile, files, s_blk, p_blk, mat);
 #ifdef TIMER
 	if (err != 1){
-		time_total = GetTickCount() - time_total;
-		printf("total  %d.%03d sec\n", time_total / 1000, time_total % 1000);
-		printf("matrix %d.%03d sec\n", time_matrix / 1000, time_matrix % 1000);
+		time_total = clock() - time_total;
+		printf("total  %.3f sec\n", (double)time_total / CLOCKS_PER_SEC);
+		printf("matrix %.3f sec\n", (double)time_matrix / CLOCKS_PER_SEC);
 	}
 #endif

--- a/source/par2j/res_par2j.rc
+++ b/source/par2j/res_par2j.rc
@@ -1,7 +1,7 @@
 1 RT_STRING ".\\source.cl"

 1 VERSIONINFO
-FILEVERSION 1,3,3,1
+FILEVERSION 1,3,3,3
 PRODUCTVERSION 1,3,3,0
 FILEOS 0x40004
 FILETYPE 0x1
@@ -11,9 +11,9 @@ BLOCK "StringFileInfo"
 	BLOCK "040904B0"
 	{
 		VALUE "FileDescription", "PAR2 client"
-		VALUE "LegalCopyright", "Copyright (C) 2023 Yutaka Sawada"
+		VALUE "LegalCopyright", "Copyright (C) 2024 Yutaka Sawada"
 		VALUE "ProductName", "par2j"
-		VALUE "FileVersion", "1.3.3.1"
+		VALUE "FileVersion", "1.3.3.3"
 		VALUE "ProductVersion", "1.3.3.0"
 	}
 }
--- a/source/par2j/rs_decode.c
+++ b/source/par2j/rs_decode.c
@@ -1,5 +1,5 @@
 // rs_decode.c
-// Copyright : 2023-10-29 Yutaka Sawada
+// Copyright : 2023-12-13 Yutaka Sawada
 // License : GPL

 #ifndef _UNICODE
@@ -28,7 +28,9 @@


 #ifdef TIMER
-static unsigned int time_start, time_read = 0, time_write = 0, time_calc = 0;
+#include <time.h>
+static double time_sec, time_speed;
+static clock_t time_start, time_read = 0, time_write = 0, time_calc = 0;
 static unsigned int read_count, write_count = 0, skip_count;
 #endif

@@ -60,7 +62,7 @@ static DWORD WINAPI thread_decode2(LPVOID lpParameter)
 	RS_TH *th;
 #ifdef TIMER
 unsigned int loop_count2a = 0, loop_count2b = 0;
-unsigned int time_start2, time_encode2a = 0, time_encode2b = 0;
+clock_t time_start2, time_encode2a = 0, time_encode2b = 0;
 #endif

 	th = (RS_TH *)lpParameter;
@@ -78,7 +80,7 @@ unsigned int time_start2, time_encode2a = 0, time_encode2b = 0;
 	WaitForSingleObject(hRun, INFINITE);	// 計算開始の合図を待つ
 	while (th->now < INT_MAX / 2){
 #ifdef TIMER
-time_start2 = GetTickCount();
+time_start2 = clock();
 #endif
 		s_buf = th->buf;
 		factor = th->mat;
@@ -95,7 +97,7 @@ loop_count2a++;
 #endif
 			}
 #ifdef TIMER
-time_encode2a += GetTickCount() - time_start2;
+time_encode2a += clock() - time_start2;
 #endif
 		} else {	// 消失ブロックを部分的に保持する場合
 			// スレッドごとに復元する消失ブロックの chunk を変える
@@ -136,7 +138,7 @@ loop_count2b += src_num;
 #endif
 			}
 #ifdef TIMER
-time_encode2b += GetTickCount() - time_start2;
+time_encode2b += clock() - time_start2;
 #endif
 		}
 		//_mm_sfence();	// メモリーへの書き込みを完了する
@@ -146,19 +148,21 @@ time_encode2b += GetTickCount() - time_start2;
 #ifdef TIMER
 loop_count2b /= chunk_num;	// chunk数で割ってブロック数にする
 printf("sub-thread : total loop = %d\n", loop_count2a + loop_count2b);
-if (time_encode2a > 0){
-	i = (int)((__int64)loop_count2a * unit_size * 125 / ((__int64)time_encode2a * 131072));
+time_sec = (double)time_encode2a / CLOCKS_PER_SEC;
+if (time_sec > 0){
+	time_speed = ((double)loop_count2a * unit_size) / (time_sec * 1048576);
 } else {
-	i = 0;
+	time_speed = 0;
 }
 if (loop_count2a > 0)
-	printf(" 1st decode %d.%03d sec, %d loop, %d MB/s\n", time_encode2a / 1000, time_encode2a % 1000, loop_count2a, i);
-if (time_encode2b > 0){
-	i = (int)((__int64)loop_count2b * unit_size * 125 / ((__int64)time_encode2b * 131072));
+	printf(" 1st decode %.3f sec, %d loop, %.0f MB/s\n", time_sec, loop_count2a, time_speed);
+time_sec = (double)time_encode2b / CLOCKS_PER_SEC;
+if (time_sec > 0){
+	time_speed = ((double)loop_count2b * unit_size) / (time_sec * 1048576);
 } else {
-	i = 0;
+	time_speed = 0;
 }
-printf(" 2nd decode %d.%03d sec, %d loop, %d MB/s\n", time_encode2b / 1000, time_encode2b % 1000, loop_count2b, i);
+printf(" 2nd decode %.3f sec, %d loop, %.0f MB/s\n", time_sec, loop_count2b, time_speed);
 #endif

 	// 終了処理
@@ -178,7 +182,7 @@ static DWORD WINAPI thread_decode3(LPVOID lpParameter)
 	RS_TH *th;
 #ifdef TIMER
 unsigned int loop_count2a = 0, loop_count2b = 0;
-unsigned int time_start2, time_encode2a = 0, time_encode2b = 0;
+clock_t time_start2, time_encode2a = 0, time_encode2b = 0;
 #endif

 	th = (RS_TH *)lpParameter;
@@ -197,7 +201,7 @@ unsigned int time_start2, time_encode2a = 0, time_encode2b = 0;
 	WaitForSingleObject(hRun, INFINITE);	// 計算開始の合図を待つ
 	while (th->now < INT_MAX / 2){
 #ifdef TIMER
-time_start2 = GetTickCount();
+time_start2 = clock();
 #endif
 		s_buf = th->buf;
 		factor = th->mat;
@@ -214,7 +218,7 @@ loop_count2a++;
 #endif
 			}
 #ifdef TIMER
-time_encode2a += GetTickCount() - time_start2;
+time_encode2a += clock() - time_start2;
 #endif
 		} else {	// 全ての消失ブロックを保持する場合
 			// スレッドごとに復元する消失ブロックの chunk を変える
@@ -250,7 +254,7 @@ loop_count2b += src_num;
 #endif
 			}
 #ifdef TIMER
-time_encode2b += GetTickCount() - time_start2;
+time_encode2b += clock() - time_start2;
 #endif
 		}
 		//_mm_sfence();	// メモリーへの書き込みを完了する
@@ -260,19 +264,21 @@ time_encode2b += GetTickCount() - time_start2;
 #ifdef TIMER
 loop_count2b /= chunk_num;	// chunk数で割ってブロック数にする
 printf("sub-thread : total loop = %d\n", loop_count2a + loop_count2b);
-if (time_encode2a > 0){
-	i = (int)((__int64)loop_count2a * unit_size * 125 / ((__int64)time_encode2a * 131072));
+time_sec = (double)time_encode2a / CLOCKS_PER_SEC;
+if (time_sec > 0){
+	time_speed = ((double)loop_count2a * unit_size) / (time_sec * 1048576);
 } else {
-	i = 0;
+	time_speed = 0;
 }
 if (loop_count2a > 0)
-	printf(" 1st decode %d.%03d sec, %d loop, %d MB/s\n", time_encode2a / 1000, time_encode2a % 1000, loop_count2a, i);
-if (time_encode2b > 0){
-	i = (int)((__int64)loop_count2b * unit_size * 125 / ((__int64)time_encode2b * 131072));
+	printf(" 1st decode %.3f sec, %d loop, %.0f MB/s\n", time_sec, loop_count2a, time_speed);
+time_sec = (double)time_encode2b / CLOCKS_PER_SEC;
+if (time_sec > 0){
+	time_speed = ((double)loop_count2b * unit_size) / (time_sec * 1048576);
 } else {
-	i = 0;
+	time_speed = 0;
 }
-printf(" 2nd decode %d.%03d sec, %d loop, %d MB/s\n", time_encode2b / 1000, time_encode2b % 1000, loop_count2b, i);
+printf(" 2nd decode %.3f sec, %d loop, %.0f MB/s\n", time_sec, loop_count2b, time_speed);
 #endif

 	// 終了処理
@@ -292,7 +298,8 @@ static DWORD WINAPI thread_decode_gpu(LPVOID lpParameter)
 	HANDLE hRun, hEnd;
 	RS_TH *th;
 #ifdef TIMER
-unsigned int time_start2, time_encode2 = 0, loop_count2 = 0;
+unsigned int loop_count2 = 0;
+clock_t time_start2, time_encode2 = 0;
 #endif

 	th = (RS_TH *)lpParameter;
@@ -307,7 +314,7 @@ unsigned int time_start2, time_encode2 = 0, loop_count2 = 0;
 	WaitForSingleObject(hRun, INFINITE);	// 計算開始の合図を待つ
 	while (th->now < INT_MAX / 2){
 #ifdef TIMER
-time_start2 = GetTickCount();
+time_start2 = clock();
 #endif
 		// GPUはソース・ブロック読み込み中に呼ばれない
 		s_buf = th->buf;
@@ -321,22 +328,58 @@ time_start2 = GetTickCount();
 			InterlockedExchange(&(th->now), INT_MAX / 3);	// サブ・スレッドの計算を中断する
 		}

-		// スレッドごとに復元する消失ブロックを変える
-		while ((j = InterlockedIncrement(&(th->now))) < block_lost){	// j = ++th_now
-			// 倍率は逆行列から部分的にコピーする
-			i = gpu_multiply_blocks(src_num, factor + source_num * j, g_buf + (size_t)unit_size * j, unit_size);
-			if (i != 0){
-				th->len = i;
-				InterlockedExchange(&(th->now), INT_MAX / 3);	// サブ・スレッドの計算を中断する
-				break;
-			}
+		// 一つの GPUスレッドが全ての消失ブロックを処理する
+		if (OpenCL_method & 8){	// ２ブロックずつ計算する
+			// 消失ブロック数が奇数なら、最初の一個だけ別に計算する
+			if (block_lost & 1){
+				InterlockedIncrement(&(th->now));	// 常に j = 0 となる
+
+				// 倍率は逆行列から部分的にコピーする
+				i = gpu_multiply_blocks(src_num, factor, NULL, g_buf, unit_size);
+				if (i != 0){
+					th->len = i;
+					InterlockedExchange(&(th->now), INT_MAX / 3);	// サブ・スレッドの計算を中断する
+					break;
+				}

 #ifdef TIMER
 loop_count2 += src_num;
 #endif
+			}
+
+			// 残りのブロックは二個ずつ計算する
+			while ((j = InterlockedAdd(&(th->now), 2)) < block_lost){	// th_now += 2, j = th_now
+				j--;	// +2 してるから、最初のブロックは -1 する
+
+				// 倍率は逆行列から部分的に２回コピーする
+				i = gpu_multiply_blocks(src_num, factor + source_num * j, factor + source_num * (j + 1), g_buf + (size_t)unit_size * j, unit_size * 2);
+				if (i != 0){
+					th->len = i;
+					InterlockedExchange(&(th->now), INT_MAX / 3);	// サブ・スレッドの計算を中断する
+					break;
+				}
+#ifdef TIMER
+loop_count2 += src_num * 2;
+#endif
+			}
+
+		} else {	// 以前からの１ブロックずつ計算する方式
+			while ((j = InterlockedIncrement(&(th->now))) < block_lost){	// j = ++th_now
+				// 倍率は逆行列から部分的にコピーする（２ブロックずつの場合はブロック数をマイナスにする）
+				i = gpu_multiply_blocks(src_num, factor + source_num * j, NULL, g_buf + (size_t)unit_size * j, unit_size);
+				if (i != 0){
+					th->len = i;
+					InterlockedExchange(&(th->now), INT_MAX / 3);	// サブ・スレッドの計算を中断する
+					break;
+				}
+
+#ifdef TIMER
+loop_count2 += src_num;
+#endif
+			}
 		}
 #ifdef TIMER
-time_encode2 += GetTickCount() - time_start2;
+time_encode2 += clock() - time_start2;
 #endif
 		// 最後にVRAMを解放する
 		i = gpu_finish();
@@ -349,12 +392,13 @@ time_encode2 += GetTickCount() - time_start2;
 	}
 #ifdef TIMER
 printf("gpu-thread :\n");
-if (time_encode2 > 0){
-	i = (int)((__int64)loop_count2 * unit_size * 125 / ((__int64)time_encode2 * 131072));
+time_sec = (double)time_encode2 / CLOCKS_PER_SEC;
+if (time_sec > 0){
+	time_speed = ((double)loop_count2 * unit_size) / (time_sec * 1048576);
 } else {
-	i = 0;
+	time_speed = 0;
 }
-printf(" 2nd decode %d.%03d sec, %d loop, %d MB/s\n", time_encode2 / 1000, time_encode2 % 1000, loop_count2, i);
+printf(" 2nd decode %.3f sec, %d loop, %.0f MB/s\n", time_sec, loop_count2, time_speed);
 #endif

 	// 終了処理
@@ -430,7 +474,7 @@ int decode_method1(	// ソース・ブロックが一個だけの場合
 	block_off = 0;
 	while (block_off < block_size){
 #ifdef TIMER
-time_start = GetTickCount();
+time_start = clock();
 #endif
 		// パリティ・ブロックを読み込む
 		len = block_size - block_off;
@@ -447,18 +491,18 @@ time_start = GetTickCount();
 		// パリティ・ブロックのチェックサムを計算する
 		checksum16_altmap(buf, buf + io_size, io_size);
 #ifdef TIMER
-time_read += GetTickCount() - time_start;
+time_read += clock() - time_start;
 #endif

 #ifdef TIMER
-time_start = GetTickCount();
+time_start = clock();
 #endif
 		// 失われたソース・ブロックを復元する
 		memset(work_buf, 0, unit_size);
 		// factor で割ると元に戻る
 		galois_align_multiply(buf, work_buf, unit_size, galois_divide(1, galois_power(2, id)));
 #ifdef TIMER
-time_calc += GetTickCount() - time_start;
+time_calc += clock() - time_start;
 #endif

 		// 経過表示
@@ -472,7 +516,7 @@ time_calc += GetTickCount() - time_start;
 		}

 #ifdef TIMER
-time_start = GetTickCount();
+time_start = clock();
 #endif
 		// 復元されたソース・ブロックのチェックサムを検証する
 		checksum16_return(work_buf, hash, io_size);
@@ -491,7 +535,7 @@ time_start = GetTickCount();
 			goto error_end;
 		}
 #ifdef TIMER
-time_write += GetTickCount() - time_start;
+time_write += clock() - time_start;
 #endif

 		block_off += io_size;
@@ -499,9 +543,9 @@ time_write += GetTickCount() - time_start;
 	print_progress_done();	// 末尾ブロックの断片化によっては 100% で完了するとは限らない

 #ifdef TIMER
-printf("read   %d.%03d sec\n", time_read / 1000, time_read % 1000);
-printf("write  %d.%03d sec\n", time_write / 1000, time_write % 1000);
-printf("decode %d.%03d sec\n", time_calc / 1000, time_calc % 1000);
+printf("read   %.3f sec\n", (double)time_read / CLOCKS_PER_SEC);
+printf("write  %.3f sec\n", (double)time_write / CLOCKS_PER_SEC);
+printf("decode %.3f sec\n", (double)time_calc / CLOCKS_PER_SEC);
 #endif

 error_end:
@@ -623,7 +667,7 @@ int decode_method2(	// ソース・データを全て読み込む場合
 #ifdef TIMER
 read_count = 0;
 skip_count = 0;
-time_start = GetTickCount();
+time_start = clock();
 #endif
 		last_file = -1;
 		recv_now = 0;	// 何番目の代替ブロックか
@@ -760,7 +804,7 @@ skip_count++;
 			hFile = NULL;
 		}
 #ifdef TIMER
-time_read += GetTickCount() - time_start;
+time_read += clock() - time_start;
 #endif

 		WaitForMultipleObjects(cpu_num1, hEnd, TRUE, INFINITE);	// サブ・スレッドの計算終了の合図を待つ
@@ -845,7 +889,7 @@ skip_count++;
 			}

 #ifdef TIMER
-time_start = GetTickCount();
+time_start = clock();
 #endif
 			// 復元されたブロックを書き込む
 			work_buf = p_buf;
@@ -916,7 +960,7 @@ write_count++;
 				}
 			}
 #ifdef TIMER
-time_write += GetTickCount() - time_start;
+time_write += clock() - time_start;
 #endif

 			part_off += part_num;	// 次の消失ブロック位置にする
@@ -930,9 +974,9 @@ time_write += GetTickCount() - time_start;
 	print_progress_done();

 #ifdef TIMER
-printf("read   %d.%03d sec\n", time_read / 1000, time_read % 1000);
+printf("read   %.3f sec\n", (double)time_read / CLOCKS_PER_SEC);
 j = ((block_size + io_size - 1) / io_size) * block_lost;
-printf("write  %d.%03d sec, count = %d/%d\n", time_write / 1000, time_write % 1000, write_count, j);
+printf("write  %.3f sec, count = %d/%d\n", (double)time_write / CLOCKS_PER_SEC, write_count, j);
 if (prog_num != prog_base)
 	printf(" prog_num = %I64d, prog_base = %I64d\n", prog_num, prog_base);
 #endif
@@ -1063,7 +1107,7 @@ int decode_method3(	// 復元するブロックを全て保持できる場合

 #ifdef TIMER
 read_count = 0;
-time_start = GetTickCount();
+time_start = clock();
 #endif
 		last_file = -1;
 		for (i = 0; i < read_num; i++){	// スライスを一個ずつ読み込んでメモリー上に配置していく
@@ -1173,7 +1217,7 @@ read_count++;
 			hFile = NULL;
 		}
 #ifdef TIMER
-time_read += GetTickCount() - time_start;
+time_read += clock() - time_start;
 #endif

 		WaitForMultipleObjects(cpu_num1, hEnd, TRUE, INFINITE);	// サブ・スレッドの計算終了の合図を待つ
@@ -1238,7 +1282,7 @@ time_read += GetTickCount() - time_start;
 	}

 #ifdef TIMER
-time_start = GetTickCount();
+time_start = clock();
 #endif
 	// 復元されたブロックを書き込む
 	work_buf = p_buf;
@@ -1297,7 +1341,7 @@ time_start = GetTickCount();
 		}
 	}
 #ifdef TIMER
-time_write += GetTickCount() - time_start;
+time_write += clock() - time_start;
 #endif
 	// 最後の書き込みファイルを閉じる
 	CloseHandle(hFile);
@@ -1305,8 +1349,8 @@ time_write += GetTickCount() - time_start;
 	print_progress_done();

 #ifdef TIMER
-printf("read   %d.%03d sec\n", time_read / 1000, time_read % 1000);
-printf("write  %d.%03d sec\n", time_write / 1000, time_write % 1000);
+printf("read   %.3f sec\n", (double)time_read / CLOCKS_PER_SEC);
+printf("write  %.3f sec\n", (double)time_write / CLOCKS_PER_SEC);
 if (prog_num != prog_base)
 	printf(" prog_num = %I64d, prog_base = %I64d\n", prog_num, prog_base);
 #endif
@@ -1463,7 +1507,7 @@ int decode_method4(	// 全てのブロックを断片的に保持する場合 (G
 #ifdef TIMER
 read_count = 0;
 skip_count = 0;
-time_start = GetTickCount();
+time_start = clock();
 #endif
 		last_file = -1;
 		recv_now = 0;	// 何番目の代替ブロックか
@@ -1600,7 +1644,7 @@ skip_count++;
 			hFile = NULL;
 		}
 #ifdef TIMER
-time_read += GetTickCount() - time_start;
+time_read += clock() - time_start;
 #endif

 		memset(g_buf, 0, (size_t)unit_size * block_lost);	// 待機中に GPU用の領域をゼロ埋めしておく
@@ -1731,10 +1775,21 @@ skip_count++;
 #endif
 				} else if (src_off + src_num + src_max > source_num){
 					src_num = source_num - src_off - src_max;
-					if ((src_num < src_max) && (src_num + src_max <= vram_max) && (gpu_end * 2 > cpu_end)){
-						src_num += src_max;	// GPU担当量が少なくて、余裕がある場合は、残りも全て任せる
+					if (src_num < src_max){
+						if ((src_num + src_max <= vram_max) && (gpu_end * 2 > cpu_end)){
+							src_num += src_max;	// GPU担当量が少なくて、余裕がある場合は、残りも全て任せる
+#ifdef TIMER
+							printf("GPU last +: src_off = %d, src_num = %d + %d\n", src_off, src_num - src_max, src_max);
+#endif
+						} else if (src_num < src_max / 4){
+							src_num = src_max / 4;	// src_num が小さくなり過ぎないようにする
+#ifdef TIMER
+							printf("GPU last ?: src_off = %d, src_num = %d\n", src_off, src_num);
+						} else {
+							printf("GPU last -: src_off = %d, src_num = %d\n", src_off, src_num);
+#endif
+						}
 #ifdef TIMER
-						printf("GPU last +: src_off = %d, src_num = %d + %d\n", src_off, src_num - src_max, src_max);
 					} else {
 						printf("GPU last 2: src_off = %d, src_num = %d\n", src_off, src_num);
 #endif
@@ -1834,7 +1889,7 @@ skip_count++;
 			prog_num += th->size * block_lost;

 #ifdef TIMER
-time_start = GetTickCount();
+time_start = clock();
 #endif
 		// 復元されたブロックを書き込む
 		work_buf = p_buf;
@@ -1907,7 +1962,7 @@ write_count++;
 			}
 		}
 #ifdef TIMER
-time_write += GetTickCount() - time_start;
+time_write += clock() - time_start;
 #endif

 		block_off += io_size;
@@ -1918,9 +1973,9 @@ time_write += GetTickCount() - time_start;
 	print_progress_done();

 #ifdef TIMER
-printf("read   %d.%03d sec\n", time_read / 1000, time_read % 1000);
+printf("read   %.3f sec\n", (double)time_read / CLOCKS_PER_SEC);
 j = ((block_size + io_size - 1) / io_size) * block_lost;
-printf("write  %d.%03d sec, count = %d/%d\n", time_write / 1000, time_write % 1000, write_count, j);
+printf("write  %.3f sec, count = %d/%d\n", (double)time_write / CLOCKS_PER_SEC, write_count, j);
 if (prog_num != prog_base)
 	printf(" prog_num = %I64d, prog_base = %I64d\n", prog_num, prog_base);
 #endif
@@ -2085,7 +2140,7 @@ int decode_method5(	// 復元するブロックだけ保持する場合 (GPU対

 #ifdef TIMER
 read_count = 0;
-time_start = GetTickCount();
+time_start = clock();
 #endif
 		last_file = -1;
 		for (i = 0; i < read_num; i++){	// スライスを一個ずつ読み込んでメモリー上に配置していく
@@ -2195,7 +2250,7 @@ read_count++;
 			hFile = NULL;
 		}
 #ifdef TIMER
-time_read += GetTickCount() - time_start;
+time_read += clock() - time_start;
 #endif

 		if (source_off == 0)
@@ -2318,10 +2373,21 @@ time_read += GetTickCount() - time_start;
 #endif
 				} else if (src_off + src_num + src_max > read_num){
 					src_num = read_num - src_off - src_max;
-					if ((src_num < src_max) && (src_num + src_max <= vram_max) && (gpu_end * 2 > cpu_end)){
-						src_num += src_max;	// GPU担当量が少なくて、余裕がある場合は、残りも全て任せる
+					if (src_num < src_max){
+						if ((src_num + src_max <= vram_max) && (gpu_end * 2 > cpu_end)){
+							src_num += src_max;	// GPU担当量が少なくて、余裕がある場合は、残りも全て任せる
+#ifdef TIMER
+							printf("GPU last +: src_off = %d, src_num = %d + %d\n", src_off, src_num - src_max, src_max);
+#endif
+						} else if (src_num < src_max / 4){
+							src_num = src_max / 4;	// src_num が小さくなり過ぎないようにする
+#ifdef TIMER
+							printf("GPU last ?: src_off = %d, src_num = %d\n", src_off, src_num);
+						} else {
+							printf("GPU last -: src_off = %d, src_num = %d\n", src_off, src_num);
+#endif
+						}
 #ifdef TIMER
-						printf("GPU last +: src_off = %d, src_num = %d + %d\n", src_off, src_num - src_max, src_max);
 					} else {
 						printf("GPU last 2: src_off = %d, src_num = %d\n", src_off, src_num);
 #endif
@@ -2424,7 +2490,7 @@ time_read += GetTickCount() - time_start;
 	}

 #ifdef TIMER
-time_start = GetTickCount();
+time_start = clock();
 #endif
 	// 復元されたブロックを書き込む
 	work_buf = p_buf;
@@ -2485,7 +2551,7 @@ time_start = GetTickCount();
 		}
 	}
 #ifdef TIMER
-time_write += GetTickCount() - time_start;
+time_write += clock() - time_start;
 #endif
 	// 最後の書き込みファイルを閉じる
 	CloseHandle(hFile);
@@ -2493,8 +2559,8 @@ time_write += GetTickCount() - time_start;
 	print_progress_done();

 #ifdef TIMER
-printf("read   %d.%03d sec\n", time_read / 1000, time_read % 1000);
-printf("write  %d.%03d sec\n", time_write / 1000, time_write % 1000);
+printf("read   %.3f sec\n", (double)time_read / CLOCKS_PER_SEC);
+printf("write  %.3f sec\n", (double)time_write / CLOCKS_PER_SEC);
 if (prog_num != prog_base)
 	printf(" prog_num = %I64d, prog_base = %I64d\n", prog_num, prog_base);
 #endif
--- a/source/par2j/rs_encode.c
+++ b/source/par2j/rs_encode.c
@@ -1,5 +1,5 @@
 // rs_encode.c
-// Copyright : 2023-10-29 Yutaka Sawada
+// Copyright : 2023-12-18 Yutaka Sawada
 // License : GPL

 #ifndef _UNICODE
@@ -29,7 +29,9 @@


 #ifdef TIMER
-static unsigned int time_start, time_read = 0, time_write = 0, time_calc = 0;
+#include <time.h>
+static double time_sec, time_speed;
+static clock_t time_start, time_read = 0, time_write = 0, time_calc = 0;
 static unsigned int read_count, skip_count;
 #endif

@@ -61,7 +63,7 @@ static DWORD WINAPI thread_encode2(LPVOID lpParameter)
 	RS_TH *th;
 #ifdef TIMER
 unsigned int loop_count2a = 0, loop_count2b = 0;
-unsigned int time_start2, time_encode2a = 0, time_encode2b = 0;
+clock_t time_start2, time_encode2a = 0, time_encode2b = 0;
 #endif

 	th = (RS_TH *)lpParameter;
@@ -80,7 +82,7 @@ unsigned int time_start2, time_encode2a = 0, time_encode2b = 0;
 	WaitForSingleObject(hRun, INFINITE);	// 計算開始の合図を待つ
 	while (th->now < INT_MAX / 2){
 #ifdef TIMER
-time_start2 = GetTickCount();
+time_start2 = clock();
 #endif
 		s_buf = th->buf;
 		src_off = th->off;	// ソース・ブロック番号
@@ -98,7 +100,7 @@ loop_count2a++;
 			}

 #ifdef TIMER
-time_encode2a += GetTickCount() - time_start2;
+time_encode2a += clock() - time_start2;
 #endif
 		} else {	// パリティ・ブロックを部分的に保持する場合
 			// スレッドごとに作成するパリティ・ブロックの chunk を変える
@@ -143,7 +145,7 @@ loop_count2b += src_num;
 #endif
 			}
 #ifdef TIMER
-time_encode2b += GetTickCount() - time_start2;
+time_encode2b += clock() - time_start2;
 #endif
 		}
 		//_mm_sfence();	// メモリーへの書き込みを完了する
@@ -153,19 +155,21 @@ time_encode2b += GetTickCount() - time_start2;
 #ifdef TIMER
 loop_count2b /= chunk_num;	// chunk数で割ってブロック数にする
 printf("sub-thread : total loop = %d\n", loop_count2a + loop_count2b);
-if (time_encode2a > 0){
-	i = (int)((__int64)loop_count2a * unit_size * 125 / ((__int64)time_encode2a * 131072));
+time_sec = (double)time_encode2a / CLOCKS_PER_SEC;
+if (time_sec > 0){
+	time_speed = ((double)loop_count2a * unit_size) / (time_sec * 1048576);
 } else {
-	i = 0;
+	time_speed = 0;
 }
 if (loop_count2a > 0)
-	printf(" 1st encode %d.%03d sec, %d loop, %d MB/s\n", time_encode2a / 1000, time_encode2a % 1000, loop_count2a, i);
-if (time_encode2b > 0){
-	i = (int)((__int64)loop_count2b * unit_size * 125 / ((__int64)time_encode2b * 131072));
+	printf(" 1st encode %.3f sec, %d loop, %.0f MB/s\n", time_sec, loop_count2a, time_speed);
+time_sec = (double)time_encode2b / CLOCKS_PER_SEC;
+if (time_sec > 0){
+	time_speed = ((double)loop_count2b * unit_size) / (time_sec * 1048576);
 } else {
-	i = 0;
+	time_speed = 0;
 }
-printf(" 2nd encode %d.%03d sec, %d loop, %d MB/s\n", time_encode2b / 1000, time_encode2b % 1000, loop_count2b, i);
+printf(" 2nd encode %.3f sec, %d loop, %.0f MB/s\n", time_sec, loop_count2b, time_speed);
 #endif

 	// 終了処理
@@ -185,7 +189,7 @@ static DWORD WINAPI thread_encode3(LPVOID lpParameter)
 	RS_TH *th;
 #ifdef TIMER
 unsigned int loop_count2a = 0, loop_count2b = 0;
-unsigned int time_start2, time_encode2a = 0, time_encode2b = 0;
+clock_t time_start2, time_encode2a = 0, time_encode2b = 0;
 #endif

 	th = (RS_TH *)lpParameter;
@@ -204,7 +208,7 @@ unsigned int time_start2, time_encode2a = 0, time_encode2b = 0;
 	WaitForSingleObject(hRun, INFINITE);	// 計算開始の合図を待つ
 	while (th->now < INT_MAX / 2){
 #ifdef TIMER
-time_start2 = GetTickCount();
+time_start2 = clock();
 #endif
 		s_buf = th->buf;
 		src_off = th->off;	// ソース・ブロック番号
@@ -221,7 +225,7 @@ loop_count2a++;
 #endif
 			}
 #ifdef TIMER
-time_encode2a += GetTickCount() - time_start2;
+time_encode2a += clock() - time_start2;
 #endif
 		} else {	// 全てのパリティ・ブロックを保持する場合
 			// スレッドごとに作成するパリティ・ブロックの chunk を変える
@@ -261,7 +265,7 @@ loop_count2b += src_num;
 #endif
 			}
 #ifdef TIMER
-time_encode2b += GetTickCount() - time_start2;
+time_encode2b += clock() - time_start2;
 #endif
 		}
 		//_mm_sfence();	// メモリーへの書き込みを完了する
@@ -271,19 +275,21 @@ time_encode2b += GetTickCount() - time_start2;
 #ifdef TIMER
 loop_count2b /= chunk_num;	// chunk数で割ってブロック数にする
 printf("sub-thread : total loop = %d\n", loop_count2a + loop_count2b);
-if (time_encode2a > 0){
-	i = (int)((__int64)loop_count2a * unit_size * 125 / ((__int64)time_encode2a * 131072));
+time_sec = (double)time_encode2a / CLOCKS_PER_SEC;
+if (time_sec > 0){
+	time_speed = ((double)loop_count2a * unit_size) / (time_sec * 1048576);
 } else {
-	i = 0;
+	time_speed = 0;
 }
 if (loop_count2a > 0)
-	printf(" 1st encode %d.%03d sec, %d loop, %d MB/s\n", time_encode2a / 1000, time_encode2a % 1000, loop_count2a, i);
-if (time_encode2b > 0){
-	i = (int)((__int64)loop_count2b * unit_size * 125 / ((__int64)time_encode2b * 131072));
+	printf(" 1st encode %.3f sec, %d loop, %.0f MB/s\n", time_sec, loop_count2a, time_speed);
+time_sec = (double)time_encode2b / CLOCKS_PER_SEC;
+if (time_sec > 0){
+	time_speed = ((double)loop_count2b * unit_size) / (time_sec * 1048576);
 } else {
-	i = 0;
+	time_speed = 0;
 }
-printf(" 2nd encode %d.%03d sec, %d loop, %d MB/s\n", time_encode2b / 1000, time_encode2b % 1000, loop_count2b, i);
+printf(" 2nd encode %.3f sec, %d loop, %.0f MB/s\n", time_sec, loop_count2b, time_speed);
 #endif

 	// 終了処理
@@ -303,7 +309,8 @@ static DWORD WINAPI thread_encode_gpu(LPVOID lpParameter)
 	HANDLE hRun, hEnd;
 	RS_TH *th;
 #ifdef TIMER
-unsigned int time_start2, time_encode2 = 0, loop_count2 = 0;
+unsigned int loop_count2 = 0;
+clock_t time_start2, time_encode2 = 0;
 #endif

 	th = (RS_TH *)lpParameter;
@@ -320,7 +327,7 @@ unsigned int time_start2, time_encode2 = 0, loop_count2 = 0;
 	WaitForSingleObject(hRun, INFINITE);	// 計算開始の合図を待つ
 	while (th->now < INT_MAX / 2){
 #ifdef TIMER
-time_start2 = GetTickCount();
+time_start2 = clock();
 #endif
 		// GPUはソース・ブロック読み込み中に呼ばれない
 		s_buf = th->buf;
@@ -335,24 +342,71 @@ time_start2 = GetTickCount();
 		}

 		// 一つの GPUスレッドが全てのパリティ・ブロックを処理する
-		while ((j = InterlockedIncrement(&(th->now))) < parity_num){	// j = ++th_now
-			// factor は定数行列の乗数になる
-			for (i = 0; i < src_num; i++)
-				factor[i] = galois_power(constant[src_off + i], first_num + j);
+		if (OpenCL_method & 8){	// ２ブロックずつ計算する
+			// パリティ・ブロック数が奇数なら、最初の一個だけ別に計算する
+			if (parity_num & 1){
+				InterlockedIncrement(&(th->now));	// 常に j = 0 となる

-			// VRAM上のソース・ブロックごとにパリティを追加していく
-			i = gpu_multiply_blocks(src_num, factor, g_buf + (size_t)unit_size * j, unit_size);
-			if (i != 0){
-				th->len = i;
-				InterlockedExchange(&(th->now), INT_MAX / 3);	// サブ・スレッドの計算を中断する
-				break;
-			}
+				// factor は定数行列の乗数になる
+				for (i = 0; i < src_num; i++)
+					factor[i] = galois_power(constant[src_off + i], first_num);
+
+				// VRAM上のソース・ブロックごとにパリティを追加していく
+				i = gpu_multiply_blocks(src_num, factor, NULL, g_buf, unit_size);
+				if (i != 0){
+					th->len = i;
+					InterlockedExchange(&(th->now), INT_MAX / 3);	// サブ・スレッドの計算を中断する
+					break;
+				}
 #ifdef TIMER
 loop_count2 += src_num;
 #endif
-		}
+			}
+
+			// 残りのブロックは二個ずつ計算する
+			while ((j = InterlockedAdd(&(th->now), 2)) < parity_num){	// th_now += 2, j = th_now
+				j--;	// +2 してるから、最初のブロックは -1 する
+
+				// factor は定数行列の乗数になる
+				for (i = 0; i < src_num; i++){
+					int c = constant[src_off + i];	// 同じ定数だけど、何乗するかが異なる
+					factor[i] = galois_power(c, first_num + j);
+					factor[src_num + i] = galois_power(c, first_num + j + 1);
+				}
+
+				// VRAM上のソース・ブロックごとにパリティを追加していく
+				i = gpu_multiply_blocks(src_num, factor, (void *)1, g_buf + (size_t)unit_size * j, unit_size * 2);
+				if (i != 0){
+					th->len = i;
+					InterlockedExchange(&(th->now), INT_MAX / 3);	// サブ・スレッドの計算を中断する
+					break;
+				}
 #ifdef TIMER
-time_encode2 += GetTickCount() - time_start2;
+loop_count2 += src_num * 2;
+#endif
+			}
+
+		} else {	// 以前からの１ブロックずつ計算する方式
+			while ((j = InterlockedIncrement(&(th->now))) < parity_num){	// j = ++th_now
+				// factor は定数行列の乗数になる
+				for (i = 0; i < src_num; i++)
+					factor[i] = galois_power(constant[src_off + i], first_num + j);
+
+				// VRAM上のソース・ブロックごとにパリティを追加していく
+				i = gpu_multiply_blocks(src_num, factor, NULL, g_buf + (size_t)unit_size * j, unit_size);
+				if (i != 0){
+					th->len = i;
+					InterlockedExchange(&(th->now), INT_MAX / 3);	// サブ・スレッドの計算を中断する
+					break;
+				}
+#ifdef TIMER
+loop_count2 += src_num;
+#endif
+			}
+		}
+
+#ifdef TIMER
+time_encode2 += clock() - time_start2;
 #endif
 		// 最後にVRAMを解放する
 		i = gpu_finish();
@@ -365,12 +419,13 @@ time_encode2 += GetTickCount() - time_start2;
 	}
 #ifdef TIMER
 printf("gpu-thread :\n");
-if (time_encode2 > 0){
-	i = (int)((__int64)loop_count2 * unit_size * 125 / ((__int64)time_encode2 * 131072));
+time_sec = (double)time_encode2 / CLOCKS_PER_SEC;
+if (time_sec > 0){
+	time_speed = ((double)loop_count2 * unit_size) / (time_sec * 1048576);
 } else {
-	i = 0;
+	time_speed = 0;
 }
-printf(" 2nd encode %d.%03d sec, %d loop, %d MB/s\n", time_encode2 / 1000, time_encode2 % 1000, loop_count2, i);
+printf(" 2nd encode %.3f sec, %d loop, %.0f MB/s\n", time_sec, loop_count2, time_speed);
 #endif

 	// 終了処理
@@ -452,7 +507,7 @@ int encode_method1(	// ソース・ブロックが一個だけの場合
 	block_off = 0;
 	while (block_off < block_size){
 #ifdef TIMER
-time_start = GetTickCount();
+time_start = clock();
 #endif
 		// ソース・ブロックを読み込む
 		len = s_blk[0].size - block_off;
@@ -469,7 +524,7 @@ time_start = GetTickCount();
 		s_blk[0].crc = crc_update(s_blk[0].crc, buf, len);	// without pad
 		checksum16_altmap(buf, buf + io_size, io_size);
 #ifdef TIMER
-time_read += GetTickCount() - time_start;
+time_read += clock() - time_start;
 #endif

 		// リカバリ・ファイルに書き込むサイズ
@@ -482,13 +537,13 @@ time_read += GetTickCount() - time_start;
 		// パリティ・ブロックごとに
 		for (i = 0; i < parity_num; i++){
 #ifdef TIMER
-time_start = GetTickCount();
+time_start = clock();
 #endif
 			memset(work_buf, 0, unit_size);
 			// factor は 2の乗数になる
 			galois_align_multiply(buf, work_buf, unit_size, galois_power(2, first_num + i));
 #ifdef TIMER
-time_calc += GetTickCount() - time_start;
+time_calc += clock() - time_start;
 #endif

 			// 経過表示
@@ -502,7 +557,7 @@ time_calc += GetTickCount() - time_start;
 			}

 #ifdef TIMER
-time_start = GetTickCount();
+time_start = clock();
 #endif
 			// パリティ・ブロックのチェックサムを検証する
 			checksum16_return(work_buf, hash, io_size);
@@ -535,7 +590,7 @@ time_start = GetTickCount();
 				goto error_end;
 			}
 #ifdef TIMER
-time_write += GetTickCount() - time_start;
+time_write += clock() - time_start;
 #endif
 		}

@@ -565,7 +620,7 @@ time_write += GetTickCount() - time_start;
 		}

 #ifdef TIMER
-time_start = GetTickCount();
+time_start = clock();
 #endif
 		// 最後に Recovery Slice packet のヘッダーを書き込む
 		for (i = 0; i < parity_num; i++){
@@ -581,14 +636,14 @@ time_start = GetTickCount();
 			}
 		}
 #ifdef TIMER
-time_write += GetTickCount() - time_start;
+time_write += clock() - time_start;
 #endif
 	}

 #ifdef TIMER
-printf("read   %d.%03d sec\n", time_read / 1000, time_read % 1000);
-printf("write  %d.%03d sec\n", time_write / 1000, time_write % 1000);
-printf("encode %d.%03d sec\n", time_calc / 1000, time_calc % 1000);
+printf("read   %.3f sec\n", (double)time_read / CLOCKS_PER_SEC);
+printf("write  %.3f sec\n", (double)time_write / CLOCKS_PER_SEC);
+printf("encode %.3f sec\n", (double)time_calc / CLOCKS_PER_SEC);
 #endif

 error_end:
@@ -729,7 +784,7 @@ int encode_method2(	// ソース・データを全て読み込む場合
 #ifdef TIMER
 read_count = 0;
 skip_count = 0;
-time_start = GetTickCount();
+time_start = clock();
 #endif
 		last_file = -1;
 		for (i = 0; i < source_num; i++){
@@ -830,7 +885,7 @@ skip_count++;
 		CloseHandle(hFile);
 		hFile = NULL;
 #ifdef TIMER
-time_read += GetTickCount() - time_start;
+time_read += clock() - time_start;
 #endif

 		WaitForMultipleObjects(cpu_num1, hEnd, TRUE, INFINITE);	// サブ・スレッドの計算終了の合図を待つ
@@ -930,7 +985,7 @@ skip_count++;
 			}

 #ifdef TIMER
-time_start = GetTickCount();
+time_start = clock();
 #endif
 			// パリティ・ブロックを書き込む
 			work_buf = p_buf;
@@ -979,7 +1034,7 @@ time_start = GetTickCount();
 				}
 			}
 #ifdef TIMER
-time_write += GetTickCount() - time_start;
+time_write += clock() - time_start;
 #endif

 			part_off += part_num;	// 次のパリティ位置にする
@@ -1025,7 +1080,7 @@ time_write += GetTickCount() - time_start;
 		}

 #ifdef TIMER
-time_start = GetTickCount();
+time_start = clock();
 #endif
 		// 最後に Recovery Slice packet のヘッダーを書き込む
 		for (i = 0; i < parity_num; i++){
@@ -1041,13 +1096,13 @@ time_start = GetTickCount();
 			}
 		}
 #ifdef TIMER
-time_write += GetTickCount() - time_start;
+time_write += clock() - time_start;
 #endif
 	}

 #ifdef TIMER
-printf("read   %d.%03d sec\n", time_read / 1000, time_read % 1000);
-printf("write  %d.%03d sec\n", time_write / 1000, time_write % 1000);
+printf("read   %.3f sec\n", (double)time_read / CLOCKS_PER_SEC);
+printf("write  %.3f sec\n", (double)time_write / CLOCKS_PER_SEC);
 if (prog_num != prog_base)
 	printf(" prog_num = %I64d, prog_base = %I64d\n", prog_num, prog_base);
 #endif
@@ -1186,7 +1241,7 @@ int encode_method3(	// パリティ・ブロックを全て保持して、一度
 		src_off = source_off - 1;	// まだ計算して無い印

 #ifdef TIMER
-time_start = GetTickCount();
+time_start = clock();
 #endif
 		for (i = 0; i < read_num; i++){	// スライスを一個ずつ読み込んでメモリー上に配置していく
 			// ソース・ブロックを読み込む
@@ -1318,7 +1373,7 @@ time_start = GetTickCount();
 			memcpy(common_buf + packet_off + 16, file_md_ctx.hash, 16);
 		}
 #ifdef TIMER
-time_read += GetTickCount() - time_start;
+time_read += clock() - time_start;
 #endif

 		WaitForMultipleObjects(cpu_num1, hEnd, TRUE, INFINITE);	// サブ・スレッドの計算終了の合図を待つ
@@ -1393,19 +1448,19 @@ time_read += GetTickCount() - time_start;
 	}

 #ifdef TIMER
-time_start = GetTickCount();
+time_start = clock();
 #endif
 	memcpy(common_buf + common_size, common_buf, common_size);	// 後の半分に前半のをコピーする
 	// 最後にパリティ・ブロックのチェックサムを検証して、リカバリ・ファイルに書き込む
 	err = create_recovery_file_1pass(file_path, recovery_path, packet_limit, block_distri,
 			packet_num, common_buf, common_size, footer_buf, footer_size, rcv_hFile, p_buf, NULL, unit_size);
 #ifdef TIMER
-time_write = GetTickCount() - time_start;
+time_write = clock() - time_start;
 #endif

 #ifdef TIMER
-printf("read   %d.%03d sec\n", time_read / 1000, time_read % 1000);
-printf("write  %d.%03d sec\n", time_write / 1000, time_write % 1000);
+printf("read   %.3f sec\n", (double)time_read / CLOCKS_PER_SEC);
+printf("write  %.3f sec\n", (double)time_write / CLOCKS_PER_SEC);
 if (prog_num != prog_base - prog_write * parity_num)
 	printf(" prog_num = %I64d != %I64d\n", prog_num, prog_base - prog_write * parity_num);
 #endif
@@ -1577,7 +1632,7 @@ int encode_method4(	// 全てのブロックを断片的に保持する場合 (G
 #ifdef TIMER
 read_count = 0;
 skip_count = 0;
-time_start = GetTickCount();
+time_start = clock();
 #endif
 		last_file = -1;
 		for (i = 0; i < source_num; i++){
@@ -1678,7 +1733,7 @@ skip_count++;
 		CloseHandle(hFile);
 		hFile = NULL;
 #ifdef TIMER
-time_read += GetTickCount() - time_start;
+time_read += clock() - time_start;
 #endif

 		memset(g_buf, 0, (size_t)unit_size * parity_num);	// 待機中に GPU用の領域をゼロ埋めしておく
@@ -1817,11 +1872,21 @@ skip_count++;
 #endif
 				} else if (src_off + src_num + src_max > source_num){
 					src_num = source_num - src_off - src_max;
-					// src_num が 0にならないように、src_num == src_max なら上の last1 にする
-					if ((src_num < src_max) && (src_num + src_max <= vram_max) && (gpu_end * 2 > cpu_end)){
-						src_num += src_max;	// GPU担当量が少なくて、余裕がある場合は、残りも全て任せる
+					if (src_num < src_max){
+						if ((src_num + src_max <= vram_max) && (gpu_end * 2 > cpu_end)){
+							src_num += src_max;	// GPU担当量が少なくて、余裕がある場合は、残りも全て任せる
+#ifdef TIMER
+							printf("GPU last +: src_off = %d, src_num = %d + %d\n", src_off, src_num - src_max, src_max);
+#endif
+						} else if (src_num < src_max / 4){
+							src_num = src_max / 4;	// src_num が小さくなり過ぎないようにする
+#ifdef TIMER
+							printf("GPU last ?: src_off = %d, src_num = %d\n", src_off, src_num);
+						} else {
+							printf("GPU last -: src_off = %d, src_num = %d\n", src_off, src_num);
+#endif
+						}
 #ifdef TIMER
-						printf("GPU last +: src_off = %d, src_num = %d + %d\n", src_off, src_num - src_max, src_max);
 					} else {
 						printf("GPU last 2: src_off = %d, src_num = %d\n", src_off, src_num);
 #endif
@@ -1921,7 +1986,7 @@ skip_count++;
 			prog_num += th->size * parity_num;

 #ifdef TIMER
-time_start = GetTickCount();
+time_start = clock();
 #endif
 		// パリティ・ブロックを書き込む
 		work_buf = p_buf;
@@ -1972,7 +2037,7 @@ time_start = GetTickCount();
 			}
 		}
 #ifdef TIMER
-time_write += GetTickCount() - time_start;
+time_write += clock() - time_start;
 #endif

 		block_off += io_size;
@@ -2015,7 +2080,7 @@ time_write += GetTickCount() - time_start;
 		}

 #ifdef TIMER
-time_start = GetTickCount();
+time_start = clock();
 #endif
 		// 最後に Recovery Slice packet のヘッダーを書き込む
 		for (i = 0; i < parity_num; i++){
@@ -2031,13 +2096,13 @@ time_start = GetTickCount();
 			}
 		}
 #ifdef TIMER
-time_write += GetTickCount() - time_start;
+time_write += clock() - time_start;
 #endif
 	}

 #ifdef TIMER
-printf("read   %d.%03d sec\n", time_read / 1000, time_read % 1000);
-printf("write  %d.%03d sec\n", time_write / 1000, time_write % 1000);
+printf("read   %.3f sec\n", (double)time_read / CLOCKS_PER_SEC);
+printf("write  %.3f sec\n", (double)time_write / CLOCKS_PER_SEC);
 if (prog_num != prog_base)
 	printf(" prog_num = %I64d, prog_base = %I64d\n", prog_num, prog_base);
 #endif
@@ -2210,7 +2275,7 @@ int encode_method5(	// ソース・ブロックの一部とパリティ・ブロ
 		src_off = source_off - 1;	// まだ計算して無い印

 #ifdef TIMER
-time_start = GetTickCount();
+time_start = clock();
 #endif
 		for (i = 0; i < read_num; i++){	// スライスを一個ずつ読み込んでメモリー上に配置していく
 			// ソース・ブロックを読み込む
@@ -2341,7 +2406,7 @@ time_start = GetTickCount();
 			memcpy(common_buf + packet_off + 16, file_md_ctx.hash, 16);
 		}
 #ifdef TIMER
-time_read += GetTickCount() - time_start;
+time_read += clock() - time_start;
 #endif

 		if (source_off == 0)
@@ -2462,10 +2527,21 @@ time_read += GetTickCount() - time_start;
 #endif
 				} else if (src_off + src_num + src_max > read_num){
 					src_num = read_num - src_off - src_max;
-					if ((src_num < src_max) && (src_num + src_max <= vram_max) && (gpu_end * 2 > cpu_end)){
-						src_num += src_max;	// GPU担当量が少なくて、余裕がある場合は、残りも全て任せる
+					if (src_num < src_max){
+						if ((src_num + src_max <= vram_max) && (gpu_end * 2 > cpu_end)){
+							src_num += src_max;	// GPU担当量が少なくて、余裕がある場合は、残りも全て任せる
+#ifdef TIMER
+							printf("GPU last +: src_off = %d, src_num = %d + %d\n", src_off, src_num - src_max, src_max);
+#endif
+						} else if (src_num < src_max / 4){
+							src_num = src_max / 4;	// src_num が小さくなり過ぎないようにする
+#ifdef TIMER
+							printf("GPU last ?: src_off = %d, src_num = %d\n", src_off, src_num);
+						} else {
+							printf("GPU last -: src_off = %d, src_num = %d\n", src_off, src_num);
+#endif
+						}
 #ifdef TIMER
-						printf("GPU last +: src_off = %d, src_num = %d + %d\n", src_off, src_num - src_max, src_max);
 					} else {
 						printf("GPU last 2: src_off = %d, src_num = %d\n", src_off, src_num);
 #endif
@@ -2568,19 +2644,19 @@ time_read += GetTickCount() - time_start;
 	}

 #ifdef TIMER
-time_start = GetTickCount();
+time_start = clock();
 #endif
 	memcpy(common_buf + common_size, common_buf, common_size);	// 後の半分に前半のをコピーする
 	// 最後にパリティ・ブロックのチェックサムを検証して、リカバリ・ファイルに書き込む
 	err = create_recovery_file_1pass(file_path, recovery_path, packet_limit, block_distri,
 			packet_num, common_buf, common_size, footer_buf, footer_size, rcv_hFile, p_buf, g_buf, unit_size);
 #ifdef TIMER
-time_write = GetTickCount() - time_start;
+time_write = clock() - time_start;
 #endif

 #ifdef TIMER
-printf("read   %d.%03d sec\n", time_read / 1000, time_read % 1000);
-printf("write  %d.%03d sec\n", time_write / 1000, time_write % 1000);
+printf("read   %.3f sec\n", (double)time_read / CLOCKS_PER_SEC);
+printf("write  %.3f sec\n", (double)time_write / CLOCKS_PER_SEC);
 if (prog_num != prog_base - prog_write * parity_num)
 	printf(" prog_num = %I64d != %I64d\n", prog_num, prog_base - prog_write * parity_num);
 #endif
--- a/source/par2j/source.cl
+++ b/source/par2j/source.cl
@@ -1,10 +1,11 @@
 void calc_table(__local uint *mtab, int id, int factor)
 {
-	int i, sum = 0;
+	int i, sum;

-	for (i = 0; i < 8; i++){
-		sum = (id & (1 << i)) ? (sum ^ factor) : sum;
-		factor = (factor & 0x8000) ? ((factor << 1) ^ 0x1100B) : (factor << 1);
+	sum = ((id << 31) >> 31) & factor;
+	for (i = 1; i < 8; i++){
+		factor = (factor << 1) ^ (((factor << 16) >> 31) & 0x1100B);
+		sum ^= ((id << (31 - i)) >> 31) & factor;
 	}
 	mtab[id] = sum;

@@ -14,6 +15,30 @@ void calc_table(__local uint *mtab, int id, int factor)
 	mtab[id + 256] = sum;
 }

+void calc_table2(__local uint *mtab, int id, int factor, int factor2)
+{
+	int i, sum, sum2, mask;
+
+	mask = (id << 31) >> 31;
+	sum = mask & factor;
+	sum2 = mask & factor2;
+	for (i = 1; i < 8; i++){
+		factor = (factor << 1) ^ (((factor << 16) >> 31) & 0x1100B);
+		factor2 = (factor2 << 1) ^ (((factor2 << 16) >> 31) & 0x1100B);
+		mask = (id << (31 - i)) >> 31;
+		sum ^= mask & factor;
+		sum2 ^= mask & factor2;
+	}
+	mtab[id] = sum | (sum2 << 16);
+
+	sum = (sum << 4) ^ (((sum << 16) >> 31) & 0x88058) ^ (((sum << 17) >> 31) & 0x4402C) ^ (((sum << 18) >> 31) & 0x22016) ^ (((sum << 19) >> 31) & 0x1100B);
+	sum = (sum << 4) ^ (((sum << 16) >> 31) & 0x88058) ^ (((sum << 17) >> 31) & 0x4402C) ^ (((sum << 18) >> 31) & 0x22016) ^ (((sum << 19) >> 31) & 0x1100B);
+	sum2 = (sum2 << 4) ^ (((sum2 << 16) >> 31) & 0x88058) ^ (((sum2 << 17) >> 31) & 0x4402C) ^ (((sum2 << 18) >> 31) & 0x22016) ^ (((sum2 << 19) >> 31) & 0x1100B);
+	sum2 = (sum2 << 4) ^ (((sum2 << 16) >> 31) & 0x88058) ^ (((sum2 << 17) >> 31) & 0x4402C) ^ (((sum2 << 18) >> 31) & 0x22016) ^ (((sum2 << 19) >> 31) & 0x1100B);
+
+	mtab[id + 256] = sum | (sum2 << 16);
+}
+
 __kernel void method1(
 	__global uint *src,
 	__global uint *dst,
@@ -31,6 +56,7 @@ __kernel void method1(
 		dst[i] = 0;

 	for (blk = 0; blk < blk_num; blk++){
+		barrier(CLK_LOCAL_MEM_FENCE);
 		calc_table(mtab, table_id, factors[blk]);
 		barrier(CLK_LOCAL_MEM_FENCE);

@@ -42,7 +68,6 @@ __kernel void method1(
 			dst[i] ^= sum;
 		}
 		src += BLK_SIZE;
-		barrier(CLK_LOCAL_MEM_FENCE);
 	}
 }

@@ -65,6 +90,7 @@ __kernel void method2(
 	}

 	for (blk = 0; blk < blk_num; blk++){
+		barrier(CLK_LOCAL_MEM_FENCE);
 		calc_table(mtab, table_id, factors[blk]);
 		barrier(CLK_LOCAL_MEM_FENCE);

@@ -82,11 +108,182 @@ __kernel void method2(
 			dst[pos + 4] ^= ((sum1 & 0xFF00FF00) >> 8) | (sum2 & 0xFF00FF00);
 		}
 		src += BLK_SIZE;
-		barrier(CLK_LOCAL_MEM_FENCE);
 	}
 }

 __kernel void method4(
+	__global uint4 *src,
+	__global uint4 *dst,
+	__global ushort *factors,
+	int blk_num)
+{
+	__local uint mtab[512];
+	int i, blk;
+	uchar4 r0, r1, r2, r3, r4, r5, r6, r7;
+	uchar16 lo, hi;
+	const int work_id = get_global_id(0) * 2;
+	const int work_size = get_global_size(0) * 2;
+	const int table_id = get_local_id(0);
+
+	for (i = work_id; i < BLK_SIZE / 4; i += work_size){
+		dst[i    ] = 0;
+		dst[i + 1] = 0;
+	}
+
+	for (blk = 0; blk < blk_num; blk++){
+		barrier(CLK_LOCAL_MEM_FENCE);
+		calc_table(mtab, table_id, factors[blk]);
+		barrier(CLK_LOCAL_MEM_FENCE);
+
+		for (i = work_id; i < BLK_SIZE / 4; i += work_size){
+			lo = as_uchar16(src[i    ]);
+			hi = as_uchar16(src[i + 1]);
+			r0 = (uchar4)(as_uchar2((ushort)(mtab[lo.s0] ^ mtab[256 + hi.s0])), as_uchar2((ushort)(mtab[lo.s1] ^ mtab[256 + hi.s1])));
+			r1 = (uchar4)(as_uchar2((ushort)(mtab[lo.s2] ^ mtab[256 + hi.s2])), as_uchar2((ushort)(mtab[lo.s3] ^ mtab[256 + hi.s3])));
+			r2 = (uchar4)(as_uchar2((ushort)(mtab[lo.s4] ^ mtab[256 + hi.s4])), as_uchar2((ushort)(mtab[lo.s5] ^ mtab[256 + hi.s5])));
+			r3 = (uchar4)(as_uchar2((ushort)(mtab[lo.s6] ^ mtab[256 + hi.s6])), as_uchar2((ushort)(mtab[lo.s7] ^ mtab[256 + hi.s7])));
+			r4 = (uchar4)(as_uchar2((ushort)(mtab[lo.s8] ^ mtab[256 + hi.s8])), as_uchar2((ushort)(mtab[lo.s9] ^ mtab[256 + hi.s9])));
+			r5 = (uchar4)(as_uchar2((ushort)(mtab[lo.sa] ^ mtab[256 + hi.sa])), as_uchar2((ushort)(mtab[lo.sb] ^ mtab[256 + hi.sb])));
+			r6 = (uchar4)(as_uchar2((ushort)(mtab[lo.sc] ^ mtab[256 + hi.sc])), as_uchar2((ushort)(mtab[lo.sd] ^ mtab[256 + hi.sd])));
+			r7 = (uchar4)(as_uchar2((ushort)(mtab[lo.se] ^ mtab[256 + hi.se])), as_uchar2((ushort)(mtab[lo.sf] ^ mtab[256 + hi.sf])));
+			dst[i    ] ^= as_uint4((uchar16)(r0.x, r0.z, r1.x, r1.z, r2.x, r2.z, r3.x, r3.z, r4.x, r4.z, r5.x, r5.z, r6.x, r6.z, r7.x, r7.z));
+			dst[i + 1] ^= as_uint4((uchar16)(r0.y, r0.w, r1.y, r1.w, r2.y, r2.w, r3.y, r3.w, r4.y, r4.w, r5.y, r5.w, r6.y, r6.w, r7.y, r7.w));
+		}
+		src += BLK_SIZE / 4;
+	}
+}
+
+__kernel void method9(
+	__global uint *src,
+	__global uint *dst,
+	__global ushort *factors,
+	int blk_num)
+{
+	__local uint mtab[512];
+	int i, blk;
+	uint v, sum, sum2;
+	const int work_id = get_global_id(0);
+	const int work_size = get_global_size(0);
+	const int table_id = get_local_id(0);
+
+	for (i = work_id; i < BLK_SIZE; i += work_size){
+		dst[i] = 0;
+		dst[i + BLK_SIZE] = 0;
+	}
+
+	for (blk = 0; blk < blk_num; blk++){
+		barrier(CLK_LOCAL_MEM_FENCE);
+		calc_table2(mtab, table_id, factors[blk], factors[blk_num + blk]);
+		barrier(CLK_LOCAL_MEM_FENCE);
+
+		for (i = work_id; i < BLK_SIZE; i += work_size){
+			v = src[i];
+			sum  = mtab[(uchar)v] ^ mtab[256 + (uchar)(v >> 8)];
+			sum2 = mtab[(uchar)(v >> 16)] ^ mtab[256 + (v >> 24)];
+			dst[i] ^= (sum & 0xFFFF) | (sum2 << 16);
+			dst[i + BLK_SIZE] ^= (sum >> 16) | (sum2 & 0xFFFF0000);
+		}
+		src += BLK_SIZE;
+	}
+}
+
+__kernel void method10(
+	__global uint *src,
+	__global uint *dst,
+	__global ushort *factors,
+	int blk_num)
+{
+	__local uint mtab[512];
+	int i, blk, pos;
+	uint lo, hi, t0, t1, t2, t3;
+	const int work_id = get_global_id(0) * 2;
+	const int work_size = get_global_size(0) * 2;
+	const int table_id = get_local_id(0);
+
+	for (i = work_id; i < BLK_SIZE; i += work_size){
+		dst[i    ] = 0;
+		dst[i + 1] = 0;
+		dst[i + BLK_SIZE    ] = 0;
+		dst[i + BLK_SIZE + 1] = 0;
+	}
+
+	for (blk = 0; blk < blk_num; blk++){
+		barrier(CLK_LOCAL_MEM_FENCE);
+		calc_table2(mtab, table_id, factors[blk], factors[blk_num + blk]);
+		barrier(CLK_LOCAL_MEM_FENCE);
+
+		for (i = work_id; i < BLK_SIZE; i += work_size){
+			pos = (i & ~7) + ((i & 7) >> 1);
+			lo = src[pos    ];
+			hi = src[pos + 4];
+			t0 = mtab[(uchar)lo] ^ mtab[256 + (uchar)hi];
+			t1 = mtab[(uchar)(lo >> 8)] ^ mtab[256 + (uchar)(hi >> 8)];
+			t2 = mtab[(uchar)(lo >> 16)] ^ mtab[256 + (uchar)(hi >> 16)];
+			t3 = mtab[lo >> 24] ^ mtab[256 + (hi >> 24)];
+			dst[pos    ] ^= (uchar)t0 | ((t1 << 8) & 0xFF00) | ((t2 << 16) & 0xFF0000) | (t3 << 24);
+			dst[pos + 4] ^= (uchar)(t0 >> 8) | (t1 & 0xFF00) | ((t2 << 8) & 0xFF0000) | ((t3 << 16) & 0xFF000000);
+			dst[pos + BLK_SIZE    ] ^= (uchar)(t0 >> 16) | ((t1 >> 8) & 0xFF00) | (t2 & 0xFF0000) | ((t3 << 8) & 0xFF000000);
+			dst[pos + BLK_SIZE + 4] ^= (t0 >> 24) | ((t1 >> 16) & 0xFF00) | ((t2 >> 8) & 0xFF0000) | (t3 & 0xFF000000);
+		}
+		src += BLK_SIZE;
+	}
+}
+
+__kernel void method12(
+	__global uint4 *src,
+	__global uint4 *dst,
+	__global ushort *factors,
+	int blk_num)
+{
+	__local uint mtab[512];
+	int i, blk;
+	uchar4 r0, r1, r2, r3, r4, r5, r6, r7, r8, r9, rA, rB, rC, rD, rE, rF;
+	uchar16 lo, hi;
+	const int work_id = get_global_id(0) * 2;
+	const int work_size = get_global_size(0) * 2;
+	const int table_id = get_local_id(0);
+
+	for (i = work_id; i < BLK_SIZE / 4; i += work_size){
+		dst[i    ] = 0;
+		dst[i + 1] = 0;
+		dst[i + BLK_SIZE / 4    ] = 0;
+		dst[i + BLK_SIZE / 4 + 1] = 0;
+	}
+
+	for (blk = 0; blk < blk_num; blk++){
+		barrier(CLK_LOCAL_MEM_FENCE);
+		calc_table2(mtab, table_id, factors[blk], factors[blk_num + blk]);
+		barrier(CLK_LOCAL_MEM_FENCE);
+
+		for (i = work_id; i < BLK_SIZE / 4; i += work_size){
+			lo = as_uchar16(src[i    ]);
+			hi = as_uchar16(src[i + 1]);
+			r0 = as_uchar4(mtab[lo.s0] ^ mtab[256 + hi.s0]);
+			r1 = as_uchar4(mtab[lo.s1] ^ mtab[256 + hi.s1]);
+			r2 = as_uchar4(mtab[lo.s2] ^ mtab[256 + hi.s2]);
+			r3 = as_uchar4(mtab[lo.s3] ^ mtab[256 + hi.s3]);
+			r4 = as_uchar4(mtab[lo.s4] ^ mtab[256 + hi.s4]);
+			r5 = as_uchar4(mtab[lo.s5] ^ mtab[256 + hi.s5]);
+			r6 = as_uchar4(mtab[lo.s6] ^ mtab[256 + hi.s6]);
+			r7 = as_uchar4(mtab[lo.s7] ^ mtab[256 + hi.s7]);
+			r8 = as_uchar4(mtab[lo.s8] ^ mtab[256 + hi.s8]);
+			r9 = as_uchar4(mtab[lo.s9] ^ mtab[256 + hi.s9]);
+			rA = as_uchar4(mtab[lo.sa] ^ mtab[256 + hi.sa]);
+			rB = as_uchar4(mtab[lo.sb] ^ mtab[256 + hi.sb]);
+			rC = as_uchar4(mtab[lo.sc] ^ mtab[256 + hi.sc]);
+			rD = as_uchar4(mtab[lo.sd] ^ mtab[256 + hi.sd]);
+			rE = as_uchar4(mtab[lo.se] ^ mtab[256 + hi.se]);
+			rF = as_uchar4(mtab[lo.sf] ^ mtab[256 + hi.sf]);
+			dst[i    ] ^= as_uint4((uchar16)(r0.x, r1.x, r2.x, r3.x, r4.x, r5.x, r6.x, r7.x, r8.x, r9.x, rA.x, rB.x, rC.x, rD.x, rE.x, rF.x));
+			dst[i + 1] ^= as_uint4((uchar16)(r0.y, r1.y, r2.y, r3.y, r4.y, r5.y, r6.y, r7.y, r8.y, r9.y, rA.y, rB.y, rC.y, rD.y, rE.y, rF.y));
+			dst[i + BLK_SIZE / 4    ] ^= as_uint4((uchar16)(r0.z, r1.z, r2.z, r3.z, r4.z, r5.z, r6.z, r7.z, r8.z, r9.z, rA.z, rB.z, rC.z, rD.z, rE.z, rF.z));
+			dst[i + BLK_SIZE / 4 + 1] ^= as_uint4((uchar16)(r0.w, r1.w, r2.w, r3.w, r4.w, r5.w, r6.w, r7.w, r8.w, r9.w, rA.w, rB.w, rC.w, rD.w, rE.w, rF.w));
+		}
+		src += BLK_SIZE / 4;
+	}
+}
+
+__kernel void method16(
 	__global uint *src,
 	__global uint *dst,
 	__global ushort *factors,
@@ -94,7 +291,7 @@ __kernel void method4(
 {
 	__local int table[16];
 	__local uint cache[256];
-	int i, j, blk, pos, sht, mask;
+	int i, j, blk, pos, mask, tmp;
 	uint sum;
 	const int work_id = get_global_id(0);
 	const int work_size = get_global_size(0);
@@ -104,11 +301,12 @@ __kernel void method4(

 	for (blk = 0; blk < blk_num; blk++){
 		if (get_local_id(0) == 0){
-			pos = factors[blk] << 16;
-			table[0] = pos;
+			tmp = factors[blk];
+			table[0] = tmp;
 			for (j = 1; j < 16; j++){
-				pos = (pos << 1) ^ ((pos >> 31) & 0x100B0000);
-				table[j] = pos;
+				mask = (tmp & 0x8000) ? 0x1100B : 0;
+				tmp = (tmp << 1) ^ mask;
+				table[j] = tmp;
 			}
 		}
 		barrier(CLK_LOCAL_MEM_FENCE);
@@ -119,10 +317,11 @@ __kernel void method4(
 			barrier(CLK_LOCAL_MEM_FENCE);

 			sum = 0;
-			sht = (i & 60) >> 2;
+			tmp = (i & 60) >> 2;
+			tmp = 0x8000 >> tmp;
 			pos &= ~60;
 			for (j = 15; j >= 0; j--){
-				mask = (table[j] << sht) >> 31;
+				mask = (table[j] & tmp) ? 0xFFFFFFFF : 0;
 				sum ^= mask & cache[pos];
 				pos += 4;
 			}
--- a/source/par2j/verify.c
+++ b/source/par2j/verify.c
@@ -1,5 +1,5 @@
 // verify.c
-// Copyright : 2022-10-14 Yutaka Sawada
+// Copyright : 2024-06-09 Yutaka Sawada
 // License : GPL

 #ifndef _UNICODE
@@ -1253,21 +1253,22 @@ static int search_block_slide(
 	slice_ctx *sc)
 {
 	unsigned char *buf, hash[16], hash2[16], err_mag, *short_use;
-	int i, j, find_num, find_flag, find_next, find_last, short_next;
+	int i, j, find_num, find_flag, find_next, find_last, short_next, short2_next, tmp_next;
 	int block_count, short_count, tiny_count, tiny_skip, num, i1, i2, i3, i4;
 	int *order, *index, index_shift;
 	unsigned int len, off, end_off, err_off;
 	unsigned int prev_crc, fail_count, rear_off, overlap_count;
 	unsigned int crc, *crcs, *short_crcs;
 	unsigned int time_last, time_slide;
-	__int64 file_off, file_next, short_off, fail_off;
+	__int64 file_off, file_next, short_off, short2_off, tmp_off, fail_off;

 	if (file_size + 1 < last_off + (__int64)(sc->min_size))
 		return 0;	// 小さすぎるファイルは調べない
 	find_num = 0;	// このファイル内で何ブロック見つけたか
 	find_next = -1;	// 次に見つかると予想したブロックの番号
 	find_last = -1;	// 最後に見つけたブロックの番号 (-1=不明)
-	short_next = -1;
+	short_next = -1;	// 予想される末尾ブロックの番号
+	short2_next = -1;
 	fail_count = 0;	// CRC は一致したけど MD5 が違った回数
 	fail_off = 0;
 	rear_off = 0;
@@ -1278,24 +1279,32 @@ static int search_block_slide(
 			find_last = find_next - 1;	// 最後に見つけたブロックの番号
 		if ((last_off >= files[num1].size) || (last_off + block_size > file_size + 1))
 			find_next = -1;	// 予想位置がファイル・サイズを超えると駄目
-		if ((last_size < block_size) && (files[num1].b_num >= 2) &&	// 末尾の半端なブロックの番号と想定位置
-				(last_off < files[num1].size) && (files[num1].size <= file_size + 1)){
-			short_next = files[num1].b_off + files[num1].b_num - 1;	// 末尾ブロックの番号
-			if (find_next == short_next)
+		if ((last_size < block_size) && (last_off < files[num1].size)){	// 末尾の半端なブロックの番号と想定位置
+			tmp_next = files[num1].b_off + files[num1].b_num - 1;	// 末尾ブロックの番号
+			if (find_next == tmp_next)
 				find_next = -1;	// 予想が重複したら末尾ブロックとして探す
-			short_off = files[num1].size - last_size;
-		// ファイルサイズが1ブロック未満でも、同じサイズならエラー訂正を試みる
-		} else if ((last_off == 0) && (file_size == files[num1].size) && (file_size < (__int64)block_size)){
-			short_off = 0;
-			short_next = files[num1].b_off;
+			if ((files[num1].b_num >= 2) && (files[num1].size <= file_size + 1)){	// 本来の位置を調べる
+				short_next = tmp_next;
+				short_off = files[num1].size - last_size;
+			} else if ((last_off == 0) && (file_size == last_size)){	// ファイルが1ブロック未満でも、同じサイズならエラー訂正を試みる
+				short_next = tmp_next;
+				short_off = 0;
+			}
+			if (last_size < file_size){	// 末尾を調べる
+				short2_next = tmp_next;
+				short2_off = file_size - last_size;
+			}
 		}
 		if (file_size > files[num1].size){
 			rear_off = (unsigned int)((file_size - files[num1].size) % (__int64)block_size);
 		} else if (file_size < files[num1].size){
 			rear_off = block_size - (unsigned int)((files[num1].size - file_size) % (__int64)block_size);
 		}
-		//printf("file = %d, find_next = %d, find_last = %d\n", num1, find_next, find_last);
-		//printf("short_off = %I64d, short_next = %d, rear_off = %d\n", short_off, short_next, rear_off);
+/*		printf("file = %d, find_next = %d, find_last = %d, rear_off = %d\n", num1, find_next, find_last, rear_off);
+		if (short_next >= 0)
+			printf("short_off  = %I64d, short_next  = %d\n", short_off, short_next);
+		if (short2_next >= 0)
+			printf("short2_off = %I64d, short2_next = %d\n", short2_off, short2_next);*/
 	}
 	file_off = last_off;	// 検査開始位置から調べる
 	buf = sc->buf;
@@ -1374,8 +1383,11 @@ static int search_block_slide(
 						if (last_off < file_off + last_size)
 							last_off = file_off + last_size;	// 一番大きな半端なブロックの終端
 						find_next = -2;	// 小さなファイルが見つかった = ブロック検出の予想が外れた
-						if (i == short_next)
-							short_next = -1;	// 末尾ブロックは検出済み
+						if (i == short_next){	// この末尾ブロックは検出済み
+							short_next = -1;
+						} else if (i == short2_next){
+							short2_next = -1;
+						}

 						// 経過表示
 						if (GetTickCount() - time_last >= UPDATE_TIME){
@@ -1416,7 +1428,7 @@ static int search_block_slide(

 	// ブロック・サイズごとに探す
 	if (((block_count > 0) && ((file_off + (__int64)block_size <= file_size)
-			|| (find_next >= 0))) || (short_next >= 0)){	// ブロックの位置を予想して探す
+			|| (find_next >= 0))) || (short_next >= 0) || (short2_next >= 0)){	// ブロックの位置を予想して探す
 		// 前からスライドさせながらチェックサムを比較する
 		//printf("slide search from %I64d, file %d, next = %d\n", file_off, num1, find_next);
 		off = 0;	// buf 内でのオフセット
@@ -1445,8 +1457,13 @@ static int search_block_slide(
 			while (off < end_off){
 				find_flag = -2;
 				// 次の番号のブロックがその位置にあるかを先に調べる (発見済みでも)
-				if ((short_next >= 0) && (file_off + off == short_off)){	// 半端なブロックなら
-					i = short_next;
+				if (((short_next >= 0) && (file_off + off == short_off)) ||
+						((short2_next >= 0) && (file_off + off == short2_off))){	// 半端なブロックなら
+					if ((short_next >= 0) && (file_off + off == short_off)){
+						i = short_next;
+					} else {
+						i = short2_next;
+					}
 					num = s_blk[i].file;
 					if ((short_use[num] & 4) == 0){	// パディング部分を取り除いた CRC-32 を逆算する
 						short_crcs[num] = crc_reverse_zero(s_blk[i].crc, block_size - s_blk[i].size);
@@ -1456,7 +1473,8 @@ static int search_block_slide(
 					find_flag = correct_error(buf + off, s_blk[i].size, s_blk[i].hash, short_crcs[num], &err_off, &err_mag);
 					if (find_flag == 0)
 						find_flag = 2;
-				} else if ((find_next >= 0) && (file_off + off == last_off)){	// フルサイズのブロックなら
+				}
+				if ((find_flag < 0) && (find_next >= 0) && (file_off + off == last_off)){	// フルサイズのブロックなら
 					i = find_next;
 					if (crc == s_blk[i].crc){
 						data_md5(buf + off, block_size, hash);
@@ -1661,20 +1679,81 @@ static int search_block_slide(
 					find_next = i + 1;
 					if ((find_next >= source_num) || (s_blk[find_next].file != num)){
 						// 最後までいった、またはファイルが異なる
-						short_next = -1;
 						find_next = -1;
+						if ((short_next >= 0) && ((s_blk[short_next].exist & 0x1000) != 0))
+							short_next = -1;
+						if ((short2_next >= 0) && ((s_blk[short2_next].exist & 0x1000) != 0))
+							short2_next = -1;
 					} else if (s_blk[find_next].size < block_size){	// 半端なブロックは別に調べる
-						short_next = find_next;
-						short_off = file_off + off + block_size;
-						//printf("short_off = %I64d, short_next = %d, file = %d\n", short_off, short_next, num);
+						if (file_off + off + block_size + s_blk[find_next].size <= file_size){	// ファイル内に収まってる時だけ
+							tmp_next = find_next;
+							tmp_off = file_off + off + block_size;
+							if (find_flag <= 3){	// 順当な位置で見つけた場合
+								if ((tmp_next == short_next) && (tmp_off == short_off)){
+									// 予測済みのと一致するなら何もしない
+								} else if ((short_next >= 0) && (short2_next < 0)){	// 予測と異なるけど、別のが空いてるなら、そっちに記録する
+									//printf("short2_off = %I64d, short2_next = %d, file = %d\n", tmp_off, tmp_next, num);
+									short2_next = tmp_next;
+									short2_off = tmp_off;
+								} else {
+									if ((short_next >= 0) && (tmp_next == short2_next) && (tmp_off == short2_off)){	// 既に予測済みのと一致するなら入れ替える
+										short2_next = short_next;
+										short2_off = short_off;
+										//printf("exchange short2_off = %I64d, short2_next = %d\n", short2_off, short2_next);
+									}
+									//printf("short_off = %I64d, short_next = %d, file = %d\n", tmp_off, tmp_next, num);
+									short_next = tmp_next;
+									short_off = tmp_off;
+								}
+							} else if ((short_next < 0) &&
+									(((__int64)block_size * (__int64)(tmp_next - files[num].b_off) == tmp_off) ||
+									(tmp_off + s_blk[tmp_next].size == file_size))){
+								// 検出ブロックが順当でなくても、末尾ブロックの開始位置や末端がファイル・サイズに一致すれば
+								//printf("short_off = %I64d, short_next = %d, file = %d\n", tmp_off, tmp_next, num);
+								short_next = tmp_next;
+								short_off = tmp_off;
+							} else {
+								//printf("short2_off = %I64d, short2_next = %d, file = %d\n", tmp_off, tmp_next, num);
+								short2_next = tmp_next;
+								short2_off = tmp_off;
+							}
+						}
 						find_next = -1;
 					} else {
-						short_next = files[num].b_off + files[num].b_num - 1;	// 末尾ブロックの番号
-						if (s_blk[short_next].size < block_size){	// 半端なブロックは別に調べる
-							short_off = file_off + off + (__int64)(short_next - i) * (__int64)block_size;
-							//printf("short_off = %I64d, short_next = %d, file = %d\n", short_off, short_next, num);
-						} else {
-							short_next = -1;
+						tmp_next = files[num].b_off + files[num].b_num - 1;	// 末尾ブロックの番号
+						if (s_blk[tmp_next].size < block_size){	// 半端なブロックは別に調べる
+							tmp_off = file_off + off + (__int64)(tmp_next - i) * (__int64)block_size;
+							if (tmp_off + s_blk[tmp_next].size <= file_size){	// ファイル内に収まってる時だけ
+								if (find_flag <= 3){	// 順当な位置で見つけた場合
+									if ((tmp_next == short_next) && (tmp_off == short_off)){
+										// 予測済みのと一致するなら何もしない
+									} else if ((short_next >= 0) && (short2_next < 0)){	// 予測と異なるけど、別のが空いてるなら、そっちに記録する
+										//printf("far short2_off = %I64d, short2_next = %d, file = %d\n", tmp_off, tmp_next, num);
+										short2_next = tmp_next;
+										short2_off = tmp_off;
+									} else {
+										if ((short_next >= 0) && (tmp_next == short2_next) && (tmp_off == short2_off)){	// 既に予測済みのと一致するなら入れ替える
+											short2_next = short_next;
+											short2_off = short_off;
+											//printf("exchange short2_off = %I64d, short2_next = %d\n", short2_off, short2_next);
+										}
+										//printf("far short_off = %I64d, short_next = %d, file = %d\n", tmp_off, tmp_next, num);
+										short_next = tmp_next;
+										short_off = tmp_off;
+									}
+								} else if ((short_next < 0) &&
+										(((__int64)block_size * (__int64)(tmp_next - files[num].b_off) == tmp_off) ||
+										(tmp_off + s_blk[tmp_next].size == file_size))){
+									// 検出ブロックが順当でなくても、末尾ブロックの開始位置や末端がファイル・サイズに一致すれば
+									//printf("far short_off = %I64d, short_next = %d, file = %d\n", tmp_off, tmp_next, num);
+									short_next = tmp_next;
+									short_off = tmp_off;
+								} else if ((short2_next != tmp_next) || (short2_off != tmp_off)){
+									//printf("far short2_off = %I64d, short2_next = %d, file = %d\n", tmp_off, tmp_next, num);
+									short2_next = tmp_next;
+									short2_off = tmp_off;
+								}
+							}
 						}
 					}
 					tiny_skip = 0;	// 小さなファイルをブロック直後に一回だけ探す
--- a/source/par2j/version.h
+++ b/source/par2j/version.h
@@ -1,2 +1,2 @@
-#define FILE_VERSION "1.3.3.1"	// ファイルのバージョン番号
+#define FILE_VERSION "1.3.3.3"	// ファイルのバージョン番号
 #define PRODUCT_VERSION "1.3.3"	// 製品のバージョン番号
Author	SHA1	Message	Date
Yutaka Sawada	bf5b8d60aa	Release note of version 1.3.3.3	2024-06-15 13:50:52 +09:00
Yutaka Sawada	8d6c1cd0ea	Update release day	2024-06-15 13:42:41 +09:00
Yutaka Sawada	efe921aff5	Fixed bug in verifying external files	2024-06-09 12:48:50 +09:00
Yutaka Sawada	de3a853228	Improve detection of the last slice	2024-06-09 12:45:37 +09:00
Yutaka Sawada	77e20ad55c	Add help files' location	2024-04-13 11:45:37 +09:00
Yutaka Sawada	ffa5c8bf64	Add documents' location	2024-04-13 11:44:29 +09:00
Yutaka Sawada	ba47ccb680	Add usage of batch file	2024-04-13 11:42:44 +09:00
Yutaka Sawada	e97b1bee52	Add files via upload	2024-02-11 13:13:46 +09:00
Yutaka Sawada	24f43183fd	Change RecoveryFileLimit option	2024-02-11 11:54:59 +09:00
Yutaka Sawada	0628cf9249	Add usage of RecoveryFileLimit	2024-02-11 11:53:55 +09:00
Yutaka Sawada	da879a098f	Fixed file splitter	2024-02-09 11:59:08 +09:00
Yutaka Sawada	eb010ec7f5	Improve name checking of splited file	2024-02-09 11:57:39 +09:00
Yutaka Sawada	70fd411418	New sample of v1.3.3.3	2024-01-21 17:04:32 +09:00
Yutaka Sawada	e2f1251c70	Sample of v1.3.3.3	2024-01-21 10:41:57 +09:00
Yutaka Sawada	06ad11340d	Fixed a bug at many OpenCL devices	2024-01-21 10:40:14 +09:00
Yutaka Sawada	5c34457f2d	Increment version number	2024-01-21 10:39:29 +09:00
Yutaka Sawada	7cdcba4a35	Merge pull request #111 from guyi2000/master Fix access violation in some certain conditions	2024-01-20 22:50:04 +09:00
MPCBBishop	b30be14b3e	Fix access violation in some certain conditions	2024-01-20 18:58:16 +08:00
Yutaka Sawada	3b8d510aeb	Release note of version 1.3.3.2	2024-01-10 13:18:10 +09:00
Yutaka Sawada	9132c437fc	Update to year 2024	2024-01-10 10:44:19 +09:00
Yutaka Sawada	7159bbb1fd	Update to year 2024	2024-01-10 10:40:33 +09:00
Yutaka Sawada	ae9643f2ce	Add files via upload	2023-12-26 18:57:09 +09:00
Yutaka Sawada	6559e62276	Change lc option	2023-12-26 18:56:26 +09:00
Yutaka Sawada	1552fb8ec8	Add files via upload	2023-12-26 18:53:12 +09:00
Yutaka Sawada	79d0b184b8	Add notice of save_path	2023-12-03 21:43:40 +09:00
Yutaka Sawada	2793349268	Update PAR2 clients	2023-11-27 14:31:12 +09:00
Yutaka Sawada	4a7845dc7a	Erase old section	2023-11-27 14:29:58 +09:00
Yutaka Sawada	978bbe4b40	Optimization for AMD GPU	2023-11-27 14:19:43 +09:00
Yutaka Sawada	0bd2b92237	Optimization for AMD GPU	2023-11-27 14:18:13 +09:00
Yutaka Sawada	be51d4c842	Update for v1.3.3.1	2023-11-19 11:28:42 +09:00
Yutaka Sawada	af2ac4b113	Notice of changed option	2023-11-18 19:30:13 +09:00