© J.C. Kessels 2009
MyDefrag Forum
May 25, 2013, 04:14:01 pm *
Topic: What exactly are the sorting characters for SortByName?
JayN
« on: April 10, 2012, 02:46:22 am »

I wrote a program to dump the starting LCN of each file. Below is an excerpt taken after a defrag with SortByName. I presume you are ignoring case, but it is not clear which characters are being handled.
I sorted the paths as Unicode, ignoring case, and dumped them. The LCNs should be ascending. I understand that short files go into the MFT and show as 0, but it isn't clear why some of the non-zero LCNs in the list below are out of order.

For SortByName to be most useful, we need to know exactly how to access files in the sort order you have chosen, so it would be helpful if you posted the exact sorting algorithm, including the characters supported and whether it ignores case.

lcn:00000000:g:\tz\eclipse\plugins\com.freescale.morpho.tcf_4.0.0.122125-201203091545\morpho\ta\tcf_Services_ppcsim_ids.py
lcn:00332069:g:\tz\eclipse\plugins\com.freescale.morpho.tcf_4.0.0.122125-201203091545\morpho\ta\tcf_Service_ids.py
lcn:00332073:g:\tz\eclipse\plugins\com.freescale.morpho.tcf_4.0.0.122125-201203091545\morpho\ta\tcf_xp_cdde_manager_ids$py.class
lcn:00000000:g:\tz\eclipse\plugins\com.freescale.morpho.tcf_4.0.0.122125-201203091545\morpho\ta\tcf_xp_cdde_manager_ids.py
lcn:00000000:g:\tz\eclipse\plugins\com.freescale.morpho.tcf_4.0.0.122125-201203091545\morpho\ta\tcf_xp_services_Cerberus_ids.py
lcn:00332074:g:\tz\eclipse\plugins\com.freescale.morpho.tcf_4.0.0.122125-201203091545\morpho\ta\tcf_xp_SynchronousProxies_ids$py.class
lcn:00332076:g:\tz\eclipse\plugins\com.freescale.morpho.tcf_4.0.0.122125-201203091545\morpho\ta\tcf_xp_SynchronousProxies_ids.py
lcn:00000000:g:\tz\eclipse\plugins\com.freescale.morpho.tcf_4.0.0.122125-201203091545\morpho\ta\trkTcpIpTransport_ids.py
lcn:00000000:g:\tz\eclipse\plugins\com.freescale.morpho.tcf_4.0.0.122125-201203091545\morpho\ta\trk_connection_ids.py
lcn:00331974:g:\tz\eclipse\plugins\com.freescale.morpho.tcf_4.0.0.122125-201203091545\morpho\ta\__init__$py.class
lcn:00000000:g:\tz\eclipse\plugins\com.freescale.morpho.tcf_4.0.0.122125-201203091545\morpho\ta\__init__.py
lcn:00331545:g:\tz\eclipse\plugins\com.freescale.morpho.tcf_4.0.0.122125-201203091545\morpho\tcf.core.cwmanifest
lcn:00331546:g:\tz\eclipse\plugins\com.freescale.morpho.tcf_4.0.0.122125-201203091545\morpho\tcf.Locations.cwmanifest
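
The check described above (non-zero LCNs should ascend through the sorted list, zero LCNs are skipped) can be sketched as follows. This is only an illustration, not the program that produced the dump: the function names are hypothetical, the LCN field is assumed to be decimal, and the sample paths are shortened stand-ins for the real ones.

```python
def parse_dump(lines):
    """Split 'lcn:<number>:<path>' lines into (lcn, path) pairs."""
    entries = []
    for line in lines:
        # Split on the first two colons only; the path has its own ':' after the drive letter.
        _, lcn, path = line.split(":", 2)
        entries.append((int(lcn), path))  # assumption: the LCN is printed in decimal
    return entries


def out_of_order(entries):
    """Return the paths whose non-zero LCN is lower than the previous non-zero LCN."""
    bad = []
    last = 0
    for lcn, path in entries:
        if lcn == 0:  # LCN 0 is taken to mean "stored in the MFT", so it is ignored
            continue
        if lcn < last:
            bad.append(path)
        last = lcn
    return bad


# Hypothetical, shortened sample in the same format as the dump above.
sample = [
    r"lcn:00000000:g:\p\morpho\ta\trk_connection_ids.py",
    r"lcn:00331974:g:\p\morpho\ta\__init__$py.class",
    r"lcn:00331545:g:\p\morpho\tcf.core.cwmanifest",
]
suspects = out_of_order(parse_dump(sample))
print(suspects)
```

Any path it reports is one whose position in the sorted list disagrees with its position on disk, which points at a difference between the two sort orders rather than at the defragmenter.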
jeroen
« Reply #1 on: April 10, 2012, 09:56:10 am »

MyDefrag does the comparison in two parts, for performance reasons. First it compares the directory paths (without filenames). If the paths are equal, it then compares the filenames. In your example the files in the "\morpho\" folder are placed before the files in the "\morpho\ta\" subdirectory, because "\morpho\" is less (smaller) than "\morpho\ta\".

To compare filenames MyDefrag uses the standard "_wcsicmp()" function from the Microsoft C runtime (with the default C locale). I don't know exactly how that function does its magic, but I am fairly sure it's a simple binary compare with "A"-"Z" treated as "a"-"z", and no other characters compared case-insensitively. In your example the files beginning with "_" are placed before files beginning with "t", because "_" is less than "t" in the character table.

I am guessing that the program you used to sort the paths converts "a"-"z" to "A"-"Z" (the other way around), which puts "_" after those names, because "_" is greater than "T" in the character table.

Yes, the zero LCN files will be files that are stored in the MFT.
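
A minimal sketch of the two-part comparison described in this reply, written as a sort key. It is an approximation under stated assumptions, not MyDefrag's actual code: `ascii_lower` and `sort_key` are hypothetical names, only "A"-"Z" are folded (the behaviour guessed for the C locale), the directory part keeps its trailing backslash, and the sample paths are shortened stand-ins.

```python
def ascii_lower(text):
    """Fold only 'A'-'Z' to 'a'-'z'; every other character is left untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in text)


def sort_key(path):
    """Compare the directory path first, then the filename, both case-folded."""
    head, sep, filename = path.rpartition("\\")
    directory = head + sep  # keep the trailing backslash, as in "\morpho\"
    return (ascii_lower(directory), ascii_lower(filename))


# Hypothetical, shortened paths modelled on the excerpt in the first post.
paths = [
    r"g:\p\morpho\ta\trk_connection_ids.py",
    r"g:\p\morpho\ta\__init__.py",
    r"g:\p\morpho\tcf.core.cwmanifest",
]
ordered = sorted(paths, key=sort_key)
for p in ordered:
    print(p)
```

With this key the file directly in "\morpho\" sorts ahead of everything in "\morpho\ta\", and inside "\morpho\ta\" the name starting with "_" sorts ahead of the name starting with "t".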
JayN
« Reply #2 on: April 10, 2012, 02:28:40 pm »

OK, I assume that explains it, since the sort I'm using converts to uppercase for its case-insensitive compare.

On pathnames vs. filenames, I'm not quite sure about that part. The pathnames returned by findFirst and findNext don't have a terminating "\", and there is no filename. So would I split off the terminating folder name for this comparison?
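
The difference between folding to uppercase and folding to lowercase is easy to reproduce with a few of the filenames from the first post. The sketch below uses Python's built-in `str.upper` and `str.lower` as stand-ins for the two compare routines, which is an assumption that only holds for plain ASCII names like these.

```python
names = [
    "tcf_Service_ids.py",
    "tcf_Services_ppcsim_ids.py",
    "__init__.py",
    "trk_connection_ids.py",
]

# Uppercase fold: "_" (0x5F) sorts after "A"-"Z" (0x41-0x5A).
by_upper = sorted(names, key=str.upper)

# Lowercase fold: "_" (0x5F) sorts before "a"-"z" (0x61-0x7A).
by_lower = sorted(names, key=str.lower)

print("upper:", by_upper)
print("lower:", by_lower)
```

The uppercase order matches the listing in the first post ("tcf_Services..." ahead of "tcf_Service_...", and "__init__" last), while the lowercase order puts "__init__" first and swaps the two "tcf_Service" names.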

jeroen
« Reply #3 on: April 13, 2012, 05:31:29 am »

Quote: "So would I split off the terminating folder name for this comparison?"
No.
JayN
« Reply #4 on: April 22, 2012, 08:51:35 pm »

Table 5.1 in this article indicates that NTFS converts names to uppercase when doing its directory inserts, so an option to sort by uppercase names might improve the on-disk order for later processing in directory-entry order.

http://members.fortunecity.com/clark_kent/exjobb/report3.html
 
The relevant row of the table: A, $UpCase: "Maps lowercase characters to their uppercase version, used when inserting files in a directory."
jeroen
« Reply #5 on: April 23, 2012, 06:56:21 am »

I did not know that NTFS internally maps lowercase to uppercase, instead of uppercase to lowercase like the Microsoft "_wcsicmp()" function that I am using. Perhaps I should use "_wcsicoll()" instead, but I am afraid that will make sorting a lot slower.

The exact order in which the data of files is placed on disk (which is what MyDefrag controls) improves speed when that data is accessed in the same sequence, for example when copying a lot of files. That sequence is governed by the program that does the accessing, not by NTFS. MyDefrag does not change the NTFS internal sorting order (the way that file entries are stored internally in the MFT), so for MyDefrag it does not matter how NTFS internally sorts the filenames. It only matters that MyDefrag places the data of the files in the same order in which it is accessed, and as said, that has nothing to do with NTFS.

The main reason why MyDefrag offers sorting by name is that it's a quick and easy way to get all the files in a folder close together on disk. This speeds up many user actions, because files that are used together are usually stored together in the same folder (or folder tree).