Copyright © 2016 by Alan Conroy. This article may be copied in whole or in part as long as this copyright is included.


1 Introduction
2 Ground Rules

Building a File System
3 File Systems
4 File Content Data Structure
5 Allocation Cluster Manager
6 Exceptions and Emancipation
7 Base Classes, Testing, and More
8 File Meta Data
9 Native File Class
10 Our File System
11 Allocation Table
12 File System Support Code
13 Initializing the File System
14 Contiguous Files
15 Rebuilding the File System
16 Native File System Support Methods
17 Lookups, Wildcards, and Unicode, Oh My
18 Finishing the File System Class

The Init Program
19 Hardware Abstraction and UOS Architecture
20 Init Command Mode
21 Using Our File System
22 Hardware and Device Lists
23 Fun with Stores: Partitions
24 Fun with Stores: RAID
25 Fun with Stores: RAM Disks
26 Init wrap-up

The Executive
27 Overview of The Executive
28 Starting the Kernel
29 The Kernel
30 Making a Store Bootable
31 The MMC
32 The HMC
33 Loading the components
34 Using the File Processor
35 Symbols and the SSC
36 The File Processor and Device Management
37 The File Processor and File System Management
38 Finishing Executive Startup

Users and Security
39 Introduction to Users and Security
40 More Fun With Stores: File Heaps
41 File Heaps, part 2
42 SysUAF
43 TUser
44 SysUAF API

Terminal I/O
45 Shells and UCL
46 UOS API, the Application Side
47 UOS API, the Executive Side
48 I/O Devices
49 Streams
50 Terminal Output Filters
51 The TTerminal Class
52 Handles
53 Putting it All Together
54 Getting Terminal Input
55 QIO
56 Cooking Terminal Input
57 Putting it all together, part 2
58 Quotas and I/O

UCL
59 UCL Basics
60 Symbol Substitution
61 Command execution
62 Command execution, part 2
63 Command Abbreviation
64 ASTs
65 Expressions, Part 1
66 Expressions, Part 2: Support code
67 Expressions, part 3: Parsing
68 SYS_GETJPIW and SYS_TRNLNM
69 Expressions, part 4: Evaluation

UCL Lexical Functions
70 PROCESS_SCAN
71 PROCESS_SCAN, Part 2
72 TProcess updates
73 Unicode revisted
74 Lexical functions: F$CONTEXT
75 Lexical functions: F$PID
76 Lexical Functions: F$CUNITS
77 Lexical Functions: F$CVSI and F$CVUI
78 UOS Date and Time Formatting
79 Lexical Functions: F$CVTIME
80 LIB_CVTIME
81 Date/Time Contexts
82 SYS_GETTIM, LIB_Get_Timestamp, SYS_ASCTIM, and LIB_SYS_ASCTIM
83 Lexical Functions: F$DELTA_TIME
84 Lexical functions: F$DEVICE
85 SYS_DEVICE_SCAN
86 Lexical functions: F$DIRECTORY
87 Lexical functions: F$EDIT and F$ELEMENT
88 Lexical functions: F$ENVIRONMENT
89 SYS_GETUAI
90 Lexical functions: F$EXTRACT and F$IDENTIFIER
91 LIB_FAO and LIB_FAOL
92 LIB_FAO and LIB_FAOL, part 2
93 Lexical functions: F$FAO
94 File Processing Structures
95 Lexical functions: F$FILE_ATTRIBUTES
96 SYS_DISPLAY
97 Lexical functions: F$GETDVI
98 Parse_GetDVI
99 GetDVI
100 GetDVI, part 2
101 GetDVI, part 3
102 Lexical functions: F$GETJPI
103 GETJPI
104 Lexical functions: F$GETSYI
105 GETSYI
106 Lexical functions: F$INTEGER, F$LENGTH, F$LOCATE, and F$MATCH_WILD
107 Lexical function: F$PARSE
108 FILESCAN
109 SYS_PARSE
110 Lexical Functions: F$MODE, F$PRIVILEGE, and F$PROCESS
111 File Lookup Service
112 Lexical Functions: F$SEARCH
113 SYS_SEARCH
114 F$SETPRV and SYS_SETPRV
115 Lexical Functions: F$STRING, F$TIME, and F$TYPE
116 More on symbols
117 Lexical Functions: F$TRNLNM
118 SYS_TRNLNM, Part 2
119 Lexical functions: F$UNIQUE, F$USER, and F$VERIFY
120 Lexical functions: F$MESSAGE
121 TUOS_File_Wrapper
122 OPEN, CLOSE, and READ system services

UCL Commands
123 WRITE
124 Symbol assignment
125 The @ command
126 @ and EXIT
127 CRELNT system service
128 DELLNT system service
129 IF...THEN...ELSE
130 Comments, labels, and GOTO
131 GOSUB and RETURN
132 CALL, SUBROUTINE, and ENDSUBROUTINE
133 ON, SET {NO}ON, and error handling
134 INQUIRE
135 SYS_WRITE Service
136 OPEN
137 CLOSE
138 DELLNM system service
139 READ
140 Command Recall
141 RECALL
142 RUN
143 LIB_RUN
144 The Data Stream Interface
145 Preparing for execution
146 EOJ and LOGOUT
147 SYS_DELPROC and LIB_GET_FOREIGN

CUSPs and utilities
148 The I/O Queue
149 Timers
150 Logging in, part one
151 Logging in, part 2
152 System configuration
153 SET NODE utility
154 UUI
155 SETTERM utility
156 SETTERM utility, part 2
157 SETTERM utility, part 3
158 AUTHORIZE utility
159 AUTHORIZE utility, UI
160 AUTHORIZE utility, Access Restrictions
161 AUTHORIZE utility, Part 4
162 AUTHORIZE utility, Reporting
163 AUTHORIZE utility, Part 6
164 Authentication
165 Hashlib
166 Authenticate, Part 7
167 Logging in, part 3
168 DAY_OF_WEEK, CVT_FROM_INTERNAL_TIME, and SPAWN
169 DAY_OF_WEEK and CVT_FROM_INTERNAL_TIME
170 LIB_SPAWN
171 CREPRC
172 CREPRC, Part 2
173 COPY
174 COPY, part 2
175 COPY, part 3
176 COPY, part 4
177 LIB_Get_Default_File_Protection and LIB_Substitute_Wildcards
178 CREATESTREAM, STREAMNAME, and Set_Contiguous
179 Help Files
180 LBR Services
181 LBR Services, Part 2
182 LIBRARY utility
183 LIBRARY utility, Part 2
184 FS Services
185 FS Services, Part 2
186 Implementing Help
187 HELP
188 HELP, Part 2
189 DMG_Get_Key and LIB_Put_Formatted_Output
190 LIBRARY utility, Part 3
191 Shutting Down UOS
192 SHUTDOWN
193 WAIT
194 SETIMR
195 WAITFR and Scheduling
196 REPLY, OPCOM, and Mailboxes
197 REPLY utility
198 Mailboxes
199 BRKTHRU
200 OPCOM
201 Mailbox Services
202 Mailboxes, Part 2
203 DEFINE
204 CRELNM
205 DISABLE
206 STOP
207 OPCCRASH and SHUTDOWN
208 APPEND

Glossary/Index


Downloads

Lookups, Wildcards, and Unicode, Oh My

Now that we have a file system that we can initialize, rebuild, mount, and dismount, we need to be able to access the files in the file system. In order to do that, we need a way to search the folders for a given file. To distinguish this operation from searching the data in the files, we will call it file lookup instead of file search. At first glance, this may appear to be simple. By the time we are done with this article, you will see that this is one of the more complicated operations our file system will perform.

{ Lookup file in directory, returning index in directory (or -1 if not found).
  Free_Index will be set to the first free header in the index (or -1 if none
  before file). }
function TUOS_Native_File_System.Find_File( const FileSpec : string ;
    Directory : TUOS_Native_File ; var Free_Index : longint ) : longint ;

var Header : TUOS_File_Header ;
    S : string ;

begin
    Free_Index := -1 ;
    Result := 0 ;
    while( true ) do
    begin
        if( Directory.Read( 0, ( Result - 1 ) * sizeof( Header ),
	  sizeof( Header ), Header ) = 0 ) then
        begin
            // No more entries in this directory...
            Set_Last_Error( Create_Exception( UOS_File_System_Path_Not_Found, nil ) ) ;
            Result := -1 ;
            exit ;
        end ;
        if( ( Header.Name <> 0 ) and ( ( Header.Flags and FAF_DELETED ) = 0 ) ) then 
        begin // Not a null entry or deleted file
            S := Get_String( Header.Name ) ;
            if( WC_Match( FileSpec, S ) ) then
            begin
                exit ;
            end ;
        end else
        if( ( Header.Name = 0 ) and ( Free_Index < 0 ) ) then
        begin
            Free_Index := Result ;
        end ; // if( H.Name <> 0 )
        inc( Result ) ;
    end ;  // while( true )
    Result := -1 ;
end ;

This is an internal method to the class. It takes a folder file (Directory) and a file name (File_Spec). It returns both an index of the matching file (function result) or -1 if no match was found, and it returns the first unused ("free") header index found in the folder. The function is very simple - just loop through the entries in the folder until we find a match. We use the WC_Match function to do the comparison between the file name and the passed Filespec, which handles Unicode. More on that later.

function TUOS_Native_File_System.Lookup_Directory( Spec : string ) : TUOS_Native_File ;

var D : string ;
    Dummy : integer ;
    F : TUOS_Native_File ;
    H : TUOS_File_Header ;
    P : cardinal ;

begin
    Result := nil ;

    // Get root directory...
    if( copy( Spec, length( Spec ), 1 ) = '\' ) then
    begin
        setlength( Spec, length( Spec ) - 1 ) ;
    end ;
    if( Spec = '' ) then // Asked for root
    begin
        Result := Root ;
        exit ;
    end ;
    F := Root ;

    // Find specified directory...
    while( Spec <> '' ) do
    begin
        Dummy := pos( '\', Spec + '\' ) ;
        D := lowercase( copy( Spec, 1, Dummy - 1 ) ) ;
        Spec := copy( Spec, Dummy + 1, length( Spec ) ) ;
        P := 0 ;
        while( F.Read( 0, P, sizeof( H ), H ) > 0 ) do
        begin // Not a null entry or deleted file
            P := P + sizeof( H ) ;
            if( ( H.Name <> 0 ) and ( ( H.Flags and FAF_DELETED ) = 0 ) ) then 
            begin
                if( WC_Match( D, lowercase( Get_String( H.Name ) ) ) ) then
                begin
                    F := Create_File_Instance( F, P div sizeof( H ) ) ;
                    F.Header := H ;
                    if( Spec = '' ) then
                    begin
                        Result := F ;
                        exit ;
                    end ;
                    if( ( H.Flags and FAF_DIRECTORY ) = 0 ) then 
                    begin // Path contained a non-folder
                        if( F <> _Root ) then
                        begin
                            F.Free ;
                        end ;
                        exit ;
                    end ;
                    P := 0 ;
                    continue ;
                end ; // if
            end ; // if
        end ; // while

        // If we get here, we didn't find a match in the directory
        if( F <> _Root ) then
        begin
            F.Free ;
        end ;
        exit ;
    end ; // while( Spec <> '' )
    Result := F ;
end ; // TUOS_Native_File_System.Lookup_Directory

The method takes a file specification, which contains a folder path and file name. We always start with the root folder, then we process through each of the folders in the path, looking for the file in the current folder. If found, and it is a folder (FAF_DIRECTORY in the flags), we move to that folder and run through that folder as well. We keep track of the first free entry found in case this method is called from the create method so that it knows where a new file entry can be written. Notice that in both this method and the Find_File method, we check the header flags for FAF_DELETED. You may ask why a file is marked deleted but the name field isn't set to 0 to indicate an unused entry. That is because when a file is deleted, it may still be in use (open) by a program. The way Windows handles this is by not allowing open files to be deleted. UOS is more forgiving. We don't want to physically delete a file while it is in use, but by marking it as deleted and then ignoring those headers, we can logically delete the file even if we don't physically delete it. When a logically deleted file is closed (by everyone that has it open), it can be physically deleted. If you think about it, you might ask why we need a flag for this in the file header - wouldn't a flag in the file object suffice? No, it wouldn't, because if the system crashes before it is physically deleted, the file will suddenly reappear when the system starts up again. This would be especially bad if another file of the same name had been created because now there would be duplicate files in the folder. Another question you might ask is: why don't we then physically delete these files during the rebuild process? The reason has to do with security concerns, which are handled by the File Processor. We'll talk about that later.
Again, we use the WC_Match function to see if we found the correct folder name(s) and file name. This is not only for Unicode matching but also for wildcard support, which brings us to...

Wildcards:
Finding a specific file is not the only kind of file lookup we need to support. Users will want to do listings of all files in a folder, or all files with a specific extension, and so forth. We could leave it up to the directory program (eg "Explorer" on Windows) to open the folder file and work through the headers, but that would be less efficient and would require privileges that the user might not have. Further, it wouldn't support unprivileged users doing directory searches within their own programs. For these reasons, all file systems support wildcard lookups. A wildcard is a character in a filename template that represents one or more characters in the actual file names. For the technical users, the most flexible way to do this would be to support regular expressions. However, that is too technical for the general public and so all file systems support wildcards, although different operating systems support different wildcards (for instance, VMS supports a "%" wildcard). For our file system, we will support "*" and "?" wildcards. The question mark is used to match one single character. The asterisk is used to match any number of characters (including no characters at all). The File Processor will support additional wildcards, but it will apply those on top of what we do without our class having to know anything about them. Note that we break from the VMS specification by not supporting the "%" wildcard. The reason for this is because most people are used to being able to use "%" as a character in a filename. The utility provided by the wildcard is overshadowed by the potential confusion resulting from breaking with what most people are used to.
File extensions are a short code appended to the end of filenames that indicates the file's purpose or format. Usually they are separated from the rest of the filename by a dot (.). In olden days, the extension was kept physically separate from the filename in the file header. Modern operating systems allow any number of dots within a filename and extensions are largely a matter of usage rather than being built into the file system. But the convention is so commonly used that we need to provide a minimal amount of support for it. Typically, a wildcard specification which contains a dot is indicating the position of the extension. Without special handling, something like "*.A*" would match both filenames "File.A.txt" as well as "File.Application". Because our wildcard matching will largely be done from left to right, the first dot in the file name matches the dot in our wildcard specification. But convention says that the first file's extension is ".txt" and it shouldn't match the wildcard specification. So, in the case of a dot in the specification, we will need to match it up with the last dot appearing in the file name when we do our matching. We also have to handle a null extension. That is, when the filename ends with a dot. Further, we need to handle the case where there are no dots at all, in the file. This case is also considered a null extension. The idea being that the dot is more of a delimiter for the extension than an actual character. In fact, the early operating systems did not store the dot at all - it was always implied. Finally, it is customary for a single asterisk to match against any/all files regardless of extension. For ease of coding, we will internally normalize a wildcard specification of "*" as "*.*".

Unicode:
If all we had to deal with is wildcard matching, our job would be relatively easy. However, recall that we support Unicode filenames and store them in UTF8 format. UTF8 is simply a compressed way of storing Unicode. The use of UTF8 means that we need an added step of normalizing (decompressing) the UTF8 to Unicode-32.
But that's not all our matching function needs to do. Unix and its derivatives (like Linux) support case-sensitive filenames. That is, "This file.txt" and "this File.txt" are treated as different filenames. This may appeal to the more technically-minded people - and it is simpler to implement in code - but the average person doesn't see why these two names don't refer to the same file. In fact, Unix/Linux are the only operating systems I know of that allow this abomination. UOS will treat filenames in a case-insensitive way for the sake of the average person on planet Earth. However, this requires extra work on our part. This is not unique to Unicode, of course - we'd have to deal with it in an ASCII-only situation as well. The best way to simplify our code is to normalize the values we are comparing to either all lowercase or all uppercase. For various reasons which I won't go into, we will normalize everything to lowercase. In our example, both names would normalize to "this file.txt". Mind you, we will store the name using the mixed case specified by the user when the file is created or renamed. However, during our file lookup, we will normalize the strings before we do our matching. Though we need to attack this problem even without Unicode, the truth is that Unicode makes it a little more complicated. You see, there are sometimes multiple uppercase letters (for some languages) that normalize to a single lowercase letter. In other cases (eg Greek) the glyph used for a character changes if it is at the end of the word. For some characters, the process actually adds characters to the string. This whole process is called "character folding", but we will call it part of our normalization process. A more detailed description of all the issues involved can be found at The Unicode Consortium. Fortunately, the Unicode people have provided a table that gives us the folding mappings.
We also remove multiple spaces for similar reasons. Imagine trying to tell how many spaces separate other characters - especially when using a proportional font to display it. Most people couldn't tell if there were one, two, or three spaces. There are other characters which might be hard to distinguish, such as underscores which can run together to look like a solid line. However, both Windows and Linux support multiple underscores so we will also allow it so as to allow easier interchange of files between the different systems.
Armed with this information, we can now write our WC_Match function.

For the sake of explanation, let's look at a simplified version of the wildcard match routine, which assumes an already normalized and non-Unicode string:

function Wildcard_Match( Wildcard, Name : string ;
    wildcard_has_asterisk : boolean ) : boolean ;

var Dummy, Dummy1 : integer ;
    Wildcard_Start, Wildcard_End, Name_Start, Name_End : integer ;
    S : string ;

begin
    // Quick check and setup...
    if( Wildcard = '*.*' ) then
    begin
        Result := True ;
        exit ;
    end ;

First we do a quick check to see if the wildcard is nothing but a wildcard. If so, it would match any filename and so we can immediately exit with a positive result.


    // Non-wildcard check...
    if( not Wildcard_Has_Asterisk ) then
    begin
        Result := Equal( Wildcard, Name ) ;
        exit ;
    end ;
    Result := False ; // Assume failure

The next quick check is for the opposite situation - where there are no wildcards at all. In this case, we do a simple comparison beteween the filename and the specification and return the result.
Otherwise, we have mixed wildcard and non-wildcard characters in the specification.


    // Do wildcard match...
    Wildcard_Start := 1 ;
    Name_Start := 1 ;
    Wildcard_End := length( Wildcard ) ;
    Name_End := length( Name ) ;

    // Check prefix before first wildcard...
    Dummy := pos( '*', Wildcard ) ;
    if( Dummy > Wildcard_Start ) then // Something before the asterisk
    begin
        if( not Compare( Wildcard, Wildcard_Start, Dummy - Wildcard_Start, Name,
	  Name_Start, True ) ) then
        begin
            exit ;
        end ;
        Wildcard_Start := Dummy ;
        Name_Start := Dummy ;
    end ; // if( Dummy > Wildcard_Start )

    // Check suffix after last wildcard...
    Dummy := RPos( Wildcard, '*' ) ;
    if( Dummy < length( Wildcard ) ) then
    begin
        if( not Compare( Wildcard, Dummy + 1, length( Wildcard ) - Dummy, Name,
	  length( Name ) - ( length( Wildcard ) - Dummy ) + 1, True ) ) then
        begin
            exit ;
        end ;
        Wildcard_End := Dummy ;
        Name_End := length( Name ) - ( length( Wildcard ) - Dummy ) ;
    end ; // if( Dummy < length( Wildcard ) )

We use a start and end pointer into both the wildcard and filename strings. If the wildcard doesn't start the specification then, for the segment of text up to the wildcard, there needs to be an exact match with the same number of characters starting the filename. If not, we exit with a failure.
Likewise, if the wildcard doesn't end the specification then, for the segment of text past the wildcard, there needs to be an exact match with the same number of characters ending the filename. If not, we exit with a failure.
In both cases, the starting and ending positions are adjusted. Once those fixed portions of the filename are dealt with, the remainder of the matches float somewhere between those ends. The remainder of the matching takes place from left to right, matching each segment of the wildcard (each area between asterisks). Here are some example cases to illustrate this:
*.*Entirely wildcards - everything matches
xNo wildcards - exact match required
a*cMust start with "a" and end with "c". Anything else is allowed between them
*a*An "a" must appear somewhere in the name, but anything (or nothing) before or after. There is a single match segment.
a*b*cMust start with "a" and end with "c", and a "b" has to occur somewhere in-between. There is a single match segment.
*b*c*Must contain a b somewhere with a c somewhere after that. This has two match segments.

Wildcard_Start is the location of the first asterisk.


    // Check for remaining matches, left-to-right...
    while( Wildcard_Start <= Wildcard_End ) do
    begin
        if( Wildcard_Start >= Wildcard_End ) then // All that's left in the wildcard spec is "*"
        begin
            break ; // So we match whatever remains in Name
        end ;
        Dummy := _Pos( '*', Wildcard, Wildcard_Start + 1 ) ;
        S := Copy( Wildcard, Wildcard_Start + 1, Dummy - Wildcard_Start - 1 ) ;
        Dummy1 := _Pos( Name, S, Name_Start ) ;
        if( Dummy1 = 0 ) then
        begin
            exit ; // Not found
        end ;
        Name_Start := Dummy1 + Dummy - Wildcard_Start - 1 ;
        Wildcard_Start := Dummy ; // Move past wildcard and matching characters
    end ; // while

    Result := True ;
end ; // _Wildcard_Match

The remainder of the code loops through however many segments exist in the wildcard, matching those segments (from left to right) up against the name. Note that once the Wildcard pointers are both pointing to the same spot, then we have matched everything on both sides of the current asterisk wildcard, and anything remaining in the name matches by virtue of the fact that all we have left is a wildcard. As we match segments, we adjust the Wildcard_Start pointer so that it is always pointing to the asterisk prior to the current segment.
If we make it through to the end with no mismatches, we have a match between the name and the wildcard.

We made use of a Compare function that compares a substring of Wildcard to a substring of Name. Here is that function:

function Compare( const Contents : string ; Wildcard_Start, _Length : integer ;
    const Match : string ; Name_Start : integer ; Wildcard : boolean ) : boolean ;

var Loop : integer ;

begin
    // Setup...
    Result := False ;
    if( _Length < 1 ) then
    begin
        exit ;
    end ;
    if( Wildcard_Start < 1 ) then
    begin
        Wildcard_Start := 1 ;
    end ;
    if( Name_Start < 1 ) then
    begin
        Name_Start := 1 ;
    end ;
    if( Wildcard_Start + _Length > Length( Contents ) + 1 ) then
    begin
        _Length := Length( Contents ) - Wildcard_Start + 1 ;
    end ;

    // Do comparison...
    for Loop := 0 to _Length - 1 do
    begin
        if( Name_Start + Loop > length( Match ) ) then
        begin
            exit ;
        end ;
        if( Contents[ Wildcard_Start + Loop ] <> Match[ Name_Start + Loop ] ) then
        begin
            if( ( not Wildcard ) or ( Contents[ Wildcard_Start + Loop ] <> '?' ) ) then
            begin
                exit ;
            end ;
        end ;
    end ;
    Result := True ;
end ; // _Compare

We make sure that the passed parameters are within the limits of the strings. Then we step through each character of the wildcard and name strings, starting at the specific start locations, and compare them. The only difference from a standard string compare is that we match on any character that is a question mark (?) wildcard.

We also use an Equal function that is a simplified version of Compare. Why not use Compare instead? The reason is that there is extra overhead in Compare (and in the call itself due to the extra parameters) - and we want the matching code to run fast.

function _Equal( const Contents, Match : string ) : boolean ;

var Loop : integer ;

begin
    Result := False ;
    if( Length( Contents ) <> length( Match ) ) then
    begin
        exit ;
    end ;
    for Loop := 1 to Length( Contents ) do
    begin
        if( ( Contents[ Loop ] <> Match[ Loop ] ) ) then
        begin
            if( ( Contents[ Loop ] <> '?' ) and ( Match[ Loop ] <> '?' ) ) then
            begin
                exit ;
            end ;
        end ;
    end ;
    Result := True ;
end ; // TUnicode_String.Equal

Likewise, our _Pos function looks for a string within another string, as does the Pascal pos() function. The difference is that we match on the question mark (?) wildcard.

function _Pos( const Contents : string ; const Value : string ;
    Start : integer = 1 ) : integer ;

var Dummy, Dummy1 : integer ;
    Found : boolean ;

begin
    Result := 0 ;
    if( Start > Length( Contents ) ) then
    begin
        exit ;
    end ;
    if( length( Value ) > Length( Contents ) - Start + 1 ) then
    begin
        exit ; // Substring is longer than our contents
    end ;
    for Dummy := Start to Length( Contents ) - length( Value ) + 1 do
    begin
        Found := True ;
        for Dummy1 := 1 to length( Value ) do
        begin
            if(
                ( Value[ Dummy1 ] <> Contents[ Dummy1 + Dummy - 1 ] )
                and
                ( Contents[ Dummy1 + Dummy - 1 ] <> '?' )
                and
                ( Value[ Dummy1 ] <> '?' )
              ) then
            begin
                Found := False ;
                break ;
            end ;
        end ; // for Dummy1
        if( Found ) then
        begin
            Result := Dummy ;
            exit ;
        end ;
    end ; // for Dummy
end ;

This is fairly straight-forward code. Basically, we loop through the characters of the one string, looking for a match with the substring.

TUnicode_String
The preceding code was for the purpose of explaining the algorithm. Though we will be keeping the algorithm, we will not be keeping the code. If we were only dealing with ASCII strings, we would be (nearly) finished. However, that is not the case. So, we now present the TUnicode_String class, which contains all the support needed for the use of Unicode in our wildcard matching.

type TUnicode_String = class
                           private // Instance data...
                               Has_Asterisk : boolean ;
                               Contents : array[ 0..384 ] of cardinal ;

                           protected // Property handlers...
                               // Return length of our contents...
                               function Get_Length : integer ;
                               procedure Set_Length( Value : integer ) ;

                           public // API...
                               function As_String : string ;

                               // Assign our contents from a UTF8 string...
                               procedure Assign_From_String( const S : string ;
                                   Format : integer ) ;

                               { Return true if our substring matches the passed
                                 substring.  W_Start indicates the start
                                 position and _Length is the length of the
                                 substring.  Name_Start indicates the start in the
                                 compared string }
                               function Compare( Wildcard_Start, _Length : integer ;
                                   Match : TUnicode_String ; Name_Start : integer ;
                                   Wildcard : boolean ) : boolean ;

                               // Create...
                               function Copy( Start, Len : integer ) : TUnicode_String ;

                               // Return true if our contents are equal to the match
                               function Equal( Match : TUnicode_String ;
                                   Wildcard : boolean ) : boolean ;

                               // Insert character at given position
                               procedure Insert( Position : integer ;
                                   Value : cardinal ) ;

                               // Convert our characters to lowercase...
                               procedure Lowercase ;

                               // Position of substring...
                               function Pos( const Value : string ;
                                   Start : integer = 1 ) : integer ; overload ;

                               function Pos( const Value : TUnicode_String ;
                                   Start : integer = 1 ) : integer ; overload ;

                               // Return rightmost instance of Value
                               function RPos( Value : char ) : integer ;

                           public // Properties...
                               property Length : integer
                                   read Get_Length
                                   write Set_Length ;
                       end ; // TUnicode_String

We could use a more generic Unicode string class that supported dynamically-sized strings, with all the myriad string utility functions that one would expect. However, we are trying to make the matching code work fast. As it is, the Unicode already adds a huge amount of overhead, so we make our class customized to our specific needs.
The first thing to notice is that the string is static. That is, instead of a dynamically-allocated chunk of memory for the string, the contents of the string are stored an array of 0 to 384 integers. Offset 0 is reserved for the current string length, which leaves us with a possible string size of 0 through 384 characters. But, you say, our filenames are a maximum of 256 bytes, so why 384 (4-byte) characters instead of 256? The reason has to do with Unicode, as you might have guessed. In ASCII, an uppercase to lowercase conversion is a simple one-to-one translation. We talked briefly about character folding earlier in the article - in some languages supported by Unicode, the lowercase equivalent of a single uppercase character can be 1, 2, or even 3 lowercase characters. The worst case scenario is where 2 bytes of UTF8 translates into 3 characters. To put it another way, if we had a 256 byte string consisting of this worst-case scenario, that would be equal to 128 Unicode characters, which translated to lowercase, would be three times that, or 384. So, we size the array to hold the largest possible number of Unicode characters that could be generated from 256 bytes of UTF8.
Because we deal with wildcards, we include a flag which the matching code will use. We caculate it and set it once instead of doing it multiple times.
That is all the instance data we need to support a Unicode string. Now to the methods.

First, we have our getter and setter for the length of the string.

function TUnicode_String.Get_Length : integer ;

begin
    Result := Contents[ 0 ] ;
end ;


procedure TUnicode_String.Set_Length( Value : Integer ) ;

begin
    Contents[ 0 ] := Value ;
end ;

As I said earlier, we store the length of the string in the 0th element of the Contents array. Normally, I would include a separate length variable in the instance data, but I am matching the implementation of normal Pascal static strings.
When Delphi/FPC construct a class, they zero the instance data, so our string has length 0 when it is constructed. Note that we don't check to see if the value being set is greater than our maximum length of 384. Not validating parameters is, generally speaking, bad coding. But since this is a local class that we do not expose outside of this unit, we save the time for the check since, in theory, our code would never try to set it to an invalid value.

function TUnicode_String.As_String : string ;

var Dummy, Loop : integer ;

begin
    System.setlength( Result, Length ) ;
    for Loop := 1 to Length do
    begin
        Dummy := Contents[ Loop ] ;
        if( Dummy > 127 ) then
        begin
            Dummy := Dummy or 128 ;
        end ;
        Result[ Loop ] := chr( Dummy ) ;
    end ;
end ;

This method returns an ASCII string version of the contents. We can't return non-ASCII characters, of course. So we return ASCII values larger than 127 for non-ASCII values.

procedure TUnicode_String.Assign_From_String( const S : string ;
    Format : integer ) ;

var Index, Size, Mask : integer ;
    Value : cardinal ;

begin
    Index := 1 ; // Index in Spec
    Contents[ 0 ] := 0 ;
    if( Format = 3 ) then // UTF8
    begin
        while( Index <= system.length( S ) ) do
        begin
            Value := 0 ;
            if( S[ Index ] > #$FC ) then
            begin
                Size := 6 ;
                Mask := 1 ;
            end else
            if( S[ Index ] > #$F8 ) then
            begin
                Size := 5 ;
                Mask := 3 ;
            end else
            if( S[ Index ] > #$F0 ) then
            begin
                Size := 4 ;
                Mask := 7 ;
            end else
            if( S[ Index ] > #$E0 ) then
            begin
                Size := 3 ;
                Mask := $F ;
            end else
            if( S[ Index ] > #$C0 ) then
            begin
                Size := 2 ;
                Mask := $1F ;
            end else
            begin
                Size := 1 ;
                Mask := $7F ;
            end ;
            while( Size > 0 ) do
            begin
                dec( Size ) ;
                Value := Value or ( ord( S[ Index ] ) and Mask ) ;
                if( Size > 0 ) then
                begin
                    Value := Value shl 6 ;
                end ;
                Mask := $3F ;
                inc( Index ) ;
            end ;
            inc( Contents[ 0 ] ) ;
            Contents[ Contents[ 0 ] ] := Value ;
        end ; // while( Index < system.length( S ) )
    end else
    begin
        Value := 0 ;
        Index := 1 ; // Index in S
        while( Index <= system.length( S ) ) do
        begin
            move( PChar( S )[ Index - 1 ], Value, Format ) ;
            Index := Index + Format ;
            inc( Contents[ 0 ] ) ;
            Contents[ Contents[ 0 ] ] := Value ;
        end ;
    end ; // if( Format = 3 )
end ; // TUnicode_String.Assign_From_String

This method sets the class contents from a Pascal string. The format parameter indicates how the passed string is interpreted, as follows:

ParameterMeaning
1ASCII string (one byte per character)
2Unicode-16 string (two bytes per character)
3UTF-8 string
4Unicode-32 string (four bytes per character)

Note that in the case of format values 1, 2, and 4, the value is also the number of bytes per character in the passed string. Hence, the conversion between the string and the class is simply a matter of copying the appropriate number of bytes from the string into the internal array for each character. In the case of UTF-8 strings, the conversion is more complicated. We won't go into the details of decoding UTF-8. If you want more information on that, see the The Unicode Consortium.

function lowcase( V : cardinal ; var _Folding_Index : integer ) : cardinal ;

var L, H, Index : integer ;

begin
    Result := V ;
    if( ( V < $41 ) or ( V > $118BF ) ) then // Not within range of our table
    begin
        exit ;
    end ;
    L := 0 ;
    H := high( Foldings ) ;
    while( L < H ) do
    begin
        Index := L + ( ( H - L ) div 2 ) ;
        if( V = Foldings[ Index, 0 ] ) then
        begin
            if( Foldings[ Index, 2 ] > 0 ) then // Multiple outputs
            begin
                Result := 0 ;
                _Folding_Index := Index ;
                exit ;
            end ;
            Result := Foldings[ Index, 1 ] ;
            exit ;
        end else
        if( V < Foldings[ Index, 0 ] ) then
        begin
            H := Index ;
        end else
        begin
            L := Index + 1 ;
        end ;
        if( L >= H ) then
        begin
            exit ;
        end ;
    end ; // while( L < H )
end ; // lowcase


procedure TUnicode_String.Lowercase ;

var Dummy, V : integer ;
    _Folding_Index : integer ;

begin
    Dummy := 1 ;
    while( Dummy <= Length ) do
    begin
        V := lowcase( Contents[ Dummy ], _Folding_Index ) ;
        if( V = 0 ) then
        begin
            Contents[ Dummy ] := Foldings[ _Folding_Index, 1 ] ;
            for V := 2 to 3 do
            begin
                if( Foldings[ _Folding_Index, V ] <> 0 ) then
                begin
                    Insert( Dummy + 1, Foldings[ _Folding_Index, V ] ) ;
                    inc( Dummy ) ;
                end ;
            end ;
        end else
        begin
            Contents[ Dummy ] := V ;
        end ;
        inc( Dummy ) ;
    end ;
end ; // TUnicode_String.Lowercase

The Lowercase method converts the unicode string to lowercase. It simply loops through the characters, calling the lowcase function to convert to a lowercase equivalent. If Lowcase returns 0, it indicates that the current character folds into multiple lowercase characters. The second parameter (passed by reference) indicates which offset in the Foldings array is to be used. We then loop through that row in the array and insert any non-zero values.
The Lowcase function determines the lowercase equivalent of the passed character value. There are three possible results from this function:
  1. No lowercase equivalent - simply return the passed character.
  2. Character maps to a single lowercase character - return the lowercase character value.
  3. Character folds to multiple lowercase characters - return 0 and set the second parameter to the appropriate row.
To have a complete list of all possible Unicode characters would require an excessive amount of memory for our folding array. So, we include only the characters that have lowercase values. It would be too slow to read through the entire array to find matches, so Lowcase does a binary-search of the array. This is slower than a simple lookup in a huge array, but takes much much less memory for the table. I won't list the entire array here, but we include the first few entries (this includes all of the ASCII character values and a few others):
const Foldings : array[ 0..1320, 0..3 ] of cardinal =
      (
        ( $41, $61, 0, 0 ), // 65 "A"
        ( $42, $62, 0, 0 ),
        ( $43, $63, 0, 0 ),
        ( $44, $64, 0, 0 ),
        ( $45, $65, 0, 0 ),
        ( $46, $66, 0, 0 ),
        ( $47, $67, 0, 0 ),
        ( $48, $68, 0, 0 ),
        ( $49, $69, 0, 0 ),
        ( $4A, $6A, 0, 0 ),
        ( $4B, $6B, 0, 0 ),
        ( $4C, $6C, 0, 0 ),
        ( $4D, $6D, 0, 0 ),
        ( $4E, $6E, 0, 0 ),
        ( $4F, $6F, 0, 0 ),
        ( $50, $70, 0, 0 ),
        ( $51, $71, 0, 0 ),
        ( $52, $72, 0, 0 ),
        ( $53, $73, 0, 0 ),
        ( $54, $74, 0, 0 ),
        ( $55, $75, 0, 0 ),
        ( $56, $76, 0, 0 ),
        ( $57, $77, 0, 0 ),
        ( $58, $78, 0, 0 ),
        ( $59, $79, 0, 0 ),
        ( $5A, $7A, 0, 0 ), // "Z"
        ( $B5, $3BC, 0, 0 ),
        ( $C0, $E0, 0, 0 ),
        ( $C1, $E1, 0, 0 ),
        ( $C2, $E2, 0, 0 ),
        ( $C3, $E3, 0, 0 ),
        ( $C4, $E4, 0, 0 ),
        ( $C5, $E5, 0, 0 ),
        ( $C6, $E6, 0, 0 ),
        ( $C7, $E7, 0, 0 ),
        ( $C8, $E8, 0, 0 ),
        ( $C9, $E9, 0, 0 ),
        ( $CA, $EA, 0, 0 ),
        ( $CB, $EB, 0, 0 ),
        ( $CC, $EC, 0, 0 ),
        ( $CD, $ED, 0, 0 ),
        ( $CE, $EE, 0, 0 ),
        ( $CF, $EF, 0, 0 ),
        ( $D0, $F0, 0, 0 ),
        ( $D1, $F1, 0, 0 ),
        ( $D2, $F2, 0, 0 ),
        ( $D3, $F3, 0, 0 ),
        ( $D4, $F4, 0, 0 ),
        ( $D5, $F5, 0, 0 ),
        ( $D6, $F6, 0, 0 ),
        ( $D8, $F8, 0, 0 ),
        ( $D9, $F9, 0, 0 ),
        ( $DA, $FA, 0, 0 ),
        ( $DB, $FB, 0, 0 ),
        ( $DC, $FC, 0, 0 ),
        ( $DD, $FD, 0, 0 ),
        ( $DE, $FE, 0, 0 ),
        ( $DF, $73, $73, 0 ),

Each row in the array consists of 4 elements. The first item is the uppercase character that we are searching for. The next three items are the lowercase characters to fold into (a zero indicates a value to ignore). So, for example, the last item we list in the array indicates that a value of hex DF is converted into two characters of hex value 73.

function TUnicode_String.Copy( Start, Len : integer ) : TUnicode_String ;

begin
    // Setup...
    Result := TUnicode_String.Create ;
    if( Start > Length ) then
    begin
        exit ;
    end ;
    if( Start < 1 ) then
    begin
        Start := 1 ;
    end ;
    if( Start + Len - 1 > Length ) then
    begin
        Len := Length - Start + 1 ;
    end ;

    Result.Length := Len ;
    move( Contents[ Start ], Result.Contents[ 1 ], Len * sizeof( cardinal ) ) ;
end ; // TUnicode_String.Copy

The Copy method creates a new instance of the class and copies the specified substring into that new class.

function TUnicode_String.Compare( Wildcard_Start, _Length : integer ;
    Match : TUnicode_String ; Match_Start : integer ) : boolean ;

var Loop : integer ;

begin
    // Setup...
    Result := False ;
    if( _Length < 1 ) then
    begin
        exit ;
    end ;
    if( Wildcard_Start < 1 ) then
    begin
        Wildcard_Start := 1 ;
    end ;
    if( Match_Start < 1 ) then
    begin
        Match_Start := 1 ;
    end ;
    if( Wildcard_Start + _Length > Length + 1 ) then
    begin
        _Length := Length - Wildcard_Start + 1 ;
    end ;
    if( Match_Start + _Length > Match.Length + 1 ) then
    begin
        _Length := Match.Length - Match_Start + 1 ;
    end ;

    // Do comparison...
    for Loop := 0 to _Length - 1 do
    begin
        if( Contents[ Wildcard_Start + Loop ] <>
            Match.Contents[ Match_Start + Loop ] ) then
        begin
            if(
                ( Contents[ Wildcard_Start + Loop ] <> ord( '?' ) )
                and
                ( Match.Contents[ Wildcard_Start + Loop ] <> ord( '?' ) )
              ) then
            begin
                exit ;
            end ;
        end ;
    end ;
    Result := True ;
end ; // TUnicode_String.Compare


function TUnicode_String.Equal( Match : TUnicode_String ) : boolean ;

var Loop : integer ;

begin
    Result := False ;
    if( Length <> Match.Length ) then
    begin
        exit ;
    end ;
    for Loop := 1 to Length do
    begin
        if( ( Contents[ Loop ] <> Match.Contents[ Loop ] ) ) then
        begin
            if(
                ( Contents[ Loop ] <> ord( '?' ) )
                and
                ( Match.Contents[ Loop ] <> ord( '?' ) )
              ) then
            begin
                exit ;
            end ;
        end ;
    end ;
    Result := True ;
end ; // TUnicode_String.Equal

These two comparison routines are equivalent to the earlier examples we showed, but for use with our class. The algorithm is exactly the same.

procedure TUnicode_String.Insert( Position : integer ; Value : cardinal ) ;

begin
    move( Contents[ Position ], Contents[ Position + 1 ],
	    sizeof( Contents ) - sizeof( cardinal ) * ( Position + 1 ) ) ;
    Contents[ Position ] := Value ;
    inc( Contents[ 0 ] ) ;
end ;

The insert method is used in the Lowercase method during folding. It should be self-explanatory.

function TUnicode_String.Pos( const Value : TUnicode_String ;
    Start : integer = 1 ) : integer ;

var Dummy, Dummy1 : integer ;
    Found : boolean ;

begin
    Result := 0 ;
    if( Start > Length ) then
    begin
        exit ;
    end ;
    if( Value.Length > Length - Start + 1 ) then
    begin
        exit ; // Substring is longer than our contents
    end ;
    for Dummy := Start to Length - Value.Length + 1 do
    begin
        Found := True ;
        for Dummy1 := 1 to Value.Length do
        begin
            if(
                ( Value.Contents[ Dummy1 ] <> Contents[ Dummy1 + Dummy - 1 ] )
                and
                ( Contents[ Dummy1 + Dummy - 1 ] <> ord( '?' ) )
                and
                ( Value.Contents[ Dummy1 ] <> ord( '?' ) )
              ) then
            begin
                Found := False ;
                break ;
            end ;
        end ; // for Dummy1
        if( Found ) then
        begin
            Result := Dummy ;
            exit ;
        end ;
    end ; // for Dummy
end ; // TUnicode_String.Pos


function TUnicode_String.Pos( const Value : string ;
    Start : integer = 1 ) : integer ;

var Dummy, Dummy1 : integer ;
    Found : boolean ;

begin
    Result := 0 ;
    if( Start > Length ) then
    begin
        exit ;
    end ;
    if( System.Length( Value ) > Length - Start + 1 ) then
    begin
        exit ; // Substring is longer than our contents
    end ;
    for Dummy := Start to Length - system.length( Value ) + 1 do
    begin
        Found := True ;
        for Dummy1 := 1 to system.length( Value ) do
        begin
            if( ord( Value[ Dummy1 ] ) <> Contents[ Dummy1 + Dummy - 1 ] ) then
            begin
                Found := False ;
                break ;
            end ;
        end ; // for Dummy1
        if( Found ) then
        begin
            Result := Dummy ;
            exit ;
        end ;
    end ; // for Dummy
end ; // TUnicode_String.Pos


function TUnicode_String.RPos( Value : char ) : integer ;

var Loop, V : cardinal ;

begin
    V := ord( Value ) ;
    for Loop := Length downto 1 do
    begin
        if( Contents[ Loop ] = V ) then
        begin
            Result := Loop ;
            exit ;
        end ;
    end ;
    Result := 0 ;
end ;

These three position methods return the offset of a substring within our contents. RPos searches for the substring from the right to the left. The overloaded Pos methods search from left to right. The first Pos method searches for a TUnicode_String substring, while the other one searches for a Pascal substring. The second one doesn't match on question mark wildcards since we don't need it to for the cases where we use it (so why include the overhead?).

Wildcard_Match and WC_Match
You've seen a version of Wilcard_Match earlier - this is the same algorithm but uses our class instead of Pascal strings.

function Wildcard_Match( Wildcard, Name : TUnicode_String ) : boolean ;

var Dummy, Dummy1 : integer ;
    Wildcard_Start, Wildcard_End, Name_Start, Name_End : integer ;
    S : TUnicode_String ;

begin
    // Quick check and setup...
    if( Wildcard.As_String = '*' ) then
    begin
        Result := True ;
        exit ;
    end ;

    // Non-wildcard check...
    if( not Wildcard.Has_Asterisk ) then
    begin
        Result := Wildcard.Equal( Name ) ;
        exit ;
    end ;
    Result := False ; // Assume failure

    // Do wildcard match...
    Wildcard_Start := 1 ;
    Name_Start := 1 ;
    Wildcard_End := Wildcard.Length ;
    Name_End := Name.Length ;

    // Check prefix before first wildcard...
    Dummy := Wildcard.Pos( '*' ) ;
    if( Dummy > Wildcard_Start ) then // Something before the asterisk
    begin
        if( not Wildcard.Compare( Wildcard_Start, Dummy - Wildcard_Start, Name,
          Name_Start ) ) then
        begin
            exit ;
        end ;
        Wildcard_Start := Dummy ;
        Name_Start := Dummy ;
    end ;

    // Check suffix after last wildcard...
    Dummy := Wildcard.RPos( '*' ) ;
    if( Dummy < Wildcard.Length ) then
    begin
        if( not Wildcard.Compare( Dummy + 1, Wildcard.Length - Dummy, Name,
          Name.Length - ( Wildcard.Length - Dummy ) + 1 ) ) then
        begin
            exit ;
        end ;
        Wildcard_End := Dummy ;
        Name_End := Name.Length - ( Wildcard.Length - Dummy ) ;
    end ;

    // Check for remaining matches, left-to-right...
    while( Wildcard_Start <= Wildcard_End ) do
    begin
        if( Wildcard_Start >= Wildcard_End ) then
        begin
            break ; // All that's left in the wildcard spec is an asterisk - we match
        end ;
        Dummy := Wildcard.Pos( '*', Wildcard_Start + 1 ) ;
        S := Wildcard.Copy( Wildcard_Start + 1, Dummy - Wildcard_Start - 1 ) ;
        Dummy1 := Name.Pos( S, Name_Start ) ;
        S.Free ;
        if( Dummy1 = 0 ) then
        begin
            exit ; // Not found
        end ;

        // Move past wildcard and matching characters...
        Name_Start := Dummy1 + Dummy - Wildcard_Start - 1 ;
        Wildcard_Start := Dummy ;
    end ; // while

    Result := True ;
end ; // Wildcard_Match

Note that the Wildcard match takes two Unicode string classes. It is assumed that the strings are normalized and converted from UTF8. For this reason, we use a WC_Match function which, in turn, calls the Wildcard_Match. Actually, there are three functions. WC_Match calls Create_WC_Context to prepare the wildcard specification and then calls Context_WC_Match, which prepares the Name and calls Wildcard_match. The reason why we do this will be explained momentarily. Here are the functions:
function Context_WC_Match( _Spec, _Spec_Extension : TUnicode_String ;
    Name : string ; Format : integer ) : boolean ;

var Dummy : integer ;
    _Name, _Name_Extension : TUnicode_String ;

begin
    // Setup...
    Result := False ; // Assume no match
    if( ( Format < 1 ) or ( Format > 4 ) ) then
    begin
        Format := 1 ;
    end ;
    Name := Edit( Name, 16 ) ;

    // Create the Unicode string instances...
    _Name := TUnicode_String.Create ;
    try
        _Name.Assign_From_String( Name, Format ) ;

        // Normalize to lowercase...
        _Name.Lowercase ;

        // Parse out the extensions...
        Dummy := _Name.RPos( '.' ) ;
        if( Dummy = 0 ) then
        begin
            Dummy := _Name.Length + 1 ;
        end ;
        _Name_Extension := _Name.Copy( Dummy + 1, 1024 ) ;
        _Name.Length := Dummy - 1 ;
        _Name.Has_Asterisk := ( _Name.Pos( '*' ) > 0 ) ;
        _Name_Extension.Has_Asterisk := ( _Name_Extension.Pos( '*' ) > 0 ) ;

        // Check for extension...
        try
            if( not Wildcard_Match( _Spec_Extension, _Name_Extension ) ) then
            begin
                exit ; // Extension doesn't match - exit with false
            end ;
            Result := Wildcard_Match( _Spec, _Name ) ;
        finally
            _Name_Extension.Free ;
        end ;
    finally
        _Name.Free ;
    end ;
end ; // Context_WC_Match


function Create_WC_Context( Spec : string ; Format : integer ;
    var Extension : TUnicode_String ) : TUnicode_String ;

var Dummy : integer ;

begin
    // Setup...
    if( ( Format < 1 ) or ( Format > 4 ) ) then
    begin
        Format := 1 ;
    end ;
    Spec := Edit( Spec, 16 ) ;

    // Pre-normalize the specification...
    Dummy := pos( '**', Spec ) ;
    while( Dummy > 0 ) do
    begin
        delete( Spec, Dummy, 1 ) ;
        Dummy := pos( '**', Spec ) ;
    end ;
    Dummy := pos( '*?', Spec ) ;
    while( Dummy > 0 ) do
    begin
        Spec := copy( Spec, 1, Dummy - 1 ) + '?*' +
            copy( Spec, Dummy + 2, length( Spec ) ) ;
        Dummy := pos( '*?', Spec ) ;
    end ;
    if( pos( '.', Spec ) = 0 ) then // No explicit extension
    begin
        Spec := Spec + '.*' ;
    end ;

    // Create the Unicode string instances...
    Result := TUnicode_String.Create ;
    Result.Assign_From_String( Spec, Format ) ;

    // Normalize to lowercase...
    Result.Lowercase ;

    // Parse out the extensions...
    Dummy := Result.RPos( '.' ) ;
    if( Dummy = 0 ) then
    begin
        Dummy := Result.Length + 1 ;
    end ;
    Extension := Result.Copy( Dummy + 1, 1024 ) ;
    Result.Length := Dummy - 1 ;
    Result.Has_Asterisk := ( Result.Pos( '*' ) > 0 ) ;
    Extension.Has_Asterisk := ( Extension.Pos( '*' ) > 0 ) ;
end ; // Create_WC_Context


function WC_Match( Spec, Name : string ; Format : integer ) : boolean ;

var _Spec, _Spec_Extension : TUnicode_String ;
    S : string ;

begin
    _Spec := Create_WC_Context( Spec, Format, _Spec_Extension ) ;
    S := _Spec.As_String ;
    if( ( S = '*' ) or ( S = '*.*' ) ) then
    begin
        Result := True ; // Wildcard matches anything/everything
        exit ;
    end ;
    try
        Result := Context_WC_Match( _Spec, _Spec_Extension, Name, Format ) ;
    finally
        _Spec.Free ;
        _Spec_Extension.Free ;
    end ;
end ; // WC_Match

The first two routines prepare their respective strings in (mostly) the same way, with these steps:
  1. Reduce multiple contiguous spaces to a single space (The Edit function does this). Note that we do not remove non-display characters such as control characters (those less than value 32). Since UOS doesn't allow them for any of its file systems, the file processor will handle that part of normalization.
  2. Normalize the wildcard (only applies to WC_Match), by reducing multiple contiguous asterisks to a single asterisk, making sure there is an extension and making sure that an asterisk/question mark combo is ordered with the question mark first (this ensures that the Wildcard_Match works properly and the order is irrelevant to the logical match)
  3. Convert the strings to TUnicode_String objects, which also decodes the UTF8 as a side-effect.
  4. Convert the string to lowercase.
  5. Extract the extension portion of the specification/name into a separate Unicode_String instance.
  6. Make sure the Has_Asterisk value is set properly. This is done once during the preparation of the string rather than looking it up multiple times.

Back to our File System class
The preceeding code supports the Find_File and Lookup_Directory methods, but there is one more File System method that makes use of the wildcard matching. When a wildcard is used in a file specification, there is a possibility that multiple file matches may be found. In the most extreme case, the wildcard could match every file on the disk. Rather than returning a list of all matches, which could be quite long, most operating systems support a one-at-a-time wildcard lookup. The initial call sets up the lookup operation and returns the first match. Subsequent calls find the next match after the one returned by the previous match. This continues until no more matches are found. Thus, the calling code loops through the results until it is done. To support multiple lookups happening at the same time (multiple users or tasks), each lookup operation is represented by a lookup context. The calls to the lookup function pass this context.
Lookup is an API function that provides the file lookup ability to the end user via the File Processor.

function TUOS_Native_File_System.Lookup( Spec : PChar ; var Context : pointer ) : PChar ;

var _Context : TLookup_Context ;
    Dir : string ;
    Dummy : integer ;
    Directory : TUOS_Native_File ;
    Filespec : string ;
    H : TUOS_File_Header ;
    S : string ;

begin
    Result := nil ;
    if( Context = nil ) then
    begin
        // Parse specification...
        S := Spec ;
        if( S = '' ) then
        begin
            exit ; // No context and no file spec
        end ;
        Dummy := length( S ) ;
        while( ( Dummy > 0 ) and ( S[ Dummy ] <> '\' ) ) do
        begin
            dec( Dummy ) ;
        end ;
        Dir := copy( S, 1, Dummy ) ;
        S := copy( S, Dummy + 1, length( S ) + 1 ) ;
        FileSpec := S ;
        Directory := Lookup_Directory( Dir ) ;
        if( Directory = nil ) then
        begin
            exit ; // Directory not found
        end ;

        // Create context...
        _Context := TLookup_Context.Create ;
        _Context.Spec := S ;
        _Context.Dir := Dir ;
        _Context.Index := 0 ; // Next file to read is index 0
        _Context.Directory := Directory ;
        _Context._Spec := Create_WC_Context( S, 3, _Context._Spec_Extension ) ;
        Context := _Context ;
    end ; // if( Context = nil )

    _Context := TLookup_Context( Context ) ;
    if( Lookup_Contexts.IndexOf( _Context ) = -1 ) then // Not a valid context
    begin
        exit ;
    end ;
    while( true ) do
    begin
        inc( _Context.Index ) ;
        if( _Context.Directory.Read( 0, ( _Context.Index - 1 ) * sizeof( H ),
	  sizeof( H ), H ) = 0 ) then
        begin
            // No more entries in this directory...
            _Context.Free ;
            Context := nil ;
            exit ;
        end ;
        if( ( H.Name <> 0 ) and ( ( H.Flags and FAF_DELETED ) = 0 ) ) then 
        begin // Not a null entry or deleted file
            S := Get_String( H.Name ) ;
            if( Context_WC_Match( _Context._Spec, _Context._Spec_Extension, S, 3 ) ) then
            begin
                Temp := S ;
                Result := PChar( Temp ) ;
                exit ;
            end ;
        end ; // if( H.Name <> 0 )
    end ;  // while( true )
end ; // TUOS_Native_File_System.Lookup

The context parameter is passed by reference because the lookup function modifies it under two conditions: if nil is passed, this is considered the first call of the lookup operation and a new context object is constructed and returned via the parameter. When the lookup operation reaches the end of matching file names, the context is destructed and the parameter is set to nil to indicate the end of the operation. Note that each file system will have its own context object and they are not compatible with each other. From the standpoint of the caller, this is just a pointer value with no meaning except to the file system to which it belongs. Our context contains the index of the last matching entry in the directory. The function simply marches through the directory entries until a match is found or we reach the end of the directory. Note that any file with the deleted flag set is treated as if it isn't there since it is logically deleted even if it hasn't be physically deleted yet. We use the WC_Match functions, described above, to do the actual wildcard matching. Note the check against the Lookup_Contexts list to make sure that the passed context truly is one of ours.
Now the reason for having the three WC_Match functions is now made apparent. When constructing the context, Create_WC_Context is called to set up the specification string. This means the subsequent calls do not need to do the work of normalizing and converting the specification, since we've already done that. We save the specification (and extension) in the context and use it in calls to Context_WC_Match. WC_Match is used by other methods of the file system which are looking up a single file and don't need to keep the converted file name or specification after the call.

Finally, here is the TLookup_Context class and related function, used by our File System class:

var _Lookup_Contexts : TList = nil ;

function Lookup_Contexts : TList ;

begin
    if( _Lookup_Contexts = nil ) then
    begin
        _Lookup_Contexts := TList.Create ;
    end ;
    Result := _Lookup_Contexts ;
end ;


type TLookup_Context = class
                           public // Constructors and destructors...
                               constructor Create ;
                               destructor Destroy ; override ;

                           public // Instance data...
                               Dir, Spec : string ;
                               _Spec, _Spec_Extension : TUnicode_String ;
                               Directory : TUOS_Native_File ;
                               Index : longint ;
                       end ;


// Constructors and destructors...

constructor TLookup_Context.Create ;

begin
    inherited Create ;

    Lookup_Contexts.Add( self ) ;
end ;


destructor TLookup_Context.Destroy ;

begin
    _Spec.Free ;
    _Spec := nil ;
    _Spec_Extension.Free ;
    _Spec_Extension := nil ;
    Lookup_Contexts.Remove( self ) ;

    inherited Destroy ;
end ;

The Lookup_Contexts function provides access to the _Lookup_Contexts list, deferring the creation until it is needed. The whole point of the list is to keep track of lookup contexts that we create so we can validate any contexts passed to the Lookup function. The class is quite simple: just some data, a constructor that adds the instance to the list and a destructor that removes it.

I ran some performance tests comparing the speed of the wildcard match using the Pascal string and using our TUnicode_String class. So, how does the overhead of the Unicode handling affect the speed of our comparisons? It is hard to say since different operations have different comparative speeds. Some operations were faster with our Unicode codes and some were faster with the Pascal string code. It is hard to predict what mix of operations would occur during normal UOS operation. In fact, it is likely that in an average environment, the mix would differ from time to time. So, in my test, I ran a number of what I thought were fairly "typical" operations in a loop - the exact same set on both types of routines. For what its worth, the Unicode tests was almost exactly one-half the speed of the non-Unicode test. I would like it to be faster, but considering the extra work involved and the tradeoffs we've discussed, I'm not entirely displeased.

I apologize for the length of this article. In the next (and shorter!) article, we will address other support methods for our file system class.