Unicode revisited
We discussed Unicode in depth back in article 17, where we introduced the TUnicode_String class that we used for file processing. Those strings are static, with a fixed maximum length of 384 Unicode UTF32 characters. While this has worked fine for our needs up to this point, and is faster than using a dynamic-length string, we need something of more general utility going forward. So, we've renamed the TUnicode_String class to TStatic_Unicode_String and created a new class named TUnicode_String. This new class has the same methods (plus a couple of extra ones that we'll cover later in the article). The main difference is that the Contents array is dynamic in this class: it is resized as necessary.
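As a minimal sketch of that layout, the following standalone Free Pascal program assumes what the methods below imply: element 0 of the dynamic array holds the current length, and elements 1 through Length hold the UTF-32 code points. The TCardinal_Array type and the Set_Len and Append_Char helpers are hypothetical stand-ins, not part of the real class.

```pascal
program Length_Prefixed_Demo ;

{$mode objfpc}{$H+}

uses SysUtils ;

type TCardinal_Array = array of cardinal ;

// Hypothetical helper: element 0 holds the length, so the array is
// always one element longer than the string it represents.
procedure Set_Len( var A : TCardinal_Array ; Len : integer ) ;
begin
    SetLength( A, Len + 1 ) ;
    A[ 0 ] := Len ;
end ;

// Hypothetical helper: grow by one element and store C at the end.
procedure Append_Char( var A : TCardinal_Array ; C : cardinal ) ;
begin
    Set_Len( A, integer( A[ 0 ] ) + 1 ) ;
    A[ A[ 0 ] ] := C ;
end ;

var A : TCardinal_Array ;
    I : integer ;

begin
    Set_Len( A, 0 ) ; // Empty string: only the length slot exists
    Append_Char( A, ord( 'U' ) ) ;
    Append_Char( A, $E9 ) ;    // LATIN SMALL LETTER E WITH ACUTE
    Append_Char( A, $1F600 ) ; // Outside the BMP, yet still one element
    writeln( 'Length = ', A[ 0 ] ) ;
    for I := 1 to integer( A[ 0 ] ) do
    begin
        writeln( I, ': U+', IntToHex( Int64( A[ I ] ), 4 ) ) ;
    end ;
end.
```

Storing the length in element 0 lets character indexes start at 1, like Pascal strings, and every resize is a single SetLength call. The trade-off is that growing one character at a time may reallocate the array on each call.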

There are different ways we could have handled the new class, including making it a descendant of the old one or having both descend from a common ancestor. However, the virtualization and generalization needed to make that work would add overhead. Since we're concerned about performance in the file system, we will leave the old static string class as it is, and the file processing code will continue to use it.

Here is the new TUnicode_String class. We shan't describe most of the code in any detail, as it is almost identical to the old class apart from some dynamic array handling. The new methods, including Compare, are described later in the article.

type TUnicode_String = class
                           public // Constructors and destructors...
                               constructor Create ;
                               destructor Destroy ; override ;

                           private // Instance data...
                               Has_Asterisk : boolean ;
                               Contents : array of cardinal ;

                           protected // Property handlers...
                               // Return length of our contents...
                               function Get_Length : integer ;
                               procedure Set_Length( Value : integer ) ;

                           public // API...
                               procedure Append( S : TUnicode_String ) ;
                                   overload ;

                               procedure Append( S : string ) ;
                                   overload ;

                               procedure Append( S : cardinal ) ;
                                   overload ;

                               function As_String : string ;

                               // Assign our contents from a string (Format is ST_UTF8 or the byte width of each character)...
                               procedure Assign_From_String( const S : string ;
                                   Format : integer ) ;

                               // Remove Len characters starting at Index
                               procedure Delete( Index, Len : integer ) ;

                               { Compare our substring with a substring of
                                 Match.  Wildcard_Start is the start position
                                 in our contents, _Length is the number of
                                 characters to compare, and Match_Start is the
                                 start position in Match.  A "?" in either
                                 string matches any character.  Result:
                                     -1 = Less than Match
                                     0 = Equals Match
                                     1 = Greater than Match }
                               function Compare( Wildcard_Start, _Length : integer ;
                                   Match : TUnicode_String ;
                                   Match_Start : integer ) : integer ;

                               // Return a new string with Len characters starting at Start...
                               function Copy( Start, Len : integer ) : TUnicode_String ;

                               // Return edited string...
                               function Edit( Options, Escape : cardinal ) : TUnicode_String ;

                               // Return true if our contents are equal to the match
                               function Equal( Match : TUnicode_String ) : boolean ;

                               // Return character at specific index
                               function Get_Char( Index : integer ) : cardinal ;

                               // Insert character at given position
                               procedure Insert( Position : integer ;
                                   Value : cardinal ) ;

                               // Convert our characters to lowercase...
                               procedure Lowercase ;

                               // Position of substring...
                               function Pos( const Value : string ;
                                   Start : integer = 1 ) : integer ; overload ;

                               function Pos( const Value : TUnicode_String ;
                                   Start : integer = 1 ) : integer ; overload ;

                               // Return rightmost instance of Value
                               function RPos( Value : char ) : integer ;

                               // Return Pos, considering wildcards
                               function Wildcard_Pos( Value : TUnicode_String ;
                                   Start : integer = 1 ) : integer ;

                           public // Properties...
                               property Length : integer
                                   read Get_Length
                                   write Set_Length ;
                       end ; // TUnicode_String

And here are the updated methods.

// TUnicode_String methods...

// Constructors and destructors...

constructor TUnicode_String.Create ;

begin
    inherited Create ;

    setlength( Contents, 1 ) ;
    Contents[ 0 ] := 0 ;
end ;


destructor TUnicode_String.Destroy ;

begin
    setlength( Contents, 0 ) ;

    inherited Destroy ;
end ;


// API...

function TUnicode_String.As_String : string ;

var Dummy, Loop : integer ;

begin
    System.setlength( Result, Length ) ;
    for Loop := 1 to Length do
    begin
        Dummy := Contents[ Loop ] ;
        if( Dummy > 127 ) then
        begin
            Dummy := Dummy or 128 ;
        end ;
        Result[ Loop ] := chr( Dummy ) ;
    end ;
end ;


procedure TUnicode_String.Assign_From_String( const S : string ;
    Format : integer ) ;

var Index, Size, Mask : integer ;
    Value : cardinal ;

begin
    Index := 1 ; // Index in S
    Length := 0 ; // Discard old contents (and any leftover storage)
    if( Format = ST_UTF8 ) then // UTF8
    begin
        while( Index <= system.length( S ) ) do
        begin
            // The lead byte determines the sequence length and payload bits...
            Value := 0 ;
            if( S[ Index ] >= #$FC ) then
            begin
                Size := 6 ;
                Mask := 1 ;
            end else
            if( S[ Index ] >= #$F8 ) then
            begin
                Size := 5 ;
                Mask := 3 ;
            end else
            if( S[ Index ] >= #$F0 ) then
            begin
                Size := 4 ;
                Mask := 7 ;
            end else
            if( S[ Index ] >= #$E0 ) then
            begin
                Size := 3 ;
                Mask := $F ;
            end else
            if( S[ Index ] >= #$C0 ) then
            begin
                Size := 2 ;
                Mask := $1F ;
            end else
            begin
                Size := 1 ;
                Mask := $7F ;
            end ;
            // Accumulate 6 bits per continuation byte, never reading past the end of S...
            while( ( Size > 0 ) and ( Index <= system.length( S ) ) ) do
            begin
                dec( Size ) ;
                Value := Value or ( ord( S[ Index ] ) and Mask ) ;
                if( Size > 0 ) then
                begin
                    Value := Value shl 6 ;
                end ;
                Mask := $3F ;
                inc( Index ) ;
            end ;
            inc( Contents[ 0 ] ) ;
            setlength( Contents, Contents[ 0 ] + 1 ) ;
            Contents[ Contents[ 0 ] ] := Value ;
        end ; // while( Index <= system.length( S ) )
    end else
    begin
        setlength( Contents, system.Length( S ) div Format + 1 ) ;
        Value := 0 ;
        Index := 1 ; // Index in S
        while( Index <= system.length( S ) ) do
        begin
            move( PChar( S )[ Index - 1 ], Value, Format ) ;
            Index := Index + Format ;
            inc( Contents[ 0 ] ) ;
            Contents[ Contents[ 0 ] ] := Value ;
        end ;
    end ; // if( Format = ST_UTF8 )
end ; // TUnicode_String.Assign_From_String


function TUnicode_String.Get_Length : integer ;

begin
    Result := Contents[ 0 ] ;
end ;


procedure TUnicode_String.Set_Length( Value : Integer ) ;

begin
    Contents[ 0 ] := Value ;
    setlength( Contents, Value + 1 ) ;
end ;


function TUnicode_String.Copy( Start, Len : integer ) : TUnicode_String ;

begin
    // Setup...
    Result := TUnicode_String.Create ;
    if( Start > Length ) then
    begin
        exit ;
    end ;
    if( Start < 1 ) then
    begin
        Start := 1 ;
    end ;
    if( Start + Len - 1 > Length ) then
    begin
        Len := Length - Start + 1 ;
    end ;
    if( Len < 1 ) then
    begin
        exit ; // Nothing to copy
    end ;

    Result.Length := Len ;
    move( Contents[ Start ], Result.Contents[ 1 ], Len * sizeof( cardinal ) ) ;
end ; // TUnicode_String.Copy


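// Remove Len characters starting at Index
procedure TUnicode_String.Delete( Index, Len : integer ) ;

begin
    if( Index < 1 ) then
    begin
        Index := 1 ;
    end ;
    if( ( Index > Length ) or ( Len < 1 ) ) then
    begin
        exit ;
    end ;
    if( Index + Len > Length ) then // Deleting through the end of the string
    begin
        Length := Index - 1 ;
        exit ;
    end ;
    // Shift the characters after the deleted range down (move counts bytes)...
    move( Contents[ Index + Len ], Contents[ Index ],
        ( Length - Index - Len + 1 ) * sizeof( cardinal ) ) ;
    Length := Length - Len ;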
end ; // TUnicode_String.Delete


function TUnicode_String.Equal( Match : TUnicode_String ) : boolean ;

var Loop : integer ;

begin
    Result := False ;
    if( Length <> Match.Length ) then
    begin
        exit ;
    end ;
    for Loop := 1 to Length do
    begin
        if( ( Contents[ Loop ] <> Match.Contents[ Loop ] ) ) then
        begin
            if(
                ( Contents[ Loop ] <> ord( '?' ) )
                and
                ( Match.Contents[ Loop ] <> ord( '?' ) )
              ) then
            begin
                exit ;
            end ;
        end ;
    end ;
    Result := True ;
end ; // TUnicode_String.Equal


procedure TUnicode_String.Insert( Position : integer ; Value : cardinal ) ;

begin
    setlength( Contents, system.length( Contents ) + 1 ) ;
    move( Contents[ Position ], Contents[ Position + 1 ], 
        ( system.Length( Contents ) - Position - 1 ) * sizeof( cardinal ) ) ;
    Contents[ Position ] := Value ;
    inc( Contents[ 0 ] ) ;
end ;


procedure TUnicode_String.Lowercase ;

var Dummy, V : integer ;
    _Folding_Index : integer ;

begin
    Dummy := 1 ;
    while( Dummy <= Length ) do
    begin
        V := lowcase( Contents[ Dummy ], _Folding_Index ) ;
        if( ( V = 0 ) and ( _Folding_Index >= 0 ) ) then
        begin
            Contents[ Dummy ] := Foldings[ _Folding_Index, 1 ] ;
            for V := 2 to 3 do
            begin
                if( Foldings[ _Folding_Index, V ] <> 0 ) then
                begin
                    inc( Dummy ) ;
                    Insert( Dummy, Foldings[ _Folding_Index, V ] ) ;
                end ;
            end ;
        end else
        begin
            Contents[ Dummy ] := V ;
        end ;
        inc( Dummy ) ;
    end ;
end ; // TUnicode_String.Lowercase


function TUnicode_String.Pos( const Value : TUnicode_String ;
    Start : integer = 1 ) : integer ;

var Dummy, Dummy1 : integer ;
    Found : boolean ;

begin
    Result := 0 ;
    if( Start > Length ) then
    begin
        exit ;
    end ;
    if( Value.Length > Length - Start + 1 ) then
    begin
        exit ; // Substring is longer than our contents
    end ;
    for Dummy := Start to Length - Value.Length + 1 do
    begin
        Found := True ;
        for Dummy1 := 1 to Value.Length do
        begin
            if(
                ( Value.Contents[ Dummy1 ] <> Contents[ Dummy1 + Dummy - 1 ] )
                and
                ( Contents[ Dummy1 + Dummy - 1 ] <> ord( '?' ) )
                and
                ( Value.Contents[ Dummy1 ] <> ord( '?' ) )
              ) then
            begin
                Found := False ;
                break ;
            end ;
        end ; // for Dummy1
        if( Found ) then
        begin
            Result := Dummy ;
            exit ;
        end ;
    end ; // for Dummy
end ; // TUnicode_String.Pos


function TUnicode_String.Pos( const Value : string ;
    Start : integer = 1 ) : integer ;

var Dummy, Dummy1 : integer ;
    Found : boolean ;

begin
    Result := 0 ;
    if( Start > Length ) then
    begin
        exit ;
    end ;
    if( System.Length( Value ) > Length - Start + 1 ) then
    begin
        exit ; // Substring is longer than our contents
    end ;
    for Dummy := Start to Length - system.length( Value ) + 1 do
    begin
        Found := True ;
        for Dummy1 := 1 to system.length( Value ) do
        begin
            if( ord( Value[ Dummy1 ] ) <> Contents[ Dummy1 + Dummy - 1 ] ) then
            begin
                Found := False ;
                break ;
            end ;
        end ; // for Dummy1
        if( Found ) then
        begin
            Result := Dummy ;
            exit ;
        end ;
    end ; // for Dummy
end ; // TUnicode_String.Pos


function TUnicode_String.RPos( Value : char ) : integer ;

var Loop, V : cardinal ;

begin
    V := ord( Value ) ;
    for Loop := Length downto 1 do
    begin
        if( Contents[ Loop ] = V ) then
        begin
            Result := Loop ;
            exit ;
        end ;
    end ;
    Result := 0 ;
end ;


function TUnicode_String.Wildcard_Pos( Value : TUnicode_String ;
    Start : integer = 1 ) : integer ;

var Dummy, Dummy1 : integer ;
    Found : boolean ;

begin
    Result := 0 ;
    if( Start > Length ) then
    begin
        exit ;
    end ;
    if( Value.Length > Length - Start + 1 ) then
    begin
        exit ; // Substring is longer than our contents
    end ;
    for Dummy := Start to Length - Value.Length + 1 do
    begin
        Found := True ;
        for Dummy1 := 1 to Value.Length do
        begin
            if(
                ( Value.Contents[ Dummy1 ] <> Contents[ Dummy1 + Dummy - 1 ] )
                and
                ( Contents[ Dummy1 + Dummy - 1 ] <> ord( '?' ) )
                and
                ( Value.Contents[ Dummy1 ] <> ord( '?' ) )
              ) then
            begin
                Found := False ;
                break ;
            end ;
        end ; // for Dummy1
        if( Found ) then
        begin
            Result := Dummy ;
            exit ;
        end ;
    end ; // for Dummy
end ; // TUnicode_String.Wildcard_Pos

Now, let's look at the new methods for this class.

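procedure TUnicode_String.Append( S : TUnicode_String ) ;

var L, SL : integer ;

begin
    if( S = nil ) then
    begin
        exit ;
    end ;
    SL := S.Length ; // Capture first, in case S is ourselves
    if( SL = 0 ) then
    begin
        exit ; // Nothing to append (and S.Contents[ 1 ] doesn't exist)
    end ;
    L := Length + 1 ;
    Length := Length + SL ;
    move( S.Contents[ 1 ], Contents[ L ], SL * sizeof( cardinal ) ) ;
end ;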


procedure TUnicode_String.Append( S : string ) ;

var I, L : integer ;

begin
    L := Length ;
    Length := Length + system.length( S ) ;
    for I := 1 to system.length( S ) do
    begin
        Contents[ L + I ] := ord( S[ I ] ) ;
    end ;
end ;


procedure TUnicode_String.Append( S : cardinal ) ;

begin
    Length := Length + 1 ;
    Contents[ Length ] := S ;
end ;
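These methods append another string (or a single character) to ourselves. This overloaded procedure has three versions: one takes a TUnicode_String, one takes a Pascal string, and one takes a single Unicode character value. Note that the Pascal string version appends each byte of S as a separate character; it does not decode UTF-8.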
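Here is a cut-down sketch of the three overloads, using a hypothetical TMini_String class that keeps the same layout (element 0 of Contents holds the length). It captures the source length before the destination grows, so a string can safely be appended to itself, and, like the string overload above, it stores each byte of a Pascal string as a separate character without decoding UTF-8. It is an illustration under those assumptions, not the UOS class itself.

```pascal
program Append_Demo ;

{$mode objfpc}{$H+}

type
    // Hypothetical, cut-down stand-in for TUnicode_String
    TMini_String = class
        public
            Contents : array of cardinal ; // Contents[ 0 ] holds the length
            constructor Create ;
            procedure Set_Len( Value : integer ) ;
            function Len : integer ;
            procedure Append( S : TMini_String ) ; overload ;
            procedure Append( const S : string ) ; overload ;
            procedure Append( C : cardinal ) ; overload ;
    end ;

constructor TMini_String.Create ;
begin
    inherited Create ;
    Set_Len( 0 ) ;
end ;

procedure TMini_String.Set_Len( Value : integer ) ;
begin
    SetLength( Contents, Value + 1 ) ;
    Contents[ 0 ] := Value ;
end ;

function TMini_String.Len : integer ;
begin
    Result := Contents[ 0 ] ;
end ;

procedure TMini_String.Append( S : TMini_String ) ;

var Old, SL : integer ;

begin
    if( S = nil ) then exit ;
    SL := S.Len ; // Capture first: S may be Self, whose length is about to change
    if( SL = 0 ) then exit ; // Also avoids touching S.Contents[ 1 ]
    Old := Len ;
    Set_Len( Old + SL ) ;
    Move( S.Contents[ 1 ], Contents[ Old + 1 ], SL * SizeOf( cardinal ) ) ;
end ;

procedure TMini_String.Append( const S : string ) ;

var I, Old : integer ;

begin
    Old := Len ;
    Set_Len( Old + System.Length( S ) ) ;
    for I := 1 to System.Length( S ) do
    begin
        Contents[ Old + I ] := ord( S[ I ] ) ; // One element per byte
    end ;
end ;

procedure TMini_String.Append( C : cardinal ) ;
begin
    Set_Len( Len + 1 ) ;
    Contents[ Len ] := C ;
end ;

var A : TMini_String ;
    I : integer ;

begin
    A := TMini_String.Create ;
    try
        A.Append( 'ab' ) ;
        A.Append( cardinal( $263A ) ) ; // WHITE SMILING FACE
        A.Append( A ) ;                 // Self-append doubles the contents
        writeln( 'Length = ', A.Len ) ;
        for I := 1 to A.Len do
        begin
            write( A.Contents[ I ], ' ' ) ;
        end ;
        writeln ;
    finally
        A.Free ;
    end ;
end.
```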

function TUnicode_String.Get_Char( Index : integer ) : cardinal ;

begin
    if( ( Index < 1 ) or ( Index > Length ) ) then
    begin
        Result := 0 ;
        exit ;
    end ;
    Result := Contents[ Index ] ;
end ;
This method simply provides a means of accessing the internal contents, one character at a time. It returns 0 if Index is out of range.

// Do a wildcard comparison...
function TUnicode_String.Compare( Wildcard_Start, _Length : integer ;
    Match : TUnicode_String ; Match_Start : integer ) : integer ;

var Loop : integer ;

begin
    // Setup...
    Result := 0 ;
    if( _Length < 1 ) then
    begin
        exit ;
    end ;
    if( Wildcard_Start < 1 ) then
    begin
        Wildcard_Start := 1 ;
    end ;
    if( Match_Start < 1 ) then
    begin
        Match_Start := 1 ;
    end ;
    if( Wildcard_Start + _Length > Length + 1 ) then
    begin
        _Length := Length - Wildcard_Start + 1 ;
    end ;
    if( Match_Start + _Length > Match.Length + 1 ) then
    begin
        _Length := Match.Length - Match_Start + 1 ;
    end ;

    // Do comparison...
    for Loop := 0 to _Length - 1 do
    begin
        if( Contents[ Wildcard_Start + Loop ] <>
            Match.Contents[ Match_Start + Loop ] ) then
        begin
            if(
                ( Contents[ Wildcard_Start + Loop ] <> ord( '?' ) )
                and
                ( Match.Contents[ Match_Start + Loop ] <> ord( '?' ) )
              ) then
            begin
                if( Contents[ Wildcard_Start + Loop ] <
                    Match.Contents[ Match_Start + Loop ] ) then
                begin
                    Result := -1 ;
                end else
                begin
                    Result := 1 ;
                end ;
                exit ;
            end ;
        end ;
    end ;
end ; // TUnicode_String.Compare
This method is like the equivalent method in TStatic_Unicode_String, except that instead of returning a boolean, it returns an integer value that indicates the following:
  • -1 = Our contents are less than the Match
  • 0 = Strings are equal
  • 1 = Our contents are greater than the Match
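After the setup, which clamps the start positions and the length so that we never index past the end of either string, we loop through the requested characters. A "?" in either string matches any character; otherwise, the first pair of characters that differ determines the result.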
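To make the -1/0/1 convention concrete, here is a sketch of the same range comparison on plain Pascal strings (one byte per character) instead of TUnicode_String, purely for brevity. Compare_Range is a hypothetical name. Like the method, it clamps the range so that it never reads past either string and treats a "?" on either side as matching any character.

```pascal
program Compare_Range_Demo ;

{$mode objfpc}{$H+}

// Hypothetical sketch: compare Len characters of Spec (from Spec_Start)
// with Match (from Match_Start).  "?" on either side matches anything.
// Result: -1 = Spec range < Match range, 0 = equal, 1 = greater.
function Compare_Range( const Spec : string ; Spec_Start, Len : integer ;
    const Match : string ; Match_Start : integer ) : integer ;

var I : integer ;
    SC, MC : char ;

begin
    Result := 0 ;
    if( Spec_Start < 1 ) then Spec_Start := 1 ;
    if( Match_Start < 1 ) then Match_Start := 1 ;

    // Clamp Len so that neither string is indexed past its end
    if( Spec_Start + Len > Length( Spec ) + 1 ) then
        Len := Length( Spec ) - Spec_Start + 1 ;
    if( Match_Start + Len > Length( Match ) + 1 ) then
        Len := Length( Match ) - Match_Start + 1 ;

    for I := 0 to Len - 1 do
    begin
        SC := Spec[ Spec_Start + I ] ;
        MC := Match[ Match_Start + I ] ;
        if( ( SC <> MC ) and ( SC <> '?' ) and ( MC <> '?' ) ) then
        begin
            if( SC < MC ) then Result := -1 else Result := 1 ;
            exit ; // The first real mismatch decides the result
        end ;
    end ;
end ;

begin
    writeln( Compare_Range( 'abc', 1, 3, 'abc', 1 ) ) ;  // 0
    writeln( Compare_Range( 'a?c', 1, 3, 'abc', 1 ) ) ;  // 0
    writeln( Compare_Range( 'abd', 1, 3, 'abc', 1 ) ) ;  // 1
    writeln( Compare_Range( 'abb', 1, 3, 'abc', 1 ) ) ;  // -1
    writeln( Compare_Range( 'xxabc', 3, 2, 'ab', 1 ) ) ; // 0
end.
```

Because the range is clamped, a comparison that runs off the end of the shorter string simply stops there; deciding what to do about strings of different lengths is left to the caller.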

Now let's look at the standalone Compare function, which compares two Unicode strings.

// Compare strings.  Result: 0 = equal, -1 = L < R, 1 = L > R
function Compare( L, R : TUnicode_String ; Wildcard : boolean ) : integer ;

var Dummy, Dummy1, Fixed, I, Len : integer ;
    LC, RC : cardinal ;
    _L, _R, S, Temp : TUnicode_String ;
    _Has_Asterisk : boolean ;
    L_Start, R_Start : integer ;
    L_End, R_End : integer ;

begin
    // Setup...
    Result := 0 ; // Assume equal
    _L := L.Copy( 1, L.Length ) ;
    _R := R.Copy( 1, R.Length ) ;
    _Has_Asterisk := False ;
    try
First, we make copies of the two strings and set up for the rest of the function. The try...finally block guarantees that both copies are freed, no matter where we exit.
        // Pre-normalize the specification...
        if( Wildcard ) then
        begin
            Dummy := _R.pos( '*?' ) ;
            while( Dummy > 0 ) do
            begin
                // Replace "*?" with "?*"...
                Temp := _R.copy( Dummy + 2, _R.length ) ;
                _R.Length := Dummy - 1 ;
                _R.Append( '?*' ) ;
                _R.Append( Temp ) ;
                Temp.Free ;
                Dummy := _R.pos( '*?' ) ;
            end ;
            Dummy := _R.pos( '**' ) ;
            while( Dummy > 0 ) do
            begin
                _R.Delete( Dummy, 1 ) ;
                Dummy := _R.pos( '**' ) ;
            end ;
            if( ( _R.Length = 1 ) and ( _R.Get_Char( 1 ) = ord( '*' ) ) ) then
            begin
                exit ; // Wildcard matches anything/everything
            end ;
            _Has_Asterisk := _R.pos( '*' ) > 0 ;
        end ; // if( Wildcard )
The Compare function can perform either wildcard or normal comparisons. If the Wildcard parameter is true, we do a wildcard comparison, so we first normalize the specification (R) to simplify the code that follows. We rewrite every "*?" as "?*" (the two are equivalent) and then collapse runs of asterisks into a single asterisk. The order matters: rewriting "*?*" produces "?**", which the second step reduces to "?*". A specification that is just an asterisk matches anything, so we exit immediately with a result of 0. Otherwise, if an asterisk remains, we set the _Has_Asterisk flag.

        Len := _L.Length ;
        if( Len > _R.Length ) then
        begin
            Len := _R.Length ;
        end ;
Next, we determine the number of characters to compare: the smaller of the two lengths, which we store in Len. Without this, we might index past the end of one of the strings as we loop through the data.

        // Non-wildcard check...
        if( not _Has_Asterisk ) then
        begin
            for I := 1 to Len do
            begin
                LC := _L.Contents[ I ] ;
                RC := _R.Contents[ I ] ;
                if( LC <> RC ) then
                begin
                    if( Wildcard ) then
                    begin
                        if( LC = ord( '?' ) ) or ( RC = ord( '?' ) ) then
                        begin
                            continue ;
                        end ;
                    end ;
                    if( LC < RC ) then
                    begin
                        Result := -1 ;
                    end else
                    begin
                        Result := 1 ;
                    end ;
                    exit ;
                end ; // if( LC <> RC )
            end ; // for I := 1 to Len

            // If we get here, they are equal up to position Len...
            if( _L.Length <> _R.Length ) then
            begin
                if( _L.Length > _R.Length ) then
                begin
                    Result := 1 ;
                end else
                begin
                    Result := -1 ;
                end ;
            end ;
            exit ;
        end ; // if( not _Has_Asterisk )
If there is no asterisk (either because this isn't a wildcard comparison or because the specification doesn't contain one), we have a straightforward task. We loop through the contents, comparing each character. If two characters differ in a wildcard comparison, we still allow the match when either character is a "?". Otherwise, we set the result to -1 or 1, as appropriate, and exit.
If we get through the first Len characters with everything equal, we still aren't done. If the lengths of the strings differ, we set the result accordingly: of two otherwise equal strings, the longer one is "greater".

        // Do wildcard match...
        R_Start := 1 ;
        L_Start := 1 ;
        R_End := _R.Length ;
        L_End := _L.Length ;

        // L must be at least as long as the non-asterisk part of the spec...
        Fixed := 0 ;
        for I := 1 to _R.Length do
        begin
            if( _R.Contents[ I ] <> ord( '*' ) ) then
            begin
                inc( Fixed ) ;
            end ;
        end ;
        if( _L.Length < Fixed ) then
        begin
            Result := -1 ;
            exit ; // Too short to match
        end ;

        // Check prefix before first wildcard (negated: _R.Compare reports from the spec's side)...
        Dummy := _R.pos( '*' ) ;
        if( Dummy > R_Start ) then // Something before the asterisk
        begin
            Result := -_R.Compare( R_Start, Dummy - R_Start, _L, L_Start ) ;
            if( Result <> 0 ) then
            begin
                exit ;
            end ;
            R_Start := Dummy ;
            L_Start := Dummy ;
        end ; // if( Dummy > R_Start )

        // Check suffix after last wildcard...
        Dummy := _R.RPos( '*' ) ;
        if( Dummy < _R.Length ) then
        begin
            Result := -_R.Compare( Dummy + 1, _R.Length - Dummy, _L,
                _L.Length - ( _R.Length - Dummy ) + 1 ) ;
            if( Result <> 0 ) then
            begin
                exit ;
            end ;
            R_End := Dummy ;
            L_End := _L.Length - ( _R.Length - Dummy ) ;
        end ; // if( Dummy < _R.Length )

        // Check for remaining matches, left-to-right.  When only the
        // final asterisk is left in the wildcard spec, we match...
        while( R_Start < R_End ) do
        begin
            Dummy := _R.Pos( '*', R_Start + 1 ) ;
            S := _R.Copy( R_Start + 1, Dummy - R_Start - 1 ) ;
            Dummy1 := _L.Wildcard_Pos( S, L_Start ) ;
            if( ( Dummy1 = 0 ) or ( Dummy1 + S.Length - 1 > L_End ) ) then
            begin
                S.Free ;
                Result := 1 ;
                exit ; // Not found before the suffix, so no match
            end ;

            // Move past wildcard and matching characters...
            L_Start := Dummy1 + S.Length ;
            R_Start := Dummy ;
            S.Free ;
        end ; // while
    finally
        _L.Free ;
        _R.Free ;
    end ;
end ; // Compare
We won't go over the wildcard matching in great detail, since it follows the same approach as the _Compare function we discussed in article 17. After verifying that L is at least as long as the non-asterisk part of the specification, we match the text before the first asterisk and the text after the last asterisk, then search for each remaining segment from left to right, making sure that no segment runs into the suffix. Note that we negate the result of the TUnicode_String.Compare method, since it reports the comparison from the specification's point of view. The only other differences are that this function deals with the new TUnicode_String class and returns -1/0/1 values rather than a boolean. For a wildcard comparison, any nonzero result means that L doesn't match the specification.
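The wildcard strategy described above can be sketched on plain Pascal strings (one byte per character) to show it in isolation. This version returns a boolean rather than -1/0/1, and Wildcard_Match and Segment_At are hypothetical names; it illustrates the technique (normalize, anchor the prefix and suffix, then take the leftmost occurrence of each middle segment), not the UOS implementation.

```pascal
program Wildcard_Match_Demo ;

{$mode objfpc}{$H+}

// True if Seg occurs in S at position P; a "?" in Seg matches any character
function Segment_At( const S, Seg : string ; P : integer ) : boolean ;

var I : integer ;

begin
    Result := False ;
    if( ( P < 1 ) or ( P + Length( Seg ) - 1 > Length( S ) ) ) then exit ;
    for I := 1 to Length( Seg ) do
    begin
        if( ( Seg[ I ] <> '?' ) and ( Seg[ I ] <> S[ P + I - 1 ] ) ) then exit ;
    end ;
    Result := True ;
end ;


// Hypothetical sketch: does S match the wildcard Spec ("*" and "?")?
function Wildcard_Match( const S : string ; Spec : string ) : boolean ;

var P, Q, N, K, Found, Fixed, L_Start, L_End : integer ;
    Seg : string ;

begin
    Result := False ;

    // Normalize: "*?" becomes "?*", then runs of "*" collapse to one
    P := Pos( '*?', Spec ) ;
    while( P > 0 ) do
    begin
        Spec[ P ] := '?' ;
        Spec[ P + 1 ] := '*' ;
        P := Pos( '*?', Spec ) ;
    end ;
    while( Pos( '**', Spec ) > 0 ) do Delete( Spec, Pos( '**', Spec ), 1 ) ;

    P := Pos( '*', Spec ) ; // First asterisk
    if( P = 0 ) then // No asterisk: same length, character by character
    begin
        Result := ( Length( S ) = Length( Spec ) ) and Segment_At( S, Spec, 1 ) ;
        exit ;
    end ;

    // S must be at least as long as the non-asterisk part of Spec
    Fixed := 0 ;
    for K := 1 to Length( Spec ) do
    begin
        if( Spec[ K ] <> '*' ) then inc( Fixed ) ;
    end ;
    if( Length( S ) < Fixed ) then exit ;

    // Anchor the prefix before the first asterisk...
    if( not Segment_At( S, Copy( Spec, 1, P - 1 ), 1 ) ) then exit ;
    L_Start := P ; // First position in S not used by the prefix

    // ...and the suffix after the last asterisk
    Q := Length( Spec ) ;
    while( Spec[ Q ] <> '*' ) do dec( Q ) ;
    Seg := Copy( Spec, Q + 1, Length( Spec ) - Q ) ;
    L_End := Length( S ) - Length( Seg ) ; // Last position before the suffix
    if( not Segment_At( S, Seg, L_End + 1 ) ) then exit ;

    // Find each middle segment left to right, leftmost occurrence first
    while( P < Q ) do
    begin
        N := P + 1 ;
        while( Spec[ N ] <> '*' ) do inc( N ) ; // Next asterisk
        Seg := Copy( Spec, P + 1, N - P - 1 ) ;
        Found := 0 ;
        for K := L_Start to L_End - Length( Seg ) + 1 do
        begin
            if( Segment_At( S, Seg, K ) ) then
            begin
                Found := K ;
                break ;
            end ;
        end ;
        if( Found = 0 ) then exit ; // Segment missing: no match
        L_Start := Found + Length( Seg ) ;
        P := N ;
    end ;
    Result := True ;
end ;


begin
    writeln( Wildcard_Match( 'LOGIN.COM', '*.COM' ) ) ;    // TRUE
    writeln( Wildcard_Match( 'LOGIN.COM', 'L?G*.C*M' ) ) ; // TRUE
    writeln( Wildcard_Match( 'ab', 'ab*b' ) ) ;            // FALSE: too short
    writeln( Wildcard_Match( 'ab', 'a*?*' ) ) ;            // TRUE
    writeln( Wildcard_Match( 'abc', '*x*' ) ) ;            // FALSE
end.
```

Taking the leftmost occurrence of each middle segment is safe: any match that uses a later occurrence could use the leftmost one instead, which leaves more of the subject for the segments that follow.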

Another new method is Edit, which returns an edited copy of our contents, as controlled by the bits in Options:

function TUnicode_String.Edit( Options, Escape : cardinal ) : TUnicode_String ;

var AH, AL : cardinal ;
    Dummy : integer ;
    Escaped : boolean ;
    ESI : integer ;
    _Folding_Index : integer ;
    Quote_Type : cardinal ;
    Last : integer ;
    Leading : boolean ;
    OK : boolean ;
    Space : boolean ;
    V, V2, V3 : integer ;

begin
    Result := TUnicode_String.Create ;

    // Quick check...
    if( Length = 0 ) then // No edits on zero-length strings
    begin
        Exit ;
    end ;

    // Normalize the Options...

    // Disallow [] to {} if [] to ()
    if( ( Options and ( 1024 or 64 ) ) = ( 1024 or 64 ) ) then
    begin
        Options := Options and not 1024 ;
    end ;

    // Disallow () to {} if () to []
    if( ( Options and 6144 ) = 6144 ) then
    begin
        Options := Options and not 4096 ;
    end ;

    // Disallow {} to [] if {} to ()
    if( ( Options and 24576 ) = 24576 ) then
    begin
        Options := Options and not 16384 ;
    end ;

    // Setup...
    Space := False ; // No spaces
    Last := 0 ;
    Quote_Type := 0 ; // Not within significant quotes
    ESI := 0 ;
    Leading := True ;
    Escaped := False ;

    // Process string...
    while( ESI < Length ) do
    begin
        inc( ESI ) ; // Increment source string pointer
        OK := True ; // This byte is OK - so far
        AH := Contents[ ESI ] ; // Current character
        if( Quote_Type = 0 ) then // No quotes
        begin
            AL := AH ; // Save original value
            if(
                ( AH <> ord( ' ' ) )
                and
                ( AH <> _HT )
              ) then
            begin
                Leading := False ;
            end ;
            if( ( Options and 2 ) = 2 ) then // Remove all spaces/tabs
            begin
                if(
                    ( AH = ord( ' ' ) )
                    or
                    ( AH = _HT )
                  ) then
                begin
                    OK := False ;
                end ;
            end ;
            if( ( Options and 4 ) = 4 ) then // Ignore special values?
            begin
                if( ( AH = _NUL ) or ( AH = _LF ) or ( AH = _FF ) or ( AH = _CR ) or ( AH = _DEL ) ) then
                begin
                    OK := False ;
                end ;
            end ;
            if( ( Options and 8 ) = 8 ) then // Ignore leading spaces
            begin
                if( Leading ) then
                begin
                    OK := False ;
                end ;
            end ;
            if( ( Options and 16 ) = 16 ) then // Reduce tabs/spaces to single space
            begin
                if(
                    ( AH = ord( ' ' ) )
                    or
                    ( AH = _HT )
                  ) then
                begin
                    if( Space ) then
                    begin
                        OK := False ;
                    end else
                    begin
                        Space := True ;
                        if( AH = _HT ) then
                        begin
                            AH := ord( ' ' ) ;
                        end ;
                    end ;
                end ;
            end ;
            if( OK and ( ( Options and $40000 ) <> 0 ) ) then
            begin
                OK := AH >= ord( ' ' ) ;
            end ;
            if( ( Options and 32 ) = 32 ) then // Lower to upper case
            begin
                V2 := 0 ;
                V3 := 0 ;
                if( ESI < Length ) then
                begin
                    V2 := Contents[ ESI + 1 ] ;
                end ;
                if( ESI + 1 < Length ) then
                begin
                    V3 := Contents[ ESI + 2 ] ;
                end ;
                AH := Upcase( AH, V2, V3, _Folding_Index ) ;
                ESI := ESI + _Folding_Index - 1 ;
            end ;
            if( ( AH = AL ) and ( ( Options and 512 ) = 512 ) ) then
            begin // Upper to lower case (and not already converted the other way)
                V := lowcase( AH, _Folding_Index ) ;
                if( ( V = 0 ) and ( _Folding_Index >= 0 ) ) then
                begin
                    AH := Foldings[ _Folding_Index, 1 ] ;
                    for V := 2 to 3 do
                    begin
                        if( Foldings[ _Folding_Index, V ] <> 0 ) then
                        begin
                            Result.Append( AH ) ;
                            AH := Foldings[ _Folding_Index, V ] ;
                            inc( Dummy ) ;
                        end ;
                    end ;
                end else
                begin
                    AH := V ;
                end ;
            end ;

            if( ( Options and 64 ) = 64 ) then // Convert [] to ()
            begin
                if( AL = ord( '[' ) ) then
                begin
                    AH := ord( '(' ) ;
                end else
                if( AL = ord( ']' ) ) then
                begin
                    AH := ord( ')' ) ;
                end ;
            end ;
            if( ( AH = AL ) and ( ( Options and 2048 ) = 2048 ) ) then // Convert () to []
            begin
                if( AL = ord( '(' ) ) then
                begin
                    AH := ord( '[' ) ;
                end else
                if( AL = ord( ')' ) ) then
                begin
                    AH := ord( ']' ) ;
                end ;
            end ;

            if( ( Options and 4096 ) = 4096 ) then // Convert () to braces
            begin
                if( AL = ord( '(' ) ) then
                begin
                    AH := ord( '{' ) ;
                end else
                if( AL = ord( ')' ) ) then
                begin
                    AH := ord( '}' ) ;
                end ;
            end ;
            if( ( AH = AL ) and ( ( Options and 8192 ) = 8192 ) ) then // Convert braces to ()
            begin
                if( AL = ord( '{' ) ) then
                begin
                    AH := ord( '(' ) ;
                end else
                if( AL = ord( '}' ) ) then
                begin
                    AH := ord( ')' ) ;
                end ;
            end ;

            if( ( Options and 1024 ) = 1024 ) then // Convert [] to braces
            begin
                if( ( AL = ord( '[' ) ) or ( AL = ord( ']' ) ) ) then
                begin
                    AH := ord( AL ) + 32 ;
                end ;
            end ;
            if( ( AH = AL ) and ( ( Options and 16384 ) = 16384 ) ) then // Convert braces to []
            begin
                if( ( AL = ord( '{' ) ) or ( AL = ord( '}' ) ) ) then
                begin
                    AH := ord( AL ) - 32 ;
                end ;
            end ;
        end ; // if( Quote_Type = 0 )

        if( OK ) then
        begin
            Result.Append( AH ) ; // Build result
            if( ( AH = ord( ' ' ) ) or ( AH = _HT ) ) then
            begin
                Space := True ;
            end ;
        end ;
        if( ( AH <> ord( ' ' ) ) and ( AH <> _HT ) ) then
        begin
            Space := False ;
            if( ( Options and 256 ) = 256 ) then // Don't alter text within quotes
            begin
                if(
                    ( ( AH = ord( '"' ) ) or ( AH = 39 ) ) // 39 = single quote (')
                    and
                    ( not Escaped )
                  ) then
                begin
                    if( Quote_Type = 0 ) then // Not in quotes
                    begin
                        Quote_Type := AH ;
                    end else
                    if( Quote_Type = AH ) then
                    begin
                        Quote_Type := 0 ;
                    end ;
                end ;
            end ;
        end ; // if( ( AH <> ord( ' ' ) ) and ( AH <> _HT ) )
        if(
            ( AH <> ord( ' ' ) )
            and
            ( AH <> _HT )
          ) then
        begin
            Last := Result.Length ; // Last non-space character
        end ;
        if( Escape <> 0 ) then // Have an escape character
        begin
            if( AL = Escape ) then // This is an escape character
            begin
                if( ESI = 1 ) then // If first character, then always an escape
                begin
                    Escaped := True ;
                end else
                begin
                    Escaped := not Escaped ;
                    // If previous character was an escape, then it is escaping this character
                end ;
            end else
            begin
                Escaped := False ;
            end ; // if( AL = Escape )
        end ; // if( Escape <> 0 )
    end ; // while( ESI < Length )
    if( ( Options and 128 ) = 128 ) then // Trim following spaces
    begin
        if( Quote_Type = 0 ) then // Not within quotes
        begin
            Result.Length := Last ; // Last non-space character position
        end ;
    end ;
end ; // TUnicode_String.Edit
We won't cover this method line-by-line. Suffice it to say that it returns a new string which is a copy of our string with certain textual transformations applied. Which transformations are performed depends on the bitmask Options passed to the method:
Bit             Meaning
2               Remove all white space (spaces and tabs).
4               Remove all nulls, linefeeds, formfeeds, carriage returns, and DELs.
8               Remove leading white space (spaces and tabs).
16              Reduce runs of white space (spaces and tabs) to a single space.
32              Convert lower case to upper case.
64              Convert square brackets to parentheses: [] to ()
128             Remove all trailing white space.
256             Leave characters within quotes (" or ') unmodified.
512             Convert upper case to lower case.
1024            Convert square brackets to braces: [] to {}
2048            Convert parentheses to square brackets: () to []
4096            Convert parentheses to braces: () to {}
8192            Convert braces to parentheses: {} to ()
16384           Convert braces to square brackets: {} to []
262144 ($40000) Remove any remaining control characters (codes below a space).
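If the Escape parameter is non-zero, it specifies an escape character: a quote that immediately follows the escape character is not treated as a quote for the purposes of option 256. An escape character that is itself preceded by an escape character is treated as an ordinary character.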
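The quote and escape bookkeeping at the end of the character loop can be isolated into a small Free Pascal sketch. This is a simplified, ASCII-only illustration, not the UOS code: the function name Protected_Chars and its output mask are hypothetical, and every option except the quote handling of option 256 is ignored. In the returned mask, 'P' marks a character that would be processed while inside quotes (and therefore left unmodified), and '-' marks all others.

```pascal
program QuoteTrackingDemo;

{$mode objfpc}{$H+}

// Hypothetical helper: returns a mask the same length as S, with 'P' for
// characters processed while inside quotes and '-' for all others.
// Escape = #0 means there is no escape character. Simplified to ANSI.
function Protected_Chars( const S : string ; Escape : char ) : string ;

var I : integer ;
    C, Quote_Type : char ;
    Escaped : boolean ;

begin
    Result := S ; // Same length as S; every position is overwritten below
    Quote_Type := #0 ; // #0 = not within quotes
    Escaped := False ;
    for I := 1 to length( S ) do
    begin
        C := S[ I ] ;
        if( Quote_Type <> #0 ) then
        begin
            Result[ I ] := 'P' ;
        end else
        begin
            Result[ I ] := '-' ;
        end ;
        // An unescaped quote opens a quoted run, or closes a run opened by
        // the same kind of quote
        if( ( ( C = '"' ) or ( C = '''' ) ) and ( not Escaped ) ) then
        begin
            if( Quote_Type = #0 ) then
            begin
                Quote_Type := C ;
            end else
            if( Quote_Type = C ) then
            begin
                Quote_Type := #0 ;
            end ;
        end ;
        // An escape character escapes the next character, unless it is
        // itself escaped by the preceding character
        if( Escape <> #0 ) then
        begin
            if( C = Escape ) then
            begin
                Escaped := not Escaped ;
            end else
            begin
                Escaped := False ;
            end ;
        end ;
    end ;
end ;

begin
    WriteLn( Protected_Chars( 'a"b c"d', #0 ) ) ;  // --PPPP-
    WriteLn( Protected_Chars( '"a\"b"c', '\' ) ) ; // -PPPPP-
    WriteLn( Protected_Chars( '"a\"b"c', #0 ) ) ;  // -PPP--P
end.
```

The last two calls differ only in the escape character: with '\' as the escape, the embedded quote no longer closes the quoted run, so the trailing "c" stays outside the quotes.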

The method first prepares for processing, which includes clearing option bits that conflict with each other (such as 64 and 1024). It then steps through each character in the string and appends it (or its transformed value) to the result.
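To make the single-pass structure concrete, here is a much-reduced, ASCII-only sketch in Free Pascal. It is neither the method above nor CommonUt.Edit: the name Simple_Edit is hypothetical, and it supports only options 2, 8, 16, 32, and 128. It does, however, follow the same pattern of per-character flags (Leading, Space) and a Last marker used to trim trailing white space at the end.

```pascal
program SinglePassEditDemo;

{$mode objfpc}{$H+}

// Hypothetical, simplified ANSI editor supporting option bits 2, 8, 16,
// 32, and 128 only. Upper-casing uses System.UpCase (ASCII a..z only).
function Simple_Edit( const S : string ; Options : integer ) : string ;

var I, Last : integer ;
    C : char ;
    Is_White, Leading, Space : boolean ;

begin
    Result := '' ;
    Leading := True ; // No non-white character seen yet
    Space := False ;  // Last character appended was white space
    Last := 0 ;       // Result length up to the last non-white character
    for I := 1 to length( S ) do
    begin
        C := S[ I ] ;
        Is_White := ( C = ' ' ) or ( C = #9 ) ;
        if( Is_White ) then
        begin
            if( ( Options and 2 ) = 2 ) then continue ; // Remove all white space
            if( Leading and ( ( Options and 8 ) = 8 ) ) then continue ; // Remove leading
            if( ( Options and 16 ) = 16 ) then // Reduce runs to a single space
            begin
                if( Space ) then continue ;
                C := ' ' ;
            end ;
        end else
        begin
            Leading := False ;
            if( ( Options and 32 ) = 32 ) then C := UpCase( C ) ; // Lower to upper
        end ;
        Result := Result + C ;
        Space := Is_White ;
        if( not Is_White ) then Last := length( Result ) ;
    end ;
    if( ( Options and 128 ) = 128 ) then // Remove trailing white space
    begin
        SetLength( Result, Last ) ;
    end ;
end ;

begin
    WriteLn( '[', Simple_Edit( '  show   dev  ', 8 or 16 or 32 or 128 ), ']' ) ; // [SHOW DEV]
end.
```

Each option only adds a test inside the loop, so requesting four transformations still costs a single pass over the string.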

One might wonder why we don't have a separate function for each transformation. The reason is that we often want to perform several transformations on a given string, and it is more efficient to have a single routine do all of the requested transformations in one fell swoop.
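Because each option occupies its own bit, a caller requests several transformations by combining values with "or", and the method tests each one with "and". The constant names in this Free Pascal sketch are hypothetical (the UOS code passes the raw numbers), and the particular combination shown is only an example.

```pascal
program EditOptionsDemo;

{$mode objfpc}{$H+}

// Hypothetical names for some of the Edit option bits; the UOS code
// itself uses the raw numeric values from the table.
const
    EO_Remove_White   = 2 ;   // Remove all spaces and tabs
    EO_Trim_Leading   = 8 ;   // Remove leading spaces/tabs
    EO_Compress_White = 16 ;  // Reduce runs of spaces/tabs to one space
    EO_Upcase         = 32 ;  // Lower to upper case
    EO_Trim_Trailing  = 128 ; // Remove trailing spaces/tabs
    EO_Protect_Quotes = 256 ; // Leave quoted text unmodified
    EO_Lowcase        = 512 ; // Upper to lower case

var Options : integer ;

begin
    // Each option is a distinct power of two, so "or" combines them
    // without interference: 8 + 16 + 32 + 128 + 256 = 440
    Options := EO_Trim_Leading or EO_Compress_White or EO_Upcase
               or EO_Trim_Trailing or EO_Protect_Quotes ;
    WriteLn( 'Options = ', Options ) ; // Options = 440

    // Testing for one option uses the same pattern as the Edit method
    if( ( Options and EO_Upcase ) = EO_Upcase ) then
    begin
        WriteLn( 'Upper case conversion requested' ) ;
    end ;

    // Clearing a single option bit
    Options := Options and not EO_Upcase ;
    WriteLn( 'Options = ', Options ) ; // Options = 408
end.
```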

function upcase( V1, V2, V3 : cardinal ; var _Count : integer ) : cardinal ;

var L : integer ;

begin
    Result := V1 ;
    _Count := 1 ; // Indicates one character was translated
    if( ( V1 < $61 ) or ( V1 > $118DF ) ) then // Not within range of our table
    begin
        exit ;
    end ;
    for L := 0 to high( Foldings ) do
    begin
        if(
            ( Foldings[ L, 1 ] = V1 )
            and
            ( ( Foldings[ L, 2 ] = V2 ) or ( Foldings[ L, 2 ] = 0 ) )
            and
            ( ( Foldings[ L, 3 ] = V3 ) or ( Foldings[ L, 3 ] = 0 ) )
          ) then
        begin
            Result := Foldings[ L, 0 ] ;
            if( Foldings[ L, 3 ] <> 0 ) then
            begin
                _Count := 3 ;
            end else
            if( Foldings[ L, 2 ] <> 0 ) then
            begin
                _Count := 2 ;
            end ;
            exit ;
        end ;
    end ; // for L := 0 to high( Foldings )
end ; // upcase
The Edit method makes use of one new function. UpCase does the opposite of the LowCase function: it converts lowercase characters to upper case. The process is somewhat more complicated, because converting from upper to lower case always starts from a single input character, whereas the other direction may require up to three input characters to produce a single output character. So we have to search the Foldings array for a match of one to three lowercase characters, keeping in mind that some entries involve only one or two lowercase values. We return, via the _Count parameter, the number of input characters that were consumed to produce the single upper case character. If no entry matches, the character is returned unchanged with a count of 1.
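The following Free Pascal sketch reproduces the first-match search over a deliberately tiny, hypothetical Foldings table (the real table is far larger). The function is renamed Fold_Upcase here to avoid confusion with the standard UpCase. The U+0130 row reflects the Unicode case-folding data, in which U+0130 (capital I with dot above) folds to U+0069 followed by U+0307 (combining dot above).

```pascal
program UpcaseFoldingDemo;

{$mode objfpc}{$H+}

// Tiny, hypothetical stand-in for the Foldings table. Column 0 is the
// upper case code point; columns 1..3 are the lower case sequence that
// maps to it (0 = unused).
const
    Foldings : array[ 0..2, 0..3 ] of cardinal = (
        ( $0130, $0069, $0307, 0 ), // i + U+0307 -> U+0130
        ( $0041, $0061, 0, 0 ),     // a -> A
        ( $0049, $0069, 0, 0 )      // i -> I
    ) ;

// Returns the upper case code point for V1 (optionally followed by V2 and
// V3) and sets _Count to the number of input code points consumed.
function Fold_Upcase( V1, V2, V3 : cardinal ; var _Count : integer ) : cardinal ;

var L : integer ;

begin
    Result := V1 ; // Unchanged if no row matches
    _Count := 1 ;
    if( V1 < $61 ) then // Below 'a': nothing in the table can match
    begin
        exit ;
    end ;
    for L := 0 to high( Foldings ) do
    begin
        if(
            ( Foldings[ L, 1 ] = V1 )
            and
            ( ( Foldings[ L, 2 ] = V2 ) or ( Foldings[ L, 2 ] = 0 ) )
            and
            ( ( Foldings[ L, 3 ] = V3 ) or ( Foldings[ L, 3 ] = 0 ) )
          ) then
        begin
            Result := Foldings[ L, 0 ] ;
            if( Foldings[ L, 3 ] <> 0 ) then
            begin
                _Count := 3 ;
            end else
            if( Foldings[ L, 2 ] <> 0 ) then
            begin
                _Count := 2 ;
            end ;
            exit ; // First match wins
        end ;
    end ;
end ;

var N : integer ;
    U : cardinal ;

begin
    U := Fold_Upcase( $69, $307, 0, N ) ;
    WriteLn( U, ' ', N ) ; // 304 2 (U+0130, two inputs consumed)
    U := Fold_Upcase( $69, $62, 0, N ) ;
    WriteLn( U, ' ', N ) ; // 73 1 (U+0049, one input consumed)
end.
```

Because the search stops at the first matching row, the order of the table matters in this sketch: if the single-character "i" row came before the two-character row, "i" followed by U+0307 would match "I" with a count of 1, and the combining dot would be left behind as a separate character.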

function Lowercase( const S : string ) : string ;

begin
    Result := Edit( S, 512, 0 ) ;
end ;


function Edit( S : string ; Options, Escape : integer ) : string ;

var I : integer ;
    US, US1 : TUnicode_String ;

begin
    Result := S ;
    for I := 1 to length( S ) do
    begin
        if( S[ I ] > #127 ) then // Have UTF8
        begin
            US := TUnicode_String.Create ;
            US.Assign_From_String( S, ST_UTF8 ) ;
            US1 := US.Edit( Options, Escape ) ;
            US.Free ;
            Result := US1.As_String ;
            US1.Free ;
            exit ;
        end ;
    end ;
    Result := CommonUt.Edit( S, Options ) ;
end ;
Lastly, we have two functions that operate on UTF-8 strings without requiring the calling code to construct TUnicode_String instances solely to do so. Lowercase simply calls Edit with the lowercase option flag (512).
Edit assumes a UTF-8 string. It scans the string for any byte with the top bit set. Such a byte indicates a UTF-8 multi-byte character, in which case the function constructs a TUnicode_String instance, assigns the string to it, calls the instance's Edit method, returns the Pascal string version of the result, and frees both instances. If there are no UTF-8 multi-byte characters, we call a version of Edit in the CommonUt unit instead. We won't cover that function here. It is essentially the same code as the Edit method, but operates on a normal ANSI string. It uses less memory and runs faster on such strings, not to mention that we don't need to construct and free a TUnicode_String instance. On a very long string, the cost of scanning the entire string for a UTF-8 character might exceed the cost of simply constructing the instance and skipping the scan, but most of our uses of this routine involve fairly short strings.
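The fast-path test relies on a property of UTF-8: ASCII characters (U+0000 to U+007F) are encoded as single bytes below 128, while every byte of a multi-byte sequence has its top bit set. A single byte above #127 is therefore enough to send the string down the Unicode path. The function name Has_Non_ASCII below is hypothetical; it simply isolates the scan that Edit performs.

```pascal
program Utf8ScanDemo;

{$mode objfpc}{$H+}

// Hypothetical helper: True if S contains any byte with the top bit set,
// i.e. any part of a UTF-8 multi-byte sequence.
function Has_Non_ASCII( const S : string ) : boolean ;

var I : integer ;

begin
    Result := False ;
    for I := 1 to length( S ) do
    begin
        if( S[ I ] > #127 ) then
        begin
            Result := True ;
            exit ; // One such byte is enough; stop scanning
        end ;
    end ;
end ;

begin
    WriteLn( Has_Non_ASCII( 'SYS$LOGIN' ) ) ;    // FALSE
    WriteLn( Has_Non_ASCII( 'caf'#$C3#$A9 ) ) ; // TRUE ("café" encoded as UTF-8)
end.
```

Because the loop exits at the first high byte, the full-length scan is only paid for strings that turn out to be pure ASCII, which are exactly the strings handed to the cheaper ANSI routine.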

In the next article, now that we've laid the groundwork, we will begin our examination of UCL lexical functions.

Copyright © 2019 by Alan Conroy. This article may be copied in whole or in part as long as this copyright is included.