Unicode revisited
We discussed Unicode in depth back in article 17, where
we introduced the TUnicode_String class
that we used for file processing. Those strings are static, with a fixed maximum
length of 384 UTF-32 characters. While this works fine for our needs up to this point, and
is faster than using a dynamic-length string, we need something of more general
utility for future uses. So, we've renamed the TUnicode_String class
to TStatic_Unicode_String, and created a new class named TUnicode_String.
This new class has the same methods (plus a few extra that we'll cover later
in the article). The main difference is that the Contents array is dynamic in
this class: it is resized as necessary.
There are different ways we could have handled the new class, including
making the new one a descendant of the previous one, or having them both descend
from a common ancestor. However, the virtual methods and generalization needed to make this
happen would add overhead. For the file system, we're concerned
about performance, so we will leave the old static string class the way it is and
the file processing will continue to make use of it.
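To make the storage difference concrete, here is a minimal sketch (Free Pascal assumed) of the layout the new class relies on: a dynamic array of cardinal whose element 0 holds the character count, with the characters in elements 1 onward. The helper names Set_Count and Append_Code are invented for this illustration and are not part of the class.

```pascal
program LengthPrefixDemo ;

{$mode objfpc}{$H+}

type TCode_Array = array of cardinal ;

// Hypothetical helper: resize the array while keeping the count in slot 0.
procedure Set_Count( var A : TCode_Array ; Count : integer ) ;
begin
    setlength( A, Count + 1 ) ; // Slot 0 plus Count characters
    A[ 0 ] := Count ;
end ;

// Hypothetical helper: grow by one slot and store the new code point.
procedure Append_Code( var A : TCode_Array ; Code : cardinal ) ;
begin
    Set_Count( A, integer( A[ 0 ] ) + 1 ) ;
    A[ A[ 0 ] ] := Code ;
end ;

var A : TCode_Array ;

begin
    Set_Count( A, 0 ) ; // Empty string: one element holding a length of 0
    Append_Code( A, ord( 'H' ) ) ;
    Append_Code( A, ord( 'i' ) ) ;
    Append_Code( A, $1F600 ) ; // A code point that needs more than 16 bits
    writeln( 'Characters: ', A[ 0 ], ', array elements: ', length( A ) ) ;
end.
```

The array always has one more element than the string has characters, which is why an empty string still occupies one element.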
Here is the new TUnicode_String class (the new Compare
method is described later). We shan't describe the code in any detail as it is
almost identical to the old class, with the addition of some dynamic array
handling - except for the new methods.
type TUnicode_String = class
public // Constructors and destructors...
constructor Create ;
destructor Destroy ; override ;
private // Instance data...
Has_Asterisk : boolean ;
Contents : array of cardinal ;
protected // Property handlers...
// Return length of our contents...
function Get_Length : integer ;
procedure Set_Length( Value : integer ) ;
public // API...
procedure Append( S : TUnicode_String ) ;
overload ;
procedure Append( S : string ) ;
overload ;
procedure Append( S : cardinal ) ;
overload ;
function As_String : string ;
// Assign our contents from a UTF8 string...
procedure Assign_From_String( const S : string ;
Format : integer ) ;
// Remove Len characters starting at Index
procedure Delete( Index, Len : integer ) ;
{ Compare a substring of our contents with a
substring of Match. Wildcard_Start is the start
position in our contents and _Length is the length of the
substring. Match_Start is the start position in
Match. Result:
-1 = Less than Match
0 = Equals Match
1 = Greater than Match }
function Compare( Wildcard_Start, _Length : integer ;
Match : TUnicode_String ;
Match_Start : integer ) : integer ;
// Return a new instance holding a copy of a substring...
function Copy( Start, Len : integer ) : TUnicode_String ;
// Return edited string...
function Edit( Options, Escape : cardinal ) : TUnicode_String ;
// Return true if our contents are equal to the match
function Equal( Match : TUnicode_String ) : boolean ;
// Return character at specific index
function Get_Char( Index : integer ) : cardinal ;
// Insert character at given position
procedure Insert( Position : integer ;
Value : cardinal ) ;
// Convert our characters to lowercase...
procedure Lowercase ;
// Position of substring...
function Pos( const Value : string ;
Start : integer = 1 ) : integer ; overload ;
function Pos( const Value : TUnicode_String ;
Start : integer = 1 ) : integer ; overload ;
// Return rightmost instance of Value
function RPos( Value : char ) : integer ;
// Return Pos, considering wildcards
function Wildcard_Pos( Value : TUnicode_String ;
Start : integer = 1 ) : integer ;
public // Properties...
property Length : integer
read Get_Length
write Set_Length ;
end ; // TUnicode_String
And here are the updated methods.
// TUnicode_String methods...
// Constructors and destructors...
constructor TUnicode_String.Create ;
begin
inherited Create ;
setlength( Contents, 1 ) ;
Contents[ 0 ] := 0 ;
end ;
destructor TUnicode_String.Destroy ;
begin
setlength( Contents, 0 ) ;
inherited Destroy ;
end ;
// API...
function TUnicode_String.As_String : string ;
var Dummy, Loop : integer ;
begin
System.setlength( Result, Length ) ;
for Loop := 1 to Length do
begin
Dummy := Contents[ Loop ] ;
if( Dummy > 127 ) then
begin
Dummy := Dummy or 128 ;
end ;
Result[ Loop ] := chr( Dummy ) ;
end ;
end ;
procedure TUnicode_String.Assign_From_String( const S : string ;
Format : integer ) ;
var Index, Size, Mask : integer ;
Value : cardinal ;
begin
Index := 1 ; // Index in S
Contents[ 0 ] := 0 ;
if( Format = ST_UTF8 ) then // UTF8
begin
while( Index <= system.length( S ) ) do
begin
Value := 0 ;
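// Determine the sequence length from the lead byte. The comparisons
// must be inclusive: $F0, for example, starts a 4-byte sequence.
if( S[ Index ] >= #$FC ) then
begin
Size := 6 ;
Mask := 1 ;
end else
if( S[ Index ] >= #$F8 ) then
begin
Size := 5 ;
Mask := 3 ;
end else
if( S[ Index ] >= #$F0 ) then
begin
Size := 4 ;
Mask := 7 ;
end else
if( S[ Index ] >= #$E0 ) then
begin
Size := 3 ;
Mask := $F ;
end else
if( S[ Index ] >= #$C0 ) then
begin
Size := 2 ;
Mask := $1F ;
end else
begin
Size := 1 ;
Mask := $7F ;
end ;
// Stop early if the string ends in the middle of a sequence
while( ( Size > 0 ) and ( Index <= system.length( S ) ) ) do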
begin
dec( Size ) ;
Value := Value or ( ord( S[ Index ] ) and Mask ) ;
if( Size > 0 ) then
begin
Value := Value shl 6 ;
end ;
Mask := $3F ;
inc( Index ) ;
end ;
inc( Contents[ 0 ] ) ;
setlength( Contents, Contents[ 0 ] + 1 ) ;
Contents[ Contents[ 0 ] ] := Value ;
end ; // while( Index <= system.length( S ) )
end else
begin
setlength( Contents, system.Length( S ) div Format + 1 ) ;
Value := 0 ;
Index := 1 ; // Index in S
while( Index <= system.length( S ) ) do
begin
move( PChar( S )[ Index - 1 ], Value, Format ) ;
Index := Index + Format ;
inc( Contents[ 0 ] ) ;
Contents[ Contents[ 0 ] ] := Value ;
end ;
end ; // if( Format = ST_UTF8 )
end ; // TUnicode_String.Assign_From_String
function TUnicode_String.Get_Length : integer ;
begin
Result := Contents[ 0 ] ;
end ;
procedure TUnicode_String.Set_Length( Value : Integer ) ;
begin
Contents[ 0 ] := Value ;
setlength( Contents, Value + 1 ) ;
end ;
function TUnicode_String.Copy( Start, Len : integer ) : TUnicode_String ;
begin
// Setup...
Result := TUnicode_String.Create ;
if( Start > Length ) then
begin
exit ;
end ;
if( Start < 1 ) then
begin
Start := 1 ;
end ;
if( Start + Len - 1 > Length ) then
begin
Len := Length - Start + 1 ;
end ;
Result.Length := Len ;
move( Contents[ Start ], Result.Contents[ 1 ], Len * sizeof( cardinal ) ) ;
end ; // TUnicode_String.Copy
// Remove Len characters starting at Index
procedure TUnicode_String.Delete( Index, Len : integer ) ;
begin
if( Index < 1 ) then
begin
Index := 1 ;
end ;
if( Index > Length ) then
begin
exit ;
end ;
if( Index + Len > Length ) then
begin
Length := Index - 1 ;
exit ;
end ;
// Shift the remaining characters down over the deleted ones.
// The count passed to move is in bytes, not characters.
move( Contents[ Index + Len ], Contents[ Index ],
( Length - Index - Len + 1 ) * sizeof( cardinal ) ) ;
Length := Length - Len ;
end ; // TUnicode_String.Delete
function TUnicode_String.Equal( Match : TUnicode_String ) : boolean ;
var Loop : integer ;
begin
Result := False ;
if( Length <> Match.Length ) then
begin
exit ;
end ;
for Loop := 1 to Length do
begin
if( ( Contents[ Loop ] <> Match.Contents[ Loop ] ) ) then
begin
if(
( Contents[ Loop ] <> ord( '?' ) )
and
( Match.Contents[ Loop ] <> ord( '?' ) )
) then
begin
exit ;
end ;
end ;
end ;
Result := True ;
end ; // TUnicode_String.Equal
procedure TUnicode_String.Insert( Position : integer ; Value : cardinal ) ;
begin
setlength( Contents, system.length( Contents ) + 1 ) ;
move( Contents[ Position ], Contents[ Position + 1 ],
( system.Length( Contents ) - Position - 1 ) * sizeof( cardinal ) ) ;
Contents[ Position ] := Value ;
inc( Contents[ 0 ] ) ;
end ;
procedure TUnicode_String.Lowercase ;
var Dummy, V : integer ;
_Folding_Index : integer ;
begin
Dummy := 1 ;
while( Dummy <= Length ) do
begin
V := lowcase( Contents[ Dummy ], _Folding_Index ) ;
if( ( V = 0 ) and ( _Folding_Index >= 0 ) ) then
begin
Contents[ Dummy ] := Foldings[ _Folding_Index, 1 ] ;
for V := 2 to 3 do
begin
if( Foldings[ _Folding_Index, V ] <> 0 ) then
begin
inc( Dummy ) ;
Insert( Dummy, Foldings[ _Folding_Index, V ] ) ;
end ;
end ;
end else
begin
Contents[ Dummy ] := V ;
end ;
inc( Dummy ) ;
end ;
end ; // TUnicode_String.Lowercase
function TUnicode_String.Pos( const Value : TUnicode_String ;
Start : integer = 1 ) : integer ;
var Dummy, Dummy1 : integer ;
Found : boolean ;
begin
Result := 0 ;
if( Start > Length ) then
begin
exit ;
end ;
if( Value.Length > Length - Start + 1 ) then
begin
exit ; // Substring is longer than our contents
end ;
for Dummy := Start to Length - Value.Length + 1 do
begin
Found := True ;
for Dummy1 := 1 to Value.Length do
begin
if(
( Value.Contents[ Dummy1 ] <> Contents[ Dummy1 + Dummy - 1 ] )
and
( Contents[ Dummy1 + Dummy - 1 ] <> ord( '?' ) )
and
( Value.Contents[ Dummy1 ] <> ord( '?' ) )
) then
begin
Found := False ;
break ;
end ;
end ; // for Dummy1
if( Found ) then
begin
Result := Dummy ;
exit ;
end ;
end ; // for Dummy
end ; // TUnicode_String.Pos
function TUnicode_String.Pos( const Value : string ;
Start : integer = 1 ) : integer ;
var Dummy, Dummy1 : integer ;
Found : boolean ;
begin
Result := 0 ;
if( Start > Length ) then
begin
exit ;
end ;
if( System.Length( Value ) > Length - Start + 1 ) then
begin
exit ; // Substring is longer than our contents
end ;
for Dummy := Start to Length - system.length( Value ) + 1 do
begin
Found := True ;
for Dummy1 := 1 to system.length( Value ) do
begin
if( ord( Value[ Dummy1 ] ) <> Contents[ Dummy1 + Dummy - 1 ] ) then
begin
Found := False ;
break ;
end ;
end ; // for Dummy1
if( Found ) then
begin
Result := Dummy ;
exit ;
end ;
end ; // for Dummy
end ; // TUnicode_String.Pos
function TUnicode_String.RPos( Value : char ) : integer ;
var Loop, V : cardinal ;
begin
V := ord( Value ) ;
for Loop := Length downto 1 do
begin
if( Contents[ Loop ] = V ) then
begin
Result := Loop ;
exit ;
end ;
end ;
Result := 0 ;
end ;
function TUnicode_String.Wildcard_Pos( Value : TUnicode_String ;
Start : integer = 1 ) : integer ;
var Dummy, Dummy1 : integer ;
Found : boolean ;
begin
Result := 0 ;
if( Start > Length ) then
begin
exit ;
end ;
if( Value.Length > Length - Start + 1 ) then
begin
exit ; // Substring is longer than our contents
end ;
for Dummy := Start to Length - Value.Length + 1 do
begin
Found := True ;
for Dummy1 := 1 to Value.Length do
begin
if(
( Value.Contents[ Dummy1 ] <> Contents[ Dummy1 + Dummy - 1 ] )
and
( Contents[ Dummy1 + Dummy - 1 ] <> ord( '?' ) )
and
( Value.Contents[ Dummy1 ] <> ord( '?' ) )
) then
begin
Found := False ;
break ;
end ;
end ; // for Dummy1
if( Found ) then
begin
Result := Dummy ;
exit ;
end ;
end ; // for Dummy
end ; // TUnicode_String.Wildcard_Pos
Now, let's look at the new methods for this class.
procedure TUnicode_String.Append( S : TUnicode_String ) ;
var L : integer ;
begin
if( S = nil ) then
begin
exit ;
end ;
L := Length + 1 ;
Length := Length + S.Length ;
move( S.Contents[ 1 ], Contents[ L ], S.Length * sizeof( cardinal ) ) ;
end ;
procedure TUnicode_String.Append( S : string ) ;
var I, L : integer ;
begin
L := Length ;
Length := Length + system.length( S ) ;
for I := 1 to system.length( S ) do
begin
Contents[ L + I ] := ord( S[ I ] ) ;
end ;
end ;
procedure TUnicode_String.Append( S : cardinal ) ;
begin
Length := Length + 1 ;
Contents[ Length ] := S ;
end ;
These procedures append to the end of our contents. The overloaded
procedure has three versions: one takes a Pascal string, one takes a TUnicode_String,
and one takes a single Unicode character value.
function TUnicode_String.Get_Char( Index : integer ) : cardinal ;
begin
if( ( Index < 1 ) or ( Index > Length ) ) then
begin
Result := 0 ;
exit ;
end ;
Result := Contents[ Index ] ;
end ;
This method simply provides a means of accessing the internal contents, one
character at a time.
// Do a wildcard comparison...
function TUnicode_String.Compare( Wildcard_Start, _Length : integer ;
Match : TUnicode_String ; Match_Start : integer ) : integer ;
var Loop : integer ;
begin
// Setup...
Result := 0 ;
if( _Length < 1 ) then
begin
exit ;
end ;
if( Wildcard_Start < 1 ) then
begin
Wildcard_Start := 1 ;
end ;
if( Match_Start < 1 ) then
begin
Match_Start := 1 ;
end ;
if( Wildcard_Start + _Length > Length + 1 ) then
begin
_Length := Length - Wildcard_Start + 1 ;
end ;
if( Match_Start + _Length > Match.Length + 1 ) then
begin
_Length := Match.Length - Match_Start + 1 ;
end ;
// Do comparison...
for Loop := 0 to _Length - 1 do
begin
if( Contents[ Wildcard_Start + Loop ] <>
Match.Contents[ Match_Start + Loop ] ) then
begin
if(
( Contents[ Wildcard_Start + Loop ] <> ord( '?' ) )
and
( Match.Contents[ Match_Start + Loop ] <> ord( '?' ) )
) then
begin
if( Contents[ Wildcard_Start + Loop ] < Match.Contents[ Match_Start + Loop ] ) then
begin
Result := -1 ;
end else
begin
Result := 1 ;
end ;
exit ;
end ;
end ;
end ;
end ; // TUnicode_String.Compare
This function is like the equivalent method in TStatic_Unicode_String,
except that it returns an integer rather than a boolean. The value
indicates the following:
- -1 = Our contents are less than the Match
- 0 = Strings are equal
- 1 = Our contents are greater than the Match
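After the setup, we loop through the contents, comparing the characters at
corresponding positions. A "?" in either string matches any character. On the
first real mismatch we return -1 or 1. If we reach the end of the substring
without one, the result remains 0.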
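The comparison just described can be sketched on ordinary Pascal strings. This is an illustration only: Compare_Span is a hypothetical stand-alone function (Free Pascal assumed), not the method itself, and it works on single-byte characters rather than 32-bit code points.

```pascal
program WildCompareDemo ;

{$mode objfpc}{$H+}

// Hypothetical stand-alone version working on plain strings: compare Len
// characters of A (from A_Start) with B (from B_Start). "?" matches anything.
function Compare_Span( const A : string ; A_Start : integer ;
    const B : string ; B_Start, Len : integer ) : integer ;
var I : integer ;
    CA, CB : char ;
begin
    Result := 0 ; // Assume equal
    // Clamp the span so that neither index can run past the end
    if( A_Start + Len > length( A ) + 1 ) then
        Len := length( A ) - A_Start + 1 ;
    if( B_Start + Len > length( B ) + 1 ) then
        Len := length( B ) - B_Start + 1 ;
    for I := 0 to Len - 1 do
    begin
        CA := A[ A_Start + I ] ;
        CB := B[ B_Start + I ] ;
        if( ( CA <> CB ) and ( CA <> '?' ) and ( CB <> '?' ) ) then
        begin
            if( CA < CB ) then Result := -1 else Result := 1 ;
            exit ;
        end ;
    end ;
end ;

begin
    // First seven characters match because "?" stands in for "O"
    writeln( Compare_Span( 'REPORT.TXT', 1, 'REP?RT.DAT', 1, 7 ) ) ;
    // "C" sorts before "D"
    writeln( Compare_Span( 'ABC', 1, 'ABD', 1, 3 ) ) ;
    // Only the second character of each string is compared
    writeln( Compare_Span( 'XYZ', 2, 'AYB', 2, 1 ) ) ;
end.
```

Note that each string is indexed from its own start position, so the two spans need not sit at the same offset.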
Let's look at the Compare function which compares two
Unicode strings.
// Compare strings. Result: 0 = equal, -1 = L < R, 1 = L > R
function Compare( L, R : TUnicode_String ; Wildcard : boolean ) : integer ;
var Dummy, Dummy1, I, Len : integer ;
LC, RC : cardinal ;
_L, _R, S, Temp : TUnicode_String ;
_Has_Asterisk : boolean ;
L_Start, R_Start : integer ;
L_End, R_End : integer ;
begin
// Setup...
Result := 0 ; // Assume equal
_L := L.Copy( 1, L.Length ) ;
_R := R.Copy( 1, R.Length ) ;
_Has_Asterisk := False ;
First, we make copies of the two strings and set up for the rest of the function.
// Pre-normalize the specification...
if( Wildcard ) then
begin
Dummy := _R.pos( '**' ) ;
while( Dummy > 0 ) do
begin
_R.Delete( Dummy, 1 ) ;
Dummy := _R.pos( '**' ) ;
end ;
if( _R.As_String = '*' ) then
begin
exit ; // Wildcard matches anything/everything
end ;
Dummy := _R.pos( '*?' ) ;
while( Dummy > 0 ) do
begin
Temp := _R.copy( Dummy + 2, _R.length ) ;
_R.Length := Dummy - 1 ; // Keep only what precedes the "*?"
_R.Append( '?*' ) ;
_R.Append( Temp ) ;
Temp.Free ;
Dummy := _R.pos( '*?' ) ;
end ;
_Has_Asterisk := _R.pos( '*' ) > 0 ;
end ; // if( Wildcard )
The Compare function can perform wildcard or normal comparisons. If
the Wildcard parameter is true, we will do a wildcard comparison.
In that case, we normalize the specification in _R to simplify the
comparison code that follows. We convert double asterisks to single
asterisks, and switch all "*?" to "?*". If an asterisk is present, we set the
_Has_Asterisk flag.
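As an illustration of the normalization, here is a small sketch on ordinary Pascal strings (Free Pascal assumed). Normalize_Wildcards is a hypothetical name. One deliberate difference from the listing above: the sketch swaps first and collapses second, so that a pattern such as "*?*" cannot leave a doubled asterisk behind.

```pascal
program NormalizeDemo ;

{$mode objfpc}{$H+}

// Hypothetical string-based version of the pre-normalization step.
function Normalize_Wildcards( S : string ) : string ;
var P : integer ;
begin
    // Move question marks in front of an adjacent asterisk: "*?" to "?*"
    P := pos( '*?', S ) ;
    while( P > 0 ) do
    begin
        S[ P ] := '?' ;
        S[ P + 1 ] := '*' ;
        P := pos( '*?', S ) ;
    end ;
    // Collapse runs of asterisks: "**" means the same as "*"
    P := pos( '**', S ) ;
    while( P > 0 ) do
    begin
        delete( S, P, 1 ) ;
        P := pos( '**', S ) ;
    end ;
    Result := S ;
end ;

begin
    writeln( Normalize_Wildcards( 'A***B' ) ) ;
    writeln( Normalize_Wildcards( 'A*?B' ) ) ;
    writeln( Normalize_Wildcards( 'A*?*B' ) ) ;
end.
```

After normalization every asterisk stands alone and is never directly followed by a question mark, which is what lets the matching code treat each asterisk as a single "anything" gap.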
Len := _L.Length ;
if( Len > _R.Length ) then
begin
Len := _R.Length ;
end ;
Next, we determine the number of characters to compare by taking the smaller of the
two lengths and setting Len to that value. Without this, we might
try to index beyond the end of one of the strings as we loop through the data.
// Non-wildcard check...
if( not _Has_Asterisk ) then
begin
for I := 1 to Len do
begin
LC := _L.Contents[ I ] ;
RC := _R.Contents[ I ] ;
if( LC <> RC ) then
begin
if( Wildcard ) then
begin
if( LC = ord( '?' ) ) or ( RC = ord( '?' ) ) then
begin
continue ;
end ;
end ;
if( LC < RC ) then
begin
Result := -1 ;
exit ;
end else
begin
Result := 1 ;
exit ;
end ;
end ; // if( LC <> RC )
end ; // for I := 1 to Len
// If we get here, they are equal up to position Len...
if( L.Length <> R.Length ) then
begin
if( L.Length > R.Length ) then
begin
Result := 1 ;
end else
begin
Result := -1 ;
end ;
end ;
exit ;
end ; // if( not _Has_Asterisk )
If the specification contains no asterisk (whether or not this is a wildcard
comparison), we have a straightforward task.
We loop through the contents, comparing each character. If a character doesn't
match and this is a wildcard comparison, we allow a match if either character is a "?". Otherwise, we set
the result to -1 or 1, as appropriate, and exit.
If we get through the entire contents (up to Len ) with everything
being equal, we still aren't done. If the lengths of the strings differ, we
set the result appropriately. On otherwise equal strings, the longer one will
be "greater".
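The no-asterisk path, including the length tie-break, can be sketched on ordinary Pascal strings as follows. Compare_Plain is a hypothetical name and the code assumes Free Pascal. It mirrors the logic described above rather than reproducing the function.

```pascal
program CompareDemo ;

{$mode objfpc}{$H+}

// Hypothetical string-based sketch of the no-asterisk path.
function Compare_Plain( const L, R : string ; Wildcard : boolean ) : integer ;
var I, Len : integer ;
begin
    Result := 0 ;
    // Only the overlapping part can be compared character by character
    Len := length( L ) ;
    if( Len > length( R ) ) then
        Len := length( R ) ;
    for I := 1 to Len do
    begin
        if( L[ I ] = R[ I ] ) then
            continue ;
        if( Wildcard and ( ( L[ I ] = '?' ) or ( R[ I ] = '?' ) ) ) then
            continue ; // "?" stands in for any single character
        if( L[ I ] < R[ I ] ) then Result := -1 else Result := 1 ;
        exit ;
    end ;
    // Equal so far: the longer string is the greater one
    if( length( L ) > length( R ) ) then
        Result := 1
    else if( length( L ) < length( R ) ) then
        Result := -1 ;
end ;

begin
    writeln( Compare_Plain( 'ABC', 'ABC', False ) ) ;  // Identical
    writeln( Compare_Plain( 'ABC', 'ABCD', False ) ) ; // Shorter is "less"
    writeln( Compare_Plain( 'A?C', 'ABC', True ) ) ;   // "?" is a wildcard
    writeln( Compare_Plain( 'A?C', 'ABC', False ) ) ;  // "?" is an ordinary character
end.
```

The last two calls show why the Wildcard flag matters: the same pair of strings is equal in a wildcard comparison and unequal in a normal one.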
// Do wildcard match...
R_Start := 1 ;
L_Start := 1 ;
R_End := _R.Length ;
L_End := _L.Length ;
// Check prefix before first wildcard...
// (All positions refer to the normalized copies, _R and _L)
Dummy := _R.pos( '*' ) ;
if( Dummy > R_Start ) then // Something before the asterisk
begin
Result := _R.Compare( R_Start, Dummy - R_Start, _L, L_Start ) ;
if( Result <> 0 ) then
begin
exit ;
end ;
R_Start := Dummy ;
L_Start := Dummy ;
end ; // if( Dummy > R_Start )
// Check suffix after last wildcard...
Dummy := _R.RPos( '*' ) ;
if( Dummy < _R.Length ) then
begin
Result := _R.Compare( Dummy + 1, _R.Length - Dummy, _L, _L.Length - ( _R.Length - Dummy ) + 1 ) ;
if( Result <> 0 ) then
begin
exit ;
end ;
R_End := Dummy ;
L_End := _L.Length - ( _R.Length - Dummy ) ;
end ; // if( Dummy < _R.Length )
// Check for remaining matches, left-to-right...
while( R_Start <= R_End ) do
begin
if( R_Start >= R_End ) then
begin
break ; // All that's left in the wildcard spec is an asterisk - we match
end ;
Dummy := _R.Pos( '*', R_Start + 1 ) ;
S := _R.Copy( R_Start + 1, Dummy - R_Start - 1 ) ;
Dummy1 := _L.Wildcard_Pos( S, L_Start ) ;
S.Free ;
if( Dummy1 = 0 ) then
begin
Result := 1 ; // Not found, so report a mismatch instead of the default 0
exit ;
end ;
// Move past wildcard and matching characters...
L_Start := Dummy1 + Dummy - R_Start - 1 ;
R_Start := Dummy ;
end ; // while
end ; // Compare
We won't go over this code since it is almost identical to the _Compare
function we discussed in article 17. The only
differences are that it deals with the new TUnicode_String class and
returns the -1/0/1 values rather than a boolean.
Another new function is the Edit function:
function TUnicode_String.Edit( Options, Escape : cardinal ) : TUnicode_String ;
var AH, AL : cardinal ;
Dummy : integer ;
Escaped : boolean ;
ESI : integer ;
_Folding_Index : integer ;
Quote_Type : cardinal ;
Last : integer ;
Leading : boolean ;
OK : boolean ;
Space : boolean ;
V, V2, V3 : integer ;
begin
Result := TUnicode_String.Create ;
// Quick check...
if( Length = 0 ) then // No edits on zero-length strings
begin
Exit ;
end ;
// Normalize the Options...
// Disallow [] to {} if [] to ()
if ( Options And ( 1024 or 64 ) ) = ( 1024 or 64 ) then
begin
Options := Options And Not 1024 ;
end ;
// Disallow () to {} if () to []
if ( Options And 6144 ) = 6144 then
begin
Options := Options And Not 4096 ;
end ;
// Disallow {} to [] if {} to ()
if ( Options And 24576 ) = 24576 then
begin
Options := Options And Not 16384 ;
end ;
// Setup...
Space := False ; // No spaces
Last := 0 ;
Quote_Type := 0 ; // Not within significant quotes
ESI := 0 ;
Leading := True ;
Escaped := False ;
// Process string...
while( ESI < Length ) do
begin
inc( ESI ) ; // Increment source string pointer
OK := True ; // This byte is OK - so far
AH := Contents[ ESI ] ; // Current character
if( Quote_Type = 0 ) then // No quotes
begin
AL := AH ; // Save original value
if(
( AH <> ord( ' ' ) )
and
( AH <> _HT )
) then
begin
Leading := False ;
end ;
if( ( Options and 2 ) = 2 ) then // Remove all spaces/tabs
begin
if(
( AH = ord( ' ' ) )
or
( AH = _HT )
) then
begin
OK := False ;
end ;
end ;
if( ( Options and 4 ) = 4 ) then // Ignore special values?
begin
if( ( AH = _NUL ) or ( AH = _LF ) or ( AH = _FF ) or ( AH = _CR ) or ( AH = _DEL ) ) then
begin
OK := False ;
end ;
end ;
if( ( Options and 8 ) = 8 ) then // Ignore leading spaces
begin
if( Leading ) then
begin
OK := False ;
end ;
end ;
if( ( Options and 16 ) = 16 ) then // Reduce tabs/spaces to single space
begin
if(
( AH = ord( ' ' ) )
or
( AH = _HT )
) then
begin
if( Space ) then
begin
OK := False ;
end else
begin
Space := True ;
if( AH = _HT ) then
begin
AH := ord( ' ' ) ;
end ;
end ;
end ;
end ;
if( OK and ( ( Options and $40000 ) <> 0 ) ) then
begin
OK := AH >= ord( ' ' ) ;
end ;
if( ( Options and 32 ) = 32 ) then // Lower to upper case
begin
V2 := 0 ;
V3 := 0 ;
if( ESI < Length ) then
begin
V2 := Contents[ ESI + 1 ] ;
end ;
if( ESI + 1 < Length ) then
begin
V3 := Contents[ ESI + 2 ] ;
end ;
AH := Upcase( AH, V2, V3, _Folding_Index ) ;
ESI := ESI + _Folding_Index - 1 ;
end ;
if( ( AH = AL ) and ( ( Options and 512 ) = 512 ) ) then
begin // Upper to lower case (and not already converted the other way)
V := lowcase( AH, _Folding_Index ) ;
if( ( V = 0 ) and ( _Folding_Index >= 0 ) ) then
begin
AH := Foldings[ _Folding_Index, 1 ] ;
for V := 2 to 3 do
begin
if( Foldings[ _Folding_Index, V ] <> 0 ) then
begin
Result.Append( AH ) ;
AH := Foldings[ _Folding_Index, V ] ;
inc( Dummy ) ;
end ;
end ;
end else
begin
AH := V ;
end ;
end ;
if( ( Options and 64 ) = 64 ) then // Convert [] to ()
begin
if( AL = ord( '[' ) ) then
begin
AH := ord( '(' ) ;
end else
if( AL = ord( ']' ) ) then
begin
AH := ord( ')' ) ;
end ;
end ;
if( ( AH = AL ) and ( ( Options and 2048 ) = 2048 ) ) then // Convert () to []
begin
if( AL = ord( '(' ) ) then
begin
AH := ord( '[' ) ;
end else
if( AL = ord( ')' ) ) then
begin
AH := ord( ']' ) ;
end ;
end ;
if( ( Options and 4096 ) = 4096 ) then // Convert () to braces
begin
if( AL = ord( '(' ) ) then
begin
AH := ord( '{' ) ;
end else
if( AL = ord( ')' ) ) then
begin
AH := ord( '}' ) ;
end ;
end ;
if( ( AH = AL ) and ( ( Options and 8192 ) = 8192 ) ) then // Convert braces to ()
begin
if( AL = ord( '{' ) ) then
begin
AH := ord( '(' ) ;
end else
if( AL = ord( '}' ) ) then
begin
AH := ord( ')' ) ;
end ;
end ;
if( ( Options and 1024 ) = 1024 ) then // Convert [] to braces
begin
if( ( AL = ord( '[' ) ) or ( AL = ord( ']' ) ) ) then
begin
AH := ord( AL ) + 32 ;
end ;
end ;
if( ( AH = AL ) and ( ( Options and 16384 ) = 16384 ) ) then // Convert braces to []
begin
if( ( AL = ord( '{' ) ) or ( AL = ord( '}' ) ) ) then
begin
AH := ord( AL ) - 32 ;
end ;
end ;
end ; // if( Quote_Type = 0 )
if( OK ) then
begin
Result.Append( AH ) ; // Build result
if( ( AH = ord( ' ' ) ) or ( AH = _HT ) ) then
begin
Space := True ;
end ;
end ;
if( ( AH <> ord( ' ' ) ) and ( AH <> _HT ) ) then
begin
Space := False ;
if( ( Options and 256 ) = 256 ) then // Allow no alter within quotes
begin
if(
( ( AH = ord( '"' ) ) or ( AH = 39 ) )
and
( not Escaped )
) then
begin
if( Quote_Type = 0 ) then // Not in quotes
begin
Quote_Type := AH ;
end else
if( Quote_Type = AH ) then
begin
Quote_Type := 0 ;
end ;
end ;
end ;
end ; // if( ( AH <> ord( ' ' ) ) and ( AH <> _HT ) )
if(
( AH <> ord( ' ' ) )
and
( AH <> _HT )
) then
begin
Last := Result.Length ; // Last non-space character
end ;
if( Escape <> 0 ) then // Have an escape character
begin
if( AL = Escape ) then // This is an escape character
begin
if( ESI = 1 ) then // If first character, then always an escape
begin
Escaped := True ;
end else
begin
Escaped := not Escaped ;
// If previous character was an escape, then it is escaping this character
end ;
end else
begin
Escaped := False ;
end ; // if( AL = Escape )
end ; // if( Escape <> 0 )
end ; // while( ESI < Length )
if( ( Options and 128 ) = 128 ) then // Trim following spaces
begin
if( Quote_Type = 0 ) then // Not within quotes
begin
Result.Length := Last ; // Last non-space character position
end ;
end ;
end ; // TUnicode_String.Edit
We won't cover this function line-by-line. Suffice it to say that this method returns
a string which is a copy of our string, with certain textual transformations. The
specific transformation(s) performed depends on the bitmask Options passed to the method:
Value | Meaning |
2 | Remove all white space (spaces and tabs). |
4 | Remove all nulls, linefeeds, formfeeds, carriage returns, and DELs. |
8 | Remove leading spaces/tabs. |
16 | Reduce multiple white space (spaces and tabs) to a single space. |
32 | Convert lower case to upper case. |
64 | Convert square brackets to parentheses: [] to () |
128 | Remove all trailing white space. |
256 | Leave characters within quotes (" or ') unmodified. |
512 | Convert upper case to lower case. |
1024 | Convert square brackets to braces: [] to {} |
2048 | Convert parentheses to square brackets: () to [] |
4096 | Convert parentheses to braces: () to {} |
8192 | Convert braces to parentheses: {} to () |
16384 | Convert braces to square brackets: {} to [] |
262144 | Remove control characters (any value below a space). |
The Escape parameter (if non-zero) is the code of an "escape" character. A quote
that immediately follows it is not treated as a quote for the purposes of option 256.
The function basically sets up for the processing, including removing some bits
where they conflict with each other (such as 64 and 1024). Then we step through
each character in the string, and move it (or the transformed value) to the result.
One might wonder why we don't have a different function for the different transformations.
The reason is that we often want to perform multiple transformations on a given
string and it is more efficient to have a single routine do all of the requested
transformations in one fell swoop.
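Because the options are independent bits, a caller combines them with "or". The sketch below (Free Pascal assumed) shows that, along with one of the conflict rules. The constant names and Normalize_Options are hypothetical; the article itself only uses the numeric values.

```pascal
program EditOptionsDemo ;

{$mode objfpc}{$H+}

// Hypothetical names; the article only gives the numeric values.
const EDIT_COMPRESS_SPACE = 16 ;
      EDIT_UPPERCASE = 32 ;
      EDIT_BRACKET_TO_PAREN = 64 ;
      EDIT_TRIM_TRAILING = 128 ;
      EDIT_BRACKET_TO_BRACE = 1024 ;

// Drop the [] to {} request when [] to () is also requested.
function Normalize_Options( Options : cardinal ) : cardinal ;
const Both = EDIT_BRACKET_TO_PAREN or EDIT_BRACKET_TO_BRACE ;
begin
    if( ( Options and Both ) = Both ) then
    begin
        Options := Options and not cardinal( EDIT_BRACKET_TO_BRACE ) ;
    end ;
    Result := Options ;
end ;

var Options : cardinal ;

begin
    // Three transformations requested in a single pass
    Options := EDIT_COMPRESS_SPACE or EDIT_UPPERCASE or EDIT_TRIM_TRAILING ;
    writeln( Options ) ;
    // Conflicting bracket conversions: only [] to () survives
    Options := Normalize_Options( EDIT_BRACKET_TO_PAREN or EDIT_BRACKET_TO_BRACE ) ;
    writeln( Options ) ;
end.
```

The first value printed is the sum of the three flags, and the second is the [] to () flag alone.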
function upcase( V1, V2, V3 : cardinal ; var _Count : integer ) : cardinal ;
var L : integer ;
begin
Result := V1 ;
_Count := 1 ; // Indicates one character was translated
if( ( V1 < $61 ) or ( V1 > $118DF ) ) then // Not within range of our table
begin
exit ;
end ;
for L := 0 to high( Foldings ) do
begin
if(
( Foldings[ L, 1 ] = V1 )
and
( ( Foldings[ L, 2 ] = V2 ) or ( Foldings[ L, 2 ] = 0 ) )
and
( ( Foldings[ L, 3 ] = V3 ) or ( Foldings[ L, 3 ] = 0 ) )
) then
begin
Result := Foldings[ L, 0 ] ;
if( Foldings[ L, 3 ] <> 0 ) then
begin
_Count := 3 ;
end else
if( Foldings[ L, 2 ] <> 0 ) then
begin
_Count := 2 ;
end ;
exit ;
end ;
end ; // for L := 0 to high( Foldings )
end ; // upcase
The Edit method makes use of one new function. UpCase does the
opposite of the LowCase function: it converts lowercase characters
to upper case. The process is a little more complicated. Going from
upper to lower case involves a single input character, but the other way might
require up to three input characters to generate a single output character. So
we have to search the Foldings array for an entry matching one to three lowercase characters. Since
some conversions involve only one or two lowercase values, we have to take
that into account. We return, via the _Count parameter, the number
of input characters that were used to convert to a single upper case character.
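To see the multi-character lookup in action, here is a sketch with a three-entry stand-in for the Foldings table (Free Pascal assumed). The table contents and the name Upcase_Sketch are assumptions made for this example. The layout follows the code above: column 0 holds the upper case value and columns 1 to 3 the lower case sequence.

```pascal
program UpcaseDemo ;

{$mode objfpc}{$H+}

// Tiny stand-in for the Foldings table (an assumption for illustration).
// Column 0 is the upper case value, columns 1 to 3 the lower case sequence.
const Foldings : array[ 0..2, 0..3 ] of cardinal =
    (
        ( $0130, $0069, $0307, 0 ), // Dotted capital I: "i" plus combining dot
        ( $0041, $0061, 0, 0 ),     // A
        ( $0049, $0069, 0, 0 )      // I
    ) ;

function Upcase_Sketch( V1, V2, V3 : cardinal ; out Count : integer ) : cardinal ;
var L : integer ;
begin
    Result := V1 ; // Unchanged if no entry matches
    Count := 1 ;
    for L := 0 to high( Foldings ) do
    begin
        if( ( Foldings[ L, 1 ] = V1 )
            and ( ( Foldings[ L, 2 ] = V2 ) or ( Foldings[ L, 2 ] = 0 ) )
            and ( ( Foldings[ L, 3 ] = V3 ) or ( Foldings[ L, 3 ] = 0 ) ) ) then
        begin
            Result := Foldings[ L, 0 ] ;
            if( Foldings[ L, 3 ] <> 0 ) then Count := 3
            else if( Foldings[ L, 2 ] <> 0 ) then Count := 2 ;
            exit ;
        end ;
    end ;
end ;

var Count : integer ;
    Code : cardinal ;

begin
    // "i" followed by a combining dot: two inputs become one output
    Code := Upcase_Sketch( $0069, $0307, 0, Count ) ;
    writeln( Code, ' consumed ', Count ) ;
    // "i" followed by something else: only the "i" is converted
    Code := Upcase_Sketch( $0069, ord( 'x' ), 0, Count ) ;
    writeln( Code, ' consumed ', Count ) ;
end.
```

The caller uses the returned count to skip the extra input characters, just as the Edit method advances ESI after calling UpCase.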
function Lowercase( const S : string ) : string ;
begin
Result := Edit( S, 512, 0 ) ;
end ;
function Edit( S : string ; Options, Escape : integer ) : string ;
var I : integer ;
US, US1 : TUnicode_String ;
begin
Result := S ;
for I := 1 to length( S ) do
begin
if( S[ I ] > #127 ) then // Have UTF8
begin
US := TUnicode_String.Create ;
US.Assign_From_String( S, ST_UTF8 ) ;
US1 := US.Edit( Options, Escape ) ;
US.Free ;
Result := US1.As_String ;
US1.Free ;
exit ;
end ;
end ;
Result := CommonUt.Edit( S, Options ) ;
end ;
Lastly, we have two functions that perform actions on UTF-8 strings without the
calling code having to construct TUnicode_String instances solely to do them.
Lowercase simply calls Edit with the lowercase option
flag of 512.
Edit assumes a UTF-8 string. It scans the string for any byte
with the top bit set. Such a byte is taken to be part of a multi-byte UTF-8 character, in which case the function
constructs a TUnicode_String instance, assigns the string to it, calls
the instance's Edit method, returns the Pascal string version of
the result, and frees the instance. If there are no such bytes, we call a
version of Edit that is in the CommonUt unit. We won't cover that
function here. It is basically the same code as the Edit method, but operates on
a normal ANSI string. It is a little more efficient in memory space and performance
on these strings, not to mention that we don't need to construct a TUnicode_String
instance and free it. It is possible that on a really long string, the cost of
scanning the entire string to find a UTF-8 character exceeds the cost of constructing
the instance and skipping the scan, but most of our use of this routine will be dealing with fairly short
strings.
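The scan that decides between the two paths amounts to a check for any byte above 127. A minimal sketch follows (Free Pascal assumed). Has_Multibyte is a hypothetical helper written for this illustration; the function above performs the same test inline.

```pascal
program ScanDemo ;

{$mode objfpc}{$H+}

// Hypothetical helper: true if any byte has its top bit set, which in UTF-8
// means the byte belongs to a multi-byte sequence.
function Has_Multibyte( const S : string ) : boolean ;
var I : integer ;
begin
    Result := False ;
    for I := 1 to length( S ) do
    begin
        if( ord( S[ I ] ) > 127 ) then
        begin
            Result := True ;
            exit ; // One hit is enough; no need to scan the rest
        end ;
    end ;
end ;

begin
    writeln( Has_Multibyte( 'plain ascii' ) ) ;
    // "caf" followed by an accented "e" encoded as two UTF-8 bytes
    writeln( Has_Multibyte( 'caf' + #$C3#$A9 ) ) ;
end.
```

The scan stops at the first high byte, so its worst case is a long string that turns out to be pure ASCII.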
In the next article, now that we've laid the groundwork, we will begin our
examination of UCL lexical functions.
Copyright © 2019 by Alan Conroy. This article may be copied
in whole or in part as long as this copyright is included.