Symbol Substitution

In the previous article we described the UCL symbol substitution feature. In this article, we will examine the source code that implements it. We want to follow the VMS specification as closely as possible at this level. Unfortunately, the VMS documentation is somewhat vague about the DCL parsing algorithm. In fact, I did some experiments with DCL and found that, under some conditions, the behavior of DCL was inconsistent with the documentation. Which is wrong: the documentation or the implementation? I have no idea. I have to confess some disappointment with the otherwise very good VMS documentation. For instance, consider the following situation:

A = "1"
B = "'A'"
C = "'"
Given the above symbol definitions, what should the following assignment do?
D = "'C'"

Symbol D ought to contain a single apostrophe ('), which it does. So far so good. Likewise, the following:
D='B'

results in D containing "1" due to iterative substitution of B. But now let's consider this assignment:
D:='C''B'

There are two plausible ways to interpret how this would be parsed. The first is a simple substitution of 'C' and then 'B'. Since B would be iteratively substituted to 'A' and then to "1", D would contain the following:
'1

The other interpretation is to substitute 'C', which yields an apostrophe, and then use that apostrophe as part of the following substitution (essentially an intermediate value of D=''B'). That results in a non-iterative substitution of B, leaving D with the value:
'A'

But what result do we actually get from DCL? Symbol D has the following value:
B

Good grief! This simply doesn't correspond to anything in the documentation. I've tried other odd combinations, such as:
D="''C'''B'"

which resulted in D containing:
B'

This only makes sense if the terminating apostrophe for C was also counted as the first of three apostrophes (itself plus the following two). There were some examples in the VMS literature that seemed to indicate that this was the intended behavior, but they might also have been typos. In the absence of any DCL documentation about this situation, we are left scratching our heads.

I could provide other odd examples where DCL behavior appears decidedly non-deterministic. But, frankly, I don't have the time to try to reverse engineer the DCL parser, and I'm not sure that I'd want to. All we can do is aim to be as compatible with the documentation as possible, and with the actual behavior where it provides consistent insight into how the documentation should be interpreted.

We could handle substitution in a couple of ways. We could make several passes through the command line, substituting symbols as we come across them and repeating until no more substitutions occur. Or we could make a single pass through the command line, doing substitutions as we encounter them. Personally, I'd prefer the first, since it is somewhat cleaner to implement. However, I think that the second approach produces results closer to both the observed behavior and a strict reading of the documentation. UCL might not be 100% compatible with DCL, but it is close, and it is consistent with the documentation.
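To see why the choice matters, here is a toy Python model of both strategies applied to the D:='C''B' example. It is only a sketch, not the UCL code: the symbol table is taken from the example above, the function names are hypothetical, and it ignores quoted strings, doubled apostrophes, and & substitution.

```python
import re

# Hypothetical symbol table taken from the example above (not UCL code).
SYMBOLS = {"A": "1", "B": "'A'", "C": "'"}


def lookup(name: str) -> str:
    """Translate a symbol name; undefined or empty names give ''."""
    return SYMBOLS.get(name.upper(), "")


def single_pass(text: str) -> str:
    """One left-to-right scan; substituted values are never rescanned."""
    out = []
    i = 0
    while i < len(text):
        if text[i] != "'":
            out.append(text[i])
            i += 1
            continue
        end = text.find("'", i + 1)
        if end == -1:  # no terminating apostrophe: rest is literal
            out.append(text[i:])
            break
        value = lookup(text[i + 1:end])
        # Iterative substitution: a value such as 'A' is translated again.
        while len(value) > 1 and value[0] == value[-1] == "'":
            value = lookup(value[1:-1])
        out.append(value)
        i = end + 1
    return "".join(out)


REF = re.compile(r"'([^']*)'")  # 'NAME' (NAME may be empty)


def multi_pass(text: str, max_passes: int = 10) -> str:
    """Substitute all references, then rescan until nothing changes."""
    # max_passes guards against self-referential symbols looping forever.
    for _ in range(max_passes):
        new_text = REF.sub(lambda m: lookup(m.group(1)), text)
        if new_text == text:
            break
        text = new_text
    return text


print(single_pass("'C''B'"))  # '1
print(multi_pass("'C''B'"))   # A'
```

The single-pass model produces '1, the first interpretation discussed earlier. Rescanning lets the apostrophe produced by 'C' combine with the value of B to form new references, which gives A'. Neither matches the B that DCL actually produced.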

function Parse( S : string ) : string ;

var C : char ;
    E, I : integer ;
    In_Quotes : boolean ;
    Single_Apostrophe : boolean ;
    Name : string ;

begin
    // Phase I substitution...
    In_Quotes := False ;
    Result := '' ;
    I := 1 ;
    while( I <= length( S ) ) do
    begin
        C := S[ I ] ; // Get next character

We start with phase I substitution, obviously. First, we clear the result, which we will build as we process the passed string. We start at the first character of the string and loop until our index (I) passes the end of the string. The first thing we do in the phase I loop is get the character at the current index.

        if( C = '"' ) then
        begin
            if( copy( S, I + 1, 1 ) = '"' ) then // Two quotes = single literal quote
            begin
               inc( I ) ;
               Result := Result + '"' ;
            end else
            begin
                In_Quotes := not In_Quotes ;
            end ;
            Result := Result + '"' ;
        end else

If the character is a quote ("), we toggle the In_Quotes flag and add the quote to the result string. Note that two quotes together indicate a single literal quote character, so they do not change the state of In_Quotes; both quotes are copied to the result unchanged.

        if( C = #39 ) then // Apostrophe (substitution)
        begin
            Single_Apostrophe := ( copy( S, I + 1, 1 ) <> #39 ) ;
            if( not Single_Apostrophe ) then
            begin
                inc( I ) ;
            end ;

If the current character is an apostrophe, we may be processing a symbol substitution. As we discussed in the previous article, apostrophes delimit symbol names, but within quotes the symbol name must be preceded by two apostrophes. Single_Apostrophe is true if the apostrophe is not immediately followed by another one, and false otherwise. If there are two apostrophes, we advance the index to the second one.

            E := PosEx( #39, S, I + 1 ) ;
            if(
                ( E = 0 )
                or
                (
                  In_Quotes
                  and
                  ( PosEx( '"', S, I + 1 ) > 0 )
                  and
                  ( E > PosEx( '"', S, I + 1 ) )
                )
              ) then // No terminating apostrophe (or it lies past the closing quote), so not a symbol
            begin
                Result := Result + #39 ; // Treat as literal
                if( not Single_Apostrophe ) then
                begin
                    Result := Result + #39 ;
                end ;
                inc( I ) ;
                continue ;
            end ;
            if( In_Quotes ) then
            begin
                if( Single_Apostrophe ) then // Symbol translation within strings must use 2 apostrophes
                begin
                    Result := Result + #39 ;
                    inc( I ) ;
                    continue ;
                end ;
            end ;

Next, we look for the apostrophe that terminates the symbol name. If there is none, or if we are inside a quoted string and the next apostrophe lies beyond the closing quote, this is not a symbol reference. In that case, we copy the apostrophe (or both apostrophes) to the result literally and advance the index. Within quotes, a reference must begin with two apostrophes, so a lone apostrophe there is also copied literally.

            Name := copy( S, I + 1, E - I - 1 ) ; // Get symbol name
            I := E ;
            Name := LIB_Get_Symbol( Name ) ;
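            if( Single_Apostrophe ) then
            begin
                // Iteratively substitute while the value is itself a
                // reference such as 'A'. The length check keeps a value
                // consisting of a lone apostrophe from being re-translated.
                while(
                       ( length( Name ) > 1 )
                       and
                       ( copy( Name, 1, 1 ) = #39 )
                       and
                       ( copy( Name, length( Name ), 1 ) = #39 )
                     ) do
                begin
                    // Strip the enclosing apostrophes before looking up the name
                    Name := LIB_Get_Symbol( copy( Name, 2, length( Name ) - 2 ) ) ;
                end ;
            end ;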
            Result := Result + Name ;
        end else
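
Next we extract the name from the string and move the index to the terminating apostrophe, so that the increment at the bottom of the loop steps past it. Then we translate the symbol name to its value. If the reference began with a single apostrophe, the substitution is iterative: as long as the value is itself enclosed in apostrophes (as B's value 'A' is), we strip them and translate again. Note that three apostrophes in a row (''') are parsed as a null name, which translates to a null string.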

        begin
            Result := Result + C ;
        end ; // if( C = #39 )
        inc( I ) ;
    end ; // while I <= length( S )

Any character that is neither a quote nor an apostrophe is simply appended to the result. Finally, we increment the index and loop back to process the next character. When we reach the end of the string, we exit the loop.
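For readers who want to experiment with the examples from the start of this article, here is a Python model of the phase I loop. It is a sketch under stated assumptions: a dictionary (hypothetical values from those examples) stands in for LIB_Get_Symbol, and the iterative substitution strips the enclosing apostrophes before each lookup, which is what turning 'A' into 1 requires.

```python
# Hypothetical symbol table from the article's examples.
SYMBOLS = {"A": "1", "B": "'A'", "C": "'"}


def lookup(name: str) -> str:
    """Stand-in for LIB_Get_Symbol: undefined or empty names give ''."""
    return SYMBOLS.get(name.upper(), "")


def phase1(s: str) -> str:
    """Apostrophe substitution, following the phase I loop above."""
    out = []
    in_quotes = False
    i = 0
    while i < len(s):
        c = s[i]
        if c == '"':
            if s[i + 1:i + 2] == '"':  # "" is a literal quote
                out.append('"')
                i += 1
            else:
                in_quotes = not in_quotes
            out.append('"')
        elif c == "'":
            single = s[i + 1:i + 2] != "'"
            if not single:
                i += 1  # move to the second apostrophe
            end = s.find("'", i + 1)    # terminating apostrophe
            quote = s.find('"', i + 1)  # end of the current string
            if end == -1 or (in_quotes and quote != -1 and end > quote):
                out.append("'" if single else "''")  # not a reference
                i += 1
                continue
            if in_quotes and single:  # strings require ''NAME'
                out.append("'")
                i += 1
                continue
            value = lookup(s[i + 1:end])
            if single:  # iterative substitution
                while len(value) > 1 and value[0] == value[-1] == "'":
                    value = lookup(value[1:-1])
            out.append(value)
            i = end  # the increment below steps past the apostrophe
        else:
            out.append(c)
        i += 1
    return "".join(out)


print(phase1("D:='C''B'"))  # D:='1
```

Run against D:='C''B', this model produces '1, the first of the two interpretations discussed earlier, rather than the B that DCL returned.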

Now we are ready for the phase II substitution.

    // Phase II substitution...
    In_Quotes := False ;
    S := Result ;
    Result := '' ;
    I := 1 ;
    while( I <= length( S ) ) do
    begin
        C := S[ I ] ;
        if( C = '"' ) then
        begin
            In_Quotes := not In_Quotes ;
            Result := Result + '"' ;
        end else
        if(
            ( C = '&' )
            and
            (
              // Short-circuit evaluation ({$B-}, the default) keeps S[ I - 1 ] in range
              ( I = 1 )
              or
              ( pos( UpCase( S[ I - 1 ] ), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789$_-' ) = 0 )
            )
            and
            ( not In_Quotes )
          ) then // Substitution
        begin
            // Find the first character after the symbol name
            E := I + 1 ;
            while(
                   ( E <= length( S ) )
                   and
                   ( pos( UpCase( S[ E ] ), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789$_-' ) <> 0 )
                 ) do
            begin
                inc( E ) ;
            end ;
            Name := copy( S, I + 1, E - I - 1 ) ;
            I := E - 1 ; // Last character of the name; inc( I ) below moves past it
            Name := LIB_Get_Symbol( Name ) ;
            Result := Result + Name ;
        end else
        begin
            Result := Result + C ;
        end ; // if( C = '&' )
        inc( I ) ;
    end ; // while I <= length( S )
end ; // Parse

Similar to the phase I loop, we go through the string using I as an index. We grab the current character. We check for a quote and set In_Quotes appropriately. The other special case is the ampersand (&), which is our substitution indicator. Any other character is copied to the result. Then we increment the index and loop back.

For the ampersand substitution, we check that we are not inside quotes and that the preceding character isn't a valid symbol-name character. If either check fails, the ampersand is treated as any other character. Otherwise, we look for the end of the symbol name, which terminates at the first character that isn't in the list of symbol-name characters. We then grab that symbol name, translate it to the symbol's value, and set the index to the last character of the symbol name so that the end-of-loop increment moves us to the character after the name. When the loop is done, we return the string with substitutions to the caller.
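A matching Python sketch of the phase II loop follows. As before, a dictionary stands in for LIB_Get_Symbol and the symbol values are made up for illustration. It compares characters case-insensitively, on the assumption that symbol names are case-insensitive as in DCL, and it leaves the index on the last character of the name, as described above.

```python
# Hypothetical symbol table for the demonstration.
SYMBOLS = {"X": "42", "FILE": "LOGIN.COM"}
NAME_CHARS = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789$_-")


def lookup(name: str) -> str:
    """Stand-in for LIB_Get_Symbol: undefined or empty names give ''."""
    return SYMBOLS.get(name.upper(), "")


def phase2(s: str) -> str:
    """Ampersand substitution, following the phase II loop above."""
    out = []
    in_quotes = False
    i = 0
    while i < len(s):
        c = s[i]
        if c == '"':
            in_quotes = not in_quotes
            out.append(c)
        elif (c == "&" and not in_quotes
              and (i == 0 or s[i - 1].upper() not in NAME_CHARS)):
            end = i + 1
            # The name runs to the first non-name character.
            while end < len(s) and s[end].upper() in NAME_CHARS:
                end += 1
            out.append(lookup(s[i + 1:end]))
            i = end - 1  # last character of the name
        else:
            out.append(c)
        i += 1
    return "".join(out)


print(phase2("TYPE &FILE"))  # TYPE LOGIN.COM
```

The input &X,&X is a useful check: if the index were left on the character after the name instead of the last character of it, the comma would be skipped and the result would be 4242 rather than 42,42.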

In the next article, we will look at the Process procedure, which handles the execution of UCL scripts.

 

Copyright © 2019 by Alan Conroy. This article may be copied in whole or in part as long as this copyright is included.