Symbols and the SSC
Symbols
The SSC doesn't care what names are used for symbols, as long as they don't contain NUL characters (ASCII 0). However, all official UOS usage will adhere to the following rules (and, for practical purposes, shells should also follow suit): All symbols should start with an alphabetic character (A-Z, or equivalent Unicode letters for other languages). They should consist only of human-readable glyphs (no characters below ASCII 32), and should not contain spaces, colons, slashes, or backslashes. Further, all UOS-specific symbols will contain exactly one dollar sign ($). Applications should not use names with a single dollar sign, in order to avoid name collisions with UOS symbols (multiple dollar signs are okay). Typically, UOS symbols start with a three-letter prefix, followed by a dollar sign and then some alphanumeric characters. An example of this is the sys$system symbol that we mentioned in the previous article. UOS symbols are also case-insensitive. That is, SYS$SYSTEM is the same symbol as sys$system. This might cause a bit of a problem with some Unix shells, and we may talk about ways around this issue in the future.
In terms of the values of symbols, any data can be stored in them. There is no official limit to the size of a symbol's content, but large amounts of symbol data will use large amounts of memory. It is best to use files to store large amounts of data.
Symbols serve the same purpose in shells as variables do in languages such as C++ and Pascal. Symbols are stored in groups called "symbol tables". Within a given symbol table, each symbol must be uniquely named: there cannot be multiple symbols with the same name in the same table. Each running process has its own symbol table. Generally, processes do not share symbols with each other (although there are some exceptions to this). This is similar to the local variables of different functions in a program.
That is, the scope of the variables is limited to the code in which they are defined. There are some cases, though, where symbols can be shared between processes. For instance, there is a system symbol table that is accessible to all processes on the system. However, although any process can read the symbols in this table, only certain users are allowed to add symbols to it, delete them, or modify existing ones. One example of a symbol in the system symbol table is sys$system. There are other symbol tables as well. Here are their names and a short description of each:
One consequence of the foregoing: a symbol defined for a process is used in place of a symbol of the same name that exists in an outer scope. That is, if a process defines sys$system, then the process symbol is used instead of the sys$system value from the system symbol table whenever that symbol is referenced. In other words, symbols in the process table override symbols of the same name in the job symbol table, which in turn override symbols in the group symbol table, and so on. Although an application can specify which table to use when accessing or creating a symbol, the shells use these scoping rules by default. Another use of symbols is to provide default application options. This is something that we will discuss in detail in a future article. For now, I'll summarize it thus: an application can use a symbol to determine the options to use when it is run. The user can explicitly override these options when running the program, or can define their own process symbol, which will be used instead of the system default. This works because the local (process) symbol scope overrides the outer scope of the system symbol table.
Physical and Virtual devices
Logical devices
One of the most important functions served by UOS symbols is that of logical devices, or simply "logicals". Any UOS symbol that is suffixed with a colon and used in a filename specification is evaluated and replaced with its contents. This provides a means of redirection. This is, in fact, what sys$system is used for. It contains the physical device name and path of the system directory for the booted installation of UOS. For instance, if sys$system has a value of "DISKA0:\uos\uos1" and we reference "sys$system:\file.txt", it is translated to "DISKA0:\uos\uos1\file.txt". So, in essence, sys$system serves as a logical device. From the standpoint of UOS, it can be used in place of any physical device name. Where would this feature be used in real life?
Let's consider a typical Microsoft Windows system. Let's say that you install a new game called Cowcraft. Being a newer game, it takes about 50 GB of your disk space. It installs to a specific folder on your C: drive. Over time, you install a bunch of other software and the free space on your C: drive disappears. Now you have to make room on your C: drive in order to install even more programs. Fortunately, you have a D: drive, which has a lot of free space. So, what do you do? You uninstall Cowcraft from C: and reinstall it on D:. Because it is a big game, this takes a fair amount of time. Repeat this for each program you need to move to D:. And let's hope you don't have to uninstall and reinstall it again when D: runs low on disk space.
UOS provides a much easier way of dealing with this. In the UOS approach, the initial installation of Cowcraft defines a system logical called "cowcraft" that points to its installed location. If you need to move it, you simply copy it from its current location to the new location and change "cowcraft:" to point there. There is no need to reinstall, because all references to the game (whether on your desktop, in the registry, in configuration files, or anywhere else) use Cowcraft: to find it. Further, and perhaps even more importantly, the data associated with the game can be assigned to a different logical, such as Cowcraftdata:, so that it can be moved independently of the program files themselves (a good thing, considering that an application's data may exceed the size of the application by several orders of magnitude). Changing the value of a logical is much simpler and faster than a reinstall. Logicals can also contain references to other logicals, allowing the system administrator (or users) to create an entire hierarchy of redirection. Note that this creates the potential for an infinite loop as UOS tries to resolve a logical device reference.
We will discuss this situation later in this article. But what happens if a logical name matches a physical device name? For instance, what if you define DISKA0: as a logical? How, then, do you ever get to the actual physical device? Symbol names cannot start with an underscore, so UOS knows that any device name prefixed with an underscore references a physical device. Thus, _DISKA0: can always be used to reference the first physical store, even if DISKA0 has been (re)defined. Finally, logicals can contain several paths. This is something that we will discuss later in this article and in the next article.
Link files also provide a means of redirection, as we discussed with installed.sys and default.sys. However, these two files are the only links that UOS makes use of. Logicals provide several advantages over links. For instance, a link can only point to one specific place, whereas a logical can point to a list of target locations. Links can only point to another location on the same store, whereas logicals can point to any device. Maintaining link files to keep the file system structure consistent can be time-consuming. In fact, if it weren't for the sake of POSIX compliance, we wouldn't bother supporting link files at all. One advantage of links over logicals is that the redirection is essentially "baked" into the file structure; a logical's redirection is not. The Linux file system also provides links, which can be soft links or hard links. They serve the same purpose and differ mostly in how they are implemented. We will talk more about links in the future.
Windows provides a form of logicals for some system folders. For instance, %APPDATA% can be used in Windows Explorer to refer to the application data folder. However, these are environment variables that are expanded only by programs that choose to do so (such as Explorer and the command prompt), not by the file system itself, and they cannot contain multiple paths.
As is typical with Windows, it provides a half-baked (or, in this case, quarter-baked) implementation of what UOS does.
Multiply-scoped symbols are a powerful feature of UOS. However, powerful features often open the door to security issues. Consider database server software that stores a file of access information in a directory pointed to by the logical SQLData:. If a user overrides the SQLData symbol to point to his own directory containing his own access file, then when the database authenticates a user, it could end up checking against the faux file and allowing unauthorized people to access data. So, although we generally want to give users the flexibility of overriding symbols with their local process table, applications need to consider this possible means of subversion, if it applies. Fortunately, UOS provides a means of easily avoiding this pitfall. Although the application could choose to read the value of the symbol directly from LNM$SYSTEM, it is easier (and more foolproof) to prefix such logicals with an underscore. "Wait!" you say. "Doesn't an underscore prefix mean a physical device specification?" Yes, it does. However, if the logical doesn't match a physical device (and something like "SQLData" never would), then UOS looks in LNM$SYSTEM for the translation of the logical name. So, using _SQLData: will use the value of SQLData from LNM$SYSTEM even if the user has SQLData defined in LNM$PROCESS. The only restriction here is that any system symbol with the same name as a physical device is effectively invisible when the underscore prefix is used. The solution is simple: don't use physical device names as system symbol names. Do use underscores to force use of the LNM$SYSTEM table for logicals that point to data the user shouldn't be able to override. In most cases, this won't be an issue. But to build a secure operating system, we need to take such issues into consideration from the start.
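To make the override rules and the underscore escape concrete, here is a minimal Python sketch of a scoped lookup. It is not the UOS implementation. The `SymbolSpace` class, its methods, storing physical device names without the underscore, and using dictionaries keyed by upper-cased names are all assumptions made for illustration. Only the LNM$ table names, the inner-to-outer search order, and the underscore behavior come from the text above.

```python
# Hypothetical sketch of UOS-style scoped symbol lookup (not the UOS source).
# Only the LNM$ table names, the search order, and the underscore rule
# come from the article; the class and method names are illustrative.

SEARCH_ORDER = ["LNM$PROCESS", "LNM$JOB", "LNM$GROUP", "LNM$SYSTEM", "LNM$CLUSTER"]


class SymbolSpace:
    def __init__(self, physical_devices):
        # One dict per table; names are case-insensitive, so keys are upper-cased.
        self.tables = {table: {} for table in SEARCH_ORDER}
        # Physical device names, stored here without the leading underscore.
        self.devices = {name.upper() for name in physical_devices}

    def define(self, table, name, value):
        self.tables[table][name.upper()] = value

    def lookup(self, name):
        """Return the innermost value for name, or None if it is not a logical."""
        key = name.upper()
        if key.startswith("_"):
            key = key[1:]
            if key in self.devices:
                return None  # "_DISKA0" always means the physical device itself
            # Not a device: the underscore forces the LNM$SYSTEM translation,
            # ignoring any process, job, or group override.
            return self.tables["LNM$SYSTEM"].get(key)
        for table in SEARCH_ORDER:  # inner scopes override outer ones
            if key in self.tables[table]:
                return self.tables[table][key]
        return None

    def translate(self, spec):
        """One translation step: replace a leading 'name:' with the name's value."""
        name, colon, rest = spec.partition(":")
        value = self.lookup(name) if colon else None
        return spec if value is None else value + rest


if __name__ == "__main__":
    uos = SymbolSpace(["DISKA0"])
    uos.define("LNM$SYSTEM", "sys$system", r"DISKA0:\uos\uos1")
    print(uos.translate(r"sys$system:\file.txt"))  # DISKA0:\uos\uos1\file.txt
    uos.define("LNM$SYSTEM", "SQLData", r"DISKA0:\sql")
    uos.define("LNM$PROCESS", "SQLData", r"DISKA0:\mine")
    print(uos.lookup("SQLData"))   # DISKA0:\mine (process override wins)
    print(uos.lookup("_SQLData"))  # DISKA0:\sql  (underscore forces LNM$SYSTEM)
```

Because the normal lookup walks the tables from the inside out, a process definition shadows every outer definition. The underscore branch, by contrast, never consults the process table, so the user cannot subvert it.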
I would be remiss if I didn't point out that we are diverging a bit from strict adherence to the VMS specification. VMS makes a distinction between symbols and logicals; they are treated as completely separate features. In fact, the LNM$ prefix is a contraction of "Logical NaMes". UOS combines the two features into one because it simplifies things. Just because logicals are used differently doesn't mean that they should be implemented differently. UOS simply has symbols, and they can be used as logicals if so desired. There are some other minor differences from VMS as well, but I mentioned back at the start of these articles that we would occasionally not match VMS exactly.
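Returning to the naming conventions listed at the start of this article, the following Python sketch checks a name against them. Keep in mind that these are conventions, not rules the SSC enforces. The function names are hypothetical. Two further choices are assumptions of the sketch: "alphabetic" is interpreted as Python's `str.isalpha()`, and case-insensitivity is modeled with `str.casefold()`.

```python
# Hypothetical checks for the UOS symbol naming conventions (not UOS code).
FORBIDDEN = set(" :/\\")  # spaces, colons, slashes, and backslashes


def follows_conventions(name: str) -> bool:
    """Starts with a letter; no control characters or forbidden characters."""
    if not name or not name[0].isalpha():
        return False
    return all(ord(ch) >= 32 and ch not in FORBIDDEN for ch in name)


def looks_like_uos_symbol(name: str) -> bool:
    """Exactly one dollar sign is the pattern reserved for UOS symbols."""
    return name.count("$") == 1


def same_symbol(a: str, b: str) -> bool:
    """Symbol names are case-insensitive."""
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    for candidate in ["sys$system", "MYAPP$$OPTIONS", "Cowcraft", "2fast", "a:b"]:
        print(candidate, follows_conventions(candidate), looks_like_uos_symbol(candidate))
```

An application-defined name such as MYAPP$$OPTIONS passes the general rules but not the single-dollar-sign test, which is exactly what keeps it clear of UOS's own names.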
The SSC
Much of the code is similar to that of previous components we've discussed. Unique items include the _System_Symbols and _Cluster_Symbols instance data. Symbol tables such as LNM$PROCESS, LNM$JOB, and LNM$GROUP are stored in the USC component, but there is only one system symbol table and one cluster symbol table on a given UOS system, so they are kept in the SSC. _Temp and TempS are used to return symbol contents to callers. Here are the utility methods:
These methods are simply wrappers for instance data. Here are the "standard" support methods used in all executive components:
Most of our methods need to determine which of the various symbol tables to use for a given operation. Rather than replicate that code in each place, we have a utility function that determines which table to use and performs other validation. Here is that function:
Symbol names cannot be null, so if the name is null we return an error. Also, symbols cannot start with an underscore. If the name does, we handle it in one of two ways: if the specified table is the system table, we simply trim the underscore; otherwise, we return an error. Note that if, after trimming the underscore, the name is null (meaning that the name was nothing but an underscore), we return an error. Other than a leading underscore, we place no restrictions on the symbol name. Next, we grab the appropriate symbol table. If the system or cluster table is specified, we use the corresponding instance variable. If the group or job table is specified, we ask the USC component for that table for the current process ID. Otherwise, we request the process table from the USC. Now let's look at the Set_Symbol method.
This method creates (and sets) the symbol if it doesn't already exist; otherwise, it changes the value of the existing symbol of that name. The first thing we do is convert the Name parameter from a pchar to a string. Then we clear any errors. This is done at the start of almost every component method, because after each call the caller may check the component's error status. A nil value indicates no error, so clearing the error makes "success" the default. If any errors occur, we set the error and exit. Next, we trim leading and trailing spaces from the name. Then we call the Resolve_Table function to validate the symbol name and determine which table to use. Finally, we set the value in the table. Now that we have a way to set a value, we need a way to get a value.
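Putting Resolve_Table and Set_Symbol together, a rough Python approximation of the behavior described above might look like the following. Like the article's code, it uses an error status rather than exceptions: the error is cleared first, and on failure it is set before returning. This is not the UOS Pascal source. The class name, the integer table constants, the dictionary standing in for the USC component, and the upper-casing of names are all assumptions of the sketch.

```python
# Hypothetical Python approximation of the SSC's Resolve_Table and Set_Symbol.
LNM_PROCESS, LNM_JOB, LNM_GROUP, LNM_SYSTEM, LNM_CLUSTER = range(5)


class SSCSketch:
    def __init__(self):
        self._system_symbols = {}
        self._cluster_symbols = {}
        # Stand-in for the USC component, which really holds the process,
        # job, and group tables for each process ID.
        self._usc_tables = {LNM_PROCESS: {}, LNM_JOB: {}, LNM_GROUP: {}}
        self.error = None  # None means "no error", like nil in the article

    def _resolve_table(self, table_id, name):
        """Validate name and pick a table; return (table, name) or (None, name)."""
        if not name:
            self.error = "Null symbol name"
            return None, name
        if name.startswith("_"):
            if table_id != LNM_SYSTEM:
                self.error = "Symbol names cannot start with an underscore"
                return None, name
            name = name[1:]  # "_X" in the system table simply means "X"
            if not name:
                self.error = "Null symbol name"
                return None, name
        if table_id == LNM_SYSTEM:
            return self._system_symbols, name
        if table_id == LNM_CLUSTER:
            return self._cluster_symbols, name
        if table_id in (LNM_GROUP, LNM_JOB):
            return self._usc_tables[table_id], name
        return self._usc_tables[LNM_PROCESS], name  # default: process table

    def set_symbol(self, table_id, name, value):
        """Create the symbol, or overwrite its value if it already exists."""
        self.error = None  # default to success
        name = name.strip()  # trim leading and trailing spaces
        table, name = self._resolve_table(table_id, name)
        if table is None:
            return  # self.error has already been set
        table[name.upper()] = value  # assumption: case-insensitive via upper-case keys
```

The underscore is tolerated only for the system table. This matches the rule that `_NAME:` refers either to a physical device or to LNM$SYSTEM.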
Our first order of business is to clear the error and normalize the symbol name. If a specific table is passed, we call Resolve_Table to get the table instance. However, if LNM_ALL is passed (the else condition), then we have to search for the table that has the symbol, starting with the process table and moving up the hierarchy all the way to the cluster symbol table. As soon as we find the symbol, we are done and exit. If we get as far as LNM_CLUSTER, we return whatever that lookup returned. That is, if the symbol is found in the cluster symbol table, we return that result; if not, our recursive call returns nil and we return that. We don't set an exception condition if the symbol is not found: the function simply returns the value of the symbol, or nil if the symbol doesn't exist. However, if a specific table is requested, we execute the following code:
If Resolve_Table returned nil (which shouldn't happen), then there is no table and therefore no symbol, so we simply exit. We check whether the chosen table has the symbol (Table.Exists). If so, we get the value and return it. The result is a UOS_String, which is how string data is passed around UOS. It doesn't rely on Pascal strings because we cannot guarantee that the different executive components will be compiled with versions of Pascal that have compatible strings, or that all of the components will even be written in Pascal. TUOS_String provides a specific implementation of strings that can be used by all components regardless of the compiler used to build them. If the symbol is not found in the specified table, we return the default of nil. This function gives us a way to obtain a symbol using the symbol table override hierarchy, or to look for a symbol in a specific table. Now that we can create, modify, and read symbols, the other main operation we need is a way to delete a symbol from a table. The following function does this for us.
This simple function determines which table is specified and then tells the table to delete the symbol. Note that there is no way for the function to generate an exception: it clears any existing exception and then never sets one. So, no error is generated if the symbol doesn't exist. Anyone who wants to determine whether a symbol exists can call Get_Symbol. The final function is the most complicated symbol-related code. It is used to resolve the value of a symbol, looking for it in the symbol table hierarchy. More than that, it also handles logical redirection, including logicals with multiple paths.
First we clear exceptions, then trim leading and trailing spaces from the passed name. If the name is null, we return an error. Because any symbol can resolve to another symbol, it is possible for the user to set up a recursive definition. For instance, symbol A could contain "B:", symbol B could contain "C:", and symbol C could contain "A:". When resolving symbol A, we proceed to B, which resolves to C, which resolves back to A. This is an infinite loop. So, the next thing we do is set a limit on how many resolution steps we will allow. If we exceed that number of steps, we can be sure that there is a recursion, and we exit with an error. We determine the maximum number of steps by adding up the number of symbols in the cluster, system, group, job, and process tables. In the worst valid case, every symbol is used once during resolution; if more steps than that are taken, at least one symbol must be recursing. Couldn't we prevent the infinite recursion problem by disallowing any symbol value change that would cause it? Unfortunately, that would be prohibitively expensive, since it would require checking all existing symbols in all symbol tables in the system (including those for each process) each time a system symbol is set, and checking all tables in the hierarchy whenever a process symbol's value is changed. Besides, it is valid for a symbol to have any value. The problem only occurs when such a symbol is used as a logical, and there is no way to tell how a symbol will be used when it is created or changed. Finally, we call the local _Resolve function, which performs the actual symbol resolution. After the function returns, we check the value of Max_Iterations, which _Resolve decrements on each resolution step it performs. If it is less than zero, the resolution recursed without end. Here is the code for the local Count_For function:
It simply returns the symbol count for the given table, returning 0 for non-existent tables. Here is the code for _Resolve:
First, we assume success by setting the result to True. Then we do a quick check: if there is no colon in the symbol value, there is nothing to resolve, so we set the output parameter (passed by reference) and exit. Next, we check the iteration count to see whether we have exceeded the maximum number of steps. If so, we exit. This check is done at the start of the function because the function is called recursively for each step of resolution.
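The step budget described above can be sketched in Python as follows. Max_Iterations is the sum of the table sizes, it is decremented on each resolution step, and it is checked at the start of each recursive call. This deliberately simplified version follows only single-path logicals and treats any name that isn't a symbol as final. It leaves out underscores, the device table, and semicolon-separated path lists. The function names are hypothetical, and the tables are plain dictionaries ordered from process to cluster.

```python
# Hypothetical sketch of the Max_Iterations guard (not the UOS source).
# Single-path logicals only: no ';' lists, underscores, or device table.


def count_for(table):
    """Number of symbols in a table, or 0 for a table that doesn't exist."""
    return len(table) if table is not None else 0


def resolve_logical(spec, tables):
    """Resolve spec through tables (innermost first). Returns (ok, value)."""
    max_iterations = sum(count_for(t) for t in tables)

    def lookup(name):
        for table in tables:
            if table is not None and name.upper() in table:
                return table[name.upper()]
        return None

    def _resolve(value):
        nonlocal max_iterations
        if max_iterations < 0:
            return False, value  # budget exhausted: there must be a cycle
        if ":" not in value:
            return True, value  # nothing left to resolve
        name, _, suffix = value.partition(":")
        target = lookup(name)
        if target is None:
            return True, value  # not a symbol (e.g. a device): stop here
        max_iterations -= 1  # one resolution step taken
        ok, resolved = _resolve(target)
        return ok, resolved + suffix  # put the suffix back after resolving

    ok, result = _resolve(spec)
    return ok and max_iterations >= 0, result
```

With A, B, and C defined as "B:", "C:", and "A:", the budget is 3. The fourth lookup drives it below zero, and the next call reports failure instead of looping forever.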
Next, we must process the symbol value (copied to S). We check for multiple paths, delimited by semicolons (;). We process each path separately, removing it from S and looping until S is empty (meaning that there are no more paths). Because a semicolon is also a valid character in file and directory names, we need a way of indicating whether a semicolon is a delimiter or part of a file name. By default, a semicolon acts as a delimiter. If we want a literal semicolon, the whole file name must be enclosed in quotes. Quote_Pos is used to search for a semicolon delimiter while ignoring semicolons within quotes. N is set to null before the loop starts so that we can build up the resolved value as we iterate through the paths. The current path is extracted to Work and removed from S. Now we process the current path (in Work).
If there is no colon, or the colon is the first character in the path, then we do not have a logical, and we simply append the path to our result. Otherwise, we need to resolve the symbol. First, we move everything after the colon into the Suffix variable and remove it from the current path. Next, we check the name against the devices in Device_Table (which came from the File Processor). This table contains a list of all device names, with preceding underscores. If the logical is, in fact, a physical device, then we are done with this path. Otherwise, we need to start a resolution step. We first check for the underscore prefix. If we get to this check, we already know that this is not a physical device, so the presence of an underscore indicates a system symbol reference, and we look up the symbol in the LNM_System table. If it isn't there, we put the underscore back on the path and append it to the result, but leave the function result as False. Otherwise, we take the contents of the system symbol and check for the presence of a colon. If there is one, we recursively call ourselves to do the next resolution step. Otherwise, we are done and append the suffix to the symbol value. To understand this, consider a logical with the value "X:\folder\". In this case, "\folder\" is the suffix and "X" is the symbol to resolve. Let's say that it resolves to "_DISKA0:". Once we've resolved its value, we have to put the suffix back on, which leaves us with "_DISKA0:\folder\". Note that if there are multiple colons, the second (and later) colons are treated as literal parts of the path. Thus, "X:\http::/" would resolve to "_DISKA0:\http::/". This is an invalid file specification, of course, but it is not the job of the resolution code to validate file specifications. Use of such a symbol would result in an error. If the underscore prefix is not used, we must resolve the symbol using the symbol table hierarchy.
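The underscore branch just described can be approximated in Python as below. This includes how the suffix is split off at the first colon and reattached afterwards. It is a partial sketch, not the UOS code, and it makes several assumptions:

- The function name is hypothetical.
- Device_Table and LNM$SYSTEM are modeled as a Python set and dict. As the text says of Device_Table, device names keep their leading underscore.
- Non-underscore names are returned unchanged, because the hierarchy lookup is a separate branch.
- The Max_Iterations guard described earlier in the article is omitted.

The article's X examples are written here as _X, so that they take this branch.

```python
# Hypothetical sketch of the underscore branch of _Resolve (not the UOS source).
# Non-underscore names (the hierarchy lookup) and the iteration guard are omitted.


def resolve_underscore_path(path, system_symbols, device_table):
    """Resolve one path; returns (ok, text). device_table holds '_DEV' names."""
    colon = path.find(":")
    if colon <= 0:
        return True, path  # no colon, or colon first: not a logical
    name, suffix = path[:colon], path[colon + 1:]  # later colons stay in suffix
    if name.upper() in device_table:
        return True, path  # already a physical device: done with this path
    if not name.startswith("_"):
        return True, path  # hierarchy lookup: outside the scope of this sketch
    value = system_symbols.get(name[1:].upper())
    if value is None:
        return False, path  # keep the underscore, but report failure
    if ":" in value:
        ok, value = resolve_underscore_path(value, system_symbols, device_table)
        return ok, value + suffix  # next step, then put the suffix back
    return True, value + suffix


if __name__ == "__main__":
    devices = {"_DISKA0", "_DISKB1"}  # like Device_Table: names keep the underscore
    system = {"X": "_DISKA0:"}  # LNM$SYSTEM stand-in with upper-case keys
    for path in ["_X:\\folder\\", "_X:\\http::/", "_NOPE:\\a"]:
        ok, text = resolve_underscore_path(path, system, devices)
        print(ok, text)
```

The first two paths come back as "_DISKA0:\folder\" and "_DISKA0:\http::/", matching the examples above. "_NOPE:\a" is returned unchanged with a False result. The non-underscore case, where the name is looked up through the symbol table hierarchy, is covered in the next paragraph.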
If we find a matching symbol, we use its value. Otherwise, we look for a matching device in the device table. Therefore, the logical "DISKA0:" will match a symbol named "DISKA0", and that symbol's value will be substituted. But if no symbol is found, we look for "_DISKA0:" in the device table. Thus, users can use device names without underscores to reference devices while allowing for redirection. In fact, except for security reasons, devices should be specified without underscores so that redirection will work. If it is found in the device table, we add the device (with underscore) to the result, append the suffix and continue on to the next path. Since we resolved to a physical device, there are no more steps to perform on this path. But if the symbol wasn't found in a symbol table or the device table, we clear Work, since this path doesn't redirect anywhere. Otherwise, we take the symbol value and recursively call ourselves to resolve it. When we are done, we add the path and its suffix to the result. Then we loop to the next path in the list. Here is the code for the Quote_Pos function:
This function simply marches through a string, looking for the search string. What makes it different from typical string search functions, like Pos, is that it keeps track of whether a given position is within quotes, and it won't match anything between quotes ("). This is done with the Quote char variable, which holds a space when not within quotes and a quote character when within them. Here is the code for the Insert_Suffix function:
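Quote_Pos, as described above, and the path-list handling that Insert_Suffix performs (described in the next paragraph) can be approximated in Python like this. The names quote_pos, split_paths, and insert_suffix are hypothetical stand-ins, not the UOS routines. Unlike Pascal's Pos, quote_pos returns a 0-based index, and -1 when nothing is found. It also assumes the search string contains no quote characters.

```python
# Hypothetical Python versions of Quote_Pos and Insert_Suffix (not UOS code).


def quote_pos(search, s):
    """Index of the first `search` in `s` outside double quotes, or -1."""
    quote = " "  # a space outside quotes, '"' inside, as in the article
    for i, ch in enumerate(s):
        if ch == '"':
            quote = " " if quote == '"' else '"'
        elif quote == " " and s.startswith(search, i):
            return i
    return -1


def split_paths(s):
    """Split a path list on semicolons that are not inside quotes."""
    paths = []
    while s:
        p = quote_pos(";", s)
        if p < 0:
            paths.append(s)
            break
        paths.append(s[:p])
        s = s[p + 1:]
    return paths


def insert_suffix(value, suffix):
    """Append suffix to every path in a (possibly multi-path) value."""
    return ";".join(path + suffix for path in split_paths(value))


if __name__ == "__main__":
    print(insert_suffix("_DISKA0:;_DISKB1:", "\\files"))  # _DISKA0:\files;_DISKB1:\files
    print(split_paths('_DISKA0:\\a;"_DISKB1:\\odd;name";C:'))
```

The quoted entry survives as a single path even though it contains a semicolon. This is the reason the delimiter search has to track quote state rather than use a plain substring search.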
The purpose of Insert_Suffix is to append a suffix to a given string (the current path being processed). In simple cases, this could be done with a single string concatenation. But a given substitution could produce a path list. For instance, consider the use of "X:\files", where X contains "_DISKA0:;_DISKB1:". Redirection should result in a final value of "_DISKA0:\files;_DISKB1:\files". So this function goes through each path in the list and appends the suffix to each one. In the next article, we will discuss the implementation of the File Processor component.
Copyright © 2017 by Alan Conroy. This article may be copied in whole or in part as long as this copyright is included.