1 Introduction 2 Ground Rules Building a File System 3 File Systems 4 File Content Data Structure 5 Allocation Cluster Manager 6 Exceptions and Emancipation 7 Base Classes, Testing, and More 8 File Meta Data 9 Native File Class 10 Our File System 11 Allocation Table 12 File System Support Code 13 Initializing the File System 14 Contiguous Files 15 Rebuilding the File System 16 Native File System Support Methods 17 Lookups, Wildcards, and Unicode, Oh My 18 Finishing the File System Class The Init Program 19 Hardware Abstraction and UOS Architecture 20 Init Command Mode 21 Using Our File System 22 Hardware and Device Lists 23 Fun with Stores: Partitions 24 Fun with Stores: RAID 25 Fun with Stores: RAM Disks 26 Init wrap-up The Executive 27 Overview of The Executive 28 Starting the Kernel 29 The Kernel 30 Making a Store Bootable 31 The MMC 32 The HMC 33 Loading the components 34 Using the File Processor 35 Symbols and the SSC 36 The File Processor and Device Management 37 The File Processor and File System Management 38 Finishing Executive Startup Users and Security 39 Introduction to Users and Security 40 More Fun With Stores: File Heaps 41 File Heaps, part 2 42 SysUAF 43 TUser 44 SysUAF API Terminal I/O 45 Shells and UCL 46 UOS API, the Application Side 47 UOS API, the Executive Side 48 I/O Devices 49 Streams 50 Terminal Output Filters 51 The TTerminal Class 52 Handles 53 Putting it All Together 54 Getting Terminal Input 55 QIO 56 Cooking Terminal Input 57 Putting it all together, part 2 58 Quotas and I/O UCL 59 UCL Basics 60 Symbol Substitution 61 Command execution 62 Command execution, part 2 63 Command Abbreviation 64 ASTs 65 Expressions, Part 1 66 Expressions, Part 2: Support code 67 Expressions, part 3: Parsing 68 SYS_GETJPIW and SYS_TRNLNM 69 Expressions, part 4: Evaluation UCL Lexical Functions 70 PROCESS_SCAN 71 PROCESS_SCAN, Part 2 72 TProcess updates 73 Unicode revisted 74 Lexical functions: F$CONTEXT 75 Lexical functions: F$PID 76 Lexical Functions: F$CUNITS 77 Lexical Functions: F$CVSI and F$CVUI 78 UOS Date and Time Formatting 79 Lexical Functions: F$CVTIME 80 LIB_CVTIME 81 Date/Time Contexts 82 SYS_GETTIM, LIB_Get_Timestamp, SYS_ASCTIM, and LIB_SYS_ASCTIM 83 Lexical Functions: F$DELTA_TIME 84 Lexical functions: F$DEVICE 85 SYS_DEVICE_SCAN 86 Lexical functions: F$DIRECTORY 87 Lexical functions: F$EDIT and F$ELEMENT 88 Lexical functions: F$ENVIRONMENT 89 SYS_GETUAI 90 Lexical functions: F$EXTRACT and F$IDENTIFIER 91 LIB_FAO and LIB_FAOL 92 LIB_FAO and LIB_FAOL, part 2 93 Lexical functions: F$FAO 94 File Processing Structures 95 Lexical functions: F$FILE_ATTRIBUTES 96 SYS_DISPLAY 97 Lexical functions: F$GETDVI 98 Parse_GetDVI 99 GetDVI 100 GetDVI, part 2 101 GetDVI, part 3 102 Lexical functions: F$GETJPI 103 GETJPI 104 Lexical functions: F$GETSYI 105 GETSYI 106 Lexical functions: F$INTEGER, F$LENGTH, F$LOCATE, and F$MATCH_WILD 107 Lexical function: F$PARSE 108 FILESCAN 109 SYS_PARSE 110 Lexical Functions: F$MODE, F$PRIVILEGE, and F$PROCESS 111 File Lookup Service 112 Lexical Functions: F$SEARCH 113 SYS_SEARCH 114 F$SETPRV and SYS_SETPRV 115 Lexical Functions: F$STRING, F$TIME, and F$TYPE 116 More on symbols 117 Lexical Functions: F$TRNLNM 118 SYS_TRNLNM, Part 2 119 Lexical functions: F$UNIQUE, F$USER, and F$VERIFY 120 Lexical functions: F$MESSAGE 121 TUOS_File_Wrapper 122 OPEN, CLOSE, and READ system services UCL Commands 123 WRITE 124 Symbol assignment 125 The @ command 126 @ and EXIT 127 CRELNT system service 128 DELLNT system service 129 IF...THEN...ELSE 130 Comments, labels, and GOTO 131 GOSUB and RETURN 132 CALL, SUBROUTINE, and ENDSUBROUTINE 133 ON, SET {NO}ON, and error handling 134 INQUIRE 135 SYS_WRITE Service 136 OPEN 137 CLOSE 138 DELLNM system service 139 READ 140 Command Recall 141 RECALL 142 RUN 143 LIB_RUN 144 The Data Stream Interface 145 Preparing for execution 146 EOJ and LOGOUT 147 SYS_DELPROC and LIB_GET_FOREIGN CUSPs and utilities 148 The I/O Queue 149 Timers 150 Logging in, part one 151 Logging in, part 2 152 System configuration 153 SET NODE utility 154 UUI 155 SETTERM utility 156 SETTERM utility, part 2 157 SETTERM utility, part 3 158 AUTHORIZE utility 159 AUTHORIZE utility, UI 160 AUTHORIZE utility, Access Restrictions 161 AUTHORIZE utility, Part 4 162 AUTHORIZE utility, Reporting 163 AUTHORIZE utility, Part 6 164 Authentication 165 Hashlib 166 Authenticate, Part 7 167 Logging in, part 3 168 DAY_OF_WEEK, CVT_FROM_INTERNAL_TIME, and SPAWN 169 DAY_OF_WEEK and CVT_FROM_INTERNAL_TIME 170 LIB_SPAWN 171 CREPRC 172 CREPRC, Part 2 173 COPY 174 COPY, part 2 175 COPY, part 3 176 COPY, part 4 177 LIB_Get_Default_File_Protection and LIB_Substitute_Wildcards 178 CREATESTREAM, STREAMNAME, and Set_Contiguous 179 Help Files 180 LBR Services 181 LBR Services, Part 2 182 LIBRARY utility 183 LIBRARY utility, Part 2 184 FS Services 185 FS Services, Part 2 186 Implementing Help 187 HELP 188 HELP, Part 2 189 DMG_Get_Key and LIB_Put_Formatted_Output 190 LIBRARY utility, Part 3 191 Shutting Down UOS 192 SHUTDOWN 193 WAIT 194 SETIMR 195 WAITFR and Scheduling 196 REPLY, OPCOM, and Mailboxes 197 REPLY utility 198 Mailboxes 199 BRKTHRU 200 OPCOM 201 Mailbox Services 202 Mailboxes, Part 2 203 DEFINE 204 CRELNM 205 DISABLE 206 STOP 207 OPCCRASH and SHUTDOWN 208 APPEND Glossary/Index Downloads |
Allocation Cluster Manager
Managing file clusters involves managing a chain of buffers containing arrays of pointers to the file data. Most of the work involved with creating, reading, and writing file data is performed by the allocation cluster manager class. So, let's write the class. First, the definition:
There are two basic functions we need to provide for the file system software:
You may notice that we are not using the stdcall calling convention, which is the standard for UOS components. However, this class is not a component; it is a helper class. For performance sake, we do not use stdcall here. The first function allows us to set the size, in bytes, of the space allocated to this chain of allocation clusters. The second is a means of obtaining the address of data on the store. That is, given a file offset, we return the location on the store where that offset is located. Both of these operations will require us to process variable-sized allocation clusters. Remember that the minimum storage allocation will vary from store to store, so we can't be guaranteed how large the buffer will be. So, we will need to dynamically allocate a buffer. But we don't want to allocate/deallocate the buffer on each function call, for performance reasons. But, we can assume that the minimum allocation size of a store is not going to change once we write to it. So, once we have a store assigned to us, we can allocate a buffer of the appropriate size and use it for all calls. Note that the buffer size will be _Max_Index times 8 (the size of a 64-bit pointer), but we will cache it to avoid constant multiply (or shift) operations. Here are the class instance data to support this buffer:
PAllocation_Record64 is a pointer to an array of TStore_Address64 values, which is defined as:
This allows us to allocate a buffer of any size. The _Buffer pointer points to a potentially huge array, but we will set _Max_Index to the maximum index in the buffer, based on the cluster allocation size. When the store is assigned to our class, we will allocate the buffer and set _Max_Index.
We also need to know the cluster size of the extents that we are managing. This may be different than the cluster size of the allocation clusters. We will leave the allocation cluster size tied to the store cluster size, but with a minimum value of 128, which would provide 16 pointers per allocation cluster. On a standard hard disk cluster size of 512, it would be 64 pointers. To finish off the rest of the data our class will need to operate: the root address of the first allocation cluster on the store (0 if not allocated), the store we are using, and the file cluster size.
To determine the size of the data that we manage, we could run down the chain of allocation clusters, adding up the number of extents times the cluster size. That could be an expensive operation, so we will cache the size in another variable. And finally, we need a heap to store our buffer in. You see, we can't use the default Pascal (or C, or C++) heap since it will try to ask the Windows OS for memory, but there is no Windows OS, there is only UOS. UOS has heap management as well, but until we have a native UOS compiler, or a Windows emulator (which won't run in the kernel), the code generated by our compiler isn't going to use the UOS heap. We will get into the UOS heap class(es) later, but for now, we will assume that the caller who creates this class also assigns a heap instance to it. So we need a pointer to that heap object. Here are the remaining instance data variables:
Since heaps can operate on any store, they return integer values for pointers. Since we trust that a heap class for memory is being assigned to us, we can cast these integers to pointers. TCOM_Heap64 descends from the managed store, and so we can (for now) just define TCOM_Heap64 as a synonym for TCOM_Managed_Store64.
Obviously, we need methods to allow our caller to assign the heap, store, and clustersize. Further, for existing files, we will need to know the root address of the first allocation cluster. We will also need a way for the caller to obtain the current values. Here are the methods to handle these assignments:
We also need a destructor to clean up after ourselves:
Now that we have defined the methods, we can look at the implementation. Here is the destructor:
We set our heap to nil, because the Set_Heap method will deallocate the buffer whenever the heap is changed.
Here are the Get_* and Set_* method implementations, except for Set_Size, which we will cover last:
We deallocate the buffer from the existing heap, but - to be safe - we will make sure the heap is assigned before we do. Also, to be safe, we set the buffer to nil and the buffer size to 0.
For performance sake, we only process a new cluster size if it differs from the current value. If different, we set the instance data and call Set_Root with the current Root value. Set_Root recalculates the file size.
Setting the root is meaningless if we have no store for the root to point to, so we exit if there is no store. Note: this means that the caller needs to make sure the store is assigned to the instance before the root is set. Otherwise, we set the root, clear the size to 0, and then process all the allocation clusters to determine the size of the data. We will loop until we hit the end of the allocation cluster chain. In the loop, we read the next (first) allocation cluster, then we loop through the pointers. Once we hit a null pointer (value of 0), we can exit the loop because it means that we have run out of extents. Otherwise we add in another cluster size worth of data to the size. We only loop to _Max_Index - 1 because that last pointer in each allocation cluster is a pointer to the next allocation cluster. So, we use that value for the next allocation cluster to read, and loop back to do it all over again.
Setting the heap is the first thing that should be done. Next is setting the store. Last is setting the root (if pre-existing data). Once the store is set, we clear the cached size and the root pointer, and set the store pointer to the passed value. Assuming that a nil hasn't been passed, we default the file cluster size to the minimum allocation size of the store. We also set the default max_index, which will determine the buffer size, based on the same value - but we always assume at least 128 bytes per allocation cluster. As you can see, the Get_* methods are easy; they simply return the instance data.
Note that nowhere in these setup routines do we allocate the buffer - we assume that it was already set up. But, since we can't be sure what order various methods will be called, we don't know where the buffer might be assigned. For instance, Set_Root might not be called at all, if the file starts off as zero-length. And Set_Size will only be called if we change the file's size. In general, we want to avoid allocating data until it is needed, so we will implement a function that will only allocate a buffer when it is needed. Instead of referring directly to _Buffer, we will refer to the Buffer method, which will return _Buffer, and allocate it if it hasn't already been allocated. Here's the method:
And, so, we will modify Set_Root as follows:
There is actually a bug in this method, which we will address in a bit... But, first, let's look at the Offset_To_Pointer method.
The first thing we do is validate the offset. A negative offset, or an offset larger than the cached size is outside the range of the file contents, so we return 0. Note that since the first offset is 0, the comparison is ">= Size" rather than just "> Size". Then we calculate the allocation cluster number and the index in that cluster. Offset is the byte offset in the file, so, we convert it from byte offset to the file's cluster offset. Then we divide the file cluster by the number of pointers in each allocation cluster to get the allocation cluster offset, and the mod gives us the pointer index in that allocation cluster. So, next we need to read the appropriate allocation cluster, which is what the loop does. Then we simply return the value of the appropriate pointer.
Here, we see a potential for improving performance. Most data access tends to cluster around previous data access. This is especially true for files that are being read in sequential order. In the case of a file being read sequentially (such as being read into an editor), up to 63 clusters accessed will be through the same allocation cluster. If we already have the cluster in our buffer, why read it in again from the store? In the case of a RAM disk, reading the buffer over and over from the store isn't going to slow things down too much, but doing that from your typical hard disk is going to dramatically impact performance in a negative way. Even in the case of the RAM disk, the whole point of the memory-resident disk is faster performance, so why unnecessary overhead? Instead, we'll add a function to pull in a specific allocation cluster, and if it is already in the buffer, we don't need to do anything. On a large file, this could reduce the number of disk reads by as much as 64 times on a disk with a cluster size of 512. Even when we have caching on the disk, this small change in our class will likely have a noticable effect. Here's the new method, and the updated version of Offset_To_Pointer:
You may notice the first line in the Pull method calls Buffer, but does nothing with the result from that method. The reason we do this is because if the buffer doesn't already exist, _Buffer_Size will still be 0. Which means that the call to _Store.Read( P, _Buffer_Size, Buffer^ ) might be passing 0 as the length to _Store.Read, depending upon what order the compiler processes the parameters. We don't want our code to fail depending upon which version of Pascal compiler is used to build it, so this guarantees that _Buffer_Size is properly set before it is referenced. Remember the bug I mentioned in Set_Root? This is the same issue. So, at the start of the Set_Root method as well, we'll add a call to Buffer, just to make sure the size is properly initialized. We save the current allocation cluster index (_Current_AC) after we get the requested allocation cluster, and we check that at the start of the routine. If the requested allocation cluster is the last one we read, then there is nothing to do and we've saved ourselves a read operation on the store. If the requested cluster is after the one requested, we can just following the links from our current position, rather than starting from the root. Depending upon the type of operations being done on the file, this may save us a great many read operations. However, if the requested cluster is prior to our current location, we start from the root.
Finally, we come to the Set_Size method, which is where all the real work in this class is done. We'll take it in pieces:
It is possible that the caller passed a size for the data which is not a multiple of the clustersize, so we do some math to guarantee that the size requested is a multiple. Note that there are two sizes associated with files. One is the amount of disk space taken by the file. By definition, this will be a multiple of the store's minimum allocation size. The other file size has to do with the logical end of the file. This is kept track of elsewhere. As an example, let's say that you have a text file on a disk with 512 byte sectors, which contains only the text "hello world". The size of the file on the disk is 512 bytes. But the logical size of the file is only 11 bytes. This is an important thing to note for us, long term, but this class is only concerned with the actual disk space allocated to the file. So, if a request comes in for the file to be set to 11 bytes, our math will make sure that we allocate one clustersize worth of bytes for it, no matter how much larger than 11 that is. When we are done, Value will be the total number of file clusters required for the requested size. We then save the total size in New_Size by multiplying the cluster count by the cluster size. If we succeed, we will update the cached _Size with this value. We check to make sure we are actually changing the size and exit if not. Next, we make sure the buffer is allocated and the buffer_Size set. We set _Current_AC to -1 to force the Pull method to search for a requested cluster from the start. We do this because we are going to be reading and/or writing multiple allocation clusters in this routine as we adjust the file's size, and whatever was cached before most likely won't be when we finish. Then we handle the situation when _Root is 0, indicating the file has no allocation clusters (which only happens for files with size 0). In this case, we allocate the first allocate cluster for the file and assign the pointer to _Root. The next phase of the routine needs to handle two possible situations: truncating (shortening) or extending (expanding) the file. We use an if...then...else to handle both situations separately. My initial code did a single loop which did various checks for whether or not we were deleting or adding clusters, but despite the fact that it worked just fine, it wasn't the easiest code to work with. The new code takes up the same number of lines of code, but is much simpler to understand.
For expanding the file size, we need to add additional file clusters to the end of the current ones. So, we first pull in the last allocation cluster in the file. We subtract 1 from _Size because the clusters are 0-based, and we want the cluster offset (or index), not the cluster number (eg cluster number 1 is cluster index 0). Then we convert from the requested size to the requested additional clusters, including any used indexes in the current allocation cluster. Then starting in the loaded allocation cluster, we process through all pointers in it, decrementing the number of clusters remaining, and allocating space on the store and assigning the address to each null (0) pointer. When we reach the end of the allocation cluster, we see if we need more allocation clusters. If so, we allocate one on the store and assign it to the last pointer in the current buffer. Then we update the store with the new allocation buffer. If we created a new allocation cluster, we clear the buffer and run through the loop again. Otherwise, we're done. Now to the truncation code:
Truncating the file means freeing up file clusters and zeroing the pointers to them, for all clusters beyond the requested new size. We pull the last allocation cluster, figure out how many file cluster pointers we need to keep in the last allocation cluster. Then we process through the pointers in the allocation cluster, deallocating file clusters, and setting the pointer to 0. After going through all the pointers, if the first pointer is 0 then we know that all of them are zero and the allocation cluster itself is deallocated. Otherwise, we clear the last pointer (the pointer to the next allocation cluster), and save the current allocation cluster to the store. Then we read in the next cluster and go through the process all over again. Once we find a null pointer, we know we are at the end of the pointers, and we can early out of the loop. As a general rule, code should be written so that data isn't lost or corrupted. This goes double for operating system code. The last thing we want to do is have UOS be unreliable. For this reason, each time we deallocate a file data cluster, we update the on-store copy of the allocation cluster so that, if something goes wrong, we don't have allocation cluster pointers referencing areas of the disk that have been reallocated to another file. If we wait until the end of the loop before we update the on-store copy of the allocation cluster, we run the risk that if something fails (hardware or software) before we can update the store, we will have left-over, and errant, pointers in our allocation clusters. Note that we didn't update the on-store allocation cluster each time we allocated something during a file expansion. It wasn't necessary in that case, because the on-store allocation cluster didn't have any errant pointers (all pointers were null or pointed to allocated file clusters). But, doesn't that mean that there is the possiblilty of orphaning some clusters if we don't complete the process of writing the updated allocation cluster back to the store? Yes. But we don't lose or corrupt any data as a consequence. We can always run a check on the disk to recover allocated, but unused, areas of the store. In fact, this should be automatic if the system crashes. But more about these issues in article 15. Now the rest of the method:
We set _Current_AC to -1 to force Pull to search from the root. This is because the last cached allocation cluster is most likely no longer what it was before the call to Set_Size. We then update the cached size, and clear _Root if the new size is 0. So, now we have a working class. But the multiple updates of the on-store allocation cluster during a truncate has terrible performance. It would be nice to move the dealloations to the end of the loop so we only update the allocation cluster once. But we can't trade reliability for speed. Can we have the best of both worlds? Yes, we can. We simply have to make sure that we update the allocation cluster before we deallocate the file data clusters. So, we can save up a list of data clusters to deallocate and then deallocate them after we deallocate the allocation cluster, which we can now safely do once after the loop ends. The most extreme case of a file trunction is when a file is set to size 0, which happens when it is deleted. Obviously, a file can only be deleted once and other forms of truncation tend to happen with much less frequency than expansions, therefore we can create a dynamic buffer to hold the list of pending deallocates when a truncation is required, without worrying about undue performance issues. We could create a buffer for this list when the class is created, but chances are that it would not be used and thus it would waste memory for each file UOS had open. We could also use a static buffer for all instances of this class to use, but then we would have problems with multithreaded access to that buffer, which would be a headache on several levels. So, here is our improved version of the truncation code:
And, of course, this requires a new variable declaration at the top:
TInteger64_List is a class that implements a list of 64-bit integer values. But wait! We are creating and freeing this dynamically-allocated class without using the heap object pointed to by _Heap. Won't that be a problem? As it turns out, we can intercept dynamic memory allocation/deallocations in Delphi/FPC and revector them to our heap class instance. I brought up the whole issue of the _Heap merely to illustrate the point that we can't rely on any compiler feature which assumes the presence of a particular operating system. If you are using a compiler which doesn't allow you to intercept built-in heap calls, then you have no choice but to assign a heap class instance to your classes. And you will have to replace the TInteger64_List with a class that makes use of the heap class.
UOS needs to track various metrics in order to provide statistics for performance monitoring and feedback. In terms of our allocation cluster management class, the only metric we need to keep track of is the process of following a pointer from one allocation cluster to another. We call this a "cluster turn", or "turn". The number of turns that occur is important because it directly relates to the performance of file operations. A high number of turns could indicate that file cluster sizes need to be increased, for instance. So, we will add a static variable to track the number of turns. This will be used by all instances of our class. Here's the definition:
Then we modify the Pull routine to increment the turns each time it moves from one allocation cluster to another via the link pointer. Simple as that. Remember that there are some turns that occur in the Set_Size and Set_Root methods as well, so we'll add an increment statement there. This is how Pull looks now:
In the next article, we'll discuss some more changes we can make to the class to improve it. |