Each of the Rigi source parsers generates as output a flat file consisting of a stream of 3-tuples, or of 4-tuples mixed with 3-tuples. These tuples are said to be in Rigi Standard Format (RSF). The tuples declare objects parsed from the program (nodes) and describe the relationships between them (directed arcs). For example, an RSF file typically declares program functions and their call relationships.
The 4-tuple RSF variant carries, as its fourth component, the location in the source where the node or arc is declared. This is in addition to attributes such as file and lineno, which are often generated for nodes. The fourth component makes it possible to locate the node or arc reference in the source code, which is necessary for marking up the source for web browsing.
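To make the tuple layout concrete, here is a minimal C sketch of reading one RSF line into its components. The type RsfTuple and the function parse_rsf_line are hypothetical names, not part of the Rigi tools. The sketch assumes what the examples in this section show: whitespace-separated, unquoted components and a fourth component of the form file,line,column whose file name contains no commas.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory form of one RSF tuple (not a Rigi type). */
typedef struct {
    char keyword[64];  /* first component, e.g. "call" or "type" */
    char arg1[128];    /* second component, e.g. the caller */
    char arg2[128];    /* third component, e.g. the callee */
    char file[256];    /* source file; empty for a 3-tuple */
    int  line;         /* line number; 0 for a 3-tuple */
    int  column;       /* column number; 0 for a 3-tuple */
} RsfTuple;

/* Parses one RSF line. Returns 3 or 4 (the number of components),
   or -1 if the line is malformed. Assumes unquoted components and a
   location of the form file,line,column. */
int parse_rsf_line(const char *text, RsfTuple *t)
{
    char loc[300] = "";
    char *last, *mid;
    int n;

    memset(t, 0, sizeof *t);
    n = sscanf(text, "%63s %127s %127s %299s",
               t->keyword, t->arg1, t->arg2, loc);
    if (n == 3)
        return 3;              /* plain 3-tuple: no location */
    if (n != 4)
        return -1;             /* too few components */

    /* Split "file,line,column" at its last two commas. */
    last = strrchr(loc, ',');
    if (last == NULL)
        return -1;
    *last = '\0';
    mid = strrchr(loc, ',');
    if (mid == NULL)
        return -1;
    *mid = '\0';
    if (sscanf(mid + 1, "%d", &t->line) != 1 ||
        sscanf(last + 1, "%d", &t->column) != 1)
        return -1;
    snprintf(t->file, sizeof t->file, "%s", loc);
    return 4;
}
```

Splitting at the last two commas, rather than the first, keeps the sketch working for Windows-style paths such as the stdio.h location in the example that follows.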
Consider, for example, the following source code named example.c that is written for the Microsoft® Windows95® platform:
```
 1  /* example.c */
 2  #include "msvckw.h"
 3  #include <stdio.h>
 4
 5  void Hello( char *name ) {
 6      fprintf( stdout, "Hello %s\n", name );
 7  }
 8
 9  void main() {
10      char *userName;
11      fputs( "Please enter your name: ", stdout );
12      fgets( userName, stdin );
13      Hello( userName );
14  }
```

When this is parsed using the command:
rigiparse -4 example.c > rsf

the following RSF is generated:
```
type _iobuf Data \\msvc20\\include\\stdio.h,120,15
type Hello Function example.c,5,26
call Hello fprintf example.c,6,38
type main Function example.c,9,13
call main fputs example.c,11,30
call main fgets example.c,12,37
call main Hello example.c,13,25
```

The location information given in the fourth component of a tuple can be used to place an HTML anchor or bookmark. The command:
htmlrsf -a call -b type -p < rsf

results in the following HTML file named example.c_0001.html:
```html
<html><head><body bgcolor=#ffffff><pre>/* example.c */
#include "msvckw.h"
#include &lt;stdio.h>

<a name=5></a>void Hello( char *name ) {
    fprintf( stdout, "Hello %s\n", name );
}

<a name=9></a>void main() {
    char *userName;
    fputs( "Please enter your name: ", stdout );
    fgets( userName, stdin );
    <a href=example.c_0001.html#5>Hello</a>( userName );
}
</pre></body></html>
```

and a new, 3-tuple version of the RSF file that contains a new set of node attributes, nodeurl, that can be used by rigiedit, the Rigi graph editor:
```
type "Hello" "Function"
nodeurl "Hello" "example.c_0001.html#5"
type "_iobuf" "Data"
nodeurl "_iobuf" "stdio.h_0001.html#120"
type "main" "Function"
nodeurl "main" "example.c_0001.html#9"
call "Hello" "fprintf"
call "main" "Hello"
call "main" "fgets"
call "main" "fputs"
```

The HTML file has a bookmark for each object declared by a type tuple and an anchor for each call whose target has a bookmark; only the call to Hello qualifies, because fprintf, fputs, and fgets have no bookmarks. The RSF data indicate that there should be a bookmark for _iobuf on line 120 of stdio.h. In fact, htmlrsf does create a file named stdio.h_0001.html that contains such a bookmark. However, because _iobuf is a data type rather than a Function, no call arc references it and, thus, there is no hyperlink to it.
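The output tuples follow a simple quoted pattern. A minimal sketch of producing them, assuming the quoting shown in the sample output and identifiers that contain no double quotes; format_type_tuple and format_nodeurl are hypothetical helpers, not htmlrsf's own code:

```c
#include <stdio.h>
#include <string.h>

/* Formats a quoted type 3-tuple such as: type "Hello" "Function".
   Returns 0 on success, -1 if buf is too small. */
int format_type_tuple(char *buf, size_t size, const char *id, const char *type)
{
    int n = snprintf(buf, size, "type \"%s\" \"%s\"", id, type);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Formats the nodeurl attribute of a bookmarked node, e.g.
   nodeurl "Hello" "example.c_0001.html#5".
   html_file: the HTML file that holds the bookmark.
   bookmark:  the bookmark name inside that file (a source line number).
   Returns 0 on success, -1 if buf is too small. */
int format_nodeurl(char *buf, size_t size, const char *id,
                   const char *html_file, int bookmark)
{
    int n = snprintf(buf, size, "nodeurl \"%s\" \"%s#%d\"",
                     id, html_file, bookmark);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

For example, format_nodeurl(buf, sizeof buf, "_iobuf", "stdio.h_0001.html", 120) produces the nodeurl line for _iobuf in the listing above.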
The htmlrsf program is run as a filter. It accepts a stream of 4-tuple RSF data from stdin and writes a stream of 3-tuple RSF to stdout. Errors are reported to stderr. The input stream describes the source in an application subsystem. The program uses the input data to locate the source files, to copy them to the current working directory, and to mark them up with HTML tags.
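The filter structure can be sketched in C as a loop over an input stream. This is an assumption-laden skeleton, not htmlrsf's implementation: filter_rsf is a hypothetical name, it copies each tuple line through unchanged where the real program would also mark up the source, it drops blank lines (an assumption), and it routes comment lines (those beginning with "#") to the error stream, as htmlrsf does with RSF comments.

```c
#include <stdio.h>
#include <string.h>

/* Skeleton of an RSF filter (hypothetical; not htmlrsf's source).
   Tuple lines are copied from in to out; comment lines starting with
   '#' go to err; empty lines are dropped. Lines longer than the buffer
   are read in pieces, a limitation a real tool would need to handle.
   Returns the number of tuple lines copied. */
long filter_rsf(FILE *in, FILE *out, FILE *err)
{
    char line[4096];
    long tuples = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        if (line[0] == '#') {
            fputs(line, err);       /* comments never reach out */
        } else if (line[0] != '\n') {
            /* A real filter would parse the tuple here, record anchors and
               bookmarks, and strip the fourth component unless -4 is given. */
            fputs(line, out);
            tuples++;
        }
    }
    return tuples;
}
```

A program built on this skeleton would call filter_rsf(stdin, stdout, stderr) from main.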
A typical command might be:
rigiparse -4 pgm.c | sortrsf -4 | htmlrsf -pxa call -b Function > rsf
The following command line arguments are supported:
-a | specifies the RSF keywords that trigger the insertion of an anchor. To give two or more keywords, follow -a with a comma-delimited list, e.g., -a call,fetch,store. In the example above, call is used in the C-language domain to indicate that one Function calls another. By specifying -a call, we indicate that htmlrsf should establish an anchor at each location associated with a call tuple. An anchor is not created unless it has a valid bookmark; htmlrsf does not permit broken links to be created. The RSF keywords are determined by the parser and the domain model for which it is coded. |
-b | specifies the RSF keywords that trigger the insertion of a bookmark. To give two or more keywords, follow -b with a comma-delimited list, e.g., -b procdef,filedef. In the example above, type is used in the C-language domain to specify the type of an object, e.g., Function (C-language procedure) or Data (data structure). By specifying -b type, we indicate that htmlrsf should establish a bookmark at each location associated with a type tuple. A bookmark is created regardless of whether an anchor references it. |
-h | displays a banner and a brief description of the command line options. |
-l | causes the source file(s) to be segmented. Every occurrence of a bookmark starts a new segment. If -l is followed by a number, maxLines, each segment is limited to maxLines lines; when that limit is reached, a new segment is started. The segments of each source file are connected by Up and Down hyperlinks. |
-p | writes the HTML file using the preformatted HTML tag (<pre>). In this mode, each "<" character is converted to "&lt;" so that it is not interpreted as HTML markup. |
-t | writes each HTML file using the template file given as an accompanying argument: -t templateFile. The template file may contain a header and a footer, specified as header [=] { text } and footer [=] { text }, respectively. |
-x | expands tab characters to spaces. If -x numSpaces is specified, tab stops are set every numSpaces columns; otherwise, the interval defaults to four. If -x is not specified, tab characters are left unchanged. |
-l | causes RSF attributes designated as loc to be concatenated. The loc attribute identifies the file and line number where a specific node, e.g., a given procedure, is defined in the source. When the RSF is derived from more than one compilation unit, the parser may calculate discrepant locations for a given globally visible node. In these cases, the different loc attributes can be concatenated and shown as a single composite loc attribute. Note that loc tuples are always 3-tuples. |
-m | generates a multiarc if two or more arcs of different arc types have a common source node and a common destination node. This command line argument operates on 3-tuples only. |
-4 | causes 4-tuples to be preserved in the output RSF. If this argument is not specified, the fourth component of each 4-tuple is removed. |
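The -x and -p options both transform each source line before it is written. A minimal sketch of that per-line step, assuming tab stops every tab_width columns (0 meaning -x was not given) and, as the -p description states, escaping only "<"; markup_line and append are hypothetical names. Escaping "&" as well would make the output more robust, but the sketch mirrors the documented behavior.

```c
#include <stddef.h>
#include <string.h>

/* Appends n bytes of s to dst (current length *len); -1 if it won't fit. */
static int append(char *dst, size_t size, size_t *len, const char *s, size_t n)
{
    if (*len + n + 1 > size)
        return -1;
    memcpy(dst + *len, s, n);
    *len += n;
    dst[*len] = '\0';
    return 0;
}

/* Marks up one source line (hypothetical helper).
   tab_width > 0:  expand tabs to the next multiple of tab_width (as with -x).
   escape_lt != 0: write "<" as "&lt;" (as documented for -p).
   Returns 0 on success, -1 if dst is too small. */
int markup_line(const char *src, char *dst, size_t size,
                int tab_width, int escape_lt)
{
    size_t len = 0;
    int column = 0;   /* display column in the expanded line */

    if (size == 0)
        return -1;
    dst[0] = '\0';
    for (const char *p = src; *p != '\0'; p++) {
        if (*p == '\t' && tab_width > 0) {
            do {      /* pad with spaces up to the next tab stop */
                if (append(dst, size, &len, " ", 1) != 0)
                    return -1;
                column++;
            } while (column % tab_width != 0);
        } else if (*p == '<' && escape_lt) {
            if (append(dst, size, &len, "&lt;", 4) != 0)
                return -1;
            column++;
        } else {
            if (append(dst, size, &len, p, 1) != 0)
                return -1;
            column++;
        }
    }
    return 0;
}
```

Tracking the display column, rather than the byte count, keeps tab stops aligned even after "<" has been widened to four bytes.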
htmlrsf writes the HTML text to files in the current working directory. The file names are generated by replacing all directory separators (forward slashes, backslashes, and colons) in the name of the original source file with underscores (e.g., /usr/include/stdio.h becomes _usr_include_stdio.h) and appending "_nnnn.html". The "nnnn" is the line number in the original source file that corresponds to the first line of the HTML text file. Thus, "example.c_0009.html" indicates that the first line of the HTML file corresponds to line nine of the source file "example.c".
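The naming rule translates directly into code. A sketch under the assumption that "nnnn" is a zero-padded, four-digit field (longer line numbers simply widen it); html_file_name is a hypothetical helper:

```c
#include <stdio.h>
#include <string.h>

/* Builds the HTML file name for the part of `source` whose first line
   is first_line: directory separators ('/', '\\', ':') become '_' and
   "_nnnn.html" is appended. Returns 0 on success, -1 on overflow. */
int html_file_name(const char *source, int first_line, char *dst, size_t size)
{
    char flat[256];
    size_t i;

    if (strlen(source) >= sizeof flat)
        return -1;
    for (i = 0; source[i] != '\0'; i++) {
        char c = source[i];
        flat[i] = (c == '/' || c == '\\' || c == ':') ? '_' : c;
    }
    flat[i] = '\0';

    int n = snprintf(dst, size, "%s_%04d.html", flat, first_line);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With this helper, html_file_name("/usr/include/stdio.h", 1, ...) yields _usr_include_stdio.h_0001.html, and html_file_name("example.c", 9, ...) yields example.c_0009.html.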
The source files are located by following the path given in the fourth component of the RSF tuples. If a source file is not found, htmlrsf terminates after displaying an error message.
All tuples in the RSF input file (stdin) are written to the RSF output file (stdout). Comments in the RSF file (those lines in the file that begin with "#") are written to stderr but not to stdout.
Consider the C-language program described in the introduction. Suppose that the program is parsed using the following command.
rigiparse -4 example.c | sortrsf -4 | htmlrsf -lxa call -b type > rsf
Although execution of sortrsf is not strictly necessary, this filter program reduces the size of the input RSF file by removing duplicate tuples (which in this example do not exist).
Because the -l argument is specified, htmlrsf segments example.c into three HTML files: example.c_0001.html, example.c_0005.html, and example.c_0009.html. stdio.h is likewise split into stdio.h_0001.html and stdio.h_0120.html.
Each tab is converted to spaces because -x is specified. Each space is represented as &nbsp; in the marked-up source because the -p argument is not specified. The three HTML text files follow:
example.c_0001.html:

```html
<html><head><body bgcolor=#ffffff><br>
/* example.c */<br>
#include "msvckw.h"<br>
#include <stdio.h><br>
<br>
<a href=example.c_0005.html>Down</a></body></html><br>
```

example.c_0005.html:

```html
<html><head><body bgcolor=#ffffff><a href=example.c_0001.html>Up</a> <br>
void Hello( char *name ) {<br>
fprintf( stdout, "Hello %s\n", name );<br>
}<br>
<br>
<a href=example.c_0009.html>Down</a></body></html><br>
```

example.c_0009.html:

```html
<html><head><body bgcolor=#ffffff><a href=example.c_0005.html>Up</a> <br>
void main() {<br>
char *userName;<br>
fputs( "Please enter your name: ", stdout );<br>
fgets( userName, stdin );<br>
<a href=example.c_0005.html>Hello</a>( userName );<br>
}<br>
</body></html><br>
```
Each of the three files has a default header, <html><head><body bgcolor=#ffffff>, and a default footer, </body></html><br>. These can be replaced by user-specified headers and footers by specifying a template file. The three files also have Up and Down anchors to link the segments together.
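The segment boundaries follow from the -l rule: a new segment starts at every bookmark and, if maxLines is given, whenever the current segment reaches maxLines lines. A minimal sketch of that rule, which reproduces the split of example.c at lines 5 and 9; segment_starts is a hypothetical name, and the way the two conditions interact is an assumption:

```c
#include <stddef.h>

/* Computes the first source line of each segment produced by -l
   (hypothetical helper; one reading of the documented rule).
   bookmarks: bookmark line numbers, sorted in ascending order
   max_lines: the -l maxLines value, or 0 for no limit
   starts:    receives up to max_starts segment start lines
   Returns the number of segment start lines stored. */
size_t segment_starts(const int *bookmarks, size_t nbookmarks,
                      int total_lines, int max_lines,
                      int *starts, size_t max_starts)
{
    size_t count = 0;
    size_t b = 0;        /* index of the next bookmark to consider */
    int current = 1;     /* first line of the current segment */

    if (total_lines < 1 || max_starts == 0)
        return 0;
    starts[count++] = 1; /* the first segment always starts at line 1 */

    for (int line = 2; line <= total_lines; line++) {
        while (b < nbookmarks && bookmarks[b] < line)
            b++;
        int at_bookmark = (b < nbookmarks && bookmarks[b] == line);
        int segment_full = (max_lines > 0 && line - current >= max_lines);

        if (at_bookmark || segment_full) {
            if (count == max_starts)
                break;   /* caller's array is full */
            starts[count++] = line;
            current = line;
        }
    }
    return count;
}
```

Each start line, passed through the file-naming rule, gives one HTML file name, e.g. 1, 5, and 9 for example.c.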
A given component in the RSF tuple is used in conjunction with the file location to position an anchor around its target identifier. If the identifier is not found in the text, no anchor is inserted. Unlike anchors, bookmarks can be positioned without finding their target. However, htmlrsf verifies each anchor by searching for a corresponding bookmark identifier. For example, with:
```
type Hello Function example.c,5,26
call main Hello example.c,13,25
```

the "call" tuple specifies the anchor and the "type" tuple specifies the bookmark. Assuming that "Hello" can be found on line 13 of example.c, the anchor determined by the "call" tuple is inserted because there is a bookmark with the same name, "Hello".
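Anchor placement can be sketched as a whole-word search on the target line followed by wrapping the match in an a element. The sketch assumes the caller has already checked the bookmark table (has_bookmark) and treats letters, digits, and "_" as identifier characters; insert_anchor, find_word, and is_ident_char are hypothetical names:

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Letters, digits and '_' are treated as identifier characters (assumption). */
static int is_ident_char(int c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Returns the first whole-word occurrence of id in text, or NULL. */
static const char *find_word(const char *text, const char *id)
{
    size_t n = strlen(id);

    if (n == 0)
        return NULL;
    for (const char *p = strstr(text, id); p != NULL; p = strstr(p + 1, id)) {
        int starts_word = (p == text) || !is_ident_char(p[-1]);
        int ends_word = !is_ident_char(p[n]);
        if (starts_word && ends_word)
            return p;
    }
    return NULL;
}

/* Copies text to out, wrapping the first whole-word occurrence of id in
   <a href=...>...</a>. Returns 1 if an anchor was inserted, 0 if the text
   was copied unchanged (no bookmark, or id not found), -1 if out is too small. */
int insert_anchor(const char *text, const char *id, const char *href,
                  int has_bookmark, char *out, size_t size)
{
    const char *hit = has_bookmark ? find_word(text, id) : NULL;
    int n;

    if (hit == NULL)
        n = snprintf(out, size, "%s", text);
    else
        n = snprintf(out, size, "%.*s<a href=%s>%s</a>%s",
                     (int)(hit - text), text, href, id, hit + strlen(id));
    if (n < 0 || (size_t)n >= size)
        return -1;
    return hit != NULL;
}
```

When no bookmark exists, the line is copied unchanged, which is how this sketch avoids creating broken links.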
It is possible to specify more than one type of tuple for anchors and bookmarks. There is, however, no means to group types of anchors with types of bookmarks. Any given anchor is resolved using a table of all possible bookmarks.
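Because -a and -b take comma-delimited keyword lists and all anchors are resolved against one shared table of bookmarks, the two lookups can be sketched separately. keyword_in_list, Bookmark, and resolve_anchor are hypothetical names, and a linear search stands in for whatever table structure htmlrsf actually uses:

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if keyword is one of the entries in a comma-delimited
   list such as "call,fetch,store" (the form accepted by -a and -b). */
int keyword_in_list(const char *list, const char *keyword)
{
    size_t klen = strlen(keyword);
    const char *p = list;

    while (*p != '\0') {
        const char *comma = strchr(p, ',');
        size_t len = comma ? (size_t)(comma - p) : strlen(p);

        if (len == klen && strncmp(p, keyword, len) == 0)
            return 1;
        if (comma == NULL)
            break;
        p = comma + 1;
    }
    return 0;
}

/* One entry per bookmark, regardless of which -b keyword created it. */
typedef struct {
    const char *id;   /* bookmark identifier, e.g. "Hello" */
    const char *url;  /* e.g. "example.c_0001.html#5" */
} Bookmark;

/* Resolves an anchor's target against the single shared bookmark
   table; returns the bookmark URL, or NULL if there is none (in
   which case no anchor should be inserted). */
const char *resolve_anchor(const Bookmark *table, size_t n, const char *id)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(table[i].id, id) == 0)
            return table[i].url;
    return NULL;
}
```

Since the table carries no keyword, an anchor created by any -a keyword can resolve to a bookmark created by any -b keyword, which matches the lack of grouping described above.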
Anchors can be identified by a sequence of subnames separated by any punctuation characters except "$", "_", or "^". Whenever a compound name is detected, only the first subname in the sequence is used to match the bookmarks. To determine the position of the anchor in the text, the line of text is searched for each subname in the sequence starting at the left-most. The search ends when a match is found or the sequence is exhausted.
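The subname rule can be sketched as follows: split the compound name at punctuation other than "$", "_", and "^", then search the line for each subname from the left until one is found; bookmark matching would use only the first subname. split_subnames, locate_anchor, and MAX_SUBNAME are hypothetical names, and plain substring search is assumed, which is one way the wrong identifier can end up tagged:

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_SUBNAME 64   /* hypothetical limit; longer subnames are truncated */

/* Any punctuation character except '$', '_' and '^' separates subnames. */
static int is_separator(int c)
{
    return ispunct((unsigned char)c) && c != '$' && c != '_' && c != '^';
}

/* Splits a compound name such as "stack.top->next" into its subnames.
   Returns the number of subnames stored (at most max). */
size_t split_subnames(const char *name, char subnames[][MAX_SUBNAME], size_t max)
{
    size_t count = 0, len = 0;

    for (const char *p = name; ; p++) {
        if (*p == '\0' || is_separator(*p)) {
            if (len > 0 && count < max) {
                subnames[count][len] = '\0';  /* close the current subname */
                count++;
            }
            len = 0;
            if (*p == '\0')
                break;
        } else if (count < max && len < MAX_SUBNAME - 1) {
            subnames[count][len++] = *p;
        }
    }
    return count;
}

/* Searches line for each subname of name, left-most subname first,
   and returns where the first match starts (the anchor position),
   or NULL if no subname occurs in the line. */
const char *locate_anchor(const char *line, const char *name)
{
    char subnames[8][MAX_SUBNAME];
    size_t n = split_subnames(name, subnames, 8);

    for (size_t i = 0; i < n; i++) {
        const char *hit = strstr(line, subnames[i]);
        if (hit != NULL)
            return hit;
    }
    return NULL;
}
```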
It is possible to create inaccurate hyperlinks: the anchor search does not mediate between competing RSF tuples. Furthermore, because the search is conducted on only a portion of a name sequence, it may tag the wrong identifier. As a safeguard, htmlrsf cannot generate nested anchors or bookmarks.
The semantics of tuples in RSF are determined by the domain in which their keywords are defined and for which the source parser was written. Consider the following arc:
```
call main Hello example.c,13,25
```

The first component is the keyword itself. The second component is the caller, and the third component is the function being called. The fourth component is the location in the source file where the call occurs. The location consists of the name of the source file, example.c, the line number, 13, and the column number in the preprocessed source file (which is ignored). In this example, a call arc behaves like an anchor in the HTML model: the third component is the target identifier for the anchor, and the fourth component indicates the placement of the anchor.
An anchor is not inserted in the HTML text unless it has a corresponding bookmark. For the RSF data to be used in a manner consistent with the HTML model, there must be a tuple that names the same identifier as the call arc's target and that can be designated a bookmark. In this example, we must use type tuples to describe the bookmark locations.
```
type Hello Function example.c,5,26
```

In this case, the second component of the tuple is the target identifier for the bookmark, and the fourth component indicates its placement in the source.
type is an RSF built-in keyword. Type tuples always have the same semantics: the second component names the object, which is shown as a node in rigiedit. This presents three problems: 1) unlike other tuples, the third component is required to discriminate between different types of objects, 2) the typed object is identified by the second component, not the third, and 3) a type tuple is frequently treated by tools as a statement of existence and classification, not of object definition.
The rigiparse C-language parser generates a type 4-tuple with the location of the object definition. Other parsers may generate a 4-tuple that explicitly defines an object rather than overloading the type tuple:
```
filedef example.c Hello example.c,5,26
```

The bookmark identifier in such a tuple is found in the third component, unlike in a type tuple. Thus, type tuples are treated as semantically different.
In summary, with the exception of type tuples, htmlrsf requires that the parser generate tuples for anchors and bookmarks such that the third component is the target identifier and the fourth component is the location of the respective anchor or bookmark. Furthermore, each anchor should have a corresponding bookmark.
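The summary reduces to a small selection rule. A sketch, assuming a hypothetical Tuple type whose location is NULL for a 3-tuple (which cannot be placed in the source at all); target_identifier is likewise a hypothetical name:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical tuple view; location is NULL for a 3-tuple. */
typedef struct {
    const char *keyword;
    const char *second;   /* second component */
    const char *third;    /* third component */
    const char *location; /* fourth component, e.g. "example.c,5,26" */
} Tuple;

/* Returns the target identifier of an anchor or bookmark tuple:
   the second component for the built-in type keyword, the third
   component for every other keyword. Returns NULL for a 3-tuple,
   which carries no location to mark up. */
const char *target_identifier(const Tuple *t)
{
    if (t->location == NULL)
        return NULL;
    return strcmp(t->keyword, "type") == 0 ? t->second : t->third;
}
```

Both the type tuple and the filedef tuple for Hello yield the same identifier, so either can serve as the bookmark for the call arc from main.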
The following are runtime warnings and errors that may be reported by htmlrsf to stderr.