SORTRSF

Introduction

The Rigi graph editor, rigiedit, accepts as input a flat file of 3-tuples that 1) declare nodes by their types, 2) describe the relationships (directional arcs) between specific nodes, and 3) annotate a given node. The file format used is called Rigi Standard Format (RSF). The Rigi parser, rigiparse, produces RSF in two formats, unstructured and structured. An RSF file without structure lacks even the most rudimentary hierarchical graph format. The simplest structured graph has at least two hierarchical levels: the topmost consists of a single node named Root that is connected to all other nodes in the graph by level arcs, i.e., arcs that traverse between nodes at different levels in the hierarchy.

The parser can produce either RSF format but only use of the unstructured format is recommended because the unstructured file can be manipulated with utilities like the UNIX sort, sortrsf and htmlrsf. rigiedit is capable of reading the unstructured RSF and converting it into structured RSF. These utilities can have unexpected results when applied to structured RSF.

The sortrsf filter program is used primarily to sort an RSF file. The program also removes duplicate tuples. Duplicates are caused by merging the RSF output from multiple compilation units. They can also be generated inadvertently by the parser (although rigiparse does not emit duplicates, other parsers may). Furthermore, sortrsf can replace with a single multiarc arc two or more arcs of different types having a common node as their source and a common node as their destination. This capability is useful because rigiedit can represent only a single arc between a given source and destination.

RSF has a variant, a 4-tuple format, that is used by the htmlrsf program to position HTML links in the source files. For 4-tuples, sortrsf, can be used to remove duplicate tuples and, optionally, it can be used to strip the fourth element from each of the tuples. However, sortrsf can not be used to generate multiarc arcs.

Invoking sortrsf

The program is run as a filter. It accepts input from stdin and writes its output to stdout. Errors are reported to stderr. A typical command might be:

rigiparse pgm.c | sortrsf -lm > pgm.rsf

The following command line arguments are supported:

 -h
displays a banner and a brief description of the command line options.
  -l
causes RSF attributes designated as loc to be concatenated. The loc attribute identifies the file and line number where a specific node, e.g., a given procedure, is defined in the source. Sometimes when the RSF is derived from more than one compilation unit, the parser may generate calculate discrepant locations for a given globally visible node. In these cases, the different loc attributes can be concatenated and shown as a single composite loc attribute. Note that loc tuples will always be 3-tuples.
 
Example:
   procdef      Myproc          file1.c,10
   procdef      Myproc          file1.c,17
becomes
   procdef      Myproc          file1.c,10;file1.c,17
 -m
generates a multiarc if two or more arcs of different arc types have a common source node and a common destination node. This command line argument operates on 3-tuples only.
 
Example:
   fetch        Myproc          myvar
   store        Myproc          myvar
becomes
   multiarc     Myproc          myvar
 -4
causes 4-tuples to be preserved. If there are 4-tuples in the RSF file but this command line argument is unspecified, the fourth component of the tuple is removed. If sortrsf is run with a -4 command line argument, 4-tuples are left unchanged.

Line-based comments in the input RSF file can be represented with a hash "#" character. The comments are removed by sortrsf from the RSF stream and written to stderr.

An example

The following is a C-language program.

/* example.c */
struct struct_t { int i; } sglobal;

int func(struct struct_t *sarg) {
    struct struct_t sfunc = *sarg;
    return sfunc.i;
}

void main() {
    int j;
    struct struct_t smain, *psmain;
    smain.i = 3;
    psmain = &smain;
    j = func(psmain);
    sglobal.i = j;
}

The program is parsed using the following command.

rigiparse example.c > tmp.rsf

The following is the output file from rigiparse.

type    struct_t  Data
file    struct_t  example.c
lineno  struct_t  2
type    func      Function
file    func      example.c
lineno  func      4
data    func      struct_t
type    main      Function
file    main      example.c
lineno  main      9
data    main      struct_t
call    main      func
data    main      struct_t

The RSF file is sorted with the following command.

sortrsf < tmp.rsf > srt.rsf

The following is the output file from sortrsf.

type    func      Function
type    main      Function
type    struct_t  Data
call    main      func
data    func      struct_t
data    main      struct_t
file    func      example.c
file    main      example.c
file    struct_t  example.c
lineno  func      4
lineno  main      9
lineno  struct_t  2

The RSF file is sorted with type at the top. The remaining tuples are sorted in ascending order. The duplicate data entry for main has been removed.

Exceptions

The following are runtime warnings and errors that may be reported by sortrsf to stderr.

Invalid argument <badArg>
One or more command line arguments have been incorrectly entered. <badArg> is the first in error argument that has been detected.
Insufficient memory
sortrsf is unable to allocate sufficient memory for its tables. The RSF file is too large to process.
RSF tuple has fewer than 3 components at line <lineNum>
An RSF tuple is only a 1-tuple or a 2-tuple. Blank lines and comments (those lines starting with a "#" character) are ignored. The error occurs at <lineNum> in the input RSF file.
RSF tuple has greater than <num> components at line <lineNum>
An RSF tuple has either greater than 1) three tuples if no -4 command line argument was specified, or 2) four tuples if the -4 command line argument was specified. This is a warning message; the superfluous components are removed from the tuple and sortrsf continues to execute. The warning occurs at <lineNum> in the input RSF file.


30 June 1998   Maintainer: Johannes Martin jmartin@csr.uvic.ca