This document explains the usage of cparse, the Rigi C-Language parser. The Rigi C-language parser is designed to extract entities and the relationships between entities from files of C source code. Currently the entities that the parser extracts are functions, datatypes, and variables. The relationships that the parser extracts are function calls between function entities, data accesses between functions and data structure entities, and variable references between functions and variables. The parser also extracts other attributes of the entities, such as their type (either Function or Data) and optionally the line number in the source code file that they were extracted from.
cparse reads preprocessed C programs from standard input and writes the entities, relationships, and attributes it extracts as RSF (Rigi Standard Format) 3-tuples to standard output. Usually, rigiparse is used to preprocess the C programs and pass them on to cparse.
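As a rough illustration of what rigiparse automates, the two stages can also be chained by hand. The sketch below assumes that cc and cparse are on your PATH; the function name parse_by_hand is made up for this example and is not part of Rigi.

```shell
# Sketch only: parse_by_hand is a hypothetical helper, not part of Rigi.
# Assumes the default Unix preprocessor command "cc -E" and cparse on PATH.
parse_by_hand() {
    # $1 = C source file, $2 = RSF output file
    if ! command -v cparse >/dev/null 2>&1; then
        echo "cparse not found on PATH" >&2
        return 1
    fi
    # cparse reads the preprocessed program on stdin and writes RSF to stdout
    cc -E "$1" | cparse > "$2"
}
```

For example, parse_by_hand myprogram.c myprogram.rsf would write the tuples for myprogram.c to myprogram.rsf. In practice rigiparse is the more convenient route.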
Besides the standard Rigi node types (Unknown and Collapse), the cparse domain contains the following node types:
Besides the standard Rigi arc types (level, composite, and multiarc), the cparse domain contains the following arc types. If you read the elements of an arc in the order element 2, element 1, element 3, then you will have an English sentence that explains the relation:
/* somefile.h */
extern int number;
/* somefile.c */
#include "somefile.h"
int number;
cparse will generate an arc from the definition of the variable in somefile.c to the declaration in somefile.h.
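The reading order described for arcs (element 2, element 1, element 3) can be tried out with awk. The tuple below is made up for illustration: it assumes an arc of type calls between two hypothetical functions named main and helper.

```shell
# Hypothetical RSF 3-tuple: arc type, source node, destination node.
tuple='calls main helper'
# Print the elements in the order 2, 1, 3 to get a readable sentence.
sentence=$(printf '%s\n' "$tuple" | awk '{ print $2, $1, $3 }')
echo "$sentence"   # prints: main calls helper
```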
cparse emits the following attributes for nodes:
See also the sample C program with annotated cparse output.
cparse accepts the following command line parameters:
-a: generate 'lineno' and 'file' tuples
If this option is used, attributes of type 'file' and 'lineno' will be created for every node. They will contain the file name and line number of the file where the artifact represented by node was defined.
-f[<prefix>]: emit full path name for all file names in RSF output
If this option is used, all file names in the RSF output will include a fully qualified absolute path. Otherwise, relative or absolute paths will be used (depending on the preprocessor output and compiler options). You will probably want to use this option if you parse a large program that is distributed over several directories.
If the optional <prefix> is specified, it will be removed from all file names that start with it.
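The effect of the optional prefix can be pictured with plain shell parameter expansion. This is only a sketch of the behaviour described above, not cparse's own code; the function strip_prefix and the paths are hypothetical.

```shell
# Hypothetical helper that mimics the described prefix removal.
strip_prefix() {
    # $1 = prefix, $2 = file name
    # ${2#"$1"} removes $1 only when $2 starts with it
    printf '%s\n' "${2#"$1"}"
}

strip_prefix /home/user/project/ /home/user/project/src/main.c   # prints: src/main.c
strip_prefix /home/user/project/ /usr/include/stdio.h            # prints: /usr/include/stdio.h
```

File names that do not start with the prefix are left unchanged, which matches the wording of the option description.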
-h: print help on cparse usage
-o <file>: write output to <file> rather than to standard output.
-q: issue no relations in output
-s: stop at first syntax error
If this option is used, cparse will stop parsing and print an error message with program context whenever it encounters an error.
-x[<line>]: lex debug (start at <line>), used for debugging cparse
-4: emit file name and line number as element 4 of tuple
If this option is used, cparse will emit 4-element tuples. The fourth element of each node, arc, or attribute tuple will contain the file name, line number, and column of the artifact that triggered the creation of the tuple.
rigiparse can be used to parse one or more source code files and takes several command line arguments. The general form of the command is:
rigiparse [-h] [-w] [-pparser] [-Ipath] [-Ddefine] [-o file] file [file [file ...]]
When run, rigiparse processes the input files and generates RSF output that represents the dependencies contained in those files. Each file is first passed through the C preprocessor and then to the Rigi parser. By default the Rigi C parser (cparse) is used, but this can be changed through the -p flag (see below). The output of rigiparse is written to standard output, unless the -o option is used to specify an output file.
The RIGICPP environment variable can be used to override the name of the preprocessor and to supply command line arguments to the preprocessor. For example, to specify a directory where include files can be found, you might set RIGICPP to "cc -E -I/usr/include".
rigiparse accepts the following command line parameters:
-h: print help on rigiparse usage
-w: wait for user input before parsing
Causes the parser to wait for the user to press the enter key before parsing a file. This option may be useful when parsing multiple files.
-o <file>: use file for RSF output
Writes the generated RSF tuples to the file specified.
-p<parser>: use alternative parser
Causes a parser other than the default parser, cparse, to be invoked.
-I<path>: path for include files
This option is passed on to the C preprocessor.
-D<define>: macro definition
This option is passed on to the C preprocessor.
The environment variable RIGICPP specifies which preprocessor program is to be used as a first stage in the parsing process. Command line arguments can also be passed to the preprocessor by using this environment variable. By default, rigiparse uses the command "cc -E" on Unix platforms. On IBM-compatible PCs running Windows 95 or NT, the default is "cl -E". Other common preprocessor commands are "CC -E" and "cpp -C".
If the program you want to investigate consists of several source files, you are probably using the make utility in order to build your program. If this is the case, you can use the rigimake script to build your program and parse it at the same time. rigimake reads the rigimake.env script for configuration options. If the default configuration does not work for you, check the cparse documentation directory for example configuration files.
Note that rigimake only parses those files that have to be recompiled (according to make). In order to obtain a full parse of your program, remove all object files before running rigimake.
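One way to force such a full parse is to delete the object files first. The command below is a sketch that assumes object files end in .o and live under the current directory; adjust it if your build uses other names or locations.

```shell
# Assumption: object files are named *.o below the current directory.
# Removing them makes make (and therefore rigimake) treat every source as out of date.
find . -type f -name '*.o' -exec rm -f {} +
```

Afterwards, run rigimake to rebuild and parse the whole program.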
Typically, you will use rigiparse to parse your C programs:
rigiparse -o myprogram.rsf myprogram.c
If your program consists of more than one source file, specify all the source files in the command line:
rigiparse -o myprogram.rsf file1.c file2.c file3.c
Use sortrsf to remove duplicate tuples from the RSF file (the -m option will bundle multiple arcs with the same source and destination into a single arc of type multiarc):
sortrsf -m < myprogram.rsf > myprogram.rsf.sorted
Now you can load myprogram.rsf.sorted into rigiedit.
Use rigiparse with the -4 option to parse your C program(s):
rigiparse -4 -o myprogram.rsf myprogram.c ...
Use sortrsf with the -4 option to remove duplicate tuples from the RSF file:
sortrsf -4 < myprogram.rsf > myprogram.rsf.sorted
Use htmlrsf to htmlize the source and create nodeurl attributes:
htmlrsf -pxa prototypes,calls,references,declares,tagged -b type < myprogram.rsf.sorted > myprogram.rsf.htmld
Now you can load myprogram.rsf.htmld into rigiedit. If you double-click on a node, Netscape will be started to display the source code for the node.
Some C compilers use custom keywords that cparse does not know about. In most cases, you can just define them (using #define) to an empty string. Store all your custom definitions in a file.
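Such a definitions file might look like the one created below. The file name rigidefs.h and the three keywords are only examples of compiler-specific keywords; replace them with whatever your compiler uses.

```shell
# rigidefs.h is a hypothetical file name; the keywords are examples only.
cat > rigidefs.h <<'EOF'
/* Compiler-specific keywords that cparse does not know, defined to nothing */
#define __far
#define __near
#define __interrupt
EOF
```

Each keyword then expands to nothing during preprocessing, so cparse never sees it.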
Here are some sample definition files:
If the environment variable RIGICPP is defined, rigiparse uses the command stored in it to preprocess the C source files. To tell rigiparse to use the definitions in your definitions file, type in the command:
export RIGICPP="gcc -E -imacros <filename>"
or, if you are using csh:
setenv RIGICPP "gcc -E -imacros <filename>"
This manual page is maintained by Johannes Martin