Front End

Parsing was intentionally separated from the IR-building methods in the high-level interface so that other front ends could be added independently. Some front ends may require more effort than others. For example, writing a parser for C++ is a challenge because its grammar does not fit easily into any of the grammar classes supported by standard parser generators. The GNU C++ compiler was able to use an LALR(1) grammar, but it looks nothing like the ISO C++ grammar. Rearranging rules to add actions at a particular location must be done with extreme care to avoid breaking the grammar. Another problem is that C++ has much more complicated rules than C for determining which symbols are identifiers and which are type names, which requires substantial symbol table maintenance while parsing.
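The identifier-versus-type-name problem can be illustrated with a small sketch. This is not Cetus code: the ScopedSymbolTable class and the classify_statement function are hypothetical names, and the example handles only the single statement shape `A * b ;`. It assumes the parser consults a scoped symbol table to decide whether that statement declares a pointer or multiplies two values.

```python
class ScopedSymbolTable:
    """Hypothetical scoped table recording whether a name is a type or a variable."""

    def __init__(self):
        self.scopes = [{}]  # index 0 is the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, kind):
        # kind is "type" or "variable"; the declaration goes in the innermost scope
        self.scopes[-1][name] = kind

    def is_type_name(self, name):
        # The innermost declaration of a name hides any outer one
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name] == "type"
        return False


def classify_statement(tokens, table):
    """Classify a four-token statement of the form `A * b ;`."""
    if len(tokens) == 4 and tokens[1] == "*" and tokens[3] == ";":
        if table.is_type_name(tokens[0]):
            return "declaration"  # pointer declaration: b has type A*
        return "expression"       # multiplication: A times b
    raise ValueError("only statements of the form `A * b ;` are handled")


statement = ["T", "*", "x", ";"]

table = ScopedSymbolTable()
table.declare("T", "type")        # models: typedef int T;
outer = classify_statement(statement, table)

table.enter_scope()
table.declare("T", "variable")    # models: int T;  (hides the typedef)
inner = classify_statement(statement, table)
table.exit_scope()

after = classify_statement(statement, table)
print(outer, inner, after)
```

The same token sequence is classified differently depending on the declarations in scope, which is why the symbol table has to be kept up to date during parsing rather than built afterward. C++ adds further cases, such as names introduced by classes, namespaces, and templates, that this sketch does not attempt to model.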

The C language was the original focus of the Cetus project.

C++ was the primary reason for allowing separate parsers, as discussed above.

The output of either the C parser or the C++ parser is a parse tree file that is compatible with the graphviz package. The parse trees can be visualized using graphviz, but only for small programs, because parse trees become unmanageably large for programs of more than a few hundred lines. Even so, graphviz was useful for verifying that the parser worked correctly by examining parse trees for small chunks of code.
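As a rough illustration of what a graphviz-compatible parse tree might look like, the sketch below writes a small tree in the DOT language. The exact contents of the parse tree files produced by the Cetus parsers are not described here, so the tree representation (nested tuples of a label and a list of children), the node labels, and the to_dot function are all assumptions made for the example.

```python
def to_dot(tree):
    """Render a nested (label, [children]) tuple as Graphviz DOT text."""
    lines = ["digraph parse_tree {"]
    counter = 0

    def visit(node):
        nonlocal counter
        label, children = node
        node_id = counter
        counter += 1
        # Escape backslashes and double quotes so the label stays a valid DOT string
        escaped = label.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'  n{node_id} [label="{escaped}"];')
        for child in children:
            child_id = visit(child)
            lines.append(f"  n{node_id} -> n{child_id};")
        return node_id

    visit(tree)
    lines.append("}")
    return "\n".join(lines)


# Hypothetical parse tree for the statement: x = a + 1;
tree = ("assignment", [
    ("identifier: x", []),
    ("binary +", [
        ("identifier: a", []),
        ("literal: 1", []),
    ]),
])

dot_text = to_dot(tree)
print(dot_text)
```

If the text is saved to a file such as tree.dot, the Graphviz command `dot -Tpng tree.dot -o tree.png` renders it as an image. Each parse tree node becomes one DOT node and each parent-child link becomes one edge, so the file grows with the size of the tree, which is consistent with the observation that only small programs are practical to view this way.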