Homework 3
Parsing  

powered by FreeFind

Modified: 

Overview

The text examines two programs that generate Java parsers: JavaCC and SableCC. Examples from the text for each generator will be given with instructions on their use. Chapter 3 notes provide greater details of using the two generators.

SableCC is the result of a master's degree project, the thesis forms the basic documentation. It can be obtained at: http://sablecc.org/downloads/thesis.pdf

Assignment

Complete the parsing part of the SableCC specifications for MiniJava grammar.

Files

Use the MiniJava files from Homework 2 lexical analysis as a starting point.

Getting Started

Turn In

  1. Cover sheet with your name, C431, date, and Homework 3.
  2. Print out of SableCC specification file for MiniJava.
  3. Email SableCC specification file for MiniJava to: with subject: HW3 Solution
  4. Screen shot listing of Command prompt window showing generation and analysis of Sample.java program for SableCC. If you are only able to complete part of the grammar (e.g. Exp and Statement), print the test files used.

Basic SableCC Grammar

Tokens
  add = '+';
  sub = '-';
  mul = '*';
  div = '/';
  left_paren = '(';
  right_paren = ')';
  number = [0-9]+;
  whitespace = (' ')+;
Ignored Tokens
  whitespace;
Productions
  expr = {add} [left]:expr add [right]:factor
          | {sub} [left]:expr sub [right]:factor
          | {factor} factor;

  factor= {mul} [left]:factor mul [right]:value
           | {div} [left]:factor div [right]:value
           | {value} value;

  value = {number} number
           | {parens} left_paren expr right_paren;

  exprs = expr*;

Complete Example

The following is an example using the StraightLine grammar that includes the SableCC grammar, a Java program to call the parser and a test StraightLine program.

StaightLine.js

Package StraightLine;
 
Tokens
            semi = ';';
            lpar = '(';
            rpar = ')';
            comma = ',';
            plus = '+';
            minus = '-';
            times = '*';
            div = '/';
            assign = ':=';   
            print = 'print';
            whitespace = (' ' | '\t' | 13 10  | 10 | 13 )+;  // 13 10 is '\r' '\n'
            num = ['0'..'9']+;
            id = ['a'..'z'] (['a'..'z'] | ['0'..'9'])*;
 
Ignored Tokens
            whitespace;
 
Productions
            program = stm;
 
            stm =   {compoundstm} onestm stmtail*;
            onestm ={assignstm} id assign exp |
                        {printstm} print lpar explist rpar;
            stmtail=semi onestm;
            exp =   {plusexp} exp plus term |
                       {minusexp} exp minus term |
                       {term} term |
                       {eseqexp} lpar stm comma exp rpar;
            term =  {timesterm} term times factor |
                        {divterm} term div factor |
                       {factor} factor;
            factor= {idexp} id |
                        {numexp} num;
            explist={pairexplist} exp comma explist |
                        {lastexplist} exp; 

    Main.java

package StraightLine;

import StraightLine.lexer.*;
import StraightLine.node.*;
import StraightLine.parser.*;
import java.io.*;

public class Main{

   public static void main(String[] arguments){
     try{
       Parser parser = new Parser(
            new Lexer(
               new PushbackReader(
                    new InputStreamReader(System.in), 1024)));

      Start ast = parser.parse();
    }
    catch(Exception e){
           System.out.println("Error: " + e.getMessage());
    }
  }
}

Test

a := 5 + 3 * 4;

b:=(print(a,a-1),10*a);

print(b)

 

Hints

  1. Start from the bottom-up by defining Exp first, then Statement, etc. Test by parsing a file that contains only the defined grammar. For example:
  2. SableCC is an LALR parser so left-recursion is not an issue.
  3. Should you get a shift-reduce error or reduce-reduce errors, look for ambiguous rules (e.g. Exp = Exp + Exp). This particular one can be made unambiguous by standard transformation such as in Grammar 3.8 and the StraightLine grammar above.