Tuesday, January 28, 2014

Parsing the input

Parsing the input

There are many different file formats for storing 3d models.  The format most commonly used in 3d printing is STL, so this is the file format the slicer will support.  There are two different types of STL files, binary and ASCII.  Intially the slicer will support only ASCII STL files.

The structure of the STL file can be expressed with the following BNF grammar:

<stl>::=<solid>+
<solid>::= "solid" [STRING] <facet>* "endsolid" [STRING]
<facet>::= "facet" <normal> <loop> "endfacet"
<normal>::= "normal" NUMBER NUMBER NUMBER
<loop>::= "outer" "loop" <vertex> <vertex> <vertex> "endloop"
<vertex>::= "vertex" NUMBER NUMBER NUMBER

The terminal symbols (also known as tokens) are:  NUMBER, STRING, "solid", "endsolid","facet","endfacet","normal","outer","loop","endloop","vertex"

The non-terminals are:  <stl>, <solid>, <facet>, <normal>, <loop>, and <vertex>

For more information about the STL file format you can peruse the wiki page.

Scanning for tokens

We can represent the tokens (terminal symbols) with an enumeration.  The code would look like to the following:

enum Token {
    NUMBER
   ,STRING
   ,SOLID
   ,ENDSOLID
   ,FACET
   ,ENDFACET
   ,NORMAL
   ,OUTER
   ,LOOP
   ,ENDLOOP
   ,VERTEX
}

Now we need some way to distinguish the tokens so that our scanner can pick them out of the input.  So we associate a regex pattern with each instance.


enum Token {
    NUMBER(/^[+-]?[0-9]*\.?[0-9]*(e[+-]?[0-9]+)?$/)
   ,STRING(/^\S+$/)
   ,SOLID(/^solid$/)
   ,ENDSOLID(/^endsolid$/)
   ,FACET(/^facet$/)
   ,ENDFACET(/^endfacet$/)
   ,NORMAL(/^normal$/)
   ,OUTER(/^outer$/)
   ,LOOP(/^loop$/)
   ,ENDLOOP(/^endloop$/)
   ,VERTEX(/^vertex$/)

   String pattern
   Token(String pattern) {
      this.pattern = pattern
   }
}

We're now ready start the outline of our TokenScanner class.  Typically a scanner needs to be able to:
  1. Peek at the the next token to find out if it matches a specific, expected token.
  2. Advance past the next token if it matches a specific, expected token.

class TokenScanner {

   /**
    * Returns true if the specified token is the next token.
    * @param token
    * @return
    */
   boolean peek(Token token) {
      return false 
   }
 
   /**
    * Advances if the next token matches the specified token.
    * @param token
    * @return
    * @throws ParseException if the specified token doesn't match the next token.
    */
   String advance(Token token) {
      return "" 
   }
}

Tokens are queued up as they're read from the input. So we'll add a LinkedList to hold the tokens, and implement peek() and advance() as follows:

class TokenScanner {

   LinkedList<String> tokens = [] //will be treated as a queue of tokens

   /**
    * Returns true if the specified token is the next token.
    * @param token
    * @return
    */
   boolean peek(Token token) {
      (tokens?.peek() ==~ token.pattern)
   }
 
   /**
    * Advances if the next token matches the specified token.
    * @param token
    * @return
    * @throws ParseException if the specified token doesn't match the next token.
    */
   String advance(Token token) {
      if (!peek(token)) {
         throw new ParseException("Invalid token: '${tokens.peek()}'.  Expected ${token.name()}".toString(),-1)
      }
      (tokens?.pollFirst())
   }
}

The tokens property is explicitly specified as LinkedList. This allows tokens to be efficiently added to, and removed from the queue.

The class is still missing code to populate the queue. For ease of implementation we'll use a BufferedReader so that we can read the input a line at a time. We'll also need to modify peek() to ensure that the token queue is not empty. Finally we'll add two convenience constructors, one that take a filename, and one that takes an InputStream.

class TokenScanner {

   BufferedReader reader
   LinkedList<String> tokens = [] //will be treated as a queue of tokens

   TokenScanner(String filename) {
      this(new BufferedReader(new FileReader(filename)))
   }
 
   TokenScanner(InputStream stream) {
      this(new BufferedReader(new InputStreamReader(stream)))
   }
 
   TokenScanner(BufferedReader reader) {
      this.reader = reader
   }

   /**
    * Ensures that the token queue is populated
    */
   private void readTokens() {
      while (!tokens) {
         String line = reader.readLine()
         if (!line) break
         line.split("\\s+").each { token ->
            if (token)
               tokens << token
         }
      }
   }

   /**
    * Returns true if the specified token is the next token.
    * @param token
    * @return
    */
   boolean peek(Token token) {
           readTokens()
           (tokens?.peek() ==~ token.pattern)
   }

   /**
    * Advances if the next token matches the specified token.
    * @param token
    * @return
    * @throws ParseException if the specified token doesn't match the next token.
    */
   String advance(Token token) {
      if (!peek(token)) {
         throw new ParseException("Invalid token: '${tokens.peek()}'.  Expected ${token.name()}".toString(),-1)
      }
      (tokens?.pollFirst())
   }
}

Creating the parser

Since this is such a simple grammar we'll use recursive descent to do our parsing.  Initially we'll create our parser class with one method for each non-terminal.  Each method will return an object representing the non-terminal.  Since the non-terminals are:  <stl>, <solid>, <facet>, <normal>, <loop>, and <vertex>,  initially this will look like the following:

class STLParser {
   STLModel stl() {
   }
 
   STLSolid solid() {
   }

   STLFacet facet() {
   }

   STLNormal normal() {
   }
 
   STLLoop loop() {
   }
 
   STLVertex vertex() {
   }
}

The parser will use our TokenScanner class to scan the input for tokens.  We'll need to add a TokenScanner property, and constructors to initialize that property:

class STLParser {

   TokenScanner scanner

   STLParser(String filename) {
      scanner = new TokenScanner(filename)
   }
 
   STLParser(InputStream stream) {
      scanner = new TokenScanner(stream)
   }

   STLModel stl() {
   }
 
   STLSolid solid() {
   }

   STLFacet facet() {
   }

   STLNormal normal() {
   }
 
   STLLoop loop() {
   }
 
   STLVertex vertex() {
   }
}
 

Peeling the onion...

Now let's start to flesh out our parser. The top level production in our grammar is:

<stl>::=<solid>+

This indicates that an stl is composed of one or more <solid> expressions. When writing the code for our parser methods, each symbol on the right side of the ::= must be processed in order.  So our stl() method needs to process one or more <solid> expressions.  Processing of a non-terminal expression is done by calling the non-terminal expression's corresponding parser method.  After processing a <solid> expression, we check to see if there are more <solid> expressions by using the scanner to peek at the input and see if the next symbol is the token represented by the string literal, "solid".

   STLModel stl() {
      List<STLSolid> solids = []
      solids << solid()
      while(scanner.peek(Token.SOLID)) {
         solids << solid()
      }
      new STLModel(solids)
   }

Our method also needs to return an object that stores the information that has been parsed.  Since stl() processes one or more solids,  it needs to store the list of solids and provide a way to initialize it.

class STLModel {
   List<STLSolid> solids

   public STLModel(List<STLSolid> solids = []) {
      this.solids = solids
   }
}

Slicing deeper...

Having finished the method for parsing <stl> expressions, we now need to define the method for parsing <solid> expressions.  The production has the following form:

<solid>::= "solid" [STRING] <facet>* "endsolid" [STRING]

Again we need to parse the symbols on the right side of the ::= in the order they appear.  Tokens are processed by using the TokenScanner to advance the input stream past them.  Non-terminals are processed by calling their corresponding parser method.  So in general terms, the solid() method should:
  1. Scan for the keyword "solid"
  2. Create an STLSolid object to represent the return value.
  3. Scan for a String (the name of the solid)
  4. Peek at the input for a <facet> expression.
  5. If there is a <facet> expression present in the input stream, process it by calling facet() and store the facet information in the STLSolid return value.
  6. Repeat steps 3 and 4 until there are no more <facet> expressions to be processed.
  7. Scan for the keyword "endsolid".
  8. Scan for a String (the name of the solid, it should match the name from step 2)
Accordingly, we'll update the solid() method as follows:

   STLSolid solid() {
      scanner.advance(Token.SOLID)
      STLSolid solid = new STLSolid()

      if (!scanner.peek(Token.FACET)) {
         solid.name = scanner.advance(Token.STRING)
      }
      List<STLFacet> facets = []
      while(scanner.peek(Token.FACET)) {
         STLFacet facet = facet()
         facets << facet
      }
      solid.facets = facets
      scanner.advance(Token.ENDSOLID) 
 
      //the string is optional, so peek for name, and make sure it's not the
      //start of a second solid.
      if (!scanner.peek(Token.SOLID) && scanner.peek(Token.STRING)) {
         def endName = scanner.advance(Token.STRING)
 
         //if the name from the start of the solid doesn't match
         //the name from the end, then part of the file is missing
         //or corrupt, so throw a ParseException.
         if (solid.name != endName) throw new ParseException("Invalid name: '${endName}'.  Expected '${solid.name}'".toString(),-1)
      }
      solid
   }

This code adds a little bit of error checking to make sure that the name specified after endsolid matches the name specified after solid.

Deeper still...

Next comes the implementation of facet().  The grammar rule is:
<facet>::= "facet" <normal> <loop> "endfacet" 
 

STLFacet facet() {
   scanner.advance(Token.FACET)
   STLNormal normal = normal()
   STLLoop loop = loop()
   scanner.advance(Token.ENDFACET)
   new STLFacet(normal: normal, loop: loop)
}

STLFacet will need to be updated to store the normal and loop:

class STLFacet {
   STLNormal normal
   STLLoop loop
}

Hopefully you're starting to see the pattern here. Every terminal has a corresponding value in the Token enumeration, and every non-terminal has a corresponding parser method and class definition for storing the return value of the parser method.  In many cases, creating a separate class for each non-terminal is not necessary.  For now, we'll just mechanically apply the pattern and get the parser working.

Changes to parser method normal():

STLNormal normal() {
      scanner.advance(Token.NORMAL)
      double x = scanner.advance(Token.NUMBER) as double
      double y = scanner.advance(Token.NUMBER) as double
      double z = scanner.advance(Token.NUMBER) as double
      new STLNormal(x: x, y: y, z: z)
}

and the updates to STLNormal:

class STLNormal {
 double x
 double y
 double z
}

Changes to parser method loop():

STLLoop loop() {
   scanner.advance(Token.OUTER)
   scanner.advance(Token.LOOP)

   STLVertex v1 = vertex()
   STLVertex v2 = vertex()
   STLVertex v3 = vertex()

   scanner.advance(Token.ENDLOOP)
   new STLLoop(vertices: [v1,v2,v3])      
}


Changes to STLLoop:

class STLLoop {
 List<stlvertex> vertices
}
Changes to parser method vertex():

STLVertex vertex() {
   scanner.advance(Token.VERTEX)
   double x = scanner.advance(Token.NUMBER) as double
   double y = scanner.advance(Token.NUMBER) as double
   double z = scanner.advance(Token.NUMBER) as double
   new STLVertex(x: x, y: y, z: z)
}
Changes to class STLVertex:

class STLVertex {
 double x
 double y
 double z
}

No comments:

Post a Comment