Customizable Parsing Test Repository

Class

Language

The Language class represents one of the languages among which a Converter can convert notation. To add a new language to a Converter instance, just call the constructor of this class, passing the Converter instance. Every Language instance needs a Converter instance to which it belongs, because the purpose of this class is to enable conversions among languages, by working together with a Converter instance.

Each instance of this class contains the data passed to its constructor, plus a reference to its Converter, plus a tokenizer and parser for reading notation written in the language represented by this instance. After creating an instance of this class, you specify the language itself by repeated calls to the addNotation() function. This is necessary in order for the language to have any definition at all, before using it to parse text.

Although you can call methods in this class directly to parse text, such as parse() and convertTo(), it is simpler if you instead use the convert() method of the corresponding Converter instance. Using the Converter will save you from having to deal with intermediate forms like abstract syntax trees, but you can create and work with such objects if you wish.
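
The workflow described above (construct an instance, then add notations) might look like the following sketch. It receives the Language class and a Converter instance as parameters, so it assumes nothing about how this repository's modules are imported. The language name 'tiny' and the concept names 'addition' and 'negation' are hypothetical; real concept names must already exist in the converter.

```javascript
// Sketch under stated assumptions: `Language` is this class and `converter`
// is an existing Converter instance. The concept names are hypothetical.
function defineTinyLanguage ( Language, converter ) {
    // Constructing the Language also installs it in the converter.
    const tiny = new Language( 'tiny', converter, [ '(', ')' ] );
    // A language has no definition until notations are added.
    tiny.addNotation( 'addition', 'A+B' );
    tiny.addNotation( 'negation', '-A' );
    return tiny;
}
```

Once the notations are in place, text can be translated with convertTo() on the returned instance or, more simply, with the converter's convert() method.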

Constructor

new Language(name, converter, groupers, linter)

To add a new language to a Converter, just call this constructor, passing the converter object as one of the parameters, and this constructor will add this language to that converter.

The default grouping symbols for any newly installed language are ( and ). You may wish to override this default for a number of reasons. For example, internally, when this class adds the putdown language, it specifies the empty array, since putdown uses no grouping symbols. And if you were to define a parser for LaTeX, you might want to add the symbols { and } to the list, since they are groupers in LaTeX. Note that the array must be of even length, pairs of open and close groupers, in that order, as in [ '(', ')', '{', '}' ] for LaTeX.
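
The even-length requirement can be checked before an array is passed to the constructor. The helper below is a hypothetical sketch, not part of this class; the name pairGroupers is an assumption made for illustration.

```javascript
// Hypothetical helper (not part of the Language class): split a flat
// groupers array into [open, close] pairs, rejecting odd-length input.
function pairGroupers ( groupers ) {
    if ( groupers.length % 2 !== 0 )
        throw new Error( 'groupers must contain open/close pairs' );
    const pairs = [ ];
    for ( let i = 0 ; i < groupers.length ; i += 2 )
        pairs.push( [ groupers[i], groupers[i + 1] ] );
    return pairs;
}

const latexGroupers = [ '(', ')', '{', '}' ];  // the LaTeX example above
const putdownGroupers = [ ];                    // putdown uses no groupers
const latexPairs = pairGroupers( latexGroupers );
const putdownPairs = pairGroupers( putdownGroupers );
```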

The default linter for any language is the identity function, meaning that no cleanup is needed for expressions of that language. If you want the convert() function, upon creating an expression in this language, to apply to it any specific formatting conventions you would like to see in the output, you can specify a linter, which will be run before convert() returns its result. Add such a function only if you see output from convert() that doesn't meet your standards, aesthetically or for some functional reason.

For example, when installing putdown as the initial language in the constructor for this class, it provides a linter that removes unnecessary spaces around parentheses.
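
A linter is simply a function from string to string. The following is a minimal sketch of one that removes spaces just inside parentheses; it illustrates the idea only and is not the linter that the library actually uses for putdown.

```javascript
// Hypothetical linter, for illustration only.
const tidyParentheses = text =>
    text.replace( /\(\s+/g, '(' )    // drop spaces just after "("
        .replace( /\s+\)/g, ')' );   // drop spaces just before ")"

// The documented default: the identity function, which changes nothing.
const identityLinter = text => text;

const linted = tidyParentheses( '( + 1 ( * 2 3 ) )' );
// linted is '(+ 1 (* 2 3))'
```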

Parameters

  • name String

    the name of the new language (e.g., "latex")

  • converter Converter

    the converter into which to install this language

  • groupers Array.<String>

    any pairs of grouping symbols used by the language (as documented above)

  • linter function <nullable>

    a function that cleans up notation in this language (as documented above)

Source

Members

regularExpressions

This static member of the class contains regular expressions for some common types of notation. The following regular expressions are available to make it easier to define new concepts or notations.

  • oneLetterVariable - a single letter variable expressed in Roman letters (lower-case or upper-case A, B, C, ...)
  • nonnegativeInteger - an integer expressed using just the digits 0-9
  • integer - same as the previous, but possibly preceded by a -
  • nonnegativeNumber - a number expressed using the digits 0-9 and a decimal point (optional)
  • number - same as the previous, but possibly preceded by a -
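
To make the descriptions above concrete, here is a sketch of patterns that fit them. These are assumptions written from the descriptions alone; the regular expressions actually stored in this static member may differ in detail, for example in how they treat a trailing decimal point.

```javascript
// Assumed patterns matching the descriptions above, not the library's own.
const sketchRegularExpressions = {
    oneLetterVariable  : /[a-zA-Z]/,
    nonnegativeInteger : /\d+/,
    integer            : /-?\d+/,
    nonnegativeNumber  : /\d+\.?\d*|\.\d+/,
    number             : /-?(?:\d+\.?\d*|\.\d+)/
};

// Test whether a pattern matches an entire string, not just part of it.
const matchesFully = ( regex, text ) =>
    new RegExp( `^(?:${regex.source})$` ).test( text );
```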

Source

Methods

static

fromJSON(name, converter, json, addConceptsAlso)

Rather than creating an empty language using this class's constructor, then adding each notation with a separate function call, you can construct an instance and add all the notations in one function call with this method.

The format for the JSON data structure passed as the third argument is as follows.

  • It should have a "groupers" field that is an array of strings containing the exact same data you would pass as the groupers argument to the constructor.
  • It should have a "notations" field that is an array of objects, each object having the fields "concept", "notation", and "options", which correspond directly to the three parameters of the addNotation() function.

To see an example of such a data structure, examine the contents of the file latex-notation.js in this repository.
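
For quick orientation, a data structure in the format just described might look like the sketch below. The concept names are hypothetical and would need to match concepts that exist in the converter; the helper hasDocumentedShape is likewise an assumption for illustration, not a library function.

```javascript
// Illustrative data only; concept names here are assumptions.
const exampleLanguageJSON = {
    groupers : [ '(', ')', '{', '}' ],
    notations : [
        { concept : 'addition',  notation : 'A+B',   options : { } },
        { concept : 'negation',  notation : '-A',    options : { } },
        { concept : 'falsehood', notation : '\\bot', options : { } }
    ]
};

// Check that a candidate object has the two documented fields.
function hasDocumentedShape ( json ) {
    return Array.isArray( json.groupers )
        && json.groupers.length % 2 === 0
        && Array.isArray( json.notations )
        && json.notations.every( entry =>
            typeof entry.concept === 'string'
            && typeof entry.notation === 'string' );
}

// It would then be passed as the third argument, as in:
// Language.fromJSON( 'latex', converter, exampleLanguageJSON )
```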

Parameters

  • name string

    the name of the language, just as in this class's constructor

  • converter Converter

    a Converter instance, just as in this class's constructor

  • json Object

    the JSON representation of the language, as described above

  • addConceptsAlso boolean (default: true)

    if true, before constructing the Language, examine all built-in concepts mentioned in any of its notations, and add them to the converter using addBuiltIns().

Source

addNotation(conceptName, notation, options)

Add a new notation to this language, for one of its converter's concepts. Specify the name of the concept being represented, then the notation using a string in which the letters A, B, and C represent the first, second, and third arguments, respectively. (You can omit any arguments you do not need. For example, you might write A+B for addition, -A for negation, or just \\bot for the logical constant "false" in LaTeX.)

The options object supports the following fields.

  • If you need to use one of the letters A, B, or C in the notation itself, or if you need to use more than three parameters in your notation (continuing on to D, E, etc.) then you can use the options object to specify the variables in your notation. For example, you could use notation x+y and then use the options object to specify { variables : [ 'x', 'y' ] }. Note that every occurrence of a variable counts as that variable (except inside another word), even if it is used multiple times. So choose variable names that do not otherwise appear in the notation you are introducing.
  • If this notation should be used only for representing the concept in this language, but not for parsing from this language into an AST, then you can set writeOnly : true. This can be useful in two cases.
    1. If you have multiple notations for the same concept in some languages, but not in others, you can map each notation to a separate concept, then map all of those concepts to one notation in the smaller language, marking all but one as write-only, thus establishing a canonical form. Translation between any two languages that support all the notations can still preserve the notational subtleties.
    2. If you have some notation that is just a shorthand for a more complex notation, you can parse the notation to a concept named for that notation, but convert to putdown form in a write-only way, expanding the notation to its underlying (compound) meaning. Then the converter will not attempt to invert that expansion when parsing putdown, but will preserve its expanded meaning.

There are no other options at this time besides those documented above, but the options object is available for future expansion.
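
The rule about variable occurrences can be illustrated with a small sketch. The function below is hypothetical and is not how the library tokenizes notation; it approximates "not inside another word" with regular-expression word boundaries, which is an assumption.

```javascript
// Hypothetical sketch: split a notation string into literal text and
// variable slots, treating a variable name as a variable only when it is
// not part of a longer word.
function splitNotation ( notation, variables = [ 'A', 'B', 'C' ] ) {
    // Escape any regular-expression metacharacters in the variable names.
    const escaped = variables.map( name =>
        name.replace( /[.*+?^${}()|[\]\\]/g, '\\$&' ) );
    const finder = new RegExp( `\\b(${escaped.join( '|' )})\\b`, 'g' );
    // split() with a capturing group keeps the matched variables.
    return notation.split( finder )
        .filter( piece => piece !== '' )
        .map( piece => variables.includes( piece ) ?
            { variable : piece } : { text : piece } );
}

const defaultSplit = splitNotation( 'A+B' );
const customSplit = splitNotation( 'x \\max y', [ 'x', 'y' ] );
```

In the second call, the x inside \max is left alone because it sits inside a longer word, while the standalone x and y become variable slots.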

Parameters

  • conceptName String

    the name of the concept represented by this new notation

  • notation String

    the notation being added

  • options Object <nullable>

    any additional options, as documented above

Source

convertTo(text, language, ambiguous) → {String|Array.<String>}

Convert text in this language to text in another language. If the text cannot be parsed in this language, then undefined is returned instead. Note that this Language and the destination language must belong to the same Converter instance, or this function will throw an error.

Parameters

  • text String

    the text in this language to be converted to the other language

  • language Language

    the destination language

  • ambiguous boolean (default: false)

    passed to the parse() function, and thus determines whether the result of this is a string or an array thereof

Returns

  • String | Array.<String>

    the converted text, if the conversion was possible, and undefined otherwise (or an array of strings if ambiguous is true)
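
Because the return value can be a string, undefined, or an array, callers may want small wrappers around convertTo(). The helpers below are a sketch under stated assumptions: source and target are Language instances sharing one Converter, and the function names are hypothetical. What convertTo() returns for unparseable text when ambiguous is true is not stated above, so the second helper defensively treats any non-array result as no conversions.

```javascript
// Sketch only: `source` and `target` are assumed to be Language instances
// that share one Converter, with convertTo() behaving as documented above.

// Return the single conversion, or `fallback` if the text did not parse.
function convertOrFallback ( source, target, text, fallback ) {
    const result = source.convertTo( text, target );
    return result === undefined ? fallback : result;
}

// Return every conversion as an array, treating a non-array result
// (such as undefined) as "no conversions".
function allConversions ( source, target, text ) {
    const results = source.convertTo( text, target, true );
    return Array.isArray( results ) ? results : [ ];
}
```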

Source

parse(text, ambiguous) → {AST|Array.<AST>}

Treat the given text as an expression in this language and attempt to parse it. Return an abstract syntax tree (AST) on success, or undefined on failure. If you set the optional second parameter to true, it instead returns an array of all possible parsings, each as an AST.

Parameters

  • text String

    the input text to parse

  • ambiguous boolean (default: false)

    if true, return all possible meanings of the given text, which will be more than one if the text is ambiguous; defaults to false, which returns just one AST or undefined

See

  • compact()

Returns

  • AST | Array.<AST>
    • if ambiguous is set to false, returns the parsed AST, or undefined if parsing failed; if ambiguous is set to true, returns all parsed ASTs as an array, which may be empty
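
One use of the ambiguous flag is to detect whether a piece of text has zero, one, or several meanings. The helper below is a sketch; the name classifyText is hypothetical, and language is assumed to be a Language instance whose parse() behaves as documented above.

```javascript
// Sketch only: relies on parse( text, true ) returning an array of ASTs,
// which may be empty, as documented above.
function classifyText ( language, text ) {
    const meanings = language.parse( text, true );
    if ( meanings.length === 0 ) return 'unparseable';
    return meanings.length === 1 ? 'unambiguous' : 'ambiguous';
}
```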

Source

rulesFor(target) → {Array}

Get all grammar rules for the given concept or syntactic type. The result is an array of the right-hand sides of the grammar rules for the concept or syntactic type. Each such right-hand side is the array of tokens or type names used internally by the parser.

Parameters

  • target String | AST

    if this is a string, it must be the name of the concept or syntactic type to look up; if it is a leaf AST, then its contents as a string are used; if it is a compound AST, then its head is used

Returns

  • Array

    an array of the right-hand sides of the grammar rules

Source