Regular expression decision solver

In Spring of 2024, I took the course "CS 6861: Introduction to Kleene Algebra". This course was very theory-heavy, but I wanted to apply some of the things learned into a programming project.

What Is Kleene Algebra?

In short, Kleene algebra is the algebra of regular expressions. It has addition and multiplication, which may be familiar to most people, but it has some other funky properties, some them including:

What My Project Does

In the second problem set for this course, one problem involved proving the following equivalences:

One of the nice results of Kleene algebra is that the algebra is decidable. This means if you can write an equation, there always exists a proof of whether this equation is true or false. For Kleene algebra specifically, there exists decision procedures, which are a set of rules that you can follow to always compute such a proof.

My project implements such decision procedures and is able to verify those equations are true! It can also reject incorrect equations.

How My Regex Decider Works

At a high level, my program works as follows:

  1. Parse 2 regular expressions into Abstract Syntax Trees (ASTs)
  2. Convert ASTs into Deterministic Finite Automatons (DFAs)
  3. Run bisimulation on the two DFAs to determine equivalence

Inputting Regular Expressions

I use Menhir to generate a parser for regular expressions.

The BNF grammar looks loosely like:

<program> ::= <expression>

<expression> ::=
  | ZERO
  | ONE
  | <character>
  | <expression> PLUS <expression>
  | <expressin> TIMES <expression>
  | LEFT_PARENTHESIS <expression> RIGHT_PARENTHESIS
  | <expression> STAR

Using this grammar, we can generate a parser that will take the input $(ab)* + 1$ and turn it into the following AST:

          PLUS
          /  \
      STAR   ONE
       |
     TIMES
     /   \
    a     b

Converting ASTs into Automatons

My program supports two different ways of converting Abstract Syntax Trees (ASTs) into automatons. Ultimately, we want Deterministic Finite Automatons (DFAs) to run bisimulation on.

One way my program can construct DFAs is by first using Thompson's Construction to construct Non-deterministic Finite Automatons (NFAs), and then convert them into DFAs using subset construction.

Another way is by constructing the DFA directly through Brzozowski derivatives. Isn't it cool that you can take derivatives of regular expressions?

Running Bisimulation

Related Resources