Conversion of Expressions from Infix to Postfix Notation in C# – Part 2 Unary Operators

Conversion of Expressions from Infix to Postfix Notation in C# – Part 1

Conversion of Expressions from Infix to Postfix Notation in C# – Part 3 Custom Functions

Originally in the second part, I was going to add the ability to both evaluate expressions with the unary + and – and also build in support for functions. Due to time pressures I am going to have to save the addition of functions for part 3, but this will be added! Part 2 is going to concentrate on building in support for unary + and -.

What are Unary Operators

For an in-depth explanation you can read the Wikipedia article on this. You can see from the article that there are more unary operations than + and -, but these are the operators we are going to concentrate on supporting in our expression evaluator. Quite simply put, the expression -5 expresses negative 5. The expression 2 * -5 = -10. There is the concept of a unary +, so you could have the expression 2 * +5 = 10, but actually the unary + is not needed, because the 5 is assumed to be positive.

Why Explicitly Support This in Code

So why are we having to explicitly build in support for this in our parser? If you look back at Part 1, the Shunting Yard algorithm for converting infix to postfix is explained. As it stands, the unary operators are going to confuse the code that converts the expression. If you consider the infix expression 2 * 5, this converts to a postfix expression of 25* and using our code to evaluate this causes no problems at all. If you add in the unary – our (2 * -5) our code converts this to a postfix expression of 2*5- and when we try to evaluate this, our code is going to get very confused, so we need to do something to the code that converts the expression from infix to postfix to stop the confusion.

So what can we do? I have read a couple of articles to see how people have got around this. The first piece of good news and really this is obvious, we can ignore the unary + as it serves no real purpose. We first need to think about how we decide if an operator is a unary operator and there are a some simple rules we can follow, these are:

  1. If the + or – is the first character in the infix expression, its assumed to be unary. E.g. -5 * 2
  2. If the + or – comes after another operator, its assumed to be unary. E.g. 5 * -2
  3. If the + or – comes after an opening parenthesis, its assumed to be unary. E.g. 1 + (-5  * 2)

So that’s our rules, what can we do in code. Well a lot of responses on forums have said to find out if the operator is unary and if its a + ignore it and if its a – replace it with another character such as # or ~. That’s fine and would overcome the issue, but we then need to really play around with our code, both the infix to postfix converter and evaluation code and I want to avoid this if possible as the code seems to work well, is there an alternative? Well, yes I think there is, lets see how we can convert a positive number to a negative using simple maths. To convert 5 to -5 I would normally say ‘multiply by -1’, 5 * -1 = -5, but that doesn’t help us as there is a unary – in the expression and that’s what we are trying to replace!! So we could always do 0 – 5 = -5, that gives us -5 quite nicely and if you think about our expression, gives us a clue as to what we can do. Lets look at our three rules above, could we not do the following and derive the same answers?

  1. -5 * 2 = (0 – 5) *2
  2. 5 * -2 = 5 * (0 – 2)
  3. 1 + (-5 * 2) = 1 + ( (0 – 5) * 2 )

So our solution would involve a slight code change, which would be to applied to our infix expression before it is converted to postfix.

  1. Look for a unary +, if its found, remove it from the infix expression as its not needed.
  2. Look for a unary -, if its found, insert a ( and a 0 before the unary – and a ) after the operand the unary minus applies to.

We would then have a valid infix expression with the unary operators removed but produces the same result when evaluated, but more importantly, we have an infix expression that still works with our postfix conversion code.

The Code!

Enough theory, lets put this into code! We’ve said that we need to do the above before the expression is converted to postfix, so we are either going to build the code into our InfixToPostfix method, or write a method to work on the expression but call it from the InfixToPostfix method. I’m going to follow the route of writing a method and call it.

One of the first things our InfixToPostfix method does is to split an infix expression and produce a list of the expressions component parts, this makes adding and removing items from the expression nice and easy. Our original InfixToPostfix method had a LINQ expression that took our infix expression as a string, applied a regular expression to it and looped around the list. As I want to pass the expression converted to a list into our new method, I have refactored the method slightly to now do the following:

string pattern = @"(?<=[-+*/(),^<>=&])(?=.)|(?<=.)(?=[-+*/(),^<>=&])";
      
expression = expression.Replace(" ", "");
Regex regExPattern = new Regex(pattern);
List<string> expr = new List<string>(regExPattern.Split(expression));
expr.RemoveAll(s => String.IsNullOrEmpty(s.Trim()));
       
//parse our expression and fix unary + and -
ParseUnary(ref expr);
        
expr.ForEach(s =>

I’ve highlighted in bold our call to a method called ParseUnary which applies our rules we have derived above. This is a very simple method and looks as follows:

private static void ParseUnary(ref List<string> expr)
{
    for(int i = 0; i < expr.Count; i++)
    {
        //we have a minus
        //if its a unary, then the following happens:
        //1. At the beginning: -1 * 2 = ( 0 - 1 ) * 2
        //2. After an opening parenthesis: 1 + ( -2 * 3 ) = 1 + ( ( 0 - 2 ) * 3 )
        //3. After an operator: 1 + -2 = 1 + ( 0 - 2 )
        if (expr[i] == "-")
        {
            //is it at the beginning of the expression OR
            //it follows and opening parenthesis OR operator>
            //if so, its a unary minus
            if (i == 0 || expr[i - 1] == "(" || operators.ContainsKey(expr[i - 1]))
            {
                expr.Insert(i, "(");
                expr.Insert(i + 1, "0");
                expr.Insert(i + 4, ")");
            }
        }
        //we have a +
        else if(expr[i]=="+")
        {
            //is it at the beginning of the expression
            //OR it follows an opening parenthesis OR operator?
            //if so, we can drop it, as it serves no purpose
            if (i == 0 || expr[i - 1] == "(" || operators.ContainsKey(expr[i - 1]))
                expr.RemoveAt(i);
        }
    }
}

As you can see, we iterate over the list and apply our rules for checking if a + or a – is a unary operator. As we have said + can be ignored, so if we find one, we simply remove it from the list. If we find a unary -, we insert our leading ( and 0 before the unary – and then insert our closing parenthesis after the operand the unary – applied to. As we are passing the list that contains the expression by reference, we don’t need to return anything from our ParseUnary method. This style is personal choice.

And that’s it! We’ve added the new method and made the changes to the existing method to implement handling the unary + and -, so lets see this working with the expression 1 + (-5 * 2). This should give us the answer -9. Evaluate -5 * 2 first to give us -10 and then add 1 to it gives us -9.

Unary Expression Output
As you can see from the postfix expression in the image above, the (0-5) as been applied and when evaluated, has given us the correct answer of -9.

I have made available a new version of the Visual Studio solution that implements the new additions above and you can download this from here.

Advertisements

Conversion of Expressions from Infix to Postfix Notation in C# – Part 1

Conversion of Expressions from Infix to Postfix Notation in C# – Part 2 Unary Operators

Conversion of Expressions from Infix to Postfix Notation in C# – Part 3 Custom Functions

If you have a need in your applications to evaluate mathematical expressions and you search the term ‘convert infix to postfix’, you will find a huge amount of information and code out there that will get you to where you need to be, some with explanations of how the code is working, others with just the program code. This post will hopefully fulfill both criteria, by giving an explanation of what the code is doing and also provide the Visual Studio Solution (VS2013) I have produced.

This is all out there already, why do this?

Good question!! Over the past 12 months I have been pretty much working solidly on my day job (the one that pays the bills!) and haven’t been able to spend much of my spare time sitting back and writing code for pleasure. With all the travel I have had, the time spent at home has been with the family. I’ve found myself with 2 weeks off over the Christmas period and this has been an ideal time to spend some time coding for pleasure. I’ve got a tonne of code I could be writing for work, but I’ve needed to switch off from that totally (not easy!!). For some reason the idea of writing some code to do expression evaluation popped into my head and this was an ideal little coding project that I could dip in and out of over the couple of weeks off.

What do the conversion from Infix to Postfix?

If you are reading this post, you more than likely already know the difference between infix and postfix notation. For a good description of these two notations (and for that matter prefix), follow the link here, its a very good brief description of the three notations.

With infix expressions, writing an evaluator can be a fairly complicated task, you need to cater for parenthesized expressions (potentially levels deep) and this can become potentially complicated to code (recursion anyone?). For example, with the expression ( (A * B) + (C / D) ) your evaluator will need to take into account all the parenthesized parts in order to derive the correct answer. Throw into the mix that you may also want to use functions, e.g. sin, cos, and it suddenly becomes not so much fun to write.

Postfix (Reverse Polish Notation)

Postfix notation specifies that all operators follow all of their operands and all parenthesis are removed. Looking at our infix expression ( (A * B) + (C / D) ) its posfix form would be AB*CD/+. To write code to evaluate this becomes an awful lot easier. If our code utilises a stack, we can do the following:

infix: ( ( 5 * 5) + (4 / 2) )
postfix expression:  55*42/+

for each char in postfix expression
{
    if char is operator
    {
        rhs = Pop();
        lhs = Pop();
        switch(char)
        {
            case '*':
                Push(lhs*rhs);
                break;
            case '/':
                Push(lhs/rhs);
                break;
            ... all other operators ...
        }
    }
    else
        Push(char); //push non-operator value to the stack.
}

So, working this through using our pseudo-code above, we get the following:

  1. push 5 to the stack
  2. push 5 to the stack
  3. Char * is an operator, so pop values from the stack into lhs and rhs and evaluate 5 * 5 and push result 25 to the stack
  4. Push 4 to the stack
  5. Push 2 to the stack
  6. Char / is an operator, so pop values from the stack into lhs and rhs and evaluate 4 / 2 and push result 2 to the stack
  7. Char + is an operator, so pop values from the stack into lhs and rhs and evaluate 25 + 2 and push result 27 to the stack
  8. END as postfix expression has been evaluated

Converting from Infix Please!!

Having read everything above, it all looks wonderful, but how to we get from infix to postfix and is it difficult? Fortunately this is not a difficult process and there is an algorithm for this called the Shunting Yard Algorithm. This is the algorithm we are going to implement. My code does not support functions or unary + and -. In part 2 of this series I will implement support for functions and unary + and -.

I’m not going to put the Shunting Yard Algorithm in this post, as I have linked to it above, so we can dive into the code that performs the conversion.

Operator Precedence and Associativity

The algorithm needs to take into account operator precedence and associativity when it constructs the postfix version of the expression. The wikipedia article gives this in the form of a table and this is what we will replicate in our code. We will implement this as a generic dictionary Dictionary<string,int[]) where the string will be the operator symbol and the integer array will hold the precedence and a value of 1 or 0 for either left or right association. The definition for this is as follows:

private static Dictionary<string, int[]> operators = new Dictionary<string, int[]>();
                 
//populate operators
//int format is {precedence, association -> 0=Left 1=Right}
operators.Add("^", new int[] { 5, 1 });
operators.Add("*", new int[] { 4, 0 });
operators.Add("/", new int[] { 4, 0 });
operators.Add("+", new int[] { 3, 0 });
operators.Add("-", new int[] { 3, 0 });
operators.Add("<", new int[] { 2, 0 });
operators.Add(">", new int[] { 2, 0 });
operators.Add("=", new int[] { 2, 0 });
operators.Add("&", new int[] { 1, 0 });

Our implementation of the algorithm also takes into account the operators < > = and &, so the precedence values have been modified compared to the Wikipedia article. Should you wish to add more operators you must make sure you get the precedence of the operator dictionary absolutely correct.

Starting the Conversion

 We then need to split our infix expression into its component parts so that we can parse it more easily. I’m using a regular expression to do this, as follows:

string pattern = @"(?<=[-+*/(),^<>=&])(?=.)|(?<=.)(?=[-+*/(),^<>=&])";
expression = expression.Replace(" ", "");
Regex regExPattern = new Regex(pattern);
    
regExPattern.Split(expression).Where(s => !String.IsNullOrEmpty(s.Trim())).ToList().ForEach(s =>
{

I don’t profess to be an expert with regular expressions and there is a small side effect with our expression in that it does produce empty elements in our list when we perform the split. This I think is to do with grouping within the regex and I will address this, but for the time being, I’m removing these in the LINQ expression when I start to iterate over the elements of the expression.

Compared to the algorithm explanation, I’m handling the order I do things slightly differently. I’m first checking if the element of the infix expression is an operator, if so, we process it, if not we go on to handle opening and closing parenthesis. If the element is neither of these things, its an operand and I’m simply adding it to the output queue. The order I have done things is just personal choice, it just seemed more logical to me.

    //is our token one of our defined operators
    //deal with precedence and association
    if (operators.ContainsKey(s))
    {
        //while the stack is not empty and the top of the stack is not an (
        while (stack.Count > 0 && stack.Peek() != "(")
        {
            if ((GetAssociation(s) == 0 && GetPrecedence(s) <= 
                                           GetPrecedence(stack.Peek())) ||
                (GetAssociation(s) == 1 && GetPrecedence(s) < 
                                           GetPrecedence(stack.Peek()))
               )
                queue.Enqueue(stack.Pop());
            else
                break;
        }
      
        //push operator onto the stack
        stack.Push(s);
    }
           
    //handle opening parenthesis
    //simply push this on the stack
    else if (s == "(")
    {
        stack.Push(s);
    }
    //handle closing parenthesis
    //pop all operators off the stack until the matching 
    //opening parenthesis is found and then discard the
    //opening parenthesis
    else if (s == ")")
    {
        while (stack.Count != 0 && stack.Peek() != "(")
            queue.Enqueue(stack.Pop());
      
        //forget the (
        stack.Pop();
    }
    else
        //none of the above so queue it
        queue.Enqueue(s);
});
            
//pop off the rest of the stack
while (stack.Count != 0)
    queue.Enqueue(stack.Pop());

return queue;

When we enter the loop to process the expression elements, we are first checking if the element is a valid operator, if it is, we are checking to see if there are any operators on the stack. If there are, we perform the precedence and association check for our current operator against what is on the top of the stack and pop off the operator on the stack and add to the output queue if required, pushing our currently selected operator element onto the stack. If we don’t have any operators on the stack, we simply push our operator onto the stack.

If our current operator in the expression is an opening parenthesis, we simply push this onto the stack and that’s all.

If our currently operator in the expression is a closing parenthesis, we pop all operators off the stack and add them to the output queue up to the point of the matching opening parenthesis. We then pop off and forget about the opening parenthesis, in other words, we don’t add it to the output queue.

Finally if the element in our expression is neither an operator or an opening or closing parenthesis, we assume that its an operand and we add it to the output queue.

When all expression elements have been processed within the loop, we check to see if there any operators left on the stack, if there are, we simply pop them off the stack and add them to the output queue.

And that’s it!! We then just return the queue as a return parameter from the method. My decision to use a Queue within the method is just personal choice and may or may not be a good decision, quite probably a generic list would have been better.

There are a couple of additional methods used that obtain the precedence and association values and these are shown below.

//Get the precedence of the operator passed in
private static int GetPrecedence(string s)
{
    return operators[s][0];
}
     
//Get the association of the operator passed in
private static int GetAssociation(string s)
{
    return operators[s][1];
}

To use these methods, we simply pass in the operator and the associated precedence or association value is returned.

The Output

I have wrapped all my code into a class called Evaluate and with this in mind, the code I can use to convert an infix expression to postfix and display the result is:

string exp = "( (5 * 5) + (4 / 2) )";
    
Console.WriteLine("Infix Expression: {0}", exp);
    
Console.Write("Postfix Expression: ");
Queue<string> q = Evaluate.InfixToPostFix(exp);
    
foreach (string s in q)
    Console.Write(s);

The output on the console looks as follows:

Postfix Expression
Evaluating the Postfix Expression

So now we have the expression converted to postfix form, we need to evaluate it, after all this is the whole purpose of this post! At the beginning of this post, I gave the algorithm for evaluating a postfix expression, so there is no need to go over this again. The method to evaluate is as follows:

public static double CalculateExpression(Queue<String> postfix)
{
    Stack<double> stack = new Stack<double>();
    
    postfix.ToList<String>().ForEach(token =>
    {
        if (operators.ContainsKey(token))
        {
            if (stack.Count > 1)
            {
                double rhs = stack.Pop();
                double lhs = stack.Pop();
                switch (token)
                {
                    case "+":
                        stack.Push(lhs + rhs);
                        break;
                    case "-":
                        stack.Push(lhs - rhs);
                        break;
                    case "*":
                        stack.Push(lhs * rhs);
                        break;
                    case "/":
                        stack.Push(lhs / rhs);
                        break;
                    case "^":
                        stack.Push(Math.Pow(lhs, rhs));
                        break;
                    case "=":
                        stack.Push(Convert.ToDouble(lhs == rhs));
                        break;
                    case "<":
                        stack.Push(Convert.ToDouble(lhs < rhs));
                        break;
                    case ">":
                        stack.Push(Convert.ToDouble(lhs > rhs));
                        break;
                    case "&":
                        stack.Push(Convert.ToDouble((int)lhs & (int)rhs));
                        break;
                }
            }
        }
        else
            stack.Push(Convert.ToDouble(token));
    });

    //everything has been evaluated and our result is on the stack
    //so return it
    return stack.Pop();
}

To call this method, we simply pass in what was returned from our method that converts infix to postfix, for example:

string exp = "( (5 * 5) + (4 / 2) )";
    
Console.WriteLine("Infix Expression: {0}", exp);
Console.Write("Postfix Expression: ");
Queue<string> q = Evaluate.InfixToPostFix(exp);
               
foreach (string s in q)
    Console.Write(s);
    
Console.WriteLine("\nThe expression evaluates to: {0}", Evaluate.CalculateExpression(q));

The output on the console then looks as follows:

Postfix Expression Evaluation

As you can see, the expression is evaluated and returns the answer of 27, which in my book means the evaluation has worked!

Other Expression Types

As I have added the operators < > and & and = you can build and evaluation some more complex expressions, such as:

  • (2*2) = (1+3) which would return the value 1 meaning true
  • (2*2) < 1 which would return the value 0 meaning false

Conclusion

Whilst this has been a fairly long post, we have covered a lot of ground, both theory and program code. I hope you find this post useful and feel free to leave a comment or contact me via twitter (@pcurnow).

In part 2 of this series, we will look at adding the ability to evaluate functions and give the ability to add your own functions to be evaluated.

You can find a full Visual Studio 2013 solution that contains a full working version of the conversion and evaluation routines described in this post here.