# How To Perform Nonlinear Regression in Excel

Excel Solver is one of the simple and easy curve-fitting tool around. Its curve-fitting capabilities makes it an excellent tool for performing nonlinear regression. The Excel Solver can be used to find the equation of the linear or nonlinear curve which closely fits a set of data points. Here are few simple steps on how to perform nonlinear regression in Excel.

### Nonlinear regression can serve two distinct goals

• To fit a model to your data in order to obtain best-fit values of the parameters, or to compare the fits of alternative models.
• To simply fit a smooth curve so you can interpolate values from the curve, or perhaps to draw a graph with a smooth curve. If this is your goal, you can guess by looking at the data plot and curve. There is no need to learn the deep math behind it.

### Least-Squares regression

A random variable with a Gaussian distribution is said to be normally distributed and is called a normal deviate. The normal distribution is sometimes called the bell curve. Although, other distributions such as Cauchy, Student’s t, and logistic distribution are bell-shaped as well.

A common assumption is that the data points are randomly scattered along a curve or a line with the scatter following a Gaussian distribution. If you go by this assumption, the next step is to tweak the model’s parameters to find the curve that minimizes the sum of squares of the distances of the points from the curve. You may pause for a second and ask, Why minimize the sum of squares of the distances? Would minimizing the sum of the linear distances work?

### Simple Explanation

If the random scatter plot follows a Gaussian distribution, it is very likely to have two moderate deviations than to have a very small and a very large deviation. You are most likely to find two deviations, say 10-units each from the curve than finding one small deviation of 5 units and one large deviation of 15 units.

A function that minimized the sum of the absolute value of the distances would have no preference over a curve that was 10 units away from two points and one that was 5 units away from one point and 15 units from another.

However, a function that minimizes the sum of the squares of the distances prefers to be 10 units away from two points where the sum-of-squares is 100, rather than 5 units away from one point and 15 units away from another, where the sum-of-squares is 250. If the scatter is Gaussian or a bell shaped, the curve determined by minimizing the sum-of-squares is most likely to be correct.

### Model Formulation

Running promotions with discounts is one of the most popular and effective ways to drive sales. A sales manager has collected the following data points on the discount% vs. Sales for the prior month.

Her main purpose is to build a model that will predict sales by the discount level. A polynomial of degree n is a function of the form,

f(x) = $a {\scriptscriptstyle n}x^n + a {\scriptscriptstyle n-1}x^{n-1} + ... + a {\scriptscriptstyle 2}x^2 + a {\scriptscriptstyle 1} x + a {\scriptscriptstyle 0}$

A quadratic form of this polynomial takes the form,

$a {\scriptscriptstyle 0} + a {\scriptscriptstyle 1}x + a {\scriptscriptstyle 2}x^2$ = c

Since the equation is quadratic, there are three coefficients, one for x squared, one for x, and a constant. So we’ll need to start by creating a space to store the three coefficients for the equation.

The expression can be restated as,

$a*ln(x)^{b} =c$

Excel calculates the values of all required variables which produces the equation of the curve that closely fits the data points. Since it is one of the easiest tools around, the sales manager prefers Excel to model an equation what closely describes the relationship between sales and the discount level for her customers.

Enough of introduction. Lets jump right into how to perform nonlinear regression in Excel. The first step is to take a quick glance at the data and estimate what general type of curve these data points most likely fit into.

#### Guessing from the plot

The scatter plot clearly points out to a diminishing y value for an increasing x value. While slashing prices certainly attracts customers, executing discounts the wrong way could end up eating your way into the profits or enticing the wrong types of customers – customers who will only buy when you lower the prices. The power of discounts as a tool to drive sales tapers off after 18%.

The curve takes the general form of

Y = $A {\scriptscriptstyle 0} + A {\scriptscriptstyle 1}X^{B1}$

or Sales = $A {\scriptscriptstyle 0} + A {\scriptscriptstyle 1}*(Discount)^{B1}$

You can use the Solver to determine the optimal values of A0, A1, and B1 decision variables that minimizes the objective function.

How to perform nonlinear regression in excel begins with initializing decision variables to 1000, 1000, and 0.5 in cells C3, C4, and C5 respectively. You are setting these initial values arbitrarily. Solver will find the optimal values that minimizes the objective function in cell F20. The next step is to take the difference between actual and predicted sales by our model with the initial values of decision variables.

The square of each difference is taken, and summed up in cell F20. The objective is to find the optimal decision values in cells C3, C4, and C5 that minimizes sum of squares of the differences.

### Solver setup

Open the Excel Solver and the following blank dialogue box shows up:

The dialogue box has these 4 parameters that need to be set:

1. Objective Cell or the target which is $F$20
2. Optimization Type – Minimize or Maximize the target function
3. Decision variables. Use cells C3 thru C5
4. Constraints – there are none for this curve fitting optimization
5. Solving Method

### GRG Nonlinear method of solving

GRG Nonlinear is the most suitable solving method when the objective function is non-linear and continuous. These objective functions are often called as smooth. Common spreadsheet functions such as ABS, IF, and VLOOKUP, for instance will cause problems for this algorithm as they are non-smooth.

The Sales function in our example appears to be smooth and non-linear. The completed Solver dialogue will look like:

Uncheck Make Unconstrained Variables Non-Negative option so as the coefficients of the regression will be negative. After running the solver we get the optimized decision variables to minimize the objective function.

The objective function was minimized to \$3,63M. The sales can now be predicted from the discount percentage by,

Sales = -0.00003188 + 23659.83 * (Discount)^0.74

The points in the regenerated plot with actual and predicted sales curves are quite close.

The solver had to run twice to minimize the function. GRG non-linear method often produces different results upon successive runs. The starting point or the initial values assigned to the decision variables are quite important. You may have to run the GRG algorithm multiple times to achieve desirable results.

### Limitations of Excel Solver

The Excel Solver is a convenient tool and easy to use for spreadsheet users. One of the major limitations of the Excel Solver is the number of decision variables and constraints it can handle. Solver cannot run optimization problems beyond 200 decision variables and cannot process more than 100 constraints at once.

It is quite common for business and engineering optimization problems to have hundreds and thousands of variables. Get a powerful optimization tool such as Lingo from Lindo Systems

Frontline Systems is another option and it integrates with Excel seamlessly. You may be interested in a Network Optimization example that uses Evolutionary algorithm for a very simple Supply Chain problem.

1. Mario Gomez
1. Jish Bhattacharya Post author