---
title: "An introduction to nlsr"
author: "John C. Nash"
date: "`r Sys.Date()`"
output: pdf_document
bibliography: ImproveNLS.bib
vignette: >
  %\VignetteIndexEntry{nlsr Introduction}
  %\usepackage[utf8]{inputenc}
  %\VignetteEngine{knitr::rmarkdown}
---

**R** has several tools for estimating nonlinear models and minimizing sums of 
squares functions. The base R
distribution includes a function `nls()` that is feature-rich but has some 
annoying weaknesses (@JNABRefactoringNLS22). Some of these
are addressed in the package **nlsr** (@nlsr2019manual), for which this vignette 
provides an introduction with examples.

## What does nlsr do?

**nlsr** has functions to find a set of parameters that minimizes the sum of 
squares (or deviance) 
of differences between a model specified either as a formula using function 
`nlxb()` or as both residual
and Jacobian R functions using function `nlfb()`. As described below, the 
Jacobian may be approximated
by particular control settings.

Existing R tools that are similar to `nlxb()` are `nls()` and `minpack.lm::nlsLM()`;  
`minpack.lm::nls.lm()` is similar to `nlfb()`.

There are several important differences in approach and results, some of which 
are detailed in this document.

### Output object 

The output object of `nlxb()` is smaller than that of `nls()` and is focused on 
the solution and its properties
rather than the **modeling** task. That is, package `nlsr` emphasizes the solution 
of the nonlinear least squares problem rather than the estimation of a nonlinear 
model that fits or explains the data. `nls()` and `nlsLM` return an object of 
class `nls`, which allows for
a number of specialized modeling and diagnostic extensions. For compatibility, 
the `nlsr` package has function `wrapnlsr()`, for which `nlsr()` is an alias. 
This uses `nlxb()` to find good parameters, then calls `nls()` to return the class 
`nls` object. Unless particular modeling features are needed, the use of 
`wrapnlsr()` is unnecessary and wasteful of resources.

### Jacobian calculation

Internally, `nlxb()` uses symbolic and automatic differentiation tools to create 
the residual and Jacobian functions needed for `nlfb()` then uses it to solve 
the relevant nonlinear least squares minimization problem. 

As with other nonlinear least squares packages, we provide the Jacobian code as 
the "gradient" attribute of the Jacobian function. This lets us embed the code 
for the Jacobian as this attribute of the **residual** function so that the call 
to `nlfb()` can be made with the same name used for 
both residual and Jacobian function arguments. This programming trick saves a 
lot of trouble for the package developer, but it can be a nuisance for users 
trying to understand the code. 

Generally `nls()` and `nlsLM()` use a fairly simple forward difference 
approximation of derivatives for the Jacobian, though a central approximation 
can be specified in control parameters. Package `nlsr` provides four approximation 
options, with a further control `ndstep` for the size of the
step used in the approximation.

### Stabilization of Gauss-Newton computations

The iteration in all the major tools mentioned is the solution of the Gauss-Newton 
equations. `nls()` uses a variant of an approach suggested by @Hartley61, while 
`nlsr` and `minpack.lm` use variants of the method published by @Marquardt1963, 
though it was suggested earlier by @Levenberg1944. The details of the method in 
`nlsr` are discussed in  @JNABRefactoringNLS22.
Though it is unlikely to be useful in general, control settings for `nlxb()` or `nlfb()`
allow for those routines to perform a variety of Hartley and Marquardt algorithms. This
has been useful for learning about the algorithms, but tangential to simply finding
parameters of nonlinear models.

Nonlinear least squares methods are mostly founded on some or
other variant of the Gauss-Newton algorithm. The function we
wish to minimize is the sum of squares of the (nonlinear) 
residuals $r(x)$ where there are $m$ observations (elements of $r$)
and $n$ parameters $x$. Hence the function is

$$    f(x) = \sum_i({r_i}^2)$$

Newton's method starts with an original set of parameters $x[0]$.
At a given iteraion, which could be the first, we want to solve

$$    x[k+1]  =  x[k]  -   H^{-1}  g$$
   
where $H$ is the Hessian and $g$ is the gradient at $x[k]$. We can 
rewrite this as a solution, at each iteration, of 
  
$$    H \delta = -g$$

with

$$    x[k+1]  =  x[k] + \delta$$
         
For the particular sum of squares, the gradient is 

$$    g(x) = 2 * r[k]$$

and 

$$    H(x) = 2 ( J' J  +  \sum_i(r_i * Z_i))$$

where $J$ is the Jacobian (first derivatives of $r$ w.r.t. $x$)
and $Z_i$ is the tensor of second derivatives of $r_i$ w.r.t. $x$).
Note that $J'$ is the transpose of $J$. 

The primary simplification of the Gauss-Newton method is to
assume that the second term above is negligible. As there is
a common factor of 2 on each side of the Newton iteration 
after the simplification of the Hessian, the Gauss-Newton
iteration equation is

$$    (J' J)  \delta = - J' r$$
<!-- find the very convoluted code behind `nls()` very difficult to decipher. Unfortunately, its authors have now (as far as I am aware) all ceased to maintain the code. -->


This iteration frequently fails. The approximation of the Hessian
by the Jacobian inner-product is one reason, but there is also
the possibility that the sum of squares function is not
"quadratic" enough that the unit step reduces it. @Hartley61
introduced a line search along delta. @Marquardt1963 suggested
replacing $J' J$ with $(J' J + \lambda * D)$ where $D$ is a diagonal
matrix intended to partially approximate the omitted portion of
the Hessian. Choices suggested by Marquardt were $D = I$ (a unit matrix) or
$D$ = (diagonal part of $J' J$). The former approach, when $\lambda$ is
large enough that the iteration is essentially

$$    \delta = - g / \lambda $$

gives a version of the steepest descents algorithm. Using the
diagonal of $J' J$, we have a scaled version of this
(see \url{https://en.wikipedia.org/wiki/Levenberg-Marquardt_algorithm};
<!-- https://en.wikipedia.org/wiki/Levenberg%E2%80%93Marquardt_algorithm -->
@Levenberg1944 predated Marquardt, but the latter seems to
have done the practical work that brought the approach to general attention.)

Both `nlsr::nlxb()` and `minpack.lm::nlsLM` use a Levenberg-Marquardt stabilization
of the iteration described above, with `nlsr` using the modification involving the 
$\phi$ control parameter. The complexities of the code in `minpack.lm` are such that I
have relied largely on the documentation
to judge how the iteration is accomplished. `nls()` uses a straightforward 
Hartley variant of the Gauss-Newton iteration,
but rather than form the sum of squares and cross-products, uses a QR decomposition 
of the matrix
$J$ that has been found by a forward difference approximation. The line search used
by `nls()` is a simple back-tracking search using a step reduction factor of 0.5 as
the default stepsize reduction.

@jncnm79 and @jnmws87 solve

$$ (J^T J + \lambda D_x)   \delta  = - J^T r $$

by the Cholesky decomposition. In this $D_x = (D + \phi * I)$ is used as above
and $\lambda$ is a number of modest size initially. $\phi$ is typically 1. Clearly
for $\lambda = 0$ we have a Gauss-Newton method. Typically, the sum of squares of 
the residuals calculated at the "new" set of parameters is used as a 
criterion for keeping those parameter values. If so, the size of $\lambda$ is 
reduced. If not, we increase the size of $\lambda$ and compute a new $\delta$. 
Note that a new $J$, the expensive step in each iteration, is NOT required in 
this latter case. 

In 2022, a modification to use $D_y = (\psi * D + \phi * I)$ was introduced,
though the matrix equations are solved via a QR decomposition approach. Within
the code, control parameters `psi`, `phi` and `stepredn` were introduced so that
a variety of Gauss-Newton, Hartley, or Marquardt approaches are available by 
simple control modifications. Experience so far suggests that a Levenberg-Marquardt
stabilization is much more reliable than the Gauss-Newton-Hartley choices, but that
different selections of `psi` and `phi` perform rather similarly.
As for Gauss-Newton methods, the details of how to start, adjust and 
terminate the iteration lead to many variants, increased by these different 
possibilities for specifying $D$. See @jncnm79.


### Programming language

An important choice made in developing `nlsr` was to code entirely within the R 
programming language. `nls()` uses a mix of R, C and Fortran, as does `minpack.lm`. 
Generally, I believe that keeping to a single programming language allows for 
easier maintenance and upgrades. On the other hand, R is usually somewhat slower 
in performing computations because it keeps track of names and structures and 
because it is usually interpreted rather than compiled. In recent years the 
performance penalty for using code entirely in R has been much reduced
with the just-in-time compiler and other improvements, so that using an all-R 
package offers acceptable performance. Indeed, in `nlsr` the use of R may be 
less of a performance cost than the attempt to ensure the solution has been found, 
which forces more iterations to be used.

## An illustrative example

The Hobbs weed infestation problem (@cnm79 [page 120]) is a growth curve modeling 
task which is seemingly straightforward but turns out to be quite nasty. 

The data and a graph of it is given below.

```{r ex01, echo=TRUE}
weed <- c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443,
          38.558, 50.156, 62.948, 75.995, 91.972)
tt <- 1:12
weeddf <- data.frame(tt, weed)
plot(weeddf, main="Hobbs weed infestation data")
```

Most nonlinear least squares problems encountered by R users have data, but it is not
strictly required, as we shall see below in the section on functional specification 
of nonlinear least squares problems. What is required is a definition of 
the (residual) functions that are to be squared and then their squares 
summed to provide the  **loss function** or **objective function** 
to be minimized by adjusting the **parameters** that define the 
set of functions. In the case of the Hobbs problem, I was told the scientists
who collected the data believed that the growth of weeds followed a 3-parameter
logistic sigmoid growth curve where $y$ (the `weed` data) changes with time ($t$) 
(the `tt` index) according to the function

**Logistic3U**:
$$  y \approx  b_1 / (1 + b_2 * exp(- b_3 * t)) $$

The problem (using the 3 parameter logistic form above) has very bad 
scaling and regions in the parameter space where the sum of squares 
objective is nearly "flat", so that its
Hessian, or matrix of second derivatives, is almost singular. When this occurs, 
methods cannot easily make progress towards the optimal solution.

Solution methods also need an initial guess for the parameters to be adjusted.
There are often some ways to provide such initial parameters, though in my 
experience they take quite a lot of work to set up reliably, but R does have
some worthwhile possibilities in some, though not all, of the **selfStart**
modelling functions, especially those in the package `nlraa` (@MiguezNLRAA2021).

There are alternatives to the model above. For example, a simple scaling 
can make the solution to the problem easier to find, such as

**Logistic3S**:
$$ y \approx  100 * c_1 / (1 + 10 * c_2 * exp(- 0.1 * c_3 * t)) $$
Another form is

**Logistic3T**:
$$ y \approx Asym / (1 + exp((xmid - t)/scal)) $$

The functions above are equivalent. Their parameters are related as follows:

$$   Asym =  b_1 = 100 * c_1 $$
$$ exp(xmid/scal)  =  b_2 = 10 * c_2 $$
$$ 1/scal  =  b_3 $$

`nlsr` is programmed to try to find solutions from very poor initial 
parameter values. While
this is not always possible, `nslr` has generally been able to succeed in finding
solutions, even when all parameters are started at 1. Let us compare its results
with those of `nls()` and `nlsLM()`.

### Problem setup
  
```{r ex02set, echo=TRUE}
# model formulas
frmu <- weed ~ b1/(1+b2*exp(-b3*tt))
frms <- weed ~ 100*c1/(1+10*c2*exp(-0.1*c3*tt))
frmt <- weed ~ Asym /(1 + exp((xmid-tt)/scal))
#
# Starting parameter sets
stu1<-c(b1=1, b2=1, b3=1)
sts1<-c(c1=1, c2=1, c3=1)
stt1<-c(Asym=1, xmid=1, scal=1)
```

### Solution attempts with nls()

```{r ex02nls}
unls1<-try(nls(formula=frmu, start=stu1, data=weeddf))
summary(unls1)
snls1<-try(nls(formula=frms, start=sts1, data=weeddf))
summary(snls1)
tnls1<-try(nls(formula=frmt, start=stt1, data=weeddf))
summary(tnls1)
cat("\n")
```
The "singular gradient" results here, with this problem, were the main 
motivation for developing `nlsr`. It is actually the Jacobian matrix 
which is singular, but because the Jacobian is in a sense the "gradient"
of the residuals, R has emitted its particular error report.

### Solution attempts with nlsr tools

```{r ex02nlsr}
library(nlsr)
unlx1<-try(nlxb(formula=frmu, start=stu1, data=weeddf))
print(unlx1) 
pshort(unlx1) # A short form output
snlx1<-try(nlxb(formula=frms, start=sts1, data=weeddf))
# pshort(snlx1)
print(snlx1)
tnlx1<-try(nlxb(formula=frmt, start=stt1, data=weeddf))
# pshort(tnlx1)
print(tnlx1)
ct<-as.list(tnlx1$coefficients) # Need a list to use ct$... in next line
cat("exp(xmid/scal)=",exp(ct$xmid/ct$scal),"\n")
cat("\n")
# explicit residuals (weighted if there are weights)
rtnlx1<-residuals(tnlx1)
print(rtnlx1)
cat("explicit sumsquares =", sum(rtnlx1^2),"\n")
```

`nlsr::nlxb()` has succeeded in finding a solution in all cases from the
poor "all-1s" starts. 

### Solution attempts with minpack.lm

NOTE: The original version of this vignette passed all package checks for Linux,
Windows and (Intel-based) Macintosh computers, but failed on Macintosh machines
using the M1 chip. 

```{r ex02minpack}
library(minpack.lm)
unlm1<-try(nlsLM(formula=frmu, start=stu1, data=weeddf))
summary(unlm1) # Use summary() to get display
unlm1 # Note the difference. Use this form to get sum of squares
# pnls(unlm1)  # Short form of output
snlm1<-try(nlsLM(formula=frms, start=sts1, data=weeddf))
# pnls(snlm1)
summary(snlm1)
tnlm1<-try(nlsLM(formula=frmt, start=stt1, data=weeddf))
if (inherits(tnlm1, "try-error")) {
   cat("Failure to compute solution -- likely singular Jacobian\n")
} else {  
   pnls(tnlm1) # short form to give sum of squares
   summary(tnlm1)
}   
```

### Solution attempts with wrapnlsr() wrapper

```{r ex02wrapnlsr}
## Try the wrapper. Calling wrapnlsr() instead of nlsr() is equivalent
##
unlw1<-try(nlsr(formula=frmu, start=stu1, data=weeddf))
print(unlw1) # using 'unlx1' gives name of object as 'x' only
snlw1<-try(nlsr(formula=frms, start=sts1, data=weeddf))
# pshort(snlx1)
print(snlw1)
tnlw1<-try(nlsr(formula=frmt, start=stt1, data=weeddf))
print(tnlw1)
```

### Commentary

There are several details that may be important for users.

- `nlsr` is set up to use `print()` to output standard errors and singular values
  of the Jacobian (for diagnostic purposes). By contrast, `minpack.lm` and `nls()`
  use `summary()`, which does NOT display the sum of squares, while `print()` 
  gives the sum of squares, but not the standard errors.
- The singular values displayed by `print.nlsr()` (the internal name for the adaptation
  of the generic `print()`) are displayed in a column to the right of the coefficient
  and standard error display, but are NOT specific to the parameters. Their position is
  purely for efficient use of page space. 
- The most common use of the singular values is to gauge how "nearly singular" the 
  Jacobian is at the solution, and the ratio of the largest to smallest of the 
  singular values is a simple but effective measure. In the above example, we note 
  that the scaled logistic has the smallest ratio. 
- The result from `nlsLM` for the transformed model has a very large sum of squares,
  which may suggest that the program has failed. Since neither `nls()` nor `nlsLM()`
  offer the singular values, we can "cheat" and use `nlxb()`, though the Jacobian
  used will be the analytic one used by this last program rather than the forward
  difference approximation that is generally used by the others.
  
```{r ex03, echo=TRUE}  
stspecial<- c(Asym = 35.532,  xmid = 43376,  scal = -2935.4)
# force to exit at once by setting femax to 1 (maximum number of sum of squares evaluations)
getsvs<-nlxb(formula=frmt, start=stspecial, data=weeddf, control=list(femax=1))
print(getsvs)
```

- The trick used above to get singular values is convenient, but it is actually
  quite easy to get the Jacobian from a class `nls` object such as `tnlm1` as follows.
  
```{r ex04, echo=TRUE, eval=FALSE}  
if (inherits(tnlm1, "try-error")) {
   cat("Cannot compute solution -- likely singular Jacobian\n")
} else {  
   Jtm <- tnlm1$m$gradient()
   svd(Jtm)$d # Singular values
}   
```

We see that there are differences in detail, but the more important result is that
two out of three singular values are essentially 0. Our Jacobian is singular, and no 
method of the Gauss-Newton type should be able to continue. Indeed, from this set of
parameters, `nlxb` also stops.

```{r ex05, echo=TRUE}  
stspecial<- c(Asym = 35.532,  xmid = 43376,  scal = -2935.4)
# force to exit at once by setting femax to 1 (maximum number of sum of squares evaluations)
badstart<-nlxb(formula=frmt, start=stspecial, data=weeddf)
print(badstart)
```

### Extracting standard errors from nlsr solutions

While it is straightforward to display coefficients of models with 
their standard errors (SE) and related statistics by **print**ing 
the solution, some users may need to be able
to save and use the quantities, for example, in particular 
presentations of them. A query
from Rohana Ambagaspitiya of the University of Calgary prompted me to 
explain how to do this, as the SE's and related quantities are not 
in the output of `nlxb()` or `nlfb()`,
but are computed in the `summary()` and `print()` functions. Saving 
the SE of the `Asym` coefficient in the `tnlx1` solution above is 
illustrated.

```{r tnlx1se, echo=TRUE, eval=TRUE}  
ssum<-summary(tnlx1)
# str(ssum) # This will display the contents of the summary if we wish.
# The key element is the param array
print(ssum$param)
AsymSE<-ssum$param[1,2]
AsymSE
```


## selfStart options

The form of the logistic given by `frmt` is available as a **selfStart** model,
needing no starting parameters with `nls()` or `minpack.lm::nlsLM()`. 
The code is in base R in location 
` src/library/stats/R/zzModels.R`.

```{r ex06, echo=TRUE}  
frmtss <- weed ~ SSlogis(tt, Asym, xmid, scal)
ssts1<-nls(formula=frmtss, data=weeddf)
summary(ssts1)
require(minpack.lm)
sstm1<-nlsLM(formula=frmtss, data=weeddf)
summary(sstm1)
```

The essence of selfStart models is to provide starting parameters for the iterative methods
used to find the final parameters. However, an examination of the code for the `SSlogis()`
function shows that it makes use of a DIFFERENT solver and returns the final results of
that solver. The computational effort in doing this is, I believe, greater than 
that expended by `nlxb()` from an extremely crude start (all 1's). An alternative
selfStart for this version of the logistic is included in package `nlsr` as 
`SSlogisJN.R`.

Users should be aware that the programming of selfStart models involves 
quite esoteric aspects of R which I find prone to errors and difficult to 
grasp easily. Developers who have built them for us deserve our thanks. A 
particularly important collection of such tools is in the package @MiguezNLRAA2021. 

###  SSlogisJN.R for the 3-parameter logistic

`SSlogisJN.R` uses ideas for selfStart from @Ratkowsky1983 for the 
Logistic3T form of the 
model. It produces only an approximate set of starting parameters, unlike `SSlogis.R`. 
The core of the idea is as follows, where we use the abbreviated output function to
save page space.
  
```{r ex07, echo=TRUE}  
Asymt <- 2*max(weed) # use double the largest weed value as a guess to Asymt
Asymt
pw <- Asymt/weed # intermediate quantities to compute scale and xmidt
pw1<-pw-1
lpw1<-log(pw1)
clin <- coef(lm(lpw1~tt))
clin<-as.list(clin)
scalt <- -1/clin$tt
scalt
xmidt <- scalt*as.numeric(clin[1])
xmidt
library(nlsr)
try1<-nlxb(frmt, data=weeddf, start=c(Asym=1, xmid=1, scal=1))
pshort(try1)
try2<-nlxb(frmt, data=weeddf, start=c(Asym=Asymt, xmid=xmidt, scal=scalt))
pshort(try2)
```

We see a dramatic reduction in the computational effort as measured by residual and 
Jacobian evaluations.

### Running selfStart models with nlxb()

There are two important steps in using selfStart models with `nlxb()`:

- we must provide the starting parameters explicitly to `nlxb()`, which seems
counter to the goal of selfStart models. However, we use function `getInitial()`
to exploit particular selfStart modeling functions.
- we must point to a mechanism to compute the Jacobian, since the
  selfStart function is unlikely to be in the derivatives table of
  analytic or automatic derivatives. Note that Jacobian ("gradient")
  code is generally part of selfStart functions. If code is included
  for the analytic Jacobian, it is likely worth using. However, because
  it is easy to make errors in coding derivatives, it
  may also be worth checking results of such code, for example,
  using tools from package `numDeriv`. I have made more such errors
  over my career than I like to admit.
  
These steps can be automated, and this has been carried out in function
`nlsrSS()`. The automation is built into the functions `nls()` and 
`minpack.lm::nlsLM()`, but I have preferred to make the process explicit
and give it a separate function.

Let us illustrate how to use the `getInitial()` function (part of
base R) to do this.

```{r ex08, eval=TRUE}
frmtssJ <- weed ~ SSlogisJN(tt, Asym, xmid, scal) # The model formula
lstrt <- getInitial(frmtssJ, weeddf) # Here we get the starting parameters
cat("starting parameters:\n")
print(lstrt)
# Next line uses the start. We indicate that the "approximate" Jacobian code is in
#  the selfStart code, though in fact it is not an approximation in this case.
sstx1<-nlxb(formula=frmtssJ, start=lstrt, data=weeddf, control=list(japprox="SSJac"))
print(sstx1)
# If no Jacobian code is included in selfStart module, we can actually use an
#  approximate Jacobian. See "gradient" elements for differences in result.
sstx1a<-nlxb(formula=frmtssJ, start=lstrt, data=weeddf, control=list(japprox="jacentral"))
print(sstx1a)
## We can automate the above in function nlsrSS()
sstSS<-nlsrSS(formula=frmtssJ, data=weeddf)
print(sstSS)
```

### Starting parameters for the Logistic3U model

A similar approach for the Logistic3U model can be used.

```{r ex09, echo=TRUE} 
b1t <- 2*max(weed)
b1t
lpw1<-log(b1t/weed-1)
clin <- as.list(coef(lm(lpw1~tt)))
b3t <- -clin$tt
b3t
b2t <- exp(as.numeric(clin[1]))
b2t
library(nlsr)
try1<-nlxb(frmu, data=weeddf, start=c(b1=1, b2=1, b3=1))
pshort(try1)
try2<-nlxb(frmu, data=weeddf, start=c(b1=b1t, b2=b2t, b3=b3t))
pshort(try2)
```

Once again, there is a significant saving in the number of iterations.
We have not coded a selfStart function for Logistic3U, preferring to 
use the scaled form Logistic3S for which we do not need good starting
values.

## Jacobian computation

`nlxb()` offers more options for how the Jacobian is computed than either `nls()` or
`minpack.lm::nlsLM()`. The choice is made using the `control` list element
`japprox`. If this is NOT specified, `nlxb()` attempts to build Jacobian code
using analytic or automatic derivative codes. This is not, of course, always possible,
and sometimes we will get an "error" message that required information is not in 
the derivatives table. This will be the case when we use a function like `SSlogisJN()`
unless we indicate that the code is available from a selfStart module by using a value 
of `SSJac` for `japprox`. The example also showed how we can specify an approximation
method, which is also useful when a formula does not have analytic derivatives available.
In such cases `control` element `japprox` can be set to one of `jafwd`, `jaback`, 
`jacentral` or `jand`, giving forward, backward, central and package `numDeriv`
approximations. I recommend `jacentral` as a reasonable compromise between 
accuracy of approximation and amount of computational work. 

## Bounds constraints on parameters

In some cases, we know that parameters cannot be bigger or smaller than some
externally known limits. Cash reserves cannot be negative. Animals have a 
minimum need for water. Airplanes cannot carry more than a known or 
legislated cargo weight. Such limits can be built into models, but there 
are some important details for using the tools in R.

- `nls()` can only impose bounds if the `algorithm="port"` argument is 
used in the call. Unfortunately, the documentation warns us:

  *The algorithm = "port" code appears unfinished, and does not even 
   check that the starting value is within the bounds. Use with 
   caution, especially where bounds are supplied.*

- `nlsLM()` includes bounds in the standard call, but I have observed cases
   where it fails to get the correct answer. From my examination of the code,
   I believe the authors have not taken into account all possibilities, though
   it should be noted that I regard all programs as having some weakness in
   regard to constrained optimization. Software creators have to work with
   assumptions on the extremity of scale that they are willing to countenance,
   and sometimes problems will be outside the scope envisaged.

```{r ex10, echo=TRUE} 
# Start MUST be feasible i.e. on or within bounds
anlshob1b <- nls(frms, start=sts1, data=weeddf, lower=c(0,0,0),
             upper=c(2,6,3), algorithm='port')
pnls(anlshob1b) #  check the answer (short form)
# nlsLM seems NOT to work with bounds in this example
anlsLM1b <- nlsLM(frms, start=sts1, data=weeddf, lower=c(0,0,0), upper=c(2,6,3))
pnls(anlsLM1b)
# also no warning if starting out of bounds, but gets good answer!!
st4<-c(c1=4, c2=4, c3=4)
anlsLMob <- nlsLM(frms, start=st4, data=weeddf, lower=c(0,0,0), upper=c(2,6,3))
pnls(anlsLMob)
# Try nlsr::nlxb()
anlx1b <- nlxb(frms, start=sts1, data=weeddf, lower=c(0,0,0), upper=c(2,6,3))
pshort(anlx1b)
```

## Functional specification of nonlinear least squares problems

We illustrate how to solve nonlinear least squares problems where the set of
functions to be squared is provided by an R function, as is the Jacobian
matrix. Note that `nlsr::nlfb()` gets this information from the object
returned by the `jacfn` argument in the "gradient" attribute of that object.
This is a programming artifice to allow a simplification of the `nlxb()` code.
The example is the well-known Banana shaped valley of @Rosenbrock1960.

```{r exrosen, eval=TRUE}
frres<-function(x) { ## Rosenbrock as residuals
    x1 <- x[1]
    x2 <- x[2]
    res<-c( 10 * (x2 - x1 * x1), (1 - x1) )
}

frjac<-function(x) { ## Rosenbrock residual derivatives matrix
    x1 <- x[1]
    x2 <- x[2]
   J<-matrix(NA, nrow=2, ncol=2)
   J[1,1]<- -20*x1
   J[1,2]<- 10
   J[2,1]<- -1
   J[2,2]<- 0
   attr(J,"gradient")<-J # NEEDED for nlfb()
   J
}
require(minpack.lm)
require(nlsr)
strt<-c(-1.2,1)
rosnlf<-nlfb(strt, resfn=frres, jacfn=frjac)
print(rosnlf)
rosnlm<-nls.lm(par=strt, fn=frres, jac=frjac)
summary(rosnlm)  
```

`nls()` does not lend itself to solving problems via functional specification,
so that possibility will not be pursued here.

One use of this functional specification with `nlsr::nlfb()` that is sometimes 
helpful is the solution of **nonlinear equations**. Essentially, if we can compute
"residuals" in the form

$$  resfn(x, Data) $$

where $x$ are our parameters and $Data$ is exogenous information needed to compute
the residual functions, then a result where the residuals are zero has solved a
set of nonlinear equations. There are tools for the nonlinear equations problem,
in particular package `nleqslv`,  @p-nleqslv. However, there may be no solution
to the nonlinear equations, in which case the nonlinear least squares approach 
may be helpful to see if we are "close" or to enable some exploration of the 
problem. 

To assist users in the case that a Jacobian function is not (easily) available,
package `nlsr` offers the possibility of numerical derivatives using functions 
for forward, backward, or central difference approximations, or those from 
the `numDeriv` package. These are used by quoting the names of the approximation
functions, namely `jacfn="jafwd"`, `jacfn="jaback"`, `jacfn="jacentral"`, and
`jacfn="jand"`. My experience is that the forward and backward approximations
fail more often than I would like, and the `numDeriv` approximation works well
but uses too much computational effort. The central difference seems a sensible
compromise.

It is a disappointment that for the extended Rosenbrock problem, the choice 
makes no difference whatever.

```{r exrosenn, eval=TRUE}
# Extended Rosenbrock functional test with numerical Jacobian
require(minpack.lm)
require(nlsr)
strt<-c(-1.2,1)
rosnlff<-nlfb(strt, resfn=frres, jacfn="jafwd")
print(rosnlff)
rosnlfb<-nlfb(strt, resfn=frres, jacfn="jaback")
print(rosnlfb)
rosnlfc<-nlfb(strt, resfn=frres, jacfn="jacentral")
print(rosnlfc)
rosnlfnd<-nlfb(strt, resfn=frres, jacfn="jand")
print(rosnlfnd)
# minpack.lm has built in (likely forward difference) Jacobian
rosnlmn<-nls.lm(par=strt, fn=frres)
summary(rosnlmn)  
```


## Weighted nonlinear regression

Regression attempts to explain a *dependent* (or *predicted*) variable as a function of
*independent* (or *explanatory* or *predictor*) variables and estimated parameters. The
estimation method commonly used is minimization of the sum of squared
residuals. However, these residuals may be weighted. In the tools available
in **R** for nonlinear least squares, weights can be specified which multiply
the **squared** residuals, and we need a weight for each term in the sum.
Each residual is therefore multiplied by the square root of its respective
weight.

Typically, the weights are given as a vector of numbers. A popular choice is
the reciprocal of a measure of the variance of each data observation.
Replicated data allows for sample variances to be used; otherwise we need some
proxy. The concept is that the weights are proportional to our belief that the
observations are correct.

### Static weights

Let us demonstrate with weights that are simply the reciprocal of the
independent variable. This may not make sense for statistical modeling,
but it lets us see that the `residuals()` function returns weighted
residuals.

```{r staticwts}
wts <- 1/weeddf$tt # wts are reciprocal of time value
tnlx1w<-try(nlxb(formula=frmt, start=stt1, data=weeddf, weights=wts))
# pshort(tnlx1)
print(tnlx1w)
ct<-as.list(tnlx1w$coefficients) # Need a list to use ct$... in next line
cat("exp(xmid/scal)=",exp(ct$xmid/ct$scal),"\n")
cat("\n")

rtnlx1w<-residuals(tnlx1w)
print(rtnlx1w)
cat("explicit sumsquares =", sum(rtnlx1w^2),"\n")
```

### Weights that are functions of the model parameters

Possible choices for weights include the reciprocal of the squared residuals
or squared fitted values. Unfortunately, such quantities depend on the
parameter values. Consider our Logistic example. We want to predict
$y$ with

$$  fitted(b_1, b_2, b_3, t) = b_1 / (1 + b_2 * exp(- b_3 * t)) $$
Thus the raw residuals are

$$  rawres(b_1, b_2, b_3, t) = y -  fitted(b_1, b_2, b_3, t) $$
But if the weights are $(1/ fitted(b_1, b_2, b_3, t)$, then the
residuals for which the sum of squares is to be calculated are

$$ resid(b_1, b_2, b_3, t) = rawres(b_1, b_2, b_3, t) * (1/fitted(b_1, b_2, b_3, t) $$

$$                          =  y/(b_1 / (1 + b_2 * exp(- b_3 * t))) - 1 $$

However, we still need to keep in mind that the user needs to use the formula
for the fitted values to get predictions. While the software needs to minimize
the weighted (hence modified) sum of squares, it also must keep track of the
fitted/predicted values.

`minpack.lm` offers a function `wfct()` that can be used within the argument 
supplied as `weights` in the call to `nlsLM()` or `nls()`, though it usually
fails with `nlsr::nlxb()`. The action of this function was surprising to me, 
as `wfct()` takes an expression as its argument (and may itself be part of
an expression in the argument after the `=` sign). If the argument to 
`wfct` contains `fitted`, `resid` or `error`, then the call to `nlsLM` or
`nls` will trigger a further call to one of these functions, evaluate the
`weights` and then run again to compute the solution. This can be seen by
setting `trace = TRUE`. The process can be verified explicitly for the 
example used in the `wfct` documentation. Unfortunately, `wfct()` appears
to crash `knitr` or `rmarkdown`, so we have evaluated the example outside
of Rstudio and included the output below.

```{r puromycinwfct, echo=TRUE, eval=FALSE}
library(minpack.lm)
library(nlsr) # for pnls
Treated <- Puromycin[Puromycin$state == "treated", ]
# First get raw estimates
wtt3nlm0<-nlsLM(rate ~ Vm * conc/(K + conc), data = Treated, trace=TRUE,
               start = c(Vm = 200, K = 0.05))
fit0<-fitted(wtt3nlm0)
# Static wts using fit0
wtt3nlm0s<-nlsLM(rate ~ Vm * conc/(K + conc), data = Treated, trace=TRUE,
               start = c(Vm = 200, K = 0.05), weights = wfct(1/fit0^2))
pnls(wtt3nlm0s)
# and run directly, noting the 2 phase operation
wtt3nlm<-nlsLM(rate ~ Vm * conc/(K + conc), data = Treated, trace=TRUE,
               start = c(Vm = 200, K = 0.05), weights = wfct(1/fitted^2))
pnls(wtt3nlm)
cat("weights from wtt3nlm\n")
as.numeric(wtt3nlm$weights)
```

```
It.    0, RSS =    1636.59, Par. =        200       0.05
It.    1, RSS =    1205.62, Par. =    211.157  0.0616271
It.    2, RSS =    1195.57, Par. =    212.511  0.0638418
It.    3, RSS =    1195.45, Par. =    212.666  0.0640939
It.    4, RSS =    1195.45, Par. =    212.682  0.0641186
It.    5, RSS =    1195.45, Par. =    212.684   0.064121

It.    0, RSS =    0.22811, Par. =        200       0.05
It.    1, RSS =   0.227222, Par. =    200.913  0.0494747
It.    2, RSS =   0.227221, Par. =    200.847  0.0494303
It.    3, RSS =   0.227221, Par. =    200.842   0.049426
It.    4, RSS =   0.227221, Par. =    200.841  0.0494256
wtt3nlm0s  -- ss= 0.2272213 :  Vm = 200.841  K = 0.04942558; 4  itns
It.    0, RSS =    1636.59, Par. =        200       0.05
It.    1, RSS =    1205.62, Par. =    211.157  0.0616271
It.    2, RSS =    1195.57, Par. =    212.511  0.0638418
It.    3, RSS =    1195.45, Par. =    212.666  0.0640939
It.    4, RSS =    1195.45, Par. =    212.682  0.0641186
It.    5, RSS =    1195.45, Par. =    212.684   0.064121
It.    0, RSS =    0.22811, Par. =        200       0.05
It.    1, RSS =   0.227222, Par. =    200.913  0.0494747
It.    2, RSS =   0.227221, Par. =    200.847  0.0494303
It.    3, RSS =   0.227221, Par. =    200.842   0.049426
It.    4, RSS =   0.227221, Par. =    200.841  0.0494256
wtt3nlm  -- ss= 0.2272213 :  Vm = 200.841  K = 0.04942558; 4  itns
Show in New Window
weights from wtt3nlm
 [1] 3.910941e-04 3.910941e-04 9.460635e-05 9.460635e-05 5.539227e-05 5.539227e-05 
 [7] 3.687173e-05 3.687173e-05 2.745956e-05 2.745956e-05 2.475956e-05 2.475956e-05
```

The documentation of `wfct` suggests that it can also use the name of the
response (dependent) variable or the name of the predictor (independent)
variable. However, some models have more than one independent variable, and
I have not explored what happens in such cases.

It would be helpful if `nlsr` had the capability to include functional weights. 
Duncan Murdoch suggested a patch that allows this. We modify the `nlfb()` routine
to detect and act on functional weights. 


```{r formulawts}
library(nlsr)
Treated <- Puromycin[Puromycin$state == "treated", ]
# "regression" formula
w1frm <- rate ~ Vm * conc/(K + conc)
# Explicit dynamic weighted nlxb
w3frm <- rate/(Vm * conc/(K + conc)) ~ 1
dynweighted <- nlxb(w3frm, data = Treated, start = c(Vm = 201.003, K = 0.04696),
                    trace=TRUE, control=list(prtlvl=0))
pshort(dynweighted)
# Compute the model from the ORIGINAL model (w1frm) using parameters from dynweighted
dyn0 <- nlxb(w1frm, start=coefficients(dynweighted), data=Treated, control=list(jemax=0))
wfdyn0 <- 1/fitted(dyn0)^2 # weights
print(as.numeric(wfdyn0))
pshort(dyn0) # Shows sumsquares without weights, but computes weights we used
formulaweighted <- nlxb(w1frm, data = Treated, start = c(Vm = 201.003, K = 0.04696),
                        weights = ~ 1/fitted^2, trace=TRUE, control=list(prtlvl=0))
pshort(formulaweighted)
wtsfromfits<-1/fitted(formulaweighted)^2
print(as.numeric(wtsfromfits))
print(as.numeric(formulaweighted$weights))
```

There are differences in the (weighted) sums of squares. The **dynweighted** case minimizes
the residuals weighted
by the actual CURRENT fits as expressed by the formula. The **formulaweighted** case
computes the weights at each
iteration from the previous (or original) fit. It is a form of 
iteratively weighted least squares. The case
**wtt3nlm** above uses `nlsLM` with weights specified in `wfct()` and 
computes the weights once from an unweighted
model, then runs a second, statically weighted calculation. Keeping track of these
differences requires a lot of
care, and I advise documenting what is done carefully.

## Models that use multiple functional forms

The query by John Sorkin in 2013 on the R-help list, item
https://stat.ethz.ch/pipermail/r-help/2013-January/344918.html ,
asks about fitting a composite model made up of "pieces". Sorkin's
problem, as described, seems incomplete. We will use two problems
that we can present clearly. 

### Two straight lines

We will generate data for two 
intersecting straight lines. These will intersect at the point
`(kx, ky)`, with slopes `sl` and `sr` on the left and right hand
sides of the point.

```{r 2slA}
xx<-1:50 # simple sequence of 50 points
kx<-24 # Set x coordinate of intersect
ky<-10 # Set y coordinate of intersect
sl<-1 # left slope
sr<-0 # right slope
# now write down model using conditionals
formu<- yy ~ (xx <= kx)*(sl*(xx - kx)+ky) +  (xx > kx)*(sr*(xx - kx)+ky) 
# Generate "exact" model for given parameters
yy0<- (xx <= kx)*(sl*(xx - kx)+ky) +  (xx > kx)*(sr*(xx - kx)+ky) 
ee<-runif(50, -3, 3) # generate some "errors"
ee<-ee-mean(ee) # but center them
yy<-yy0+ee # and add error to the "exact" model
sldf<-data.frame(xx=xx, yy=yy) # make a data frame of the data
# Try to estimate a model from the data and the formula
t2sl<-try(nlxb(formu, data=sldf, start=c(kx=1, ky=1, sl=1, sr=1), trace=TRUE))
# But that fails because the ">" in the formula trips up derivative program
# We can still use approximate Jacobian
t2slx<-nlxb(formu, data=sldf, start=c(kx=1, ky=1, sl=1, sr=1), trace=TRUE,
            control=list(japprox="jacentral"))
t2slx
coef(t2slx)
plot(xx, yy)
yline<-fitted(t2slx)
lines(xx, yline, type='l', col='red')
```

### The WOOD test function

Let us try an example -- the Wood test function (Problem 14 in @More1981TUO).
Some authors refer to this as the Colville function 
(https://www.sfu.ca/~ssurjano/colville.html)

This problem is usually stated as a function minimization in
4 parameters. This is 

$$ f(x) = 100*({x_1}^2-x_2)^2+(x_1-1)^2+(x_3-1)^2+  \\
    90*({x_3}^2-x_4)^2+10.1((x_2-1)^2+(x_4-1)^2)+  \\
    19.8*(x_2-1)*(x_4-1)$$


It is, however, a nonlinear least squares problem for which we can write the
residuals as a set of formulas.

```{r woodform1}
library(nlsr)
library(minpack.lm)
frm<- 0 ~ (x == 1) * (10 * (`par[2]` - `par[1]` * `par[1]`)) + 
  (x == 2) * (1 - `par[1]`) + 
  (x == 3) * ((`par[4]` - `par[3]` * `par[3]`) * sqrt(90)) + 
  (x == 4) * (1 - `par[3]`) + 
  (x == 5) * ((`par[2]` + `par[4]` - 2) * sqrt(10)) + 
  (x == 6) * ((`par[2]` - `par[4]`) * sqrt(0.1))

testnls<- try(nls(frm, data=data.frame(x = 1:6), 
  start= c(`par[1]` = -3, `par[2]` = -1, `par[3]` = -3, `par[4]` = -1),
  trace=TRUE, control=list(maxiter=100)))

testnlxb<- try(nlxb(frm, data=data.frame(x = 1:6), 
              start= c(`par[1]` = -3, `par[2]` = -1, `par[3]` = -3, `par[4]` = -1),
              trace=TRUE))

testnlxbn<- try(nlxb(frm, data=data.frame(x = 1:6), 
              start= c(`par[1]` = -3, `par[2]` = -1, `par[3]` = -3, `par[4]` = -1),
              trace=TRUE, control=list(japprox="jacentral")))

testnlsLM<- try(nlsLM(frm, data=data.frame(x = 1:6), 
                start= c(`par[1]` = -3, `par[2]` = -1, `par[3]` = -3, `par[4]` = -1),
                trace=TRUE, control=list(maxiter=100)))
```

The results above for `nlsr::nlxb()` accord with the accepted solution, but
a large number of iterations are used. The Wood problem is,
however, known to be difficult. 

Both `nls()` and `minpack.lm::nlsLM()` required the specification of more than
the default number of "iterations" (essentially Jacobian evaluations, which
`nlsr` limits with the control parameter `jemax` for which the default is a
very aggressive 5000). Note that `nls()` did NOT terminate except on this limit.
I have not determined, as yet, the precise cause of this.

## Ongoing efforts

The examples above show how to use package `nlsr` as it exists at the beginning
of February 2024. The story, however, is not finished. It would be very 
nice to be able to identify and work with models that are partially 
linear. Solving such models is possible with the `algorithm="plinear"`
option of `nls()`, but identifying which parameters enter linearly is 
not easy for most users. Moreover, the specification
of the model to the `VARPRO` solver that is called by the `plinear` 
option is inconsistent with the general specification used by `nls()` and 
other nonlinear least squares tools for R.

There are also a number of possible tweaks to details in `nlsr` and 
efforts to include such improvements are a continuing interest.
I welcome communication and collaboration to continue the improvement 
of the package and R in general. 


## References