## Polynomial Regression in CoStat

Polynomial equations have the general form:

y = b_0 + b_1*x + b_2*x^2 + ... + b_n*x^n

where the b's are the coefficients estimated by the regression.

- A linear equation (y = b_0 + b_1*x) is called a first order polynomial.
- A quadratic polynomial equation (y = b_0 + b_1*x + b_2*x^2) is called a second order polynomial.
- A cubic polynomial equation (y = b_0 + b_1*x + b_2*x^2 + b_3*x^3) is called a third order polynomial.
- Higher (4th or 5th) order polynomials are useful for attempts to describe data points as fully as possible, but the terms generally cannot be meaningfully interpreted in any biological or physical sense. Higher order terms can lead to odd and unreasonable results, especially beyond the range of the x values.
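The named orders above are just the general form truncated at different powers. A minimal sketch of evaluating one of them outside CoStat (illustrative only; NumPy is not part of CoStat), using the example quadratic `y = 0.32 + 0.15*x + 0.02*x^2` that appears later on this page:

```python
import numpy as np

# np.polyval expects coefficients highest power first: b2, b1, b0.
quadratic = [0.02, 0.15, 0.32]   # second order: y = 0.32 + 0.15*x + 0.02*x^2

# Evaluate at x = 10: 0.02*100 + 0.15*10 + 0.32 = 3.82
y = np.polyval(quadratic, 10.0)
print(y)  # 3.82
```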
Polynomial curves can also be generated via `Graph : Dataset : Representations` in CoPlot, or by other methods (for example, `Transformations : Smooth`), too.
There must be at least two numeric columns of data; you can designate any column as the x column and any column as the y column. Rows of data with missing values in the x or y column are rejected.
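The row-rejection rule can be sketched as follows (a hypothetical Python illustration; CoStat performs this internally and uses its own missing-value marker, here represented as None or NaN):

```python
import math

def usable_rows(x_col, y_col):
    """Keep only (x, y) pairs where neither value is missing."""
    def missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))
    return [(x, y) for x, y in zip(x_col, y_col)
            if not missing(x) and not missing(y)]

# Rows 2 and 3 have a missing value in y or x, so they are rejected.
rows = usable_rows([1, 2, None, 4], [2.0, float("nan"), 8.0, 17.0])
print(rows)  # [(1, 2.0), (4, 17.0)]
```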
- `X Column:` - Choose the x column from a list of the columns.
- `Y Column:` - Choose the y column (the dependent variable) from a list of the columns.
- `Degree:` - Specify the polynomial order. For example, `Degree=2` will generate a quadratic equation (for example, `y = 0.32 + 0.15*x + 0.02*x^2`).
- `Keep If:` - Lets you enter a boolean expression (for example, `(col(1)>50) and (col(2)<col(3))`). Each row of the data file is tested. If the expression evaluates to `true`, that row of data will be used in the calculations. If `false`, that row of data will be ignored. See "Using Equations", "the `A` button", and "the `f()` button".
- `Calculate Constant:` - In most cases `checked` is appropriate. `Not checked` will produce a curve passing through the origin (x=0, y=0).
- `Print Residuals:` - Prints the X values, Y observed, Y expected, and Residual (Y observed - Y expected). These are commonly printed so you can see if the residuals appear to be random (that's good) or if there is some trend (that's bad; maybe some other type of equation is more suitable).
- `Save Residuals:` - Lets you optionally insert two new columns in the data file with the expected Y's and the residuals. You can then use CoPlot to plot `X` vs. `Y Observed` and `Y Expected`, or plot `X` vs. the residuals.
- `OK` - Press this to run the procedure when all of the settings above are correct.
- `Close` - Close the dialog box.
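The `Keep If:` filter described above can be sketched in Python (illustrative only; `col()` is a hypothetical helper emulating CoStat's 1-based column syntax):

```python
def keep_if(rows, expr):
    """Keep only the rows for which the boolean expression is true;
    the rest are ignored, mirroring the Keep If: dialog field."""
    return [row for row in rows if expr(row)]

def col(row, n):
    """CoStat's col(n) is 1-based; Python lists are 0-based."""
    return row[n - 1]

# Emulate the manual's example expression: (col(1)>50) and (col(2)<col(3))
data = [[60, 1, 2],   # kept:    60 > 50 and 1 < 2
        [40, 1, 2],   # ignored: col(1) <= 50
        [70, 5, 3]]   # ignored: col(2) >= col(3)

kept = keep_if(data, lambda r: col(r, 1) > 50 and col(r, 2) < col(r, 3))
print(kept)  # [[60, 1, 2]]
```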
The data for the sample run is a made-up set of x and y data points:

```
PRINT DATA                                     2000-08-04 16:17:44
Using: c:\cohort6\expdata.dt
First Column: 1) X
Last Column: 2) Y
First Row: 1
Last Row: 8

        X         Y
--------- ---------
        1         2
        2       3.5
        3         8
        4        17
        5        28
        6        39
        7        54
        8        70
```

For the sample run, use:

- From the menu bar, choose: `Statistics : Regression : Polynomial regression`
- `X Column: 1) X`
- `Y Column: 2) Y`
- `Degree: 2`
- `Keep If:`
- `Calculate constant: (checked)`
- `Print Residuals: (checked)`
- `Save Residuals: (don't)`
- `OK`
```
REGRESSION: POLYNOMIAL                         2002-09-26 16:11:26
Using: C:\cohort6\expdata.dt
X Column: 1) X
Y Column: 2) Y
Degree: 2
Keep If:
Calculate Constant: true

Total number of data points = 8
Number of data points used = 8

Regression equation:
y = 0.54464285714 -0.5625*x^1 +1.16369047619*x^2

R^2 is the coefficient of multiple determination.  It is the
fraction of total variation of Y which is explained by the
regression: R^2=SSregression/SStotal.  It ranges from 0 (no
explanation of the variation) to 1 (a perfect explanation).
R^2 = 0.99893689645

For each term in the ANOVA table below, if P<=0.05, that term was
a significant source of Y's variation.

Source                   SS            df       MS        F         P
------------------------ ------------- -------- --------- --------- ---------
Regression               4352.83630952        2 2176.4182 2349.1054 .0000 ***
  x^1                    4125.33482143        1 4125.3348 4452.6582 .0000 ***
  x^2                    227.501488095        1 227.50149 245.55252 .0000 ***
Error                    4.63244047619        5 0.9264881
------------------------ ------------- -------- --------- --------- ---------
Total                    4357.46875           7

Table of Statistics for the Regression Coefficients:

Column                   Coef.     Std Error t(Coef=0) P         +/-95% CL
------------------------ --------- --------- --------- --------- ---------
Intercept                0.5446429 1.342886  0.4055764 .7018 ns  3.4519984
x^1                      -0.5625   0.6846597 -0.821576 .4487 ns  1.7599737
x^2                      1.1636905 0.0742618 15.670116 .0000 *** 0.190896

Degrees of freedom for two-tailed t tests = 5
If P<=0.05, the coefficient is significantly different from 0.

Residuals:
      Row             X    Y observed    Y expected      Residual
--------- ------------- ------------- ------------- -------------
        1             1             2 1.14583333333 0.85416666667
        2             2           3.5  4.0744047619 -0.5744047619
        3             3             8 9.33035714286 -1.3303571429
        4             4            17 16.9136904762 0.08630952381
        5             5            28 26.8244047619  1.1755952381
        6             6            39       39.0625       -0.0625
        7             7            54 53.6279761905 0.37202380952
        8             8            70 70.5208333333 -0.5208333333
```

If the constant term is not calculated (uncheck that checkbox), the curve will be forced through the origin.
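This run can be reproduced outside CoStat, for instance with NumPy's least-squares polynomial fit (a sketch, not CoStat's actual implementation; the last few printed decimal places may differ due to floating-point rounding):

```python
import numpy as np

x = np.arange(1.0, 9.0)                  # X column: 1..8
y = np.array([2, 3.5, 8, 17, 28, 39, 54, 70])

# Degree-2 least-squares fit with a constant term.
# np.polyfit returns coefficients highest power first: b2, b1, b0.
b2, b1, b0 = np.polyfit(x, y, 2)

y_hat = b0 + b1 * x + b2 * x**2
residuals = y - y_hat                    # the printed "Residual" column

# R^2 = SSregression/SStotal, with SStotal measured about mean(y).
ss_total = np.sum((y - y.mean()) ** 2)   # 4357.46875 in the printout
ss_error = np.sum(residuals ** 2)
r_squared = 1.0 - ss_error / ss_total

print(b0, b1, b2, r_squared)
```

The fitted coefficients match the printed equation `y = 0.54464285714 -0.5625*x^1 +1.16369047619*x^2`, and `r_squared` matches `R^2 = 0.99893689645`.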
The results are then:

```
REGRESSION: POLYNOMIAL                         2002-09-26 16:14:38
Using: C:\cohort6\expdata.dt
X Column: 1) X
Y Column: 2) Y
Degree: 2
Keep If:
Calculate Constant: false

Total number of data points = 8
Number of data points used = 8

Regression equation:
y = -0.3076671035*x^1 +1.13870685889*x^2

R^2 is the coefficient of multiple determination.  It is the
fraction of total variation of Y which is explained by the
regression: R^2=SSregression/SStotal.  It ranges from 0 (no
explanation of the variation) to 1 (a perfect explanation).
R^2 = 0.99954387736

For each term in the ANOVA table below, if P<=0.05, that term was
a significant source of Y's variation.

Source                   SS            df       MS        F         P
------------------------ ------------- -------- --------- --------- ---------
Regression               10485.4651595        2 5242.7326 6574.1784 .0000 ***
  x^1                    9787.10294118        1 9787.1029 12272.638 .0000 ***
  x^2                    698.362218282        1 698.36222 875.71848 .0000 ***
Error                    4.78484054172        6 0.7974734
------------------------ ------------- -------- --------- --------- ---------
Total                    10490.25             8

Table of Statistics for the Regression Coefficients:

Column                   Coef.     Std Error t(Coef=0) P         +/-95% CL
------------------------ --------- --------- --------- --------- ---------
x^1                      -0.307667 0.2523271 -1.219318 .2685 ns  0.6174222
x^2                      1.1387069 0.0384795 29.592541 .0000 *** 0.094156

Degrees of freedom for two-tailed t tests = 6
If P<=0.05, the coefficient is significantly different from 0.

Residuals:
      Row             X    Y observed    Y expected      Residual
--------- ------------- ------------- ------------- -------------
        1             1             2 0.83103975535 1.16896024465
        2             2           3.5 3.93949322848 -0.4394932285
        3             3             8  9.3253604194 -1.3253604194
        4             4            17 16.9886413281 0.01135867191
        5             5            28 26.9293359546 1.07066404543
        6             6            39 39.1474442988 -0.1474442988
        7             7            54 53.6429663609 0.35703363914
        8             8            70 70.4159021407 -0.4159021407
```

Note that the Total degrees of freedom equals the number of data points (1 greater than before), since the estimated mean was not used in the regression.
Also note that this R^2 value is higher than the R^2 value for the model with a constant term(!). Remember that R^2 is calculated in a different way when there is no constant term (see "Regression - Details - R^2" and "Regression - Constant term").
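The no-constant fit and its differently-defined R^2 can likewise be sketched in NumPy (an illustration, not CoStat's code; the assumption that the total SS is the uncorrected sum Σy² is based on the printed Total = 10490.25, which equals Σy² for this data):

```python
import numpy as np

x = np.arange(1.0, 9.0)                  # X column: 1..8
y = np.array([2, 3.5, 8, 17, 28, 39, 54, 70])

# Design matrix with NO constant column: y ~ b1*x + b2*x^2
X = np.column_stack([x, x**2])
coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
b1, b2 = coef

y_hat = X @ coef
ss_error = np.sum((y - y_hat) ** 2)

# With no constant term, total SS is taken about zero rather than
# about mean(y), so R^2 = 1 - SSE / sum(y^2).
ss_total = np.sum(y ** 2)                # 10490.25 for this data
r_squared = 1.0 - ss_error / ss_total

print(b1, b2, r_squared)
```

The coefficients match the printed `y = -0.3076671035*x^1 +1.13870685889*x^2`, and the uncorrected-total definition reproduces `R^2 = 0.99954387736`.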