Professional advice for entrepreneurs and business managers in the context of Europe's recovery from the financial crises. Marketing notes, stories and videos.

Regression

Regression Analysis – predicting the future



Let’s start with the definition of regression: Regression is a prediction equation that relates the dependent (response) variable (Y) to one or more independent (predictor) variables (X1, X2).
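
As a sketch of what such a prediction equation can look like with two predictors, the linear form below is commonly used. The symbols are assumptions introduced here for illustration: β0 is the intercept, β1 and β2 are the coefficients of X1 and X2, and e is the error term.

```latex
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + e
```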

In marketing, regression analysis is used to predict how the relationship between two variables, such as advertising and sales, may develop over time. Business managers can draw the regression line using cases taken from the historical sales data available to them.

The purpose of regression analysis is to describe, predict and control the relationship between at least two variables. The basic principle is to minimise the distance between the actual data and the predictions of the regression line. Regression analysis is used to explain variations in market share, sales and brand preference, normally in terms of variables such as advertising, price, distribution and quality.

Regression analysis is used:

  • To predict the values of the dependent variable
  • To determine whether the independent variables explain significant variation in the dependent variable, that is, whether a relationship between the variables exists
  • To measure the strength of the relationship
  • To determine the structure or form of the relationship

Example:

An online t-shirt sales company invested in Google AdWords advertising:

  • £1000 in January
  • £1000 in February
  • £1000 in March

Their sales grew steadily in this period:

  • £5000 in January
  • £5500 in February
  • £6000 in March

By looking at the regression line, the managers can predict that with the current level of advertising spend (£1000 per month) sales in April will be £6500. This would hold only if all other things remained equal, and in reality they never do. Sales managers should use the predictions from regression analysis as an additional managerial tool and should not rely on them exclusively. The level of sales can be affected by elements other than the level of advertising, including, but not limited to, weather conditions or the central bank’s increase or decrease of base interest rates.

Regression analysis is concerned with the nature and degree of association between variables but does not assume causality (it does not explain why there is a relationship between the variables).

Other good examples of how regression analysis can be used to test marketing-relevant hypotheses are: Can variation in demand be explained in terms of variation in fuel prices? Are consumers’ perceptions of quality determined by their perceptions of price?
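
The April forecast in the t-shirt example can be reproduced with a short sketch. Because advertising spend is constant at £1000, the line here is fitted to sales against time rather than against advertising. Coding the months as 1, 2, 3 and the helper name `fit_line` are assumptions made for illustration only.

```python
# Months coded as 1 = January, 2 = February, 3 = March (an assumed coding).
months = [1, 2, 3]
sales = [5000, 5500, 6000]  # monthly sales in pounds, from the example


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(months, sales)
april_forecast = intercept + slope * 4  # month 4 = April
print(f"sales = {intercept:.0f} + {slope:.0f} * month")
print(f"April forecast: £{april_forecast:.0f}")
```

With only three cases the fit is exact, so the forecast simply extends the existing trend; it says nothing about what would happen if advertising spend changed.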

Regression analysis involves a number of statistics used to determine its accuracy and usefulness for a given purpose. Some of those statistics and methods are described below:

  • Product Moment Correlation (r), also known as Pearson Correlation, Simple Correlation, Bivariate Correlation or the Correlation Coefficient, is a statistic summarising the strength of association between two metric variables (for example, X and Y). It is used to determine whether a linear (straight-line) relationship exists between X and Y, and it indicates the degree to which the variation in one variable (X) is related to the variation in another variable (Y). Covariance (COVxy) is a systematic relationship between two variables in which a change in one implies a corresponding change in the other. The correlation coefficient between two variables is the same regardless of their units of measurement. A value of r close to 1.0, such as r = 0.93, means that one variable is strongly associated with the other. It does not matter which variable is considered dependent and which independent: the correlation of X with Y equals that of Y with X. Because r is designed to measure the strength of a linear relationship, r = 0 does not mean that there is no relationship between X and Y; there could be a non-linear relationship between the two.

  • Residuals – the difference between the observed value of Y and the value predicted by the regression equation.

  • Partial Correlation Coefficient – measures the association between two variables after adjusting for the effect of one or more additional variables. For example: how strongly related are sales to advertising expenditure when the effect of price is controlled?
  • Part Correlation Coefficient – is a measure of the correlation between Y and X when the linear effects of the other independent variables have been removed from X but not from Y.
  • Non-metric Correlation – a correlation measure for two non-metric variables that relies on rankings to compute the correlation.
  • Scatter Diagram (scattergram) – is a plot of the values of two variables for all the cases or observations, with the dependent variable on the vertical axis and the independent variable on the horizontal axis. If the points fall close to a straight line (for example, when one variable increases, so does the other), there is a linear relationship between X and Y.
    [Figure: scattergram. Image sourced from Flat World Knowledge]
  • Least Squares Procedure – is a technique for fitting a straight line to a scattergram by minimising the sum of the squared vertical distances of all the points from the line. The best-fitting line is the regression line. The vertical distance from a point to the line is the error (e).
  • Significance Testing – the significance of the linear relationship between X and Y may be tested by examining two hypotheses:
    • There is no linear relationship between X and Y
    • There is a relationship (positive or negative) between X and Y
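
Several of the statistics listed above (covariance, the product moment correlation, the least squares line and the residuals) can be computed in a few lines. The following is a minimal sketch rather than a definitive implementation: the data are hypothetical figures made up for illustration, and the function names are assumptions.

```python
import math

# Hypothetical monthly data (illustrative only), both in pounds.
advertising = [800, 1000, 1200, 1400, 1600]
sales = [4800, 5400, 5900, 6700, 7200]


def mean(values):
    return sum(values) / len(values)


def covariance(x, y):
    """Sample covariance of x and y."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def pearson_r(x, y):
    """Product moment correlation: covariance scaled by both standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))


def least_squares(x, y):
    """Intercept and slope that minimise the sum of squared vertical distances."""
    slope = covariance(x, y) / covariance(x, x)
    return mean(y) - slope * mean(x), slope


r = pearson_r(advertising, sales)
intercept, slope = least_squares(advertising, sales)

# Residual = observed Y minus the Y predicted by the regression equation.
residuals = [y - (intercept + slope * x) for x, y in zip(advertising, sales)]

# Changing the units (pounds to thousands of pounds) leaves r unchanged.
r_rescaled = pearson_r([a / 1000 for a in advertising], sales)

print(f"r = {r:.3f}, r after rescaling = {r_rescaled:.3f}")
print(f"sales = {intercept:.1f} + {slope:.3f} * advertising")
print("residuals:", [round(e, 1) for e in residuals])
```

Swapping the arguments of `pearson_r` gives the same value, which reflects the point that r does not depend on which variable is treated as dependent.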

The strength of the association is measured by the coefficient of determination, r-square (r²), which is the proportion of the variation in Y explained by X. Significance testing involves testing the significance of the overall regression equation as well as of specific partial regression coefficients.
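
As a rough illustration of r-square and of a significance test for a simple linear relationship, the sketch below uses hypothetical data. It relies on the standard t statistic for a correlation with n − 2 degrees of freedom, which is an assumption brought in here, and the critical value is hard-coded from a standard t table instead of being computed.

```python
import math

# Hypothetical data (illustrative only), both in pounds.
advertising = [800, 1000, 1200, 1400, 1600]
sales = [4800, 5400, 5900, 6700, 7200]

n = len(advertising)
mean_x = sum(advertising) / n
mean_y = sum(sales) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(advertising, sales))
sxx = sum((x - mean_x) ** 2 for x in advertising)
syy = sum((y - mean_y) ** 2 for y in sales)

# Share of the variation in sales explained by the regression line.
r_squared = sxy ** 2 / (sxx * syy)

# Absolute t statistic for the hypothesis "no linear relationship",
# with n - 2 degrees of freedom.
t_stat = math.sqrt(r_squared * (n - 2) / (1 - r_squared))

# Two-tailed 5% critical value for 3 degrees of freedom (standard t table).
T_CRITICAL = 3.182
reject_h0 = t_stat > T_CRITICAL

print(f"r-square = {r_squared:.3f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
```

With so few cases the test has little practical weight; the point is only to show how the hypothesis of no linear relationship is checked against the data.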

Multiple Regression

Multiple Regression is extremely relevant to business analysis. It involves a single dependent variable, such as sales, and two or more independent variables, such as employee remuneration, number of staff, level of advertising and online marketing spend. For example: can variation in sales be explained in terms of variation in advertising expenditures, prices and level of distribution? It is possible to consider additional independent variables to answer the question raised. A statistic relevant to multiple regression is the adjusted r-square (adjusted r²): the coefficient of multiple determination, adjusted for the number of independent variables and the sample size to account for diminishing returns. Other statistics, such as significance tests, also influence the usefulness of the analysis.
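
The adjustment behind adjusted r-square can be sketched as follows. The formula used is the commonly quoted one, 1 − (1 − r²)(n − 1)/(n − k − 1), where n is the number of cases and k the number of independent variables; the function name and the example figures are hypothetical.

```python
def adjusted_r_squared(r_squared, n_cases, n_predictors):
    """Adjust r-square for the number of independent variables and sample size."""
    if n_cases - n_predictors - 1 <= 0:
        raise ValueError("need more cases than predictors plus one")
    return 1 - (1 - r_squared) * (n_cases - 1) / (n_cases - n_predictors - 1)


# Hypothetical figures: the same raw r-square of 0.80 from 30 monthly cases,
# obtained with 1, 3 or 6 independent variables.
for k in (1, 3, 6):
    print(k, round(adjusted_r_squared(0.80, 30, k), 3))
```

For a fixed raw r-square, the adjusted value falls as more independent variables are added, which is how the statistic penalises variables that add little explanatory power.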

Multicollinearity – a state of high inter-correlation among independent variables. When multicollinearity is present, special care is required in assessing the importance of individual independent variables.
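
One simple way to look for multicollinearity between two independent variables is to check how strongly they correlate with each other. The sketch below uses hypothetical spend figures. It also computes the variance inflation factor (VIF), a diagnostic that is not mentioned in the text above and is included here as an assumption; the threshold of 10 is a common rule of thumb, not a fixed standard.

```python
import math

# Hypothetical monthly spend in pounds (illustrative only).
advertising = [800, 1000, 1200, 1400, 1600]
online_marketing = [410, 490, 610, 690, 800]


def pearson_r(x, y):
    """Product moment correlation between two lists of equal length."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r_between_predictors = pearson_r(advertising, online_marketing)

# With exactly two predictors, the VIF of each is 1 / (1 - r^2).
vif = 1 / (1 - r_between_predictors ** 2)

VIF_RULE_OF_THUMB = 10  # assumed threshold, a common convention only
print(f"r between predictors = {r_between_predictors:.3f}, VIF = {vif:.0f}")
print("multicollinearity suspected:", vif > VIF_RULE_OF_THUMB)
```

When the two spend series move together this closely, the regression cannot reliably separate the effect of one from the other, which is why the importance of each variable has to be assessed with care.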

References:

  • Tuk, M., 2012. Regression Analysis, Marketing Analytics. Imperial College London, unpublished.
  • Malhotra, N. K. and Birks, D. F., 2000. Marketing Research: An Applied Approach. European Edition. London: Pearson.
