
With dummies, for each category of you model you receive a weight, a real floating point value. But is there any automated process, since identifying categories, not just two, in large data is almost impossible. In this case it is easy to identify the different patterns, depending on sex, and build then 2 regression lines for the 2 categories. Now if we take the whole population, correlations are almost null, but if we compute correlations or regression on a women and men samples, they are almost perfect. The smaller the cars they drive, the higher their income.

Here the example to illustrate: Men(x1) with higher income(y) drive powerful cars(x2). I wanted to test before applying to my data pool. I built an example to check how the multi regression model reacts but as expected it doesnt produce anything usable. You can now perform multiple regression on the X data in range F3:I19 and Y data in range J3:J19 using the Linear Regression data analysis tool.Ĭan you please tell me whether a regression models exists for this situation where a dependent variable is affected from the independent variables differently whether it belongs to a category or not. The output is as shown in range F3:J19 of Figure 1.
#Xlminer analysis toolpak how to get p value three variable code
Now click on Gender from the list and press the Add Code button and, finally, click on Income from the list and press the Add Column button. As you can see, the Party#1 and Party#2 variables have been added to the worksheet. The result is shown on the left side of Figure 5. Since the Ordinary coding option was selected, the 0, 1 coding is used. Next, select Party from the list box and press the Add Code button (as shown on the right side of Figure 5). Now, select Age in the list box and press the Add Column button (to copy the Age data to the output). Now enter A3:D19 into the Input Range in the dialog box (as shown on the right side of Figure 4) and press the OK button.įigure 4 – Coding the Party variable via the data analysis tool You can use the Real Statistics Extract Columns from a Data Range data analysis tool to automate the coding of categorical variables.įor example, to create the coding for the Party and gender variables from Example 1, press Ctrl-m and select Extract Columns from a Data Range from the menu. Similarly, we see that the model forecasts that a 40 year-old man who is Independent will have an income of 52,697 (cell J23).įigure 3 – Forecasting with categorical data From Figure 3, you can see that the model forecasts that this person would have an income of 24,494 (cell J22) as calculated by the formula =TREND(J4:J19,F4:I19,F22:I22). The model can predict the income of a 25 year old woman who is a Democrat, provided you recognize that the coding is Age = 25, Gender1 = 0, Party1 = 0, Party2 = 1.


The output from the Real Statistics Linear Regression data analysis tool on this input is shown in Figure 2.įigure 2 – Regression with categorical data We can now perform regression analysis on this range. The resulting coding is as shown in range F3:J19 of Figure 1. Party2 = 1 if Party is Dem and Party2 = 0 otherwise Party1 = 1 if Party is Rep and Party1 = 0 otherwise Since Party takes three values (Rep, Dem, Ind), two dummy variables, called Party1 and Party2, are needed to code Party, defined as follows: Gender1 = 1 if Gender is Male and Gender1 = 0 otherwise (i.e. Since Gender takes two values (Male and Female), one dummy variable, called Gender1, is sufficient to code Gender, defined as follows: In general, if the original data has k categorical values, the model will require k – 1 dummy variables.

There are three possible values for the Party affiliation variable and two possible values for the Gender. Each such dummy variable will only take the value 0 or 1 (although in ANOVA using Regression, we describe an alternative coding that takes values 0, 1 or -1).Įxample 1: Create a regression model for the data in range A3:D19 of Figure 1.Īge is a continuous random variable, while Party affiliation and Gender are categorical random variables. Categorical independent variables can be used in a regression analysis, but first, they need to be coded by one or more dummy variables (also called tag variables).
