Interaction Effects in ANOVA using R
I would like to estimate and test interaction effects, but I am unable to do so. The data consist of three factors, each with three levels, and one response:
X1 = c("250", "250", "250", "300", "300", "300", "350", "350", "350")
X2 = c("1.5", "3.0", "4.5", "1.5", "3.0", "4.5", "1.5", "3.0", "4.5")
X3 = c("0.24", "0.34", "0.44","0.34", "0.44", "0.24", "0.44", "0.24", "0.34")
Y = c("0.317", "0.377", "0.400", "0.426", "0.513", "0.408", "0.541", "0.444", "0.606")
df = data.frame(X1, X2, X3, stringsAsFactors = TRUE)
df$Y= as.numeric(Y)
An ANOVA on the main effects shows no significant effect of any factor on the response.
anova(lm(Y ~ X1+ X2+ X3, data = df))
Response: Y
Df Sum Sq Mean Sq F value Pr(>F)
X1 2 0.041173 0.0205863 8.6594 0.1035
X2 2 0.002867 0.0014333 0.6029 0.6239
X3 2 0.015650 0.0078250 3.2915 0.2330
Residuals 2 0.004755 0.0023773
If I try to look at interactions I get a warning message.
anova(lm(Y ~ X1+ X2+ X3 + X1*X2 + X1*X3 + X2*X3, data = df))
Warning message:
In anova.lm(lm(Y ~ X1 + X2 + X3 + X1 * X2 + X1 * X3 + X2 * X3, data = df)) :
ANOVA F-tests on an essentially perfect fit are unreliable
Can anybody help me to estimate and test interactions?
Thanks!
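A degrees-of-freedom count suggests why the warning appears: with 9 runs there are only 8 degrees of freedom left after the intercept, the three main effects use 6 of them, and each two-way interaction between two three-level factors would need 4 more. The sketch below re-enters the data from the question (with Y typed as numeric) and uses the shorthand (X1 + X2 + X3)^2 for all main effects plus two-way interactions; the names fit_main and fit_int are illustrative only.

```r
# Data from the question; Y is entered as numeric directly
df <- data.frame(
  X1 = factor(rep(c("250", "300", "350"), each = 3)),
  X2 = factor(rep(c("1.5", "3.0", "4.5"), times = 3)),
  X3 = factor(c("0.24", "0.34", "0.44", "0.34", "0.44", "0.24",
                "0.44", "0.24", "0.34")),
  Y  = c(0.317, 0.377, 0.400, 0.426, 0.513, 0.408, 0.541, 0.444, 0.606)
)

fit_main <- lm(Y ~ X1 + X2 + X3, data = df)      # main effects only
fit_int  <- lm(Y ~ (X1 + X2 + X3)^2, data = df)  # plus all two-way interactions

n_obs       <- nrow(df)                    # number of runs
n_requested <- length(coef(fit_int))       # coefficients the model asks for
n_estimable <- fit_int$rank                # coefficients that can be estimated
n_aliased   <- sum(is.na(coef(fit_int)))   # coefficients reported as NA

cat("runs:", n_obs, "\n")
cat("requested coefficients:", n_requested, "\n")
cat("estimable coefficients:", n_estimable, "\n")
cat("aliased (NA) coefficients:", n_aliased, "\n")
cat("residual df, main effects only:", df.residual(fit_main), "\n")
cat("residual df, with interactions:", df.residual(fit_int), "\n")
```

With zero residual degrees of freedom the interaction model reproduces the data exactly, so there is no error term left to test against. Testing interactions in this layout would need additional runs or replicates.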
See also questions close to this topic

2 y-axes Dumbbell ggplot2
I am quite new to R and programming in general. So please forgive my ignorance, I am trying to learn.
I have two sets of data and I would like to plot them against each other. Both have 27 rows and 3 columns; one set is called "range" and the other is called "rangePx". Column “Comp” has the different components, column “Min” is the minimum concentration in % and column “Max” is the maximum concentration in %.
I want to make a dumbbell plot with two y axes, with the components on the y axis and the concentration on the x axis.
I can create a dumbbell plot with one y axis, but I have trouble adding the second y axis.
Here is a snapshot of the "range" data:
head(range)
# A tibble: 6 x 3
  Comp         Min    Max
  <chr>      <dbl>  <dbl>
1 Methane   0.0100 100
2 Ethane    0.0100  65.0
3 Ethene    0.100   20.0
4 Propane   0.0100  40.0
5 Propene   0.100    6.00
6 Propadien 0.0500   2.00
and here is a snapshot of the "rangePx" data:
head(rangePx)
# A tibble: 6 x 3
  Comp           Min    Max
  <chr>        <dbl>  <dbl>
1 Methane   50.0     100
2 Ethane     0.00800  14.0
3 Ethene     0         0
4 Propane    0.00800   8.00
5 Propene    0         0
6 Propadien  0         0
Here is the piece of code that I use:
library(ggplot2)
library(ggalt)
library(readxl)
theme_set(theme_classic())
range <- read_excel("range.xlsx")
rangePx <- read_excel("rangePx.xlsx")
p <- ggplot(range, aes(x = Max, xend = Min, y = Comp, group = Comp))
p <- p + geom_dumbbell(color = "blue")
p
px <- ggplot(rangePx, aes(x = Max, xend = Min, y = Comp, group = Comp))
px <- px + geom_dumbbell(color = "green")
p <- p + geom_dumbbell(aes(y = px, color = "red"))
p
and here is the error I get when I call p:
Error: Aesthetics must be either length 1 or the same as the data (27): y, colour, x, xend, group
The snapshots above show only 6 rows, but my original data are 27x3.
Can anyone help me?
Thanks in advance.

Trouble trying to clean a character vector in R data frame (UTF-8 encoding issue)
I'm having some issues cleaning up a dataset after I manually extracted the data online; I'm guessing these are encoding issues. I am trying to remove the "U+00A0" in the "Athlete" column cells along with the angle brackets. I looked up the corresponding Unicode code point and it is the "No-Break Space". I'm also not sure how to replace the other escaped characters to make the names legible, e.g. getting U+008A to display as Š.
Subset of data
head2007decathlon <- structure(list(Rank = 1:6,
  Athlete = c("<U+00A0>Roman <U+008A>ebrle<U+00A0>(CZE)", "<U+00A0>Maurice Smith<U+00A0>(JAM)", "<U+00A0>Dmitriy Karpov<U+00A0>(KAZ)", "<U+00A0>Aleksey Drozdov<U+00A0>(RUS)", "<U+00A0>Andr<e9> Niklaus<U+00A0>(GER)", "<U+00A0>Aleksey Sysoyev<U+00A0>(RUS)"),
  Total = c(8676L, 8644L, 8586L, 8475L, 8371L, 8357L),
  `100m` = c(11.04, 10.62, 10.7, 10.97, 11.12, 10.8),
  LJ = c(7.56, 7.5, 7.19, 7.25, 7.42, 7.01),
  SP = c(15.92, 17.32, 16.08, 16.49, 14.12, 16.16),
  HJ = c(2.12, 1.97, 2.06, 2.12, 2.06, 2.03),
  `400m` = c(48.8, 47.48, 47.44, 50, 49.4, 48.42),
  `110mh` = c(14.33, 13.91, 14.03, 14.76, 14.51, 14.59),
  DT = c(48.75, 52.36, 48.95, 48.62, 44.48, 49.76),
  PV = c(4.8, 4.8, 5, 5, 5.3, 4.9),
  JT = c(71.18, 53.61, 59.84, 65.51, 63.28, 57.75),
  `1500m` = c(275.32, 273.52, 279.68, 276.93, 272.5, 276.16),
  Year = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "2007", class = "factor"),
  Nationality = c(NA, NA, NA, NA, NA, NA)),
  .Names = c("Rank", "Athlete", "Total", "100m", "LJ", "SP", "HJ", "400m", "110mh", "DT", "PV", "JT", "1500m", "Year", "Nationality"),
  row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))
This is what I've tried so far to no success:
1) head2007decathlon$Athlete <- gsub(pattern="\U00A0",replacement="",x=head2007decathlon$Athlete)
2) head2007decathlon$Athlete <- gsub(pattern="<U00A0>",replacement="",x=head2007decathlon$Athlete)
3) head2007decathlon$Athlete <- iconv(head2007decathlon$Athlete, from="UTF-8", to="LATIN1")
4) Encoding(head2007decathlon$Athlete) <- "UTF-8"
5) head2007decathlon$Athlete <- enc2utf8(head2007decathlon$Athlete)
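One possible approach, sketched under a stated assumption: that the strings really contain the literal text "<U+00A0>", "<U+008A>" and "<e9>" (as they appear in the dput output) rather than the actual characters. In that case a fixed-string replacement avoids the regex meaning of "+". The mapping of U+008A to Š assumes the text originated in Windows-1252, where byte 0x8A is Š; the replacement table is illustrative and covers only the codes visible in the sample.

```r
# Two of the sample names, copied from the dput output above
athlete <- c("<U+00A0>Roman <U+008A>ebrle<U+00A0>(CZE)",
             "<U+00A0>Andr<e9> Niklaus<U+00A0>(GER)")

# ASSUMPTION: the escape codes are literal text, so match them as fixed strings
cleaned <- gsub("<U+00A0>", " ", athlete, fixed = TRUE)       # no-break space -> space
cleaned <- gsub("<U+008A>", "\u0160", cleaned, fixed = TRUE)  # S with caron
cleaned <- gsub("<e9>", "\u00e9", cleaned, fixed = TRUE)      # e with acute

# Collapse repeated spaces and trim the ends
cleaned <- trimws(gsub(" +", " ", cleaned))
print(cleaned)
```

If the column instead holds real no-break space characters that are only printed as <U+00A0>, replacing "\u00a0" directly would be the analogous step.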

Create a plot from boxplot.stats
Someone sent me a file containing the list returned by boxplot.stats.
I now want to reproduce and plot this boxplot from the list (I have stats, n, conf and out). How should I proceed? Can I use plotly for this purpose?
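In base R, bxp() draws a boxplot from summary statistics rather than raw data, so one option is to reshape the boxplot.stats list into the structure that boxplot() returns. The sketch below builds a stand-in list from made-up numbers, since the received file is not shown; the label "sample" is hypothetical.

```r
# Stand-in for the received list (made-up values; replace with the real list)
s <- boxplot.stats(c(1:20, 60))

# bxp() expects matrices with one column per box, plus group indices for outliers
z <- list(
  stats = matrix(s$stats, ncol = 1),
  n     = s$n,
  conf  = matrix(s$conf, ncol = 1),
  out   = s$out,
  group = rep(1, length(s$out)),
  names = "sample"
)

bxp(z)  # draws the box, whiskers and outlier points from the summaries alone
```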

Predictive function on list of lists
I need to run a predictive model where the input is a list of lists and the output is a list of lists. The lists have the following form:
> ListActive[[1]][1]
[[1]]
[[1]][[1]]
            [,1]        [,2]        [,3]        [,4]
[1,] 1.079881191 1.079881191 1.079881191 1.079881191

> ListDevActive[[1]][1]
[[1]]
[[1]][[1]]
                 [,1]             [,2]             [,3]             [,4]
[1,] 0.00005559801813 0.00005512118097 0.00005488276239 0.00005488276239
Each of the lists shown above is composed of 28 lists, each holding 8761 lists, and each of those 8761 contains 1 list.
I need to find a predictive model that reports at what value of ListActive the corresponding ListDevActive equals 0. I have already tried to run a neural network on all the data, but without success. I would highly appreciate any ideas on how to do this efficiently and effectively. As you can see, the data "sometimes" do not change. One condition is that the model should run if any of the values in ListDevActive is higher than 0.05.

Setting Different Levels of constants for categorical variables in R
Will anyone be able to explain how to set constants for different levels of categorical variables in R?
I have read the following: How to set the Coefficient Value in Regression; R and it does a good job of explaining how to set a constant for the whole of a categorical variable. I would like to know how to set one for each level.
As an example, let us look at the mtcars dataset:
df <- as.data.frame(mtcars)
df$cyl <- as.factor(df$cyl)
set.seed(1)
glm(mpg ~ cyl + hp + gear, data = df)
This gives me the following output:
Call:  glm(formula = mpg ~ cyl + hp + gear, data = df)

Coefficients:
(Intercept)         cyl6         cyl8           hp         gear
   19.80268      4.07000      2.29798      0.05541      2.79645

Degrees of Freedom: 31 Total (i.e. Null);  27 Residual
Null Deviance:      1126
Residual Deviance: 219.5    AIC: 164.4
If I wanted to set cyl6 to .34 and cyl8 to 1.4, and then rerun to see how it affects the other variables, how would I do that?
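One way to hold level-specific coefficients fixed is to move them into an offset, which enters the linear predictor with a coefficient of exactly 1. The sketch below assumes the fixed values are contrasts against the baseline level cyl = 4 (so its value is 0); the names fixed and cyl_offset are illustrative.

```r
df <- as.data.frame(mtcars)
df$cyl <- as.factor(df$cyl)

# ASSUMPTION: values are relative to the baseline level "4"
fixed <- c("4" = 0, "6" = 0.34, "8" = 1.4)

# Look up the fixed contribution for each row by its cyl level
df$cyl_offset <- unname(fixed[as.character(df$cyl)])

# cyl is no longer estimated; its effect is supplied through the offset
fit <- glm(mpg ~ hp + gear + offset(cyl_offset), data = df)
print(coef(fit))
```

Only the intercept, hp and gear are estimated here, so their values show how the remaining variables respond to the imposed cyl effects.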

GEE with longitudinal data and a fixed correlation structure
Trying to fit GEEs in R using a "fixed" correlation structure on longitudinal data with a single cluster, I run into various problems. All GEE packages require the cor.struct (when set to "fixed") to have the same number of rows/columns as the data and to be larger than the largest cluster. I tested the following GEE functions.
gee::gee runs indefinitely when supplied with a cor.struct for the longitudinal data. It aborts when using the short cor.struct.
gee(formula = dep ~ var1 * var2, family = gaussian, data = data, id = id, corstr = "fixed", R = cor.struct)
geepack::geeglm uses a cor.struct in vector format (zcor). Using the long format returns a Std.err of 0 for all predictor combinations.
zcor <- rep(cor.struct[lower.tri(cor.struct)], 1)
geeglm(formula = dep ~ var1 * var2, family = "gaussian", data = data, id = id, corstr = "fixed", zcor = zcor)
geeM::geem returns the following when run with the long data. It aborts when using the short cor.struct.
if(determinant(R, logarithm=T)$modulus == Inf){ stop("supplied correlation matrix is not invertible.") }
geem(formula = dep ~ var1 * var2, family = "gaussian", data = data, id = id, corstr = "fixed", corr.mat = cor.struct)
data
There are multiple observations per species. The cor.struct is extracted from phylogenetic distances with ape::vcv. I simply duplicated the entries in the cor.struct for the species that have multiple records in the data. The data below are an extract. The original has about 1900 time series of >800 species.
data <- t(matrix(data=c("sp1", 0, "A", "A1", 1,"sp1", 0.1, "A", "A1", 1,"sp2", 0,"B", "A1", 1,"sp3", 1, "B", "A1", 1, "sp3", 1, "B", "A2", 1,"sp3", 0, "B", "B1", 1),nrow=5,dimnames =list(c("species","dep","var1","var2","id"))))
cor.struct <- matrix(data=c(1,0.86,1,0.80,0.7,1,0.8,0.86,0.7))
cor.struct <- cor.struct[rep(seq_len(nrow(cor.struct)), as.vector(table(data[,1]))),]
cor.struct <- t(cor.struct)
cor.struct <- cor.struct[rep(seq_len(nrow(cor.struct)), as.vector(table(data[,1]))),]
Any help is appreciated on how to fit a GEE on longitudinal data with a fixed correlation structure and a single cluster.

Interpretation of an interaction effect IV*covariate in repeated measures ANOVA
I have performed a 2x2 repeated measures ANOVA for my within-subjects design experiment. I added a few covariates, and SPSS shows a significant interaction effect between the independent variable condition and a continuous covariate. Any clues on how to interpret such an interaction effect?
All help will be very much appreciated!

Correct linear model code in R
Consider a dataset that has 3 treatments x 3 replicate tanks per treatment x 3 snails in each tank. I am looking to see if there is a significant effect of treatment on weight. To do this I would like to use a mixed model ANOVA where treatments are fixed factors and tanks are nested as random factors within treatments. I'm just not sure if I have the correct R code to achieve this.
Which of these options (if any) is correct for this? Thank you.
library(lme4)
Option 1:
lmer(Weight ~ Treatment + (1 | Tank), data = df)
Option 2:
lmer(Weight ~ Treatment + (1 | Treatment/Tank), data = df)
Option 3:
lmer(Weight ~ Treatment + (1 | Treatment:Tank), data = df)
Option 4:
lmer(Weight ~ Treatment + (1 + Tank | Treatment), data = df)
Data:
dput(df)
structure(list(Treatment = structure(c(1L, 1L, 1L, 3L, 3L, 3L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 1L, 1L, 1L), .Label = c("A", "B", "C"), class = "factor"),
  Tank = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9),
  Fish = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3),
  Weight = c(100, 111, 120, 130, 140, 145, 160, 134, 120, 123, 145, 121, 187, 189, 125, 164, 178, 182, 125, 152, 156, 176, 111, 100, 122, 111, 142)),
  .Names = c("Treatment", "Tank", "Fish", "Weight"), row.names = c(NA, -27L), class = c("tbl_df", "tbl", "data.frame"))
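For comparison with the lmer options, the same nested design can be sketched in base R. Because the tanks are numbered uniquely (1 to 9), declaring Tank as an error stratum tests Treatment against tank-to-tank variation instead of against the individual animals. This is a minimal sketch rebuilt from the dput values above; tank_df and fit_aov are illustrative names, not part of the original question.

```r
# Data rebuilt from the dput output: 9 uniquely numbered tanks, 3 animals each
tank_df <- data.frame(
  Treatment = factor(rep(c("A", "C", "B", "A", "B", "C", "C", "B", "A"), each = 3)),
  Tank      = factor(rep(1:9, each = 3)),
  Weight    = c(100, 111, 120, 130, 140, 145, 160, 134, 120,
                123, 145, 121, 187, 189, 125, 164, 178, 182,
                125, 152, 156, 176, 111, 100, 122, 111, 142)
)

# Treatment is fixed; Tank defines the error stratum in which Treatment is tested
fit_aov <- aov(Weight ~ Treatment + Error(Tank), data = tank_df)
print(summary(fit_aov))
```

With uniquely labelled tanks, the groupings (1 | Tank) and (1 | Treatment:Tank) in lme4 describe the same nine groups, so Options 1 and 3 specify the same random effect for these data.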

Exporting 3-way ANOVA TukeyHSD results and then creating a visual matrix from corrplot
I am conducting an ANOVA using this code
HBWallaov <- aov(Value ~ PASelection + Model + Season + Model:Season:PASelection, data = HBWall)
TukeyHSD(HBWallaov)
I am now trying to export the results to a data frame; I am interested in the Model:Season:PASelection interaction. I would like the data frame to have columns such as:
PASelection = Random or SRE
Model = GAM, GLM, RF, MARS
Season = Summer, Winter
Ideally I would like to create a matrix like the one in the linked diagram (diagram of what I'd like my matrix to look like).
To get the data into a data frame, I use:
HBWdf <- aov(Value ~ Model:Season:PASelection, data = HBWall)
df <- TukeyHSD(HBWdf, "Model:Season:PASelection", ordered = TRUE)
as.data.frame(df$Model)
But since the interaction term combines three factors, I can't seem to figure out how to separate them.
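The row names of a TukeyHSD result encode each comparison as two groups joined by "-", with the factor levels inside each group joined by ":". A sketch of splitting them into separate columns follows. HBWall is simulated here because the real data are not shown, only two models are used to keep it small, and the column suffixes _1 and _2 are illustrative. It assumes no level name itself contains "-" or ":".

```r
set.seed(42)
# Simulated stand-in for HBWall (the real data are not shown in the question)
HBWall <- expand.grid(
  Model       = c("GAM", "GLM"),
  Season      = c("Summer", "Winter"),
  PASelection = c("Random", "SRE"),
  replicate   = 1:3
)
HBWall$Value <- rnorm(nrow(HBWall))

fit <- aov(Value ~ Model:Season:PASelection, data = HBWall)
tuk <- TukeyHSD(fit, "Model:Season:PASelection")
res <- as.data.frame(tuk[["Model:Season:PASelection"]])

# Split each row name on "-" into two groups, then each group on ":"
groups <- do.call(rbind, strsplit(rownames(res), "-", fixed = TRUE))
left   <- do.call(rbind, strsplit(groups[, 1], ":", fixed = TRUE))
right  <- do.call(rbind, strsplit(groups[, 2], ":", fixed = TRUE))
colnames(left)  <- paste0(c("Model", "Season", "PASelection"), "_1")
colnames(right) <- paste0(c("Model", "Season", "PASelection"), "_2")

# One row per pairwise comparison, with the factors in separate columns
tukey_df <- data.frame(left, right, res, check.names = FALSE)
print(head(tukey_df))
```

From a table like this, the adjusted p-values could be reshaped into a square group-by-group matrix for plotting.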

How Can I Run a Nonlinear Regression Macro in Minitab (simple syntax error)?
I'm new to scripting Minitab 17 and have run into a snag that I can't find any documentation for, including an error message that brings up no hits on Google. All I want to do is generate macros that perform simple nonlinear regressions automatically, all of which execute just fine in the GUI or through Session Commands. If I follow the directions on p. 10 of the Minitab Macros documentation, copy the commands I've successfully run from the Project Manager/History folder into a .MAC file, and surround them with GMACRO and ENDMACRO commands, I end up with the code below:
GMACRO
NLinear;
Response 'MyColumn1';
Continuous 'MyOtherColumn2';
Parameter "Theta1" 0.5;
Parameter "Theta2" 0.2;
Expectation Theta1 * ln (MyOtherColumn2 - Theta2 );
NoDefault;
TMethod;
TStarting;
TConstraints;
TEquation;
TParameters;
TSummary;
TPredictions.
ENDMACRO
The code between the MACRO statements runs OK from the GUI or as a Session Command. When I run the resulting macro file from the session prompt in Minitab, however, I invariably receive the following error: "Arguments not allowed in all global macro mode." I also receive syntax errors for every column that includes quote marks, even though that is standard session window syntax; I can eliminate these by substituting the column heading from my open worksheet, such as "C1", but can't get past the other error.
I'm obviously using some kind of incorrect syntax element(s) but can't pin them down. Does anyone have any ideas? There are plenty of instructional materials on Minitab macros on the Web, but I haven't yet encountered any that deal with this particular error or that delve much into how to execute ordinary Minitab tests of this kind. My goal is merely to write batch files that will do all my nonlinear regressions during off-hours etc. Thanks in advance.

How to choose the period of a moving average?
In the analysis of time series data covering 1969-1991, I want to wipe out the cyclical fluctuations, but I am having trouble choosing the period of the moving average.
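A common textbook guideline is to take the period of the moving average equal to the length of the cycle, since averaging over one full cycle cancels it. The sketch below illustrates this on a simulated annual series for 1969 to 1991; the trend, the 5-year cycle and all variable names are made up for the illustration.

```r
year  <- 1969:1991
idx   <- seq_along(year)
trend <- 10 + 0.5 * idx                # simulated linear trend
cycle <- 2 * sin(2 * pi * idx / 5)     # simulated cycle with a 5-year period
x     <- ts(trend + cycle, start = 1969)

# Centred moving averages with two candidate periods
ma5 <- stats::filter(x, rep(1 / 5, 5), sides = 2)  # period equals the cycle length
ma3 <- stats::filter(x, rep(1 / 3, 3), sides = 2)  # period shorter than the cycle

# Largest deviation from the underlying trend left after smoothing
err5 <- max(abs(ma5 - trend), na.rm = TRUE)
err3 <- max(abs(ma3 - trend), na.rm = TRUE)
cat("max deviation, 5-year MA:", err5, "\n")
cat("max deviation, 3-year MA:", err3, "\n")
```

In real data the cycle length is not known in advance; plotting the series or its autocorrelation function is one way to estimate it before choosing the period.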

Linear Regression Including Variables in Computed Variable
I have a computed variable that is created from two other variables in my data. If I were to fit a linear regression model with my computed variable as the dependent variable, would I also include the two variables I originally used to create it in the list of independent variables?
A=BC (A is computed from variables B&C)
A is now the dependent variable in my model.
Should B and C be included with the other independent variables in the data set when applying linear regression?
Many thanks in advance for any information!
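A small simulation can show what happens when the components of a computed variable are used as predictors. The sketch below assumes, purely for illustration, that A is the difference B - C (the exact formula is not clear from the question) and adds a hypothetical extra predictor Z; all data are simulated.

```r
set.seed(1)
n <- 50
d <- data.frame(
  B = rnorm(n, mean = 20, sd = 3),
  C = rnorm(n, mean = 10, sd = 2),
  Z = rnorm(n)                 # hypothetical additional predictor
)
d$A <- d$B - d$C               # ASSUMPTION: A is computed as a difference

# Regressing A on its own components reproduces the formula exactly
fit_with <- lm(A ~ B + C + Z, data = d)
print(round(coef(fit_with), 6))

# Largest residual: effectively zero, because A is an exact function of B and C
max_resid <- max(abs(residuals(fit_with)))
cat("largest absolute residual:", max_resid, "\n")
```

Under this assumption the fit is perfect by construction, so the model only recovers the definition of A and says nothing about the other predictors.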