Interaction Effects in ANOVA using R
I would like to estimate and test interaction effects, but I am unable to do so. The data consist of three factors, each with three levels, and one response:
X1 = c("250", "250", "250", "300", "300", "300", "350", "350", "350")
X2 = c("1.5", "3.0", "4.5", "1.5", "3.0", "4.5", "1.5", "3.0", "4.5")
X3 = c("0.24", "0.34", "0.44","0.34", "0.44", "0.24", "0.44", "0.24", "0.34")
Y = c("0.317", "0.377", "0.400", "0.426", "0.513", "0.408", "0.541", "0.444", "0.606")
df = data.frame(X1, X2, X3, stringsAsFactors = TRUE)
df$Y = as.numeric(Y)
Performing an ANOVA, we find no significant effect of the factors on the response.
anova(lm(Y ~ X1+ X2+ X3, data = df))
Response: Y
Df Sum Sq Mean Sq F value Pr(>F)
X1 2 0.041173 0.0205863 8.6594 0.1035
X2 2 0.002867 0.0014333 0.6029 0.6239
X3 2 0.015650 0.0078250 3.2915 0.2330
Residuals 2 0.004755 0.0023773
If I try to look at interactions I get a warning message.
anova(lm(Y ~ X1+ X2+ X3 + X1*X2 + X1*X3 + X2*X3, data = df))
Warning message:
In anova.lm(lm(Y ~ X1 + X2 + X3 + X1 * X2 + X1 * X3 + X2 * X3, data = df)) :
ANOVA F-tests on an essentially perfect fit are unreliable
Can anybody help me to estimate and test interactions?
Thanks!
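For context on the warning: this is a 9-run design, so the three main effects (2 df each) plus the intercept already use 7 of the 9 degrees of freedom, and any two-way interaction between two 3-level factors would need 4 more. A minimal sketch, assuming the `df` data frame built above, that makes the saturation visible:

```r
# Main-effects model: 9 observations - 7 parameters = 2 residual df.
fit_main <- lm(Y ~ X1 + X2 + X3, data = df)
fit_main$df.residual   # 2

# Adding even one two-way interaction saturates the design: the model
# matrix spans all 9 runs, leaving 0 residual df, which is why anova()
# warns about an "essentially perfect fit".
fit_int <- lm(Y ~ X1 + X2 + X3 + X1:X2, data = df)
fit_int$df.residual    # 0
```

With 0 residual df there is no error variance left to test against, so interactions cannot be tested from these 9 runs alone; replicates or a larger design would be needed.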
See also questions close to this topic

R Panel Data Reg, RE/FE equal
I am trying to analyse a real-world dataset with panel data regression, but I am getting some strange results which I am not sure how to explain.
In short, the Hausman test returns exactly 1, and the estimates of the fixed-effects (FE) model and the random-effects (RE) model are exactly equal. I know that a true RE model can be estimated consistently with an FE approach, but I am confused that the estimates are not just close but identical, and that the Hausman test shows 1 (which I consider unrealistic for real-world data).
Here is what I am doing:
data_ = read.csv("Output_final.csv",header = T, sep = ";")
(some subsets)
d1 = plm.data(data, c("city","year"))
within = plm(costs ~ city + people + .... , model = "within", effects = "individual", data = d1)
random = plm(costs ~ city + people + .... , model = "within", effects = "random", data = d1)
All the time-invariant independent variables drop out of the FE model, but the others give me the same estimates as the RE model, plus estimates for the individual dummies.
I have read some tutorials but have not found a useful answer. So, my question is: am I doing something wrong in R (the plm function) that causes this result? If not, I have to look for another explanation (maybe a statistical one, e.g. strong clusters).
I am grateful for every hint! Thanks
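One detail worth double-checking in the snippet above: both plm() calls pass model = "within", so the "random" object is in fact a second fixed-effects fit, which would by itself make the two sets of estimates identical. A hedged sketch of what separate FE and RE fits plus a Hausman test would look like (costs ~ people stands in for the full formula; note that plm's argument is spelled effect, and pdata.frame is the current replacement for plm.data):

```r
library(plm)

d1 <- pdata.frame(data, index = c("city", "year"))
fe <- plm(costs ~ people, data = d1, model = "within", effect = "individual")
re <- plm(costs ~ people, data = d1, model = "random", effect = "individual")

phtest(fe, re)  # Hausman test comparing the FE and RE estimates
```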

Loop for tuber package to extract youtube comments in R
For my master's thesis I want to perform a sentiment analysis in R based on YouTube comments, which I want to extract from different videos.
I created a table containing all the video IDs needed for the get_all_comments() function of the tuber package. After that I created a vector containing all the necessary video IDs (73 in total). However, get_all_comments() is restricted to a single video_id. That is why I tried to write a loop, to avoid writing the function call out 73 times, once per video ID. Sadly, it did not work out. Does anyone know how I could solve this problem, or how to write the loop for this specific issue?
Thank you in advance and best regards!
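A minimal sketch of such a loop, assuming video_ids is the vector of 73 IDs and tuber has already been authenticated via yt_oauth(): get_all_comments() is called once per ID and the per-video data frames are stacked into one.

```r
library(tuber)

# video_ids: character vector of the 73 video IDs (assumed to exist already)
comment_list <- lapply(video_ids, function(id) {
  get_all_comments(video_id = id)
})
all_comments <- do.call(rbind, comment_list)
```

If individual videos fail (e.g. comments disabled), wrapping the call in tryCatch(..., error = function(e) NULL) keeps the loop running.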

random.forest.importance and XTS package
I am quite new to R coding, the TTR/XTS package and random.forest.importance functions.
I am extracting trading data using the xts function, calculating whether the difference between Close and Open is positive/negative/flat, applying a handful of technical indicators using the TTR package, and then combining the indicators to feed the random.forest.importance function.
When I run the code, I get: Error in model.frame.default(formula, data, na.action = NULL) : variable lengths differ (found for 'Close').
Data:
Date     Time     Open    High    Low     Close   TVolume
20171012 14:00:00 1.18462 1.18487 1.18334 1.18347 1165
20171012 15:00:00 1.18351 1.18377 1.18295 1.18347 884
20171012 16:00:00 1.18348 1.18348 1.18265 1.18276 1000
20171012 17:00:00 1.18245 1.18329 1.18242 1.18303 1184
20171012 18:00:00 1.18305 1.18373 1.18284 1.18343 469
20171012 19:00:00 1.18343 1.18343 1.18247 1.18303 886
Code as follows:
pkgs <- c('class', 'gmodels', 'quantmod', 'TTR', 'xts', 'corrplot', 'caret', 'FSelector')
z <- head(tail(hist_r, samples + retro), samples)
z <- as.xts(z[, 2:6], order.by = as.POSIXct(z$Timestamp, origin = '1970-01-01 00:00', tz = 'UTC'))
hist <- getHist(z)
h <- as.xts(hist)
price <- z$Close - z$Open
class <- ifelse(price > 0, 'UP', ifelse(price < 0, 'DOWN', 'FLAT'))
forceindex <- (z$Close - z$Open) * z$TVolume
WillR5 <- WPR(z[, c('High', 'Low', 'Close')], n = 5)
dataset <- data.frame(class, forceindex, WillR5)
dataset <- na.omit(dataset)
dput(head(dataset, 10))
set.seed(5)
weights <- random.forest.importance(class ~ ., dataset, importance.type = 1)
print(weights)
When I run dput, i get the following:
structure(list(Close = structure(c(1L, 3L, 1L, 1L, 1L, 3L, 1L, 3L, 1L, 3L), .Label = c("DOWN", "FLAT", "UP"), class = "factor"), Close.1 = c(12.9382400000007, 0.107400000000227, 1.66915000000001, 0.290530000000006, 1.18667999999979, 0.0752800000000753, 0.244080000000094, 0.0653999999999928, 0.395999999999996, 0.372089999999928)
I would sincerely appreciate any help that anyone can give me.
Many thanks in advance, Kkel

`Summary` multiple LM and GLM objects
I have 50 lm and glm objects in R. I want to get the summary of all of them without having to type summary(lm...) 50 times. Is there a quick way to do it? The names of the objects all begin the same way: lin.mod.t# or lin.mod.s#, where # = 1, 2, ..., 25. I can list all the models using
ls(pattern = "lin.mod")
objects(pattern = "lin.mod")
But I cannot run summary on the output of ls or objects. I tried summary(eval(parse(text = ls(pattern = "lin.mod")))), but this only runs the first one. Any suggestions? Maybe use lapply?
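One way to do this, sketched under the assumption that the models live in the global environment: mget() turns the names returned by ls() into a named list of the objects themselves, and lapply() then applies summary to each.

```r
# Collect all 50 fitted models into a named list, then summarise each.
models <- mget(ls(pattern = "^lin\\.mod"))
summaries <- lapply(models, summary)

summaries[["lin.mod.t1"]]  # look at a single model's summary by name
```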
Interaction effect plots with speedglm output
I am working with a dataset large enough to make it necessary to use algorithms more advanced than stats::glm to compute binomial regression models; glm takes over a week to compute. Large fixed effects binomial regression in R suggests rxGlm and speedglm. I tried both and managed to achieve significant improvements in speed (down to under 2 hours). However, I am facing the following problem: in both cases, I cannot use the effects function plot(effect()) to create interaction plots, because effects cannot deal with these kinds of objects and/or needed data is missing.
I had already posted a question with a bounty on rxGlm (How to plot interaction effects from extremely large data sets (esp. from rxGlm output)), to no avail. So I guess there is no way to do so using rxGlm.
Now my question: can I plot such interaction effect terms using speedglm, and if so, how? 
OpenGL/GLM matrices - get chaser object to face player object
My chaser object will not face the player object. It simply keeps spinning around and won't face the player.
void chaser::Move()
{
    vec3 plannedFacingDirection = normalize(p->GetPosition() - GetPosition());
    vec3 currentFacingDirection = normalize(vec3(0.0f, 0.0f, 1.0f)); // yes, I know it's already normalized
    vec3 axisOfRotation = normalize(cross(currentFacingDirection, plannedFacingDirection));
    float dotProductOfVectors = dot(currentFacingDirection, plannedFacingDirection);
    clamp(dotProductOfVectors, -1.0f, 1.0f);
    float angle = acos(dotProductOfVectors);
    if (plannedFacingDirection != currentFacingDirection)
    {
        if (dotProductOfVectors != 1.0f && dotProductOfVectors != -1.0f)
        {
            transform = rotate(transform, angle, axisOfRotation);
        }
    }
    transform = translate(transform, normalize(currentFacingDirection) * speed);
}
Move() is called in the render() function. p is the player object. 
R vs. SPSS mixed model repeated measures code [from Cross Validated]
NOTE: This question was originally posted on Cross Validated, where it was suggested that it should be asked in StackOverflow instead.
I am trying to model a 3-way repeated measures experiment, FixedFactorA * FixedFactorB * Time[days]. There are no missing observations, but my groups (FactorA * FactorB) are unequal (close, but not completely balanced). From reading online, the best way to model a repeated measures experiment in which observation order matters (because the response mean and variance change in a time-dependent way) and which has unequal groups is to use a mixed model and specify an appropriate covariance structure. However, I am new to mixed models and I am unsure whether I am using the correct syntax to model what I am trying to model.
I would like to do a full factorial analysis, such that I could detect significant time * factor interactions. For example, for subjects with FactorA = 1, their responses over time might have a different slope and/or intercept than subjects with FactorA = 2. I also want to be able to check whether certain combinations of FactorA and FactorB have significantly different responses over time (hence the full three-way interaction term).
From reading online, it seems that AR(1) is a reasonable covariance structure for longitudinal-like data, so I decided to try that. I also saw that one should use ML if one plans to compare two different models, so I chose that approach in anticipation of needing to fine-tune the model. It is also my understanding that the goal is to minimize the AIC during model selection.
This is the code from the log for what I tried in SPSS (with long-form data), which yielded an AIC of 2471:
MIXED RESPONSE BY FactorA FactorB Day
  /CRITERIA=CIN(95) MXITER(100) MXSTEP(10) SCORING(1)
    SINGULAR(0.000000000001) HCONVERGE(0, ABSOLUTE) LCONVERGE(0, ABSOLUTE)
    PCONVERGE(0.000001, ABSOLUTE)
  /FIXED=FactorA FactorB Day FactorA*FactorB FactorA*Day FactorB*Day
    FactorA*FactorB*Day | SSTYPE(3)
  /METHOD=ML
  /PRINT=SOLUTION TESTCOV
  /REPEATED=Day | SUBJECT(Subject_ID) COVTYPE(AR1)
This is what I tried in R, which yielded an AIC of 2156:
require(nlme)
# error fix: https://stats.stackexchange.com/questions/40647/lme-error-iteration-limit-reached
ctrl <- lmeControl(opt = 'optim')  # without this I get the "iteration limit reached" error
fit1 <- lme(RESPONSE ~ Day*FactorA*FactorB,
            random = ~ Day | Subject_ID,
            control = ctrl,
            correlation = corAR1(form = ~ Day),
            data = data, method = "ML")
summary(fit1)
These are my questions:
The SPSS code above yielded a model with AIC = 2471, while the R code yielded a model with AIC = 2156. What is it about the two pieces of code that makes the models different?
From what I described above, are either of these models appropriate for what I am trying to test? If not, what would be a better way, and how would I do it in both programs to get the same results?
Edits
Another thing to note is that I didn't dummy-code my factors. I don't know if this is a problem for either software, or if the built-in coding differs between SPSS and R. I also don't know if this will be a problem for my three-way interaction term.
Also, when I say "factor", I mean an unchanging group or characteristic (like "sex").

difference between unbalanced data and missing data in ANOVA
In a two-way repeated-measures ANOVA, I was told that the aov function cannot deal with unbalanced data. ez::ezANOVA and car::Anova can deal with unbalanced data, but not with missing data. If there is any missing data, one approach is to use a linear mixed-effects model instead. I am confused about the difference between unbalanced data and missing data here in experimental design. My understanding is that in an unbalanced design there are different numbers of observations for different level combinations (for example, in a two-way ANOVA). But I am confused about what missing data means here. Can anyone help clarify this concept? Thank you very much.
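A small sketch of the distinction, using made-up data: an unbalanced design has every factor-level combination observed but with unequal counts, while missing data (in the repeated-measures sense) means some subject lacks an observation for some within-subject level entirely.

```r
# Hypothetical example: 2 conditions, 3 subjects.
# Unbalanced: every condition is observed, but cell counts differ.
unbalanced <- data.frame(subject = c(1, 1, 2, 2, 2, 3),
                         cond    = c("A", "B", "A", "A", "B", "A"))
table(unbalanced$cond)               # A: 4, B: 2 -- all cells filled

# Missing: subject 3 has no observation at all for condition B.
missing <- data.frame(subject = c(1, 1, 2, 2, 3),
                      cond    = c("A", "B", "A", "B", "A"))
with(missing, table(subject, cond))  # a zero cell -> missing data
```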

R, how to do anova in a loop over dataframe?
I would like to do a two-way ANOVA, store the p-value, and then do a Tukey HSD, but I have a problem with the input table. I don't always have complete data, so it is not always possible to perform the ANOVA; I don't know how to make my script skip the incomplete data and keep running. My data looks like this:
https://filebin.net/w5cfuwztae7yk747
In the link there is an example with two accessions, but in the real data there are 3013 accessions, and some of them don't have all light conditions or all genotypes:
67822 AT2G41680 f    HL_f_Dejan58      1.240108e+06 HL AT2G41680 f
70136 AT2G41680 f    HL_f_Dejan_61     3.384010e+06 HL AT2G41680 f
72450 AT2G41680 ntrc HL_ntrc_ Dejan_62 1.410768e+05 HL AT2G41680 ntrc
74764 AT2G41680 ntrc HL_ntrc_Dejan_66  5.642197e+00 HL AT2G41680 ntrc
77078 AT2G41680 ntrc HL_ntrc_Dejan65   3.921952e+05 HL AT2G41680 ntrc
78997 AT2G41680 WT   LL_WT_Dejan_41    1.016001e+07 LL AT2G41680 WT
81433 AT2G41680 WT   LL_WT_Dejan_43    9.320892e+06 LL AT2G41680 WT
83869 AT2G41680 WT   LL_WT_Dejan_49    8.560308e+06 LL AT2G41680 WT
There are 4 genotypes and 4 light conditions. I am trying to do something like this:
AOV <- data.frame()
IDs <- unique(Dejan_all_new_norm$Accession)
for (i in 1:length(IDs)) {
  temp <- Dejan_all_new_norm[Dejan_all_new_norm$Accession == IDs[i], ]
  aov2 <- aov(value ~ genotype + Light + genotype:Light, data = temp)
  AOV <- rbind(as.character(unique(IDs[i])), aov2, AOV)
}
So I want to subset each gene (Accession) and then do an ANOVA, but after this I want to do a Tukey HSD to get something like this:
$`genotype:Light`
                     diff        lwr        upr     p adj
m:FL-f:FL    -7324259.81  -16715470  2066950.5 0.3486778
ntrc:FL-f:FL  1662873.54   -7728337 11054083.9 0.9999998
WT:FL-f:FL   -5219263.59  -13913835  3475307.7 0.7927417
f:HL-f:FL    -4936680.12  -13871535  3998174.3 0.8796738
m:HL-f:FL    -7389937.49  -16324792  1544916.9 0.2496858
ntrc:HL-f:FL -7122962.46  -16057817  1811891.9 0.3102106
I would like to stick with this simple loop from my example, because it seems the easiest way. I will appreciate any help!
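A hedged sketch of one way to make the loop robust, assuming the column names shown above: tryCatch() lets aov() fail for an accession with incomplete factor levels without stopping the loop, and the Tukey table is stored per accession.

```r
results <- list()
for (id in unique(Dejan_all_new_norm$Accession)) {
  temp <- Dejan_all_new_norm[Dejan_all_new_norm$Accession == id, ]
  results[[id]] <- tryCatch({
    fit <- aov(value ~ genotype + Light + genotype:Light, data = temp)
    TukeyHSD(fit)$`genotype:Light`   # the interaction contrasts
  }, error = function(e) NULL)       # skip accessions with incomplete data
}
# names(results)[sapply(results, is.null)] lists the skipped accessions
```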

Data not fitting any distribution
I have a data set that contains fields from a call center operation.
I am looking to do a capability analysis on it. The field "Days request open" is the one I am interested in, but it does not fit any of the available distributions.
All the p-values are < 0.05 or < 0.010.
There is a high number of zeros in this field i.e. 2198 of 3549 records. Is it advisable to remove the zeros?
What would be the best way to determine the distribution and perform a Process Capability Analysis?
Thank you!

How to access the Minitab 16 Customization Tool by command?
I want to create a configuration file to modify Minitab installation when deploying.
I am following the guide to use the command line.
But when I type "setup.exe /admin", I only get the message "invalid command line".
The command "setup /help" works, and I cannot see any "/admin" option listed.
I use Minitab 18; does this version no longer support this command?
Any help would be appreciated.
Thanks.
PS: All I want to do is save the configuration; I cannot save changes such as the font.

Comparing count data with lots of zeroes
I'm not one to search for the most tenuous significant difference I can find, but hear me out.
I have some count data with four groups (3 of these can be combined to one, if necessary), groups A, B, C, and X.
Looking at the means and interval plots, X is clearly greater than the others (in terms of mean value), yet I cannot find any statistical test to back this up. I believe this is partly due to high variability within groups and the large number of zero values.
I have tried normalizing, removing zeroes, parametric and non-parametric tests, and more, with no success!
Any advice would be greatly appreciated as to how to approach this.
Many thanks.
The link below has the raw data. Groups A, B, and C can be combined into one group if it is relevant.
https://drive.google.com/open?id=0B6iQ6J6e2TeU25Rd2hsd0Uxd2c