Can the Kruskal-Wallis test be used to test significance of multiple groups within multiple factors?
I have tried to read what I can on the Kruskal-Wallis test and, while I have found some useful information, I still cannot find the answer to my question. I am trying to use the Kruskal-Wallis test to determine the significance of multiple groups, within multiple factors, in predicting a set of dependent variables.
Here is an example of my data:
ID Date Point Season Grazing Cattle_Type AvgVOR PNatGr NatGrHt
181 7/21/2015 B22 late pre Large 0.8 2 20
182 7/21/2016 B32 early post Small 1.0 4 24
In this example, my dependent variables are "AvgVOR", "PNatGr" and "NatGrHt", while the independent variables (factors) are "Season", "Grazing", and "Cattle_Type". As you can see, each of my factors has 2 group levels.
What I am trying to accomplish is to run a nonparametric test that looks at the separate and combined importance of my factor groups for each of my dependent variables. I chose Kruskal-Wallis, and it seems to work for testing one grouping factor at a time.
Here is the result for AvgVOR ~ Grazing:
kruskal.test(AvgVOR ~ Grazing, data = Veg)

        Kruskal-Wallis rank sum test

data:  AvgVOR by Grazing
Kruskal-Wallis chi-squared = 94.078, df = 1, p-value < 2.2e-16
This tells me that AvgVOR differs significantly depending on whether it was recorded pre- or post-grazing.
Is there a way to build a similar model using Kruskal-Wallis that includes all of my grouping factors, even if I have to run a separate one for each dependent variable?
I attempted the following code, but it is flawed.
lapply(Veg[, c("Grazing", "Cattle_Type", "Season")], function(AvgVOR) kruskal.test(AvgVOR ~ Veg))
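As a sketch of what the looping version might look like (assuming the data frame is named `Veg` as above; the variable names are taken from the sample data), separate Kruskal-Wallis tests can be run for every combination of dependent variable and grouping factor:

```r
# Sketch: one kruskal.test per (dependent variable, factor) pair.
dep_vars <- c("AvgVOR", "PNatGr", "NatGrHt")
factors  <- c("Season", "Grazing", "Cattle_Type")

results <- lapply(setNames(dep_vars, dep_vars), function(dv) {
  lapply(setNames(factors, factors), function(f) {
    # kruskal.test(x, g): x is the response, g the grouping factor
    kruskal.test(Veg[[dv]], g = as.factor(Veg[[f]]))
  })
})

# e.g. results$AvgVOR$Grazing holds the test for AvgVOR by Grazing
```

Note that each test still uses a single grouping factor; kruskal.test has no notion of combined or interaction effects, so a rank-based factorial approach (e.g. an aligned rank transform) would be needed for those.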
See also questions close to this topic

Use the SpotfireData package with a non-TERR R engine
I want to read Spotfire binary data into a non-TERR R engine that can handle graphing and other complex packages. So I want to use the SpotfireData package with other, non-TERR R engines. Yet when I try to install it, I get an error:
install.packages("SpotfireData")
Warning in install.packages :
  package ‘SpotfireData’ is not available (for R version 3.4.4)
Has anyone had luck using the SpotfireData package outside of TERR?
I'm using:
> version
platform       x86_64-w64-mingw32
arch           x86_64
os             mingw32
system         x86_64, mingw32
status
major          3
minor          4.4
year           2018
month          03
day            15
svn rev        74408
language       R
version.string R version 3.4.4 (2018-03-15)
nickname       Someone to Lean On
Also, when I switch engines to R 3.4.3, I get the same error:
install.packages("SpotfireData")
Warning in install.packages :
  package ‘SpotfireData’ is not available (for R version 3.4.3)
Also, when I copy/paste the actual SpotfireData package folder into my R 3.4.4 library, I get this error:
library(SpotfireData)
Error in library(SpotfireData) : ‘SpotfireData’ is not a valid installed package

HMM 5' exon-intron junction detector in R
It is known that introns tend to start with the base pair GT. Assume the base composition of the second base of introns is known from previous studies to be 80% T, 10% G, 5% A, and 5% C. Given the following code and the specific observed sequence, define an improved 5' exon-intron junction detector using this information:
install.packages("HMM")
library("HMM")
library(seqinr)
States <- c("Exon", "5site", "Intron")
Symbols <- c("A", "C", "G", "T")
transProbs <- matrix(c('EE' = 0.9, 'E5' = 0.1, 'EI' = 0,
                       '5E' = 0,   '55' = 0,   '5I' = 1.0,
                       'IE' = 0,   'I5' = 0,   'II' = 1.0),
                     c(length(States), length(States)), byrow = TRUE)
emissionProbs <- matrix(c('A' = 0.25, 'C' = 0.25, 'G' = 0.25, 'T' = 0.25,
                          'A' = 0.05, 'C' = 0.0,  'G' = 0.95, 'T' = 0.0,
                          'A' = 0.4,  'C' = 0.1,  'G' = 0.1,  'T' = 0.4),
                        c(length(States), length(Symbols)), byrow = TRUE)
observed_seq <- s2c('CTTCATGTGAAAGCAGACGTAAGTCA')
hmm <- initHMM(States, Symbols, startProbs = c(1, 0, 0),
               transProbs = transProbs, emissionProbs = emissionProbs)
vit <- viterbi(hmm, observed_seq)    # Viterbi best labelling
post <- posterior(hmm, observed_seq) # posterior decoding

Error in library(slam) : there is no package called ‘slam’
I am trying to do text mining in R. I installed many packages, but it keeps showing an error.
It also says Error: could not find function "VCorpus". How can I install the "slam" package correctly? Are there any alternatives for text mining?

How to show parameters in a plot
I want the plot title to show alpha = 2, beta = 2, that is, the actual parameter values:
beta_plot <- function(a, b, ...) {
  x <- seq(0, 1, by = 0.02)
  y <- dbeta(x, shape1 = a, shape2 = b)
  plot(x, y,
       main = expression(paste(alpha, " = ", a, " ", beta, " = ", b)),
       xlab = NA, ylab = NA, ...)
}
beta_plot(2, 2)
but the title shows the literal letters a and b instead of their values.
What should I do?
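For reference, one common way to handle this (a sketch, assuming the same function shape as above) is to build the title with bquote(), whose .() slots substitute the actual values of a and b into the plotmath expression:

```r
# Sketch: bquote() substitutes the numeric values of a and b
# into the plotmath title, unlike expression(), which keeps
# the symbols a and b literally.
beta_plot <- function(a, b, ...) {
  x <- seq(0, 1, by = 0.02)
  y <- dbeta(x, shape1 = a, shape2 = b)
  plot(x, y,
       main = bquote(alpha == .(a) ~ "  " ~ beta == .(b)),
       xlab = NA, ylab = NA, ...)
}
beta_plot(2, 2)  # title now shows the values 2 and 2
```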

Generating Correlated Binomial Variable
Suppose that I have 4 missiles (n = 4), each with hit probability p = 0.5. They are fired at the same time, in the same environment. For that reason, each missile should be correlated with the other three. For example, when corr = 1, all missiles hit or all miss; when corr = 0, hits are distributed binomially and independently.
The challenging part is that the correlation cannot be 1, since there are only two outcomes (miss or hit).
So I want to generate a random discrete binomial value (say, between 0 and 4, with a probability of 0.4 and correlation = 0.6).
My code is below.
n <- 100        # size
p <- 0.4        # probability
corr <- 0.6     # correlation
trial <- 10000  # number of trials
p <- rep(p, n)
rho <- corr
library(bindata)
off <- rmvbin(trial, p, bincorr = (1 - rho) * diag(n) + rho)
off
         [,1] [,2] [,3] [,4]
    [1,]    0    0    1    0
    [2,]    0    0    0    0
    [3,]    1    1    1    1  # This part gives correlated
    [4,]    1    0    1    0  # Bernoulli var for 10000
    [5,]    0    0    0    0  # trial. When you sum each
    [6,]    1    1    1    1  # row, you get random correlated
    [7,]    1    1    1    0  # number of missiles.
    [8,]    0    0    0    0
    ........................
[10000,]    0    0    1    0
cor(off)
          [,1]      [,2]      [,3]      [,4]
[1,] 1.0000000 0.6015165 0.5987907 0.6005857  # correlation is as
[2,] 0.6015165 1.0000000 0.6019365 0.6012273  # demanded.
[3,] 0.5987907 0.6019365 1.0000000 0.5972144
[4,] 0.6005857 0.6012273 0.5972144 1.0000000
But when the size (n) gets bigger, the robustness and accuracy of the code decrease.
Is there any way to generate it? (Generating an integer within an interval with a given correlation.)

Best way to calculate a trend
I am coding an app that allows users to vote (market_trend_up += 1, for example). The app then fetches the accumulated data (trend_up_votes = 632; trend_down_votes = 236), analyzes it, and displays the resulting trend (if up_votes > down_votes { trend = up }).
What would you advise me to do to refresh the trends regularly? I thought about reinitializing the votes every 6 hours, for instance, but then the first voter alone would decide the trend.
Would letting the votes accumulate always provide the current trend? Thank you!

Multivariate multiple regression with constraints on dependent variables in R
I have several dependent variables (exactly 5) and several independent variables. I would like to fit a linear regression model with several dependent variables in R. I saw how to do it with lm:
rm <- lm(cbind(Y1, Y2, Y3, Y4, Y5) ~ X1 + X2 + ... + Xn, data = m)
However, I would also like to add a constraint on the dependent variables: the sum of the Y variables must equal 1 (Y1 + Y2 + Y3 + Y4 + Y5 = 1). I have checked, but I have only found how to constrain independent variables.
Thank you in advance!
Pablo.

Error with plot.mvabund
I would like to contrast abundance data with my predictor variable (age), which is continuous rather than categorical. However, I get an error when I attempt to plot the relationship:
# convert abundance data to a mvabund object
mill_abun_s1 <- mvabund(stand1[, 1])
plot(mill_abun_s1 ~ stand1$Age)
Error in do.call("default.plot.mvformula", foo, allargs, dots) :
  second argument must be a list
I am assuming this is because my predictor variable (Age) is continuous.
How can I rectify this?

Is there a more efficient alternative to Peacock.test?
This package is quite helpful for testing whether there is a difference between a pair of two- or three-dimensional sets, implementing Peacock's 'extension' of the Kolmogorov-Smirnov test. However, it is VERY computationally intensive; the Fasano and Franceschini test was developed specifically to be less so.
This paper establishes a lower bound on Peacock's test of Ω(n^2 lg n) at best; it's not clear whether Peacock.test even implements a range-counting tree to bring efficiency down to that level. Fasano and Franceschini's method uses the quadrants around each point (4n) rather than around pairs of points (4n^2), so it's MUCH more efficient, at O(n lg n)!
I just ran my computer for three DAYS to obtain a batch of results using Peacock.test, so naturally I'd love something more efficient.

Appropriate test statistic
I am trying to compute the appropriate test statistic as part of my hypothesis testing, using this formula:
cor(x.df$gender, x.df$dummy_var) * sqrt(length(x.df$gender) - 2) /
  sqrt(1 - cor(x.df$gender, x.df$dummy_var)^2)
I have a dummy variable (values 1, 0, and NA) and a gender variable (M, F, NA) which I don't know how to transform into numbers. The error says "x" must be numeric. How do I assign female = 1 and male = 0, and do I need to remove the NA values from both variables first? Thanks in advance.
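As a sketch (assuming the gender column holds the character values "M", "F", and NA, and that the hypothetical column name gender_num is acceptable), the recoding and NA handling might look like:

```r
# Sketch: recode gender to 0/1, then drop rows where either
# variable is NA before computing the test statistic.
x.df$gender_num <- ifelse(x.df$gender == "F", 1,
                   ifelse(x.df$gender == "M", 0, NA))

ok <- complete.cases(x.df$gender_num, x.df$dummy_var)
g <- x.df$gender_num[ok]
d <- x.df$dummy_var[ok]

r <- cor(g, d)
t_stat <- r * sqrt(length(g) - 2) / sqrt(1 - r^2)
```

cor() would also accept use = "complete.obs", but length() would then still count the NA rows, so filtering first keeps n consistent between the two parts of the formula.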

Appropriate test statistic: need help finding my mistake
I need to calculate the appropriate test statistic using R, with n − 2 degrees of freedom, where n is the number of cases and r is the sample correlation coefficient. This is the command:
cor(x,y)*sqrt(length(x)-2)/(sqrt(1-(cor(x,y)ˆ2)))
Below is what I input. What went wrong? Thanks for the help in advance.
cor(wvs.df$rel,wvs.df$no)*sqrt(length(wvs.df$rel)-2)/(sqrt(1-(cor(wvs.df$rel,wvs.df$non)ˆ2)))
Error: unexpected input in "cor(wvs.df$rel,wvs.df$non)*sqrt(length(wvs.df$rel)-2)/(sqrt(1-(cor(wvs.df$rel,wvs.df$non)\"

Five-step hypothesis testing: null and alternative
I am trying to perform a correlation test using the five-step hypothesis testing procedure to determine whether there is a strong and statistically significant relationship between 2 variables. It needs to be done in R. The first step is stating the null/alternative hypotheses. Does that mean I have to type in H0: ρ = 0 for the null and Ha: ρ ≠ 0 for the alternative? Is this always the case? Also, how do I type "not equal to", i.e., a crossed = sign (≠)?

Kruskal-Wallis test between sublists of a list in R
I'm pretty new to R. I'm trying to run a Kruskal-Wallis test between data-frame sublists (containing numeric data) within one list, but I keep getting errors.
Each sublist has one column but an unequal number of rows (hence, as far as I know, they can't be stored within one data frame).
data:
data_list <- list(
  tumor = data.frame(c(0.004255040, 0.002703172, 0.007478089, 0.003554968,
                       0.003803952, 0.005225325, 0.004816366, 0.005674340,
                       0.003474605, 0.004784456)),
  t = data.frame(c(0.004326186, 0.008126497, 0.009110830, 0.004030094,
                   0.005784066, 0.006752136, 0.009840556)),
  b = data.frame(c(0.004872971, 0.009066809, 0.005964638, 0.003622466,
                   0.011660714)),
  caf = data.frame(c(0.003618611, 0.007463386, 0.007463134, 0.005453387,
                     0.010409640, 0.012020965))
)
So it looks like this:
$tumor
1  0.004255040
2  0.002703172
3  0.007478089
4  0.003554968
5  0.003803952
6  0.005225325
7  0.004816366
8  0.005674340
9  0.003474605
10 0.004784456

$t
1 0.004326186
2 0.008126497
3 0.009110830
4 0.004030094
5 0.005784066
6 0.006752136
7 0.009840556

$b
1 0.004872971
2 0.009066809
3 0.005964638
4 0.003622466
5 0.011660714

$caf
1 0.003618611
2 0.007463386
3 0.007463134
4 0.005453387
5 0.010409640
6 0.012020965
I've tried many things, all came back with errors and unsuccessful:
> kruskal.test(data_list)
Error in `[.data.frame`(u, complete.cases(u)) : undefined columns selected
> kruskal.test(list(data_list$tumor, data_list$t, data_list$b, data_list$caf))
Error in `[.data.frame`(u, complete.cases(u)) : undefined columns selected
> kruskal.test(list(data_list$tumor[,1], data_list$t, data_list$b[,1], data_list$caf[,1]))
Error in `[.data.frame`(u, complete.cases(u)) : undefined columns selected
> kruskal.test(unlist(data_list))
Error in kruskal.test.default(unlist(data_list)) : argument "g" is missing, with no default
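kruskal.test() accepts a list of plain numeric vectors, so one fix (a sketch, assuming each sublist is a one-column data frame as printed above) is to extract the numeric column from each sublist first:

```r
# Sketch: pull the single numeric column out of each data-frame
# sublist; kruskal.test() then accepts the resulting list directly.
num_list <- lapply(data_list, function(df) df[[1]])
kruskal.test(num_list)

# Equivalent long-format call: one value vector plus a grouping factor.
values <- unlist(num_list, use.names = FALSE)
groups <- rep(names(num_list), times = lengths(num_list))
kruskal.test(values, g = as.factor(groups))
```

Unequal group sizes are fine here; the grouping factor is what kruskal.test was missing in the unlist() attempt.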
Thank you! :)

Post hoc test for Kruskal-Wallis nonparametric test
I am trying to run a post hoc test for the comparisons that showed significant differences in my Kruskal-Wallis test, and I keep getting a variety of errors. I don't think I have my data set up right, as I keep getting errors such as object not found or invalid term in model formula. Here is a sample of the data I am trying to run:
LMWSed: 77 0 3.4 22.7 73.5 79 57 19 16 70
group3: ref ref ref ref low low low low high high high high
The script I used was:
> dunnTest(LMWSed ~ group3, data = tpah, method = "bh")
This returned this error:
Error in eval(expr, envir, enclos) : object 'LMWSed' not found
I also tried it with quotes around the LMWSed, and had this error:
Error in terms.formula(formula, data = data) : invalid term in model formula
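For what it's worth, dunnTest() (assuming it is the one from the FSA package, which has this formula interface and a method = "bh" option) looks its formula variables up in the data frame passed via data =, so both must exist as columns of tpah. A hypothetical sketch of that setup, with the group vector shortened to match the ten response values from the sample, might be:

```r
library(FSA)  # assumed source of dunnTest()

# Hypothetical reconstruction: both variables as columns of one data frame
# (the sample group vector has more entries than the response, so it is
# truncated here purely for illustration).
tpah <- data.frame(
  LMWSed = c(77, 0, 3.4, 22.7, 73.5, 79, 57, 19, 16, 70),
  group3 = factor(c("ref", "ref", "ref", "ref",
                    "low", "low", "low",
                    "high", "high", "high"))
)

dunnTest(LMWSed ~ group3, data = tpah, method = "bh")
```

The unquoted-name form fails with "object not found" when LMWSed is a free-standing vector rather than a column of tpah, and quoting the name produces the "invalid term in model formula" error.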
Thanks for any help in advance.
Jenn

How can I implement the Kruskal-Wallis test in Spark/Scala?
I am trying to implement the Kruskal-Wallis test in Spark with Scala. I know it is possible using Python pandas, and I found code for that approach.
Can anyone suggest some sources or example code?