Sample proportion confidence interval estimates using logit
This seems like a problem that has an accepted, statistically and mathematically sound answer, but I can't seem to find it.
When estimating confidence intervals from sample proportions, I generally use the normal approximation technique described here: https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval#Normal_approximation_interval
However, this fails spectacularly when the sample proportion is close to 0 or 1: the interval is symmetric, so it can extend above 1 or below 0. Since proportion estimates generally "behave better" when modeled on the logit scale, I assume there is some way to apply a logit transform to the confidence interval, resulting in an asymmetric interval that never crosses 0 or 1.
However, rather than hacking together my own technique with freshman calculus and MBA statistics as my highest formal mathematical training, I have been searching the web to see whether such a technique has already been described by someone more qualified.
Is anyone aware of a way to do this?
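To make the idea concrete, this is the sort of construction I am imagining: compute the Wald interval on the logit scale, using the delta-method standard error 1/sqrt(n·p(1−p)), and map the endpoints back through the inverse logit so they stay inside (0, 1). A Python sketch (my own hack, exactly the kind of thing I would prefer a vetted reference for):

```python
import math
from statistics import NormalDist

def logit_interval(successes, n, conf=0.95):
    """Wald interval computed on the logit scale, back-transformed to (0, 1).
    Assumes 0 < successes < n, since logit(0) and logit(1) are undefined."""
    p = successes / n
    logit = math.log(p / (1 - p))
    se = 1.0 / math.sqrt(n * p * (1 - p))   # delta-method SE of logit(p-hat)
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    expit = lambda t: 1.0 / (1.0 + math.exp(-t))
    return expit(logit - z * se), expit(logit + z * se)
```

The result is asymmetric around p-hat and by construction never leaves (0, 1), but I don't know whether this is the accepted technique.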
See also questions close to this topic

Using the flip(a) for n dimensions
Flip Function
a = (x, y)
flip(a) = (x + (y - x), y + (x - y))
e.g. b = (3, 2)
flip(b) = (2, 3)
This changes a = (x, y) into flip(a) = (y, x). This works for all numbers.
For 3 dimensions the formula is: a = (x, y, z), flip(a) = (x + (y - x) + (z - y), y, z - (z - y) - (y - x)), i.e. flip(a) = (z, y, x).
What is the formula for N dimensions? I can see a pattern but I cannot seem to put it into summation notation. Please help; this paper is due in 5 months and I've been struggling with this problem for 4 days.
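For what it's worth, the pattern looks like a reversal of the coordinates: component i of flip(a) is a_(n-1-i), which equals a_i plus a telescoping sum of the successive differences between positions i and n-1-i. A Python sketch (the telescoping form is my own reading of the 2D and 3D formulas, not an established identity):

```python
def flip(a):
    """Reverse a tuple by adding telescoping sums of successive
    differences, mirroring the 2D and 3D formulas in the question."""
    n = len(a)
    out = []
    for i in range(n):
        j = n - 1 - i  # index that position i swaps with
        if j >= i:
            # a[i] + sum_{k=i}^{j-1} (a[k+1] - a[k]) telescopes to a[j]
            s = sum(a[k + 1] - a[k] for k in range(i, j))
        else:
            # past the midpoint the same differences enter with minus signs
            s = -sum(a[k + 1] - a[k] for k in range(j, i))
        out.append(a[i] + s)
    return tuple(out)
```

In other words, flip(a)_i = a_{n-1-i}: the whole operation is just reversal, tuple(reversed(a)).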

pick between several actions using percentage
I have an Object with X hobbies; each hobby has a value between 0 and 100 (representing a percentage), and together they add up to 100.
Every morning I want to run a function that decides what that Object is going to do, and the higher value a hobby has, the more likely they are to be chosen.
I cannot figure out how to translate that last part into code. Using these variables for example:
int fishing = 25;
int sleeping = 25;
int drinking = 50;
int running = 0;
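One way to translate "higher value, more likely" into code is roulette-wheel selection: draw a uniform number in [0, total) and walk the cumulative weights until the draw falls inside some hobby's slice. Sketched here in Python (the dict mirrors the example variables; the same cumulative-sum idea ports directly to Java or C#):

```python
import random

def pick_hobby(hobbies, rng=random):
    """Roulette-wheel selection over a dict of name -> weight.
    Weights need not sum to 100; zero-weight hobbies are never picked."""
    total = sum(hobbies.values())
    r = rng.uniform(0, total)   # uniform draw in [0, total)
    upto = 0
    for name, weight in hobbies.items():
        upto += weight
        if r < upto:
            return name
    # Only reachable in rare floating-point edge cases; fall back to
    # the heaviest hobby.
    return max(hobbies, key=hobbies.get)

hobbies = {"fishing": 25, "sleeping": 25, "drinking": 50, "running": 0}
chosen = pick_hobby(hobbies)
```

Run once each morning; over many mornings, drinking is chosen about twice as often as fishing or sleeping, and running (weight 0) never.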

Hyperthreading on/off in simple python script
I read a bit about hyperthreading and how beneficial it can be for performance. The idea, as I understand it, is that when two independent instruction streams are scheduled on one core, hyperthreading lets them execute at the same time by splitting the core into two virtual (logical) cores.
But imagine this situation: I have a basic Python script doing some maths. It's a single-threaded process and can push a core to 100% for a long time. Since it is always doing the same thing, it cannot benefit from hyperthreading's ability to interleave two streams. What will be the difference in performance with HT on versus off? With HT on, will I get only 50-60% of the process's potential performance, or close to 100%?
My example: I have 2 cores (4 threads), and I'm running one program flat out. Windows Task Manager shows that process using around 32-33% CPU; Process Explorer shows it capped at 25%. So if I didn't have HT on, would it show 50%? I'm not sure whether I'm really using only half of a core's power, whether the numbers are misleading, or what else is going on. I need some explanation.

Interpreting log discontinuity in the density from McCrary test
In the article by Justin McCrary, "Manipulation of the running variable in the regression discontinuity design: A density test" (2008), he estimates the log discontinuity in the density of the Democratic vote share.
He finds a 52% discontinuity but does not elaborate on how to interpret this figure. Intuition might suggest a 52% drop in the density, but that seems unlikely graphically, and it is quite possible for the value to exceed 100%.
- Is it an arbitrary output best used for comparisons?
- What would a value over 100% mean?
Thanks in advance
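If I understand the estimator correctly (an assumption on my part), the reported number is a log difference, not a raw percentage. McCrary's statistic is the discontinuity in the log density at the cutoff $c$:

$$\hat\theta = \ln \hat f(c^{+}) - \ln \hat f(c^{-}),$$

so a value of $0.52$ corresponds to a proportional jump of $e^{0.52} - 1 \approx 68\%$, and a value above $1$ ("over 100%") would simply mean the density more than doubles across the cutoff; nothing is bounded at 100% on the log scale.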

Predicted Probability Calculations for Large Dataframe Following Regression
I have conducted a logistic regression on a binary dependent variable and 5 independent variables. The dataframe I drew these variables from is survey data asking whether a person voted for or against a policy change (the binary dependent variable), with the other variables being questions about their income, location and other personal information that could inform whether they would vote for or against the policy.
Having conducted the regression, I'd now like to calculate the predicted probability that each person would have voted yes/no to see how informative those variables are. In total my dataframe has information on 3000 people and I'd like to calculate the predicted probability of voting for/against for every single row/person.
What methods are available for doing so?
Appreciate the help!
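For the arithmetic itself, the predicted probability for each row is the inverse logit of the linear predictor. A NumPy sketch with made-up data and coefficients standing in for the fitted estimates (in practice the fitted model object's predict method, e.g. predict in statsmodels or predict_proba in sklearn, does this for all rows at once):

```python
import numpy as np

def predicted_probs(X, coefs, intercept):
    """P(y = 1 | x) = 1 / (1 + exp(-(intercept + X @ coefs)))."""
    z = intercept + X @ np.asarray(coefs, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical design matrix: 3000 respondents, 5 predictors
X = np.random.default_rng(1).normal(size=(3000, 5))
# Hypothetical coefficients -- stand-ins for the fitted model's estimates
probs = predicted_probs(X, [0.8, -0.2, 0.1, 0.0, 0.5], intercept=-0.3)
```

The result is one probability per row, which can be compared against the observed yes/no votes to see how informative the variables are.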

Is there a way to get an odds ratio for an interaction term?
I have a glm that describes the effect of various predictors on an output variable. I am using the oddsratio package to quantify the effect size of each predictor. One of the significant terms in the model is an interaction between a categorical and a continuous variable. However, the output from or_glm just gives 1.0000 as the effect size for this interaction term. Clearly the effect isn't null. Does anyone know how to fix this?
code:
a <- glm(data = final, freq ~ res_dist + foraging_trail_cost + predation + res_stoch + K:res_stoch,
         family = 'binomial')
or_glm(data = final, model = a,
       incr = c(K = 270000, foraging_trail_cost = 0.00004, predation = 0.01))
output:
  predictor              oddsratio  CI_low (2.5 %)  CI_high (97.5 %)  increment
1 res_distclustered          2.175           1.599             2.970  Indicator variable
2 foraging_trail_cost        1.759           1.293             2.399  4e-05
3 predation                  2.709           1.991             3.705  0.01
4 res_stochconstant          3.197           1.985             5.198  Indicator variable
5 res_stochfluctuating:K     1.000           1.000             1.000  Indicator variable
6 res_stochconstant:K        1.000           1.000             1.000  Indicator variable
The interaction effect I'd like to get is res_stochconstant:K; the other one is non-significant.

Performing a function on my data query using matrix and vectors
I am creating a function called pagerank that takes as input a set of edges, a teleport probability a, and a positive integer iters, and computes the transition probability matrix. The edges are defined as:

edges = [[0,1], [1,1], [2,0], [2,2], [2,3], [3,3], [3,4], [4,6], [5,5], [6,6], [6,3]]

It then starts from an arbitrary probability vector (say, one full of 1/N where N is the number of states) and multiplies this vector by the transition probability matrix iters times. The function should return the resulting vector.
Does anyone know how I can create a function to handle the multiplication of the matrix with the vector?
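A minimal NumPy sketch of the whole pipeline (the function and variable names are mine; the repeated matrix-vector product itself is just v @ M):

```python
import numpy as np

def pagerank(edges, a, iters):
    """Power-iteration PageRank sketch: a is the teleport probability,
    iters the number of vector-matrix multiplications."""
    n = max(max(e) for e in edges) + 1
    # Row-stochastic transition matrix built from the edge list
    P = np.zeros((n, n))
    for src, dst in edges:
        P[src, dst] += 1.0
    P /= P.sum(axis=1, keepdims=True)   # assumes every node has an out-edge
    # Mix in teleportation: with prob a jump uniformly, else follow an edge
    M = a / n + (1 - a) * P
    v = np.full(n, 1.0 / n)             # start from the uniform vector
    for _ in range(iters):
        v = v @ M                       # one step of the random walk
    return v
```

With the edge list above, every node has at least one outgoing edge, so the row normalization never divides by zero; the returned vector stays a probability distribution (non-negative, sums to 1).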

Draw small sample from large set with discrete distribution efficiently
I have two lists, both the same size; let's call them elements and weights. I want to choose one element of the elements list with the discrete probability distribution given by weights: weights[i] corresponds to the probability of choosing elements[i]. elements never changes, but after every sample taken, weights changes (only the values, not the size). I need an efficient way to do this with large lists.

I have an implementation in Python with numpy.random.choice(elements, p=weights), but taking a sample of size k from a set of size n where k << n is extremely inefficient. An implementation in any language is welcome, but I am working primarily in Python.

(This is used in a social network simulation with networkx. I have a weighted graph and a node i, and I want to choose a node j from i's neighbors, where the probability for each node is proportional to the weight of the edge between i and the given node. If I set the probability to 0 for non-neighbors, I don't have to generate the list of neighbors every time; I just need a list of all nodes.)

It will be used like this:

elements = [...]
weights = [...]
for (...):
    element = sample(elements, weights)
    # some calculation with element, changing the values of weights
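Since only individual weights change between draws, one option (a sketch, not a drop-in numpy replacement; the class name and interface are made up for illustration) is to keep the weights in a Fenwick / binary indexed tree. Updating one weight and drawing one sample are then both O(log n), instead of the O(n) cost of rebuilding the cumulative distribution that numpy.random.choice pays on every call:

```python
import random

class WeightedSampler:
    """Weights stored in a Fenwick (binary indexed) tree:
    O(log n) per weight update and per sample."""

    def __init__(self, weights):
        self.n = len(weights)
        self.tree = [0.0] * (self.n + 1)
        self.weights = [0.0] * self.n
        for i, w in enumerate(weights):
            self.update(i, w)

    def update(self, i, w):
        """Set weights[i] = w, keeping prefix sums consistent."""
        delta = w - self.weights[i]
        self.weights[i] = w
        j = i + 1
        while j <= self.n:
            self.tree[j] += delta
            j += j & (-j)

    def _total(self):
        s, i = 0.0, self.n
        while i > 0:
            s += self.tree[i]
            i -= i & (-i)
        return s

    def sample(self, rng=random):
        """Return index i with probability weights[i] / total."""
        r = rng.random() * self._total()
        pos, bit = 0, 1 << self.n.bit_length()
        # Binary descent: find the largest pos whose prefix sum is <= r;
        # r then falls in element pos's cumulative interval.
        while bit:
            nxt = pos + bit
            if nxt <= self.n and self.tree[nxt] <= r:
                r -= self.tree[nxt]
                pos = nxt
            bit >>= 1
        return pos
```

sample returns an index, so the caller uses elements[sampler.sample()]; when an edge weight changes, call sampler.update(i, new_weight) instead of rebuilding anything. Zero-weight entries (the non-neighbors) are never returned.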

Independence of random variables
Let U = the number of trials needed to get the first head and V = the number of trials needed to get two heads, in repeated tosses of a fair coin. Are U and V independent random variables?
I would say they are dependent, since u = the number of trials before the first head appears and v = the number of trials to get the 2nd head after the event u has occurred.
Please help me understand this better?
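One quick way to make the dependence concrete (my own check): since V counts trials up to the second head, V >= U + 1 always, so conditioning on U changes the distribution of V. For example,

$$P(V = 2) = P(\text{HH}) = \tfrac{1}{4}, \qquad P(V = 2 \mid U = 3) = 0 \ne \tfrac{1}{4},$$

and since the conditional and unconditional probabilities differ, U and V cannot be independent.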

Blinking element in React
Good day, all. I'm doing this thing where I have 4 squares, and when I click a button, 3 random squares have to blink in different colors in sequence. I also need to store the ids of the squares that blinked in an array. What I can't figure out is how to make them blink one after another. This is what I have so far: https://codepen.io/anon/pen/dmBxYe?editors=0110
class App extends React.Component {
  constructor() {
    super();
    this.state = {
      colors: ['yellow', 'red', 'blue', 'green'],
      quantity: 3,
    }
  }

  start() {
    const {quantity} = this.state;
    const quantityArray = Array.from(Array(quantity));
    const pieces = Array.from(document.querySelectorAll('.gamepiece'));
    quantityArray.map((item, i) => {
      setTimeout(() => {
        pieces[i].classList.toggle(`${this.state.colors[i]}`)
      }, 1000);
      pieces[i].classList.toggle(`${this.state.colors[i]}`);
    });
  }

  render() {
    return (
      <div className="memorygame">
        {this.state.colors.map((gamePiece, i) => {
          return <div key={`gamePiece${i}`} className="gamepiece"></div>
        })}
        <button onClick={this.start.bind(this)} className="startgame">Start the game</button>
      </div>
    )
  }
}

React.render(<App />, document.getElementById('app'));

RxSwift interval (Timer) in background
I have one Observable that increments by one each second, which in turn updates a label.
Is it wrong to run the observable on a background queue? Whenever I modify the UI, I do it on the main thread.
Could the Observable being on a background thread introduce a delay?
Here is my code:
Observable<Int>.interval(1.0, scheduler: SerialDispatchQueueScheduler(qos: .background))
    .observeOn(MainScheduler.instance)
    .subscribe(onNext: { [weak self] _ in
        self?.updateCountdown()
    })
    .disposed(by: disposeBag)

Extracting date intervals with R
I have a set of paired data: one column is a Date, formatted as month-day-year (mdY), and the other a corresponding datapoint.
The dates are daily.
I would like to extract the date and the data recorded on Fridays (the last day of the working week), to make the data weekly. The next thing I would like to do is reformat the Date to year-month-day (YMD) so that it matches another dataset I have.
How can this be achieved in R?
Thanks in advance.