Outlier Treatment using Python
I am a novice in Data Science, and in the problem I am trying to solve I am stuck on outlier detection and treatment. Some insights about the dataset:
 It's a regression problem
 It has both numerical and categorical features
 The numerical features include both discrete and continuous columns
 The categorical features are mostly nominal and ordinal columns
 I've already done missing-value imputation and categorical data transformation
I am stuck because I don't know how to detect and treat outliers in the numerical data. Any guidance on how to proceed would be appreciated.
Please let me know if you want a snapshot of the numerical data in order to give a solution.
I haven't added one since this is a general question: I don't yet know what to use for outlier detection and treatment, or how.
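For context (my own sketch, not part of the question above): a common first approach for numerical columns is the IQR rule, where values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] are flagged and then either dropped or capped (winsorized) at the bounds. Assuming the numerical columns live in a pandas DataFrame `df`, a minimal version could look like:

```python
import pandas as pd

def cap_outliers_iqr(df, columns, factor=1.5):
    """Cap values outside [Q1 - factor*IQR, Q3 + factor*IQR] at the bounds."""
    df = df.copy()
    for col in columns:
        q1, q3 = df[col].quantile(0.25), df[col].quantile(0.75)
        iqr = q3 - q1
        lower, upper = q1 - factor * iqr, q3 + factor * iqr
        df[col] = df[col].clip(lower=lower, upper=upper)
    return df

df = pd.DataFrame({'price': [10, 12, 11, 13, 200]})  # 200 is an obvious outlier
capped = cap_outliers_iqr(df, ['price'])             # price becomes [10, 12, 11, 13, 16]
```

Capping keeps the row count intact, which is often preferable for regression; z-score rules (e.g. drop or cap where |z| > 3) are a common alternative for roughly normal columns.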
See also questions close to this topic

Laravel validate a variable in controller store
I could not find an answer to my question, so I hope someone can help me.
I want to validate, when adding a new appointment, that the chosen employee has not already been booked on the day of the appointment, so I can't double-book someone on a day. I'm using Laravel 5.6 and MySQL, with an appointments table containing the following columns: id, day, employee_id and resource_id.
My controller is a resource controller (with the index, create, store, ... functions).
So if $appointmentExist is 1, I need to throw an error and stay on the same page as the create form.
```php
public function store(Request $request)
{
    $appointmentExist = \DB::table('appointments')
        ->where([
            ['day', '=', $request->day],
            ['employee_id', '=', $request->employee_id],
        ])
        ->exists();

    $request->validate([
        'day'         => 'required|min:1|max:10',
        'employee_id' => 'required',
        'resource_id' => 'required',
        $appointmentExist => 'in:0',
    ]);

    $appointment = Appointment::create([
        'day'         => $request->day,
        'employee_id' => $request->employee_id,
        'resource_id' => $request->resource_id,
    ]);

    return redirect('/appointments/' . $appointment->id);
}
```
I hope someone can help.

Making at least one of two fields mandatory in Python Eve
I have a simple schema defined in my Eve project, similar to this:
```python
my_schema = {
    'recycling': {
        'type': 'string',
        'required': True,
    },
    'disposal': {
        'type': 'string',
        'required': True,
    },
    'other_stuff': {
        'type': 'string',
    },
}
```
How can I make at least one of the two fields (recycling or disposal) mandatory? I was looking at the Cerberus documentation (especially the *of-rules), but I don't quite understand how to define that if one field is present, the other one is no longer required...
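Not an Eve-specific answer, but the underlying check is simple; below is a plain-Python sketch of the "at least one of" rule (the helper name is my own, not an Eve or Cerberus API). Note that Cerberus's documented pattern of `required: True` plus mutual `excludes` enforces *exactly* one of the two fields, which is stricter than "at least one":

```python
def at_least_one_of(document, fields):
    """Return True if at least one of the given fields is present and non-empty."""
    return any(document.get(field) for field in fields)

ok = at_least_one_of({'recycling': 'glass'}, ['recycling', 'disposal'])   # True
bad = at_least_one_of({'other_stuff': 'x'}, ['recycling', 'disposal'])    # False
```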
How to efficiently validate variables before passing them to a parameterized constructor in Java?
This is based on a given task, so I can't use any mutators. IllegalArgumentException is unfortunately also not an option, as I am asked not to use anything we haven't been taught in class.
I prompt for input in my main method, and a new instance is created by passing the input as parameters to the constructor.
I think validation (mainly for zero or negative values) is usually done in the class, but that seems impossible with a parameterized constructor. Sorry, I'm very new here, so the example might not look very nice. A very simplified example of the class and main method:
```java
// class
public Stock(String name, double price) {
    this.name = name;
    this.price = price;
    /* (1) How I would do it for 'price', but the issue is being unable to re-prompt:
    if (price <= 0)
        JOptionPane.showMessageDialog(null, "Invalid input.");
    else
        this.price = price;
    */
}

// main method
public static void main(String[] args) {
    String name = JOptionPane.showInputDialog("Enter name:");
    double price = Double.parseDouble(JOptionPane.showInputDialog("Enter price:"));
    /* (2) How I would do it, but the issue is I have plenty of variables like
       price to validate, so I would have to repeat this loop for each one:
    do {
        price = Double.parseDouble(JOptionPane.showInputDialog("Enter price:"));
    } while (price <= 0);
    */
    Stock obj1 = new Stock(name, price); // this needs to be validated
}
```

Normalizing Input for Machine Learning Algorithm
I would like to normalize (z-score, min-max, etc.) my predictor variables for a number of machine learning algorithms (a neural network) and a logistic regression, and I am wondering:
1) Should I normalize all the predictor variables, that is, training AND test data?
2) Should I normalize my predicted variable, y?
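A sketch of the usual convention (my own illustration, with made-up numbers): compute the scaling statistics on the training data only and reuse them for the test data, so no test-set information leaks into the model. The target y is typically left unscaled for logistic regression (it is a class label); for regression targets, scaling y is optional.

```python
import numpy as np

X_train = np.array([1.0, 2.0, 3.0, 4.0])
X_test = np.array([2.5, 10.0])

# "fit": statistics come from the TRAINING data only
mu, sigma = X_train.mean(), X_train.std()

# "transform": both sets use the same training statistics
X_train_z = (X_train - mu) / sigma
X_test_z = (X_test - mu) / sigma   # 2.5 maps to 0.0, the training mean
```

scikit-learn's StandardScaler follows exactly this split: `fit_transform` on the training set, then plain `transform` on the test set.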

Multioutput regression
I have been looking into multioutput regression for the last few weeks. I am working with the scikit-learn package. My machine learning problem has an input of 3 features and needs to predict two output variables. Some ML models in the sklearn package support multioutput regression natively. If a model does not support it, sklearn's multioutput regression wrapper can be used to convert it. The multioutput class fits one regressor per target.
 Does the multioutput regressor class, or the natively supported multioutput regression algorithms, take the underlying relationship of the input variables into account?
 Instead of a multioutput regression algorithm, should I use a neural network?
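To illustrate the wrapper mentioned above (a sketch with made-up data): MultiOutputRegressor fits one independent copy of the base estimator per target, so each target's model sees all input features, but correlations between the outputs themselves are not modelled.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))        # 3 input features
Y = np.column_stack([                # 2 targets, each a linear function of X
    X @ np.array([1.0, 2.0, 0.0]),
    X @ np.array([0.0, 1.0, -1.0]),
])

# one LinearRegression is fitted per output column
model = MultiOutputRegressor(LinearRegression()).fit(X, Y)
pred = model.predict(X[:5])          # shape (5, 2): one column per target
```

If the outputs are strongly correlated, estimators that model targets jointly (e.g. a neural network with two output units) can exploit that; independent per-target regressors cannot.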

Activation Function in Machine learning
What is meant by an activation function in machine learning? I have gone through most of the articles and videos, and everyone explains or compares it with a neural network. I'm a newbie to machine learning and not that familiar with deep learning and neural networks. So, can anyone explain what exactly an activation function is, without explaining it in terms of neural networks? I got stuck on this ambiguity while learning the sigmoid function for logistic regression.
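Detached from neural networks (a minimal sketch of my own): an activation function is just a fixed nonlinear function applied to a linear score. In logistic regression the activation is the sigmoid, which squashes the score w·x + b into (0, 1) so it can be read as a probability:

```python
import numpy as np

def sigmoid(z):
    """Map any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

w, b = np.array([0.5, -0.25]), 0.1
x = np.array([2.0, 4.0])

score = w @ x + b       # plain linear combination: 0.1
prob = sigmoid(score)   # the nonlinear 'activation' of that score
```

Without the activation, the model could only output unbounded linear scores; the choice of activation (sigmoid, tanh, ReLU, ...) is what shapes the output.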

Outliers for all numerical values to mean SAS
I am working in SAS with a dataset with a lot of numeric variables, which I have standardised as follows:
```sas
proc standard data=df mean=0 std=1 out=df;
run;
```
Is there an easy way to deal with outliers (beyond +/- 3 standard deviations) for all numeric variables? Ideally I would want to cap all of those at + or - 3 standard deviations, or in the worst case remove them.
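The question asks for SAS, but for comparison here is the same idea in Python/numpy (illustrative values of my own): after standardisation, capping at +/- 3 is a single clip, and removal is a boolean mask:

```python
import numpy as np

x = np.array([0.1, -0.5, 1.2, 4.2, -3.7, 0.9])  # already standardised values

capped = np.clip(x, -3.0, 3.0)   # cap at +/- 3 standard deviations
kept = x[np.abs(x) <= 3.0]       # ...or drop the outlying observations
```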

Rescale boxplot axis in Python
I am displaying a boxplot to visualise outliers in my data using matplotlib in Python 3.
However, some of my outliers are so far away that they make the plot unreadable.
I do not know which is worse: cutting the range (hiding the true data range) or some kind of logarithmic squishing of the sparse top of the scale. I just know I will probably have to choose one.
The code I am currently using for boxplot is as below:
```python
from os import getcwd, path

import matplotlib as mpl
mpl.use('agg')
import matplotlib.pyplot as plt
import numpy as np

numpy_data = np.array(data)
bp_figure = plt.figure(1, figsize=(9, 6))
axes = bp_figure.add_subplot(111)
boxplot = axes.boxplot(numpy_data, sym='+', vert=True, whis=1.5)
if save_to_file:
    bp_figure.savefig(
        path.join(getcwd(), cfg.dir_img, cfg.boxplot_png_filename),
        bbox_inches='tight'
    )
bp_figure.show()
```
Which is the less incorrect approach, and how do I apply it to my boxplot?
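One possible middle ground (a sketch, not the only answer): keep the full range but put the y-axis on a "symlog" scale, so the bulk of the distribution stays readable and far outliers are compressed rather than hidden. Alternatively, boxplot accepts showfliers=False to omit the fliers from the drawing entirely.

```python
import matplotlib
matplotlib.use('agg')            # non-interactive backend, as in the question
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(10, 2, 100), [500.0, 900.0]])  # two far outliers

fig, ax = plt.subplots(figsize=(9, 6))
ax.boxplot(data, sym='+', vert=True, whis=1.5)
ax.set_yscale('symlog')          # log-like compression, well-defined near zero
```

Unlike a plain 'log' scale, 'symlog' stays linear in a band around zero, so it also works when the data contains zeros or negatives.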

Need simple code to treat outliers in R (capping method)
I am a beginner in R. As I didn't understand the code on the other page, I have raised this question again. Please explain the code if possible. I used the squish function to treat outliers, but outliers are still present; how do I tackle this?