I have a set of data points that I used to generate my empirical CDF, which looks like this (to simplify things I have reduced the number of points for this question, but it shouldn’t matter). Given this data and plot, I need to somehow generate random values that follow this distribution. I admit ..
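One common way to draw values that follow an empirical CDF is inverse transform sampling: draw uniform numbers in [0, 1) and map them back through the CDF. The sketch below is only an illustration of that idea, not the asker's code; the function name `sample_from_ecdf` and the linear interpolation between data points are assumptions.

```python
import numpy as np

def sample_from_ecdf(data, size, rng=None):
    """Draw `size` values by inverting the empirical CDF of `data`.

    Hypothetical helper: interpolates linearly between sorted data points.
    """
    rng = np.random.default_rng() if rng is None else rng
    xs = np.sort(np.asarray(data, dtype=float))
    # ECDF heights at the sorted points: 1/n, 2/n, ..., 1
    cdf = np.arange(1, len(xs) + 1) / len(xs)
    # Uniform draws are mapped back to data values; draws below 1/n
    # are clamped to the smallest data point by np.interp.
    u = rng.uniform(0.0, 1.0, size=size)
    return np.interp(u, cdf, xs)
```

Sampled values always stay within the range of the original data, which may or may not be desirable depending on how the tails should behave.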

#### Category : statistics

I have a data set: Google Sheet Data. While performing a two-sample t-test assuming unequal variances, the Excel output is this: I am trying to replicate the same in Python using: T_test = ttest_ind(df.dropna()['PRE'], rest.dropna()['POST'], equal_var=False, alternative="less") result = T_test[1] The p-value from scipy is 0.004689, whereas in Excel ..

If I want to keep my bin-to-bin fluctuations at or below 5%, how do I find the maximum number of bins I should use? I am using a list of 10,000 random numbers to analyze its histogram distribution. Here is an example: I am using the same data but with different bin sizes. Bin ..

I am trying to create a box plot with the matplotlib library in Python. The code is given below. fig, ax = plt.subplots(figsize=(8, 6)) bp = ax.boxplot([corr_df['bi'], corr_df['ndsi'], corr_df['dbsi'], corr_df['mbi']], patch_artist = True, notch ='True', vert = 1) ax.set_title("Spearman's correlation coefficient for Soil indices", fontsize=14) ax.set_xlabel("Indices", fontsize=14) ax.set_ylabel("Spearman's correlation coefficient", fontsize=14) colors = ['#088A08', '#FFFF00','#01DFD7', '#FF00FF', ..

So I have a data science interview at Google, and I’m trying to prepare. One of the questions I see a lot (on Glassdoor) from people who have interviewed there before has been: "Write code to generate random normal distribution." While this is easy to do using numpy, I know sometimes Google asks the candidate ..

I have a dataset where every data sample consists of 10-20 2D coordinate points. The data is mostly clean, but occasionally there are falsely annotated points. For illustration, the cleanly annotated data would look like these: either clustered in a small area or spread across a larger area. The outliers I’m trying to filter out ..

This is my first time writing to a board like this, so I have no idea how many things I’m doing wrong; sorry in advance. I’m trying to perform a Tukey-Kramer test on a DataFrame I have constructed from a dictionary, but I keep getting told "AttributeError: ‘str’ object has no attribute ‘dropna’" ..

How do I calculate the gradient (or derivative) of y = f(x) with respect to x, where y represents the order statistics of x divided by the median of x? For instance, if x is [3, 2, 1, 5, 4], then y = f(x) would be [1/3, 2/3, 1, 4/3, 5/3]. How can I calculate the derivative of y with respect to ..
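To make the definition in this question concrete, here is a minimal sketch of the forward mapping only, read as "sort x, then divide by its median". The function name `order_stats_over_median` is hypothetical, and the sketch does not attempt the derivative that the question goes on to ask about.

```python
import numpy as np

def order_stats_over_median(x):
    """Return the sorted values of x divided by the median of x.

    Hypothetical helper illustrating the mapping y = f(x) from the question.
    """
    x = np.asarray(x, dtype=float)
    return np.sort(x) / np.median(x)

# The example from the question: the median of x is 3, so the result
# corresponds to [1/3, 2/3, 1, 4/3, 5/3].
y = order_stats_over_median([3, 2, 1, 5, 4])
print(y)
```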

I have a list of numbers, say T, where each element is sampled from a normal distribution. I’d like to generate another list, say B, such that B and T have a particular correlation coefficient corr (supplied as an argument), and B consists purely of binary variables. How can I do such a thing in Python? ..
