
Chauhan, Amit. “Difference Between Z-Test And T-Test With Confidence Interval”. 2022. Medium. https://medium.com/pythoneers/difference-between-z-test-and-t-test-with-confidence-interval-3715e0dda9a4.
This article will discuss the z-test and t-test with known and unknown variance examples. Both tests will give different confidence interval results. These are very important tests in statistics for data analysis.
Daryani, Harish. “What Really Is Statistical Significance ?”. 2021. Medium. https://medium.com/analytics-vidhya/what-really-is-statistical-significance-7463acb65407.
The term ‘statistical significance’ often comes up in analyzing results from experiments or observational studies. A common interpretation I have heard in an experimentation context is –
Statistical significance tells us if the difference in results between different populations is ‘large’ enough or simply put, whether an experiment had a significant impact.
This is a flawed interpretation and what’s driving this is probably the word ‘significant’. Statistical significance has its roots in the area of inferential statistics which essentially deals with inferring about population from samples.
What does the provided example tell us about statistical significance?
1. It is not about the magnitude of the difference in the target metric.
2. It is about the confidence in the result, i.e., the result being drawn from a relatively larger sample isn’t purely due to chance. We know smaller samples are typically more variable and there is a possibility that better (or worse) results might come in due to chance.
Karthikeyan, Neelesh. “Hypothesis Testing With One-Way Analysis Of Variance (ANOVA)”. 2022. Medium. https://medium.com/@neelesh2k/hypothesis-testing-with-one-way-analysis-of-variance-eb301a60908.
Sir Ronald A Fisher first proposed the statistical underpinnings for the design of experiments and the Analysis of Variance (ANOVA). A population is defined as a large group of data (or) measurements. A sample refers to the subset of data taken from a large population. If each item in the population has an equal opportunity of being selected, it is called a random sample. The variance of any factor or component (Mean Square abbreviated as MS) is the sum of squares divided by their degrees of freedom. ANOVA is a technique of partitioning total variation in an experiment into accountable sources of variation. It is a statistical method for interpreting data from experiments and making decisions about the parameters under study. ANOVA is frequently used to test equality between different means by comparing variation between groups to variance within groups (random error). One-way ANOVA is a parametric test that analyses the means of two or more independent groups to see if statistical evidence exists that the related population means differ significantly.
Paliwal, Akanksha. “6 Linear Regression Concepts That Are Easy To Miss!” 2021. Medium. https://medium.com/codex/6-linear-regression-concepts-that-are-easy-to-miss-88fc22b0d625.
Statistics and Data Science work strongly to predict the output variable based on values of predictor variables and anomaly detection. Regression diagnostics originally intended for analytics and improving regression models can also be used to detect anomalies in X or Y values. A regression model that fits the data well will accurately capture the changes in Y due to any change in X. Other than that the regression equation does not prove the direction of causation. Hence conclusions about causations eg. that clicks on the ads lead to sales and not the other way round, must come from the contextual knowledge of the data scientist interpreting the model.
1) Correlation Vs. Regression
2) Degrees of Freedom (N-1). Why does it matter?
3) RMSE (Root Mean Squared Error) vs. MAE (Mean Absolute Error)
4) R sq. Vs Adjusted R sq.
5) t-statistic or p-value of a coefficient
6) Occam’s Razor’s guide to choosing a better model — All things being equal a simpler model should be used in preference to a more complicated model.
Sucky, Rashida Nasrin. “A Complete Guide For Detecting And Dealing With Outliers”. 2022. Medium. https://towardsdatascience.com/a-complete-guide-for-detecting-and-dealing-with-outliers-bad26b1e92b6.
Outliers can be a big problem in data analysis or machine learning. Only a few outliers can totally alter a machine learning algorithm’s performance or totally ruin a visualization. So, it is important to detect outliers and deal with them carefully.
Sucky, Rashida Nasrin. “What Is A/B Testing? How To Perform An A/B Testing?”. 2022. Medium. https://medium.datadriveninvestor.com/what-is-a-b-testing-how-to-perform-an-a-b-testing-892cc8f35cbf.
A/B testing is a commonly used methodology in eCommerce to test new features or products. This is a process of making data-driven decisions for user interface, marketing, and overall products. The main process is to split the users into the control group and experiment group. Then allot the existing products or features to the control group and the new product to the experiment group. Observations are recorded on how the control group and experiment group responded and decisions are made based on their behavior or response about which version is better.
The most important part is, all the elements should be constant except for one.
When results are reliable and repeatable, we can make the right decision. This is a high-level introduction to how A/B testing works.