Confidence Intervals
Confidence intervals are one of the foundational concepts of statistics and data science. They are used to communicate the precision of an estimated relationship, to quantify the uncertainty of predictions in predictive modelling, to assess the statistical significance of a study, and in survival analysis, to name a few use cases. They are basically everywhere! In this edition we will explore the definition and calculation of a Confidence Interval, so let’s dive right in.
What is a Confidence Interval?
To understand a Confidence Interval, we first need to familiarize ourselves with the terms ‘Population’ and ‘Sample’. A Population is the entire set of data points, items or individuals of interest in a study, and a Sample is a subset of this population selected for analysis. Suppose we are interested in estimating the population mean of a dataset. We extract several samples from the data and determine the sample mean for each. These sample means vary from sample to sample and scatter around the true population mean, so any single sample mean is only an estimate. A Confidence Interval expresses the degree of certainty with which we can estimate the population mean from a given sample, as a range of values around the sample mean. Example: if we repeated the sampling many times and constructed a Confidence Interval with a 95% confidence level from each sample, we would expect about 95 out of 100 of those intervals to contain the true population mean.
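The "95 out of 100" reading can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: the population is normal with a made-up mean of 50 and standard deviation of 10, the interval is the usual sample mean plus or minus z times the standard deviation over the square root of the sample size, and the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Return (lower, upper) for the mean, using the normal approximation."""
    n = len(sample)
    # z-value for a two-sided interval, about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * stdev(sample) / math.sqrt(n)
    centre = mean(sample)
    return centre - margin, centre + margin


def coverage(true_mean=50.0, true_sd=10.0, n=100, trials=2000, seed=0):
    """Fraction of simulated intervals that contain the true population mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # Draw a fresh sample from the assumed (hypothetical) population
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        lower, upper = mean_confidence_interval(sample)
        if lower <= true_mean <= upper:
            hits += 1
    return hits / trials


print(f"Share of intervals containing the true mean: {coverage():.3f}")
```

The printed share should land close to 0.95. It will not be exactly 0.95, because the number of simulated samples is finite and the standard deviation is itself estimated from each sample.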
How is a Confidence Interval calculated?
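The formula to determine a Confidence Interval for the mean is given below.
Confidence Interval = x̄ ± z × (s / √n)
Here x̄ is the sample mean, z is the confidence level value, s is the standard deviation and n is the sample size.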
A confidence interval is essentially the sample mean plus or minus a specific margin of error. In the error term, the standard deviation indicates the spread of the underlying data. The greater the variation in the data, the larger the standard deviation, which increases the numerator of the error term and results in a wider confidence interval. The denominator is the square root of the sample size, so the larger the sample, the narrower the confidence interval. In other words, the larger the sample, the more likely the sample mean is to be close to the true population mean, which reduces the uncertainty in the estimate.
(image credit : https://www.coursesidekick.com/statistics/study-guides/boundless-statistics/confidence-intervals)
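To make the effect of sample size concrete, here is a minimal sketch that computes the margin of error as z times the standard deviation divided by the square root of the sample size. The summary statistics (mean 50, standard deviation 10) are invented for illustration, z = 1.96 is the usual value for 95% confidence, and the function names are hypothetical.

```python
import math


def margin_of_error(std_dev, n, z=1.96):
    """Margin of error z * s / sqrt(n); z = 1.96 corresponds to 95% confidence."""
    return z * std_dev / math.sqrt(n)


def confidence_interval(sample_mean, std_dev, n, z=1.96):
    """Sample mean plus or minus the margin of error."""
    m = margin_of_error(std_dev, n, z)
    return sample_mean - m, sample_mean + m


# Hypothetical summary statistics: sample mean 50, standard deviation 10
for n in (25, 100, 400):
    lower, upper = confidence_interval(50.0, 10.0, n)
    print(f"n={n:>3}: [{lower:.2f}, {upper:.2f}]  width={upper - lower:.2f}")
```

Because the sample size enters through its square root, quadrupling the sample size halves the width of the interval, while doubling the standard deviation doubles it.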
The confidence level value (often called the z-value) tells us how many standard deviations away from the mean we need to go, on a normal distribution, in order to reach the desired confidence level for our confidence interval. For a 95% confidence level this value is approximately 1.96. In this article we have assumed a normal distribution to determine the Confidence Interval; in upcoming articles we will discuss the determination of confidence intervals for other data distributions.
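For readers who want to look up the confidence level value themselves, the sketch below derives it from the standard normal distribution using Python's standard library. The helper name z_value is hypothetical, and the calculation assumes a two-sided interval, so the leftover probability is split equally between the two tails.

```python
from statistics import NormalDist


def z_value(confidence):
    """Two-sided z-value: the point leaving (1 - confidence) / 2 in each tail."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence -> z = {z_value(level):.3f}")
```

A higher confidence level gives a larger z-value and therefore a wider interval for the same data.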