1. What is a statistical survey?
Statistical surveys are used to collect quantitative
information about items in a population. A survey may focus on opinions or
factual information depending on its purpose, and many surveys involve
administering questions to individuals. When the questions are administered by
a researcher, the survey is called a structured interview or
a researcher-administered survey. When the questions are administered by
the respondent, the survey is referred to as a questionnaire or
a self-administered survey.
2. What are the advantages of a survey?
§ Efficient way of collecting information
§ Wide range of information can be collected
§ Easy to administer
§ Relatively cheap to run
3. What are the disadvantages of a survey?
§ Responses may be subjective
§ Respondents' motivation to answer may be low
§ Errors due to sampling
§ If a question is not specific, it may lead to vague data
4. What are the various modes of data collection?
§ Telephone
§ Mail
§ Online surveys
§ Personal survey
§ Mall intercept survey
5. What is sampling?
“Sampling” means selecting people or objects from a “population” in order to draw conclusions about the whole population. For example, we might want to find out how people are going to vote at the next election. We cannot ask everyone in the country, so we ask a sample.
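The election example can be sketched in Python. This is only an illustration: the population of 10,000 voters, the 55% share for candidate "A", the sample size of 500 and the seed are all made-up assumptions, not figures from these notes.

```python
import random

# Hypothetical population: 10,000 voters, 55% of whom prefer candidate "A".
votes = ["A"] * 5500 + ["B"] * 4500
true_share = votes.count("A") / len(votes)

rng = random.Random(42)  # fixed seed so the draw is repeatable

# Simple random sample without replacement: every voter is equally likely
# to be selected.
sample = rng.sample(votes, k=500)

# The sample share is used as an estimate of the population share.
estimate = sample.count("A") / len(sample)

print(f"true share = {true_share:.2f}, sample estimate = {estimate:.2f}")
```

The estimate will usually be close to, but not exactly equal to, the true share; that gap is the sampling error mentioned among the disadvantages of surveys.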
Classification, Tabulation & Presentation of data
1. What are the types of data?
Qualitative data
§ Nominal, attribute or categorical data
§ Ordinal or ranked data
Quantitative or interval data
§ Discrete data
§ Continuous measurements
2. What is tabulation of data?
Tabulation refers to the systematic arrangement of information in rows and columns. Rows are the horizontal arrangement and columns are the vertical arrangement. In simple words, tabulation is a layout of figures in rectangular form with appropriate headings to explain the different rows and columns. The main purpose of a table is to simplify the presentation and to facilitate comparisons.
3. What is presentation of data?
Presentation of data means displaying descriptive statistics in an understandable form, usually graphically, using statistical and data presentation tools.
4. What are the different elements of tabulation?
The elements of a table are:
§ Table Number
§ Title
§ Captions and Stubs
§ Headnotes
§ Body
§ Source
5. What are the forms of presentation of the data?
Grouped and ungrouped data may be presented as:
§ Pie Charts
§ Frequency Histograms
§ Frequency Polygons
§ Ogives
§ Boxplots
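The numbers behind a frequency histogram and an ogive can be tabulated with a short Python sketch. The scores and the class width of 10 are hypothetical values chosen for illustration; a histogram plots the class frequencies and an ogive plots the cumulative frequencies.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical ungrouped data (for example, test scores).
scores = [52, 67, 71, 58, 64, 79, 83, 66, 61, 75, 88, 69]
width = 10  # assumed class width

# Group each score into a class identified by its lower bound (50, 60, ...).
counts = Counter((s // width) * width for s in scores)

classes = sorted(counts)
freqs = [counts[c] for c in classes]   # heights of the histogram bars
cum_freqs = list(accumulate(freqs))    # points of the ogive

# Print a small table: class interval, frequency, cumulative frequency.
for c, f, cf in zip(classes, freqs, cum_freqs):
    print(f"{c}-{c + width - 1}: frequency={f}, cumulative={cf}")
```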
Measures used to summarise data
1. What are the measures used to summarise data?
§ Measures of central tendency: Mean, Median, Mode
§ Measures of dispersion: Range, Variance, Standard Deviation
2. Define mean, median, and mode.
Mean: The mean is what we typically call the “average.” You calculate it by adding up all of the measurements in a group and then dividing by the number of measurements.
Median: The median is the middle value in a series arranged in ascending or descending order. When the series has an even number of values, the median is the mean of the two middle values.
Mode: The mode is the most frequently occurring value in a series.
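The three definitions can be checked with Python's standard `statistics` module. The data set is a made-up example.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # hypothetical measurements

# Mean: sum of the values divided by how many there are (30 / 6).
mean_value = statistics.mean(data)

# Median: the series has an even number of values, so the median is the
# mean of the two middle values, 3 and 5.
median_value = statistics.median(data)

# Mode: the value that occurs most often (3 appears twice).
mode_value = statistics.mode(data)

print(mean_value, median_value, mode_value)
```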
3. Which measure of central tendency is to be used?
The measure to be used depends on the context. If your results involve categories instead of continuous numbers, then the best measure of central tendency will probably be the most frequent outcome (the mode). On the other hand, sometimes it is an advantage to have a measure of central tendency that is less sensitive to changes in the extremes of the data; in that case the median is preferable to the mean.
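A small sketch of this sensitivity to extremes, using hypothetical salary figures (in thousands): adding one very large value moves the mean a long way but barely changes the median.

```python
import statistics

salaries = [30, 32, 35, 38, 40]      # hypothetical values
with_outlier = salaries + [400]      # the same data plus one extreme value

mean_before = statistics.mean(salaries)
median_before = statistics.median(salaries)

mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print(f"mean:   {mean_before} -> {mean_after:.2f}")
print(f"median: {median_before} -> {median_after}")
```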
4. Define range, variance and standard deviation.
Range: The range is the difference between the largest and the smallest data values in the set.
Variance: The variance (σ²) is the average of the squared differences between each value in the data set and the mean; it measures how far the values are spread around the mean.
Standard Deviation: The standard deviation (σ) is the square root of the variance.
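A minimal sketch of the three measures of dispersion, assuming the data are treated as a whole population (so the variance divides by the number of values). The data set is made up.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical population, mean = 5

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# Population variance: average squared deviation from the mean.
variance = statistics.pvariance(data)

# Population standard deviation: square root of the variance.
std_dev = statistics.pstdev(data)

print(data_range, variance, std_dev)
```

For a sample drawn from a larger population, `statistics.variance` and `statistics.stdev` divide by one less than the number of values instead.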
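5. How can standard deviation be used?
The standard deviation has proven to be an extremely useful measure of spread, in part because it is mathematically tractable. It is expressed in the same units as the data, so it describes how far values typically lie from the mean.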
Probability
1. What is probability?
Probability is a way of expressing knowledge or belief that
an event will occur or has occurred.
2. What is a random experiment?
An experiment is said to be a random experiment if its outcome cannot be predicted with certainty.
3. What is a sample space?
The set of all possible outcomes of an experiment is called the sample space. It is denoted by ‘S’ and its number of elements is n(S).
Example: In throwing a die, the number that appears on top is any one of 1,2,3,4,5,6. So here:
S={1,2,3,4,5,6} and n(S)=6
Similarly, in the case of a coin, S={Head,Tail} or {H,T} and n(S)=2.
4. What is an event? What are the different kinds of event?
Event: Every subset of a sample space is an event. It is denoted by ‘E’.
Example: In throwing a die, S={1,2,3,4,5,6}, and the appearance of an even number is the event E={2,4,6}. Clearly E is a subset of S.
Simple event: An event consisting of a single sample point is called a simple event.
Example: In throwing a die, S={1,2,3,4,5,6}, so each of {1},{2},{3},{4},{5} and {6} is a simple event.
Compound event: A subset of the sample space that has more than one element is called a compound event.
Example: In throwing a die, the appearance of an odd number is a compound event, because E={1,3,5}, which has 3 elements.
5. What is the definition of probability?
If ‘S’ is the sample space and all of its outcomes are equally likely, then the probability of occurrence of an event ‘E’ is defined as:
P(E) = n(E) / n(S) = (number of elements in ‘E’) / (number of elements in sample space ‘S’)
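The definition can be applied to the die example with a short Python sketch. It assumes a fair die, so every outcome in the sample space is equally likely; the helper name `probability` is introduced here for illustration only.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}              # sample space for one throw of a die
E = {x for x in S if x % 2 == 0}    # event: an even number appears


def probability(event, sample_space):
    """P(E) = n(E) / n(S), valid only for equally likely outcomes."""
    # Intersect with the sample space so that only valid outcomes count.
    return Fraction(len(event & sample_space), len(sample_space))


p_even = probability(E, S)      # 3/6 = 1/2
p_simple = probability({4}, S)  # a simple event: 1/6

print(p_even, p_simple)
```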
Theoretical Distributions
1. What are theoretical distributions?
Theoretical distributions are based on mathematical formulae and logic rather than on observed data. They are used in statistics to describe how a variable is expected to behave. When an empirical distribution corresponds to a theoretical one, you can use the theoretical distribution to determine the probability of an outcome, which is the basis of inferential statistics.
2. What are the various types of theoretical distributions?
§ Rectangular distribution (or Uniform distribution)
§ Binomial distribution
§ Normal distribution
3. Define rectangular distribution and binomial distribution.
Rectangular distribution: A distribution in which all possible scores have the same probability of occurrence.
Binomial distribution: The distribution of the number of times an event occurs in a fixed number of independent trials, where each trial has only two possible outcomes.
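Both distributions can be written out for small cases. The sketch below uses a fair die for the rectangular distribution and four tosses of a fair coin for the binomial distribution; these examples and the function name `binomial_pmf` are illustrative assumptions.

```python
from math import comb

# Rectangular (uniform) distribution: each face of a fair die has the same
# probability.
uniform_die = {face: 1 / 6 for face in range(1, 7)}


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Binomial distribution: number of heads in 4 tosses of a fair coin.
n, p = 4, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

for k, prob in enumerate(pmf):
    print(f"P({k} heads) = {prob}")
```

The five probabilities are 1/16, 4/16, 6/16, 4/16 and 1/16, which sum to 1.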
4. What is normal distribution?
The normal distribution is a bell-shaped theoretical distribution that predicts the frequency of occurrence of chance events. The probability of an event or a group of events corresponds to the area under the curve associated with the event or group of events. The distribution is asymptotic: its tails continually approach but never reach the horizontal axis. The curve is symmetrical: half of the total area is to the left of the mean and the other half to the right.
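The link between probability and area can be illustrated with `statistics.NormalDist` from the Python standard library. The standard normal curve (mean 0, standard deviation 1) is used here as an assumed example.

```python
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)  # standard normal distribution

# Symmetry: the area to the left of the mean is half of the total area.
left_half = z.cdf(0)

# Probability of a value within one standard deviation of the mean,
# computed as the area between -1 and +1 (roughly 0.68).
within_one_sd = z.cdf(1) - z.cdf(-1)

# Asymptotic tails: the curve height far from the mean is tiny but not zero.
height_far_out = z.pdf(6)

print(left_half, round(within_one_sd, 4), height_far_out)
```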
5. What is the central limit theorem?
This theorem states that when repeated random samples are taken from a population with mean μ and finite standard deviation σ, the sampling distribution of the means of those samples becomes approximately normal, with mean μ and standard deviation σ/√n, as the sample size (n) becomes larger, irrespective of the shape of the population distribution.
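The theorem can be explored with a small simulation. This is a sketch under stated assumptions: the population is an exponential distribution with mean 1 and standard deviation 1 (strongly skewed, so clearly not normal), and the sample size, number of samples and seed are arbitrary choices.

```python
import random
import statistics
from math import sqrt

rng = random.Random(0)  # fixed seed for repeatability
sample_size = 40        # size of each sample
num_samples = 2000      # number of repeated samples


def draw_sample_mean():
    """Draw one random sample from the skewed population and return its mean."""
    sample = [rng.expovariate(1.0) for _ in range(sample_size)]
    return statistics.fmean(sample)


sample_means = [draw_sample_mean() for _ in range(num_samples)]

mean_of_means = statistics.fmean(sample_means)  # should be close to 1
sd_of_means = statistics.stdev(sample_means)    # should be close to expected_sd

# Population standard deviation divided by the square root of the sample size.
expected_sd = 1.0 / sqrt(sample_size)

print(f"mean of sample means = {mean_of_means:.3f}")
print(f"sd of sample means   = {sd_of_means:.3f} (theory: {expected_sd:.3f})")
```

A histogram of `sample_means` would look roughly bell-shaped even though the population itself is skewed.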
Sampling & Sampling Distributions
1. What is a sampling distribution?
Suppose that we draw all possible samples of size n from a given population. Suppose further that we compute a statistic (mean, proportion, standard deviation) for each sample. The probability distribution of this statistic is called its sampling distribution.
2. What is variability of a sampling distribution?
The variability of a sampling distribution is measured by its variance or its standard deviation. The variability of a sampling distribution depends on three factors:
§ N: the number of observations in the population
§ n: the number of observations in the sample
§ The way that the random sample is chosen
3. How to create the sampling distribution of the mean?
Suppose that we draw all possible samples of size n from a population of size N. Suppose further that we compute a mean score for each sample. In this way we create the sampling distribution of the mean.
We know the following. The mean of the sampling distribution (μ_x) is equal to the mean of the population (μ). The standard error of the sampling distribution (σ_x) is determined by the standard deviation of the population (σ), the population size (N), and the sample size (n). These relationships are shown in the equations below:
μ_x = μ and σ_x = σ * sqrt( 1/n - 1/N )
When the population is very large relative to the sample, 1/N is negligible and the standard error reduces to σ / sqrt(n).
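For a very small population the sampling distribution of the mean can be built by listing every possible sample. The sketch below uses a hypothetical population of five values and samples of size 2 drawn without replacement. It assumes that σ in the standard error formula is computed with the divisor N - 1; under that assumption the formula matches the enumerated result exactly.

```python
from itertools import combinations
from math import sqrt
import statistics

population = [2, 4, 6, 8, 10]  # hypothetical population
N = len(population)            # population size
n = 2                          # sample size

# Every possible sample of size n drawn without replacement, and its mean.
sample_means = [statistics.mean(s) for s in combinations(population, n)]

# Mean and standard deviation of the sampling distribution of the mean.
mean_of_sample_means = statistics.mean(sample_means)
se_enumerated = statistics.pstdev(sample_means)

# Standard error from the formula, with sigma using the N - 1 divisor
# (an assumption made for this sketch).
sigma = statistics.stdev(population)
se_formula = sigma * sqrt(1 / n - 1 / N)

print(mean_of_sample_means, statistics.mean(population))
print(round(se_enumerated, 4), round(se_formula, 4))
```

There are 10 possible samples; their means average to 6, the population mean, and both standard error calculations give the square root of 3.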
BUSINESS STATISTICS NOTES