Populations vs. Samples

“A POPULATION is the total set of items that we are concerned about” (Meier, Brudney, and Bohte, 2011, p. 173).  In other words, the population is the complete set of individuals or items that share a common characteristic or set of characteristics.  We are often interested in population PARAMETERS, i.e., numerical values that are fixed and describe a characteristic of a population, such as the population mean (μ), variance (σ²), and standard deviation (σ).  
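As a minimal sketch (the data are hypothetical, not from the source), these population parameters can be computed directly whenever the entire population is available:

```python
# Hypothetical population: ages of all six employees in a small office.
population = [34, 29, 41, 38, 29, 45]

N = len(population)
mu = sum(population) / N                                # population mean (μ)
sigma_sq = sum((x - mu) ** 2 for x in population) / N   # population variance (σ²): divides by N
sigma = sigma_sq ** 0.5                                 # population standard deviation (σ)

print(mu, sigma_sq, sigma)
```

Note that the variance divides by N (the full population size); the sample variance, discussed below, divides by n − 1 instead.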

“A SAMPLE is a subset of a population” (Meier, Brudney, and Bohte, 2011, p. 173).  There are two different types of samples: probability samples and non-probability samples. In a PROBABILITY SAMPLE, all members of the population have a KNOWN CHANCE of being selected as part of the sample.  To construct a probability sample, you will need to obtain a list of the entire population; this list then serves as the SAMPLING FRAME from which the sample will be selected/drawn. An example of a probability sample is a RANDOM SAMPLE, in which all members of the population have an equal chance of being selected in a sample. In a NON-PROBABILITY SAMPLE, some members of the population have NO CHANCE of being selected as part of the sample (in other words, the probability of selection cannot be determined). An example of a non-probability sample is a CONVENIENCE SAMPLE, in which the sample is selected based on convenience (i.e., as a result of being easy to contact or reach).
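A simple random sample can be drawn from a sampling frame with Python's standard library; the sketch below uses a hypothetical frame of 100 residents:

```python
import random

# Hypothetical sampling frame: a list of every member of the population.
sampling_frame = [f"resident_{i}" for i in range(1, 101)]  # 100 residents

random.seed(42)  # fixed seed so this sketch is reproducible

# Simple random sample: every member has an equal (and known) chance of selection.
sample = random.sample(sampling_frame, k=10)
print(sample)
```

A convenience sample, by contrast, has no such procedure: whoever happens to be reachable ends up in the sample, so selection probabilities cannot be determined.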

“A STATISTIC is a measure that is used to summarize a sample” (Meier, Brudney, and Bohte, 2011, p. 173), such as the measures of central tendency (ex: sample mean, X̄) and dispersion (ex: sample standard deviation, s) for a variable.  In order to treat sample findings as GENERALIZABLE to the population (i.e., use sample statistics as reliable estimates of the population parameters), the sample should be a probability sample. 
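The sample mean and sample standard deviation are built into Python's `statistics` module; the data below are hypothetical:

```python
import statistics

# Hypothetical sample of commute times (minutes) drawn from a larger population.
sample = [22, 35, 28, 41, 30, 26, 33]

x_bar = statistics.mean(sample)   # sample mean (X̄)
s = statistics.stdev(sample)      # sample standard deviation (s): divides by n - 1

print(x_bar, s)
```

`statistics.stdev` uses the n − 1 denominator appropriate for a sample; `statistics.pstdev` would give the population version (σ) instead.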

Why Are Only Probability Samples Generalizable?

Probability samples are more likely to be representative of the population.  Furthermore, in probability sampling, the sampling distribution of the sample statistic (e.g., sample mean) can be determined based on statistical principles.  Thus, probability sampling allows us to calculate measures such as margins of error and confidence levels, which account for uncertainty in our sample statistics and capture how reliably they estimate population parameters. 

In contrast, non-probability samples lack a clear and defined sampling distribution, making it impossible to accurately estimate the variability of the sample statistic.
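As a sketch of the margin-of-error calculation mentioned above (hypothetical data, and assuming the sample is large enough for the normal approximation to hold):

```python
import math
import statistics

# Hypothetical sample of measurements from a probability sample.
sample = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3, 12.2, 12.4, 11.7, 12.6]

n = len(sample)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)       # sample standard deviation (n - 1 denominator)
se = s / math.sqrt(n)              # standard error of the sample mean

z = 1.96                           # critical value for ~95% confidence (normal approximation)
margin_of_error = z * se

print(f"95% confidence interval: {x_bar:.2f} ± {margin_of_error:.2f}")
```

This calculation is only meaningful for probability samples; for a convenience sample, the standard error formula has no defined sampling distribution behind it.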

Frequency Distributions

A FREQUENCY DISTRIBUTION is a summary of how often each value or range of values occurs in a dataset.  It organizes data into a table or graph that displays the FREQUENCY (count) of each unique CLASS (category or value/interval of values) within the dataset. 

Frequency distributions are an important tool for understanding the patterns in data.  They provide a clear visual summary, helping to identify characteristics such as central tendency, dispersion, and skewness.  They also condense large datasets into an easily interpretable format, which facilitates initial data exploration and analysis.  Furthermore, as data summary tools, frequency distributions can aid in decision-making processes and serve as a mechanism through which findings can be effectively communicated to various stakeholders.

Frequency Tables

A FREQUENCY TABLE is a tabular representation of data that shows the number of occurrences (frequency) of each distinct case (value or category in a dataset).  It organizes raw data into a summary format, making it easier to see how often each value appears. 
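A frequency table can be built in one step with `collections.Counter`; the survey responses below are hypothetical:

```python
from collections import Counter

# Hypothetical survey responses (a categorical variable).
responses = ["Agree", "Disagree", "Agree", "Neutral", "Agree", "Disagree", "Agree"]

freq_table = Counter(responses)  # counts occurrences of each distinct value
for value, count in freq_table.most_common():
    print(f"{value:<10} {count}")
```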

While frequency tables are helpful, they do not provide as much information as relative frequency tables.  A RELATIVE FREQUENCY TABLE extends the frequency table by including the relative frequency (i.e., the PERCENTAGE DISTRIBUTION), which is the proportion or percentage of the total number of observations that fall into each case.  A relative frequency table conveys the distribution of the data relative to the dataset as a whole.

A CUMULATIVE RELATIVE FREQUENCY TABLE shows the cumulative relative frequency (i.e., CUMULATIVE FREQUENCY DISTRIBUTION), which is the sum of the percentage distributions for all values up to and including the current value.  The cumulative percentage for the final value should equal 100% (or something close, depending on rounding errors).  A cumulative relative frequency table helps us to understand the cumulative distribution of the data. 
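The relative and cumulative relative frequencies can be computed together in a few lines; the exam grades below are hypothetical, with the categories listed in their natural order:

```python
from collections import Counter

# Hypothetical exam letter grades (ordinal categories, listed in order).
grades = ["A", "B", "B", "C", "A", "B", "C", "D", "B", "A"]
order = ["A", "B", "C", "D"]

counts = Counter(grades)
n = len(grades)

cumulative = 0.0
print(f"{'Grade':<6}{'Freq':<6}{'Rel %':<8}{'Cum %':<8}")
for g in order:
    rel = counts[g] / n * 100   # relative frequency (percentage distribution)
    cumulative += rel           # running total: cumulative relative frequency
    print(f"{g:<6}{counts[g]:<6}{rel:<8.1f}{cumulative:<8.1f}")
```

After the last category, the running total reaches 100% (up to rounding), as described above.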

Another extension of the frequency table is a CONTINGENCY TABLE (also known as a cross-tabulation or crosstab).  A contingency table is used to display the frequency distribution of two or more variables; it shows the relationship between two or more CATEGORICAL VARIABLES (i.e., nominal- or ordinal-level variables) by presenting the frequency of each combination of variable categories.
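A small contingency table can likewise be built by counting combinations of categories; the paired observations below are hypothetical:

```python
from collections import Counter

# Hypothetical paired observations: (gender, voted) for each respondent.
data = [("F", "Yes"), ("M", "No"), ("F", "Yes"), ("M", "Yes"),
        ("F", "No"), ("M", "No"), ("F", "Yes"), ("M", "Yes")]

table = Counter(data)  # frequency of each combination of variable categories

rows, cols = ["F", "M"], ["Yes", "No"]
print(f"{'':<4}" + "".join(f"{c:>6}" for c in cols))
for r in rows:
    print(f"{r:<4}" + "".join(f"{table[(r, c)]:>6}" for c in cols))
```

Each cell shows how many respondents fall into that combination of the two categorical variables.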

Charts and Graphs

There are numerous charts and graphs that can be used to display frequency distributions:  

  • BAR GRAPHS (or bar charts) and HISTOGRAMS are graphical representations of data that use rectangular bars to represent frequencies: bar graphs display the frequency of each distinct category or value, while histograms display the frequency of intervals of values (i.e., BINS); both are useful for showing the distribution of variables 
  • A PIE CHART is a circular graph divided into slices to illustrate numerical proportions, with the size of each slice proportional to the quantity it represents; pie charts are useful for showing the relative frequencies of different categories within a whole
  • A LINE GRAPH (or line chart) is a type of graph that displays information as a series of data points connected by straight line segments.  Line graphs are often used to show trends over time; they can also be used to summarize frequency distributions of interval- and ratio-level variables
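As a dependency-free sketch, a frequency distribution can even be displayed as a simple text bar chart (plotting libraries such as matplotlib provide polished versions of all three chart types above); the data here are hypothetical:

```python
from collections import Counter

# Hypothetical categorical data.
colors = ["red", "blue", "red", "green", "blue", "red", "blue", "blue"]
counts = Counter(colors)

# Text-based bar chart: one block character per observation.
for category, count in counts.most_common():
    print(f"{category:<7}| {'█' * count} ({count})")
```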

Qualitative (Non-Statistical) vs. Quantitative (Statistical) Research

Non-statistical (qualitative) and statistical (quantitative) research are two fundamental approaches to conducting research, each with its own methods, purposes, and strengths.  

QUALITATIVE (NON-STATISTICAL) RESEARCH aims to explore complex phenomena, understand meanings, and gain insights into people’s experiences, behaviors, and interactions.  It focuses on providing a deep, contextual understanding of a specific issue or topic.  Data is often obtained via interviews, focus groups, participant observations, and content analysis.  Data analysis involves identifying patterns, themes, and narratives and is often interpretative and subjective, relying on the researcher’s ability to understand and articulate the meanings within the data.

QUANTITATIVE (STATISTICAL) RESEARCH aims to identify relationships or causal effects between concepts and/or phenomena.  It seeks to produce results that can be generalized to larger populations.  Data is often obtained either as original data collected through surveys or experiments, or as secondary data that has already been collected (such as information collected by the U.S. Census Bureau).  Analysis involves using statistical methods to analyze numerical data.  Techniques can range from basic descriptive statistics (ex: mean, median, mode) to complex inferential statistics (ex: linear regression analysis, ANOVA).  Data analysis is typically more objective and replicable, with clear rules and procedures for conducting statistical tests.

While qualitative and quantitative research have distinct differences, they are often used together in mixed-methods research to provide a comprehensive understanding of a research problem.  Qualitative research can provide context and depth to quantitative findings, while quantitative research can offer generalizability and precision to qualitative insights.