Scatter Plot
Introduction
A scatter plot is a type of plot or mathematical diagram that uses Cartesian coordinates to display the values of, typically, two variables in a data set. The data is displayed as a collection of points: the value of one variable determines each point's position on the horizontal axis, and the value of the other variable determines its position on the vertical axis. This kind of plot is also called a scatter chart, scattergram, scatter diagram, or scatter graph.
History
Francis Galton was an early and influential user of the scatter plot, applying it in the 19th century to visualize the relationship between two variables. Galton was also a pioneer of eugenics, a field that sought to improve the genetic quality of the human population by promoting higher reproduction among people with desired traits. He used scatter plots to demonstrate, among other things, the correlation between parents' heights and their children's heights.
Mathematical Basis
Scatter plots are based on the Cartesian coordinate system, which specifies each point in a plane uniquely by a pair of numerical coordinates. The coordinates are the signed distances to the point from two fixed, perpendicular, oriented lines (the axes), measured in the same unit of length.
Usage
Scatter plots are used in many fields, including statistics, data science, and the social sciences. They are useful for observing relationships between variables, identifying trends in data, and suggesting functional relationships between variables. Scatter plots are also used in quality control, where they can help identify possible causes of problems in a process.
Construction
To construct a scatter plot, one needs a data set in which each observation has values for two variables. The values of the first variable are plotted along the x-axis, and the values of the second variable are plotted along the y-axis. Each point on the plot represents a single observation, and its horizontal and vertical positions give that observation's two values.
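The construction can be sketched in a few lines of Python. This is a minimal, standard-library-only illustration rather than a reference implementation: the function names `to_points` and `render_text_scatter`, the grid dimensions, and the height and weight values are all hypothetical and chosen only for the example.

```python
def to_points(xs, ys):
    """Pair two equal-length variables into (x, y) observations."""
    if len(xs) != len(ys):
        raise ValueError("both variables need one value per observation")
    return list(zip(xs, ys))


def render_text_scatter(points, width=20, height=8):
    """Draw points on a character grid: x runs left to right, y bottom to top."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Fall back to a span of 1 when a variable is constant (avoids dividing by zero).
    x_span = (x_max - x_min) or 1
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid row 0 is the top of the plot
    return "\n".join("".join(line) for line in grid)


# Hypothetical data: one (height, weight) pair per observation.
heights_cm = [150, 160, 165, 172, 180]
weights_kg = [52, 58, 63, 70, 81]

points = to_points(heights_cm, weights_kg)
print(render_text_scatter(points))
```

In practice a plotting library would handle the scaling and drawing; the sketch only makes explicit that each observation becomes one point whose column comes from the first variable and whose row comes from the second.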
Types of Scatter Plots
There are several types of scatter plots, each with its unique characteristics and uses. These include the simple scatter plot, the grouped scatter plot, and the 3D scatter plot.
Simple Scatter Plot
A simple scatter plot is the most basic type of scatter plot. It involves plotting two variables against each other on a two-dimensional graph. The x-axis represents one variable, and the y-axis represents the other. Each point on the graph represents a single observation.
Grouped Scatter Plot
A grouped scatter plot, also known as a colored scatter plot, is a scatter plot with a color-coding system in which each color represents a different category or group. This type of scatter plot is useful for visualizing the relationship between two numerical variables together with a third, categorical variable.
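As a rough sketch of the color-coding step, the following Python groups observations by category and assigns one color per group. The helper names `group_points` and `assign_colors`, the palette, and the sample observations are assumptions made for this illustration.

```python
from collections import defaultdict


def group_points(observations):
    """Split (x, y, label) observations into one list of (x, y) points per label."""
    groups = defaultdict(list)
    for x, y, label in observations:
        groups[label].append((x, y))
    return dict(groups)


def assign_colors(labels, palette=("blue", "orange", "green", "red")):
    """Give each distinct label its own color, in sorted label order."""
    unique = sorted(set(labels))
    if len(unique) > len(palette):
        raise ValueError("palette has fewer colors than there are groups")
    return dict(zip(unique, palette))


# Hypothetical observations: two numerical values plus a category label.
observations = [
    (5.1, 3.5, "A"),
    (4.9, 3.0, "A"),
    (6.3, 2.9, "B"),
    (5.8, 2.7, "B"),
    (7.1, 3.0, "C"),
]

groups = group_points(observations)
colors = assign_colors(groups.keys())

for label, pts in groups.items():
    print(label, colors[label], pts)
```

Each group could then be drawn in its own color with a single plotting call, for example one call to matplotlib's `Axes.scatter` per group. Raising an error when the palette is too small is a design choice for this sketch: reusing a color for two groups would make the plot ambiguous.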
3D Scatter Plot
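A 3D scatter plot is a scatter plot that incorporates a third variable, usually by plotting it on a third (z) axis so that each point is positioned in three-dimensional space. A third variable can also be shown on a two-dimensional plot through the size, color, or shape of the points; a plot that uses point size in this way is often called a bubble chart. These plots are useful for visualizing complex data with multiple variables.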
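Encoding a third variable in point size can be sketched as a simple linear rescaling. The function name `scale_to_sizes`, the size range of 20 to 200, and the data below are hypothetical values chosen for illustration only.

```python
def scale_to_sizes(values, min_size=20.0, max_size=200.0):
    """Linearly map a third variable onto marker sizes in [min_size, max_size]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant variable carries no information: use one mid-range size.
        return [(min_size + max_size) / 2] * len(values)
    span = hi - lo
    return [min_size + (v - lo) / span * (max_size - min_size) for v in values]


# Hypothetical data: x and y give the position, `third` drives the marker size.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.5, 1.0, 3.5, 2.0]
third = [10, 20, 30, 50]

sizes = scale_to_sizes(third)
for xi, yi, si in zip(x, y, sizes):
    print(f"point ({xi}, {yi}) drawn with marker size {si:.0f}")
```

A list like `sizes` is the kind of value a plotting library accepts for per-point marker sizes, for example the `s` parameter of matplotlib's `Axes.scatter`. Placing the third variable on a z-axis instead requires a library's three-dimensional axes and is not shown here.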
Interpretation
Interpreting a scatter plot involves looking for patterns, trends, or correlations in the data. If the points cluster around a line that rises from left to right, the variables are positively correlated; if the line falls from left to right, they are negatively correlated. Points that follow a curve suggest a nonlinear relationship. If the points are scattered with no apparent pattern, there may be no correlation.
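A visual impression of correlation is often checked against a number. The sketch below computes the Pearson correlation coefficient, which the text above does not name and which is used here as an assumed measure of linear association; the function name `pearson_r` and the sample data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data sets illustrating three patterns.
xs = [-2, -1, 0, 1, 2]
rising = [1, 3, 5, 7, 9]    # points lie on an upward line
falling = [9, 7, 5, 3, 1]   # points lie on a downward line
curved = [4, 1, 0, 1, 4]    # points lie on a parabola (y = x squared)

r_rising = pearson_r(xs, rising)
r_falling = pearson_r(xs, falling)
r_curved = pearson_r(xs, curved)
print(r_rising, r_falling, r_curved)
```

The coefficient is close to 1 for the rising line and close to -1 for the falling line. For the parabola it is 0 even though the points follow a clear curve, which is why the plot itself should be inspected rather than relying on a single linear summary.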
Limitations
While scatter plots are useful tools for visualizing and analyzing data, they do have some limitations. They are not suitable for placing categorical data on an axis, because categories have no numerical value that determines a position; such data can at most be shown through the color or shape of the points. In addition, linear summaries read from a scatter plot, such as a straight trend line, may not accurately represent the relationship between variables if the data is not linear.