R: Quantile-Quantile (Q-Q) Plot (2024)

qqPlot {EnvStats}

R Documentation

Quantile-Quantile (Q-Q) Plot

Description

Produces a quantile-quantile (Q-Q) plot, also called a probability plot. The qqPlot function is a modified version of the R functions qqnorm and qqplot. The EnvStats function qqPlot allows the user to specify a number of different distributions in addition to the normal distribution, and to optionally estimate the distribution parameters of the fitted distribution.

Usage

 qqPlot(x, y = NULL, distribution = "norm",
   param.list = list(mean = 0, sd = 1),
   estimate.params = plot.type == "Tukey Mean-Difference Q-Q",
   est.arg.list = NULL, plot.type = "Q-Q", plot.pos.con = NULL,
   plot.it = TRUE, equal.axes = qq.line.type == "0-1" || estimate.params,
   add.line = FALSE, qq.line.type = "least squares",
   duplicate.points.method = "standard", points.col = 1, line.col = 1,
   line.lwd = par("cex"), line.lty = 1, digits = .Options$digits, ...,
   main = NULL, xlab = NULL, ylab = NULL, xlim = NULL, ylim = NULL)

Arguments

`x`	numeric vector of observations. When `y` is not supplied, `x` represents a sample from the hypothesized distribution specified by `distribution`. When `y` is supplied, the distribution of `x` is compared with the distribution of `y`. Missing (`NA`), undefined (`NaN`), and infinite (`Inf`, `-Inf`) values are allowed but will be removed.
`y`	optional numeric vector of observations (not necessarily the same length as `x`). Missing (`NA`), undefined (`NaN`), and infinite (`Inf`, `-Inf`) values are allowed but will be removed.
`distribution`	when `y` is not supplied, a character string denoting the distribution abbreviation. The default value is `distribution="norm"`. See the help file for `Distribution.df` for a list of possible distribution abbreviations. This argument is ignored if `y` is supplied.
`param.list`	when `y` is not supplied, a list with values for the parameters of the distribution. The default value is `param.list=list(mean=0, sd=1)`. See the help file for `Distribution.df` for the names and possible values of the parameters associated with each distribution. This argument is ignored if `y` is supplied or `estimate.params=TRUE`.
`estimate.params`	when `y` is not supplied, a logical scalar indicating whether to compute quantiles based on estimating the distribution parameters (`estimate.params=TRUE`) or using the known distribution parameters specified in `param.list` (`estimate.params=FALSE`). The default value of `estimate.params` is `FALSE` if `plot.type="Q-Q"` because the default configuration is a standard normal (mean=0, sd=1) Q-Q plot, which will yield roughly a straight line if the observations in `x` are from any normal distribution. The default value of `estimate.params` is `TRUE` if `plot.type="Tukey Mean-Difference Q-Q"`. The argument `estimate.params` is ignored if `y` is supplied.
`est.arg.list`	when `y` is not supplied and `estimate.params=TRUE`, a list whose components are optional arguments associated with the function used to estimate the parameters of the assumed distribution (see the help file Estimating Distribution Parameters). For example, all functions used to estimate distribution parameters have an optional argument called `method` that specifies the method to use to estimate the parameters. (See the help file for `Distribution.df` for a list of available estimation methods for each distribution.) To override the default estimation method, supply the argument `est.arg.list` with a component called `method`; for example `est.arg.list=list(method="mle")`. The default value is `est.arg.list=NULL` so that all default values for the estimating function are used. This argument is ignored if `estimate.params=FALSE` or `y` is supplied.
`plot.type`	a character string denoting the kind of plot. Possible values are `"Q-Q"` (Quantile-Quantile plot, the default) and `"Tukey Mean-Difference Q-Q"` (Tukey mean-difference Q-Q plot). This argument may be abbreviated (e.g., `plot.type="T"` to indicate a Tukey mean-difference Q-Q plot).
`plot.pos.con`	numeric scalar between 0 and 1 containing the value of the plotting position constant. The default value of `plot.pos.con` depends on whether the argument `y` is supplied and, if it is not, on the value of the argument `distribution`. When `y` is supplied, the default value is `plot.pos.con=0.5`, corresponding to Hazen plotting positions. When `y` is not supplied, for the normal, lognormal, three-parameter lognormal, zero-modified normal, and zero-modified lognormal distributions, the default value is `plot.pos.con=0.375`. For the Type I extreme value (Gumbel) distribution (`distribution="evd"`), the default value is `plot.pos.con=0.44`. For all other distributions, the default value is `plot.pos.con=0.4`.
`plot.it`	a logical scalar indicating whether to create a plot on the current graphics device. The default value is `plot.it=TRUE`.
`equal.axes`	a logical scalar indicating whether to use the same range on the `x`- and `y`-axes when `plot.type="Q-Q"`. The default value is `TRUE` if `qq.line.type="0-1"` or `estimate.params=TRUE`, otherwise it is `FALSE`. This argument is ignored if `plot.type="Tukey Mean-Difference Q-Q"`.
`add.line`	a logical scalar indicating whether to add a line to the plot. If `add.line=TRUE` and `plot.type="Q-Q"`, a line determined by the value of `qq.line.type` is added to the plot. If `add.line=TRUE` and `plot.type="Tukey Mean-Difference Q-Q"`, a horizontal line at `y=0` is added to the plot. The default value is `add.line=FALSE`.
`qq.line.type`	character string determining what kind of line to add to the Q-Q plot. Possible values are `"least squares"` (the default), `"0-1"` and `"robust"`. For the value `"least squares"`, a least squares line is fit and added. For the value `"0-1"`, a line with intercept 0 and slope 1 is added. For the value `"robust"`, a line is fit through the first and third quartiles of the `x` and `y` data. This argument is ignored if `add.line=FALSE` or `plot.type="Tukey Mean-Difference Q-Q"`.
`duplicate.points.method`	a character string denoting how to plot points with duplicate `(x,y)` values. Possible values are `"standard"` (the default), `"jitter"`, and `"number"`. For the value `"standard"`, a single plotting symbol is plotted (this is the default behavior of R). For the value `"jitter"`, a separate plotting symbol is plotted for each duplicate point, where the plotting symbols cluster around the true value of `x` and `y`. For the value `"number"`, a single number is plotted at `(x,y)` that represents how many duplicate points are at that `(x,y)` coordinate.
`points.col`	a numeric scalar or character string determining the color of the points in the plot. The default value is `points.col=1`. See the entry for `col` in the help file for `par` for more information.
`line.col`	a numeric scalar or character string determining the color of the line in the plot. The default value is `line.col=1`. See the entry for `col` in the help file for `par` for more information. This argument is ignored if `add.line=FALSE`.
`line.lwd`	a numeric scalar determining the width of the line in the plot. The default value is `line.lwd=par("cex")`. See the entry for `lwd` in the help file for `par` for more information. This argument is ignored if `add.line=FALSE`.
`line.lty`	a numeric scalar determining the line type of the line in the plot. The default value is `line.lty=1`. See the entry for `lty` in the help file for `par` for more information. This argument is ignored if `add.line=FALSE`.
`digits`	a scalar indicating how many significant digits to print for the distribution parameters. The default value is `digits=.Options$digits`.
`main`, `xlab`, `ylab`, `xlim`, `ylim`, `...`	additional graphical parameters (see `par`).

Details

If y is not supplied, the vector x is assumed to be a sample from the probability distribution specified by the argument distribution (and param.list if estimate.params=FALSE). When plot.type="Q-Q", the quantiles of x are plotted on the y-axis against the quantiles of the assumed distribution on the x-axis.
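
A minimal base R sketch of the one-sample coordinates described above. It does not call qqPlot itself; it assumes a standard normal reference distribution and a plotting position constant of 0.375, and the sample `obs` is simulated purely for illustration.

```r
# Hypothetical sample; any numeric vector would do.
set.seed(1)
obs <- rnorm(25, mean = 10, sd = 2)

n <- length(obs)
p <- ppoints(n, a = 0.375)          # plotting positions (i - a) / (n + 1 - 2a)
theo <- qnorm(p, mean = 0, sd = 1)  # x-axis: quantiles of the assumed distribution
ord <- sort(obs)                    # y-axis: ordered observations

plot(theo, ord,
     xlab = "Quantiles of Normal(mean = 0, sd = 1)",
     ylab = "Ordered observations")
```

Because the sample is normal but not standard normal, the points should fall near a straight line that is not the 0-1 line.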

If y is supplied and plot.type="Q-Q", the empirical quantiles of y are plotted against the empirical quantiles of x.
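
The two-sample case can be approximated with the base R function `qqplot`, used here as a stand-in rather than as a description of how qqPlot computes its quantiles. The vectors `s1` and `s2` are simulated for illustration and have equal length, so no quantile adjustment is involved.

```r
# Two hypothetical samples of the same size.
set.seed(2)
s1 <- rnorm(30)
s2 <- rnorm(30, mean = 1)

# Base R qqplot() returns the sorted values of each sample.
qq <- qqplot(s1, s2, plot.it = FALSE)

plot(qq$x, qq$y,
     xlab = "Empirical quantiles of s1",
     ylab = "Empirical quantiles of s2")
abline(0, 1, lty = 2)  # 0-1 reference line
```

With a location shift of 1 between the samples, the points should run roughly parallel to the 0-1 line, about one unit above it.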

When plot.type="Tukey Mean-Difference Q-Q", the difference of the quantiles is plotted on the y-axis against the mean of the quantiles on the x-axis.
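
A sketch of how the Tukey mean-difference coordinates relate to the ordinary Q-Q coordinates, written in base R. It assumes a normal reference distribution whose parameters are estimated by the sample mean and standard deviation, which may differ from the estimator qqPlot uses by default.

```r
# Hypothetical sample.
set.seed(3)
obs <- rnorm(20, mean = 5, sd = 2)

p <- ppoints(length(obs), a = 0.375)
theo <- qnorm(p, mean = mean(obs), sd = sd(obs))  # theoretical quantiles
ord <- sort(obs)                                  # observed quantiles

md_x <- (ord + theo) / 2  # x-axis: mean of the two sets of quantiles
md_y <- ord - theo        # y-axis: difference of the two sets of quantiles

plot(md_x, md_y, xlab = "Mean of quantiles", ylab = "Difference of quantiles")
abline(h = 0, lty = 2)    # reference line at y = 0
```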

Special Distributions
When y is not supplied and the argument distribution specifies one of the following distributions, the function qqPlot behaves in the manner described below.

"lnorm": Lognormal Distribution. The log-transformed quantiles are plotted against quantiles from a Normal (Gaussian) distribution.
"lnormAlt": Lognormal Distribution (alternative parameterization). The untransformed quantiles are plotted against quantiles from a Lognormal distribution.
"lnorm3": Three-Parameter Lognormal Distribution. The quantiles of log(x-threshold) are plotted against quantiles from a Normal (Gaussian) distribution. The value of threshold is either specified in the argument param.list, or, if estimate.params=TRUE, then it is estimated.
"zmnorm": Zero-Modified Normal Distribution. The quantiles of the non-zero values (i.e., x[x!=0]) are plotted against quantiles from a Normal (Gaussian) distribution.
"zmlnorm": Zero-Modified Lognormal Distribution. The quantiles of the log-transformed positive values (i.e., log(x[x>0])) are plotted against quantiles from a Normal (Gaussian) distribution.
"zmlnormAlt": Zero-Modified Lognormal Distribution (alternative parameterization). The quantiles of the untransformed positive values (i.e., x[x>0]) are plotted against quantiles from a Lognormal distribution.

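The difference between the "lnorm" and "lnormAlt" conventions can be illustrated in base R. This is a sketch under assumed parameters (meanlog = 0, sdlog = 1) and an assumed plotting position constant of 0.375; the sample `conc` is simulated and does not come from any data set mentioned here.

```r
# Hypothetical positive-valued sample.
set.seed(4)
conc <- rlnorm(40, meanlog = 0, sdlog = 1)
p <- ppoints(length(conc), a = 0.375)

# "lnorm" convention: log-transformed data against normal quantiles.
x_log <- qnorm(p, mean = 0, sd = 1)
y_log <- sort(log(conc))

# "lnormAlt" convention: untransformed data against lognormal quantiles.
x_raw <- qlnorm(p, meanlog = 0, sdlog = 1)
y_raw <- sort(conc)

op <- par(mfrow = c(1, 2))
plot(x_log, y_log, main = "log scale")
plot(x_raw, y_raw, main = "original scale")
par(op)
```

The two sets of coordinates carry the same information: taking logs of the second set reproduces the first.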
Explanation of Q-Q Plots
A probability plot or quantile-quantile (Q-Q) plot is a graphical display invented by Wilk and Gnanadesikan (1968) to compare a data set to a particular probability distribution or to compare it to another data set. The idea is that if two population distributions are exactly the same, then they have the same quantiles (percentiles), so a plot of the quantiles for the first distribution vs. the quantiles for the second distribution will fall on the 0-1 line (i.e., the straight line y = x with intercept 0 and slope 1). If the two distributions have the same shape and spread but different locations, then the plot of the quantiles will fall on the line y = x + b (parallel to the 0-1 line) where b denotes the difference in locations. If the distributions have different locations and differ by a multiplicative constant m, then the plot of the quantiles will fall on the line y = mx + b (D'Agostino, 1986a, p. 25; Helsel and Hirsch, 1986, p. 42). Various kinds of differences between distributions will yield various kinds of deviations from a straight line.
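
The line y = mx + b can be checked numerically with exact quantiles. In this base R sketch the values b = 3 and m = 2 are arbitrary illustrative choices, and a normal family is assumed for both distributions.

```r
p <- ppoints(99)

b <- 3  # difference in location (illustrative value)
m <- 2  # multiplicative constant, i.e. ratio of scales (illustrative value)

qx <- qnorm(p)                    # quantiles of the first distribution
qy <- qnorm(p, mean = b, sd = m)  # quantiles of the second distribution

# The quantile pairs lie on a straight line; recover its coefficients.
fit <- lm(qy ~ qx)
coef(fit)
```

The fitted intercept and slope should match b and m up to floating-point error.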

Comparing Observations to a Hypothesized Distribution
Let \underline{x} = x_1, x_2, \ldots, x_n denote the observations in a random sample of size n from some unknown distribution with cumulative distribution function F(), and let x_{(1)}, x_{(2)}, \ldots, x_{(n)} denote the ordered observations. Depending on the particular formula used for the empirical cdf (see ecdfPlot), the i'th order statistic is an estimate of the i/(n+1)'th, (i-0.5)/n'th, etc., quantile. For the moment, assume the i'th order statistic is an estimate of the i/(n+1)'th quantile, that is, \hat{F}[x_{(i)}] = \frac{i}{n+1}.
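
The two conventions named above, i/(n+1) and (i-0.5)/n, can be compared with the base R function `ppoints`. This sketch assumes they correspond to the constants a = 0 and a = 0.5 in the `ppoints` formula (i - a)/(n + 1 - 2a); the sample size n = 5 is arbitrary.

```r
n <- 5
i <- seq_len(n)

p_weibull <- i / (n + 1)    # i/(n+1) convention
p_hazen   <- (i - 0.5) / n  # (i-0.5)/n convention

# Both are special cases of ppoints().
rbind(p_weibull, from_ppoints = ppoints(n, a = 0))
rbind(p_hazen,   from_ppoints = ppoints(n, a = 0.5))
```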

Value

qqPlot returns a list with components x and y, giving the (x,y) coordinates of the points that have been or would have been plotted. There are four cases to consider:

1. The argument y is not supplied and plot.type="Q-Q".

`x`	the quantiles from the theoretical distribution.
`y`	the observed quantiles (order statistics) based on the data in the argument `x`.

2. The argument y is not supplied and plot.type="Tukey Mean-Difference Q-Q".

`x`	the averages of the observed and theoretical quantiles.
`y`	the differences between the observed quantiles (order statistics) and the theoretical quantiles.

3. The argument y is supplied and plot.type="Q-Q".

`x`	the observed quantiles based on the data in the argument `x`. Note that these are adjusted quantiles if the number of observations in the argument `x` is greater than the number of observations in the argument `y`.
`y`	the observed quantiles based on the data in the argument `y`. Note that these are adjusted quantiles if the number of observations in the argument `y` is greater than the number of observations in the argument `x`.

4. The argument y is supplied and plot.type="Tukey Mean-Difference Q-Q".

`x`	the averages of the quantiles based on the argument `x` and the quantiles based on the argument `y`.
`y`	the differences between the quantiles based on the argument `x` and the quantiles based on the argument `y`.
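
A hedged sketch of capturing the returned coordinates without drawing a plot. It assumes the EnvStats package is installed and runs only in that case; the sample `obs` is simulated for illustration.

```r
# Hypothetical sample.
set.seed(5)
obs <- rnorm(15)

if (requireNamespace("EnvStats", quietly = TRUE)) {
  # plot.it = FALSE: compute the coordinates only (case 1 above).
  res <- EnvStats::qqPlot(obs, plot.it = FALSE)
  str(res)  # inspect the components x and y
}
```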

Note

A quantile-quantile (Q-Q) plot, also called a probability plot, is a plot of the observed order statistics from a random sample (the empirical quantiles) against their (estimated) mean or median values based on an assumed distribution, or against the empirical quantiles of another set of data (Wilk and Gnanadesikan, 1968). Q-Q plots are used to assess whether data come from a particular distribution, or whether two datasets have the same parent distribution. If the distributions have the same shape (but not necessarily the same location or scale parameters), then the plot will fall roughly on a straight line. If the distributions are exactly the same, then the plot will fall roughly on the straight line y=x.

A Tukey mean-difference Q-Q plot, also called an m-d plot, is a modification of a Q-Q plot. Rather than plotting observed quantiles vs. theoretical quantiles or observed y-quantiles vs. observed x-quantiles, a Tukey mean-difference Q-Q plot plots the difference between the quantiles on the y-axis vs. the average of the quantiles on the x-axis (Cleveland, 1993, pp.22-23). If the two sets of quantiles come from the same parent distribution, then the points in this plot should fall roughly along the horizontal line y=0. If one set of quantiles comes from the same distribution with a shift in median, then the points in this plot should fall along a horizontal line above or below the line y=0. A Tukey mean-difference Q-Q plot enhances our perception of how the points in the Q-Q plot deviate from a straight line, because it is easier to judge deviations from a horizontal line than from a line with a non-zero slope.

In a Q-Q plot, the extreme points have more variability than points toward the center. A U-shaped Q-Q plot indicates that the underlying distribution for the observations on the y-axis is skewed to the right relative to the underlying distribution for the observations on the x-axis. An upside-down-U-shaped Q-Q plot indicates the y-axis distribution is skewed left relative to the x-axis distribution. An S-shaped Q-Q plot indicates the y-axis distribution has shorter tails than the x-axis distribution. Conversely, a plot that is bent down on the left and bent up on the right indicates that the y-axis distribution has longer tails than the x-axis distribution.
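
The U-shape and S-shape described above can be reproduced with exact quantiles in base R. This sketch assumes a standard normal distribution on the x-axis and uses an exponential distribution (right-skewed) and a uniform distribution (short-tailed) as illustrative choices for the y-axis.

```r
p <- ppoints(50)
z <- qnorm(p)  # x-axis: normal quantiles

# Slopes of the segments joining successive points of a Q-Q curve.
secant_slopes <- function(x, y) diff(y) / diff(x)

# Right-skewed y-axis distribution: slopes keep increasing (U-shape).
s_exp <- secant_slopes(z, qexp(p))

# Short-tailed y-axis distribution: slopes are largest in the middle
# and flatten toward both ends (S-shape).
s_unif <- secant_slopes(z, qunif(p))

op <- par(mfrow = c(1, 2))
plot(z, qexp(p),  main = "Exponential vs. normal")
plot(z, qunif(p), main = "Uniform vs. normal")
par(op)
```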

Author(s)

Steven P. Millard

References

Chambers, J.M., W.S. Cleveland, B. Kleiner, and P.A. Tukey. (1983). Graphical Methods for Data Analysis. Duxbury Press, Boston, MA, pp.11-16.

Cleveland, W.S. (1993). Visualizing Data. Hobart Press, Summit, New Jersey, 360pp.

D'Agostino, R.B. (1986a). Graphical Analysis. In: D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of-Fit Techniques. Marcel Dekker, New York, Chapter 2, pp.7-62.

Examples

 # The guidance document USEPA (1994b, pp. 6.22--6.25)
 # contains measures of 1,2,3,4-Tetrachlorobenzene (TcCB)
 # concentrations (in parts per billion) from soil samples
 # at a Reference area and a Cleanup area. These data are stored
 # in the data frame EPA.94b.tccb.df.
 #
 # Create a Q-Q plot for the reference area data first assuming a
 # normal distribution, then a lognormal distribution, then a
 # gamma distribution.

 # Assume a normal distribution
 #-----------------------------

 dev.new()
 with(EPA.94b.tccb.df, qqPlot(TcCB[Area == "Reference"]))

 dev.new()
 with(EPA.94b.tccb.df, qqPlot(TcCB[Area == "Reference"], add.line = TRUE))

 dev.new()
 with(EPA.94b.tccb.df, qqPlot(TcCB[Area == "Reference"],
   plot.type = "Tukey", add.line = TRUE))

 # The Q-Q plot based on assuming a normal distribution shows a U-shape,
 # indicating the Reference area TcCB data are skewed to the right
 # compared to a normal distribution.

 # Assume a lognormal distribution
 #--------------------------------

 dev.new()
 with(EPA.94b.tccb.df, qqPlot(TcCB[Area == "Reference"],
   dist = "lnorm", digits = 2, points.col = "blue", add.line = TRUE))

 dev.new()
 with(EPA.94b.tccb.df, qqPlot(TcCB[Area == "Reference"],
   dist = "lnorm", digits = 2, plot.type = "Tukey",
   points.col = "blue", add.line = TRUE))

 # Alternative parameterization

 dev.new()
 with(EPA.94b.tccb.df, qqPlot(TcCB[Area == "Reference"],
   dist = "lnormAlt", estimate.params = TRUE, digits = 2,
   points.col = "blue", add.line = TRUE))

 dev.new()
 with(EPA.94b.tccb.df, qqPlot(TcCB[Area == "Reference"],
   dist = "lnormAlt", digits = 2, plot.type = "Tukey",
   points.col = "blue", add.line = TRUE))

 # The lognormal distribution appears to be an adequate fit.

 # Now look at a Q-Q plot assuming a gamma distribution.
 #----------------------------------------------------------

 dev.new()
 with(EPA.94b.tccb.df, qqPlot(TcCB[Area == "Reference"],
   dist = "gamma", estimate.params = TRUE, digits = 2,
   points.col = "blue", add.line = TRUE))

 dev.new()
 with(EPA.94b.tccb.df, qqPlot(TcCB[Area == "Reference"],
   dist = "gamma", digits = 2, plot.type = "Tukey",
   points.col = "blue", add.line = TRUE))

 # Alternative parameterization

 dev.new()
 with(EPA.94b.tccb.df, qqPlot(TcCB[Area == "Reference"],
   dist = "gammaAlt", estimate.params = TRUE, digits = 2,
   points.col = "blue", add.line = TRUE))

 dev.new()
 with(EPA.94b.tccb.df, qqPlot(TcCB[Area == "Reference"],
   dist = "gammaAlt", digits = 2, plot.type = "Tukey",
   points.col = "blue", add.line = TRUE))

 #-------------------------------------------------------------------------------------

 # Generate 20 observations from a gamma distribution with parameters
 # shape=2 and scale=2, then create a normal (Gaussian) Q-Q plot for these data.
 # (Note: the call to set.seed simply allows you to reproduce this example.)

 set.seed(357)
 dat <- rgamma(20, shape = 2, scale = 2)

 dev.new()
 qqPlot(dat, add.line = TRUE)

 # Now assume a gamma distribution and estimate the parameters
 #------------------------------------------------------------

 dev.new()
 qqPlot(dat, dist = "gamma", estimate.params = TRUE, add.line = TRUE)

 # Clean up
 #---------
 rm(dat)
 graphics.off()

[Package EnvStats version 2.8.1 Index]

FAQs

What does a quantile-quantile or Q-Q plot compare?

A QQ plot is a scatterplot created by plotting two sets of quantiles against one another. If both sets of quantiles came from the same distribution, we should see the points forming a line that's roughly straight.

How to interpret Q-Q plot results?

Examining data distributions using QQ plots

Points on the Normal QQ plot provide an indication of univariate normality of the dataset. If the data is normally distributed, the points will fall on the 45-degree reference line. If the data is not normally distributed, the points will deviate from the reference line.

What does a good Q-Q plot look like?

The normal distribution is symmetric, so it has no skew (the mean is equal to the median). On a Q-Q plot normally distributed data appears as roughly a straight line (although the ends of the Q-Q plot often start to deviate from the straight line).

What is the Q-Q plot in R normal distribution?

The QQ plot shows the data on the vertical axis ranked in order from smallest to largest (the “sample quantiles”). On the horizontal axis, it shows the expected value of an individual with the same quantile if the distribution were normal (the “theoretical quantiles”).

What should my quantile score be?

For example, a student's Quantile measure should be at 1350Q by high school graduation to handle the math needed in college and most careers. A student Quantile measure helps you to know: Which skills and concepts students are ready to learn.

How does a Q-Q plot allow us to compare two distributions?

Q-Q (quantile-quantile) plots play a vital role in graphically analyzing and comparing two probability distributions by plotting their quantiles against each other. If the two distributions that we are comparing are exactly equal, then the points on the Q-Q plot will perfectly lie on a straight line y = x.

How to tell if a Q-Q plot is acceptable?

If the points in the Q-Q plot are on a line from the lower left to the upper right then the data is basically normally distributed.

How to read quantile-quantile plot?

Interpreting QQ plots is intuitive. When all the dots generally follow the straight line y = x, the sample distribution is similar to the theoretical one. The data points don't have to fall right on the line. Instead, they only need to follow a line generally—with random variability placing them above and below it.

Which Q-Q plot indicates a negative skew?

Skewed Q-Q plots

If the bottom end of the Q-Q plot deviates from the straight line but the upper end does not, then the distribution is left-skewed (negatively skewed). If the upper end of the Q-Q plot deviates from the straight line and the lower end does not, then the distribution is right-skewed (positively skewed).

What is the difference between Q-Q plot and normal probability plot?

For the most part, the normal P-P plot is better at finding deviations from normality in the center of the distribution, and the normal Q-Q plot is better at finding deviations in the tails. Q-Q plots tend to be preferred in research situations. Both Q-Q and P-P plots can be used for distributions other than normal.

What does the slope of a Q-Q plot mean?

The slope of the Q-Q plot reflects the ratio of the standard deviation of your data to the standard deviation of the normal distribution. If the slope is greater than 1, it means that your data are more spread out than the normal distribution, and vice versa.

How to interpret a Q-Q plot in a linear regression model?

To interpret a Q-Q plot, you need to look at the shape and pattern of the points. If the points lie on or close to a 45-degree line, it means that the data follow the reference distribution closely.

What does a quantile plot show?

The quantile-quantile (q-q) plot is a graphical technique for determining if two data sets come from populations with a common distribution. A q-q plot is a plot of the quantiles of the first data set against the quantiles of the second data set.

What does a quantile-quantile graph do for you in a regression analysis?

A Q-Q plot can be used in regression models to check some of the assumptions that are required for valid inference. For example, you can use a Q-Q plot to check if the residuals of the model are normally distributed, which is an assumption for many parametric tests and confidence intervals.
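
As an illustration of the residual check described above, here is a base R sketch. It uses the built-in `cars` data set and the base functions `qqnorm` and `qqline` rather than qqPlot; the choice of model and of standardized residuals is an assumption made for the example.

```r
# Simple linear regression on a built-in data set.
fit <- lm(dist ~ speed, data = cars)
res <- rstandard(fit)  # standardized residuals

# Normal Q-Q coordinates of the residuals.
qq <- qqnorm(res, plot.it = FALSE)

plot(qq$x, qq$y,
     xlab = "Theoretical quantiles", ylab = "Standardized residuals")
qqline(res, lty = 2)   # line through the first and third quartiles
```

Points bending away from the line at either end would suggest that the residuals depart from normality in the tails.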

What is the difference between a quartile and a quantile?

Quantiles are values that split sorted data or a probability distribution into equal parts. In general terms, a q-quantile divides sorted data into q parts. The most commonly used quantiles have special names: Quartiles (4-quantiles): Three quartiles split the data into four parts.

Name	a	Distribution Often Used With	References
Weibull	0	Weibull, Uniform	Weibull (1939), Stedinger et al. (1993)
Median	0.3175	Several	Filliben (1975), Vogel (1986)
Blom	0.375	Normal and Others	Blom (1958), Looney and Gulledge (1985)
Cunnane	0.4	Several	Cunnane (1978), Chowdhury et al. (1991)
Gringorten	0.44	Gumbel	Gringorten (1963), Vogel (1986)
Hazen	0.5	Several	Hazen (1914), Chambers et al. (1983), Cleveland (1993)
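
The column a is conventionally read as the constant in the general plotting position formula. That formula is not stated in this section, so the form below is an assumption, chosen because it reproduces the i/(n+1) and (i-0.5)/n cases mentioned under Details.

```latex
\hat{p}_i = \frac{i - a}{n - 2a + 1}, \qquad i = 1, 2, \ldots, n
% a = 0   (Weibull): \hat{p}_i = \frac{i}{n + 1}
% a = 0.5 (Hazen):   \hat{p}_i = \frac{i - 0.5}{n}
```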
