Statistical Graphs 

Chapter: Visual Presentation of Data
Author(s): Stacy Christiansen

Line Graphs.

Line graphs have 2 or 3 axes with continuous quantitative scales on which data points connected by curves demonstrate the relationship between 2 or more quantitative variables, such as changes over time. Line graphs usually are designed with the dependent variable on the vertical axis (y-axis) and the independent variable on the horizontal axis (x-axis)3 (Example F1, Example F2).

Example F1 Line graph with the dependent variable on the vertical axis (y-axis) and the independent variable on the horizontal axis (x-axis).

Example F2 Line graph with 3 axes to facilitate comparison of related data.

Survival Plots.

Survival plots of time-to-event outcomes, such as from Kaplan-Meier analyses (see Figure 3 in 20.0, Study Design and Statistics), display on the y-axis the proportion or percentage of individuals remaining free of or experiencing a specific outcome over time, with time represented on the x-axis. When the outcome of interest is relatively frequent (occurs in approximately 70% or more of the study population), event-free survival is plotted on the y-axis from 0 to 1.0 (or 0% to 100%), with the curve starting at 1.0 (100%). When the outcome is relatively infrequent (occurs in <30% of the study population), it is preferable to plot upward starting at 0 so that the curves can be seen without breaking or truncating the y-axis scale.4 The curve should be drawn as a step function (not smoothed).

The number of individuals followed up for each time interval (number at risk) should be shown underneath the x-axis. Time-to-event estimates become less certain as the number of individuals diminishes, so consideration should be given to not displaying data when less than 20% of the study population is still in follow-up.4 Plots should include some indication of statistical uncertainty, such as error bars on the curves at regular time points or, when time-to-event data are being compared for 2 or more groups, an overall estimate of treatment difference, such as a relative risk (with 95% confidence interval) or log-rank P value (Example F3).

Example F3 Survival curve with the curves clearly marked by study group. The number of study participants at risk is listed under each major time point and a log-rank P value is included in the legend.
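The values behind such a plot can be sketched in a few lines of code. The following Python example is an illustration only, not a method prescribed by this manual: it computes Kaplan-Meier step-function estimates and the number at risk at chosen time points. The function names (`kaplan_meier`, `number_at_risk`) and the sample data are hypothetical.

```python
from itertools import groupby


def kaplan_meier(times, events):
    """Return [(time, survival)] steps of the Kaplan-Meier estimate.

    times  -- follow-up time for each individual
    events -- 1 if the event occurred at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    steps = [(0, 1.0)]  # the curve starts at 1.0 (100%)
    for t, group in groupby(data, key=lambda pair: pair[0]):
        group = list(group)
        deaths = sum(e for _, e in group)
        if deaths:
            # The estimate changes only at event times, giving a step function.
            survival *= 1 - deaths / at_risk
            steps.append((t, survival))
        at_risk -= len(group)  # events and censored individuals both leave the risk set
    return steps


def number_at_risk(times, at):
    """Count individuals still in follow-up just before each time in `at`."""
    return [sum(1 for t in times if t >= point) for point in at]


# Hypothetical follow-up data (time units are arbitrary).
times = [2, 3, 3, 5, 8, 8, 9, 12]
events = [1, 1, 0, 1, 0, 1, 1, 0]
steps = kaplan_meier(times, events)
risk_row = number_at_risk(times, at=[0, 4, 8, 12])
```

Here `steps` holds the points of the step function and `risk_row` holds the counts that would be listed under the x-axis. For a relatively infrequent outcome, one reading of the advice to plot upward from 0 is to plot `1 - survival` at each step instead. The sketch does not compute error bars or a log-rank P value, which the plot should also report.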

Scatterplots.

In scatterplots, individual data points are plotted according to coordinate values with continuous, quantitative x- and y-axis scales. By convention, independent variables are plotted on the x-axis and dependent variables on the y-axis. Data markers are not connected by a curve, but a mathematically generated curve may be fitted to the data to summarize the relationship among the variables. The statistical method used to generate the curve and the statistic that summarizes the relationship between the dependent and independent variables, such as a correlation or regression coefficient, should be provided in the figure or legend (Example F4).

Example F4 Scatterplot including the regression line, correlation coefficient, and P value in the plot.
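As an illustration of the statistics that would accompany a fitted line, the following Python sketch computes an ordinary least-squares regression line and the Pearson correlation coefficient. Least squares is assumed here only as one common fitting method; the function name `fit_line`, the label format, and the data are hypothetical.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least-squares line y = intercept + slope * x, plus Pearson r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical data: independent variable (x-axis), dependent variable (y-axis).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept, r = fit_line(xs, ys)

# Text for the figure or legend, naming the method and the summary statistic.
label = f"Least-squares fit: y = {intercept:.2f} + {slope:.2f}x; r = {r:.3f}"
```

The `label` string shows one way to state both the fitting method and the summary statistic in the figure or legend. A P value, as shown in Example F4, would need an additional test that this sketch omits.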

Histograms and Frequency Polygons.

Histograms and frequency polygons display the distribution of data in a data set by plotting the frequency (counts or percentages) of observations (y-axis) for each interval represented on the x-axis. In both histograms and frequency polygons, the y-axis must begin at 0 and should not be broken, and the x-axis is a continuous, quantitative scale. Histograms use continuous bars of equal widths determined by the x-axis intervals, where bar height represents frequency (Example F5).

Example F5 Histogram showing frequencies, centered over the bar, for each time period (bar height represents number of cases). Note the use of a figure inset to show how the data fit into a larger context.

Frequency polygons use data markers, connected by a curve, to represent frequency. Overlapping data distributions from 2 data sets can be plotted in a frequency polygon but not in a histogram (Example F6).

Example F6 Frequency polygons can illustrate distributions for multiple groups.
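The frequencies plotted in both kinds of graph come from the same binning step. The Python sketch below is a hypothetical illustration (the function name `histogram_counts`, its parameters, and the data are assumptions): it counts observations in equal-width intervals, then derives the interval midpoints at which a frequency polygon would place its data markers.

```python
def histogram_counts(values, start, width, n_bins):
    """Count observations in n_bins equal-width intervals [lo, lo + width).

    Values outside the covered range are ignored in this sketch.
    """
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
    edges = [start + i * width for i in range(n_bins + 1)]
    return edges, counts


# Hypothetical observations (eg, ages in years).
ages = [23, 27, 31, 35, 38, 41, 44, 45, 52, 58]
edges, counts = histogram_counts(ages, start=20, width=10, n_bins=4)

# Histogram: one bar per interval, bar height = count (y-axis begins at 0).
# Frequency polygon: one marker per interval midpoint, joined by line segments.
midpoints = [(lo + hi) / 2 for lo, hi in zip(edges, edges[1:])]
percentages = [100 * c / len(ages) for c in counts]
```

Either `counts` or `percentages` can serve as the y-axis values. To compare 2 groups in a frequency polygon, the same `edges` would be applied to each group so that the markers share x-axis positions.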

Bar Graphs.

Bar graphs have a single axis and are used to display frequencies (counts or percentages) on the axis according to categories shown on a baseline. A bar graph is typically vertical, with frequencies shown on a vertical y-axis (Example F7), but may be horizontal (Example F8). Data in each category are represented by a bar. Bars should have the same width, be separated by a space, and be wider than the space between them. Bar lengths are proportional to frequency, the scale on the frequency axis should begin at 0, and the axis should not be broken. All bars must have a common baseline to facilitate comparison.5 Categories of data should be presented in logical order and consistently with other figures and tables in the article. The baseline of a bar graph is not a coordinate axis and therefore should not have tick marks.

Example F7 Vertical bar graph with shading to distinguish the 3 groups that are compared. Note that the bars are presented in the same order (white, black, other) in each grouping.

Example F8 Horizontal bar graph with the frequencies on the x-axis and categories on the y-axis.

Bar graphs may be used to compare frequencies between groups. In most cases, the number of bars in a grouped bar graph should not exceed 3. Colors or tones used to designate each group should be distinct. To ensure that bars in black-and-white figures are distinguishable, a contrast in shading of at least 30% for adjacent bars is suggested. Color or shades of gray should be used instead of patterns and cross-hatching (eg, diagonal lines) on bars.
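The shading guideline can be checked mechanically before a figure is drawn. The Python sketch below assumes that each bar's shade is expressed as percent black (0 = white, 100 = black) and that "contrast" means the difference between those percentages. Both are assumptions made for illustration, and the function name `adjacent_contrast_ok` is hypothetical.

```python
def adjacent_contrast_ok(shades, minimum=30):
    """Check that neighboring bars differ by at least `minimum` percentage points.

    shades -- gray level of each bar in a group, as percent black
              (0 = white, 100 = black); an assumed convention
    """
    return all(abs(a - b) >= minimum for a, b in zip(shades, shades[1:]))


# Hypothetical shade choices for a group of 3 bars.
distinct = adjacent_contrast_ok([0, 50, 100])   # steps of 50 points
too_close = adjacent_contrast_ok([0, 20, 60])   # first step is only 20 points
```

With these inputs, `distinct` is true and `too_close` is false because the first 2 bars in the second group differ by only 20 percentage points.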

Component Bar Graph.

Component bar graphs (or divided bar graphs) display the proportion of components constituting the total group, represented by the whole bar (Example F9A). Individual components are designated by distinguishing formats, such as different shading. When possible, it is preferable to use clusters of individual bars to represent each component (Example F9B) because the only values easily interpreted in a component bar graph are the total and the end segments.5

Example F9A A 100% bar graph, a type of component bar graph, shows the components as part of the whole. However, the exact values are not easy to compare with one another in this format.

Example F9B The example in Example F9A replotted using clusters of bars.

Pie Chart.

Like component bar graphs, pie charts compare relationships among component parts. Categories are represented by sections, with the area of the section being proportional to the relative frequency of each category. Pie charts are used commonly in publications intended for lay audiences but should be avoided in scientific publications.6 The angular areas of individual components may be difficult to compare between pie charts. Usually, data depicted in pie charts can be summarized in the text or in a table.7

Dot (Point) Graph.

Dot or point graphs display quantitative data other than counts or frequencies on a single scaled axis according to categories on a baseline (the scaled axis may be horizontal or vertical). As in bar graphs, the baseline does not represent a scale and therefore does not contain tick marks. Point estimates are represented by discrete data markers, preferably with error bars to designate variability (Example F10) or box and whisker symbols (Example F11). Dot or point graphs may be used to compare data between study groups, including positive and negative data values relative to a centrally located 0 baseline (“deviation graph”), paired data from single individuals (Example F12), or pooled data in meta-analyses and other analyses that combine data from individual studies (Example F13).

Example F10 Point estimates plotted by category, including error bars and a marker (dotted line) of significance.

Example F11 Box and whisker plot with each element defined in the legend.

Example F12 Individual-value graphs of weight change for each study participant.

Example F13 Effect sizes and pooled (combined) data in a meta-analysis, with the size of the data markers indicating the relative weight of each study. Note that the values plotted are also provided in the risk ratio column. The dotted line at 1.0 represents no effect and allows for quick visualization of the effect of each study listed. The overall χ² and P values are provided in the figure.
