4.2.1 Statistical Graphs
Line Graph.
Line graphs have 2 or 3 axes with continuous quantitative scales, on which data points are connected by lines or curves to show the relationship between 2 or more quantitative variables, such as change over time. Line graphs are usually designed with the dependent variable on the vertical axis (y-axis) and the independent variable on the horizontal axis (x-axis)3 (Example F1, Example F2).
Survival Plot.
Survival plots of time-to-event outcomes, such as those from Kaplan-Meier analyses (see Figure 3 in 20.0, Study Design and Statistics), show on the y-axis the proportion (or percentage) of individuals who remain free of, or who experience, a specific outcome over time, which is shown on the x-axis. When the outcome of interest is relatively frequent (occurs in approximately ≥70% of the study population), event-free survival is plotted on the y-axis from 0 to 1.0 (or 0% to 100%), with the curve starting at 1.0 (100%). When the outcome is relatively infrequent (occurs in <30% of the study population), it is preferable to plot the cumulative proportion experiencing the outcome upward from 0, so that differences between the curves can be seen without breaking or truncating the y-axis scale.4 The curve should be drawn as a step function (not smoothed).
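The step function described above comes from the Kaplan-Meier (product-limit) estimator. The sketch below is a minimal illustration, assuming each individual is recorded as a hypothetical `(time, event)` pair, with `event` equal to 1 when the outcome occurred and 0 when follow-up was censored; the helper names `kaplan_meier` and `cumulative_incidence` are illustrative, not from any library. The second helper gives the upward curve (1 minus survival, starting at 0) suggested for infrequent outcomes; that simple complement is valid only in the absence of competing risks.

```python
from collections import Counter


def kaplan_meier(records):
    """Kaplan-Meier (product-limit) estimate of event-free survival.

    records: list of (time, event) pairs (assumed format), where event is 1
    if the outcome occurred at that time and 0 if follow-up was censored.
    Returns the corners of the step function as [(time, S(time)), ...].
    """
    events = Counter(t for t, e in records if e == 1)  # outcomes at each time
    leaving = Counter(t for t, _ in records)           # everyone leaving at each time
    n_at_risk = len(records)
    surv = 1.0
    steps = [(0, 1.0)]  # the curve starts at 1.0 (100%) at time 0
    for t in sorted(leaving):
        d = events[t]
        if d:
            # survival drops by the fraction of those at risk who had the outcome
            surv *= 1 - d / n_at_risk
            steps.append((t, surv))
        # individuals with an event or censoring at t leave the risk set after t
        n_at_risk -= leaving[t]
    return steps


def cumulative_incidence(steps):
    """Upward curve 1 - S(t), starting at 0, for infrequent outcomes."""
    return [(t, 1.0 - s) for t, s in steps]
```

Between event times the estimate stays flat, so when plotting these points they should be drawn as steps (for example, with matplotlib's `step(x, y, where="post")`) rather than joined by a smoothed line.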
The number of individuals still being followed up at the start of each time interval (the number at risk) should be shown beneath the x-axis. Time-to-event estimates become less precise as the number at risk diminishes, so consideration should be given to not displaying the curve beyond the point at which fewer than 20% of the study population remain in follow-up.4 Plots should include some indication of statistical uncertainty, such as error bars (eg, 95% confidence intervals) on the curves at regular time points or, when time-to-event data are compared between 2 or more groups, an overall estimate of the difference between groups, such as a relative risk (with 95% confidence interval), or a log-rank P value (Example F3).
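The number-at-risk row and the 20% display guideline can both be computed from individual follow-up times. This is a minimal sketch, assuming each individual contributes one follow-up time (to the event or to censoring) and that the plot reports the number at risk at hypothetical, regularly spaced checkpoints; `number_at_risk` and `truncation_point` are illustrative names.

```python
def number_at_risk(times, checkpoints):
    """Number of individuals still under observation at each checkpoint.

    times: each individual's follow-up time (to the event or to censoring).
    An individual is counted as at risk at time t if follow-up lasted >= t.
    """
    return [sum(1 for x in times if x >= t) for t in checkpoints]


def truncation_point(times, checkpoints, min_fraction=0.20):
    """First checkpoint at which fewer than min_fraction of the cohort
    remain at risk (a candidate point to stop drawing the curve),
    or None if the threshold is never crossed."""
    n_total = len(times)
    for t, n in zip(checkpoints, number_at_risk(times, checkpoints)):
        if n < min_fraction * n_total:
            return t
    return None
```

For 10 individuals followed up for 1 to 10 time units, checkpoints at 0, 2, 4, 6, 8, and 10 give a number-at-risk row of 10, 9, 7, 5, 3, and 1, and the curve would be a candidate for truncation at time 10, where only 1 individual (10%) remains.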
Scatterplot.
In scatterplots, individual data points are plotted according to their coordinate values on continuous, quantitative x- and y-axis scales. By convention, independent variables are plotted on the x-axis and dependent variables on the y-axis. Data markers are not connected by a curve, but a mathematically generated curve may be fitted to the data to summarize the relationship between the variables. The statistical method used to generate the curve and the statistic that summarizes the relationship between the dependent and independent variables, such as a correlation or regression coefficient, should be provided in the figure or legend (Example F4).
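As an illustration of a fitted curve and its summary statistics, the sketch below computes an ordinary least-squares line and the Pearson correlation coefficient. It assumes a straight-line relationship is appropriate; other fitted curves (eg, nonlinear or smoothed fits) would need a different method, which should then be named in the legend. The function name `fit_line` is hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x.

    Returns (a, b, r): intercept, slope (regression coefficient), and
    Pearson correlation coefficient.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx                    # slope
    a = my - b * mx                  # intercept
    r = sxy / math.sqrt(sxx * syy)   # correlation coefficient
    return a, b, r
```

For x = 1 to 5 and y = 2, 4, 5, 4, 5, this gives a slope of 0.6, an intercept of 2.2, and r ≈ 0.77, the values that would be reported in the figure or legend. On Python 3.10 and later, `statistics.linear_regression` and `statistics.correlation` provide the same quantities.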
Histograms and Frequency Polygons.
Histograms and frequency polygons display the distribution of data in a data set by plotting, on the y-axis, the frequency (counts or percentages) of observations in each interval shown on the x-axis. In both histograms and frequency polygons, the y-axis must begin at 0 and should not be broken, and the x-axis is a continuous, quantitative scale. Histograms use contiguous bars (with no space between them) whose equal widths are determined by the x-axis intervals; bar height represents frequency (Example F5).
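The counts behind a histogram come from sorting observations into equal-width, contiguous intervals. A minimal sketch follows, assuming half-open intervals [lower, upper) with the last interval closed on the right (the same convention NumPy's `histogram` uses); the function name and its parameters are hypothetical.

```python
def histogram(values, start, width, n_bins):
    """Count observations in n_bins contiguous, equal-width intervals.

    Interval k is [start + k*width, start + (k+1)*width); the last
    interval also includes its right edge so the maximum is not lost.
    Values outside the plotted range are ignored.
    """
    counts = [0] * n_bins
    stop = start + width * n_bins
    for v in values:
        if v < start or v > stop:
            continue  # outside the plotted range
        k = int((v - start) // width)
        if k == n_bins:  # v equals the right edge: put it in the last bin
            k -= 1
        counts[k] += 1
    return counts
```

To plot percentages instead of counts, each count can be divided by the number of observations and multiplied by 100; either way, the y-axis still starts at 0.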
Frequency polygons use data markers, connected by lines, to represent the frequency in each interval. Overlapping distributions from 2 data sets can be plotted in a single frequency polygon but not in a single histogram (Example F6).
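A frequency polygon can be built from the same interval counts used for a histogram. The sketch below places one marker at the midpoint of each interval and, following a common convention (an assumption here, not a requirement stated above), adds a zero-frequency point at the midpoint of the empty interval on each side so that the polygon closes on the x-axis. The function name `frequency_polygon` is hypothetical.

```python
def frequency_polygon(counts, start, width):
    """Return (x, y) vertices of a frequency polygon.

    One vertex sits at the midpoint of each interval; an extra
    zero-frequency vertex is added one interval width beyond the first
    and last midpoints so the polygon closes on the x-axis.
    """
    mids = [start + width * (k + 0.5) for k in range(len(counts))]
    xs = [mids[0] - width] + mids + [mids[-1] + width]
    ys = [0] + list(counts) + [0]
    return list(zip(xs, ys))
```

Because each data set becomes a single line rather than a set of filled bars, 2 overlapping distributions computed on the same intervals can be drawn on one set of axes without hiding each other.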
Bar Graph.
Bar graphs have a single scaled axis and display frequencies (counts or percentages) on that axis according to categories shown on a baseline. A bar graph is typically vertical, with frequencies on the y-axis (Example F7), but may be horizontal (Example F8). The data in each category are represented by a bar. Bars should have the same width, be separated by a space, and be wider than the space between them. Bar length is proportional to frequency, so the scale on the frequency axis should begin at 0 and should not be broken. All bars must have a common baseline to facilitate comparison.5 Categories should be presented in a logical order that is consistent with other figures and tables in the article. The baseline of a bar graph is not a coordinate axis and therefore should not have tick marks.
Bar graphs may be used to compare frequencies between groups. In most cases, a grouped bar graph should have no more than 3 bars per category. The colors or tones used to designate each group should be distinct. To ensure that adjacent bars in black-and-white figures are distinguishable, a contrast in shading of at least 30% is suggested. Color or shades of gray should be used instead of patterns and cross-hatching (eg, diagonal lines) on bars.
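One way to apply the 30% shading guideline is to space gray levels evenly and check the gap between neighbors. This sketch assumes that "30% contrast" means a difference of at least 0.30 on a 0 (black) to 1 (white) gray scale, and that shades are kept between 0.1 and 0.9 so the lightest bar remains visible; both are assumptions, and `gray_shades` is a hypothetical helper.

```python
def gray_shades(n_bars, darkest=0.1, lightest=0.9, min_contrast=0.30):
    """Evenly spaced gray levels (0 = black, 1 = white), darkest first.

    Raises ValueError if adjacent shades would differ by less than
    min_contrast on the 0-1 gray scale (an assumed reading of "30%").
    """
    if n_bars < 1:
        raise ValueError("need at least 1 bar")
    if n_bars == 1:
        return [darkest]
    step = (lightest - darkest) / (n_bars - 1)
    if step < min_contrast - 1e-9:  # small tolerance for float rounding
        raise ValueError(f"{n_bars} bars cannot all differ by {min_contrast:.0%}")
    return [round(darkest + i * step, 3) for i in range(n_bars)]
```

Within the assumed 0.1 to 0.9 range, 3 bars fit (0.1, 0.5, 0.9) but 4 do not, which agrees with the advice to keep grouped bars to 3 per category. Matplotlib, for example, accepts such a level as a string (`color="0.5"`).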
Component Bar Graph.
Component bar graphs (or divided bar graphs) display the proportions of the components that make up a total group, which is represented by the whole bar (Example F9A). Individual components are distinguished by format, such as different shading. When possible, it is preferable to use clusters of individual bars to represent each component (Example F9B), because the only values easily interpreted in a component bar graph are the total and the end segments.5
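The reason only the total and end segments are easy to read can be seen by computing where each stacked segment starts and ends. This minimal sketch assumes components are given as whole-number percentages summing to 100; `stack_segments` is a hypothetical helper.

```python
def stack_segments(percentages):
    """(bottom, top) of each component when stacked in one bar."""
    segments = []
    bottom = 0
    for p in percentages:
        segments.append((bottom, bottom + p))
        bottom += p  # the next segment starts where this one ends
    return segments


group_a = stack_segments([20, 50, 30])  # [(0, 20), (20, 70), (70, 100)]
group_b = stack_segments([35, 40, 25])  # [(0, 35), (35, 75), (75, 100)]
```

In both bars the first segment starts at 0 and the last ends at 100, so those can be compared against a common reference. The middle segments start at different heights (20 vs 35), so the reader must compare lengths of 50 and 40 without a shared baseline; separate clustered bars avoid this.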
Pie Chart.
Like component bar graphs, pie charts show the relationships among the component parts of a whole. Categories are represented by sectors, with the area of each sector proportional to the relative frequency of its category. Pie charts are commonly used in publications intended for lay audiences but should be avoided in scientific publications.6 The angles and areas of individual sectors can be difficult to compare, particularly between pie charts. Usually, data depicted in pie charts can instead be summarized in the text or in a table.7
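Because the same information can usually go in a table, a short sketch of that alternative follows: it converts category counts into the counts and percentages a small table would report. The column layout and the helper name `percent_table` are assumptions for illustration.

```python
def percent_table(counts):
    """Format category counts as aligned rows: category, No., %."""
    total = sum(counts.values())
    lines = [f"{'Category':<12}{'No.':>6}{'%':>8}"]
    for category, n in counts.items():
        lines.append(f"{category:<12}{n:>6}{100 * n / total:>8.1f}")
    return "\n".join(lines)


print(percent_table({"Mild": 30, "Moderate": 50, "Severe": 20}))
```

Each percentage is read directly as a number rather than estimated from the size of a sector.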
Dot (Point) Graph.
Dot or point graphs display quantitative data other than counts or frequencies on a single scaled axis according to categories on a baseline (the scaled axis may be horizontal or vertical). As in bar graphs, the baseline does not represent a scale and therefore does not have tick marks. Point estimates are represented by discrete data markers, preferably with error bars to show variability (Example F10) or with box-and-whisker symbols (Example F11). Dot or point graphs may be used to compare data between study groups, including positive and negative data values relative to a centrally located 0 baseline (“deviation graph”), paired data from single individuals (Example F12), or pooled data in meta-analyses and other analyses that combine data from individual studies (Example F13).
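The values behind a dot graph marker and its error bars, or behind a box-and-whisker symbol, can be computed with Python's standard `statistics` module. This sketch assumes a normal-approximation 95% CI (mean ± 1.96 standard errors; a t multiplier is more appropriate for small samples), quartiles from `statistics.quantiles` with its default "exclusive" method, and whiskers drawn to the minimum and maximum, which is only one of several whisker conventions. The function names are hypothetical.

```python
import math
import statistics


def mean_with_ci(values, z=1.96):
    """Mean and approximate 95% CI (normal approximation).

    Returns (mean, lower, upper) for a data marker with error bars.
    """
    m = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))  # standard error
    return m, m - z * se, m + z * se


def five_number_summary(values):
    """Minimum, quartiles, and maximum for a box-and-whisker symbol
    (whiskers to the extremes, one of several conventions)."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)
```

For the values 1 through 8, `five_number_summary` returns (1, 2.25, 4.5, 6.75, 8); the legend should state which interval or whisker definition the figure uses.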