(1) Department of Clinical Epidemiology and MTA, Maastricht University Medical Center, Maastricht, The Netherlands
(2) Department of General Practice, Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands
(3) Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, USA
(4) Kleijnen Systematic Reviews Ltd, York, UK
(5) Care and Public Health Research Institute (CAPHRI), Maastricht University, Maastricht, The Netherlands
(6) Horten Zentrum, Medical Faculty, University of Zurich, Switzerland
* Corresponding author Email: fons.kessels@mumc.nl
Introduction
The aim of this paper is to propose a transparent, alternative approach for network meta-analysis based on a regression model that allows inclusion of studies with three or more treatment arms.
Methodology
Based on the contingency tables describing the frequency distribution of the outcome in the different intervention arms, a data set is constructed. A logistic regression is used to determine the parameters describing the difference in effect between a specific intervention and the reference intervention and to check the assumptions needed to model the effect parameters. The method is demonstrated by re-analysing 24 studies investigating the effect of smoking cessation interventions. The results of the analysis were similar to two other published approaches to network analysis using the same data set. The presence of heterogeneity, including inconsistency, was examined.
Conclusion
The proposed method provides an easy and transparent way to estimate treatment effect parameters in meta-analyses involving studies with more than two arms. It has several additional attractive features: it does not overweight small studies as random effects models do, it deals with zero cell counts without ad hoc corrections, it allows the assumptions about the distribution of the model parameters to be checked, and it supports investigation of heterogeneity across trials and between direct and indirect evidence.
Including indirect comparisons in a meta-analysis can be of value when evidence from head-to-head comparisons is unavailable or only sparsely available, to increase the precision of a meta-analysis of the direct comparisons, or to formally rank the benefits of a larger set of different treatments^{[1,2]}. An even more imperative reason to include indirect comparisons is that valid data should not be ignored and, like all other pertinent evidence, should be included in the meta-analysis. As Fisher^{[3]} said: ‘All statisticians know that data are falsified if only a selected part is used’.
To compare the effects of more than two competing treatments, several methods have been proposed to pool all available evidence in one network analysis. Some methods^{[1,4]} cannot be applied when trials with more than two treatment arms are included. A method that solves this problem is the Mixed Treatment Comparisons based on a Bayesian model^{[2]}. Given the likelihood function based on the data collected in the studies, this method assumes a number of prior distributions which generate the posterior distribution of the effect parameters.
We propose an approach based on an ordinary regression model in which all available information is pooled with respect to the difference in effect of two or more treatments (i.e. combining all information from direct and indirect comparisons). The outline of this paper is as follows. First, we describe the method using studies with binary outcomes and demonstrate the method with a data set of 24 studies which investigated the effect of smoking cessation interventions. We compare the results of our analysis with those of two other published approaches to network analysis (i.e. Hasselblad, and Lu and Ades (Bayesian hierarchical model))^{[2,5,6]} that had previously been applied to the same data set. In the next section we discuss the causes of heterogeneity (i.e. the difference in effect estimates between studies that cannot be explained by chance alone)^{[7]}. This includes inconsistency of results between direct and indirect comparisons for the same treatment contrast. Next, we show how heterogeneity can be modelled and its causes investigated. In the last section we discuss some features of our approach and show how the method can be expanded to trials with interval scaled outcomes.
The authors have referenced some of their own studies in this methodology. These referenced studies have been conducted in accordance with the Declaration of Helsinki (1964) and the protocols of these studies have been approved by the relevant ethics committees related to the institution in which they were performed. All human subjects, in these referenced studies, gave informed consent to participate in these studies.
From the difference in outcomes between treatment A and B and the difference between treatment A and C, the difference between treatment B and C can be deduced. In a network analysis the results of these indirect comparisons as well as the results of studies comparing the outcome of treatments B and C directly are used to generate the overall result.
A network analysis using a regression model can be done at three outcome levels. The customary approach^{[8]} is to calculate for each study an effect parameter and its variance describing the difference in effect between two treatments. In the second step these effect parameters are pooled, and weighted by the inverse of their variances^{[9]}.
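As a sketch of this second (pooling) step, a fixed-effect inverse-variance pool of per-study log odds ratios can be computed as follows; the three estimates and standard errors are hypothetical, not taken from the smoking cessation data:

```python
import math

def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance (fixed-effect) pooling of per-study effect estimates.

    Each estimate is weighted by the inverse of its variance; the pooled
    variance is the reciprocal of the summed weights.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Three hypothetical per-study log odds ratios with their standard errors
pooled, pooled_se = pool_fixed_effect([0.5, 0.8, 0.3], [0.2, 0.4, 0.25])
```

Note that this weighting is what drives the two-step approach: a study contributes in proportion to the precision of its summary estimate, not patient by patient as in the one-step approach proposed below.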
A less common approach is to calculate an effect parameter and its variance for each treatment arm separately and use these parameters in a regression model, as described by Hasselblad^{[5]} and Berlin et al.^{[10]}. Both methods are two-step approaches: first an effect parameter per study or study arm is determined, and in the second step the results are pooled. This pooling leads to a distinction between within-study (or within-arm) variance and between-study (or between-arm) variance.
What we propose here is a one-step approach that uses the data at the patient level. With the reported two-by-two contingency table a data file can be constructed in which each of the contributing patients is represented by a record with one variable describing the study (S_{i}, i = 1, 2...N), one variable describing the treatment (T_{k}, k = 1, 2...K) and a variable describing the outcome Y (zero or one). In this way the original study population data set is reconstructed. Of course, patient-level information on covariates is not available.
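A minimal sketch of this reconstruction, using the counts of study 10 (Richmond 1986) from the Supplementary table; the function name and record layout are illustrative, not the authors' code:

```python
def expand_contingency(studies):
    """Reconstruct a notional patient-level file from arm-level counts.

    `studies` maps a study identifier to a list of (treatment, cases, n)
    tuples; every patient becomes one (study, treatment, outcome) record.
    """
    records = []
    for study, arms in studies.items():
        for treatment, cases, n in arms:
            records += [(study, treatment, 1)] * cases
            records += [(study, treatment, 0)] * (n - cases)
    return records

# Study 10: 3/100 under 'no contact', 31/98 under 'individual counselling'
data = expand_contingency({10: [("no contact", 3, 100),
                                ("individual counselling", 31, 98)]})
```

The resulting file has one record per patient, so each of the 198 participants of study 10 enters the regression exactly once.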
In a logistic regression model using dummy variables to code for the studies and treatments, the logarithm of the odds (LnOdds) of the outcome Y for each patient can be written as:

LnOdds = Σ_{i} α_{i}S_{i} + Σ_{k} β_{k}T_{k}   (expression 1)
Where α_{i} is the LnOdds of treatment success (or failure) in study S_{i} for the reference treatment (for example a placebo) and β_{k} is the difference in LnOdds between treatment T_{k} and the reference treatment. In this model we assume that the treatment effect is constant (fixed) over the studies; we will relax this assumption in the section on heterogeneity. The study-specific intercept α_{i} adjusts for differences in risk profiles between the study populations and for other between-study differences in study design. Furthermore, in this one-step analysis, no distinction is made between the within- and between-study variances of the parameter β_{k}; both are included in the variance of β_{k}. There is no limitation on the number of treatment arms per study. As each patient is included in the analysis only once, this method avoids the problem of using the same patients several times in trials with more than two arms, as occurs in the approaches proposed by Lumley^{[4]} and Bucher et al.^{[1]}. Considerations of statistical sufficiency show that our approach is equivalent to using grouped logistic regression on the outcome counts in each treatment arm.
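The sufficiency claim can be checked numerically: for any fixed success probability, the Bernoulli log-likelihood of the expanded patient records and the binomial log-likelihood of the grouped arm counts differ only by the log binomial coefficient, a constant that does not depend on the parameters, so both representations yield the same maximum-likelihood estimates. A small illustration (arm counts taken from study 10):

```python
import math

def bernoulli_loglik(p, cases, n):
    # Log-likelihood of n individual 0/1 records containing `cases` successes
    return cases * math.log(p) + (n - cases) * math.log(1.0 - p)

def binomial_loglik(p, cases, n):
    # Log-likelihood of the grouped arm count
    return math.log(math.comb(n, cases)) + bernoulli_loglik(p, cases, n)

# The difference equals log C(n, cases) for every value of p:
d1 = binomial_loglik(0.2, 31, 98) - bernoulli_loglik(0.2, 31, 98)
d2 = binomial_loglik(0.7, 31, 98) - bernoulli_loglik(0.7, 31, 98)
```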
We applied our method to data of 24 studies on the effect of four smoking cessation treatments. These studies compared the effect of four treatments for smoking cessation: ‘no contact’ (n = 7231 with 605 cases), ‘self-help’ group (n = 1568 with 155 cases), ‘individual counselling’ (n = 7383 with 1209 cases) and ‘group counselling’ (n = 555 with 103 cases). Two studies compared three different interventions. A short description of these studies can be found in Supplementary information. These 24 studies were also used by Hasselblad^{[5]} estimating the effect parameters for each treatment arm in a random effect model and by Lu and Ades^{[2]} in a Bayesian network analysis^{[6]}.
The results of the different approaches, shown in Table 1, are similar except for the results of the Bayesian model with random effects.
In our approach the effect parameters and confidence intervals of all possible contrasts can be determined by successively choosing another treatment as the reference category.
Furthermore, with this model we can calculate an effect parameter based on the results of the subset of indirect comparisons only. For example, to calculate the result of the indirect contrast ‘no contact’ versus ‘individual counselling’, we use the same model as described in ‘expression 1’ but exclude all studies that investigate this specific contrast directly. Similarly, an overall coefficient based on direct contrasts can be calculated by including only the direct comparisons.
In the analysis of heterogeneity the result of separately pooling the indirect comparisons will be presented as the result of a virtual trial, just like the results of the direct comparisons. In this way, we may explore the extent to which adding indirect comparisons influences the outcome of the analysis.
The assumption that the effect of a treatment is constant or fixed in all studies is not realistic as the treatment effect (i.e. the parameter β_{k}) can vary from study to study. The effect of a treatment as recorded in a study may be modified by population characteristics like age, sex, stage of disease, duration and severity of complaints and characteristics of the study set-up, such as duration of follow-up, differences in the application of the intervention or the way the outcome is measured. Furthermore, variation could occur through design flaws and imperfections in the conduct of the study. Studies with severe methodological shortcomings should not be included in a meta-analysis. But, for example, the treatment groups in a trial can show some differences with respect to their prognosis and behaviour and some residual confounding may be present. Both effect modification and residual bias can be accounted for by including an interaction term.
To do so, an interaction term between study and treatment is added and the formula of LnOdds becomes:

LnOdds = Σ_{i} α_{i}S_{i} + Σ_{k} β_{k}T_{k} + Σ_{i,k} δ_{i,k}S_{i}T_{k}   (expression 2)
Where δ_{i,k} denotes the deviation from the overall effect of treatment β_{k} by the interaction between treatment k with study i. The interaction term of the reference treatment and study i is included in the study-specific intercept α_{i}.
In a two-step analysis the variation in δ_{i,k} describes the heterogeneity, and it is assumed that δ_{i,k} is normally distributed; the random effects approach is based on this assumption. If the included studies can be regarded as a random sample from a population of possible studies, the random effects approach could be justified. However, the design and conduct of a study are the product of a cognitive process, and designs and ways of conducting a study can be tailored in various ways. This can produce outlying values of δ_{i,k}, or a bimodal or skewed distribution of δ_{i,k}, which may have a substantial influence on the estimates of the coefficients and their variances. If the deviations δ_{i,k} are small, no important interactions are apparent and a model of type ‘expression 1’ is adequate. If the values of δ_{i,k} are substantial, a model of type ‘expression 2’ is needed. If, in addition, the δ_{i,k} are normally distributed, a random effects model may be used; the variation in the δ_{i,k} will then be included in the variance of the estimated parameters β_{k}. Therefore, investigating whether the normality assumption holds and which factors could modify the interaction δ_{i,k} (effect modification or bias) should be a part of the meta-analysis. This is demonstrated in the next example using the comparison between ‘no contact’ and ‘individual counselling’.
The distribution of the coefficients of the direct comparisons and the coefficient of the combined indirect comparisons are described with the help of a Forest plot and a histogram as shown in Figures 1 and 2.
To investigate heterogeneity we used the classical I^{2}-statistic^{[11]}.
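As a reminder of the computation, Cochran's Q and I^{2} can be obtained from the per-study estimates and their standard errors; the two example data sets below are illustrative only, not the smoking cessation results:

```python
def i_squared(estimates, std_errors):
    """Cochran's Q and the I^2 statistic (in %) for a set of study estimates."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))
    df = len(estimates) - 1
    # I^2 is the share of total variation beyond what chance (df) explains
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2

q_hom, i2_hom = i_squared([0.5, 0.5, 0.5], [0.2, 0.2, 0.2])  # identical estimates
q_het, i2_het = i_squared([0.0, 2.0], [0.1, 0.1])            # strongly heterogeneous
```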
A part of the heterogeneity could be caused by including indirect comparisons in the analysis. The Forest plot in Figure 1 and the tests for heterogeneity, with and without the indirect comparison, show that there is heterogeneity within the direct comparisons which is not increased by adding the result of the indirect comparisons.
Figure 2 shows that the 15 coefficients, β_{k}, which describe the difference in effect between ‘no contact’ versus ‘individual counselling’ appear to follow a bimodal distribution.
The implication is that assuming a random effects model might well give erroneous results. Looking more closely at the differences between the characteristics of the 11 leftmost and four rightmost studies could give some clue to the causes of the bimodality. Stratifying the analysis in this way resulted in two estimates of β (se) of 0.28 (0.07) and 2.25 (0.14), which are significantly different (P < 0.001). As a consequence, one might consider reporting both results separately rather than as a single pooled estimate.
In the literature, inconsistency, that is, the difference between the results of the direct and indirect comparisons, has been emphasised by many authors as a special phenomenon that could jeopardise the pooling of all the results^{[12,13,3,4]}. Furthermore, it has been argued that the results of indirect comparisons are less valid than those based on direct comparisons^{[14]} and when comparing the results of indirect with the results of direct comparisons the latter should be the gold standard^{[15]}.
Of course, studies contributing to an indirect comparison might well differ in design or population characteristics. This can occur when the trials at issue used different inclusion criteria. Song et al.^{[16]} describe an example in which trials of intervention A versus B were all carried out in primary care and trials of intervention A versus C in secondary care. This means that the indication for which the contrast was investigated differed between the trials and, of course, studies included in a meta-analysis should have comparable study populations (i.e. comparable study questions). Another reason could be that imperceptibly different study populations are included. In the same paper, Song et al.^{[16]} describe an example in which one study investigates antibiotics A and C in patients whose pathogenic bacteria are sensitive to both antibiotics, while a second study investigates antibiotics B and C in patients with the same indication whose pathogenic bacteria are resistant to antibiotic B. In both cases the reasoning is that an indirect comparison of A versus B based on the two trials would be invalid. Such effect modification, perceptible or imperceptible, will affect the validity of the results, but there is no reason to believe that this only happens when indirect comparisons are investigated. As effect modification can affect both direct and indirect comparisons, we think that inconsistency should be described as a part of the heterogeneity, as we demonstrated in Figures 1 and 2. At any rate, with our approach it is possible to compare the results of the direct and indirect comparisons and test this ‘inconsistency’ parameter. This is shown in Table 2.
The results of the direct and indirect comparisons do not significantly differ. In fact, Figure 1 shows that the significant heterogeneity in the coefficients describing the difference between ‘individual counselling’ and ‘no contact’ originates from the direct comparisons. It is important to note that the assessment of heterogeneity, including inconsistency, should not rely on statistical significance testing alone. In fact, we are interested in whether the direct and indirect estimates are similar enough to be pooled. A possible approach would be, similar to the analysis of non-inferiority or equivalence studies, to specify in advance a margin within which the difference between the direct and indirect estimates is considered small enough to allow pooling.
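The test of this ‘inconsistency’ parameter reduces to a z-test on the difference of two log odds ratios estimated from disjoint sets of studies (so their variances add). A sketch, reproducing the rounded ‘self-help’ row of Table 2:

```python
import math

def inconsistency_test(beta_direct, se_direct, beta_indirect, se_indirect):
    """Two-sided z-test for the direct minus indirect log odds ratio."""
    diff = beta_direct - beta_indirect
    # Direct and indirect estimates come from disjoint study sets,
    # so their variances simply add
    se_diff = math.sqrt(se_direct ** 2 + se_indirect ** 2)
    z = diff / se_diff
    # Two-sided P-value from the standard normal distribution
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return diff, se_diff, p

# 'Self-help': direct 0.14 (0.14), indirect 0.52 (0.26)
diff, se_diff, p = inconsistency_test(0.14, 0.14, 0.52, 0.26)
```

With the rounded inputs this reproduces the difference of −0.38 (0.30) and the P-value of 0.20 reported in Table 2.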
An alternative way of modelling the data would be to assume a normal distribution for the intercepts. Although a random intercept model can easily be implemented, there are two reasons to use a fixed intercept model. First, this model generates estimates of the study intercepts α_{i}. This makes it possible to check their distribution and, for example, detect outliers (i.e. studies with extremely low or high chances to stop smoking in the reference group), here the ‘no contact’ condition (see Figure 3).
Second, a fixed-effect model depends on fewer assumptions, that is, no normality assumption for the distribution of intercepts and therefore may be more reliable, depending on the degree to which the assumption is violated. A reason to use a random intercept model would be that this reduces the number of parameters and increases statistical efficiency. In our example the number of parameters is five in a random intercepts model and 27 in a fixed-intercepts model. The fit of this model with a binary outcome is dependent on the minimum number of cases and non-cases^{[17]}. As the number of cases is 2072 and non-cases 14 665, we can easily afford to estimate 22 additional parameters. In general the number of observations will be much larger than the number of parameters in a fixed-intercepts model. This implies that there is no urgent reason to use a random intercepts model for efficiency reasons.
Our approach can also be used when for each treatment arm means and standard deviations of interval-scaled outcomes are reported. Conventional pooling is based on the assumption that the outcome variable is normally distributed. Using the same assumption of a normal distribution of the outcomes, a data set can be generated which is comparable to the original data set of individual values, except for individual level information on covariates.
Assume that in a treatment arm N persons are included with a specific mean and standard deviation. Drawing N values from a normal distribution with these parameters will result in a sample with a mean and standard deviation. By chance, these parameters will be somewhat different from the original parameters, but a simple linear transformation can adjust for these differences.
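A minimal sketch of this generation-and-adjustment step using only the Python standard library; the arm size, mean and standard deviation are hypothetical:

```python
import random
import statistics

def matched_normal_sample(n, mean, sd, seed=0):
    """Draw n normal values, then linearly rescale them so the sample mean
    and (sample) standard deviation match the reported arm statistics."""
    rng = random.Random(seed)
    x = [rng.gauss(mean, sd) for _ in range(n)]
    m, s = statistics.mean(x), statistics.stdev(x)
    # Shift and scale: y = mean + (x - m) * sd / s
    return [mean + (v - m) * sd / s for v in x]

arm = matched_normal_sample(50, mean=12.0, sd=3.5)
```

Because the transformation is linear, the generated arm reproduces the reported mean and standard deviation exactly (up to floating-point rounding), as the text describes.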
For all the treatment arms such a data set can be generated, leading to the same likelihood function as that of the original data, which can then be used as input for a suitable linear regression model. Analogously to the foregoing, all sources of variation can be investigated and modelled. We have applied this method in several studies^{[14,18,19,20]}.
The approach proposed here is based on ordinary regression models and is therefore a transparent, easy-to-implement and valid method that gives estimates of effect parameters and their variances using all available information (i.e. results of direct and indirect comparisons). This method is equivalent to performing an analysis with pooled data from all participating studies. It can only be applied if all studies have been randomised, with potentially confounding factors balanced across treatment arms. However, this limitation holds for all meta-analyses that do not use individual patient data.
Applying this method to 24 studies investigating the effect of four different interventions for smoking cessation, we demonstrated that the method works well and by investigating the mechanisms of heterogeneity we exposed some incongruities that needed further exploration. Examination of such incongruities should be a part of each analysis, but we could not extract sufficient information from the studies. Furthermore, the primary goal was to demonstrate the performance of our network analysis and to show the need of a thorough examination of heterogeneity.
At present, the most frequently used approach to network analysis including studies with three or more arms is Mixed Treatment Comparisons based on a Bayesian model^{[2]}. This method assumes a number of prior distributions and, together with the likelihood based on the data collected in the studies, determines the posterior distribution of the effect parameters. However, the estimates used for indirect comparisons are not independent and the respective correlations are generally unknown. In Table 1, the results of two different Bayesian models are shown. The result of the Bayesian fixed-effect model was quite similar to the results of Hasselblad^{[5]} and to our results. However, the Bayesian random effects model showed substantial differences when self-help and group counselling were compared with no contact. This could be caused by the fact that the data on the direct comparison are sparse in these two cases (4 and 3 studies, respectively). It is known that the result of a Bayesian model may be highly dependent on the prior distribution when data are sparse, possibly leading to these considerable deviations^{[21]}.
Furthermore, the use of a different likelihood function with additional parameters and assuming a normal distribution of the coefficients could also be an explanation.
Our method has a number of attractive features. First, it is a straightforward generalisation of the fixed-effects meta-analysis and we have applied the method in several network analyses^{[22,23,24,25]}. Second, it is based on well-known procedures of logistic regression. Third, all assumptions made may be checked against the data. Fourth, the analysis is based on data on the (notional) patient level, meaning that studies with three or more treatment arms can be included as every patient is entered in the analysis only once. Fifth, the regression model preserves randomisation. Moreover, with the intercept parameters, the similarity of the studies and possible causes for heterogeneity can be investigated. Sixth, estimates of direct and indirect comparisons are easily available. Seventh, the method avoids overweighting small studies as all random effects models do, which may not be desirable^{[7]}. This method ensures that each patient, irrespective of the size of the study s/he participated in, contributes in the same way to estimating the coefficients. Eighth, no additional assumptions are needed for the zero cell problem (like adding 0.5). The regression model will produce an estimate, correctly taking into account the information from the zero cell arms, provided not all arms of a specific treatment contain zero cell counts. Ninth, this model and the resulting estimates allow checking the assumptions needed to model the effect parameters, in particular whether the effect parameters can be assumed to follow a normal distribution. Tenth, if necessary and warranted, a random effect model may be used. Finally, causes of heterogeneity (and inconsistency) can be investigated by examining the influence of population and design characteristics on the effect parameters (meta-regression in network analysis).
These advantages lead us to propose this method either as the first method of analysis, or to compare to the Bayesian approach. Because of the directness and transparency of the method we also advocate its use in meta-analyses of direct comparisons only.
All authors contributed to the conception, design, and preparation of the manuscript, and read and approved the final manuscript.
None declared.
None declared.
All authors abide by the Association for Medical Ethics (AME) ethical rules of disclosure.
Description of the studies included. Studies 1 and 6-19 contribute to the direct comparisons and the remaining studies (2-5 and 20-24) to the indirect comparison of ‘individual counselling’ versus ‘no contact’
Study | No contact | Self-help | Individual counselling | Group counselling |
---|---|---|---|---|
1. Cottreaux 1983, Beh Res Ther | 9/140 | 23/140 | 10/138 | |
2. Mothersill 1988, Addict Beh | 11/78 | 12/85 | 29/170 | |
3. Gritz 1992, Health Psychol | 79/702 | 77/694 | ||
4. Campbell 1986, Thorax | 18/671 | 21/535 | ||
5. Pallonen 1994, Prev Med | 8/116 | 19/149 | ||
6. Reid 1974, Lancet | 75/731 | 363/714 | ||
7. Slama 1990, BMJ | 2/106 | 9/205 | ||
8. Jamrozik 1984, BMJ | 58/549 | 237/1561 | ||
9. Rabkin 1984, Addict Behav | 0/33 | 9/48 | ||
10. Richmond 1986, BMJ | 3/100 | 31/98 | ||
11. Leung 1991, Psychologia | 1/31 | 26/95 | ||
12. Langford 1983, Can J Pub Health | 6/39 | 17/77 | ||
13. Russell 1983, BMJ | 95/1107 | 143/1031 | ||
14. Stewart 1982, Can Med Assoc J | 15/187 | 36/504 | ||
15. Russell 1979, BMJ | 78/584 | 73/675 | ||
16. Kendrick 1995, Am J Pub Health | 69/1177 | 54/888 | ||
17. Sanders 1989, JR Coll Gen Pract | 64/642 | 107/761 | ||
18. Page 1986, Addict Behav | 5/62 | 8/90 | ||
19. Vetter 1990, Age Ageing | 20/234 | 34/237 | ||
20. Williams 1988, Addict Behav | 0/20 | 9/20 | ||
21. Decker 1989, Addict Behav | 20/49 | 16/43 | ||
22. Mogielnicki 1986, J Behav Med | 7/66 | 32/127 | ||
23. Hilleman 1993, Ann Pharmacother | 12/76 | 20/74 | ||
24. Gillams 1984, Practitioner | 9/55 | 3/26 | | |
Regression coefficients (Log Odds ratios) and corresponding standard errors of four different models describing the effect of three treatments for smoking cessation as compared with ‘no contact’ as reference
Our analysis | Hasselblad^{5} | Lu and Ades^{2,6} Fixed effect | Lu and Ades^{2,6} Random effect | |
---|---|---|---|---|
Self-help | 0.23(0.13) | 0.26(0.17) | 0.23(0.13) | 0.52(0.39) |
Individual counselling | 0.64(0.06) | 0.64(0.07) | 0.77(0.06) | 0.81(0.23) |
Group counselling | 0.83 (0.17) | 0.83 (0.26) | 0.84 (0.18) | 1.16 (0.45) |
The results (Log Odds ratios) of indirect and direct comparisons and their differences with standard errors in brackets
Direct comparisons | Indirect comparisons | Difference | P-value of difference | |
---|---|---|---|---|
Self-help | 0.14(0.14) | 0.52(0.26) | −0.38(0.30) | 0.20 |
Individual counselling | 0.64(0.06) | 0.42(0.30) | 0.22(0.31) | 0.40 |
Group counselling | 0.86(0.43) | 1.00(0.21) | −0.14(0.48) | 0.79 |