The World Health Organization 1973 classification system for grade is an important prognosticator in T1 non‐muscle‐invasive bladder cancer
Stage pT1 bladder carcinomas (BCs) represent a difficult clinical scenario as they have different outcomes and are associated with a high risk of progression to muscle‐invasive tumours. The optimal therapeutic approach for individual patients in this setting is still unclear: conservative treatment with BCG instillation and intravesical chemotherapy may lead to disease progression and death, while radical cystectomy may represent a mutilating overtreatment for patients with tumours that may have low potential for progression.
The ability to discriminate those patients who will probably progress to carcinoma invading bladder muscle is therefore crucial. Among prognostic factors associated with progression to muscle invasion, tumour grade is one of the most important. In their important paper, van de Putte et al. aimed to compare the prognostic value of the WHO 1973 and 2004 grading systems, the latter being recommended by the AUA guidelines as the most widely accepted in the USA, although it has not been proven superior to the other.
The authors collected transurethral resections from 601 primary T1 BCs, initially managed conservatively (BCG), from four institutions, and three pathologists reviewed the slides. Importantly, a second transurethral resection was performed if the muscularis propria was absent and/or the initial resection was incomplete. Grade was assigned according to the WHO 1973 (G1–3) and WHO 2004 (low grade [LG] and high grade [HG]) systems. None of the cases was classified as G1. The prognostic value of both grading systems for progression‐free and cancer‐specific survival was then assessed. Notably, the authors found WHO 1973 G3 to be significantly negatively associated with progression‐free survival and cancer‐specific survival on multivariable analysis, while the WHO 2004 grading system was not. Importantly, intra‐observer variability was assessed in 66 cases and was found to be almost perfect for the WHO 1973 and moderate to substantial for the WHO 2004 system, while inter‐observer variability ranged from moderate to substantial for both systems. One of the reasons for the lack of prognostic potential of the WHO 2004 system, as underscored by the authors, is the fact that the morphological criteria defined in the WHO 2004 system cause an important shift of many cases from the G2 to HG category, rendering it an almost one‐tier system with consequently very few LG tumours. Other studies have assessed the prognostic value of the WHO 1973 and WHO 2004 systems, but so far no clear superiority has emerged for one system over the other, probably because of relatively small sample sizes.
Other clinical prognostic factors associated with progression to muscle‐invasive tumours include tumour dimension, the presence of multiple lesions, the presence of carcinoma in situ, lymphovascular invasion and level of lamina propria invasion. Regarding the latter prognostic factor, different studies have defined T1 sub‐staging according to invasion above (T1a), within (T1b) or beyond (T1c) the muscularis mucosae and vascular plexus; however, this approach has been found not to be applicable in >40% of cases because of difficulties in identifying the vascular plexus or lack of orientation of the specimens. A more user‐friendly and reproducible method has been proposed by some of the authors of the study, consisting of a categorization of T1 BCs into microinvasive (T1m) and extensively invasive (T1e) tumours, which has been demonstrated to be applicable in 100% of cases. Further study incorporating T1 sub‐staging together with grade may prove very useful.
Different studies have been performed to identify prognostic markers at the molecular level; however, despite huge efforts, no molecular biomarker with prognostic potential is currently suitable for clinical application. Moreover, in six studies that investigated T1 sub‐stage and molecular markers in the same series, T1 sub‐stage showed the highest prognostic value. More recently, subtyping BC into basal‐like and genomically unstable or squamous cell carcinoma‐like tumours has emerged as a promising tool for dividing T1 BCs into low‐ and high‐risk categories; however, such an approach must be combined with the prognostic value of the classic histological variables discussed so far before eventually being integrated into prognostic tools.
In this regard, van de Putte et al. have shown that tumour grade still represents a powerful marker in T1 BC and that the WHO 2004 grading system cannot replace the WHO 1973 system as a prognosticator of T1 BC; therefore, as recommended by the European Association of Urology guidelines, the WHO 1973 grading system categories should always be present in the pathology reports.
In this issue, BJU International has made the conscious decision to publish a systematic review (SR) and meta-analysis (MA) by Guo et al. to inform the question of whether patients undergoing ureterorenoscopy before nephroureterectomy for upper tract urothelial carcinoma are at increased risk of worse oncological outcomes. This question was also the topic of a similar review by Marchioni et al. published earlier this year in this journal. Both studies were submitted around the same time and underwent independent, parallel peer review that resulted in different editorial decisions. Given their similar methodological quality, they both deserved similar consideration for publication, which the journal is hereby honouring.
At the same time, this provides a unique opportunity to reflect on methodological developments in the field and BJU International’s efforts to raise the bar of the methodological quality of SRs, which include the provision of an Assessment of Multiple Systematic Reviews (AMSTAR) rating. AMSTAR is a validated tool to assess the components of an SR on an 11-point scale (0–11), with higher scores reflecting higher methodological rigor. An updated version of this tool has recently been provided, which offers greater clarity in interpretation. Another related instrument that has become available is the Risk of Bias in Systematic Reviews (ROBIS), which assesses the study limitations in SRs (i.e., the relevance of the review, concerns with the review process, and potential bias introduced during the review). While it would be premature to claim success, it is our impression that BJU International’s initiative to provide AMSTAR ratings is making a valuable contribution in raising awareness of such methodological issues and improving the transparency of published reviews.
As BJU International takes a lead in promoting high-quality SRs in urology, the journal has seen a considerable increase in the number of submissions, including SRs of non-randomised studies (NRS). Whilst much of what we practice on a day-to-day basis is based on evidence from NRS, studies of those designs have infrequently been included in the Cochrane Library, which has pioneered much of the underlying methodology. This is for a few reasons: First, the ‘garbage in–garbage out’ phenomenon; if the underlying individual studies only provide very low-quality evidence, combining these studies will rarely enhance the confidence we place in their results. Second, the need for methodological advances in the assessment and analyses of NRS. Third, when high-quality evidence from randomised controlled trials (RCTs) is available, it may be inefficient to review the NRS literature.
However, progress is being made on the methodology front. Members of the Grading of Recommendations Assessment, Development and Evaluation (GRADE) Working Group are credited with having developed an approach for rating the quality or certainty of evidence from randomised and NRS to inform decision making. While a body of evidence from RCTs, which starts as high-quality evidence, may be downgraded for study limitations, a body of evidence from NRS, which starts as low-quality evidence, may be upgraded for one of three reasons, most commonly for large magnitude of effect. The underlying assumption is that, whilst we have to assume that bias is likely to be present in these studies, it is unlikely to explain the entire observed effect.
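The starting points and the direction of movement described above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the GRADE Working Group's procedure: the function name `grade_certainty` is hypothetical, every concern is assumed to move the rating by exactly one level (in practice very serious concerns can move it by two), and real ratings rest on qualitative judgement rather than counting. The two upgrade reasons other than large magnitude of effect are not named in the text above and are filled in from the published GRADE guidance.

```python
# Illustrative sketch only: a simplified model of GRADE certainty ratings.
LEVELS = ["very low", "low", "moderate", "high"]

# Domains that can lower confidence in a body of evidence.
DOWNGRADE_DOMAINS = {
    "risk of bias",
    "inconsistency",
    "indirectness",
    "imprecision",
    "publication bias",
}

# Reasons that can raise confidence, mainly relevant to NRS.
UPGRADE_REASONS = {
    "large magnitude of effect",
    "dose-response gradient",
    "plausible confounding would reduce the effect",
}


def grade_certainty(design, downgrades=(), upgrades=()):
    """Return a certainty label for a body of evidence.

    design: "RCT" (starts at high) or "NRS" (starts at low).
    Assumption: each listed concern or reason shifts the rating one level.
    """
    if design == "RCT":
        start = LEVELS.index("high")
    elif design == "NRS":
        start = LEVELS.index("low")
    else:
        raise ValueError(f"unknown design: {design!r}")

    unknown = (set(downgrades) - DOWNGRADE_DOMAINS) | (set(upgrades) - UPGRADE_REASONS)
    if unknown:
        raise ValueError(f"unrecognised domains or reasons: {sorted(unknown)}")

    level = start - len(set(downgrades)) + len(set(upgrades))
    level = max(0, min(level, len(LEVELS) - 1))  # stay within the four levels
    return LEVELS[level]


if __name__ == "__main__":
    print(grade_certainty("RCT", downgrades=["risk of bias"]))
    print(grade_certainty("NRS", upgrades=["large magnitude of effect"]))
```

Under these assumptions, a body of RCT evidence with serious study limitations and a body of NRS evidence showing a large effect both end up at the same 'moderate' rating, which is the point of the framework: study design sets the starting level but does not fix the final one.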
It nevertheless remains critical to assess the risk of bias of NRS. While the Newcastle-Ottawa scale (as used in both of these SRs) is a widely used instrument to evaluate risk of bias in NRS, it has critical limitations that the recently developed Risk Of Bias In Non-randomised Studies – of Interventions (ROBINS-I) seeks to overcome. ROBINS-I evaluates NRS by using a standardised comparison to an RCT (i.e. target trial). In this way, ROBINS-I captures the bias inherent to studies without proper randomisation or allocation concealment, namely the lack of a balance of known and unknown confounders and selection of participants. ROBINS-I allows users to fundamentally start all studies at the same quality level, providing the transparency requested by some SR authors conducting SRs of NRS.
While ureterorenoscopy before nephroureterectomy may indeed increase the risk of intravesical recurrence as the authors suggest, additional exploration would be needed to make a statement about the causality of the relationship. Guo et al. conducted sensitivity analyses to describe the potential for bias introduced by confounders of previous bladder tumour history and bladder-cuff management, thereby increasing our confidence that the observed effect may be closer to the truth. It seems equally important to note that Guo et al. found no increased risk in cancer-specific, recurrence-free or overall survival, which are other outcomes of potentially greater patient importance.
Understanding the inherent limitations of NRS and placing their findings into appropriate clinical context are critical to the conduct of SRs. Moving forward, BJU International will continue to seek out the highest quality reviews that make use of the best, up-to-date methodology. We hope that these efforts will serve as a beacon for the research community and, more importantly, result in improved evidence-based care for our patients.
Major foci for clinically oriented specialty journals are systematic reviews and meta-analyses. Systematic reviews have a preeminent role in guiding the practice of evidence-based medicine by addressing focused clinical questions in a systematic, transparent and reproducible manner. Defining criteria of a high-quality systematic review include: an a priori registered protocol, a comprehensive search of multiple sources including unpublished studies (to avoid publication bias), an assessment of the quality of evidence that goes beyond study design alone, and a thoughtful interpretation of the findings. Systematic reviews inform clinicians and patients at the point of care, form the foundation of evidence-based clinical practice guidelines, and help shape health policy. They are also frequently cited and can raise a journal’s impact factor. There is therefore more than one good reason for journals to care about the quality of systematic reviews.
Meanwhile, a study in this issue of the BJUI shows that the methodological quality of systematic reviews published in the urological literature is modest, varies substantially, and has failed to improve over time. This contrasts with the reporting quality of randomised controlled trials, which appears to have improved substantially over time, probably due to increased awareness among clinical researchers, urology readers and journal reviewers [4, 5]. The study used the Assessment of Multiple Systematic Reviews (AMSTAR), a validated 11-item instrument, to measure the methodological quality of systematic reviews, with higher scores reflecting better quality.
The authors surveyed four major urological journals and compared the periods 2013–2015 to 2009–2012 and 1998–2008. Despite a dramatic increase in the number of systematic reviews published each year, methodological quality has stagnated with mean AMSTAR scores ± standard deviations of 4.8 ± 2.4 (2013–2015; n = 125), 5.4 ± 2.3 (2009–2012; n = 113) and 4.8 ± 2.0 (1998–2008; n = 57). The average systematic review therefore has deficits in over half the 11 AMSTAR criteria and is of only modest quality, thereby undermining our confidence in its results. Although the mean AMSTAR score of 5.6 ± 2.9 for 25 systematic reviews published in the BJUI in 2013–2015 compared favourably to similar studies in other leading urology journals, the difference was not statistically significant.
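The arithmetic behind the 'deficits in over half the 11 AMSTAR criteria' statement can be checked directly from the summary figures quoted above. The following is a minimal sketch, assuming that each AMSTAR point corresponds to one fulfilled criterion; the helper names `unmet_items` and `weighted_mean` are illustrative and not taken from the study.

```python
# Summary statistics as quoted in the text: (period, mean AMSTAR score, n).
PERIODS = [
    ("2013-2015", 4.8, 125),
    ("2009-2012", 5.4, 113),
    ("1998-2008", 4.8, 57),
]
AMSTAR_ITEMS = 11


def unmet_items(mean_score, total_items=AMSTAR_ITEMS):
    """Average number of criteria not met, assuming one point per criterion."""
    return total_items - mean_score


def weighted_mean(periods):
    """Sample-size-weighted mean score across all periods."""
    total_n = sum(n for _, _, n in periods)
    return sum(mean * n for _, mean, n in periods) / total_n


if __name__ == "__main__":
    for label, mean, n in PERIODS:
        print(f"{label}: {unmet_items(mean):.1f} of {AMSTAR_ITEMS} criteria unmet (n = {n})")
    overall = weighted_mean(PERIODS)
    print(f"all periods: weighted mean {overall:.2f}, {unmet_items(overall):.2f} unmet")
```

In each of the three periods the average number of unmet criteria exceeds 5.5, that is, more than half of the 11 items, and the same holds for the weighted mean across all 295 reviews.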
What are we going to do about it? Inspired by these findings, the BJUI is launching a new initiative to raise awareness of the issue of methodological quality of systematic reviews among its readership and raise the bar for its contributors. Future systematic review authors will be asked to submit an AMSTAR-based checklist to provide enhanced transparency about their methods, which will be reviewed as part of the editorial review process. The checklist items include documentation of an a priori written protocol and, ideally, registration of the systematic review through the Cochrane Collaboration or the Prospective Register of Systematic Reviews (PROSPERO). Such a protocol should outline all important steps of the review process, including the definition of outcomes, study inclusion and exclusion criteria, details about the literature search, the study selection and data abstraction process, and the analytical approach, including planned sensitivity and subgroup analyses. Authors should also rate the quality of evidence, looking beyond study limitations alone, by using an approach such as the Grading of Recommendations Assessment, Development, and Evaluation (GRADE), which recognises additional domains such as imprecision, inconsistency, indirectness and publication bias. Critical steps of the systematic review process should be completed in duplicate to guard against random and systematic error, and authors should provide readers with information about who funded the studies included in the review, as well as their own potential conflicts of interest. To guard against publication bias, systematic review authors should also search for ongoing trials and unpublished studies through registries and abstract proceedings.
It is understood that the methodological handiwork that goes into the planning, execution and reporting of a systematic review does not assure clinical relevance or newsworthiness, nor does it address any issues surrounding the limited quality of studies that the review may be summarising. It is nevertheless a sine qua non to assure readers that they can be confident of the results. The new BJUI initiative will raise awareness of the issue of systematic review quality by providing a summary AMSTAR score to accompany each article. We hope that with this initiative we will provide a beacon for other specialty journals to follow, with the goal of raising the bar for all published systematic reviews and ultimately leading to improved patient care.
Department of Urology, Minneapolis Veterans Administration Health Care System and University of Minnesota, Minneapolis, MN, USA
In their systematic review and meta-analysis, Grasso et al. address the question of whether posterior musculofascial reconstruction (PMR), the so-called Rocco stitch, positively affects urinary continence after radical prostatectomy. The question is well suited to this structured form of inquiry because individual studies to date have been inconclusive. We recognize Sir Archie Cochrane, who gave his name to the Cochrane Collaboration that pioneered the methods for conducting systematic reviews, for emphasizing the critical importance of looking at the entire body of evidence in a structured manner when seeking to answer a clinical question. In the present study, which included both randomized controlled trials (RCTs) and observational studies of variable methodological quality, a favourable impact of PMR across all postoperative time points (3–7 days, 30 days, 3 and 6 months) was observed. The effect was most pronounced early on at the time of catheter removal, when the patients undergoing PMR were nearly twice as likely as the control group (risk ratio 1.9; 95% CI 1.3–2.9) to be continent, thereby suggesting a major benefit of this approach. It should be noted, however, that this analysis was dominated by the observational studies, particularly retrospective observational studies, which offer the least degree of methodological rigor.
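For readers who wish to see how a pooled risk ratio and its confidence interval relate to one another, the reported figures (risk ratio 1.9; 95% CI 1.3–2.9) can be unpacked with a short calculation. This is a sketch under the assumption, not stated in the review itself, that the interval is a conventional Wald interval constructed on the logarithmic scale; the function name `log_scale_summary` is illustrative.

```python
import math


def log_scale_summary(rr, lower, upper, z_crit=1.96):
    """Back-calculate log-scale quantities from a ratio and its 95% CI.

    Assumption: the CI is symmetric around log(rr), i.e. it was computed as
    exp(log(rr) +/- z_crit * SE).
    """
    log_se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    return {
        # Standard error of log(RR) implied by the width of the interval.
        "log_se": log_se,
        # Geometric midpoint of the CI; close to rr if the assumption holds.
        "geometric_midpoint": math.sqrt(lower * upper),
        # Test statistic for the null hypothesis RR = 1.
        "z": math.log(rr) / log_se,
    }


if __name__ == "__main__":
    summary = log_scale_summary(rr=1.9, lower=1.3, upper=2.9)
    for name, value in summary.items():
        print(f"{name}: {value:.3f}")
```

The geometric midpoint of the interval is about 1.94, which agrees with the reported point estimate of 1.9 after rounding and so is consistent with the log-scale assumption. The calculation only describes statistical precision; it says nothing about the risk of bias in the underlying studies.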
Even more important, therefore, than the act of pooling across studies is the rating of the quality of evidence for the body of evidence on an outcome-specific basis. Based on the GRADE approach, which has become the most widely endorsed framework for rating the quality of evidence, we would initially place a high and low level of confidence in a body of evidence drawn from RCTs and observational studies, respectively. As a result, one might plan a separate analysis of those two groups of studies first, and only move to pool them if their results were similar. In this case, the results from the RCTs and observational studies were different, with prospective and retrospective observational studies reporting larger, probably exaggerated effect sizes; however, it is also understood that other aspects such as study limitations (risk of bias), inconsistency, imprecision, indirectness and risk of publication bias may lower our confidence in the effect estimates from RCTs. Focusing on the body of evidence from RCTs alone (Table 1), we have ‘moderate’ confidence that PMR may not improve early continence at the time of catheter removal. Similarly, the few RCTs that contributed to the assessment of continence at later timepoints do not provide evidence that continence is affected favourably, although our confidence for those outcomes is only ‘low’ or ‘very low’, suggesting that future trials may change these estimates of effect. Meanwhile, it should be noted that none of the RCTs appeared to provide information on the potential downsides of PMR, such as rates of urinary retention or bladder neck contracture. As a result, enough uncertainty remains to state that the jury on PMR is still out; this is consistent with the authors’ call for a future high-quality trial, which is reportedly ongoing. While PMR is already widely used by open and robot-assisted prostatectomy surgeons around the globe, this example sheds light on current evidentiary standards of surgical innovation.
Following the IDEAL recommendations, it would be much preferred if the urological community committed to well designed trials for novel surgical approaches and device-dependent interventions up front, before moving to widespread dissemination.