An InContiNet Critical Review:
P.A. Burns, K. Pranikoff, T.H. Nochajski, E.C. Hadley, K.J. Levy, and M.G. Ory.
A Comparison of Effectiveness
of Biofeedback and Pelvic Muscle Exercise
Treatment of Stress Incontinence
in Older Community-Dwelling Women.
Journal of Gerontology, 1993, Vol. 48, No. 4, (p. M167-M174)
Preliminary Report by John D. Perry, PhD
As most of you know, the big federally-funded 135-subject study by Pat Burns et al is the major, if not only, study being used to fight Medicare and insurance reimbursement for biofeedback treatment of urinary incontinence. It was cited by Aetna (1994) http://www.incontinet.com/aetna.htm and BC/BS TEC (1997) http://www.incontinet.com/bcbstec.htm, among others.
I have completed a re-analysis of Burns' published data, and I believe I've found the sources of our (originally, THEIR) problems. They turn out NOT to be esoteric, just hard to discover, due to the C.Y.A. mode in which the research reports were written.
The detailed analysis and supporting citations will be included in my final report, which I hope to be able to publish soon on this website. In the meantime, I wanted to share highlights with those of you who are already actively engaged in political action to correct this injustice. If you feared that Burns' data looked bad for biofeedback, look again. It actually looks bad for Burns et al.
As a reminder, Burns got a 54% symptom reduction rate in a Pelvic Muscle Exercise (only) group, and a 61% reduction in a PME plus "biofeedback" group; this was "statistically insignificant", and therefore evidence that "biofeedback" doesn't contribute anything to PME alone. Therefore it should NOT be reimbursed, according to Aetna and BC/BS.
My major conclusions are as follows:
(1) Burns' results should ONLY be generalized to a highly specific geriatric population, and
(2) they refer only to the very primitive biofeedback "protocol" she used -- a protocol which is far below the current standard of biofeedback practice, at least in America, and
(3) Burns used an inappropriate "drug treatment" model to evaluate biofeedback effects, and even then did not apply that model correctly.
1. Burns' work has no bearing on the treatment of urinary incontinence in general populations of women suffering from urinary stress incontinence. Burns used a highly selected subset of 135 subjects, out of a recruited pool of 1,042 possible subjects. The selected subjects were markedly different in many respects from the majority of patients seen in out-patient Continence Clinics, and even from their geriatric cohorts. This will be discussed below.
Most critics of biofeedback have failed to appreciate the significance of Burns' well-chosen title -- she studied only "Older Community-Dwelling Women"; she did not study a typical incontinence clinic population. Her work was published only in geriatric publications. The extent to which it can be generalized to the entire population of "biofeedback for incontinence" patients will be discussed.
This mis-application of geriatric research to women in general has happened once before. In 1986 Teh-wei Hu, PhD, published "The Economic Impact of Urinary Incontinence" -- in a volume called "Clinics in Geriatric Medicine".
For nearly a decade people were mis-quoting his $8 billion figure as the "cost of incontinence" in America, based on his title alone, when in fact the figure was clearly labeled in the text, if not the title, as "Total Economic Costs of U.I. AMONG THE ELDERLY"! The total cost for all ages has never been estimated, but it would be several times greater than $8 billion.
2. Burns' unique biofeedback protocol was markedly inferior in (a) intensity of training, (b) quality of reinforcement, and (c) standards for behavioral therapy to what was already commonplace at the time the research was conducted (over 10 years ago), and it is vastly inferior to what is being done today.
Part of the fault lies in the lack of training in biofeedback procedures for the biofeedback researcher, who commenced the study in 1983, prior to the availability of workshops and seminars (which started in 1986) in the use of biofeedback to treat incontinence. The biofeedback researcher had received no formal training in biofeedback, and HAD failed to obtain the next best training, BCIA Certification in biofeedback in general, despite the fact that BCIA certification was EXPLICITLY promised in the federal grant application.
On the other hand, many presentations on EMG biofeedback therapy had been made at Biofeedback Society of America conventions since 1981, so she should have been familiar with current practice. In her May, 1984 grant application she cited membership in BSA since 1983.
3. Burns attempted to use a strict "drug research" model for this study, which she even called a "trial". She went to great lengths to provide random assignment to treatment conditions. Subject instructions were very carefully standardized in a pamphlet describing the at-home pelvic muscle exercise program (which both treatment groups followed). Care was taken to ensure that the biofeedback group did not get more "exposure" to the staff at weekly biofeedback training sessions.
Those who are familiar with the biofeedback literature know that in 1986 Shellenberger and Green published "The Ghost in the Box", in which they provided very convincing arguments about why the drug research model is not appropriate for biofeedback research. The reliance on a fixed number of sessions, instead of training to criterion, however long it takes, is only one example. But it seems unfair to fault Burns for not having read a 1986 book in 1984.
In addition, one of the reasons why drug research models continue to be advocated, in spite of the lessons of "Ghost", is that they APPEAR to work. So we will allow Burns her choice of models, and address the question of how adequately she applied her selection.
The most significant distinguishing characteristic of Burns' population was the uniform presence of serious pelvic muscle impairment. This is shown by comparison of her 135 study subjects with her own PILOT study subjects (n=30) in the 1984 project.
Among the Pilot Study Subjects, 36% had cystoceles and 23% had rectoceles. (This is probably typical for clinic populations.) But in the larger study, 74% had cystoceles and 42% had rectoceles. In other words, the final study subjects had 106% more cystoceles and 83% more rectoceles than the Pilot Study subjects. Thus they were a uniquely disadvantaged population, whose results cannot even be generalized to other "62 year olds".
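The "106% more" and "83% more" figures follow directly from the four prevalence rates quoted above. A minimal Python sketch of that arithmetic (the function name `percent_more` is only an illustrative choice):

```python
def percent_more(final_rate, pilot_rate):
    """Relative increase of final_rate over pilot_rate, in percent."""
    return (final_rate - pilot_rate) / pilot_rate * 100

# Prevalence rates (percent of subjects) as quoted in the text.
cystocele_increase = percent_more(74, 36)
rectocele_increase = percent_more(42, 23)

print(f"cystoceles: {cystocele_increase:.0f}% more in the final study")
print(f"rectoceles: {rectocele_increase:.0f}% more in the final study")
```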
Unfortunately we cannot ascertain if these differences in genital relaxation made the incontinence problems of the pilot and final study different, since in 1984 Burns rated subjects on a "questionnaire", not on leaks. By 1988, however, Burgio's "percent symptom reduction" statistic had been universally accepted, and Burns used it then.
[Strangely, the average age of women in the Pilot Study is NOT given, either in the 5 1/2 page summary in her grant application or in the preliminary published report in Nurse Practitioner, February, 1985. Since those patients were referred by colleagues, rather than recruited by ads, it seems possible that they were considerably younger than an average of 62. Most out-patient clinics report age ranges in the upper 40s to mid 50s for off-the-street patients. Incontinence IS more prevalent in upper age brackets, but it is more troublesome to younger, active women.]
By screening out all manner of organic illness and complicating co-conditions, Burns was left with a unique sample of elderly women who were distinguished on only one variable -- they suffered extremely poor pelvic muscle strength; much poorer, in fact, than the majority of older women!
This severe deficit in pelvic health among the study participants is further documented by Burns' data on their muscle strength, as measured by their EMG scores. Comparing Burns' Pilot data with her final data, we find that the 30 pilot subjects were considerably stronger than the 135 final subjects. This is not only important in its own right, but is also responsible for her "non-significant" results. It turns out that Burns did a power analysis of the Pilot results, and concluded that 40 subjects in each of the treatment groups (PME alone and PME plus) would be required to produce significant results. However, that analysis assumed that the final study would produce group differences of the same magnitude as the Pilot differences, and because the subject populations were so different, the 40+40 formula failed to achieve significance. [And therefore, we are being denied reimbursement.]
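To see why a 40+40 design is so sensitive to the assumed group difference, here is a rough sketch using the standard normal-approximation formula for comparing two group means. The significance level (0.05, two-sided), the 80% power target, and the effect sizes are all illustrative assumptions; Burns' actual power calculation is not reproduced here, and the function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate subjects per group for a two-sided, two-sample comparison.

    effect_size is the standardized difference (Cohen's d): the difference
    between group means divided by the common standard deviation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

def detectable_effect(n, alpha=0.05, power=0.80):
    """Smallest standardized difference that n subjects per group can detect."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return (z_alpha + z_power) * sqrt(2 / n)

# Hypothetical effect sizes, chosen only to show the trend.
for d in (0.8, 0.63, 0.4, 0.2):
    print(f"d = {d:.2f}: about {n_per_group(d)} subjects per group")

print(f"40 per group detects d of roughly {detectable_effect(40):.2f}")
```

Under these assumptions, 40 subjects per group corresponds to a standardized difference of about 0.63. If the true difference in the final population is smaller than the one seen in the pilot, the required group size grows with the inverse square of that difference.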
Comparison of the Pilot and Final study subjects at the start of treatment shows that the Pilot subjects were considerably stronger to start with:
1. Quick (Flick) Contractions "before" therapy
| Group | Pilot | Final | Difference |
| PME Only | 5.21 | 2.9 | 80% stronger |
| PME PLUS | 4.58 | 3.5 | 31% stronger |
| Controls | 4.09 | 3.4 | 20% stronger |
Since the groups were almost equal in size, a simple average shows that the Pilot subjects began therapy with muscles that were, on average, 44% stronger -- on flicks.
2. Sustained (hold) Contractions "before" therapy
| Group | Pilot | Final | Difference |
| PME Only | 5.90 | 1.7 | 247% stronger |
| PME PLUS | 6.03 | 2.0 | 202% stronger |
| Controls | 4.75 | 1.8 | 164% stronger |
Again, an unweighted average shows that the Pilot study subjects began treatment with an average of 204% stronger muscles than the final study subjects. [I.e., the Pilot subjects were about 3 times as strong.]
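The "percent stronger" columns and the two unweighted averages can be re-derived from the baseline scores in the flick and hold tables above. A short sketch, with hypothetical helper names; the averages here are taken over the unrounded percentages, which round to the same 44% and 204%:

```python
# Baseline EMG scores as (pilot, final), copied from the two tables above.
flicks = {"PME Only": (5.21, 2.9), "PME PLUS": (4.58, 3.5), "Controls": (4.09, 3.4)}
holds = {"PME Only": (5.90, 1.7), "PME PLUS": (6.03, 2.0), "Controls": (4.75, 1.8)}

def percent_stronger(pilot, final):
    """How much stronger the pilot score is than the final score, in percent."""
    return (pilot / final - 1) * 100

def summarize(scores):
    """Return per-group percentages and their unweighted average."""
    per_group = {g: percent_stronger(p, f) for g, (p, f) in scores.items()}
    average = sum(per_group.values()) / len(per_group)
    return per_group, average

flick_groups, flick_avg = summarize(flicks)
hold_groups, hold_avg = summarize(holds)

for label, groups, avg in (("flicks", flick_groups, flick_avg),
                           ("holds", hold_groups, hold_avg)):
    for group, pct in groups.items():
        print(f"{label:6s} {group:9s} {pct:6.1f}% stronger")
    print(f"{label:6s} unweighted average: {avg:.0f}% stronger")
```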
Even Burns admits that the "hold" scores are more important than the "flicks" in predicting continence, so it is easy to see that we are talking about BIG differences between the Pilot and Final study subjects. Differences that are big enough to bias the outcome.
[Those who follow the Perry Protocol and subtract the resting level from the contraction level should note that Burns apparently didn't do this, so her reported scores are ALL 1 to 2 microvolts HIGHER than your normal scores. Resting levels were NOT reported by Burns.]
One further comparison will be drawn to show how fundamentally Burns' final study subjects differed from her pilot study subjects. The following table compares the strength of the main study subjects AFTER therapy with the pilot study subjects before therapy in the critical "hold" EMG levels:
| Group | Main Study AFTER | Pilot Study BEFORE |
| PME PLUS | 4.0 | 6.03 |
| PME Alone | 1.8 | 5.90 |
| Controls | 2.0 | 4.75 |
Although the PME plus biofeedback group showed the most therapeutic effect on muscle strength, the subjects remained substantially weaker even after treatment than the Pilot study subjects had been BEFORE treatment.
The differences were even more pronounced in the PME alone and Control groups. In other words, Burns was dealing with an exceptionally weak population, even compared to her own Pilot study. And, as we will next show, her "biofeedback" protocol was not sufficiently robust to bring them up to even a normal patient's typical "before" level.
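One way to put a number on this gap is to express each group's post-treatment "hold" score as a percentage of the corresponding pilot group's pre-treatment score. The sketch below uses only the values in the table above; the derived percentages are not figures reported by Burns, and the names are illustrative.

```python
# Sustained ("hold") EMG scores copied from the table above:
# (main study AFTER therapy, pilot study BEFORE therapy)
hold_scores = {
    "PME PLUS": (4.0, 6.03),
    "PME Alone": (1.8, 5.90),
    "Controls": (2.0, 4.75),
}

def percent_of_pilot_baseline(after, pilot_before):
    """Post-treatment score as a percentage of the pilot baseline score."""
    return after / pilot_before * 100

reached = {
    group: percent_of_pilot_baseline(after, before)
    for group, (after, before) in hold_scores.items()
}

for group, pct in reached.items():
    print(f"{group:9s} reached {pct:.0f}% of the pilot group's starting level")
```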
The Handbook also stated that, "In general, biofeedback therapy for pelvic muscle problems consists of the informal repetition of the same procedures used in diagnosis (above), with clinical insight and interpretation." It went on to say: "In our clinical experience, we usually spend 10 to 15 minutes reviewing the patient's at-home practice records, 10 minutes in formal diagnostic re-testing, and an additional 15 to 20 minutes in biofeedback training at each weekly session." That means about 25 to 30 minutes of biofeedback per visit -- and more if needed by the patient's poor performance on the evaluation.
Finally, the Handbook states that "The office practice is explained as a training and progress checking session, with the 'real work' to be done with a 'home trainer' EMG instrument, at home on a daily basis between weekly appointments."
[It should be noted that the formality of the weekly EMG Evaluation was based in part on the fact that it was reimbursed under 51785 and later 51784, whereas in the early days the supervision of the in-clinic biofeedback practice session was NOT reimbursed, so formal documentation of the practice was not needed. (PTs needed only the same level of documentation as other "neuromuscular reeducation" billings.)]
Burns deviated from the manufacturer's instructions in three important ways. First, she did not conduct the weekly EMG evaluation itself. [A modified form of evaluation was given to ALL research subjects before and after their total 8 week program.]
Apparently Burns misunderstood the function of the weekly EMG evaluation, which is to make a formal assessment of the patient's week of at-home practice, in order to reward the patient for documented improvements in muscle contraction strength.
Without concrete printed test results, it is difficult to understand how Burns' subjects got sufficiently motivated from week to week.
The second deviation consists of a drastically reduced "set" of muscle repetitions, or Kegel exercises, at each coached biofeedback practice session. Whereas the sensor instructions call for "an additional 15 to 20 minutes in biofeedback training", Burns offered her subjects only 10 3-sec. flicks and 10 10-sec. holds. Ten flicks take about 30 seconds, while 10 x 10 is 100 seconds, and assuming a rest of twice as long, 200 seconds, we have a total of only 330 seconds, or five and a half minutes of total biofeedback practice time -- which is considerably less than the "20 minutes" of practice claimed in the text. Recall that they did not do an EMG evaluation prior to the practice, either.
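The practice-time estimate can be laid out step by step. This sketch simply restates the arithmetic above; the rest period of twice the hold time is the assumption made in the text, not a figure reported by Burns.

```python
FLICKS = 10
FLICK_SECONDS = 3
HOLDS = 10
HOLD_SECONDS = 10
REST_FACTOR = 2  # assumption from the text: rest lasts twice as long as the holds

flick_time = FLICKS * FLICK_SECONDS   # 10 flicks of 3 seconds each
hold_time = HOLDS * HOLD_SECONDS      # 10 holds of 10 seconds each
rest_time = REST_FACTOR * hold_time   # assumed rest between holds

total_seconds = flick_time + hold_time + rest_time
total_minutes = total_seconds / 60

print(f"flicks {flick_time}s + holds {hold_time}s + rest {rest_time}s "
      f"= {total_seconds}s ({total_minutes} minutes)")

# Compare with the 15 to 20 minutes of training called for in the instructions.
for recommended in (15, 20):
    share = total_minutes / recommended * 100
    print(f"{share:.0f}% of a {recommended}-minute training period")
```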
In other words, Burns' subjects got CONSIDERABLY LESS biofeedback than the manufacturer's recommendations, or than anyone else's recommendations, which probably is the largest single factor in explaining why her subjects showed so little improvement (compared to biofeedback subjects in other studies).
Finally, Burns deviated from the manufacturer's recommendation in still another way; she did not allow her biofeedback subjects to practice at home with a "home trainer". In spite of frequently comparing her work to that of Arnold Kegel, she does not comment on this major deviation from Kegel's own biofeedback method.
In retrospect <see http://www.incontinet.com/comparison.htm> it is clear that the daily use of home trainers (even without office evaluations!) produces the highest results, whereas the method Burns used (office instruments without home trainers) produced the worst results.
1. The Single Blind
Burns describes this study as having a "single blind" design, and reviewers have uncritically accepted that description. Unfortunately, this seems to be a mis-application of the term. Double Blind, the "highest" and "purest" form of drug research, means that neither the researcher nor the subject knows which condition (drug or placebo, for example) the subject is receiving. Single Blind means that the subject doesn't know if they are getting the new drug or a placebo.
What sense does "single blind" actually have in this study? Burns uses the term because the subjects didn't know what treatment options were available, except the one that they were receiving. But surely the subjects knew if they were doing plain PME or if they were getting biofeedback. Elderly women in western New York State in 1985-87 may not be the most sophisticated citizens, but they probably read newspapers and knew the difference between plain PMEs and biofeedback. To call this a "single blind" study is somewhat disingenuous.
2. Compliance
In drug studies it is often assumed that the subjects actually swallowed the pill which they were given. In critical situations, this fact is confirmed by blood tests which verify the presence (or absence) of the treatment drug. In behavioral therapies, however, the best verification is contemporaneous diaries or logs in which subjects record their cooperation with study requirements. [Now, of course, this has been rendered moot by the use of data-logging EMG trainers. Instead of asking for the subject's diary, the researcher merely downloads the exact practice session data directly into the office computer each week.]
In the grant application, Burns indicated that compliance logs would be required of all subjects, but no analysis has ever been presented for those logs. Are we to assume that the biofeedback subjects did the same amount of home exercise as the PME-only group? Even on days they went into the clinic? Studies of "compliance" have shown as little as 15% compliance in exercise groups. It would have been nice to know if Burns' subjects had the same problems.
3. Treatment Levels
The Analysis of Variance (ANOVA) was originally developed to enable detailed inspection of fertilizer effects in agricultural research. The amounts of various fertilizers (10%, 20%, 30% nitrogen and 10%, 20% and 30% phosphorus, for example) could be systematically applied to identical parcels of land, and the resultant crops compared to determine which levels of which chemicals produced the most desirable plants.
In drug research, different dosage levels (5 mg, 10 mg, 25 mg) are often used to determine the most effective concentration for carefully matched patients. This is important because too much or too little of a treatment can produce poor results, whereas the "right amount" will produce good results.
Unfortunately, Burns' research does not include any dosage levels. Burns assumed -- with absolutely no empirical justification or basis -- that just ten 10-second holds a week was the "correct" treatment level, and all 40 biofeedback subjects got the same under-powered treatment. Is it any surprise, then, that her subjects proved her guess was wrong -- only a 61% symptom reduction rate?
To apply the drug research model correctly, Burns should have given, say, 10, 25, 50 and 100 10-second "Kegels" at each office session, to first establish the ideal treatment level.
In addition, Burns arbitrarily decided that all patients would be limited to a set of 8 weekly practice sessions -- again, without any empirical basis for such a decision. Patients should have been assigned to 6, 8, 12, and 15 week programs, in order to determine which produced the maximum gain in strength and reduction in symptoms.
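The two suggestions above can be read together as a crossed dose-finding design. Crossing the two factors is an assumption made here for illustration, since the text proposes each one separately. The sketch below enumerates the resulting treatment cells and the total number of coached office holds in each; the variable names are hypothetical.

```python
from itertools import product

REPS_PER_SESSION = (10, 25, 50, 100)  # 10-second holds per office session
PROGRAM_WEEKS = (6, 8, 12, 15)        # weekly sessions per program

# One cell per combination of repetition level and program length.
cells = [
    {"reps": reps, "weeks": weeks, "total_holds": reps * weeks}
    for reps, weeks in product(REPS_PER_SESSION, PROGRAM_WEEKS)
]

# The single combination Burns actually used: 10 holds a week for 8 weeks.
burns_cell = next(c for c in cells if c["reps"] == 10 and c["weeks"] == 8)

print(f"{len(cells)} treatment cells in the crossed design")
print(f"Burns' protocol: {burns_cell['total_holds']} coached holds in total")
print(f"Largest cell: {max(c['total_holds'] for c in cells)} coached holds")
```

Seen this way, the published study samples a single cell near the low end of the grid, which is why it cannot say anything about the dose-response relationship.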
In summary, Burns' use of an atypical population, a vastly inferior biofeedback protocol, and her failure to follow the requirements of her chosen design, disqualify this study from serious consideration, either in scientific discussions, or in national health care policy decisions.
This is the first draft of a formal critique which will eventually be published on the internet at: <http://www.incontinet.com/burns.htm>. This version was published on Wednesday, May 04, 2005.
As in all InContiNet "Critical Reviews", we welcome opposing viewpoints, and will publish relevant comments from qualified sources without cost. Contact DrPerry@incontinet.com for more information.