How do you know a surgeon in training is ready to operate independently?

Washington University in St. Louis
St. Louis, Missouri
DOI: 10.18258/9101
Successfully funded on 6/28/17: raised $4,310 of a $4,000 goal (107%).



Participants: Trainees routinely record video of their cataract surgeries. All eligible residents and fellows in the Department of Ophthalmology and Visual Sciences will be invited to release their de-identified videos for assessment by lay and expert raters; specifically, all trainees who perform cataract surgery during June 2017 will be invited to participate. IRB approval and informed consent will be obtained. Participants will submit their videos to the study coordinator, who will de-identify them for rating.

Video collection and editing: Eighty videos will be available for study. Thirty will be collected in month 1 from any trainee who operated during June 2017. An additional 50 videos will be collected from the PGY4 residents over the course of the next 12 months. Because residents rotate through more and less surgically busy rotations at different times during the year, we will collect videos by case number (30, 60, 90, 120, 150, 180) rather than by date. Videos must be 10 minutes or less in duration for analysis on the CSATS platform. The coordinator will review all videos and edit them to include only the critical portion of the case, defined as removal of the cataractous lens from the eye. De-identified videos will be submitted to the surgical experts and to CSATS for lay rater review.

Video scoring: We will use a modified Objective Structured Assessment of Technical Skill (OSATS). This tool includes four elements of assessment from the OSATS (economy of movement, respect for tissue, flow of operation, and instrument handling) plus a fifth element from the GRASIS assessment tool: microscope centration. Each of the five categories is graded on a 5-point Likert scale. Both lay raters and experts will use this modified OSATS to assess the de-identified surgical videos. Based on analyses of other assessments of surgical skill, CSATS recommends 30 lay raters to achieve agreement with expert raters. For each video, we will compare the overall and sub-scale scores given by lay raters (n=30) and by surgical experts (n=3).
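The scoring arithmetic can be sketched in Python. This is a minimal illustration, assuming each rater gives one 1-5 rating per item and that a panel's scores for a video are summarized by averaging across its raters; the proposal does not say how individual ratings are aggregated, and the item identifiers and function names are hypothetical.

```python
from __future__ import annotations

from statistics import mean

# Hypothetical identifiers for the five modified-OSATS items:
# four OSATS elements plus microscope centration from GRASIS.
ITEMS = (
    "economy_of_movement",
    "respect_for_tissue",
    "flow_of_operation",
    "instrument_handling",
    "microscope_centration",
)


def total_score(ratings: dict[str, int]) -> int:
    """Sum one rater's five 1-5 Likert ratings into an overall score (range 5-25)."""
    missing = set(ITEMS) - set(ratings)
    if missing:
        raise ValueError(f"missing items: {sorted(missing)}")
    for item in ITEMS:
        if not 1 <= ratings[item] <= 5:
            raise ValueError(f"{item} must be between 1 and 5, got {ratings[item]}")
    return sum(ratings[item] for item in ITEMS)


def panel_summary(panel: list[dict[str, int]]) -> dict[str, float]:
    """Average each sub-scale and the overall score across one rater panel.

    Assumption: raters (e.g. 30 lay raters or 3 experts) are combined by averaging.
    """
    totals = [total_score(r) for r in panel]  # validates every rating first
    summary = {item: mean(r[item] for r in panel) for item in ITEMS}
    summary["overall"] = mean(totals)
    return summary
```

For example, a panel of one rater who gives all 5s and one who gives all 1s has an overall mean of 15 and a mean of 3 on every sub-scale.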

Raters: We will submit de-identified cataract surgical videos to CSATS to obtain OSATS assessments by lay raters (n=30). Three surgical experts (two WU faculty members and one private-practice ophthalmic surgeon) will evaluate the same de-identified videos locally using OSATS.

Statistics: This pilot study will determine the feasibility and validity of lay vs. expert (gold standard) OSATS scoring, as measured by intraclass correlation coefficients (with 95% confidence intervals) for all Aims. A power analysis based on prior studies using lay raters to grade videos of simulated robotic surgery suggests that, with 80 videos, we will have greater than 90 percent power to detect a correlation of 0.7.
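The proposal does not say which null value or method the power analysis used. As a rough cross-check only, the sketch below applies the Fisher z approximation for a Pearson-type correlation (not an ICC-specific method), with the null value left as an explicit, assumed parameter; the function name is hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def correlation_power(r_alt: float, r_null: float, n: int, alpha: float = 0.05) -> float:
    """Approximate two-sided power to detect a true correlation r_alt against
    the null value r_null with n paired observations (Fisher z approximation)."""
    if n <= 3:
        raise ValueError("n must exceed 3")
    std = NormalDist()
    # Fisher's z-transform makes the sampling distribution of r roughly
    # normal with standard error 1 / sqrt(n - 3).
    shift = abs(atanh(r_alt) - atanh(r_null)) * sqrt(n - 3)
    critical = std.inv_cdf(1 - alpha / 2)
    # The small chance of rejecting in the opposite tail is ignored.
    return std.cdf(shift - critical)


for null in (0.0, 0.5):
    print(f"null={null}: power={correlation_power(0.7, null, 80):.2f}")
```

Against a null of zero, 80 videos give essentially 100 percent power for a correlation of 0.7 (printed as 1.00); against an assumed null of 0.5, the same sample gives roughly 80 percent (printed as 0.80), so the null value matters when reporting power.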

Analyses

Aim 1: To determine whether lay raters' and experts' OSATS overall (range 5-25) and sub-scale (range 1-5) scores agree for cataract surgical videos, we will use intraclass correlations (ICCs). ICCs will be computed using all 80 videos acquired during the 12-month study period: 30 collected in the first month and 50 collected over the next 11 months. We selected the ICC as the measure of agreement because it is sensitive to differences in mean scores on a Likert-type scale as well as to the variability of the data, whereas the Pearson correlation ignores differences in means. All analyses will be computed with SAS V9.4.

Aim 2: To determine whether agreement between lay raters and surgical experts on OSATS overall and sub-scale scores is consistent over the range of surgical skill (PGY2, PGY3, PGY4, fellow), we will use a mixed model (PROC MIXED of SAS V9.4). The consistency of agreement will be tested using an interaction term between the trainee's surgical experience and rater type (lay or expert). The analysis sample for Aim 2 is the 30 videos of cataract surgery by PGY2, PGY3, PGY4 and fellow trainees acquired during June 2017.

Aim 3: To determine whether agreement between lay raters and surgical experts on OSATS overall and sub-scale scores is consistent over a 12-month period as trainees' surgical skills improve from baseline to month 12, we will use a repeated-measures model (PROC MIXED of SAS V9.4). Fifty videos (2 videos per resident for 5 residents at each of 5 collection points) will be evaluated. Consistency will be tested using an interaction term between experience, measured by case number from baseline, and rater type (lay or expert).
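The planned ICCs will be computed in SAS V9.4, and the proposal does not name the ICC form. As a hedged illustration of why an absolute-agreement ICC penalizes a systematic offset that Pearson's r ignores, the Python sketch below assumes the two-way random-effects, absolute-agreement, single-measure form (ICC(A,1) in McGraw and Wong's notation), with one column per rater group (for instance, the lay-panel mean and the expert-panel mean for each video). It omits confidence intervals, and the function name is hypothetical.

```python
from __future__ import annotations


def icc_a1(scores: list[list[float]]) -> float:
    """ICC(A,1): two-way random effects, absolute agreement, single measure.

    scores[i][j] is the score of video i from rater (or rater group) j.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two videos")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete table with at least two raters")
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Two-way ANOVA sums of squares: between videos, between raters, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    # Rater (column) variance enters the denominator, so a constant offset lowers the ICC.
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Lay scores always 1 point above expert scores: Pearson's r would be 1.0.
lay_vs_expert = [[6, 5], [11, 10], [16, 15]]
print(round(icc_a1(lay_vs_expert), 3))
```

The example prints 0.98 rather than 1.0 because the constant offset counts as disagreement; in the study, the resulting value would be compared with the 0.70 threshold discussed in the pre-analysis plan.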


The greatest challenge we expect to encounter is the time limit on the videos. All videos must be edited to 10 minutes or less for submission. An expert surgeon can complete nuclear removal in 1 minute, but not all beginning surgeons can complete it in less than 10 minutes. There are several reasons for this. First, they are less efficient and simply take longer to execute maneuvers. Second, they tend to be more timid and make many small moves rather than fewer, more efficient ones. Third, the attending surgeon (who is present for all trainee surgical procedures) will ask the trainee to pause while he or she explains how to better manipulate the tissue to accomplish the task at hand. Thus there will be segments of the video in which no work is happening at all. To keep videos to 10 minutes or less, we had two major options. We could submit the first 10 minutes of nuclear removal, in which case the video would end before the procedure was complete. Alternatively, we could edit the video to eliminate the frequent long pauses without activity in early learners' videos. Because the intent of the study is to grade surgical technical skill, we preferred assessing the edited video. However, time to completion is a second factor that should be considered in resident skill acquisition. Therefore we will record the total time it takes to complete the task as a separate indicator of surgical skill that we can analyze as a post-hoc factor. Because we do not know how many videos will require editing, or how far into training nuclear removal will continue to take more than 10 minutes, we could not estimate power for this element and therefore will not include it as a primary analysis. These data will be useful for generating hypotheses for the next phase of study.
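The editing rule and the separate time-to-completion measure can be sketched as follows. This assumes, hypothetically, that the coordinator logs the start and end of nuclear removal and each no-activity pause as timestamps in seconds; the function and constant names are illustrative and not part of the CSATS platform.

```python
from __future__ import annotations

MAX_CLIP_SECONDS = 10 * 60  # 10-minute limit stated in the protocol


def summarize_clip(
    start: float, end: float, pauses: list[tuple[float, float]]
) -> tuple[float, float, bool]:
    """Return (total_task_seconds, edited_seconds, fits_limit) for one video.

    start and end bound nuclear removal in the raw recording; pauses are
    (begin, end) intervals with no surgical activity that the editor cuts.
    """
    if end <= start:
        raise ValueError("end must be after start")
    # Clip each pause to the task window and drop any that fall outside it.
    clipped = sorted((max(a, start), min(b, end)) for a, b in pauses)
    clipped = [(a, b) for a, b in clipped if a < b]
    removed = 0.0
    run_start = run_end = None
    # Merge overlapping pauses so no stretch of footage is subtracted twice.
    for a, b in clipped:
        if run_end is None or a > run_end:
            if run_end is not None:
                removed += run_end - run_start
            run_start, run_end = a, b
        else:
            run_end = max(run_end, b)
    if run_end is not None:
        removed += run_end - run_start
    edited = (end - start) - removed
    return end - start, edited, edited <= MAX_CLIP_SECONDS
```

For a 15-minute nuclear removal with pauses at 100-250 s, 200-400 s, and 800-1000 s, the overlapping first two pauses merge into 300 s and the third is clipped to 100 s, leaving a 500 s edited clip that fits the limit, while the 900 s total is kept as the post-hoc time-to-completion variable.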

We did not want to bias the study by allowing residents to "select" the videos they submit for analysis, so we selected cases 30, 60, 90, 120, 150 and 180. In this way we have a routine progression of experience for each trainee, each video represents a "typical case" at that level of training, and the selection is standardized across all trainees. A second issue relates to the randomness of surgical complications. Complications during nuclear removal are uncommon but can occur. If a resident happens to have a complication on case #150, their trajectory of skill acquisition might appear to worsen with time. In the event of an unexpected complication, we will ask the resident to submit the video of the case immediately preceding the requested case. We will not request the case after, because trainees who experience a complication can lose confidence and tend to be more tentative for the next few cases. If both cases contain a complication, the video from the originally requested case number will be graded, with the complication included in the assessment.
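The selection and substitution rule above is simple enough to state as code. In this sketch, the `complicated` mapping (case number to whether a complication occurred during nuclear removal) is a hypothetical record format, cases absent from it are assumed to be uncomplicated, and the function names are illustrative.

```python
from __future__ import annotations

MILESTONES = (30, 60, 90, 120, 150, 180)  # case numbers requested from each trainee


def case_to_grade(requested: int, complicated: dict[int, bool]) -> int:
    """Apply the substitution rule for one requested case number."""
    if not complicated.get(requested, False):
        return requested
    previous = requested - 1
    # Only the preceding case is considered; the following case may reflect
    # the trainee's reduced confidence after the complication.
    if not complicated.get(previous, False):
        return previous
    # Both cases had complications: grade the requested case as it stands.
    return requested


def cases_to_grade(complicated: dict[int, bool]) -> list[int]:
    """Case numbers to request from one trainee across all milestones."""
    return [case_to_grade(m, complicated) for m in MILESTONES]
```

A complication on case 150 alone shifts that milestone to case 149; complications on both 149 and 150 leave case 150 in place, graded with the complication.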

Pre-Analysis Plan

Aim 1: Are lay raters equivalent to expert raters in scoring surgical videos of nuclear removal in cataract surgery? If Aim 1 confirms an acceptable ICC between lay and expert raters, our study could be among the first to open the way for widespread application of the CSATS platform and the use of lay raters to evaluate surgical competence in other surgical specialties. This will be only the second time that lay and expert rater assessments have been applied to actual surgical experience (as opposed to assessment of skills in a simulated laboratory). Because the CSATS platform is being marketed commercially, it is imperative that analyses of this kind address whether the agreement demonstrated for laboratory-based skills holds for actual surgical experience. If Aim 1 fails to meet the 0.70 threshold, our study will undercut the marketing claims of the CSATS company and thwart further use of this platform without appropriate validation studies. Whether Aim 1 supports or refutes the claims of the CSATS company, our results will have impact because our study has a more rigorous design (vs. wet- and dry-lab simulator studies), a larger sample size (approximately 13 trainees and 80 videos), pre-planned and appropriate analyses (not Cronbach's alpha or Pearson correlation), and longitudinal data.

Aim 2 addresses whether the agreement between lay and expert raters is acceptable over the full range of surgical expertise, from PGY-2 residents to fellows. The Holst study sampled a range of training levels and found little difference in agreement over the full range. Our study is strongly positioned to test this hypothesis in the cross-sectional sample. No other study has examined lay and expert agreement as rigorously as our study proposes to do. Furthermore, we will drill down to determine agreement for each of the 5 sub-scales to identify those with low agreement. When sub-scale agreement is low, options include revising the lay instructions or modifying the sub-scale. Because of its potential value to surgical training programs, we would avoid dropping lay assessment wholesale unless it proves hopelessly flawed or infeasible.

Aim 3 is the longitudinal study of resident skill development over 12 months. This aim tests whether agreement between lay raters and experts is good enough to monitor improvement over time in the same trainee (i.e., are lay raters' scores sensitive enough to monitor improvement?). Are lay raters as good as experts at all points on the learning curve, or do they agree only at specific points along it (is there a ceiling or floor effect in lay rater ability)? With these data we will be able to develop learning curves for individual trainees over 12 months and determine over what range lay raters may be useful for monitoring surgical skill progression.

