Lab Note 12: The time factor
Welcome back to our experiment! Grace and I presented our work at the Educating Educators Meeting in Austin, TX in January and we want to share where we stand with regard to our data analysis.
In Lab Note 11 we showed you that while the crowd was highly correlated with the experts, the actual assessments were very different. This created a bit of a problem for us if we were going to use crowd scores to grade resident surgical skill: the crowd scores were simply too compressed and did not provide enough separation between scores to discriminate the trainees from the attending surgeons.
So... we looked at other elements in the data that might help us make an assessment of the video that predicts expert score. We examined time: how long did it take the surgeon to complete the surgical task? Four of the five categories assessed by the OSATS tool that we used to grade the videos (Lab Note 1 - Video Grading Tool) affect total surgical time (microscope centration, economy of movement, "flow of operation," and instrument handling). So we theorized that the higher the surgeon's skill, the shorter the time it would take to complete the task.
Indeed, that is what we found.

This result suggests a few things. First, time (an easily captured measurement) should be explored as a potentially robust indicator of surgical skill, either independently or in combination with other metrics. Second, it validates the use of surgical time as an indicator of surgical skill in studies aimed at assessing the impact of educational interventions on resident surgical skill.
Let's explore this a little more. For the sake of this discussion we are going to make a reasonable but not yet validated assumption: that a score of 3 in each of the 5 categories on the OSATS (total = 15) reflects the skill of a competent surgeon. The choice of 3 per category to define competence is not arbitrary; prior studies of surgical competence in other surgical disciplines have used this threshold. Thus a surgeon who scores 15 or higher would be deemed proficient.
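Under that assumption, checking proficiency is just a sum over the five category scores. Here is a minimal sketch of that rule; the function name and input format are our own illustration, not part of the study's analysis code:

```python
# Hypothetical helper illustrating the competence threshold discussed above.
# The 5-category OSATS scale and the threshold of 15 come from the lab note;
# everything else (names, input format) is an assumption for illustration.

def is_proficient(category_scores):
    """Return True if the total of the 5 OSATS category scores is >= 15,
    i.e., an average of at least 3 per category."""
    if len(category_scores) != 5:
        raise ValueError("expected 5 OSATS category scores")
    return sum(category_scores) >= 15

print(is_proficient([3, 3, 3, 3, 3]))  # True: exactly at the threshold
print(is_proficient([2, 3, 3, 3, 3]))  # False: total of 14 falls short
```

A surgeon scoring 3 in every category (total 15) just meets the threshold; a single category score of 2 without a compensating higher score elsewhere falls below it.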
With a threshold for competence identified, we can examine how well (or not) the time a surgery takes can predict proficiency (as defined by a score of 15 or higher).

When we look at the video length data through this lens, we find the following. Videos taking longer than 400 seconds never achieve a score of 15 or higher (red box) and therefore do not meet proficiency. Videos shorter than 250 seconds all achieve a score of 15 or higher (green box), indicating proficiency. Videos between 250 and 400 seconds have an equal likelihood of being above or below a score of 15 (yellow box), so proficiency cannot be determined by video length alone within this range. If we use the same definition of proficiency to look at how well the crowd characterizes the videos, we find that all of the crowd assessments would predict proficiency. But we know from the expert scores that only half of the surgeons were proficient.
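The red/green/yellow boxes amount to a simple three-way rule on task time. A minimal sketch, using the 250-second and 400-second cutoffs from our data (the function name and the labels are ours, chosen for illustration):

```python
# Hypothetical sketch of the time-based proficiency rule described above.
# The 250 s and 400 s cutoffs are from the lab note; the function name and
# return labels are our own illustration.

def classify_by_time(seconds):
    """Classify a video by how long the surgical task took."""
    if seconds > 400:
        return "not proficient"   # red box: these videos never scored >= 15
    if seconds < 250:
        return "proficient"       # green box: these videos all scored >= 15
    return "indeterminate"        # yellow box: time alone cannot decide

print(classify_by_time(500))  # not proficient
print(classify_by_time(200))  # proficient
print(classify_by_time(300))  # indeterminate
```

Note that times of exactly 250 or 400 seconds fall in the indeterminate band, matching "between 250 and 400 seconds" above.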

In the next lab note we will explore ways to transform the crowd data to improve its ability to predict surgeon competence.