Thoughts on Data from Two Decades Ago

As an educator, I cannot go to a meeting without hearing "the data show" in one version or another. I appreciate the role of data when we draw conclusions. This trait is deeply embedded in my professional life; it started with my undergraduate studies in science and continued during my life as an occasional education researcher.

I am troubled by our current fascination with data, however, and it grows more troubling as I clean out old files and reread the papers I wrote 20 years ago as a doctoral student preparing for my dissertation.

One sentence is particularly striking: “The intense interest in data related to school performance can inform professional practice in a meaningful manner but also can burden educators by placing too much data or data of dubious validity before decision makers.”

There are two issues in this sentence I wrote in August 2027. First, the amount of data. Second, the validity of the data we do have.

Although I am not working in K-12 education, I do have colleagues who are, and they tell me distressing stories about the diagnostic testing that teachers administer on a regular basis. One of my now-retired teacher colleagues says she decided to retire when the school administrators were going to insist that first- and second-grade teachers administer "reading preparation" diagnostic tests. (She was a very skilled reading teacher and insisted that every child in her class be a reader by the time they left her classroom.) Her reasoning was, "I can tell if students can read by reading with them. Testing will take me and my students away from reading, and the results won't tell me anything I don't already know."

She did use diagnostic testing for students who she suspected had specific deficiencies, and she worked with special education teachers as needed. But when students were making progress within what she considered a normal range, she continued her teaching without what she considered unnecessary data.

Because "data" is so popular in decision-making, decision-makers accept it from any source. This makes sense, as they are expected to have data, but it leads them to accept it uncritically. When using data in research settings (and we are trying to mimic research in all decisions where we use data), we pose a question and define the data that will reasonably answer it. When we conduct research, we typically work in groups, and one purpose of the group is to challenge the data and methods we select.

In the data used daily in education, I see data selected because it is convenient or because it makes the point the leader is trying to justify. I understand using data that is convenient; we don't have the resources to collect original data all the time, and if we are interested in human subjects, we cannot impose data collection on them too frequently. (The act of collecting data affects the data we are collecting, and it affects later data collection; "every time you test reading skills you affect their reading," said my reading teacher colleague.)

In education, we have also forgotten that there is a family of statistics, inferential statistics, that helps us distinguish real effects from chance. Without such statistics applied to our quantitative data, we can make all the claims we want, but no one should believe them.

When the members of your research group challenge you on your data, they are also challenging you to make sure your instruments actually measure what you intend to measure, and that the statistics you choose are appropriate for the question you are asking. We rarely apply these tests of validity to the data we use as educators. When we use proprietary tests, we simply assume the vendors are being honest in their claims about the tests.

In the decades since I composed the sentence I quoted earlier in this essay, things have not improved. We have too much data, we never challenge it to see whether it is what it claims to be, and our analysis is weak. But let's carry on with it.