Lies, Damned Lies and (Misused) Statistics…

Mark Twain is credited with stating that there are three kinds of lies, "Lies, damned lies and statistics". As a statistician, I often have this quoted to me, and I usually reply that Twain was only partially correct. There are three kinds of lies, "Lies, damned lies and lies told by misusing statistics". It appears that once again in Australia, vital funding choices are about to be made through the use of a faulty statistic, the citation count.

During this past week I’ve been happily working away with my students, teaching them all I can about the wonderful world of mathematics and statistics. In particular, we’ve looked at what inferences you can draw from a set of descriptive statistics. One particular example in their textbooks asks the students to compare the pulse rates of 21 adult males and 22 adult females and make an inference about male and female pulse rates. As expected from the data supplied, these students conclude that females, in general, have higher and more variable pulse rates than the males. Whilst that is a correct conclusion from the supplied data, I always ask the students if there is anything they might be concerned about with regard to this data and how it was collected.

Statistics is the science of uncertainty, but there is no uncertainty in the mathematics or logic behind statistical techniques. The uncertainty is all in the data: how it was collected, what controls were in place to limit confounding and how accurately the measurements were taken. For example, if you go and stand outside a supermarket asking shoppers to fill in a survey, your results will be biased towards the demographics of people who shop at that supermarket. Most scientists learn about bias and confounding in their early undergraduate careers and therefore know that you can only generalise your findings to the population you actually sampled from. Of course, to be able to generalise, every member of the population of interest needs to have a known, non-zero probability (in the simplest case, an equal probability) of being selected in your sample.

These same scientists are also taught that you cannot measure subjective properties, like happiness, with objective measures. The best you can do is to use pseudo measures or Likert scales. You can, of course, make inferences from these pseudo measures, but you have to be extremely careful about what kind of inferences you make. For example, if you ask students at the end of their course how satisfied they were with the quality of teaching, you would have a hard time inferring anything except that this cohort of students felt more satisfied or less satisfied with the teaching of this particular course compared to their other courses in this semester. You could not say that this survey objectively measured the quality of the teaching provided to the students.

So we come to the reason for this posting. Thanks to Dr Mewburn (The Thesis Whisperer), I was pointed to an article in The Age about the supposed lack of quality of Australian research. The first sentence of this article is enough to turn any Australian scientist’s blood to boiling point.

“MORE than half of Australian scientific research papers are below a key world standard, leading to calls by the country’s chief scientist for a review of how research funds are distributed.”

Wow! More than half below a key world standard! But what is this standard that Australian researchers are doing so poorly on? Is it the number of spelling errors? Is it using the right statistical tests for analysing their data? Is it meeting the word count or formatting standards for the journal? No, it is the citation count. In other words, how often a paper written by an Australian author is cited by other papers.

The article continues to describe the apparently woeful Australian situation:

“Although the citation average for Australian scientific papers overall was about 13 – above the world average – Professor Chubb said 12 to 17 per cent of our research was never cited.”

Hold on! If the citation average for these papers is above the world average, what is the problem? Why the doom and gloom headline?

"He said Australia should compare itself to the best in the world. The average rate for Europe is 13.5."

Okay, I see it now, we have to do better than Europe. Now let me think; according to Wikipedia (I know, not the best or most reliable source, but…) Europe has a population of about 739 million compared to Australia’s approximately 23 million. That seems like a fair comparison (not!). If we assume that around 5% of the population work in research roles (a pretty generous estimate) and produce an average of three peer-reviewed papers per year, then the Europeans would produce about 111 million research articles a year compared to about 3.5 million in Australia. Hmm… now if each paper has equal probability of being cited, then a randomly selected paper from that combined pool has roughly a 1 in 33 chance of being Australian and a 32 in 33 chance of being European. Not a level playing field by any stretch of the imagination.
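For anyone who wants to check the back-of-envelope sums, here is a minimal Python sketch. It assumes only the rough inputs given above (the two approximate populations, 5% of people in research roles, three papers per researcher per year); the function names are my own, invented for illustration. Carrying the three papers per researcher through, it gives about 111 million European papers and about 3.5 million Australian papers a year, so roughly 1 paper in 33 in the combined pool is Australian. The share depends only on the population ratio, because the 5% and the three papers cancel out.

```python
# Back-of-envelope estimate using the rough assumptions stated in the text.
# These are illustrative inputs, not measured data.

POPULATION = {"Europe": 739e6, "Australia": 23e6}  # approximate populations
RESEARCHER_SHARE = 0.05      # assumed fraction of people in research roles
PAPERS_PER_RESEARCHER = 3    # assumed peer-reviewed papers per researcher per year


def papers_per_year(population: float) -> float:
    """Estimated annual paper output for a region under the assumptions above."""
    return population * RESEARCHER_SHARE * PAPERS_PER_RESEARCHER


def share_of_pool(region: str) -> float:
    """Chance that a paper drawn at random from the combined pool is from `region`."""
    output = {name: papers_per_year(pop) for name, pop in POPULATION.items()}
    return output[region] / sum(output.values())


if __name__ == "__main__":
    for name, pop in POPULATION.items():
        print(f"{name}: about {papers_per_year(pop) / 1e6:.1f} million papers a year")
    p = share_of_pool("Australia")
    print(f"Chance a random paper is Australian: {p:.3f} (about 1 in {1 / p:.0f})")
```

Changing RESEARCHER_SHARE or PAPERS_PER_RESEARCHER alters the totals but not the odds, which is why the comparison is lopsided no matter how generous the guesses are.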

Most worrying about this article is the concluding quote from the Australian chief scientist, Professor Ian Chubb:

"Criteria such as quality, merit and interest do have to be fairly stringently applied … we have to see if our systems are working in our favour. How much money is distributed? How much is available? They are valid questions to ask, though the answers will be very controversial."

So research funds are going to be allocated based upon the immeasurable characteristics of quality, merit and interest, prior to the research being carried out. How do we know what the quality, merit or interest of the research output is until after the research has been performed? Equally, scientists at any level, and in particular Australia’s Chief Scientist, should know that past performance is no guarantee of future success. You only have to look at the music industry to see this truism demonstrated time and time again by the "one-hit wonders". Basing funding decisions on past performance or the citation count is doomed to failure, for it will, at best, only maintain the status quo. Funding research is always risky; it is a gamble on probable success. Funding decisions should therefore be based on objectively measurable characteristics or, better yet, on listening to research proposals and determining which have a sound basis, well-designed protocols and clear research questions.

Of course, I would rather see that researchers did not actually have to fight for funding and students didn’t have to fight for quality education. I can only hope that one day the bean counters and politicians will realise that it is their job to SUPPORT research and education, not the job of researchers and educators to make the bean counters’ jobs easier. After all, today’s students will one day be responsible for keeping us all alive and comfortable in our old age. Surely we want to ensure that they really know what they’re doing? Equally, today’s research activities will lead to tomorrow’s inventions. Surely we want to give researchers the best chance of success?

Citation Count: What does it measure?

Let’s take a quick look at what the citation count actually measures. Firstly, it measures the popularity of a particular paper. This popularity can be due to one of many factors, including:

It is such a terrible paper with so many errors that everyone cites it as an example of what not to do
It draws interesting conclusions, but is so narrow in scope it hasn’t included large sections of currently open research questions
It was written by the author’s supervisor, head of department, dean, grant committee member or someone to whom the author might be trying to "grease the wheels"
Everyone else cites the paper, so the author feels they must, just to get published (or they are told to by the anonymous reviewers)
It prompts so many more research questions
It refutes or supports previous findings or the findings of the author’s own work
The topic is "cool" or "funky" (e.g. new ways to work with iPads) and so authors cite it as motivation/justification for their own studies

Of course, this list is not exhaustive, but it should give you some idea of why some papers are cited. A natural question to then ask is, "What could be a reason why a paper is not cited?" One answer might be that the paper is the definitive last word on a topic and no further research in that area is required, but there are other reasons, like exposure.

Exposure of a research paper to a wider audience is also indirectly measured by the citation count. If a paper is published in Nature or The Lancet it is likely to be seen by a large number of other researchers (and hence cited), whereas if it is published (or even self-published) in an obscure place, it is not likely to be seen by many, especially those outside the usual readership. Publishers themselves spend a great deal of effort in promoting their content, but can also prevent access to many great articles through the use of their paywalls. This too can lead to a paper not being available to another researcher to read, let alone cite. (One more reason to bring down the paywalls since they do not pay for the author’s work, but I digress.)

Thirdly, the citation count does not include all those times a paper is read by researchers or even cited in unpublished works (e.g. undergraduate/postgraduate essays and assignments). The citation count only includes the number of times a paper is cited in a published work. I know that at least one of my early peer-reviewed published papers does not appear in a Google Scholar search and hence it is difficult to ascertain an accurate citation count for it, yet I know that it has been read at least 125 times in the past two years and cited in several published works (some of which do appear in Google Scholar searches).

Fourthly, the citation count has a natural bias towards older papers. It is unreasonable to expect that a paper published last week would have a high citation count this week. Good (and bad) papers take time to digest and respond to, and the review process for some journals can take months to complete, so there is a natural gap between a paper’s publication and the first time it is cited by another (independent) author. Older papers, in particular seminal papers, therefore have an advantage over more recent papers with regard to their citation counts. It is not unusual for a paper to have a citation count of zero for at least a year after publication, then have a large spike in citations, followed by a falling away. Alternatively, a paper may take time to be “discovered” by researchers and so there is a gradual increase in the number of times the paper is cited.

Finally, there is one other issue that has an effect on citation count that is very hard to remove, particularly for Australian authors. Researchers tend to be somewhat cliquey and stick with those they know. Despite the advances in technology that make communication between the continents lightning fast, some research requires the physical presence of all parties involved. For those in the European sphere, this is not that difficult as most places can be reached via an overnight train journey or a short plane flight. For Australian researchers, even travelling from Sydney to Perth can be an all-day affair, let alone travelling to Europe to collaborate. As such, it can be hard for Australian researchers to become well known to their European peers and hence their papers are less likely to be cited. This is especially true when another author from a European or US institution has published a similarly themed paper.

How do we measure quality?

Why is it that some people are prepared to pay extra money for a particularly branded product? Most marketing executives will tell you the price difference is all about this mystical thing called quality. If you ask them to point to the "quality" on the item, they’ll generally say something about "quality is about the feel" or the "after-sales service"; they cannot point to a spot on the product (other than the branding) that is the "quality". But if I am going to pay extra, I want to know what this "quality" thing is that I’m paying for. I want to know if paying double the price would give me double the quality. So how do we measure it? The answer is, we can’t measure it objectively.

There are plenty of indirect measures of quality, such as product life, maintenance cycles or the amount of wastage of consumables (like ink or toner in a printer), but no direct measures. This is because quality, like beauty, is in the eye of the beholder. One person might think that a Picasso is an excellent example of beautiful and quality art, whereas another might think that it looks like something a four-year-old created at kindergarten.

So how can we measure the quality of research? Surely it is in the ability of the research to answer the research question investigated, not how many times it is cited.

What are your thoughts?

Originally published Feb, 2013 | Last updated: 2 November 2024
