Keywords

bibliometrics; h-index; random forest; citations

Location

Session G2: Data Mining for Environmental Sciences (s-DMTES IV)

Start Date

17-6-2014 9:00 AM

End Date

17-6-2014 10:20 AM

Abstract

We assessed all papers published in two key environmental modelling journals in 2008 to determine how well their citation counts could be predicted without considering the papers' quality. We applied both random forests and generalised additive models to predict citation counts, using a range of easily quantified or categorised characteristics of the papers as covariates. On average, the more highly cited papers were longer, had longer reference lists and more authors, were more likely to have been published in Environmental Modelling and Software, and were less likely to include differential or integral equations than papers with lower citation counts. The presence of other types of equations had no effect. Although these factors had significant predictive power regardless of which statistical modelling approach was applied, unknown factors (presumably research quality and relevance) accounted for the majority of the variability in citation rates.
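
The closing claim, that unknown factors accounted for most of the variability, can be read as saying a fitted model explains less than half of the variance in citation counts. The sketch below illustrates that reading with the coefficient of determination (R²). It is an assumption that R² is an appropriate summary here, since the abstract does not state which measure was used, and the numbers are hypothetical toy values, not data from the study.

```python
def r_squared(observed, predicted):
    """Fraction of the variance in `observed` explained by `predicted`."""
    mean_obs = sum(observed) / len(observed)
    ss_total = sum((y - mean_obs) ** 2 for y in observed)
    ss_resid = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_resid / ss_total


# Hypothetical toy values for illustration only (not taken from the paper).
observed_citations = [2, 5, 9, 14, 30]
predicted_citations = [8, 9, 11, 12, 15]

explained = r_squared(observed_citations, predicted_citations)
unexplained = 1.0 - explained

# With these toy values the predictions track the ordering of the papers,
# yet more than half of the variance remains unexplained.
print(f"explained: {explained:.2f}, unexplained: {unexplained:.2f}")
```

In this toy case the predictions rank the papers correctly but still miss most of the spread, which is the pattern the abstract describes: predictors with real signal, and a larger share of variability left to unmeasured factors.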

 

Predicting citation counts of environmental modelling papers
