Keywords
bibliometrics; h-index; random forest; citations
Location
Session G2: Data Mining for Environmental Sciences (s-DMTES IV)
Start Date
17-6-2014 9:00 AM
End Date
17-6-2014 10:20 AM
Abstract
We assessed all papers published in two key environmental modelling journals in 2008 to determine the degree to which their citation counts could be predicted without considering paper quality. We applied both random forests and generalised additive models to predict citation counts, using a range of easily quantified or categorised characteristics of the papers as covariates. The more highly cited papers were, on average, longer, had longer reference lists, had more authors, were more likely to have been published in Environmental Modelling and Software, and were less likely to include differential or integral equations than papers with lower citation counts. Other types of equations had no effect. Although these factors had significant predictive power regardless of which statistical modelling approach was applied, unknown factors (presumably research quality and relevance) accounted for the majority of the variability in citation rates.
Included in
Civil Engineering Commons, Data Storage Systems Commons, Environmental Engineering Commons, Hydraulic Engineering Commons, Other Civil and Environmental Engineering Commons
Predicting citation counts of environmental modelling papers