Keywords

atmospheric aerosols, particle formation, data mining, classification

Start Date

1-7-2006 12:00 AM

Description

Atmospheric aerosol particle formation is frequently observed in various environments. Yet, despite numerous studies, the processes behind these so-called nucleation events remain unclear. In this work we describe the use of data mining techniques to detect factors influencing particle formation. These techniques are applied to a dataset covering eight years of 80 variables collected at the boreal forest station (SMEAR II) in Southern Finland, including air pollutant, weather, gas and particle measurements. In a previous study, classification methods were used together with feature selection in order to understand what causes nucleation. Each day was classified as an event day, when a nucleation event occurs, or as a nonevent day; the features selected indicate which factors are important for the aerosol formation process. In this way it was possible to identify two key variables, relative humidity and preexisting aerosol particle surface (condensation sink), capable of explaining 88% of the nucleation events. Using these two variables, a nucleation probability function could be derived. In this paper this nucleation probability function is tested on data collected from other sites, Värriö in Northern Lapland and Aspvreten in Sweden. We show that in the extreme conditions in Värriö the nucleation parameter does not work, whereas in Aspvreten the two key variables can be used to identify nucleation events, though the nucleation parameter has to be adjusted slightly. The two key variables are related to mechanisms that prevent nucleation. One reason for the dominance of preventive mechanisms could be the existence of more than one mechanism causing nucleation. Another intriguing phenomenon, possibly related to this, is the temporal variation of nucleation events. We have investigated temporal phenomena in nucleation by using classification methods in a sliding window. We discuss some aspects of this approach and present some results obtained.
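
The sliding-window idea can be illustrated with a minimal sketch. It is not the method used in the paper, whose classifier and window length are not stated here. It assumes a simple nearest-centroid classifier, two features per day (relative humidity and condensation sink) already scaled to comparable ranges, and hypothetical function names (`fit_centroids`, `predict`, `sliding_window_accuracy`). The classifier is refitted in each window of consecutive days and scored on that same window, so a drop in accuracy hints that event and nonevent days became harder to separate in that period.

```python
from statistics import mean


def fit_centroids(rows):
    """Return {label: centroid} for rows of (features, label) pairs."""
    by_label = {}
    for features, label in rows:
        by_label.setdefault(label, []).append(features)
    # Centroid = per-feature mean over all days carrying that label.
    return {
        label: tuple(mean(column) for column in zip(*feature_rows))
        for label, feature_rows in by_label.items()
    }


def predict(centroids, features):
    """Assign the label whose centroid is closest in squared distance."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


def sliding_window_accuracy(days, window, step=1):
    """Fit and score the classifier on each window of consecutive days.

    days: chronologically ordered list of ((rh, cs), label) pairs, with
          features assumed to be pre-scaled (an assumption of this sketch).
    Returns a list of (start_index, within-window accuracy).
    """
    results = []
    for start in range(0, len(days) - window + 1, step):
        chunk = days[start:start + window]
        centroids = fit_centroids(chunk)
        hits = sum(predict(centroids, f) == label for f, label in chunk)
        results.append((start, hits / window))
    return results
```

Scoring on the training window keeps the sketch short. A held-out scheme, such as predicting the day that follows each window, would give a less optimistic estimate.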

Data mining approaches to explaining aerosol formation
