Keywords

data science; clustering; decarbonization; data-driven decision making

Start Date

6-7-2022 9:20 AM

End Date

6-7-2022 9:40 AM

Abstract

Understanding USA Power Plants renewable behaviours with data science

Ignacio Peñafiel (1), Karina Gibert (2)

(1) Knowledge Engineering and Machine Learning group at Intelligent Data Science and Artificial Intelligence Research Center; Research Institute of Science and Technology for Sustainability; Universitat Politècnica de Catalunya-BarcelonaTech, Spain, penafielm@gmail.com
(2) Knowledge Engineering and Machine Learning group at Intelligent Data Science and Artificial Intelligence Research Center; Research Institute of Science and Technology for Sustainability; Universitat Politècnica de Catalunya-BarcelonaTech, Spain, karina.gibert@upc.edu

International Congress on Environmental Modelling & Software iEMSs

Background: Decarbonization, renewable energy and sustainability awareness are growing. Society and governments target a share of power plants running on renewable energy sources of up to 35% by 2020 and 80% by 2050 (1). This ambitious target involves decommissioning existing fossil-fuel power plants and replacing them with new renewable power plants. Such a plan needs accurate data-driven decision making.

Data: Open data on USA power plants larger than 100 MW, covering all renewable and non-renewable primary energy sources (2). The open data are enriched with additional explanatory variables that the authors consider determinant for the problem, such as climatic and demographic data (sun exposure, temperature, precipitation, humidity, wind, population, number of companies, culture/politics, and current energy capacity (demand)).

Methods: Self-Organizing Maps (SOM), followed by a further grouping of regions based on the centroids of their SOM clusters, generate groups of power plants with similar behavior. Post-processing tools such as class panel graphs and traffic light panels enhance the conceptualization of the clusters, giving the human expert a useful tool to better define tailored actions and decisions for each of the discovered groups. In addition, a binary decision tree has been trained to find the advisable renewable percentage based on the explanatory variables, so that missing values can be properly imputed.

Results: The clusters are geographically coherent and reflect the current status quo very well. Specific states such as California, with a large population and a good renewable balance, form their own cluster; other states such as Texas, also with a large population but a worse renewable balance, form a single cluster as well. States with a very poor balance, dominated by carbon-fuel energy sources, form their own clusters, with differences in the explanatory and climate variables, so tailored actions can be addressed to each group efficiently. On top of these results, experts could define proposed actions that help reach the 80% target in 2050.

Conclusions: Clustering plus post-processing is useful to understand complex realities and support decision-making. In the future, more explanatory variables could be added to the model, such as population dispersion, GDP of the region, and natural water availability (rivers and seas). The algorithm could also be improved with higher-resolution data, such as city-level data.
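The two-stage grouping described under Methods (a SOM, then a second grouping of the SOM centroids) can be illustrated with a minimal sketch. This is not the authors' implementation: the grid size, the learning-rate and neighbourhood schedules, the use of k-means for the second stage, and all function names (`standardize`, `train_som`, `nearest`, `kmeans`, `group_plants`) are assumptions made for illustration, and the demo data are synthetic rather than the power-plant dataset.

```python
import numpy as np


def standardize(x):
    """Z-score each column so variables on different scales weigh equally."""
    std = x.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (x - x.mean(axis=0)) / std


def nearest(points, centers):
    """Index of the closest center for every point (squared Euclidean)."""
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def train_som(data, rows, cols, epochs=30, lr0=0.5, sigma0=1.5, seed=0):
    """Online SOM training; returns one prototype (centroid) per map unit."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    # Grid coordinates of the units, used by the neighbourhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise prototypes from randomly chosen samples.
    weights = data[rng.integers(0, n, size=rows * cols)].astype(float)
    total, step = epochs * n, 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            frac = step / total
            lr = lr0 * (1.0 - frac)                   # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)   # neighbourhood shrinks to a floor
            bmu = np.argmin(((weights - data[i]) ** 2).sum(axis=1))
            grid_d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (data[i] - weights)
            step += 1
    return weights


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means (assumed here as the second-stage grouping)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = nearest(points, centers)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return nearest(points, centers)


def group_plants(features, rows=3, cols=3, k=2, seed=0):
    """Stage 1: map each plant to a SOM unit. Stage 2: group the unit centroids."""
    z = standardize(np.asarray(features, dtype=float))
    prototypes = train_som(z, rows, cols, seed=seed)
    unit_of_plant = nearest(z, prototypes)
    group_of_unit = kmeans(prototypes, k, seed=seed)
    return group_of_unit[unit_of_plant]


# Synthetic stand-in data (NOT the power-plant dataset): two separated blobs.
rng = np.random.default_rng(1)
demo = np.vstack([rng.normal(0.0, 0.5, (30, 3)), rng.normal(10.0, 0.5, (30, 3))])
groups = group_plants(demo)
```

In this sketch each row of `features` stands for one plant (or region) described by its explanatory variables, and `groups` holds the final group label per row. Grouping the centroids rather than the raw rows keeps the second stage small and lets the map smooth out noise first, which is one common motivation for a two-stage design.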

Stream and Session

false

Understanding USA Power Plants renewable behaviours with data science
