Keywords
Decision support; Data Science; Feature selection; Clustering; Multiview clustering; Chi-squared
Start Date
7-7-2022 2:20 PM
End Date
7-7-2022 2:40 PM
Abstract
In December 2020 the INSESS-COVID19 report was presented as the outcome of a project supporting data-driven policy-making during the COVID-19 crisis in Catalonia. The report had to present territorial results, characterizing the prototypical situations of the Basic Areas of Social Services (BASS) in terms of the kinds of impact that the COVID-19 lockdown had on social vulnerability. Rather than describing each BASS separately, groups of BASS with similar profiles were sought. Individual reports at BASS level were ruled out because multivariate analysis of relationships raises the risk of violating statistical secrecy, which endangers the anonymity of the participants and, when vulnerable populations are involved, as in INSESS-COVID19, also their safety. A multiview clustering methodology was used, clustering the BASS along several facets (views), each described by several variables. However, the resulting clusters were too heterogeneous to be useful for designing territorial policies. We therefore went back to the preceding variable selection step to identify the variables most effective at defining clusters. To choose these variables we tested two evaluation criteria that measure the discriminant power of a variable across the territory: one based on the Chi-squared test and the other on Lebart test-values. The post-processing tools Thermometers and Traffic Light Panels were used to interpret the results. As preliminary conclusions, the Chi-squared test and Lebart test-values capture different information and yield different selections. We also observed a relationship between the representativeness of the variables and the number of modalities per variable. The better criterion was used to identify the most discriminant variable in each block (which may be a newly built indicator or a single variable from the original database). The final clustering performed with the selected variables/indicators is described.
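The two selection criteria can be illustrated with a minimal Python sketch. It assumes that every variable is categorical and that discriminant power is measured against a vector of cluster labels for the BASS. It computes the Pearson chi-squared statistic of the variable-by-group contingency table, and the Lebart test-value of each modality k in each group g, (n_kg - n_g*n_k/n) / sqrt(n_g*(n - n_g)/(n - 1) * (n_k/n) * (1 - n_k/n)), which is the standardized deviation of the observed count from its hypergeometric expectation. The function names, the toy data, and the reduction of each criterion to one score per variable (raw chi-squared statistic; largest absolute test-value) are hypothetical choices for illustration, not the procedure used in the INSESS-COVID19 study.

```python
import math
from collections import Counter


def contingency(values, groups):
    """Cross-tabulate one categorical variable against group labels."""
    cats, grps = sorted(set(values)), sorted(set(groups))
    counts = Counter(zip(values, groups))
    table = [[counts[(k, g)] for g in grps] for k in cats]
    return cats, grps, table


def margins(table):
    """Total count, row (modality) totals and column (group) totals."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return sum(rows), rows, cols


def chi2_statistic(values, groups):
    """Pearson chi-squared statistic of the table and its degrees of freedom."""
    _, _, table = contingency(values, groups)
    n, rows, cols = margins(table)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n  # > 0: every margin is non-empty
            chi2 += (observed - expected) ** 2 / expected
    return chi2, (len(rows) - 1) * (len(cols) - 1)


def lebart_test_values(values, groups):
    """Test-value of each (modality, group) pair under the hypergeometric null."""
    cats, grps, table = contingency(values, groups)
    n, rows, cols = margins(table)
    if n < 2:
        raise ValueError("need at least two observations")
    tv = {}
    for i, k in enumerate(cats):
        p = rows[i] / n  # overall share of modality k
        for j, g in enumerate(grps):
            n_g = cols[j]
            expected = n_g * p
            variance = n_g * (n - n_g) / (n - 1) * p * (1 - p)
            tv[(k, g)] = (table[i][j] - expected) / math.sqrt(variance) if variance > 0 else 0.0
    return tv


# Hypothetical variable-level scores (higher = more discriminant): the raw
# chi-squared statistic, and the largest absolute Lebart test-value.
def chi2_score(values, groups):
    return chi2_statistic(values, groups)[0]


def lebart_score(values, groups):
    return max(abs(z) for z in lebart_test_values(values, groups).values())


def best_per_block(blocks, data, groups, score):
    """Pick the highest-scoring variable in each block (view)."""
    return {b: max(names, key=lambda v: score(data[v], groups)) for b, names in blocks.items()}


# Toy example (made-up data, not from INSESS-COVID19): 6 BASS in 2 clusters.
groups = ["A", "A", "A", "B", "B", "B"]
data = {
    "v1": ["x", "x", "x", "y", "y", "y"],  # mirrors the clusters exactly
    "v2": ["x", "y", "x", "y", "x", "y"],  # only weakly related to the clusters
}
blocks = {"view1": ["v1", "v2"]}
print(best_per_block(blocks, data, groups, chi2_score))    # {'view1': 'v1'}
print(best_per_block(blocks, data, groups, lebart_score))  # {'view1': 'v1'}
```

On this toy data both criteria agree; on real data they need not, which is consistent with the preliminary conclusion above. One general reason a raw chi-squared score depends on the number of modalities is that, under independence, the statistic has approximate mean equal to its degrees of freedom, (modalities - 1)(groups - 1), so the sketch returns the degrees of freedom alongside it. If SciPy is available, scipy.stats.chi2_contingency(table, correction=False) returns the same statistic together with a p-value.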
Variables Selection for improving clustering multiview processes