I computed correlations on the provided data. Using the Python Seaborn library, I visualized the distributions of the diabetes, obesity, and inactivity data.
I’ve also read up on kurtosis, which, as we covered in the lecture, describes how peaked a distribution is or how much of it lies in the tails.
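To make the idea concrete, here is a small sketch of one common definition, the population excess kurtosis m4 / m2^2 - 3, where m2 and m4 are the second and fourth central moments. The function name and the two sample lists are made up for illustration. I believe this matches the default of `scipy.stats.kurtosis`, but treat that as an assumption to check against the SciPy documentation.

```python
from statistics import fmean


def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    mean = fmean(values)
    m2 = fmean((x - mean) ** 2 for x in values)  # second central moment (variance)
    m4 = fmean((x - mean) ** 4 for x in values)  # fourth central moment
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    return m4 / m2**2 - 3.0


# Evenly spread values: light tails, so the excess kurtosis is negative.
flat = [1, 2, 3, 4, 5]
# Mostly zeros with two extreme values: heavy tails, so it is positive.
heavy_tailed = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]

print(round(excess_kurtosis(flat), 3))          # -1.3
print(round(excess_kurtosis(heavy_tailed), 3))  # 2.0
```

A positive value means more of the distribution sits in the tails than for a normal distribution, and a negative value means less.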
My teammates and I found more information on the CDC website about the socioeconomic status of various counties, along with information about food surplus. Based on factors such as socioeconomic status, food surplus, transportation, and population, we can categorize counties as Urban or Rural.
I’ll use Scikit-Learn’s linear regression model to try to address this problem, and I will work to incorporate further findings.
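A rough sketch of what the Scikit-Learn step could look like. The choice of predictors (obesity and inactivity) and target (diabetes) is my assumption, since the final setup is not decided yet. The numbers are synthetic placeholders generated from known coefficients so the fit can be checked; they are not results from the real data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Placeholder predictors (assumed columns: obesity %, inactivity %).
X = np.array(
    [
        [15.0, 13.0],
        [17.0, 14.5],
        [18.5, 15.0],
        [20.0, 17.0],
        [21.5, 16.5],
        [23.0, 19.0],
    ]
)

# Synthetic target built from known coefficients (0.3, 0.2) and intercept 1.0,
# so we can confirm that the model recovers them. Not real diabetes rates.
y = 0.3 * X[:, 0] + 0.2 * X[:, 1] + 1.0

# Ordinary least squares fit.
model = LinearRegression().fit(X, y)

print("coefficients:", model.coef_)
print("intercept:", model.intercept_)
print("R^2 on the training data:", model.score(X, y))
```

On the real data the fit will not be exact, so it would make sense to hold out some counties with `sklearn.model_selection.train_test_split` and report R^2 on those rather than on the training data.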