PROJECT PROPOSAL – DELIVERABLE 1
Description of Data Set
The data set we will use for this research project covers crime in small cities in the United States. The data file contains information on crime, education and police funding for small American cities in 10 southeastern and eastern states. These states include Florida, Georgia, South Carolina, North Carolina, New York, Maine, Rhode Island, Connecticut and New Hampshire (Thomas, 2016).
The data set consists of a total of 100 observations, which represent 50 small cities in these states. The data were extracted from “Life in America’s Small Cities” by G.S. Thomas. Y is the dependent variable, and all the other variables (X1, X2, X3, X4, X5 and X6) are the regressors, which will be used as the independent variables in our final analysis. Each variable in the data set is described as follows:
Y = total overall reported crime rate per 1 million residents
X1 = reported violent crime rate per 100,000 residents
X2 = annual police funding in $/resident
X3 = % of people aged 25 years and over with 4 years of high school
X4 = % of 16 to 19 year-olds not in high school and not high school graduates
X5 = % of 18 to 24 year-olds in college
X6 = % of people aged 25 years and over with at least 4 years of college
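Since Y is the dependent variable and X1 to X6 are the regressors, one natural form for the final analysis is a multiple linear regression. The specification below is a sketch under that assumption, because the proposal does not yet fix the functional form; the coefficients β0 to β6, the error term ε and the city index i are notation introduced here for illustration.

```latex
Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{4i} + \beta_5 X_{5i} + \beta_6 X_{6i} + \varepsilon_i
```

Under this form, the research question becomes whether the coefficients on violent crime (β1), police funding (β2) and the education measures (β3 to β6) differ from zero.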
Research Question
The main research question we want to answer using the selected data set is:
“Do violent crime, police funding and education have a relationship with, and an impact on, the overall crime rate in small American cities?”
Cleaning & Processing of Raw Data Set
In deliverable 1 of this project, we identified several issues with the data set. The first issue was the presence of unusual values, or outliers. We calculated the interquartile range (IQR) for each variable and, based on the lower and upper fences, identified the variables that contain outliers, including the dependent variable Y, the crime rate in small US cities. The variables with outliers are Y, X1, X2, X5 and X6. Comparing the lower and upper fences with the raw data in the data set file identifies every value that falls outside the fences for each variable.
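The fence calculation described above can be sketched in standard-library Python. This sketch assumes Tukey's conventional multiplier of 1.5 (fences at Q1 - 1.5 × IQR and Q3 + 1.5 × IQR), which the proposal does not state, and uses the `inclusive` quartile method of `statistics.quantiles`; a spreadsheet may use a different quartile convention, so its fence values could differ slightly. The function names `iqr_fences` and `find_outliers` and the sample values are hypothetical and are not taken from the Thomas data.

```python
from statistics import quantiles


def iqr_fences(values, k=1.5):
    """Return (lower_fence, upper_fence) as Q1 - k*IQR and Q3 + k*IQR."""
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, k=1.5):
    """List every value lying strictly outside the IQR fences."""
    lower, upper = iqr_fences(values, k)
    return [v for v in values if v < lower or v > upper]


# Hypothetical illustration values (not from the crime data set)
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50]
print(iqr_fences(sample))     # (-4.0, 16.0)
print(find_outliers(sample))  # [50]
```

Running the same two functions over each column (Y and X1 to X6) would flag, column by column, the values that lie outside that variable's fences.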
We therefore cleaned the entire data set to create a new trimmed data set. Rather than deleting the outliers, we treated the low and high extreme values through Winsorization. Winsorization transforms a data set by limiting its extreme values, which reduces the effect of spurious outliers on the exploratory analysis in deliverable 3 of the project. In our Winsorization, each high extreme value was replaced by the upper bound of that variable and each low extreme value was replaced by the lower bound. The trimmed data set can be seen in the Excel spreadsheet. Exhibit 1 in the appendix compares the means and standard deviations of all variables between the raw and trimmed data sets; the standard deviations in particular show the effect of limiting the outliers. The trimmed data set can now be checked for extreme values against the lower and upper bounds, as shown in exhibit 2 in the appendix.
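A minimal sketch of the Winsorization step, again in standard-library Python, follows. It assumes that the "upper bound" and "lower bound" are the IQR fences computed with Tukey's 1.5 multiplier; a common alternative caps at the largest and smallest observed values inside the fences, which would give slightly different results. The helper names and sample values are hypothetical. The sketch caps each value at the fences, compares the mean and standard deviation before and after (in the spirit of exhibit 1), and checks that no value remains outside the bounds (in the spirit of exhibit 2).

```python
from statistics import mean, quantiles, stdev


def iqr_fences(values, k=1.5):
    """Return (lower_fence, upper_fence) as Q1 - k*IQR and Q3 + k*IQR."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def winsorize_to_fences(values, lower, upper):
    """Replace values below `lower` with `lower` and values above `upper` with `upper`."""
    return [min(max(v, lower), upper) for v in values]


def within_bounds(values, lower, upper):
    """Return True if no value lies outside [lower, upper]."""
    return all(lower <= v <= upper for v in values)


# Hypothetical illustration values (not from the crime data set)
raw = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50]
lower, upper = iqr_fences(raw)  # fences are computed once, on the raw data
trimmed = winsorize_to_fences(raw, lower, upper)

print(trimmed)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 16.0]
print(f"raw:     mean={mean(raw):.2f}  sd={stdev(raw):.2f}")
print(f"trimmed: mean={mean(trimmed):.2f}  sd={stdev(trimmed):.2f}")
print(within_bounds(trimmed, lower, upper))  # True
```

The fences are computed once on the raw data and reused for the bounds check, so the check tests the trimmed values against the same limits that were used for capping.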