Welcome to the OR-Exchange, your site for questions and answers in operations research.

With regard to this blog post: I have collected a number of datasets covering different variables that may contribute to homicide incidents. I would like to know which of these variables contribute significantly to the number of homicides in an area. I was planning to use our archaic friend, ANOVA, to test whether each variable has a significant effect, and then include the dominant ones in my model.

I just want to know whether there is a better (or perhaps more modern) tool for finding good signals. Perhaps Bayesian inference? :)

2 Answers

For out-of-the-box analysis of good signals, I recommend decision trees. Good proprietary tools include Angoss KnowledgeSeeker. Open source options include R with the "rpart" library, which implements CART. Here is a good R tutorial:

http://www.statmethods.net/advstats/cart.html
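
To give a feel for how a CART-style regression tree picks out "good signals", here is a minimal sketch in plain Python. It is not rpart and not a full tree: it only scores each predictor by how much its best single binary split reduces the sum of squared errors of the response, which is the criterion a regression tree applies at each node. The data, the variable names (`unemployment`, `median_age`, `homicide_rate`), and the helper functions are all invented for illustration.

```python
# Illustrative sketch only: data, variable names, and helpers are hypothetical.

def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, sse_reduction) for the best binary split on x."""
    total = sse(y)
    best = (None, 0.0)
    levels = sorted(set(x))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(levels, levels[1:]):
        t = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        gain = total - sse(left) - sse(right)
        if gain > best[1]:
            best = (t, gain)
    return best


def rank_predictors(data, response):
    """Rank predictors by the SSE reduction of their best single split."""
    y = data[response]
    scores = {name: best_split(col, y)
              for name, col in data.items() if name != response}
    return sorted(scores.items(), key=lambda kv: kv[1][1], reverse=True)


# Hypothetical toy data: one entry per area.
data = {
    "unemployment": [3, 4, 5, 6, 9, 10, 11, 12],
    "median_age": [30, 41, 35, 38, 33, 40, 31, 37],
    "homicide_rate": [1.0, 1.2, 0.9, 1.1, 4.8, 5.1, 5.0, 4.9],
}

ranked = rank_predictors(data, "homicide_rate")
for name, (threshold, gain) in ranked:
    print(f"{name}: split at {threshold}, SSE reduction {gain:.2f}")
```

A real tree repeats this search recursively inside each resulting group and prunes afterwards, so a package such as rpart is the sensible choice for actual analysis.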

There are a number of data mining techniques (including CART, which larrydag mentioned) that can create a tree structure where you get locally accurate (or at least sort of accurate) predictions in the leaf nodes. Predictors may be significant in some leaf nodes and insignificant in others.

Another possibility would be to run "best subsets regression" with homicides per capita as the dependent variable and most/all of your available predictors (assuming the number of possible predictors is not too large). Look at the best few models of each size and see which predictors keep showing up.
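
As a rough sketch of what best subsets regression does, the plain-Python example below fits ordinary least squares (with an intercept) to every subset of predictors and keeps the lowest-RSS models of each size. Everything in it is an assumption made for illustration: the toy data, the predictor names (`unemployment`, `density`, `median_age`), the synthetic response, and the helper functions. It also assumes the predictors are not collinear, since the small solver does no checks for that. In practice you would compare models with a penalized criterion (adjusted R-squared, Mallows' Cp, or similar) rather than raw RSS, and use a statistics package.

```python
from itertools import combinations

# Illustrative sketch only: data, names, and helpers are hypothetical.

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def rss_of_fit(columns, y):
    """Fit OLS with an intercept via the normal equations; return the RSS."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def best_subsets(data, response, keep=2):
    """For each subset size, return the `keep` lowest-RSS predictor subsets."""
    y = data[response]
    names = [n for n in data if n != response]
    result = {}
    for size in range(1, len(names) + 1):
        fits = [(rss_of_fit([data[n] for n in subset], y), subset)
                for subset in combinations(names, size)]
        result[size] = sorted(fits)[:keep]
    return result


# Hypothetical toy data; the response depends only on the first two predictors.
unemployment = [1, 2, 3, 4, 5, 6, 7, 8]
density = [5, 1, 4, 2, 8, 3, 7, 6]
median_age = [3, 7, 1, 8, 2, 6, 4, 5]
noise = [0.05, -0.05, 0.05, -0.05, -0.05, 0.05, -0.05, 0.05]
data = {
    "unemployment": unemployment,
    "density": density,
    "median_age": median_age,
    "homicides_per_capita": [1 + 0.5 * u + 0.3 * d + e
                             for u, d, e in zip(unemployment, density, noise)],
}

best = best_subsets(data, "homicides_per_capita")
for size, fits in best.items():
    for rss, subset in fits:
        print(f"size {size}: {', '.join(subset)} (RSS {rss:.3f})")
```

Because the number of subsets doubles with every added predictor, exhaustive enumeration like this is only practical for a modest number of candidates, which is the caveat noted above.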

A statistician would probably say, though, that you should first posit a model for homicides per capita, including interactions between predictors and nonlinear terms where appropriate, rather than throwing things against the wall and seeing what sticks. I'm of two minds about it myself.
