INFORMS has recently announced its data mining competition. The organizers have posed the famous billion-dollar question of "how to do day trading," and whoever solves the challenge may get some recognition (and no money from INFORMS!).
I am wondering: what are the interesting models, resources, and software for tackling this question? I have used R and its glm function to fit a logistic regression. The code below gave me an AUC of 0.611879 (I am currently in 3rd place). Feel free to improve upon it. (Additionally, any citation of this code, or anything else that can help me get a faculty position in a year, is hugely appreciated.)
R code (I ran it on Ubuntu Linux). Please note that I developed this code for personal use, and I have noticed that there are wording and grammatical errors in the comments.
# Logit Regression Model
rm(list = ls(all = TRUE))
# A backslash-escaped space is not a valid escape in an R string; plain spaces work
setwd("~/Desktop/Informs Datamining Contest/Data/")
training.data <- read.csv("TrainingData.csv", header = TRUE)
head(training.data)
plot(training.data$Timestamp,training.data$Variable141LAST_PRICE, type='l')
attach(training.data)
#Finding the maximum positive and negative correlation
names(training.data)
corrs <- NULL
for (stock.name in names(training.data)) {
  # Correlation of each column with the target variable
  correlation <- cor(training.data[[stock.name]], training.data$TargetVariable)
  corrs <- rbind(corrs, correlation)
  print(paste("Correlation Between TargetStock and", stock.name, "=", correlation))
}
correls <-data.frame(ComparedStock = names(training.data), Correlations = corrs)
correls[order(correls$Correlations),]
# row.names=FALSE keeps the header aligned with the data columns
write.table(correls[order(correls$Correlations),], file="Correlations.csv", sep=",",
            row.names=FALSE, quote=FALSE)
# Prediction Model
# Logistic Regression Model
mylogit <- glm(TargetVariable ~ Variable101OPEN + Variable101LOW + Variable101HIGH +
                 Variable101LAST_PRICE + Variable133LOW + Variable133OPEN + Variable78LOW,
               data = training.data, family = binomial(link = "logit"), na.action = na.pass)
# Reading result data
result.data <- read.csv("ResultData.csv", header = TRUE)
head(result.data)
FinalPrediction<-predict(mylogit,newdata=result.data,type="response")
#Saving Data
template.data <- read.csv("template.csv", header = TRUE)
my.output<-data.frame(template.data$Timestamp,FinalPrediction)
head(my.output)
names(my.output) <-c("Timestamp","TargetVariable")
write.table(my.output,file="submit.csv",sep=",",row.names=FALSE,
quote=FALSE)
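The script above does not compute the AUC locally; the 0.611879 figure comes from the contest scoring. One way to estimate the AUC before submitting is to hold out the last part of the training data and score it with the rank-based (Mann-Whitney) formula. The following is only a minimal sketch under stated assumptions: the data frame `toy` and its columns `x1`, `x2`, and `y` are synthetic stand-ins rather than contest variables, the 70/30 chronological split is an arbitrary choice, and the `auc` helper is hypothetical, not part of the original code.

```r
# Rank-based (Mann-Whitney) AUC: the probability that a randomly chosen
# positive case is scored above a randomly chosen negative case.
# Ties receive average ranks, so they count as one half.
auc <- function(labels, scores) {
  n.pos <- as.numeric(sum(labels == 1))
  n.neg <- as.numeric(sum(labels == 0))
  ranks <- rank(scores)
  (sum(ranks[labels == 1]) - n.pos * (n.pos + 1) / 2) / (n.pos * n.neg)
}

# Synthetic stand-in for the contest data (hypothetical columns)
set.seed(1)
n <- 1000
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- rbinom(n, 1, plogis(0.8 * toy$x1 - 0.5 * toy$x2))

# Chronological split: fit on the first 70% of rows, score the last 30%
n.train <- floor(0.7 * n)
train <- toy[1:n.train, ]
valid <- toy[(n.train + 1):n, ]

fit <- glm(y ~ x1 + x2, data = train, family = binomial(link = "logit"))
valid.pred <- predict(fit, newdata = valid, type = "response")

holdout.auc <- auc(valid$y, valid.pred)
print(holdout.auc)
```

With the contest data, the same idea would apply by splitting training.data by Timestamp and passing the held-out TargetVariable and predictions to the helper. A hold-out estimate will not necessarily match the leaderboard score.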
My email address is linux_jvm@yahoo.com if you have any questions.