
finSentiment code plan - initial progress

posted Nov 10, 2014, 4:48 AM by Tarek Hoteit
Here is the plan:
0) Create the code repository. Done. I am using my personally hosted GitLab instance on darwin.hoteit.org (use GitLab/PyCharm)
1) Set up the basic database and framework. Done. I picked SQLite and Django. Note that the proposal mentions MySQL, so we might change either the code or the draft once we have compared the performance of the two databases (use Django shell script)
2) Create the db tables. Done. The basic tables are created; more will be added later (run Django script)
3) Create Django views. In progress. Do the Admin views first because they are easier to implement (run Django script)
4) Add the list of firms to the database (create Python script) using the Excel sheet from the Bankruptcy Research Database (BRD)

After completing 4, we should have an initial backbone for the system.
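A minimal sketch of what the step 4 script could look like, assuming the BRD Excel sheet has first been exported to CSV. The table name `firm` and the columns `name` and `filing_year` are hypothetical placeholders, since the real schema comes from the Django models. It uses the standard-library `sqlite3` module directly to stay self-contained.

```python
import csv
import io
import sqlite3

# Hypothetical schema: in the real project the tables come from the Django models.
SCHEMA = """
CREATE TABLE IF NOT EXISTS firm (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    filing_year INTEGER
)
"""


def load_firms(conn, csv_file):
    """Insert one row per firm from a CSV file object; return the number added."""
    conn.execute(SCHEMA)
    before = conn.total_changes
    for row in csv.DictReader(csv_file):
        name = (row.get("name") or "").strip()
        if not name:
            continue  # skip blank lines in the export
        year = (row.get("filing_year") or "").strip()
        # INSERT OR IGNORE keeps the script safe to re-run on the same sheet.
        conn.execute(
            "INSERT OR IGNORE INTO firm (name, filing_year) VALUES (?, ?)",
            (name, int(year) if year else None),
        )
    conn.commit()
    return conn.total_changes - before


if __name__ == "__main__":
    # Made-up sample row, only to show the call.
    sample = io.StringIO("name,filing_year\nExample Firm A,2009\n")
    conn = sqlite3.connect(":memory:")
    print(load_firms(conn, sample))
```

Inside the Django project the same load would more likely go through the model layer, so that the Admin views see the firms right away.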

5) Develop Twitter code to extract textual data from the site (create Python script)
6) Execute the code and import the results into the database (create Python script)

After completing 6, we should have a database loaded with textual data. Next is the sentiment classification step.
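A rough sketch of the import half of steps 5 and 6, assuming the tweets have already been fetched and arrive as dictionaries with `id` and `text` keys. The fetching itself is left out because the client library has not been chosen yet. The `tweet` table and the `store_tweets` helper are hypothetical names.

```python
import sqlite3


def store_tweets(conn, firm_id, tweets):
    """Save fetched tweets for one firm; return how many new rows were stored."""
    # Hypothetical table layout, keyed on the tweet id so duplicates are dropped.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tweet (
               tweet_id INTEGER PRIMARY KEY,
               firm_id INTEGER NOT NULL,
               text TEXT NOT NULL
           )"""
    )
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO tweet (tweet_id, firm_id, text) VALUES (?, ?, ?)",
        [(t["id"], firm_id, t["text"]) for t in tweets],
    )
    conn.commit()
    return conn.total_changes - before
```

Keying on the tweet id means the extraction can be re-run for the same firm without storing the same text twice.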

7) Set up a Java environment for the Stanford sentiment classification algorithm (Linux admin)
8) Execute the analyzer for each of the Twitter texts in our database (write a bash or Perl script to do that)
9) Import the results into our database (bash or Perl script or sqlite/mysql script)

After completing 9, we should have our database loaded with the sentiment classifications for each Twitter text about the firms in the study. Next is to complete the statistical data analysis.
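A sketch of the step 9 import, written in Python rather than bash or Perl only to stay consistent with the other scripts. It assumes the analyzer output has already been parsed into `(tweet_id, label)` pairs and that the labels are the five classes from very negative to very positive. The `sentiment` table, the 0 to 4 scoring, and the function name are hypothetical.

```python
import sqlite3

# Assumed label set; the mapping to integers 0..4 is an illustrative choice.
LABEL_TO_SCORE = {
    "very negative": 0,
    "negative": 1,
    "neutral": 2,
    "positive": 3,
    "very positive": 4,
}


def import_sentiments(conn, results):
    """Store one sentiment score per tweet; return the number of rows written."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS sentiment (
               tweet_id INTEGER PRIMARY KEY,
               score INTEGER NOT NULL
           )"""
    )
    rows = []
    for tweet_id, label in results:
        key = label.strip().lower()
        if key in LABEL_TO_SCORE:  # silently skip labels we do not recognize
            rows.append((tweet_id, LABEL_TO_SCORE[key]))
    # INSERT OR REPLACE lets a re-run of the analyzer overwrite earlier scores.
    conn.executemany(
        "INSERT OR REPLACE INTO sentiment (tweet_id, score) VALUES (?, ?)", rows
    )
    conn.commit()
    return len(rows)
```

Storing an ordinal score rather than the raw label keeps the later statistical step simple, although the labels could just as well be stored as text.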

10) Run a logistic regression to determine whether the sentiment is statistically significant, etc. (more to come on that later)
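To make step 10 a bit more concrete, here is a bare-bones logistic regression with a single predictor, fitted by plain gradient descent. It assumes a binary outcome per firm (for example bankrupt or not) and a numeric sentiment score. Both are assumptions, since the model has not been specified yet. It only estimates the coefficients; testing significance would also need standard errors, which a statistics package would normally provide.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit P(y=1 | x) = sigmoid(b0 + b1 * x) by batch gradient descent."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # gradient of the log-loss
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


if __name__ == "__main__":
    # Made-up toy values, NOT study data: sentiment score vs. binary outcome.
    scores = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
    outcome = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
    print(fit_logistic(scores, outcome))
```

A negative slope `b1` on data like this would mean that higher sentiment goes with a lower probability of the outcome.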