ECS 189G Term Project
Due date:
Scheduled final exam day (no written final), 11:59 pm. NO LATE
SUBMISSIONS.
Problem A
Here you will apply recommender systems methods to a very nonstandard RS
problem, analyzing voting data in Congress. Here are the details:
- The dataset is in the UCI Machine Learning Repository, under the title
"Congressional Voting Records Data Set."
- Cast this as an RS problem, with the members of Congress being the
"moviegoers," the legislative bills being the "movies," and the votes (1
for yes, 0 for non-yes) being the "ratings."
- The goal is to predict the missing votes, coded as '?' in the data.
- Try the various methods from our course.
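As a starting point for the wrangling, here is a minimal sketch of recasting the vote table into the usual (user ID, item ID, rating) input format. It assumes the UCI file has a party label in the first column followed by vote columns coded 'y', 'n' or '?'. The small data frame and the names votes, known and toPredict are stand-ins invented for illustration, not part of the assignment.

```r
# Toy stand-in for the UCI file (assumed layout: party first, then votes
# coded 'y', 'n', or '?'). The values here are made up.
votes <- data.frame(
  party = c("republican", "democrat", "democrat"),
  V1 = c("n", "?", "y"),
  V2 = c("y", "y", "?"),
  V3 = c("n", "y", "y"),
  stringsAsFactors = FALSE
)

voteCols <- setdiff(names(votes), "party")

# Long format, one row per (member, bill) pair. unlist() walks the data
# column by column, so member IDs cycle fastest and bill IDs change slowest.
long <- data.frame(
  userID = rep(seq_len(nrow(votes)), times = length(voteCols)),
  itemID = rep(seq_along(voteCols), each = nrow(votes)),
  vote   = unlist(votes[voteCols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Known votes become ratings: 1 for yes, 0 for non-yes.
known <- long[long$vote != "?", ]
known$rating <- as.integer(known$vote == "y")
known$vote <- NULL

# The '?' entries are the (member, bill) pairs whose votes are to be predicted.
toPredict <- long[long$vote == "?", c("userID", "itemID")]

print(known)
print(toPredict)
```

The known data frame can then be fed to whichever course methods you try, with toPredict as the set of cases to predict.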
Problem B
Here you will use another item from UCI, the Drug Review Dataset (Large).
The task here will be different from what we've done. You will not be
predicting missing ratings, but instead will be analyzing verbal (i.e.
text) ratings.
- Keep in mind that this dataset is physically challenging. It is
somewhat larger than what we've been working on (though not "Big Data"),
and it is in a more complex format. See
this page (material on fread()) for tips.
- Your task here will be to investigate how reliable the verbal
reviews are in establishing a numeric rating. Can the given numeric
rating be predicted well from the verbal one (plus other variables, such
as the disease being treated)? Again, this is just exploring, but we
hope to gain insight for other datasets in which there is no numeric
rating available.
- As your predictive tools, use at least a linear model and neural
networks.
- Your main tool will be sentiment analysis. Various R packages are
available, such as RSentiment.
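To make the idea concrete, here is a minimal sketch of turning review text into a numeric sentiment score and using it in a linear model. Everything in it is an assumption for illustration: the five reviews and ratings are invented, the word lists are a tiny hand-made lexicon, and the scoring function sentScore is a simple word count written in base R rather than a call to RSentiment or any other package.

```r
# Hypothetical toy reviews; the real data would come from the UCI files.
reviews <- data.frame(
  review = c("great relief, no side effects",
             "terrible nausea and awful headaches",
             "good results but bad insomnia",
             "awful, no relief at all",
             "great drug, good relief"),
  rating = c(9, 2, 6, 1, 10),
  stringsAsFactors = FALSE
)

# Tiny illustrative lexicon (an assumption, not a published word list).
posWords <- c("great", "good", "relief")
negWords <- c("terrible", "awful", "bad", "nausea", "insomnia")

# Score = (number of positive words) - (number of negative words).
sentScore <- function(txt) {
  cleaned <- tolower(gsub("[^A-Za-z ]", " ", txt))  # drop punctuation
  words <- strsplit(cleaned, " +")[[1]]
  as.numeric(sum(words %in% posWords) - sum(words %in% negWords))
}
reviews$sent <- vapply(reviews$review, sentScore, numeric(1),
                       USE.NAMES = FALSE)

# Linear model: how well does the text-derived score track the rating?
fit <- lm(rating ~ sent, data = reviews)
print(coef(fit))
print(summary(fit)$r.squared)
```

Note that the fourth review ("no relief") gets credit for the word "relief"; a plain word count ignores negation, which is one reason to compare against a proper sentiment package. Other predictors, such as the condition being treated, can be added to the right-hand side of the formula.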
- Your project must be in a file TermProject.pdf.
- You are allowed and encouraged to use the Web or other sources.
Of course, all such sources must be cited. Use of human sources is
allowed only if they are cited.
- Write as if you are a consultant and this is a report to a client.
Assume the client is reasonably good with quantitative discussions
but does not have a background in RS. You may wish to cite our textbook
and/or other sources.
- Include a section (for each problem) discussing the "data
wrangling" you needed to do in order to ready the datasets for
analysis.
- You must do all your coding in R. This is so that I can easily
run your code when I grade your project.
- Your group submits just ONE copy of the report.
- Absolutely NO late reports will be accepted. As you near the
deadline, keep submitting what you have (each submission will overwrite
the last), so that you will get a lot of credit even if you
don't finish.
- Submit your report, including all files (.tex,
.pdf, R files, any image files, etc.) to my
handin site on CSIF (NOT the TA's site),
directory 189gproject. The name of your file must
be of the form email1.email2....tar , where each
emaili is the UCD e-mail address of group
member i, e.g. bclinton.gwbush.bobama.dtrump.tar.
Note the periods separating fields. Your
.tar file must contain only regular files, NO
SUBDIRECTORIES!!!! And .tar does NOT mean .tar.gz or
.tar.bz2 (or for that matter .rar, which one student
used once).
- Double check that you are meeting all the specs.
- NO SUBDIRECTORIES! NO SUBDIRECTORIES! NO SUBDIRECTORIES!
- Make sure that all partners' names are on the report, and that the
e-mail addresses in the file name are EXACTLY the official UCD e-mail
addresses for the students. These are the addresses at which you have
been receiving e-mail from me regarding blog posts.
DON'T RISK HAVING A TEAM MEMBER
FAILING TO GET CREDIT FOR THE PROJECT -- IT HAS HAPPENED!
- Include a section listing each team member's contribution -- who
did what. If a member did not participate, do not include him/her
in this section or in the .tar file name.
- Include an appendix (\appendix \section{} etc.) for code
listings. Also remember to include your code files in the package.
- Did I mention, NO SUBDIRECTORIES!
- Groups that put in a reasonable amount of time -- and thought! --
almost always receive an A or A+ grade on the project. Groups that
do not complete the project usually get a D grade. PLEASE START
EARLY!
- As explained in class, groups that do good work on the project
receive an extra bonus in their course grades, beyond the weight
stated in the course syllabus. It often happens that, say, a B+
grade becomes an A- or even an A.
Grading criteria:
- Technical content of the work (correctness, thoroughness etc.).
- Adherence to instructions.
- Professional quality of the work: Clear, engaging writing,
using correct grammar; it need not (should not) be pretentious, but
avoid being too colloquial ("the mean was kinda low"). Presentation
need not be fancy, but graphs and tables should be used when helpful.
- A+ grades are very possible, and can have a significant impact
on your course grade, letters of recommendation, knighthoods,
marriage prospects, coronations, etc.
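The packaging rules above (a plain .tar, no subdirectories, e-mail addresses joined by periods in the file name) can be checked from within R before you submit. The following is only a sketch under stated assumptions: the two e-mail names are placeholders, the three files are empty stand-ins for your real report files, and the staging directory handin is a scratch directory created just for this illustration.

```r
# Hypothetical group members' UCD e-mail names (placeholders).
emails <- c("bclinton", "gwbush")
tarName <- paste0(paste(emails, collapse = "."), ".tar")

# Stage stand-in files in a scratch directory; use your real files instead.
stage <- file.path(tempdir(), "handin")
dir.create(stage, showWarnings = FALSE)
files <- c("TermProject.pdf", "TermProject.tex", "ProblemA.R")
for (f in files) writeLines("placeholder", file.path(stage, f))

# Build an uncompressed archive from inside the staging directory, naming
# each file explicitly so that no directory component is stored.
old <- setwd(stage)
tar(tarName, files = files, compression = "none")
members <- untar(tarName, list = TRUE)
setwd(old)

# A flat archive has no path separator in any member name.
stopifnot(!any(grepl("/", members)))
print(tarName)
print(members)
```

Running the archive step from inside the directory that holds the files, rather than tarring the directory itself, is what keeps the archive flat.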