Homeworks
This document provides the homework for each week of the course.
Homework 1
Choose one of the topics listed below and create a PDF version of it using Quarto. To do this, create a new repository in your GitHub account named homework_1. The repository should contain the QMD file that you will use to create the PDF.
For this homework, use the following resources:
- Template: template html
  - Copy just the heading of this file into your QMD file.
- Bibliography: reference.bib
  - Add this to the same folder as your QMD file.
- All figures, if any, can be saved as PNG or JPG files from the linked pages.
- Tables, if any, may have to be replicated using markdown tables.
Submit the link to your repository, the PDF and QMD files via email.
Topics
- The Impact of Resource Management in StarCraft: A Strategic Analysis My replication
- The Strategic Depth of StarCraft and Its Esports Legacy html pdf
- The Mathematics of Dungeons and Dragons: A Statistical Adventure html pdf
- The Rise of LitRPG: Blending Literature and Gaming html pdf
- The Impact of ‘The Good Guys’ on Modern Fantasy Literature html pdf
- Economic Dynamics in Eric Ugland’s ‘The Good Guys’ Series html pdf
- The Impact of House Allegiances on Power Dynamics in Westeros html pdf
Homework 2
- Download country–year panel data on three variables (“indicators”) of your choice from the World Bank website https://data.worldbank.org/, or using their API program (wbdata).
- Once you have the data, clean it up and save a tidy version of it in a Stata file.
- Indicate which countries are ranked highest and lowest in each of the three indicators in the year 2000.
- Write a short report on what you did to obtain the data, how many countries and years you ended up with in the data (after cleaning), and what difficulties you encountered, if any.
It is not necessary to download the data directly into Stata. You can download the data in CSV or Excel format and then copy-paste or import it into Stata.
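The tidy-and-rank steps can be sketched in Python as follows. This is only an illustration under stated assumptions: it assumes a wide CSV layout with one row per country and one column per year, the country names and values are made up (they are not World Bank data), and the helper names `to_tidy` and `rank_extremes` are hypothetical.

```python
import csv
import io

# Made-up sample in an assumed wide layout (one column per year); not real data.
WIDE_CSV = """Country Name,Country Code,2000,2001
Freedonia,FRE,10.5,11.0
Borduria,BOR,,7.2
Sylvania,SYL,3.1,3.4
"""

def to_tidy(text):
    """Reshape wide rows into one record per country-year, dropping blank cells."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        for key, value in rec.items():
            # Year columns are the ones whose header is all digits.
            if key.isdigit() and value != "":
                rows.append({"country": rec["Country Name"],
                             "year": int(key),
                             "value": float(value)})
    return rows

def rank_extremes(rows, year):
    """Return (highest, lowest) country names for the given year."""
    in_year = [r for r in rows if r["year"] == year]
    highest = max(in_year, key=lambda r: r["value"])["country"]
    lowest = min(in_year, key=lambda r: r["value"])["country"]
    return highest, lowest

tidy = to_tidy(WIDE_CSV)
print(len(tidy), rank_extremes(tidy, 2000))
```

Dropping blank cells before ranking matters here: a country with no value in 2000 should not appear as the lowest-ranked one.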
Homework 3
- Choose the same 2016/7 season from the football dataset as in data exercise (Book) and produce a different table showing the extent of home team advantage. Compare the results and discuss what you find.
or
- Using the wms-management-survey dataset, pick a country different from Mexico, reproduce all figures and tables of the Book case study, and compare your results to what was found for Mexico.
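For the first option, one possible way to summarize home team advantage is a table of result shares together with the mean goal difference. The Python sketch below uses made-up scores, not the football dataset, and the function name `home_advantage_table` is hypothetical.

```python
# Made-up match results as (home goals, away goals); not the course dataset.
matches = [(2, 1), (0, 0), (1, 3), (4, 0), (1, 1), (2, 0)]

def home_advantage_table(results):
    """Shares of home wins, draws, and away wins, plus mean home-minus-away goals."""
    n = len(results)
    home_wins = sum(1 for h, a in results if h > a)
    draws = sum(1 for h, a in results if h == a)
    away_wins = n - home_wins - draws
    mean_diff = sum(h - a for h, a in results) / n
    return {
        "home_win_share": home_wins / n,
        "draw_share": draws / n,
        "away_win_share": away_wins / n,
        "mean_goal_diff": mean_diff,
    }

table = home_advantage_table(matches)
for name, value in table.items():
    print(f"{name:>16}: {value:.3f}")
```

A positive mean goal difference and a home win share above the away win share would both point toward a home advantage.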
Homework 4
Nothing here
Homework 5
Nothing here
Homework 6
q1: Use the wms-management-survey dataset and pick a country (no two students should pick the same country). Estimate a linear regression with the management quality score (X) and employment (Y). Interpret the slope coefficient, create its 95% CI, and interpret that, too. Explore potential nonlinearities in the patterns of association using lpoly. Estimate a regression that can capture those nonlinearities, and carry out a test to see if you can reject that the linear approximation was good enough for the population of firms represented by the data.
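The slope and its 95% CI can be illustrated with a small Python sketch. It makes several assumptions that may differ from what the course expects: it uses the classical (homoskedastic) standard error rather than a robust one, it uses the normal critical value 1.96, the data are made up, and the function name `ols_slope_ci` is hypothetical.

```python
import math

def ols_slope_ci(x, y, z=1.96):
    """Simple OLS of y on x: returns slope, its standard error, and a CI.

    Assumes homoskedastic errors and a normal approximation (z = 1.96 for 95%).
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual variance with n - 2 degrees of freedom.
    sigma2 = sum((yi - intercept - slope * xi) ** 2
                 for xi, yi in zip(x, y)) / (n - 2)
    se = math.sqrt(sigma2 / sxx)
    return slope, se, (slope - z * se, slope + z * se)

# Made-up values standing in for X and Y; not the survey data.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, se, (lo, hi) = ols_slope_ci(x, y)
print(f"slope={slope:.3f}, se={se:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

The interval is read as a range of slope values for the population represented by the data, not as a statement about individual firms.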
Homework 7
Homework 8
Homework 9
Use the stocks-prices dataset that can be found here. This dataset contains the closing prices for the SP500, Apple, Disney, GameStop, Meta and Nvidia at the end of the month. Using this data, choose one of the companies and reproduce the analysis of the Returns on a company stock and market returns case study (for example SP500 and Apple).
The full case study can be found here. You need to produce the same tables and figures as in the case study, with brief explanations of what you find. No need to use the Daily data, just the Monthly data.
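As a rough illustration of the core calculation, the Python sketch below computes monthly percent returns from end-of-month prices and then the slope (beta) from regressing stock returns on market returns. The prices are made up, and the function names `pct_returns` and `beta` are hypothetical; the case study itself defines the exact tables and figures to reproduce.

```python
def pct_returns(prices):
    """Percent change between consecutive end-of-month prices."""
    return [100 * (p1 - p0) / p0 for p0, p1 in zip(prices, prices[1:])]

def beta(stock_ret, market_ret):
    """OLS slope from regressing stock returns on market returns."""
    n = len(market_ret)
    mm = sum(market_ret) / n
    ms = sum(stock_ret) / n
    cov = sum((m - mm) * (s - ms) for m, s in zip(market_ret, stock_ret))
    var = sum((m - mm) ** 2 for m in market_ret)
    return cov / var

# Made-up price series; not the stocks-prices dataset.
market_prices = [100, 110, 99, 108.9]
stock_prices = [50, 60, 48, 57.6]

market_ret = pct_returns(market_prices)
stock_ret = pct_returns(stock_prices)
b = beta(stock_ret, market_ret)
print(f"beta = {b:.2f}")
```

In this toy series the stock moves twice as much as the market each month, so the estimated beta is about 2.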
Homework 10
Use the data on house prices here. This dataset contains the prices of houses in a city, along with many other house characteristics. You are given 80% of the data to estimate the model. Build the model that you think is best for predicting the price of a house. You can use any of the variables in the dataset, and you can create new variables if you think they are useful.
Describe at least 3 different specifications you tried, which one you chose, and why. The student whose model has the best fit on the test data (which I have access to) can skip the next homework.
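One way to compare specifications before submitting is to set aside part of the data you were given and compare out-of-sample RMSE. The Python sketch below does this for two deliberately simple specifications. Everything in it is an assumption for illustration: the (area, price) pairs are made up, the 80/20 internal split and the seed are arbitrary choices, and the function names are hypothetical.

```python
import math
import random

# Made-up (area, price) pairs; not the course's house price data.
houses = [(50, 152), (60, 168), (70, 191), (80, 210), (90, 229),
          (100, 251), (110, 268), (120, 292), (130, 309), (140, 331)]

def fit_mean(train):
    """Specification 1: predict the training-sample mean price for every house."""
    mean_price = sum(p for _, p in train) / len(train)
    return lambda area: mean_price

def fit_linear(train):
    """Specification 2: simple OLS of price on area."""
    n = len(train)
    mx = sum(a for a, _ in train) / n
    my = sum(p for _, p in train) / n
    slope = (sum((a - mx) * (p - my) for a, p in train)
             / sum((a - mx) ** 2 for a, _ in train))
    return lambda area: my + slope * (area - mx)

def rmse(model, data):
    """Root mean squared prediction error of a fitted model on data."""
    return math.sqrt(sum((model(a) - p) ** 2 for a, p in data) / len(data))

# Seeded shuffle so the split is reproducible, then an 80/20 cut.
rng = random.Random(0)
shuffled = houses[:]
rng.shuffle(shuffled)
cut = int(0.8 * len(shuffled))
train, holdout = shuffled[:cut], shuffled[cut:]

rmse_mean = rmse(fit_mean(train), holdout)
rmse_linear = rmse(fit_linear(train), holdout)
print(f"mean-only RMSE: {rmse_mean:.2f}, linear RMSE: {rmse_linear:.2f}")
```

The specification with the lower holdout RMSE is the better candidate, though with a holdout this small the comparison is noisy.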
See the winner here
Homework 11
Homework 12
Task: Predicting the Chances of Reaching the Top 20% of Chefs
In this task, you will analyze a simulated dataset of chefs participating in the fictional cooking show “MasterChef.” Your goal is to build a model that predicts the probability of being in the ~top 20% of chefs, based on their characteristics.
Dataset Description
The dataset here contains information about 3000 chefs, with the following variables:
- Name: The chef’s full name.
- Age: Chef’s age in years.
- Experience: Years of professional cooking experience.
- KnifeSkills: A score (1-10) assessing knife-handling skills.
- PlatingAesthetics: A score (1-10) for visual presentation.
- Creativity: A score (1-10) for innovative cooking.
- ChallengeWinRate: Percentage of cooking challenges won.
- JudgesFeedback: Average feedback score from judges (1-10).
- StressManagement: A score (1-10) for managing stress under pressure.
- SocialMediaFollowing: A normalized score for social media popularity (0-10).
- Education: Highest level of education attained (categorical).
- Country: The chef’s country of origin.
- CuisineSpecialty: Primary cuisine specialty (categorical).
- AudiencePopularity: Audience rating (0-10).
- SignatureDishes: Number of signature dishes created.
- UniqueIngredients: Number of unique ingredients used in dishes.
- HoursPracticed: Average weekly hours of practice.
- Top20Percent: Indicator variable (1 if the chef is in the top 20%, 0 otherwise).
Objectives
Descriptive Analysis:
- Summarize the dataset to understand the distributions of key variables.
- Explore how variables differ between chefs in the top 20% and those who are not.
Predictive Modeling:
- Use logistic regression to predict the probability of being in the top 20% (Top20Percent) based on the available characteristics.
- Identify the most influential predictors of success.
Model Evaluation:
- Assess the model’s goodness-of-fit and predictive power.
- Use appropriate metrics (e.g., pseudo R-squared, ROC curve, and classification accuracy).
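Two of the metrics mentioned above can be computed directly from predicted probabilities, as in the following Python sketch. It assumes you already have fitted probabilities from some model; the labels and probabilities are made up, the 0.5 cutoff is an arbitrary choice, and the function names `accuracy` and `roc_auc` are hypothetical.

```python
def accuracy(y_true, probs, threshold=0.5):
    """Share of observations classified correctly at the given cutoff."""
    preds = [1 if p >= threshold else 0 for p in probs]
    return sum(pred == y for pred, y in zip(preds, y_true)) / len(y_true)

def roc_auc(y_true, probs):
    """Area under the ROC curve via pairwise comparison.

    Equals the share of (positive, negative) pairs in which the positive
    case gets the higher predicted probability; ties count as half.
    """
    pos = [p for p, y in zip(probs, y_true) if y == 1]
    neg = [p for p, y in zip(probs, y_true) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Made-up outcomes and predicted probabilities; not the chefs dataset.
y = [1, 0, 1, 0, 0]
p_hat = [0.9, 0.4, 0.35, 0.2, 0.6]

acc = accuracy(y, p_hat)
auc = roc_auc(y, p_hat)
print(f"accuracy={acc:.2f}, AUC={auc:.3f}")
```

Because only about 20% of chefs are in the top group, a model that always predicts 0 would already reach roughly 80% accuracy, so AUC is a useful complement to accuracy.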
Deliverables
Write a short report summarizing your findings, including the following:
- Key descriptive statistics.
- The logistic regression model and its interpretation.
- Evaluation metrics and conclusion on the model’s predictive power.
- Submit your Stata do-file or log-file documenting your workflow.
- Your “model” will be evaluated against everyone else’s using test data.