HomeWorks

This document provides the homework for the each week of the course.

Homework 1

Choose one of the following topics, and create a PDF version of it using Quarto. For this, create a new repository in your GitHub account, this repository should be named homework_1, and contain the QMD file that you will use to create the PDF.

For this homework use the following resources

Template: template html
- Just copy the Heading of this file in your QMD file
Bibliography: reference.bib
- Add this in the same folder as your QMD file
All figures, if any, can be saved as PNG or JPG files from the linked pages.
Tables, if any, may have to be replicated using markdown tables.

Submit the link to your repository, the PDF and QMD files via email.

Topics

The Impact of Resource Management in StarCraft: A Strategic Analysis My replication
- html
- pdf
- qmd
The Strategic Depth of StarCraft and Its Esports Legacy html pdf
The Mathematics of Dungeons and Dragons: A Statistical Adventure html pdf
The Rise of LitRPG: Blending Literature and Gaming html pdf
The Impact of ‘The Good Guys’ on Modern Fantasy Literature html pdf
Economic Dynamics in Eric Ugland’s ‘The Good Guys’ Series html pdf
The Impact of House Allegiances on Power Dynamics in Westeros html pdf

Homework 2

Download country–year panel data on three variables (“indicators”) of your choice from the World Bank website https://data.worldbank.org/, or using their API program (wbdata).
Once you have the data, clean it up and save a tidy version of it in a Stata file.
Indicate which countries are ranked highest and lowest in each of the three indicators in the year 2000.
Write a short report on what you did to obtain the data, how many countries and years you ended up with in the data (after cleannig), and what difficulties you encountered, if any.

It is not necessary to download the data directly into Stata. You can download the data in CSV format, excel, and then Copy-paste or import it into Stata.

Homework 3

Choose the same 2016/7 season from the football dataset as in data exercise (Book) and produce a different table showing the extent of home team advantage. Compare the results and discuss what you find.

Using the wms-management-survey dataset, pick a country different from Mexico, reproduce all figures and tables of the Book case study, and compare your results to what was found for Mexico.

Homework 4

Nothing here

Homework 5

Nothing here

Homework 6

q1: Use the wms-management-survey dataset and pick a country. (Not two students should pick the same country). Estimate a linear regression with the management quality score (X) and employment (Y). Interpret the slope coefficient, create its 95% CI, and interpret that, too. Explore potential nonlinearities in the patterns of association by lpoly. Estimate a regression that can capture those nonlinearities, and carry out a test to see if you can reject that the linear approximation was good enough for the population of firms represented by the data.

Homework 7

Homework 8

Homework 9

Use the stocks-prices dataset that can be found here. This dataset contains the closing prices for the SP500, Apple, Disney, GameStop, Meta and Nvidia at the end of the month. Using this data, choose one of the companies and reproduce the analysis of the Returns on a company stock and market returns case study (for example SP500 and Apple).

The full case study can be found here. You need to produce the same tables and figures as in the case study, with brief explanations of what you find. No need to use the Daily data, just the Monthly data.

Homework 10

Use the data on house prices here. This dataset contains the prices of houses in a city, along with many other house charcteristics. You are obtaining 80% of the data to estimate the model. Build a model that you think its best to predict the price of a house. You can use any of the variables in the dataset, and you can create new variables if you think they are useful.

Describe at least 3 different specifications you tried, and which one you choose and why. The one with the best fit on the test data (I have access to) can skip next homework.

See the winner here

Homework 11

Homework 12

Task: Predicting the Chances of Reaching the Top 10% Chefs

In this task, you will analyze a simulated dataset of chefs participating in the fictional cooking show “MasterChef.” Your goal is to build a model that predicts the probability of being in the ~top 20% of chefs, based on their characteristics.

Dataset Description

The dataset here contains information about 3000 chefs, with the following variables:

Name: The chef’s full name.
Age: Chef’s age in years.
Experience: Years of professional cooking experience.
KnifeSkills: A score (1-10) assessing knife-handling skills.
PlatingAesthetics: A score (1-10) for visual presentation.
Creativity: A score (1-10) for innovative cooking.
ChallengeWinRate: Percentage of cooking challenges won.
JudgesFeedback: Average feedback score from judges (1-10).
StressManagement: A score (1-10) for managing stress under pressure.
SocialMediaFollowing: A normalized score for social media popularity (0-10).
Education: Highest level of education attained (categorical).
Country: The chef’s country of origin.
CuisineSpecialty: Primary cuisine specialty (categorical).
AudiencePopularity: Audience rating (0-10).
SignatureDishes: Number of signature dishes created.
UniqueIngredients: Number of unique ingredients used in dishes.
HoursPracticed: Average weekly hours of practice.
Top20Percent: Indicator variable (1 if the chef is in the top 20%, 0 otherwise).

Objectives

Descriptive Analysis:
- Summarize the dataset to understand the distributions of key variables.
- Explore how variables differ between chefs in the top 20% and those who are not.
Predictive Modeling:
- Use logistic regression to predict the probability of being in the top 20% (Top20Percent) based on the available characteristics.
- Identify the most influential predictors of success.
Model Evaluation:
- Assess the model’s goodness-of-fit and predictive power.
- Use appropriate metrics (e.g., pseudo R-squared, ROC curve, and classification accuracy).

Deliverables

Write a short report summarizing your findings, including the following:

Key descriptive statistics.
The logistic regression model and its interpretation.
Evaluation metrics and conclusion on the model’s predictive power.
Submit your Stata do-file or log-file documenting your workflow.
Your “model” will be evaluated against everyone else’s using a Test data.

Show down!

See Results here

Winner: Shane