Working on a MySQL project by Harvard university

avatar

Since I started web development several months ago, I have always wondered what happens behind the scene when a user creates an account on a website or when they fill in their email and password and click on the "login" button, how does the website know the person entered the right details and then grant them access to the site? well, I found out a few days ago when I started learning SQL (specifically MySQL).

It's things like this that make programming so fun and interesting, that "Ah-ha" moments when you suddenly discover how things work in a website. I have been taking an introduction to computer science course (CS50) with Harvard university online (Harvardx) and so far, I have been introduced to C, python, and now SQL.

After each lecture week, we are usually given problem sets to solve which pertain to the language and topic that was taught that week. I recently worked on a project that involves writing codes to query an SQL file filled with data about movies from IMDb and there are over a million rows of data in there. There were 13 questions I had to answer using SQL queries and I will talk about how I solved 5 of them.


Warning: if you're currently taking CS50 and you haven't worked on this problem, you might want to stop reading at this point because I will be providing solutions to some of them. I assume you must have read the course's policy on academic honesty and you know what it means to use someone's code in your work.

Understanding the database

The first thing I did was to use .schema to show the contents of the database file. There are 5 tables there and each of them holds different data.

The movies table has 3 columns: the title of the movies, the year they were released, and a column containing id numbers that uniquely identify each movie. The same thing can be seen in the people's table: unique id numbers, names of people (directors and movie stars), and their year of birth.

The stars and directors tables are used to link to the people's table and also the movies table. They have something called foreign keys, which are just id numbers that reference the unique id numbers of the movies and people's tables. The foreign keys can be used to link to the tables that have the unique id numbers it's referencing, we can combine two or more tables with this. Now let's answer the first question.

The first question

SELECT title
FROM movies
WHERE year = 2008
LIMIT BY 20;

You can sort of understand what's going on in the above code without me having to explain. The SELECT command is used to select data from a database and since the question said we should list the title of movies, that's what I did: select title from the movies table. But there's a constraint: we only need movies that were released in 2008 and that's where the next command comes in: WHERE.

With that command, I'm saying: Give me the titles of the movies that were released in 2008. The last line of code isn't supposed to be there but I added it just so I can show you guys what the result looks like. The LIMIT command does what it says: it limits the number of results according to the number provided and in this case, I provided 20. So, give me the top 20 movies that were released in the year 2008.

This is the output:

Second question

This is similar to the first one but the only difference is we are printing the year of birth of a person. It's the same logic as the first, but this time we are going to access the people's table because that's where we can find Emma stone and also her birth year

SELECT birth
FROM people
WHERE name = "Emma Stone";

As you can see, the logic is similar to the first but the WHERE command is now matching a string instead of numbers. The above code simply means: select the birth year from the people's table but only give me the birth year that matches Emma Stone. If we didn't provide that last command, it will just output the birth year of everyone in the table, and if Emma stone isn't in the table, it won't output anything.

This is the output:

Third question

The solution to this is almost the same as the first one, the difference is that we are not only listing movies from a particular year but from that year and upwards. So, instead of writing a query to list out movies released in only 2018, we need to also include the ones released in 2019, 2020, and so on

SELECT title
FROM movies
WHERE year >= 2018
ORDER BY title
LIMIT 20;

So, select all the movie titles from the movies table but only select the ones that were released in 2018 and above. The last thing we should do is to arrange all those movies in alphabetical order and that's where the ORDER BY command comes in. We are ordering the result according to the title, so the movies will be listed starting with numbers, before A, then B, then C, and so on.

This is the output:

Fourth question

The solution to this is also very similar to some of the previous ones but this time, we are to write a query that will output the total number of movies that have a rating of 10, does that mean we have to count? Exactly!

SELECT COUNT(rating)
FROM ratings
WHERE rating = 10.0;

The count command provides the total number of whatever you pass into the bracket. So in this case, we are selecting all the ratings from the rating table but we only need those that are 10, and we don't need to print them out, we only need the total number and that's why we have to enclose the rating with the COUNT command.

This is the output:

Fifth question

That's a lot of specifications but don't let the word "chronological" confuse you, it simply means to list the movies according to the year they were released, in simpler terms: order the movies by year of release. One thing you should note is that unlike before where we have to list out just one thing (maybe a movie title or a year of birth or ratings of a movie), this time we are listing out two things: the title of the movies and also the year they were released.

SELECT title, year
FROM movies
WHERE title
LIKE "Harry Potter%"
ORDER BY year;

To list out two or more things from a table, we simply use the SELECT command as usual and then separate each of the things we are selecting with a comma. So, select all titles and years from the movies table but... Now this is where it gets sought of confusing, haven't I been using the equality sign (=) with the WHERE command? How come I am using something else now?

Well, if we are to use the equality sign for this query, the program will only look for movies that exactly match the title "Harry Potter" and as far as I can tell, there are none. All the Harry Potter movies have some other things attached to them, for example; Harry Potter and the cursed child or Harry Potter and the sorcerer's stone. The equality sign won't pick up any of these movie titles, it will only look for those that are exactly Harry Potter.

To fix that, we will use the LIKE command which will look for titles that have the words "Harry Potter". But I believe you notice I added a percentage sign (%) at the end, but why? According to the specifications of this question, if a movie title begins with the words "Harry Potter", then it's considered an Harry Potter movie, which means we don't have to worry about what comes after that and that's why the % is there, it strips off every other words that comes after "Harry Potter".

So, we are listing out all titles and years from the movies table where the movie title starts with Harry Potter. Any movie title that starts with Harry Potter will be listed along with the year it is released, and they will be listed according to the year they were released.

This is the output:

That's all for now

I will talk about 2 more questions and their solutions in another article. I mentioned earlier that you can join two or more tables using a foreign key in one table that is referencing the unique id numbers of data from another table but I never showed how that works, so I will talk about that in the next article.

I will also show how you can nest queries (we call them subqueries). By doing this, you can pass the result of one query (the inner or subquery) to another query (the outer query). This can be very useful when you want to access the data in one table and use it in another table.

Thanks for reading

Connect with me on:
Twitter: @kushyzeena
Readcash: @kushyzee

Lead image: Image by pressfoto on Freepik.
Edited with Canva


0
0
0.000
23 comments
avatar

Congratulations @kushyzee! You have completed the following achievement on the Hive blockchain and have been rewarded with new badge(s):

You distributed more than 700 upvotes.
Your next target is to reach 800 upvotes.

You can view your badges on your board and compare yourself to others in the Ranking
If you no longer want to receive notifications, reply to this comment with the word STOP

To support your work, I also upvoted your post!

Check out the last post from @hivebuzz:

Our Hive Power Delegations to the August PUM Winners
Feedback from the September 1st Hive Power Up Day
Hive Power Up Month Challenge 2022-08 - Winners List
Support the HiveBuzz project. Vote for our proposal!
0
0
0.000
avatar

This looks great though
I don't know much about MySQL
But I'd love to take a course soon

Your Harvardx is it on edX?

0
0
0.000
avatar

Yeah exactly, it's on edx. You can follow the Harvard link I dropped on this article if you're interested in it

0
0
0.000
avatar

Is the course free

0
0
0.000
avatar

Yeah totally. And you get a certificate after you complete it

0
0
0.000
avatar

The questions are interesting to read. I can't help my mind think of the solution while reading them. It's like solving a puzzle. 😊

Thanks for sharing.

!1UP

0
0
0.000
avatar

That's what I love about Harvard, the problem sets they give are very interesting. The one I'm currently working on now is all about using SQL to figure out who stole an item by going through evidence, security logs and witnesses' accounts, I'm playing detective! 😁

0
0
0.000
avatar

LOL. That must be fun. Certainly an interesting way to learn. 😁

0
0
0.000
avatar

It certainly is. Thank you for reading my post

0
0
0.000
avatar

Wow! I've always wanted to know what SQL is all about and this post really made it seem interesting. I started feeling like I'm very familiar with this language that I even solved the second and third question before scrolling to their solutions 😂, it's probably because of the detailed explanation. I definitely can't wait to start learning SQL but as for now I still have to deal with JavaScript 🙂.

0
0
0.000
avatar
(Edited)

The almighty JavaScript 😆 I'm considering changing my path from frontend developer to data scientist, SQL is just so interesting and it's quite easy to get into. JavaScript is the next language for me to learn after SQL, i'm definitely not looking forward to it 😆. Thank you for reading through my article bro, it really means a lot to me

0
0
0.000
avatar

You're welcome bro💪
Although I just started JavaScript, I can't say it's that bad😅. I guess it's just voluminous and requires a lot of practice.

0
0
0.000
avatar

I'll leave this subject to my daughter. My brain is bleeding lol.

0
0
0.000
avatar

🤣 Good idea. Your daughter still have a long way to go before she gets to this stage, it's very advanced

0
0
0.000
avatar

Thanks for this review on SQL. This is one of the language I have never considered learning, even when I was developing websites in my early days. Which kind of developments are you up to?

By the way, if you want to play with codes, would you be interested in taking part in my citizen science project on Hive? (I am taking the liberty to advertise my own stuff here, I know :) ).

Cheers!

0
0
0.000
avatar

For now I am into Frontend web development but I have some backend experience.

I will love to know more about your citizen science project, Sir 😊

0
0
0.000
avatar

For now I am into Frontend web development but I have some backend experience.

I see. Then you are complete, somehow ;)

I will love to know more about your citizen science project, Sir 😊

Feel free to browse the #citizenscience tag. You can also check this post, in which there is a list with the already published 5 episodes at its beginning (I recommend dealing with them in the right ordering).

0
0
0.000