ML-Ch2-P2-End to End Machine Learning Project

Introduction

Welcome back to this exciting chapter!


Step 3: Explore and Visualize the Data to Gain Insights

  • Make a copy of the original dataset before exploring it, so that we can revert to it afterwards.
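A minimal sketch of that copy step, assuming the data lives in a pandas DataFrame. The name `original` and its three sample rows are hypothetical stand-ins for the dataset loaded earlier, used only so the snippet runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for the dataset loaded earlier in the chapter.
original = pd.DataFrame({
    "longitude": [-122.23, -118.30, -117.81],
    "latitude": [37.88, 34.26, 33.78],
    "median_house_value": [452600.0, 176500.0, 270500.0],
})

# DataFrame.copy() makes a deep copy by default, so changes made to
# `housing` during exploration do not touch `original`.
housing = original.copy()
housing["median_house_value"] = housing["median_house_value"] / 1000.0

# Reverting is just a matter of copying the untouched original again.
restored = original.copy()
```

Working on a copy means any experimental transformation can be discarded without reloading the data.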

Visualizing geographical data

Since the dataset includes geographical information, it is a good idea to create a scatterplot of all the districts to visualize the data.

import matplotlib.pyplot as plt  # needed for plt.show()

housing.plot(kind='scatter', x='longitude', y='latitude', grid=True)
plt.show()
Fig 1: A geographical scatterplot of the data
However, it is hard to see any particular pattern in this plot. Setting the alpha option to 0.2 makes it much easier to see the places where there is a high density of data points.
housing.plot(kind='scatter', x='longitude', y='latitude', grid=True, alpha=0.2)
plt.show()
Fig 2: A better visualization that highlights high-density areas
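One way to see why a low alpha highlights dense areas: when semi-transparent markers of the same color overlap, standard alpha compositing lets a fraction (1 - alpha) of what lies underneath show through each marker, so k stacked markers have a combined opacity of 1 - (1 - alpha)^k. The sketch below works through that arithmetic. The helper name `stacked_opacity` is hypothetical and used only for illustration.

```python
def stacked_opacity(alpha: float, k: int) -> float:
    """Combined opacity of k overlapping same-color markers, each drawn with `alpha`."""
    return 1.0 - (1.0 - alpha) ** k


# Compare an isolated point with small and large clusters at alpha=0.2.
for k in (1, 5, 10):
    print(k, round(stacked_opacity(0.2, k), 3))
```

With alpha=0.2, an isolated point is 20% opaque, while five overlapping points reach about 67% and ten reach about 89%, which is why crowded regions stand out.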

Next, let’s look at the housing prices. The size of each circle represents the district’s population (option s), and the color represents the price (option c). We use a predefined color map (option cmap) called jet, which ranges from blue (low prices) to red (high prices).

housing.plot(kind='scatter', x='longitude', y='latitude', grid=True,
             s=housing["population"] / 100, label="population",
             c="median_house_value", cmap="jet", colorbar=True,
             legend=True, sharex=False, figsize=(10,7), alpha=0.2)
plt.show()
Fig 3: Housing prices by district: redder points are more expensive, and larger circles indicate more populous districts
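To experiment with the size and color encodings without the housing dataset at hand, a minimal sketch on synthetic data might look like the following. The `demo` DataFrame, its random values, and the non-interactive Agg backend are all assumptions made for illustration. Note that Matplotlib interprets `s` as the marker area in points squared, so dividing the population by 100 only keeps the circles at a readable size.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Synthetic stand-in data; the column names mirror those used in the plot above.
rng = np.random.default_rng(42)
n = 200
demo = pd.DataFrame({
    "longitude": rng.uniform(-124, -114, n),
    "latitude": rng.uniform(32, 42, n),
    "population": rng.integers(100, 5000, n),
    "median_house_value": rng.uniform(50_000, 500_000, n),
})

ax = demo.plot(kind="scatter", x="longitude", y="latitude", grid=True,
               s=demo["population"] / 100,  # marker area grows with population
               c="median_house_value",      # color follows the price column
               cmap="jet", colorbar=True, alpha=0.4, figsize=(10, 7))
```

Changing the divisor in `s` or swapping `cmap` for another Matplotlib color map is a quick way to check how sensitive the picture is to these choices.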

Look for correlations

Experiment with attribute combinations


Step 4: Prepare the Data for Machine Learning Algorithms


Step 5: Select a Model


Step 6: Fine-Tune the Model


Step 7: Present the Solution


Step 8: Launch, Monitor, and Maintain the System

Author

Sai (Emily) Peng

Posted on

2025-01-09

Updated on

2025-08-01
