This project analyzes Formula 1 race tracks using unsupervised machine learning techniques, aiming to identify meaningful clusters based on physical and performance-related characteristics. By applying K-Means clustering, we group circuits into distinct types, helping fans and analysts better understand strategic and technical demands across the championship calendar. The result is a data-driven classification of tracks into three main categories: "Modern All-Rounders", "Classic Speed Circuits", and the unique "Technical Street Challenge".
Formula 1 circuits vary widely in design, surface, and flow. These differences significantly impact car performance, race strategies, tire choices, and driver behavior. Instead of treating each circuit as a standalone case, this project uses clustering to group similar tracks together based on their structural features and driving characteristics.
The goal of this project is to classify and cluster F1 circuits using only track-specific data that does not rely on driver behavior or team strategies. For this reason, the dataset prioritizes features that describe the layout and physical attributes of each track. These include:
Asphalt features: Abrasion
Layout complexity: Number of corners, corner density
Composition of corner types: High-, Medium-, and Low-speed corners
The central idea is to maintain independence from external variables like weather, team updates, or driver skill, focusing instead on what makes each circuit unique from a structural perspective.
The clustering methodology followed a structured sequence: standardize the features, fit K-Means over a range of cluster counts, and select k using two diagnostics.
We used K-Means Clustering, a well-established unsupervised learning technique that groups data points into a specified number (k) of mutually exclusive categories. Each group is formed to minimize internal variation, which means that tracks within the same cluster share similar performance-defining features.
K-Means was chosen for several reasons:
It performs particularly well with standardized, continuous numerical features such as the ones used here.
It is interpretable and repeatable, creating clearly separated groups that can be easily explained and analyzed.
It offers straightforward implementation and visual inspection, enabling rapid testing of different assumptions and configurations.
Before clustering, all features were standardized to ensure equal weight and to prevent metrics with large values from dominating the outcome.
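The standardization step can be illustrated with a minimal sketch. It assumes z-score scaling with the population standard deviation (the convention scikit-learn's StandardScaler follows); the text does not say which scaler the project used, and the function name and sample values below are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores: (x - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no information for clustering.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


# Hypothetical raw values for four circuits (not the project's data).
abrasion = [1, 3, 4, 2]
flow_score = [0.05, 0.30, 0.35, 0.10]

abrasion_z = standardize(abrasion)
flow_z = standardize(flow_score)
print(abrasion_z)
print(flow_z)
```

After scaling, both features have mean 0 and standard deviation 1, so a feature measured on a 1 to 5 scale no longer outweighs one measured between 0 and 1 when K-Means computes distances.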
We then applied K-Means with a range of values for k (2 to 9), evaluating each with two key diagnostics:
The elbow method, which tracks the reduction in clustering error (inertia) as more groups are added.
The silhouette score, which measures how well-defined each cluster is. A higher score means better-defined and more cohesive clusters.
While the silhouette score peaked at k=2, the elbow method showed diminishing returns after k=3. Choosing k=3 provided a practical balance between statistical performance and analytical usefulness. It also preserved the distinctiveness of Circuit de Monaco, which would otherwise be absorbed into a larger cluster.
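For reference, the two diagnostics can be written out as follows. These are the standard textbook definitions, not formulas taken from the project's code; the symbols W(k), C_j, \mu_j, a(i), and b(i) are notation introduced here.

```latex
% Inertia (within-cluster sum of squares) for clusters C_1, ..., C_k with centroids \mu_j
W(k) = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2

% Silhouette of point i:
%   a(i) = mean distance from i to the other members of its own cluster
%   b(i) = mean distance from i to the members of the nearest other cluster
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad -1 \le s(i) \le 1
```

The silhouette score for a given k is the mean of s(i) over all circuits. Because the best achievable inertia can only fall as k grows, it is read for an elbow rather than a minimum. By the usual convention, s(i) = 0 for a point that is alone in its cluster, which is relevant here because Monaco forms a single-member cluster at k=3.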
Details on how the clustering was performed are available in my GitHub repository: https://github.com/cld05/F1_tracks_clustering
Each feature in the analysis is selected for its relevance to real-world circuit dynamics:
Abrasion is expected to differentiate tracks with high tire wear (e.g., Spa, Monza) from smoother surfaces (e.g., Monaco).
Corner Density is expected to identify tight, technical tracks with frequent direction changes.
Corner Entropy is expected to highlight whether tracks offer a diverse challenge or are biased toward certain corner types.
Flow Score, refined through multiple iterations, is designed to capture the overall rhythm of the circuit—tracks with long straights and flowing sections versus those with constant stops and starts.
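Two of these features can be sketched directly. The text does not give their formulas, so this assumes corner density is corners per kilometre and corner entropy is the Shannon entropy (in bits) of the high-, medium-, and low-speed corner shares. Both assumptions are at least consistent with the reported values: Monaco's 19 corners over a 3.337 km lap give about 5.69, and all reported entropies stay below log2(3) ≈ 1.585, the maximum for three categories. The function names are hypothetical, and Flow Score is omitted because its definition is not stated.

```python
import math


def corner_density(n_corners, length_km):
    """Corners per kilometre of lap length (assumed definition)."""
    return n_corners / length_km


def corner_entropy(high, medium, low):
    """Shannon entropy in bits of the corner-type mix (assumed definition)."""
    counts = [high, medium, low]
    total = sum(counts)
    if total == 0:
        raise ValueError("a circuit needs at least one corner")
    entropy = 0.0
    for count in counts:
        if count > 0:  # a category with no corners contributes nothing
            share = count / total
            entropy -= share * math.log2(share)
    return entropy


# Monaco: 19 corners over a 3.337 km lap.
print(round(corner_density(19, 3.337), 2))

# Hypothetical corner mixes (not the project's data).
print(corner_entropy(5, 5, 5))   # perfectly balanced mix
print(corner_entropy(0, 0, 10))  # a single corner type only
```

Under these definitions, a balanced mix of corner types reaches the maximum entropy, while a circuit dominated by one corner type approaches zero.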
Using k=3, the clustering resulted in the groups described in the following subsections.
Cluster 0, "Modern All-Rounders": balanced, modern tracks with a moderate mix of corner types and medium flow.
Tracks:
Yas Marina Circuit
Circuit of The Americas
Albert Park Circuit
Baku City Circuit
Circuit Gilles Villeneuve
Shanghai International Circuit
Hungaroring
Imola Circuit
Las Vegas Street Circuit
Autodromo Hermanos Rodriguez
Miami International Autodrome
Circuit Zandvoort
Lusail International Circuit
Interlagos Circuit
Jeddah Corniche Circuit
Silverstone Circuit
Marina Bay Street Circuit
Profile:
Moderate abrasion (2.24)
Corner entropy of 1.53, marginally the highest of the three clusters
Intermediate corner density (3.45)
Intermediate flow score (0.18)
Interpretation:
This is the most populated group, encompassing modern circuits with a balanced mix of technical challenges—tight sections, medium-speed sequences, and moderate grip demand. The slightly elevated corner density suggests layouts that require high driver consistency and precise setup, while the lower flow score reflects more frequent braking zones and direction changes. These are tracks where adaptability and setup versatility matter most.
Cluster 1, "Classic Speed Circuits": historic or semi-permanent tracks known for speed, high tire wear, and flowing sections.
Tracks:
Red Bull Ring
Bahrain International Circuit
Circuit de Barcelona-Catalunya
Circuit de Spa-Francorchamps
Monza Circuit
Suzuka International Racing Course
Profile:
Highest abrasion (3.83)
Moderate corner entropy (1.45)
Lower corner density (2.60)
Highest flow score (0.33)
Interpretation:
These iconic venues are known for speed, flow, and strategy. They favor aerodynamic efficiency and tire management. The high flow score combined with the lower corner density indicates that drivers spend more time at high speed, so teams must manage degradation carefully. These tracks challenge teams to balance raw pace with tire longevity, making them well suited to performance benchmarking and long-run strategy comparisons.
Cluster 2, "Technical Street Challenge": a single-member cluster reflecting Monaco's unique and technical low-speed profile.
Track:
Circuit de Monaco
Profile:
Lowest abrasion (1.00)
Lowest corner entropy (0.98)
Highest corner density (5.69)
Lowest flow score (0.05)
Interpretation:
This cluster contains only Monaco—a statistical and tactical outlier. It is characterized by an overwhelming density of low-speed corners and minimal variation in corner type, as reflected in the lowest entropy score. The near-zero flow score confirms its stop-start rhythm. It rewards precision and track position over outright speed and punishes the smallest mistakes. Strategically, it is in a class of its own and justifies its standalone grouping.
This segmentation offers a repeatable framework to anticipate strategic challenges based on track profile—supporting performance reviews, simulator planning, or even predictive modeling.
By simplifying complexity into three track types, it helps teams, analysts, and decision-makers abstract away from anecdotal analysis and focus on structured, comparable characteristics.
Finally, it demonstrates how clustering, supported by PCA for visual validation, can turn disparate track data into interpretable groupings, an approach that is relevant beyond motorsport wherever infrastructure and performance interact.
This clustering analysis of Formula 1 circuits provides a structured, quantitative view of track characteristics:
Three distinct types of circuits were identified
Monaco’s uniqueness was confirmed both statistically and visually
This model offers useful insight for fans, engineers, and strategists alike
Future work could explore driver performance across clusters or compare seasonal results by cluster type. Clustering also has potential for tire strategy predictions and car setup optimization frameworks.
Loaded data from 2025 F1 season tracks.
(Data were collected from Pirelli tyre cards and from the technical data on each track's Wikipedia page.)
Scaled features.
Reduced dimensionality with Principal Components Analysis (PCA) and Uniform Manifold Approximation and Projection (UMAP) for visual inspection
Applied K-Means clustering for each number of clusters k from 2 to 9.
Evaluated each with both inertia (elbow method) and silhouette score
The elbow point was around k=3: inertia dropped sharply up to that point and then began to level off
Silhouette score peaked at k=2 (0.609) but dropped to 0.421 at k=3 and continued to decline afterward
Despite the lower silhouette score, k=3 was chosen because:
It captured the unique nature of Circuit de Monaco (an outlier deserving its own cluster)
It preserved meaningful granularity among the remaining tracks
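The sweep over k described in these steps can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the repository's code: the data are random stand-ins for the 24 circuits and four features, and the function name, seed, and `n_init` value are hypothetical choices.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def sweep_k(X, k_values=range(2, 10), seed=0):
    """Fit K-Means for each k and record inertia and mean silhouette score."""
    X_scaled = StandardScaler().fit_transform(X)  # zero mean, unit variance per feature
    results = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X_scaled)
        results[k] = {
            "inertia": float(model.inertia_),
            "silhouette": float(silhouette_score(X_scaled, model.labels_)),
        }
    return results


# Hypothetical stand-in data: 24 rows (one per circuit) x 4 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(24, 4))
X[:6] += 3.0  # a loose second group
X[6] -= 6.0   # one far outlier

results = sweep_k(X)
for k, scores in results.items():
    print(f"k={k}: inertia={scores['inertia']:.2f}, silhouette={scores['silhouette']:.3f}")
```

Plotting the two columns against k gives the elbow and silhouette curves. Fixing `random_state` keeps the centroid initialization, and therefore the scores, repeatable between runs.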
The GitHub code used for the analysis can be found here: https://github.com/cld05/F1_tracks_clustering.
The following plots summarize the results.
This chart evaluates different numbers of clusters (k) using two methods: the Elbow Method (blue) and the Silhouette Score (green). Inertia measures how tightly grouped each cluster is; silhouette score measures how well-separated clusters are. While k=2 achieves the highest silhouette score, k=3 represents a practical balance: it preserves distinctiveness (e.g., isolating Monaco) while keeping cluster cohesion. The curve’s bend at k=3 (elbow) suggests diminishing returns beyond this point, supporting the decision to segment the circuits into three strategic groups.
This Principal Component Analysis (PCA) plot visualizes how each circuit is positioned in a reduced two-dimensional space based on the four selected features. Cluster 0 circuits group tightly in the center, suggesting a well-balanced profile. Cluster 1 spreads more widely, indicating greater feature variability. Cluster 2—Monaco—sits distinctly apart, confirming its statistical uniqueness. The two PCA axes capture over 78% of the data’s variance, offering a reliable spatial representation of cluster separability and reinforcing the credibility of the classification framework.
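The share of variance captured by the first two components can be computed as in the sketch below. It implements PCA through a singular value decomposition in NumPy rather than through a dedicated PCA class, and it runs on random stand-in data, so the printed share will not match the figure reported above. The function name is hypothetical.

```python
import numpy as np


def pca_2d(X):
    """Project standardized features onto the first two principal components."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # z-score each feature
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, singular_values, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    coords = Z @ Vt[:2].T
    return coords, explained


# Hypothetical stand-in data: 24 circuits x 4 features.
rng = np.random.default_rng(42)
X = rng.normal(size=(24, 4))
X[:, 1] += 0.8 * X[:, 0]  # introduce some correlation between two features

coords, explained = pca_2d(X)
print(coords.shape)
print(f"variance captured by two components: {explained[:2].sum():.1%}")
```

The more correlated the original features are, the larger the share of variance two components can capture, and the more faithfully a 2D scatter plot reflects distances in the full feature space.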
This Uniform Manifold Approximation and Projection (UMAP) plot offers a non-linear projection of the circuit clusters, optimizing for local similarity rather than global structure. It confirms the same three-cluster segmentation: Cluster 0 circuits form a cohesive group, Cluster 1 circuits cluster separately with tighter internal proximity, and Cluster 2 (Monaco) is again shown as an outlier. UMAP’s strength lies in highlighting relationships that are not easily captured in linear projections, adding confidence to the overall consistency and robustness of the clusters uncovered by the analysis.
This heatmap shows the average feature values across the three identified circuit clusters. Cluster 0 exhibits moderate values across the board, suggesting versatility. Cluster 1 has high abrasion and flow scores, indicative of fast, historic circuits. Cluster 2—Monaco alone—stands out with extremely high corner density and minimal flow. This visualization confirms how specific features drive clustering and showcases the unique identity of each group. It underpins the credibility of the segmentation by revealing consistent, interpretable differences across all four performance dimensions.
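The comparison the heatmap makes can be reproduced from the cluster means reported in the profiles earlier in this article. The sketch below ranks the clusters on each feature; the cluster numbering (0 for the large balanced group, 1 for the fast historic circuits, 2 for Monaco) follows the plot descriptions, and the dictionary keys and function name are hypothetical.

```python
# Cluster means as reported in the cluster profiles.
profiles = {
    "Cluster 0": {"abrasion": 2.24, "corner_entropy": 1.53,
                  "corner_density": 3.45, "flow_score": 0.18},
    "Cluster 1": {"abrasion": 3.83, "corner_entropy": 1.45,
                  "corner_density": 2.60, "flow_score": 0.33},
    "Cluster 2": {"abrasion": 1.00, "corner_entropy": 0.98,
                  "corner_density": 5.69, "flow_score": 0.05},
}


def extremes(profiles):
    """Map each feature to (cluster with lowest mean, cluster with highest mean)."""
    features = next(iter(profiles.values())).keys()
    result = {}
    for feature in features:
        ranked = sorted(profiles, key=lambda name: profiles[name][feature])
        result[feature] = (ranked[0], ranked[-1])
    return result


ranking = extremes(profiles)
for feature, (lowest, highest) in ranking.items():
    print(f"{feature}: lowest in {lowest}, highest in {highest}")
```

Monaco's cluster sits at an extreme on every feature, while the large balanced group falls between the other two on abrasion, corner density, and flow score.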