Scraping the web with BeautifulSoup
We are going to extract information from websites using the requests and beautifulsoup libraries.
Installation
With conda, you can install the required dependencies (python-dateutil is used later for parsing dates) with:
conda install beautifulsoup4 requests python-dateutil
or, with pip:
python3 -m pip install beautifulsoup4 requests python-dateutil
Basic usage of BeautifulSoup
First, we import the BeautifulSoup class:
from bs4 import BeautifulSoup
We load the HTML source file from disk and pass its contents to the BeautifulSoup constructor, together with the name of the parser to use (Python's built-in html.parser):
with open("list.html", encoding="utf-8") as f:
    html = f.read()
document = BeautifulSoup(html, "html.parser")
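If list.html is not at hand, the same constructor accepts a string directly. A minimal sketch using a trimmed copy of list.html (only its unordered list), under new variable names so that it does not overwrite html and document:

```python
from bs4 import BeautifulSoup

# A trimmed copy of list.html (unordered list only), embedded as a string
sample_html = """<!doctype html>
<html>
<head><title>Sample HTML document</title></head>
<body>
<h2>An Unordered HTML List</h2>
<ul id="unordered_list" style="color: #f0e">
<li>Coffee</li>
<li>Tea</li>
<li>Milk</li>
</ul>
</body>
</html>"""

sample_doc = BeautifulSoup(sample_html, "html.parser")

# Child tags can be reached as attributes: .title is the first <title> tag
title = sample_doc.title.string

# find_all returns every matching tag; get_text(strip=True) trims whitespace
items = [li.get_text(strip=True) for li in sample_doc.find_all("li")]

print(title)  # Sample HTML document
print(items)  # ['Coffee', 'Tea', 'Milk']
```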
print(html)
<!doctype html>
<html>
<head>
<title>Sample HTML document</title>
</head>
<body>
<h2>An Unordered HTML List</h2>
<ul id="unordered_list" style="color: #f0e">
<li>Coffee</li>
<li>Tea</li>
<li>Milk</li>
</ul>
<h2>An Ordered HTML List</h2>
<ol id="ordered_list" style="color: rgb(20, 200, 100)">
<li>First</li>
<li>Second</li>
<li>Third</li>
</ol>
</body>
</html>
from IPython.display import HTML
HTML(html)
An Unordered HTML List
- Coffee
- Tea
- Milk
An Ordered HTML List
1. First
2. Second
3. Third
Accessing attributes
The ul tag has two attributes, id and style. A bs4 tag gives dictionary-style access to its attributes: index it with an attribute name to get the value, or use .attrs to get all of them as a dictionary:
ulist.attrs
{'id': 'unordered_list', 'style': 'color: #f0e'}
ulist["style"]
'color: #f0e'
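In this excerpt ulist appears without being defined; presumably it is the ul element of list.html. A minimal self-contained sketch, assuming the element is looked up by its id, rebuilds it from a one-line copy of that list and shows both access styles, plus .get() for attributes that may be missing:

```python
from bs4 import BeautifulSoup

snippet = '<ul id="unordered_list" style="color: #f0e"><li>Coffee</li><li>Tea</li><li>Milk</li></ul>'
soup = BeautifulSoup(snippet, "html.parser")

# Keyword arguments to find() filter on attribute values, here the id
ulist = soup.find("ul", id="unordered_list")

# .attrs holds all attributes as a dictionary
attrs = dict(ulist.attrs)

# Square brackets raise KeyError for a missing attribute; .get() returns None
style = ulist["style"]
missing = ulist.get("title")

print(attrs)    # {'id': 'unordered_list', 'style': 'color: #f0e'}
print(style)    # color: #f0e
print(missing)  # None
```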
Downloading a table from Wikipedia
We aim to get a list of countries sorted by their population size: https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population
First, let’s import the required modules:
import re
import dateutil.parser
import requests
from bs4 import BeautifulSoup
This time, we download the HTML directly from the website using the requests module:
url = "https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population"
r = requests.get(url)
url
'https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population'
The web server returns a status code that indicates whether the request succeeded. We use it to check that the page was loaded successfully (200 means OK):
assert r.status_code == 200
Next, we extract the HTML source and instantiate BeautifulSoup:
html = r.text
document = BeautifulSoup(html, "html.parser")
By inspecting the document, we can see that we are interested in the first table with the wikitable class, so we use find, which returns the first matching element:
table = document.find("table", class_="wikitable")
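Two details of this lookup are easy to miss: class is a multi-valued attribute, so class_="wikitable" also matches class="wikitable sortable", and find stops at the first match. A small sketch on made-up markup (not the real Wikipedia page) illustrates both, along with the equivalent CSS selector via select_one:

```python
from bs4 import BeautifulSoup

# Made-up markup: one unrelated table, then two tables carrying "wikitable"
demo = BeautifulSoup("""
<table class="infobox"><tr><td>not this one</td></tr></table>
<table class="wikitable sortable"><tr><th>Population</th></tr></table>
<table class="wikitable"><tr><th>Other</th></tr></table>
""", "html.parser")

# class_="wikitable" matches any tag whose class list contains "wikitable"
first = demo.find("table", class_="wikitable")

# CSS-selector equivalent; select_one also returns the first match (or None)
same = demo.select_one("table.wikitable")

# find() returns only the first match; find_all() returns all of them
count = len(demo.find_all("table", class_="wikitable"))

print(first["class"])  # ['wikitable', 'sortable']
print(first is same)   # True
print(count)           # 2
```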
If you are not familiar with HTML tables, read this example first: https://www.w3schools.com/html/tryit.asp?filename=tryhtml_table_intro
print(str(table)[:1024])
<table class="wikitable sortable" style="text-align:right">
<tbody><tr class="is-sticky">
<th></th>
<th style="width:17em"><a href="/wiki/List_of_sovereign_states" title="List of sovereign states">Country</a> / <a href="/wiki/Dependent_territory" title="Dependent territory">Dependency</a></th>
<th>Population</th>
<th style="width:2em">% of<br/>world</th>
<th>Date</th>
<th><span class="nowrap">Source (official or from</span><br/>the <a href="/wiki/United_Nations" title="United Nations">United Nations</a>)</th>
<th class="unsortable">
</th></tr>
<tr>
<th>–
</th>
<td style="text-align:left"><b>World</b>
</td>
<td><b>8,063,588,000</b></td>
<td><b>100%</b></td>
<td><b><span data-sort-value="000000002023-10-04-0000" style="white-space:nowrap">4 Oct 2023</span></b>
</td>
<td style="text-align:left"><b>UN projection</b><sup class="reference" id="cite_ref-unpop_4-0"><a href="#cite_note-unpop-4">[3]</a></sup></td>
<td>
</td></tr>
<tr>
<th>1
</th>
<td style="text-align:left"><span class="flagicon"><span class="mw-image-
At this point, it is a good idea to check programmatically that the table contains the expected header:
header = " ".join([th.get_text(strip=True) for th in table.find_all("th")])
assert "Population" in header
header
' Country/Dependency Population % ofworld Date Source (official or fromtheUnited Nations) – 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 – 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 – 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 – 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 – 149 150 151 152 153 154 155 156 157 158 159 160 161 162 – 163 164 165 – 166 167 168 169 170 171 – 172 – 173 – – 174 – 175 176 177 – – – 178 179 180 – 181 – 182 183 184 – – 185 – 186 – – – – – – – 187 – – 188 189 190 – 191 – – – 192 – – 193 – 194 – – – – – – – – – 195 – –'
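The header string above also contains the ranks 1, 2, 3, … because table.find_all("th") searches every row, and each data row starts with a th cell holding its rank. A sketch on a made-up two-row table shows the difference when the search is restricted to the first tr:

```python
from bs4 import BeautifulSoup

# Made-up table with the same shape: data rows start with a <th> rank cell
demo_table = BeautifulSoup("""
<table class="wikitable">
<tr><th></th><th>Country</th><th>Population</th></tr>
<tr><th>1</th><td>Country A</td><td>1,000</td></tr>
<tr><th>2</th><td>Country B</td><td>500</td></tr>
</table>
""", "html.parser").find("table")

# find_all("th") searches the whole table, so the per-row rank cells are included
all_th = [th.get_text(strip=True) for th in demo_table.find_all("th")]

# Restricting the search to the first <tr> yields only the column headings
header_row = demo_table.find("tr")
columns = [th.get_text(strip=True) for th in header_row.find_all("th")]

print(all_th)   # ['', 'Country', 'Population', '1', '2']
print(columns)  # ['', 'Country', 'Population']
```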
Exercise
Extract the information from the table:
- get the rows
- find the column names
- get sensible data from each cell
- parse numbers/dates where they show up
rows = table.find_all("tr")
rows[0]
<tr class="is-sticky">
<th></th>
<th style="width:17em"><a href="/wiki/List_of_sovereign_states" title="List of sovereign states">Country</a> / <a href="/wiki/Dependent_territory" title="Dependent territory">Dependency</a></th>
<th>Population</th>
<th style="width:2em">% of<br/>world</th>
<th>Date</th>
<th><span class="nowrap">Source (official or from</span><br/>the <a href="/wiki/United_Nations" title="United Nations">United Nations</a>)</th>
<th class="unsortable">
</th></tr>
column_names = [th.get_text(strip=True) for th in rows[0].find_all("th")]
column_names
['',
'Country/Dependency',
'Population',
'% ofworld',
'Date',
'Source (official or fromtheUnited Nations)',
'']
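The column names above read '% ofworld' and 'Country/Dependency' because get_text(strip=True) glues the text pieces together without a separator. Passing separator=" " keeps the words apart; a sketch on a copy of two of those header cells (with the links simplified to href="#"):

```python
from bs4 import BeautifulSoup

header_cells = BeautifulSoup("""
<table><tr>
<th><a href="#">Country</a> / <a href="#">Dependency</a></th>
<th style="width:2em">% of<br/>world</th>
</tr></table>
""", "html.parser").find("tr").find_all("th")

# Without a separator, text pieces split by tags such as <br/> are glued together
glued = [th.get_text(strip=True) for th in header_cells]

# With separator=" ", each stripped text piece is joined by a single space
spaced = [th.get_text(separator=" ", strip=True) for th in header_cells]

print(glued)   # ['Country/Dependency', '% ofworld']
print(spaced)  # ['Country / Dependency', '% of world']
```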
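last_rank = 0
for row in rows[1:]:  # skip the header row
    cells = row.find_all(["th", "td"])
    if not cells:
        continue
    cells_text = [cell.get_text(strip=True) for cell in cells]
    # The first six columns are fixed; any trailing cells end up in `comment`
    rank, country, population, percentage, updated_at, source, *comment = cells_text
    # Unranked rows (e.g. World, Taiwan, Hong Kong) show "–" instead of a number;
    # they reuse the last rank seen (0 before the first country)
    if not rank.isdigit():
        rank = last_rank
    else:
        last_rank = rank
    rank = int(rank)
    # "8,063,588,000" -> 8063588000
    population = int(population.replace(",", ""))
    # "17.5%" -> 0.175
    percentage = float(re.findall(r"[\d\.]+", percentage)[0]) / 100
    # "4 Oct 2023" -> datetime.date(2023, 10, 4)
    updated_at = dateutil.parser.parse(updated_at).date()
    print(rank, country, f"{population:,.2e}", f"{percentage:.1%}", updated_at)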
0 World 8.06e+09 100.0% 2023-10-04
1 China 1.41e+09 17.5% 2022-12-31
2 India 1.39e+09 17.3% 2023-03-01
3 United States 3.35e+08 4.2% 2023-10-04
4 Indonesia 2.79e+08 3.5% 2023-07-01
5 Pakistan 2.41e+08 3.0% 2023-03-01
6 Nigeria 2.17e+08 2.7% 2022-03-21
7 Brazil 2.03e+08 2.5% 2022-08-01
8 Bangladesh 1.70e+08 2.1% 2022-06-14
9 Russia 1.46e+08 1.8% 2023-01-01
10 Mexico 1.29e+08 1.6% 2023-06-30
11 Japan 1.24e+08 1.5% 2023-09-01
12 Philippines 1.11e+08 1.4% 2023-10-04
13 Ethiopia 1.07e+08 1.3% 2023-07-01
14 Egypt 1.05e+08 1.3% 2023-10-04
15 Vietnam 1.00e+08 1.2% 2023-04-04
16 DR Congo 9.54e+07 1.2% 2019-07-01
17 Iran 8.53e+07 1.1% 2023-10-04
18 Turkey 8.53e+07 1.1% 2022-12-31
19 Germany 8.45e+07 1.0% 2023-06-30
20 Thailand 6.83e+07 0.8% 2021-07-01
21 France 6.82e+07 0.8% 2023-09-01
22 United Kingdom 6.70e+07 0.8% 2021-06-30
23 Tanzania 6.17e+07 0.8% 2022-08-23
24 South Africa 6.06e+07 0.8% 2022-07-01
25 Italy 5.88e+07 0.7% 2023-07-31
26 Myanmar 5.58e+07 0.7% 2022-07-01
27 Colombia 5.22e+07 0.6% 2023-06-30
28 Kenya 5.15e+07 0.6% 2023-01-01
29 South Korea 5.14e+07 0.6% 2022-12-31
30 Spain 4.83e+07 0.6% 2023-07-01
31 Argentina 4.67e+07 0.6% 2023-07-01
32 Algeria 4.54e+07 0.6% 2022-01-01
33 Iraq 4.33e+07 0.5% 2023-07-01
34 Uganda 4.29e+07 0.5% 2021-07-01
35 Sudan 4.20e+07 0.5% 2018-07-01
36 Ukraine 4.11e+07 0.5% 2022-02-01
37 Canada 4.04e+07 0.5% 2023-10-04
38 Poland 3.77e+07 0.5% 2023-07-31
39 Morocco 3.71e+07 0.5% 2023-10-04
40 Uzbekistan 3.64e+07 0.5% 2023-07-01
41 Afghanistan 3.43e+07 0.4% 2023-01-01
42 Peru 3.34e+07 0.4% 2022-07-01
43 Malaysia 3.34e+07 0.4% 2023-06-30
44 Angola 3.31e+07 0.4% 2022-06-30
45 Mozambique 3.24e+07 0.4% 2022-07-01
46 Saudi Arabia 3.22e+07 0.4% 2022-05-10
47 Yemen 3.19e+07 0.4% 2022-07-01
48 Ghana 3.08e+07 0.4% 2021-06-27
49 Ivory Coast 2.94e+07 0.4% 2021-12-14
50 Nepal 2.92e+07 0.4% 2021-11-25
51 Venezuela 2.83e+07 0.4% 2019-06-30
52 Cameroon 2.81e+07 0.3% 2023-07-01
53 Madagascar 2.69e+07 0.3% 2021-07-01
54 Australia 2.68e+07 0.3% 2023-10-04
55 North Korea 2.57e+07 0.3% 2021-07-01
56 Niger 2.54e+07 0.3% 2023-07-01
56 Taiwan 2.34e+07 0.3% 2023-08-31
57 Syria 2.29e+07 0.3% 2021-07-01
58 Mali 2.24e+07 0.3% 2022-06-15
59 Burkina Faso 2.22e+07 0.3% 2022-07-01
60 Sri Lanka 2.20e+07 0.3% 2023-07-01
61 Malawi 2.15e+07 0.3% 2022-07-01
62 Chile 2.00e+07 0.2% 2023-06-30
63 Kazakhstan 1.99e+07 0.2% 2023-09-01
64 Zambia 1.96e+07 0.2% 2022-09-14
65 Romania 1.91e+07 0.2% 2023-01-01
66 Ecuador 1.84e+07 0.2% 2023-10-04
67 Senegal 1.83e+07 0.2% 2023-07-01
68 Somalia 1.81e+07 0.2% 2023-07-01
69 Netherlands 1.79e+07 0.2% 2023-10-04
70 Guatemala 1.76e+07 0.2% 2023-07-01
71 Chad 1.74e+07 0.2% 2022-07-01
72 Cambodia 1.71e+07 0.2% 2023-07-01
73 Zimbabwe 1.52e+07 0.2% 2022-04-20
74 Guinea 1.33e+07 0.2% 2022-07-01
75 South Sudan 1.32e+07 0.2% 2020-07-01
76 Rwanda 1.32e+07 0.2% 2022-08-15
77 Burundi 1.28e+07 0.2% 2022-07-01
78 Benin 1.26e+07 0.2% 2023-07-01
79 Bolivia 1.20e+07 0.1% 2022-07-01
80 Tunisia 1.19e+07 0.1% 2023-01-01
81 Papua New Guinea 1.18e+07 0.1% 2021-07-01
82 Belgium 1.18e+07 0.1% 2023-08-01
83 Haiti 1.17e+07 0.1% 2020-07-01
84 Jordan 1.15e+07 0.1% 2023-10-04
85 Cuba 1.11e+07 0.1% 2022-12-31
86 Czech Republic 1.09e+07 0.1% 2023-06-30
87 Sweden 1.05e+07 0.1% 2023-08-01
88 Dominican Republic 1.05e+07 0.1% 2021-07-01
89 Greece 1.05e+07 0.1% 2021-10-22
90 Portugal 1.05e+07 0.1% 2022-12-31
91 Azerbaijan 1.02e+07 0.1% 2023-07-01
92 Tajikistan 1.01e+07 0.1% 2023-01-01
93 Israel 9.80e+06 0.1% 2023-07-31
94 Honduras 9.75e+06 0.1% 2023-07-01
95 Hungary 9.60e+06 0.1% 2023-01-01
96 United Arab Emirates 9.28e+06 0.1% 2020-12-31
97 Belarus 9.20e+06 0.1% 2023-01-01
98 Austria 9.13e+06 0.1% 2023-07-01
99 Switzerland 8.90e+06 0.1% 2023-06-30
100 Sierra Leone 8.49e+06 0.1% 2022-07-01
101 Togo 8.10e+06 0.1% 2022-11-08
101 Hong Kong(China) 7.50e+06 0.1% 2023-06-30
102 Laos 7.44e+06 0.1% 2022-07-01
103 Kyrgyzstan 7.10e+06 0.1% 2023-03-01
104 Turkmenistan 7.06e+06 0.1% 2022-12-17
105 Libya 6.93e+06 0.1% 2020-01-01
106 El Salvador 6.88e+06 0.1% 2022-07-01
107 Nicaragua 6.73e+06 0.1% 2022-06-30
108 Serbia 6.65e+06 0.1% 2022-10-31
109 Bulgaria 6.45e+06 0.1% 2022-12-31
110 Paraguay 6.11e+06 0.1% 2022-11-10
111 Congo 6.11e+06 0.1% 2023-07-01
112 Denmark 5.94e+06 0.1% 2023-07-01
113 Singapore 5.92e+06 0.1% 2023-06-30
114 Central African Republic 5.63e+06 0.1% 2020-07-01
115 Finland 5.56e+06 0.1% 2023-08-31
116 Norway 5.51e+06 0.1% 2023-06-30
117 Lebanon 5.49e+06 0.1% 2021-07-01
118 Palestine 5.48e+06 0.1% 2023-01-01
119 Slovakia 5.43e+06 0.1% 2023-06-30
120 Ireland 5.28e+06 0.1% 2023-04-01
121 Costa Rica 5.26e+06 0.1% 2023-06-30
122 New Zealand 5.22e+06 0.1% 2023-06-30
123 Oman 5.11e+06 0.1% 2023-08-31
124 Kuwait 4.67e+06 0.1% 2020-12-31
125 Liberia 4.66e+06 0.1% 2021-07-01
126 Mauritania 4.48e+06 0.1% 2023-07-01
127 Panama 4.34e+06 0.1% 2021-07-01
128 Croatia 3.86e+06 0.1% 2022-07-01
129 Eritrea 3.75e+06 0.1% 2023-07-01
130 Georgia 3.74e+06 0.1% 2023-01-01
131 Uruguay 3.57e+06 0.0% 2023-06-30
132 Mongolia 3.46e+06 0.0% 2022-12-31
133 Bosnia and Herzegovina 3.28e+06 0.0% 2022-07-01
133 Puerto Rico(US) 3.22e+06 0.0% 2022-07-01
134 Armenia 2.98e+06 0.0% 2023-01-01
135 Lithuania 2.87e+06 0.0% 2023-09-01
136 Jamaica 2.83e+06 0.0% 2019-07-01
137 Albania 2.76e+06 0.0% 2023-01-01
138 Qatar 2.66e+06 0.0% 2023-06-30
139 Namibia 2.64e+06 0.0% 2023-07-01
140 Moldova 2.51e+06 0.0% 2023-01-01
141 Gambia 2.42e+06 0.0% 2022-07-01
142 Botswana 2.41e+06 0.0% 2021-07-01
143 Lesotho 2.31e+06 0.0% 2023-07-01
144 Gabon 2.23e+06 0.0% 2021-07-01
145 Slovenia 2.12e+06 0.0% 2023-04-01
146 Latvia 1.88e+06 0.0% 2023-08-01
147 North Macedonia 1.83e+06 0.0% 2021-11-01
148 Guinea-Bissau 1.78e+06 0.0% 2023-07-01
148 Kosovo 1.77e+06 0.0% 2021-12-31
149 Bahrain 1.58e+06 0.0% 2023-07-01
150 Equatorial Guinea 1.56e+06 0.0% 2022-07-01
151 Estonia 1.37e+06 0.0% 2023-01-01
152 Trinidad and Tobago 1.37e+06 0.0% 2022-06-30
153 East Timor 1.35e+06 0.0% 2023-07-01
154 Mauritius 1.26e+06 0.0% 2023-06-30
155 Eswatini 1.22e+06 0.0% 2023-07-01
156 Djibouti 1.00e+06 0.0% 2022-07-01
157 Cyprus 9.18e+05 0.0% 2021-10-01
158 Fiji 8.93e+05 0.0% 2021-07-01
159 Bhutan 7.70e+05 0.0% 2023-10-04
160 Comoros 7.58e+05 0.0% 2017-12-15
161 Guyana 7.44e+05 0.0% 2019-07-01
162 Solomon Islands 7.35e+05 0.0% 2023-07-01
162 Macau(China) 6.79e+05 0.0% 2023-06-30
163 Luxembourg 6.61e+05 0.0% 2023-01-01
164 Montenegro 6.17e+05 0.0% 2023-01-01
165 Suriname 6.16e+05 0.0% 2021-07-01
165 Western Sahara 5.87e+05 0.0% 2023-07-01
166 Malta 5.20e+05 0.0% 2021-11-21
167 Cape Verde 4.91e+05 0.0% 2021-06-16
168 Brunei 4.45e+05 0.0% 2022-07-01
169 Belize 4.41e+05 0.0% 2022-07-01
170 Bahamas 3.97e+05 0.0% 2022-07-01
171 Iceland 3.94e+05 0.0% 2023-07-01
171 Northern Cyprus 3.83e+05 0.0% 2020-12-31
172 Maldives 3.83e+05 0.0% 2022-09-13
172 Transnistria 3.61e+05 0.0% 2022-12-31
173 Vanuatu 3.01e+05 0.0% 2021-07-01
173 French Polynesia(France) 2.80e+05 0.0% 2021-07-01
173 New Caledonia(France) 2.69e+05 0.0% 2023-01-01
174 Barbados 2.68e+05 0.0% 2022-12-31
174 Abkhazia 2.45e+05 0.0% 2020-01-01
175 São Tomé and Príncipe 2.15e+05 0.0% 2021-07-01
176 Samoa 2.06e+05 0.0% 2021-11-06
177 Saint Lucia 1.79e+05 0.0% 2018-07-01
177 Guam(US) 1.54e+05 0.0% 2020-04-01
177 Curacao(Netherlands) 1.49e+05 0.0% 2023-01-01
177 Artsakh 1.49e+05 0.0% 2019-10-01
178 Kiribati 1.21e+05 0.0% 2021-07-01
179 Grenada 1.13e+05 0.0% 2019-07-01
180 Saint Vincent and the Grenadines 1.11e+05 0.0% 2022-07-01
180 Aruba(Netherlands) 1.07e+05 0.0% 2022-09-30
181 Micronesia 1.06e+05 0.0% 2021-07-01
181 Jersey(UK) 1.03e+05 0.0% 2021-03-21
182 Antigua and Barbuda 1.01e+05 0.0% 2022-01-01
183 Seychelles 1.00e+05 0.0% 2022-04-22
184 Tonga 1.00e+05 0.0% 2022-01-01
184 US Virgin Islands(US) 8.71e+04 0.0% 2020-04-01
184 Isle of Man(UK) 8.41e+04 0.0% 2021-05-30
185 Andorra 8.35e+04 0.0% 2023-06-30
185 Cayman Islands(UK) 7.11e+04 0.0% 2020-09-30
186 Dominica 6.74e+04 0.0% 2017-12-31
186 Guernsey(UK) 6.42e+04 0.0% 2022-09-30
186 Bermuda(UK) 6.41e+04 0.0% 2021-07-01
186 Greenland(Denmark) 5.69e+04 0.0% 2023-07-01
186 South Ossetia 5.65e+04 0.0% 2021-12-31
186 Faroe Islands(Denmark) 5.47e+04 0.0% 2023-08-01
186 American Samoa(US) 4.97e+04 0.0% 2020-04-01
186 Northern Mariana Islands(US) 4.73e+04 0.0% 2020-04-01
187 Saint Kitts and Nevis 4.72e+04 0.0% 2011-05-15
187 Turks and Caicos Islands(UK) 4.61e+04 0.0% 2021-07-01
187 Sint Maarten(Netherlands) 4.29e+04 0.0% 2023-01-01
188 Marshall Islands 4.24e+04 0.0% 2021-09-30
189 Liechtenstein 3.97e+04 0.0% 2022-12-31
190 Monaco 3.90e+04 0.0% 2022-12-31
190 Gibraltar(UK) 3.40e+04 0.0% 2016-07-01
191 San Marino 3.39e+04 0.0% 2023-07-31
191 Saint Martin(France) 3.24e+04 0.0% 2020-01-01
191 British Virgin Islands(UK) 3.15e+04 0.0% 2023-07-01
191 Åland(Finland) 3.06e+04 0.0% 2023-08-31
192 Palau 1.67e+04 0.0% 2021-07-01
192 Anguilla(UK) 1.57e+04 0.0% 2021-12-31
192 Cook Islands 1.50e+04 0.0% 2021-07-01
193 Nauru 1.18e+04 0.0% 2021-07-01
193 Wallis and Futuna(France) 1.14e+04 0.0% 2021-01-01
194 Tuvalu 1.07e+04 0.0% 2021-07-01
194 Saint Barthélemy(France) 1.06e+04 0.0% 2020-01-01
194 Saint Pierre and Miquelon(France) 6.09e+03 0.0% 2020-01-01
194 Saint Helena, Ascension and Tristan da Cunha(UK) 5.65e+03 0.0% 2021-07-01
194 Montserrat(UK) 4.43e+03 0.0% 2022-07-01
194 Falkland Islands(UK) 3.66e+03 0.0% 2021-10-10
194 Norfolk Island(Australia) 2.19e+03 0.0% 2021-01-01
194 Christmas Island(Australia) 1.69e+03 0.0% 2021-01-01
194 Tokelau(NZ) 1.65e+03 0.0% 2019-01-01
194 Niue 1.55e+03 0.0% 2021-07-01
195 Vatican City 7.64e+02 0.0% 2023-06-26
195 Cocos (Keeling) Islands(Australia) 5.93e+02 0.0% 2020-06-30
195 Pitcairn Islands(UK) 4.70e+01 0.0% 2021-07-01
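The loop above does all cell conversions inline. One possible refactoring, sketched with hypothetical helper names (strip_footnotes, parse_population, parse_percentage, parse_date), moves each conversion into a small function that can be tested on its own. As an assumption, it swaps dateutil.parser.parse for the stricter standard-library datetime.strptime with the format "%d %b %Y", which only accepts dates written like 4 Oct 2023 and relies on English month abbreviations (the default C locale). The sample values are the World row shown earlier:

```python
import re
from datetime import date, datetime
from typing import Optional


def strip_footnotes(text: str) -> str:
    """Remove citation markers such as '[3]' left over by get_text()."""
    return re.sub(r"\[[^\]]*\]", "", text).strip()


def parse_population(text: str) -> int:
    """'8,063,588,000' -> 8063588000"""
    return int(text.replace(",", ""))


def parse_percentage(text: str) -> Optional[float]:
    """'17.5%' -> 0.175; returns None when the cell holds no number."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group()) / 100 if match else None


def parse_date(text: str) -> date:
    """'4 Oct 2023' -> date(2023, 10, 4); assumes day, English month abbreviation, year."""
    return datetime.strptime(text, "%d %b %Y").date()


# The World row from the table, as returned by get_text(strip=True)
world = {
    "population": parse_population("8,063,588,000"),
    "share": parse_percentage("100%"),
    "date": parse_date("4 Oct 2023"),
    "source": strip_footnotes("UN projection[3]"),
}
print(world)
# {'population': 8063588000, 'share': 1.0, 'date': datetime.date(2023, 10, 4), 'source': 'UN projection'}
```

Unlike the re.findall(...)[0] expression in the loop, parse_percentage returns None for a cell without a number instead of raising an IndexError.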
Attention: Beautiful Soup does not execute JavaScript. The HTML you see in your browser's inspector reflects the page after its scripts have run, so it may differ from the source that requests downloads and Beautiful Soup parses.
Another example of downloading a Wikipedia table
Let’s consider another table on a Wikipedia page. This page contains many more tables, so one challenge will be to pick the right one.
https://en.wikipedia.org/wiki/Serena_Williams
We are interested in extracting two specific tables from this page.
Exercise:
Find the tables on the page by locating their headings and using .find_next().
We begin by downloading the webpage and instantiating the BeautifulSoup object:
r = requests.get("https://en.wikipedia.org/wiki/Serena_Williams")
document = BeautifulSoup(r.text, "html.parser")
This page contains many tables without distinctive attributes that would make it easy to find our table of interest. Moreover, the same column headings are used in several tables, so a table cannot be identified by its headings alone:
len(document.find_all("table"))
75
Therefore, we choose another strategy.
First, we find the tag with class mw-headline whose string content starts with Singles.
Then we find the next table using heading_element.find_next(...):
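The two-step strategy can be tried offline on a made-up miniature of the page: the mw-headline spans mimic Wikipedia's heading markup, but the headings, tallies and tables are invented. table_after_heading is a hypothetical helper name:

```python
import re
from bs4 import BeautifulSoup

# Made-up miniature page: each table comes somewhere after its heading span
demo_doc = BeautifulSoup("""
<h3><span class="mw-headline" id="Singles">Singles: 2 (1–1)</span></h3>
<p>Some text between the heading and the table.</p>
<table class="wikitable"><tr><th>Result</th><th>Year</th></tr></table>
<h3><span class="mw-headline" id="Doubles">Doubles: 1 (1–0)</span></h3>
<table class="wikitable"><tr><th>Result</th><th>Year</th><th>Partner</th></tr></table>
""", "html.parser")


def table_after_heading(doc, pattern):
    """Return the first <table> after the mw-headline whose text matches pattern."""
    heading = doc.find(class_="mw-headline", string=re.compile(pattern))
    if heading is None:
        return None
    # find_next walks forward in document order, so the table does not have
    # to be a sibling of the heading span
    return heading.find_next("table")


singles = table_after_heading(demo_doc, r"^Singles")
doubles = table_after_heading(demo_doc, r"^Doubles")

print([th.get_text(strip=True) for th in singles.find_all("th")])  # ['Result', 'Year']
print([th.get_text(strip=True) for th in doubles.find_all("th")])  # ['Result', 'Year', 'Partner']
```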
document.find_all(class_="mw-headline", string=re.compile("^Singles"))
[<span class="mw-headline" id="Singles:_33_(23–10)">Singles: 33 (23–10)</span>]
# match tags by CSS class and by text content (string)
singles_heading = document.find(class_="mw-headline", string=re.compile("^Singles"))
singles_heading
<span class="mw-headline" id="Singles:_33_(23–10)">Singles: 33 (23–10)</span>
singles_heading.find_next("table")
<table class="sortable wikitable">
<tbody><tr>
<th>Result
</th>
<th>Year
</th>
<th>Tournament
</th>
<th>Surface
</th>
<th>Opponents
</th>
<th class="unsortable">Score
</th></tr>
<tr style="background:#ccf;">
<td style="background:#98fb98;">Win</td>
<td><a href="/wiki/1999_US_Open_%E2%80%93_Women%27s_singles" title="1999 US Open – Women's singles">1999</a></td>
<td><a href="/wiki/US_Open_(tennis)" title="US Open (tennis)">US Open</a></td>
<td><a class="mw-redirect" href="/wiki/Hard_court" title="Hard court">Hard</a></td>
<td><span class="flagicon"><span class="mw-image-border" typeof="mw:File"><a href="/wiki/Switzerland" title="Switzerland"><img alt="Switzerland" class="mw-file-element" data-file-height="512" data-file-width="512" decoding="async" height="16" src="//upload.wikimedia.org/wikipedia/commons/thumb/0/08/Flag_of_Switzerland_%28Pantone%29.svg/16px-Flag_of_Switzerland_%28Pantone%29.svg.png" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/0/08/Flag_of_Switzerland_%28Pantone%29.svg/24px-Flag_of_Switzerland_%28Pantone%29.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/0/08/Flag_of_Switzerland_%28Pantone%29.svg/32px-Flag_of_Switzerland_%28Pantone%29.svg.png 2x" width="16"/></a></span></span> <a href="/wiki/Martina_Hingis" title="Martina Hingis">Martina Hingis</a></td>
<td>6–3, 7–6<sup>(7–4)</sup>
</td></tr>
<tr style="background:#ccf;">
<td style="background:#ffa07a;">Loss</td>
<td><a href="/wiki/2001_US_Open_%E2%80%93_Women%27s_singles" title="2001 US Open – Women's singles">2001</a></td>
<td>US Open</td>
<td>Hard</td>
<td><span class="flagicon"><span class="mw-image-border" typeof="mw:File"><a href="/wiki/United_States" title="United States"><img alt="United States" class="mw-file-element" data-file-height="650" data-file-width="1235" decoding="async" height="12" src="//upload.wikimedia.org/wikipedia/en/thumb/a/a4/Flag_of_the_United_States.svg/23px-Flag_of_the_United_States.svg.png" srcset="//upload.wikimedia.org/wikipedia/en/thumb/a/a4/Flag_of_the_United_States.svg/35px-Flag_of_the_United_States.svg.png 1.5x, //upload.wikimedia.org/wikipedia/en/thumb/a/a4/Flag_of_the_United_States.svg/46px-Flag_of_the_United_States.svg.png 2x" width="23"/></a></span></span> <a href="/wiki/Venus_Williams" title="Venus Williams">Venus Williams</a></td>
<td>2–6, 4–6
</td></tr>
<tr bgcolor="#ebc2af" style="border: 2px solid blue">
<td style="background:#98fb98;">Win</td>
<td><a href="/wiki/2002_French_Open_%E2%80%93_Women%27s_singles" title="2002 French Open – Women's singles">2002</a></td>
<td>French Open</td>
<td><a href="/wiki/Clay_court" title="Clay court">Clay</a></td>
<td><span class="flagicon"><span class="mw-image-border" typeof="mw:File"><a href="/wiki/United_States" title="United States"><img alt="United States" class="mw-file-element" data-file-height="650" data-file-width="1235" decoding="async" height="12" src="//upload.wikimedia.org/wikipedia/en/thumb/a/a4/Flag_of_the_United_States.svg/23px-Flag_of_the_United_States.svg.png" srcset="//upload.wikimedia.org/wikipedia/en/thumb/a/a4/Flag_of_the_United_States.svg/35px-Flag_of_the_United_States.svg.png 1.5x, //upload.wikimedia.org/wikipedia/en/thumb/a/a4/Flag_of_the_United_States.svg/46px-Flag_of_the_United_States.svg.png 2x" width="23"/></a></span></span> Venus Williams</td>
<td>7–5, 6–3
</td></tr>
<tr bgcolor="#cfc" style="border: 2px solid blue">
<td style="background:#98fb98;">Win</td>
<td><a href="/wiki/2002_Wimbledon_Championships_%E2%80%93_Women%27s_singles" title="2002 Wimbledon Championships – Women's singles">2002</a></td>
<td><a class="mw-redirect" href="/wiki/The_Championships,_Wimbledon" title="The Championships, Wimbledon">Wimbledon</a></td>
<td><a href="/wiki/Grass_court" title="Grass court">Grass</a></td>
<td><span class="flagicon"><span class="mw-image-border" typeof="mw:File"><a href="/wiki/United_States" title="United States"><img alt="United States" class="mw-file-element" data-file-height="650" data-file-width="1235" decoding="async" height="12" src="//upload.wikimedia.org/wikipedia/en/thumb/a/a4/Flag_of_the_United_States.svg/23px-Flag_of_the_United_States.svg.png" srcset="//upload.wikimedia.org/wikipedia/en/thumb/a/a4/Flag_of_the_United_States.svg/35px-Flag_of_the_United_States.svg.png 1.5x, //upload.wikimedia.org/wikipedia/en/thumb/a/a4/Flag_of_the_United_States.svg/46px-Flag_of_the_United_States.svg.png 2x" width="23"/></a></span></span> Venus Williams</td>
<td>7–6<sup>(7–4)</sup>, 6–3
</td></tr>
<tr bgcolor="#ccf" style="border: 2px solid blue">
<td style="background:#98fb98;">Win</td>
<td><a href="/wiki/2002_US_Open_%E2%80%93_Women%27s_singles" title="2002 US Open – Women's singles">2002</a></td>
<td>US Open <small>(2)</small></td>
<td>Hard</td>
<td><span class="flagicon"><span class="mw-image-border" typeof="mw:File"><a href="/wiki/United_States" title="United States"><img alt="United States" class="mw-file-element" data-file-height="650" data-file-width="1235" decoding="async" height="12" src="//upload.wikimedia.org/wikipedia/en/thumb/a/a4/Flag_of_the_United_States.svg/23px-Flag_of_the_United_States.svg.png" srcset="//upload.wikimedia.org/wikipedia/en/thumb/a/a4/Flag_of_the_United_States.svg/35px-Flag_of_the_United_States.svg.png 1.5x, //upload.wikimedia.org/wikipedia/en/thumb/a/a4/Flag_of_the_United_States.svg/46px-Flag_of_the_United_States.svg.png 2x" width="23"/></a></span></span> Venus Williams</td>
<td>6–4, 6–3
</td></tr>
<tr bgcolor="#ffc" style="border: 2px solid blue">
<td style="background:#98fb98;">Win</td>
<td><a href="/wiki/2003_Australian_Open_%E2%80%93_Women%27s_singles" title="2003 Australian Open – Women's singles">2003</a></td>
<td>Australian Open</td>
<td>Hard</td>
<td><span class="flagicon"><span class="mw-image-border" typeof="mw:File"><a href="/wiki/United_States" title="United States"><img alt="United States" class="mw-file-element" data-file-height="650" data-file-width="1235" decoding="async" height="12" src="//upload.wikimedia.org/wikipedia/en/thumb/a/a4/Flag_of_the_United_States.svg/23px-Flag_of_the_United_States.svg.png" srcset="//upload.wikimedia.org/wikipedia/en/thumb/a/a4/Flag_of_the_United_States.svg/35px-Flag_of_the_United_States.svg.png 1.5x, //upload.wikimedia.org/wikipedia/en/thumb/a/a4/Flag_of_the_United_States.svg/46px-Flag_of_the_United_States.svg.png 2x" width="23"/></a></span></span> Venus Williams</td>
<td>7–6<sup>(7–4)</sup>, 3–6, 6–4
</td></tr>
<tr style="background:#cfc;">
<td style="background:#98fb98;">Win</td>
<td><a href="/wiki/2003_Wimbledon_Championships_%E2%80%93_Women%27s_singles" title="2003 Wimbledon Championships – Women's singles">2003</a></td>
<td>Wimbledon <small>(2)</small></td>
<td>Grass</td>
<td><span class="flagicon"><span class="mw-image-border" typeof="mw:File"><a href="/wiki/United_States" title="United States"><img alt="United States" class="mw-file-element" data-file-height="650" data-file-width="1235" decoding="async" height="12" src="//upload.wikimedia.org/wikipedia/en/thumb/a/a4/Flag_of_the_United_States.svg/23px-Flag_of_the_United_States.svg.png" srcset="//upload.wikimedia.org/wikipedia/en/thumb/a/a4/Flag_of_the_United_States.svg/35px-Flag_of_the_United_States.svg.png 1.5x, //upload.wikimedia.org/wikipedia/en/thumb/a/a4/Flag_of_the_United_States.svg/46px-Flag_of_the_United_States.svg.png 2x" width="23"/></a></span></span> Venus Williams</td>
<td>4–6, 6–4, 6–2
</td></tr>
<tr style="background:#cfc;">
<td style="background:#ffa07a;">Loss</td>
<td><a href="/wiki/2004_Wimbledon_Championships_%E2%80%93_Women%27s_singles" title="2004 Wimbledon Championships – Women's singles">2004</a></td>
<td>Wimbledon</td>
<td>Grass</td>
<td><span class="flagicon"><span class="mw-image-border" typeof="mw:File"><a href="/wiki/Russia" title="Russia"><img alt="Russia" class="mw-file-element" data-file-height="600" data-file-width="900" decoding="async" height="15" src="//upload.wikimedia.org/wikipedia/en/thumb/f/f3/Flag_of_Russia.svg/23px-Flag_of_Russia.svg.png" srcset="//upload.wikimedia.org/wikipedia/en/thumb/f/f3/Flag_of_Russia.svg/35px-Flag_of_Russia.svg.png 1.5x, //upload.wikimedia.org/wikipedia/en/thumb/f/f3/Flag_of_Russia.svg/45px-Flag_of_Russia.svg.png 2x" width="23"/></a></span></span> <a href="/wiki/Maria_Sharapova" title="Maria Sharapova">Maria Sharapova</a></td>
<td>1–6, 4–6
</td></tr>
<tr style="background:#ffc;">
<td style="background:#98fb98;">Win</td>
<td><a href="/wiki/2005_Australian_Open_%E2%80%93_Women%27s_singles" title="2005 Australian Open – Women's singles">2005</a></td>
<td>Australian Open <small>(2)</small></td>
<td>Hard</td>
<td><span class="flagicon"><span class="mw-image-border" typeof="mw:File"><a href="/wiki/United_States" title="United States"><img alt="United States" class="mw-file-element" data-file-height="650" data-file-width="1235" decoding="async" height="12" src="//upload.wikimedia.org/wikipedia/en/thumb/a/a4/Flag_of_the_United_States.svg/23px-Flag_of_the_United_States.svg.png" srcset="//upload.wikimedia.org/wikipedia/en/thumb/a/a4/Flag_of_the_United_States.svg/35px-Flag_of_the_United_States.svg.png 1.5x, //upload.wikimedia.org/wikipedia/en/thumb/a/a4/Flag_of_the_United_States.svg/46px-Flag_of_the_United_States.svg.png 2x" width="23"/></a></span></span> <a href="/wiki/Lindsay_Davenport" title="Lindsay Davenport">Lindsay Davenport</a></td>
<td>2–6, 6–3, 6–0
</td></tr>
<tr style="background:#ffc;">
<td style="background:#98fb98;">Win</td>
<td><a href="/wiki/2007_Australian_Open_%E2%80%93_Women%27s_singles" title="2007 Australian Open – Women's singles">2007</a></td>
<td>Australian Open <small>(3)</small></td>
<td>Hard</td>
<td><span class="flagicon"><span class="mw-image-border" typeof="mw:File"><a href="/wiki/Russia" title="Russia"><img alt="Russia" class="mw-file-element" data-file-height="600" data-file-width="900" decoding="async" height="15" src="//upload.wikimedia.org/wikipedia/en/thumb/f/f3/Flag_of_Russia.svg/23px-Flag_of_Russia.svg.png" srcset="//upload.wikimedia.org/wikipedia/en/thumb/f/f3/Flag_of_Russia.svg/35px-Flag_of_Russia.svg.png 1.5x, //upload.wikimedia.org/wikipedia/en/thumb/f/f3/Flag_of_Russia.svg/45px-Flag_of_Russia.svg.png 2x" width="23"/></a></span></span> Maria Sharapova</td>
<td>6–1, 6–2
</td></tr>
<tr style="background:#cfc;">
<td style="background:#ffa07a;">Loss</td>
<td><a href="/wiki/2008_Wimbledon_Championships_%E2%80%93_Women%27s_singles" title="2008 Wimbledon Championships – Women's singles">2008</a></td>
<td>Wimbledon</td>
<td>Grass</td>
<td><span class="flagicon"><span class="mw-image-border" typeof="mw:File"><a href="/wiki/United_States" title="United States"><img alt="United States" class="mw-file-element" data-file-height="650" data-file-width="1235" decoding="async" height="12" src="//upload.wikimedia.org/wikipedia/en/thumb/a/a4/Flag_of_the_United_States.svg/23px-Flag_of_the_United_States.svg.png" srcset="//upload.wikimedia.org/wikipedia/en/thumb/a/a4/Flag_of_the_United_States.svg/35px-Flag_of_the_United_States.svg.png 1.5x, //upload.wikimedia.org/wikipedia/en/thumb/a/a4/Flag_of_the_United_States.svg/46px-Flag_of_the_United_States.svg.png 2x" width="23"/></a></span></span> Venus Williams</td>
<td>5–7, 4–6
</td></tr>
<tr style="background:#ccf;">
<td style="background:#98fb98;">Win</td>
<td><a href="/wiki/2008_US_Open_%E2%80%93_Women%27s_singles" title="2008 US Open – Women's singles">2008</a></td>
<td>US Open <small>(3)</small></td>
<td>Hard</td>
<td><span class="flagicon"><span class="mw-image-border" typeof="mw:File"><a href="/wiki/Serbia" title="Serbia"><img alt="Serbia" class="mw-file-element" data-file-height="600" data-file-width="900" decoding="async" height="15" src="//upload.wikimedia.org/wikipedia/commons/thumb/0/00/Flag_of_Serbia_%282004%E2%80%932010%29.svg/23px-Flag_of_Serbia_%282004%E2%80%932010%29.svg.png" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/0/00/Flag_of_Serbia_%282004%E2%80%932010%29.svg/35px-Flag_of_Serbia_%282004%E2%80%932010%29.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/0/00/Flag_of_Serbia_%282004%E2%80%932010%29.svg/45px-Flag_of_Serbia_%282004%E2%80%932010%29.svg.png 2x" width="23"/></a></span></span> <a href="/wiki/Jelena_Jankovi%C4%87" title="Jelena Janković">Jelena Janković</a></td>
<td>6–4, 7–5
</td></tr>
<tr style="background:#ffc;">
<td style="background:#98fb98;">Win</td>
<td><a href="/wiki/2009_Australian_Open_%E2%80%93_Women%27s_singles" title="2009 Australian Open – Women's singles">2009</a></td>
<td>Australian Open <small>(4)</small></td>
<td>Hard</td>
<td><span class="flagicon"><span class="mw-image-border" typeof="mw:File"><a href="/wiki/Russia" title="Russia"><img alt="Russia" class="mw-file-element" data-file-height="600" data-file-width="900" decoding="async" height="15" src="//upload.wikimedia.org/wikipedia/en/thumb/f/f3/Flag_of_Russia.svg/23px-Flag_of_Russia.svg.png" srcset="//upload.wikimedia.org/wikipedia/en/thumb/f/f3/Flag_of_Russia.svg/35px-Flag_of_Russia.svg.png 1.5x, //upload.wikimedia.org/wikipedia/en/thumb/f/f3/Flag_of_Russia.svg/45px-Flag_of_Russia.svg.png 2x" width="23"/></a></span></span> <a href="/wiki/Dinara_Safina" title="Dinara Safina">Dinara Safina</a></td>
<td>6–0, 6–3
</td></tr>
<tr style="background:#cfc;">
<td style="background:#98fb98;">Win</td>
<td><a href="/wiki/2009_Wimbledon_Championships_%E2%80%93_Women%27s_singles" title="2009 Wimbledon Championships – Women's singles">2009</a></td>
<td>Wimbledon <small>(3)</small></td>
<td>Grass</td>
<td><span class="flagicon"><span class="mw-image-border" typeof="mw:File"><a href="/wiki/United_States" title="United States"><img alt="United States" class="mw-file-element" data-file-height="650" data-file-width="1235" decoding="async" height="12" src="//upload.wikimedia.org/wikipedia/en/thumb/a/a4/Flag_of_the_United_States.svg/23px-Flag_of_the_United_States.svg.png" srcset="//upload.wikimedia.org/wikipedia/en/thumb/a/a4/Flag_of_the_United_States.svg/35px-Flag_of_the_United_States.svg.png 1.5x, //upload.wikimedia.org/wikipedia/en/thumb/a/a4/Flag_of_the_United_States.svg/46px-Flag_of_the_United_States.svg.png 2x" width="23"/></a></span></span> Venus Williams</td>
<td>7–6<sup>(7–3)</sup>, 6–2
</td></tr>
<tr style="background:#ffc;">
<td style="background:#98fb98;">Win</td>
<td><a href="/wiki/2010_Australian_Open_%E2%80%93_Women%27s_singles" title="2010 Australian Open – Women's singles">2010</a></td>
<td>Australian Open <small>(5)</small></td>
<td>Hard</td>
<td><span class="flagicon"><span class="mw-image-border" typeof="mw:File"><a href="/wiki/Belgium" title="Belgium"><img alt="Belgium" class="mw-file-element" data-file-height="600" data-file-width="900" decoding="async" height="15" src="//upload.wikimedia.org/wikipedia/commons/thumb/9/92/Flag_of_Belgium_%28civil%29.svg/23px-Flag_of_Belgium_%28civil%29.svg.png" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/9/92/Flag_of_Belgium_%28civil%29.svg/35px-Flag_of_Belgium_%28civil%29.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/9/92/Flag_of_Belgium_%28civil%29.svg/45px-Flag_of_Belgium_%28civil%29.svg.png 2x" width="23"/></a></span></span> <a href="/wiki/Justine_Henin" title="Justine Henin">Justine Henin</a></td>
<td>6–4, 3–6, 6–2
</td></tr>
<tr style="background:#cfc;">
<td style="background:#98fb98;">Win</td>
<td><a href="/wiki/2010_Wimbledon_Championships_%E2%80%93_Women%27s_singles" title="2010 Wimbledon Championships – Women's singles">2010</a></td>
<td>Wimbledon <small>(4)</small></td>
<td>Grass</td>
<td><span class="flagicon"><span class="mw-image-border" typeof="mw:File"><a href="/wiki/Russia" title="Russia"><img alt="Russia" class="mw-file-element" data-file-height="600" data-file-width="900" decoding="async" height="15" src="//upload.wikimedia.org/wikipedia/en/thumb/f/f3/Flag_of_Russia.svg/23px-Flag_of_Russia.svg.png" srcset="//upload.wikimedia.org/wikipedia/en/thumb/f/f3/Flag_of_Russia.svg/35px-Flag_of_Russia.svg.png 1.5x, //upload.wikimedia.org/wikipedia/en/thumb/f/f3/Flag_of_Russia.svg/45px-Flag_of_Russia.svg.png 2x" width="23"/></a></span></span> <a href="/wiki/Vera_Zvonareva" title="Vera Zvonareva">Vera Zvonareva</a></td>
<td>6–3, 6–2
</td></tr>
<tr style="background:#ccf;">
<td style="background:#ffa07a;">Loss</td>
<td><a href="/wiki/2011_US_Open_%E2%80%93_Women%27s_singles" title="2011 US Open – Women's singles">2011</a></td>
<td>US Open</td>
<td>Hard</td>
<td><span class="flagicon"><span class="mw-image-border" typeof="mw:File"><a href="/wiki/Australia" title="Australia"><img alt="Australia" class="mw-file-element" data-file-height="640" data-file-width="1280" decoding="async" height="12" src="//upload.wikimedia.org/wikipedia/commons/thumb/8/88/Flag_of_Australia_%28converted%29.svg/23px-Flag_of_Australia_%28converted%29.svg.png" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/8/88/Flag_of_Australia_%28converted%29.svg/35px-Flag_of_Australia_%28converted%29.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/8/88/Flag_of_Australia_%28converted%29.svg/46px-Flag_of_Australia_%28converted%29.svg.png 2x" width="23"/></a></span></span> <a href="/wiki/Samantha_Stosur" title="Samantha Stosur">Samantha Stosur</a></td>
<td>2–6, 3–6
</td></tr>
<tr style="background:#cfc;">
<td style="background:#98fb98;">Win</td>
<td><a href="/wiki/2012_Wimbledon_Championships_%E2%80%93_Women%27s_singles" title="2012 Wimbledon Championships – Women's singles">2012</a></td>
<td>Wimbledon <small>(5)</small></td>
<td>Grass</td>
<td><span class="flagicon"><span class="mw-image-border" typeof="mw:File"><a href="/wiki/Poland" title="Poland"><img alt="Poland" class="mw-file-element" data-file-height="800" data-file-width="1280" decoding="async" height="14" src="//upload.wikimedia.org/wikipedia/en/thumb/1/12/Flag_of_Poland.svg/23px-Flag_of_Poland.svg.png" srcset="//upload.wikimedia.org/wikipedia/en/thumb/1/12/Flag_of_Poland.svg/35px-Flag_of_Poland.svg.png 1.5x, //upload.wikimedia.org/wikipedia/en/thumb/1/12/Flag_of_Poland.svg/46px-Flag_of_Poland.svg.png 2x" width="23"/></a></span></span> <a href="/wiki/Agnieszka_Radwa%C5%84ska" title="Agnieszka Radwańska">Agnieszka Radwańska</a></td>
<td>6–1, 5–7, 6–2
</td></tr>
<tr style="background:#ccf;">
<td style="background:#98fb98;">Win</td>
<td><a href="/wiki/2012_US_Open_%E2%80%93_Women%27s_singles" title="2012 US Open – Women's singles">2012</a></td>
<td>US Open <small>(4)</small></td>
<td>Hard</td>
<td><span class="flagicon"><span class="mw-image-border" typeof="mw:File"><a href="/wiki/Belarus" title="Belarus"><img alt="Belarus" class="mw-file-element" data-file-height="600" data-file-width="1200" decoding="async" height="12" src="//upload.wikimedia.org/wikipedia/commons/thumb/8/85/Flag_of_Belarus.svg/23px-Flag_of_Belarus.svg.png" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/8/85/Flag_of_Belarus.svg/35px-Flag_of_Belarus.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/8/85/Flag_of_Belarus.svg/46px-Flag_of_Belarus.svg.png 2x" width="23"/></a></span></span> <a href="/wiki/Victoria_Azarenka" title="Victoria Azarenka">Victoria Azarenka</a></td>
<td>6–2, 2–6, 7–5
</td></tr>
<tr style="background:#ebc2af;">
<td style="background:#98fb98;">Win</td>
<td><a href="/wiki/2013_French_Open_%E2%80%93_Women%27s_singles" title="2013 French Open – Women's singles">2013</a></td>
<td>French Open <small>(2)</small></td>
<td>Clay</td>
<td><span class="flagicon"><span class="mw-image-border" typeof="mw:File"><a href="/wiki/Russia" title="Russia"><img alt="Russia" class="mw-file-element" data-file-height="600" data-file-width="900" decoding="async" height="15" src="//upload.wikimedia.org/wikipedia/en/thumb/f/f3/Flag_of_Russia.svg/23px-Flag_of_Russia.svg.png" srcset="//upload.wikimedia.org/wikipedia/en/thumb/f/f3/Flag_of_Russia.svg/35px-Flag_of_Russia.svg.png 1.5x, //upload.wikimedia.org/wikipedia/en/thumb/f/f3/Flag_of_Russia.svg/45px-Flag_of_Russia.svg.png 2x" width="23"/></a></span></span> Maria Sharapova</td>
<td>6–4, 6–4
</td></tr>
<tr style="background:#ccf;">
<td style="background:#98fb98;">Win</td>
<td><a href="/wiki/2013_US_Open_%E2%80%93_Women%27s_singles" title="2013 US Open – Women's singles">2013</a></td>
<td>US Open <small>(5)</small></td>
<td>Hard</td>
<td><span class="flagicon"><span class="mw-image-border" typeof="mw:File"><a href="/wiki/Belarus" title="Belarus"><img alt="Belarus" class="mw-file-element" data-file-height="600" data-file-width="1200" decoding="async" height="12" src="//upload.wikimedia.org/wikipedia/commons/thumb/8/85/Flag_of_Belarus.svg/23px-Flag_of_Belarus.svg.png" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/8/85/Flag_of_Belarus.svg/35px-Flag_of_Belarus.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/8/85/Flag_of_Belarus.svg/46px-Flag_of_Belarus.svg.png 2x" width="23"/></a></span></span> Victoria Azarenka</td>
<td>7–5, 6–7<sup>(6–8)</sup>, 6–1
</td></tr>
<tr bgcolor="#ccf" style="border: 2px solid blue">
<td style="background:#98fb98;">Win</td>
<td><a href="/wiki/2014_US_Open_%E2%80%93_Women%27s_singles" title="2014 US Open – Women's singles">2014</a></td>
<td>US Open <small>(6)</small></td>
<td>Hard</td>
<td><span class="flagicon"><span class="mw-image-border" typeof="mw:File"><a href="/wiki/Denmark" title="Denmark"><img alt="Denmark" class="mw-file-element" data-file-height="387" data-file-width="512" decoding="async" height="15" src="//upload.wikimedia.org/wikipedia/commons/thumb/9/9c/Flag_of_Denmark.svg/20px-Flag_of_Denmark.svg.png" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/9/9c/Flag_of_Denmark.svg/31px-Flag_of_Denmark.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/9/9c/Flag_of_Denmark.svg/40px-Flag_of_Denmark.svg.png 2x" width="20"/></a></span></span> <a href="/wiki/Caroline_Wozniacki" title="Caroline Wozniacki">Caroline Wozniacki</a></td>
<td>6–3, 6–3
</td></tr>
<tr bgcolor="#ffc" style="border: 2px solid blue">
<td style="background:#98fb98;">Win</td>
<td><a href="/wiki/2015_Australian_Open_%E2%80%93_Women%27s_singles" title="2015 Australian Open – Women's singles">2015</a></td>
<td>Australian Open <small>(6)</small></td>
<td>Hard</td>
<td><span class="flagicon"><span class="mw-image-border" typeof="mw:File"><a href="/wiki/Russia" title="Russia"><img alt="Russia" class="mw-file-element" data-file-height="600" data-file-width="900" decoding="async" height="15" src="//upload.wikimedia.org/wikipedia/en/thumb/f/f3/Flag_of_Russia.svg/23px-Flag_of_Russia.svg.png" srcset="//upload.wikimedia.org/wikipedia/en/thumb/f/f3/Flag_of_Russia.svg/35px-Flag_of_Russia.svg.png 1.5x, //upload.wikimedia.org/wikipedia/en/thumb/f/f3/Flag_of_Russia.svg/45px-Flag_of_Russia.svg.png 2x" width="23"/></a></span></span> Maria Sharapova</td>
<td>6–3, 7–6<sup>(7–5)</sup>
</td></tr>
<tr bgcolor="#ebc2af" style="border: 2px solid blue">
<td style="background:#98fb98;">Win</td>
<td><a href="/wiki/2015_French_Open_%E2%80%93_Women%27s_singles" title="2015 French Open – Women's singles">2015</a></td>
<td>French Open <small>(3)</small></td>
<td>Clay</td>
<td><span class="flagicon"><span class="mw-image-border" typeof="mw:File"><a href="/wiki/Czech_Republic" title="Czech Republic"><img alt="Czech Republic" class="mw-file-element" data-file-height="600" data-file-width="900" decoding="async" height="15" src="//upload.wikimedia.org/wikipedia/commons/thumb/c/cb/Flag_of_the_Czech_Republic.svg/23px-Flag_of_the_Czech_Republic.svg.png" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/c/cb/Flag_of_the_Czech_Republic.svg/35px-Flag_of_the_Czech_Republic.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/c/cb/Flag_of_the_Czech_Republic.svg/45px-Flag_of_the_Czech_Republic.svg.png 2x" width="23"/></a></span></span> <a href="/wiki/Lucie_%C5%A0af%C3%A1%C5%99ov%C3%A1" title="Lucie Šafářová">Lucie Šafářová</a></td>
<td>6–3, 6–7<sup>(2–7)</sup>, 6–2
</td></tr>
<tr bgcolor="#cfc" style="border: 2px solid blue">
<td style="background:#98fb98;">Win</td>
<td><a href="/wiki/2015_Wimbledon_Championships_%E2%80%93_Women%27s_singles" title="2015 Wimbledon Championships – Women's singles">2015</a></td>
<td>Wimbledon <small>(6)</small></td>
<td>Grass</td>
<td><span class="flagicon"><span class="mw-image-border" typeof="mw:File"><a href="/wiki/Spain" title="Spain"><img alt="Spain" class="mw-file-element" data-file-height="500" data-file-width="750" decoding="async" height="15" src="//upload.wikimedia.org/wikipedia/en/thumb/9/9a/Flag_of_Spain.svg/23px-Flag_of_Spain.svg.png" srcset="//upload.wikimedia.org/wikipedia/en/thumb/9/9a/Flag_of_Spain.svg/35px-Flag_of_Spain.svg.png 1.5x, //upload.wikimedia.org/wikipedia/en/thumb/9/9a/Flag_of_Spain.svg/45px-Flag_of_Spain.svg.png 2x" width="23"/></a></span></span> <a href="/wiki/Garbi%C3%B1e_Muguruza" title="Garbiñe Muguruza">Garbiñe Muguruza</a></td>
<td>6–4, 6–4
</td></tr>
<tr style="background:#ffc;">
<td style="background:#ffa07a;">Loss</td>
<td><a href="/wiki/2016_Australian_Open_%E2%80%93_Women%27s_singles" title="2016 Australian Open – Women's singles">2016</a></td>
<td>Australian Open</td>
<td>Hard</td>
<td><span class="flagicon"><span class="mw-image-border" typeof="mw:File"><a href="/wiki/Germany" title="Germany"><img alt="Germany" class="mw-file-element" data-file-height="600" data-file-width="1000" decoding="async" height="14" src="//upload.wikimedia.org/wikipedia/en/thumb/b/ba/Flag_of_Germany.svg/23px-Flag_of_Germany.svg.png" srcset="//upload.wikimedia.org/wikipedia/en/thumb/b/ba/Flag_of_Germany.svg/35px-Flag_of_Germany.svg.png 1.5x, //upload.wikimedia.org/wikipedia/en/thumb/b/ba/Flag_of_Germany.svg/46px-Flag_of_Germany.svg.png 2x" width="23"/></a></span></span> <a href="/wiki/Angelique_Kerber" title="Angelique Kerber">Angelique Kerber</a></td>
<td>4–6, 6–3, 4–6
</td></tr>
<tr style="background:#ebc2af;">
<td style="background:#ffa07a;">Loss</td>
<td><a href="/wiki/2016_French_Open_%E2%80%93_Women%27s_singles" title="2016 French Open – Women's singles">2016</a></td>
<td>French Open</td>
<td>Clay</td>
<td><span class="flagicon"><span class="mw-image-border" typeof="mw:File"><a href="/wiki/Spain" title="Spain"><img alt="Spain" class="mw-file-element" data-file-height="500" data-file-width="750" decoding="async" height="15" src="//upload.wikimedia.org/wikipedia/en/thumb/9/9a/Flag_of_Spain.svg/23px-Flag_of_Spain.svg.png" srcset="//upload.wikimedia.org/wikipedia/en/thumb/9/9a/Flag_of_Spain.svg/35px-Flag_of_Spain.svg.png 1.5x, //upload.wikimedia.org/wikipedia/en/thumb/9/9a/Flag_of_Spain.svg/45px-Flag_of_Spain.svg.png 2x" width="23"/></a></span></span> Garbiñe Muguruza</td>
<td>5–7, 4–6
</td></tr>
<tr style="background:#cfc;">
<td style="background:#98fb98;">Win</td>
<td><a href="/wiki/2016_Wimbledon_Championships_%E2%80%93_Women%27s_singles" title="2016 Wimbledon Championships – Women's singles">2016</a></td>
<td>Wimbledon <small>(7)</small></td>
<td>Grass</td>
<td><span class="flagicon"><span class="mw-image-border" typeof="mw:File"><a href="/wiki/Germany" title="Germany"><img alt="Germany" class="mw-file-element" data-file-height="600" data-file-width="1000" decoding="async" height="14" src="//upload.wikimedia.org/wikipedia/en/thumb/b/ba/Flag_of_Germany.svg/23px-Flag_of_Germany.svg.png" srcset="//upload.wikimedia.org/wikipedia/en/thumb/b/ba/Flag_of_Germany.svg/35px-Flag_of_Germany.svg.png 1.5x, //upload.wikimedia.org/wikipedia/en/thumb/b/ba/Flag_of_Germany.svg/46px-Flag_of_Germany.svg.png 2x" width="23"/></a></span></span> Angelique Kerber</td>
<td>7–5, 6–3
</td></tr>
<tr style="background:#ffc;">
<td style="background:#98fb98;">Win</td>
<td><a href="/wiki/2017_Australian_Open" title="2017 Australian Open">2017</a></td>
<td>Australian Open <small>(7)</small></td>
<td>Hard</td>
<td><span class="flagicon"><span class="mw-image-border" typeof="mw:File"><a href="/wiki/United_States" title="United States"><img alt="United States" class="mw-file-element" data-file-height="650" data-file-width="1235" decoding="async" height="12" src="//upload.wikimedia.org/wikipedia/en/thumb/a/a4/Flag_of_the_United_States.svg/23px-Flag_of_the_United_States.svg.png" srcset="//upload.wikimedia.org/wikipedia/en/thumb/a/a4/Flag_of_the_United_States.svg/35px-Flag_of_the_United_States.svg.png 1.5x, //upload.wikimedia.org/wikipedia/en/thumb/a/a4/Flag_of_the_United_States.svg/46px-Flag_of_the_United_States.svg.png 2x" width="23"/></a></span></span> Venus Williams</td>
<td>6–4, 6–4
</td></tr>
<tr style="background:#cfc;">
<td style="background:#ffa07a;">Loss</td>
<td><a href="/wiki/2018_Wimbledon_Championships_%E2%80%93_Women%27s_singles" title="2018 Wimbledon Championships – Women's singles">2018</a></td>
<td>Wimbledon</td>
<td>Grass</td>
<td><span class="flagicon"><span class="mw-image-border" typeof="mw:File"><a href="/wiki/Germany" title="Germany"><img alt="Germany" class="mw-file-element" data-file-height="600" data-file-width="1000" decoding="async" height="14" src="//upload.wikimedia.org/wikipedia/en/thumb/b/ba/Flag_of_Germany.svg/23px-Flag_of_Germany.svg.png" srcset="//upload.wikimedia.org/wikipedia/en/thumb/b/ba/Flag_of_Germany.svg/35px-Flag_of_Germany.svg.png 1.5x, //upload.wikimedia.org/wikipedia/en/thumb/b/ba/Flag_of_Germany.svg/46px-Flag_of_Germany.svg.png 2x" width="23"/></a></span></span> Angelique Kerber</td>
<td>3–6, 3–6
</td></tr>
<tr style="background:#ccf;">
<td style="background:#ffa07a;">Loss</td>
<td><a href="/wiki/2018_US_Open_%E2%80%93_Women%27s_singles" title="2018 US Open – Women's singles">2018</a></td>
<td>US Open</td>
<td>Hard</td>
<td><span class="flagicon"><span class="mw-image-border" typeof="mw:File"><a href="/wiki/Japan" title="Japan"><img alt="Japan" class="mw-file-element" data-file-height="600" data-file-width="900" decoding="async" height="15" src="//upload.wikimedia.org/wikipedia/en/thumb/9/9e/Flag_of_Japan.svg/23px-Flag_of_Japan.svg.png" srcset="//upload.wikimedia.org/wikipedia/en/thumb/9/9e/Flag_of_Japan.svg/35px-Flag_of_Japan.svg.png 1.5x, //upload.wikimedia.org/wikipedia/en/thumb/9/9e/Flag_of_Japan.svg/45px-Flag_of_Japan.svg.png 2x" width="23"/></a></span></span> <a href="/wiki/Naomi_Osaka" title="Naomi Osaka">Naomi Osaka</a></td>
<td><a href="/wiki/2018_US_Open_%E2%80%93_Women%27s_singles_final" title="2018 US Open – Women's singles final">2–6, 4–6</a>
</td></tr>
<tr style="background:#cfc;">
<td style="background:#ffa07a;">Loss</td>
<td><a href="/wiki/2019_Wimbledon_Championships_%E2%80%93_Women%27s_singles" title="2019 Wimbledon Championships – Women's singles">2019</a></td>
<td>Wimbledon</td>
<td>Grass</td>
<td><span class="flagicon"><span class="mw-image-border" typeof="mw:File"><a href="/wiki/Romania" title="Romania"><img alt="Romania" class="mw-file-element" data-file-height="400" data-file-width="600" decoding="async" height="15" src="//upload.wikimedia.org/wikipedia/commons/thumb/7/73/Flag_of_Romania.svg/23px-Flag_of_Romania.svg.png" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/7/73/Flag_of_Romania.svg/35px-Flag_of_Romania.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/7/73/Flag_of_Romania.svg/45px-Flag_of_Romania.svg.png 2x" width="23"/></a></span></span> <a href="/wiki/Simona_Halep" title="Simona Halep">Simona Halep</a></td>
<td>2–6, 2–6
</td></tr>
<tr style="background:#ccf;">
<td style="background:#ffa07a;">Loss</td>
<td><a href="/wiki/2019_US_Open_%E2%80%93_Women%27s_singles" title="2019 US Open – Women's singles">2019</a></td>
<td>US Open</td>
<td>Hard</td>
<td><span class="flagicon"><span class="mw-image-border" typeof="mw:File"><a href="/wiki/Canada" title="Canada"><img alt="Canada" class="mw-file-element" data-file-height="600" data-file-width="1200" decoding="async" height="12" src="//upload.wikimedia.org/wikipedia/commons/thumb/d/d9/Flag_of_Canada_%28Pantone%29.svg/23px-Flag_of_Canada_%28Pantone%29.svg.png" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/d/d9/Flag_of_Canada_%28Pantone%29.svg/35px-Flag_of_Canada_%28Pantone%29.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/d/d9/Flag_of_Canada_%28Pantone%29.svg/46px-Flag_of_Canada_%28Pantone%29.svg.png 2x" width="23"/></a></span></span> <a href="/wiki/Bianca_Andreescu" title="Bianca Andreescu">Bianca Andreescu</a></td>
<td>3–6, 5–7
</td></tr></tbody></table>
Our tables of interest are the first two results tables, under the headings “Singles” and “Women’s doubles”. We write a small helper function that returns the first table following a heading that matches a given pattern:
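def find_table_with_heading(document, heading_pat):
    # the heading text sits in an element with class "mw-headline"
    heading_element = document.find(class_="mw-headline", string=heading_pat)
    if heading_element is None:
        raise LookupError(f"no heading matching {heading_pat!r}")
    # return the first <table> that follows the heading in the document
    table = heading_element.find_next("table")
    return table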
singles_table = find_table_with_heading(document, re.compile("^Singles"))
# print headers
headings = singles_table.find_all("th")
[th.get_text(strip=True) for th in headings]
['Result', 'Year', 'Tournament', 'Surface', 'Opponents', 'Score']
Next, we find the table after the heading “Women’s doubles”:
doubles_table = find_table_with_heading(document, re.compile(r"^Women's doubles"))
# print headers
headings = doubles_table.find_all("th")
[th.get_text(strip=True) for th in headings]
['Result', 'Year', 'Tournament', 'Surface', 'Partner', 'Opponents', 'Score']
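find_table_with_heading relies on the mw-headline class. That class reflects Wikipedia's markup at the time of writing. Newer MediaWiki versions may render headings differently, and a page may not contain mw-headline at all, so check the live HTML before relying on either approach.

Below is a hedged sketch of a variant, find_table_after_heading (a hypothetical name). It matches the text of the h2, h3 and h4 elements themselves, and is tested on a small made-up snippet:

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample markup: plain headings without any "mw-headline" span.
SAMPLE = """
<h2 id="Grand_Slam_finals">Grand Slam finals</h2>
<h3 id="Singles">Singles</h3>
<table><tr><th>Result</th><th>Year</th></tr>
<tr><td>Win</td><td>1999</td></tr></table>
<h3 id="Women's_doubles">Women's doubles</h3>
<table><tr><th>Result</th><th>Year</th><th>Partner</th></tr>
<tr><td>Win</td><td>1999</td><td>Venus Williams</td></tr></table>
"""


def find_table_after_heading(document, heading_pat):
    """Return the first <table> after an h2/h3/h4 whose text matches heading_pat."""
    for heading in document.find_all(["h2", "h3", "h4"]):
        # get_text(strip=True) gives the visible heading text, whatever its inner tags
        if heading_pat.search(heading.get_text(strip=True)):
            return heading.find_next("table")
    raise LookupError(f"no heading matches {heading_pat.pattern!r}")


doc = BeautifulSoup(SAMPLE, "html.parser")
doubles = find_table_after_heading(doc, re.compile(r"^Women's doubles"))
print([th.get_text(strip=True) for th in doubles.find_all("th")])
```

Matching on the heading element's own text trades a dependency on one CSS class for a dependency on the heading levels. Adjust the list of tags if the page nests its sections differently.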
Exercise:#
- Iterate through the rows
- Convert the year to an integer (or a date)
- Strip notes such as ‘(12)’ from the tournament name, so that the same event always has the same string
- Load the result into a pandas DataFrame (more on pandas in a later lecture)
re.sub?
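A hint for the note-stripping step: get_text(strip=True) strips each text fragment and joins the fragments without a separator. A cell like Wimbledon <small>(3)</small> therefore comes out as Wimbledon(3).

Here is a small sketch of a hypothetical strip_note helper built on re.sub:

```python
import re

from bs4 import BeautifulSoup


def strip_note(event):
    """Remove a parenthesised counter such as "(3)" and any whitespace before it."""
    # non-greedy .*? stops at the first closing parenthesis
    return re.sub(r"\s*\(.*?\)", "", event)


# "Wimbledon " and "(3)" are separate text fragments; strip=True joins them with ""
cell = BeautifulSoup("<td>Wimbledon <small>(3)</small></td>", "html.parser").td
raw = cell.get_text(strip=True)
print(raw, "->", strip_note(raw))
print(strip_note("Australian Open (7)"), "|", strip_note("US Open"))
```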
data = []
for row in singles_table.find_all("tr"):
    cells = row.find_all("td")
    if not cells:
        # skip the header row, which only has <th> cells
        continue
    values = [cell.get_text(strip=True) for cell in cells]
    values[1] = int(values[1])  # year as an integer
    values[2] = re.sub(r"\s*\(.+\)", "", values[2])  # drop counters like "(3)"
    print(values)
    data.append(values)
['Win', 1999, 'US Open', 'Hard', 'Martina Hingis', '6–3, 7–6(7–4)']
['Loss', 2001, 'US Open', 'Hard', 'Venus Williams', '2–6, 4–6']
['Win', 2002, 'French Open', 'Clay', 'Venus Williams', '7–5, 6–3']
['Win', 2002, 'Wimbledon', 'Grass', 'Venus Williams', '7–6(7–4), 6–3']
['Win', 2002, 'US Open', 'Hard', 'Venus Williams', '6–4, 6–3']
['Win', 2003, 'Australian Open', 'Hard', 'Venus Williams', '7–6(7–4), 3–6, 6–4']
['Win', 2003, 'Wimbledon', 'Grass', 'Venus Williams', '4–6, 6–4, 6–2']
['Loss', 2004, 'Wimbledon', 'Grass', 'Maria Sharapova', '1–6, 4–6']
['Win', 2005, 'Australian Open', 'Hard', 'Lindsay Davenport', '2–6, 6–3, 6–0']
['Win', 2007, 'Australian Open', 'Hard', 'Maria Sharapova', '6–1, 6–2']
['Loss', 2008, 'Wimbledon', 'Grass', 'Venus Williams', '5–7, 4–6']
['Win', 2008, 'US Open', 'Hard', 'Jelena Janković', '6–4, 7–5']
['Win', 2009, 'Australian Open', 'Hard', 'Dinara Safina', '6–0, 6–3']
['Win', 2009, 'Wimbledon', 'Grass', 'Venus Williams', '7–6(7–3), 6–2']
['Win', 2010, 'Australian Open', 'Hard', 'Justine Henin', '6–4, 3–6, 6–2']
['Win', 2010, 'Wimbledon', 'Grass', 'Vera Zvonareva', '6–3, 6–2']
['Loss', 2011, 'US Open', 'Hard', 'Samantha Stosur', '2–6, 3–6']
['Win', 2012, 'Wimbledon', 'Grass', 'Agnieszka Radwańska', '6–1, 5–7, 6–2']
['Win', 2012, 'US Open', 'Hard', 'Victoria Azarenka', '6–2, 2–6, 7–5']
['Win', 2013, 'French Open', 'Clay', 'Maria Sharapova', '6–4, 6–4']
['Win', 2013, 'US Open', 'Hard', 'Victoria Azarenka', '7–5, 6–7(6–8), 6–1']
['Win', 2014, 'US Open', 'Hard', 'Caroline Wozniacki', '6–3, 6–3']
['Win', 2015, 'Australian Open', 'Hard', 'Maria Sharapova', '6–3, 7–6(7–5)']
['Win', 2015, 'French Open', 'Clay', 'Lucie Šafářová', '6–3, 6–7(2–7), 6–2']
['Win', 2015, 'Wimbledon', 'Grass', 'Garbiñe Muguruza', '6–4, 6–4']
['Loss', 2016, 'Australian Open', 'Hard', 'Angelique Kerber', '4–6, 6–3, 4–6']
['Loss', 2016, 'French Open', 'Clay', 'Garbiñe Muguruza', '5–7, 4–6']
['Win', 2016, 'Wimbledon', 'Grass', 'Angelique Kerber', '7–5, 6–3']
['Win', 2017, 'Australian Open', 'Hard', 'Venus Williams', '6–4, 6–4']
['Loss', 2018, 'Wimbledon', 'Grass', 'Angelique Kerber', '3–6, 3–6']
['Loss', 2018, 'US Open', 'Hard', 'Naomi Osaka', '2–6, 4–6']
['Loss', 2019, 'Wimbledon', 'Grass', 'Simona Halep', '2–6, 2–6']
['Loss', 2019, 'US Open', 'Hard', 'Bianca Andreescu', '3–6, 5–7']
When data is in this form, we can convert it into a DataFrame with pandas.
You’ll learn more about pandas next week.
import pandas as pd
headings = [th.get_text(strip=True) for th in singles_table.find_all("th")]
df = pd.DataFrame(data, columns=headings)
df
| | Result | Year | Tournament | Surface | Opponents | Score |
---|---|---|---|---|---|---|
0 | Win | 1999 | US Open | Hard | Martina Hingis | 6–3, 7–6(7–4) |
1 | Loss | 2001 | US Open | Hard | Venus Williams | 2–6, 4–6 |
2 | Win | 2002 | French Open | Clay | Venus Williams | 7–5, 6–3 |
3 | Win | 2002 | Wimbledon | Grass | Venus Williams | 7–6(7–4), 6–3 |
4 | Win | 2002 | US Open | Hard | Venus Williams | 6–4, 6–3 |
5 | Win | 2003 | Australian Open | Hard | Venus Williams | 7–6(7–4), 3–6, 6–4 |
6 | Win | 2003 | Wimbledon | Grass | Venus Williams | 4–6, 6–4, 6–2 |
7 | Loss | 2004 | Wimbledon | Grass | Maria Sharapova | 1–6, 4–6 |
8 | Win | 2005 | Australian Open | Hard | Lindsay Davenport | 2–6, 6–3, 6–0 |
9 | Win | 2007 | Australian Open | Hard | Maria Sharapova | 6–1, 6–2 |
10 | Loss | 2008 | Wimbledon | Grass | Venus Williams | 5–7, 4–6 |
11 | Win | 2008 | US Open | Hard | Jelena Janković | 6–4, 7–5 |
12 | Win | 2009 | Australian Open | Hard | Dinara Safina | 6–0, 6–3 |
13 | Win | 2009 | Wimbledon | Grass | Venus Williams | 7–6(7–3), 6–2 |
14 | Win | 2010 | Australian Open | Hard | Justine Henin | 6–4, 3–6, 6–2 |
15 | Win | 2010 | Wimbledon | Grass | Vera Zvonareva | 6–3, 6–2 |
16 | Loss | 2011 | US Open | Hard | Samantha Stosur | 2–6, 3–6 |
17 | Win | 2012 | Wimbledon | Grass | Agnieszka Radwańska | 6–1, 5–7, 6–2 |
18 | Win | 2012 | US Open | Hard | Victoria Azarenka | 6–2, 2–6, 7–5 |
19 | Win | 2013 | French Open | Clay | Maria Sharapova | 6–4, 6–4 |
20 | Win | 2013 | US Open | Hard | Victoria Azarenka | 7–5, 6–7(6–8), 6–1 |
21 | Win | 2014 | US Open | Hard | Caroline Wozniacki | 6–3, 6–3 |
22 | Win | 2015 | Australian Open | Hard | Maria Sharapova | 6–3, 7–6(7–5) |
23 | Win | 2015 | French Open | Clay | Lucie Šafářová | 6–3, 6–7(2–7), 6–2 |
24 | Win | 2015 | Wimbledon | Grass | Garbiñe Muguruza | 6–4, 6–4 |
25 | Loss | 2016 | Australian Open | Hard | Angelique Kerber | 4–6, 6–3, 4–6 |
26 | Loss | 2016 | French Open | Clay | Garbiñe Muguruza | 5–7, 4–6 |
27 | Win | 2016 | Wimbledon | Grass | Angelique Kerber | 7–5, 6–3 |
28 | Win | 2017 | Australian Open | Hard | Venus Williams | 6–4, 6–4 |
29 | Loss | 2018 | Wimbledon | Grass | Angelique Kerber | 3–6, 3–6 |
30 | Loss | 2018 | US Open | Hard | Naomi Osaka | 2–6, 4–6 |
31 | Loss | 2019 | Wimbledon | Grass | Simona Halep | 2–6, 2–6 |
32 | Loss | 2019 | US Open | Hard | Bianca Andreescu | 3–6, 5–7 |
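The loop above addresses cells by position: values[1] for the year and values[2] for the tournament. An alternative is to key every cell by its column header, producing a list of dicts that pandas also accepts.

The table_to_records helper and the two-row excerpt below are for illustration only:

```python
import pandas as pd
from bs4 import BeautifulSoup

# Two rows excerpted from the singles table, in the same layout.
SAMPLE = """<table>
<tr><th>Result</th><th>Year</th><th>Tournament</th></tr>
<tr><td>Win</td><td>1999</td><td>US Open</td></tr>
<tr><td>Loss</td><td>2001</td><td>US Open</td></tr>
</table>"""


def table_to_records(table):
    """Turn each data row into a dict keyed by the header text (hypothetical helper)."""
    headers = [th.get_text(strip=True) for th in table.find_all("th")]
    records = []
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        if not cells:  # the header row has only <th> cells
            continue
        values = [cell.get_text(strip=True) for cell in cells]
        # zip pairs values with headers by position and stops at the shorter list
        records.append(dict(zip(headers, values)))
    return records


table = BeautifulSoup(SAMPLE, "html.parser").table
sample_df = pd.DataFrame(table_to_records(table))
sample_df["Year"] = sample_df["Year"].astype(int)
print(sample_df)
```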
With pandas, we can filter this data, group it, and plot interesting relationships. Pandas groupby is a useful operation for aggregations such as counting wins and losses per year. First, value_counts gives the overall number of wins and losses:
df.Result.value_counts()
Result
Win 23
Loss 10
Name: count, dtype: int64
results_by_year = df.groupby(["Year", "Result"]).Tournament.count().unstack().fillna(0)
results_by_year
Year | Loss | Win |
---|---|---|
1999 | 0.0 | 1.0 |
2001 | 1.0 | 0.0 |
2002 | 0.0 | 3.0 |
2003 | 0.0 | 2.0 |
2004 | 1.0 | 0.0 |
2005 | 0.0 | 1.0 |
2007 | 0.0 | 1.0 |
2008 | 1.0 | 1.0 |
2009 | 0.0 | 2.0 |
2010 | 0.0 | 2.0 |
2011 | 1.0 | 0.0 |
2012 | 0.0 | 2.0 |
2013 | 0.0 | 2.0 |
2014 | 0.0 | 1.0 |
2015 | 0.0 | 3.0 |
2016 | 2.0 | 1.0 |
2017 | 0.0 | 1.0 |
2018 | 2.0 | 0.0 |
2019 | 2.0 | 0.0 |
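The groupby/unstack/fillna chain above has a shorter equivalent. pd.crosstab counts how often each (Year, Result) pair occurs and fills missing pairs with integer 0, instead of the floats that unstack and fillna leave behind.

A sketch comparing the two on a few rows copied from the singles table:

```python
import pandas as pd

# 2015 and 2016 finals, copied from the singles table above
df = pd.DataFrame({
    "Year": [2015, 2015, 2015, 2016, 2016, 2016],
    "Result": ["Win", "Win", "Win", "Loss", "Loss", "Win"],
    "Tournament": ["Australian Open", "French Open", "Wimbledon",
                   "Australian Open", "French Open", "Wimbledon"],
})

# unstack introduces NaN for the missing (2015, "Loss") pair, so fillna is needed
via_groupby = df.groupby(["Year", "Result"]).Tournament.count().unstack().fillna(0)

# crosstab counts co-occurrences directly and uses 0 for missing pairs
via_crosstab = pd.crosstab(df.Year, df.Result)
print(via_crosstab)
```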
We can now plot results_by_year as a bar chart:
results_by_year.plot(kind="bar", grid=False)
<Axes: xlabel='Year'>
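Does the court surface matter? Each Grand Slam tournament is played on one surface: hard courts at the Australian Open and the US Open, clay at the French Open and grass at Wimbledon. Counting results per tournament therefore also tells us something about the surface:
results_by_tournament = df.groupby(["Tournament", "Result"]).Year.count().unstack()
results_by_tournament
Tournament | Loss | Win |
---|---|---|
Australian Open | 1 | 7 |
French Open | 1 | 3 |
US Open | 4 | 6 |
Wimbledon | 4 | 7 |
results_by_tournament.plot(kind="bar")
<Axes: xlabel='Tournament'>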
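The Australian Open and the US Open are both played on hard courts, so counts per tournament spread one surface over two rows. To ask about the surface directly, we can compute the fraction of finals won per Surface value.

A sketch using a handful of rows copied from the table above:

```python
import pandas as pd

rows = [
    ("Loss", 2016, "Australian Open", "Hard"),
    ("Win", 2017, "Australian Open", "Hard"),
    ("Loss", 2018, "US Open", "Hard"),
    ("Win", 2015, "French Open", "Clay"),
    ("Loss", 2016, "French Open", "Clay"),
    ("Win", 2016, "Wimbledon", "Grass"),
    ("Loss", 2018, "Wimbledon", "Grass"),
]
sample = pd.DataFrame(rows, columns=["Result", "Year", "Tournament", "Surface"])

# One True/False per final; the mean of a boolean group is the fraction of True
win_rate = (sample.Result == "Win").groupby(sample.Surface).mean()
print(win_rate)
```

A win rate normalises for the different number of finals per surface, which raw counts do not.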
We can also filter the results, e.g. to select the opponents Williams faced at least twice:
results_by_op = df.groupby(["Opponents", "Result"]).Tournament.count().unstack()
results_by_op
Opponents | Loss | Win |
---|---|---|
Agnieszka Radwańska | NaN | 1.0 |
Angelique Kerber | 2.0 | 1.0 |
Bianca Andreescu | 1.0 | NaN |
Caroline Wozniacki | NaN | 1.0 |
Dinara Safina | NaN | 1.0 |
Garbiñe Muguruza | 1.0 | 1.0 |
Jelena Janković | NaN | 1.0 |
Justine Henin | NaN | 1.0 |
Lindsay Davenport | NaN | 1.0 |
Lucie Šafářová | NaN | 1.0 |
Maria Sharapova | 1.0 | 3.0 |
Martina Hingis | NaN | 1.0 |
Naomi Osaka | 1.0 | NaN |
Samantha Stosur | 1.0 | NaN |
Simona Halep | 1.0 | NaN |
Venus Williams | 2.0 | 7.0 |
Vera Zvonareva | NaN | 1.0 |
Victoria Azarenka | NaN | 2.0 |
# replace missing counts (NaN) with 0, so that Win + Loss can be added up:
results_by_op = results_by_op.fillna(0)
results_by_op
Opponents | Loss | Win |
---|---|---|
Agnieszka Radwańska | 0.0 | 1.0 |
Angelique Kerber | 2.0 | 1.0 |
Bianca Andreescu | 1.0 | 0.0 |
Caroline Wozniacki | 0.0 | 1.0 |
Dinara Safina | 0.0 | 1.0 |
Garbiñe Muguruza | 1.0 | 1.0 |
Jelena Janković | 0.0 | 1.0 |
Justine Henin | 0.0 | 1.0 |
Lindsay Davenport | 0.0 | 1.0 |
Lucie Šafářová | 0.0 | 1.0 |
Maria Sharapova | 1.0 | 3.0 |
Martina Hingis | 0.0 | 1.0 |
Naomi Osaka | 1.0 | 0.0 |
Samantha Stosur | 1.0 | 0.0 |
Simona Halep | 1.0 | 0.0 |
Venus Williams | 2.0 | 7.0 |
Vera Zvonareva | 0.0 | 1.0 |
Victoria Azarenka | 0.0 | 2.0 |
(results_by_op.Win + results_by_op.Loss) > 1
Opponents
Agnieszka Radwańska False
Angelique Kerber True
Bianca Andreescu False
Caroline Wozniacki False
Dinara Safina False
Garbiñe Muguruza True
Jelena Janković False
Justine Henin False
Lindsay Davenport False
Lucie Šafářová False
Maria Sharapova True
Martina Hingis False
Naomi Osaka False
Samantha Stosur False
Simona Halep False
Venus Williams True
Vera Zvonareva False
Victoria Azarenka True
dtype: bool
multiple_meetings = results_by_op[(results_by_op.Win + results_by_op.Loss) > 1]
multiple_meetings.plot(kind="bar")
<Axes: xlabel='Opponents'>
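The same selection can also be made directly on the rows, without building the Win/Loss table first:

- value_counts on the Opponents column gives the number of finals against each player.
- isin keeps the rows for players met more than once.

A sketch on a few (opponent, result) pairs copied from the table above, with groupby(...).filter(...) shown as an equivalent:

```python
import pandas as pd

sample = pd.DataFrame({
    "Opponents": ["Martina Hingis", "Venus Williams", "Venus Williams",
                  "Maria Sharapova", "Maria Sharapova", "Naomi Osaka"],
    "Result": ["Win", "Loss", "Win", "Loss", "Win", "Loss"],
})

# Number of finals against each opponent
meetings = sample.Opponents.value_counts()
repeat_opponents = meetings[meetings > 1].index

# Keep only the finals against opponents met more than once
repeat_finals = sample[sample.Opponents.isin(repeat_opponents)]

# Equivalent: keep every group (opponent) that has more than one row
same_rows = sample.groupby("Opponents").filter(lambda group: len(group) > 1)
print(repeat_finals)
```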
Exercise:#
Find images on the Wikipedia page for UiO (the University of Oslo):
1. Download the content from the site using requests and BeautifulSoup.
2. Search for all images (using images = document.find_all('img')) and print out the content.
3. Include only images with the attribute class_="mw-file-element" in your list of images.
4. Print out a list of the values of the “src” attribute for the images in step 3.
5. See if you can display an image by pasting a result from step 4 into your web browser.
r = requests.get("https://no.wikipedia.org/wiki/Universitetet_i_Oslo")
html = r.text
print(html[:400])
<!DOCTYPE html>
<html class="client-nojs" lang="nb" dir="ltr">
<head>
<meta charset="UTF-8">
<title>Universitetet i Oslo – Wikipedia</title>
<script>document.documentElement.className="client-js";RLCONF={"wgBreakFrames":false,"wgSeparatorTransformTable":[",\t."," \t,"],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","januar","februar","mars","april","mai","juni","jul
document = BeautifulSoup(html, "html.parser")
images = document.find_all("img", class_="mw-file-element")
len(images)
17
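find_all with class_ is one way to filter images. BeautifulSoup also supports CSS selectors through select. Class matching works on any one of an element's classes in both cases.

A sketch on a small hypothetical snippet, showing that both approaches pick the same images:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two matching images and one with a different class
SAMPLE = """
<img class="mw-file-element" src="//upload.wikimedia.org/a.png">
<img class="mw-file-element other" src="/static/b.png">
<img class="logo" src="https://example.org/c.png">
"""
doc = BeautifulSoup(SAMPLE, "html.parser")

by_find_all = doc.find_all("img", class_="mw-file-element")
by_select = doc.select("img.mw-file-element")  # CSS selector: tag.class

# .get returns None instead of raising KeyError if an <img> has no src
srcs = [img.get("src") for img in by_select]
print(srcs)
```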
for image in images:
    print(image["src"])
<img alt="Rediger på Wikidata" class="mw-file-element" data-file-height="20" data-file-width="20" decoding="async" height="10" src="//upload.wikimedia.org/wikipedia/commons/thumb/8/8a/OOjs_UI_icon_edit-ltr-progressive.svg/10px-OOjs_UI_icon_edit-ltr-progressive.svg.png" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/8/8a/OOjs_UI_icon_edit-ltr-progressive.svg/15px-OOjs_UI_icon_edit-ltr-progressive.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/8/8a/OOjs_UI_icon_edit-ltr-progressive.svg/20px-OOjs_UI_icon_edit-ltr-progressive.svg.png 2x" width="10"/>
<img alt="Rediger på Wikidata" class="mw-file-element" data-file-height="20" data-file-width="20" decoding="async" height="10" src="//upload.wikimedia.org/wikipedia/commons/thumb/8/8a/OOjs_UI_icon_edit-ltr-progressive.svg/10px-OOjs_UI_icon_edit-ltr-progressive.svg.png" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/8/8a/OOjs_UI_icon_edit-ltr-progressive.svg/15px-OOjs_UI_icon_edit-ltr-progressive.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/8/8a/OOjs_UI_icon_edit-ltr-progressive.svg/20px-OOjs_UI_icon_edit-ltr-progressive.svg.png 2x" width="10"/>
<img alt="Rediger på Wikidata" class="mw-file-element" data-file-height="20" data-file-width="20" decoding="async" height="10" src="//upload.wikimedia.org/wikipedia/commons/thumb/8/8a/OOjs_UI_icon_edit-ltr-progressive.svg/10px-OOjs_UI_icon_edit-ltr-progressive.svg.png" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/8/8a/OOjs_UI_icon_edit-ltr-progressive.svg/15px-OOjs_UI_icon_edit-ltr-progressive.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/8/8a/OOjs_UI_icon_edit-ltr-progressive.svg/20px-OOjs_UI_icon_edit-ltr-progressive.svg.png 2x" width="10"/>
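from IPython.display import HTML, display

for image in images:
    url = image["src"]
    if "://" in url:
        # already an absolute URL, e.g. https://...
        pass
    elif url.startswith("//"):
        # protocol-relative URL: add the 'scheme' or 'protocol'
        url = "https:" + url
    elif url.startswith("/"):
        # site-relative URL: add the scheme and host of the page
        url = "https://no.wikipedia.org" + url
    else:
        # not an understood URL
        raise ValueError(f"I don't understand this url: {url}")
    # build an <img> tag without overwriting the page source stored in `html`
    display(HTML(f'<img src="{url}">'))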
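The if/elif chain above handles absolute, protocol-relative and site-relative URLs by hand. The standard library function urllib.parse.urljoin resolves all of these against the page URL, including path-relative ones that the loop rejects with ValueError.

A minimal sketch, assuming the src values come from the UiO page requested above; the example src values are hypothetical:

```python
from urllib.parse import urljoin

# The page the <img> tags were scraped from (same URL as requested above)
PAGE_URL = "https://no.wikipedia.org/wiki/Universitetet_i_Oslo"


def absolute_url(src, base=PAGE_URL):
    """Resolve an <img src> value against the page URL it came from."""
    return urljoin(base, src)


examples = [
    "//upload.wikimedia.org/wikipedia/commons/a.png",  # protocol-relative
    "/static/images/b.png",  # site-relative
    "https://example.org/c.png",  # already absolute
    "bilde.png",  # path-relative
]
for src in examples:
    print(absolute_url(src))
```

Resolving against the page URL, rather than hard-coding https://no.wikipedia.org, keeps the helper correct if the same code is pointed at another site.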