Summary#
In this video, we reproduce the approach that predicts Survivor winners and apply it to Economic Competition Networks to better understand world trade and economic leaders. We build a country-to-country competition network based on the Export Similarity Index (ESI). We then use several techniques from network science, such as PageRank, community detection, weak component analysis, and the recent common out-neighbor (CON) score, to better understand how countries compete with each other within the world economy, identifying dominating or leading economies, as well as their weaker or smaller counterparts.
Dataset#
We use The Atlas of Economic Complexity dataset, which is summarized in the following table. We only provide a top-level overview of the data here. For an in-depth detailed description, click the Download button in each table row of the link above—that will open a popup with detailed information on the fields for each CSV file.
| Used | Title | Description |
|---|---|---|
| | Complexity Rankings & Growth Projections | Economic Complexity Index (ECI) and partial growth projections for world economies from 1995 to 2023. |
| | Country Trade by Product | Exports and imports, per country and product, over the years. Different files provide a different product category granularity based on the number of HS92, HS12 or SITC digits. Different files are also provided for services, using a non-standard classification internal to Growth Lab that also provides different digit-based granularities. |
| | Total Trade by Country | Total exports and imports, per country, over the years. Different files provide data about products and services. How big is the economy for a country? How did it progress over the last 28 years? |
| | Total Trade by Product | Total exports and imports, per product, over the years. Again, this is provided at different product granularities based on HS92, HS12 or SITC digits. Different files are also provided for services, using a non-standard classification internal to Growth Lab that also provides different digit-based granularities. How big is the market for a product? How did it progress over the last 28 years? |
| | Country Trade by Partner | Bilateral exports and imports between pairs of countries, over the years. |
| ✅ | Country Trade by Partner and Product | Bilateral exports and imports between pairs of countries, for a given product, over the years. This is provided at 6-digit granularity based on HS92, HS12 or SITC digits, and it is partitioned into multiple files in blocks of 10 years (or only 5 years for 1995-1999). A granularity of 4 digits would be enough to distinguish between main product types (e.g., beef vs pork vs poultry, fresh vs frozen; gasoline engines vs diesel engines). With 6 digits we get a lot more detail (e.g., carcasses and half-carcasses of bovine animals, fresh or chilled; engines for aircraft). We use the HS92 data with 6 digits, which is the only granularity available, but also ideal to capture trade competition between countries, as true competition is only uncovered at a finer scale. We only look at the 2020-2023 period, for recency, aggregating totals for those four years. |
| ✅ | Country Classification | Country metadata. |
| | Regional Classification | Regional classification for countries: continent they belong to, political region (e.g., European Union), subregion (e.g., Central America, Western Africa), trade regions (e.g., NAFTA, OPEC), etc. |
| | HS12 Product Classification | Product metadata according to HS12 codes. |
| ✅ | HS92 Product Classification | Product metadata according to HS92 codes. We use this to inspect products traded by salient countries during the analysis. |
| | Services Product Classification | Services metadata according to a non-standard classification internal to Growth Lab. We use this to inspect services traded by salient countries during the analysis. |
| | SITC Product Classification | Product metadata according to SITC codes. |
| | Product Space Related Edges | HS92 4-digit codes for source and target products in the same space (e.g., women’s coats ⇄ sweaters). |
| | Product Space Layout | HS92 4-digit codes for products along with their 2D embedding, where close products are co-exported by countries. |
Here are the citations for the datasets that we use:
Country Trade by Partner and Product:
The Growth Lab at Harvard University, 2025, “International Trade Data (HS92)”, https://doi.org/10.7910/DVN/T4CHWJ, Harvard Dataverse
Country Classification & HS92 Product Classification:
The Growth Lab at Harvard University, 2025, “Classifications Data”, https://doi.org/10.7910/DVN/3BAL1O, Harvard Dataverse
Graph Schema#
Out of the three CSV files that we identified above as being used, we produce the following nodes and relationship labels:
- Nodes
  - Country
    - node_id (INT64): globally unique node identifier
    - Properties from all Country Classification columns
  - Product
    - node_id (INT64): globally unique node identifier
    - Properties from all HS92 Product Classification columns
- Relationships
  - (:Country)-[:CompetesWith]->(:Country)
    - ESI (DOUBLE): Export Similarity Index
  - (:Country)-[:Exports]->(:Product)
    - amount_usd (INT128): exports dollar amount (2020-2023)
  - (:Country)<-[:Imports]-(:Product)
    - amount_usd (INT128): imports dollar amount (2020-2023)
Take a look at the following diagram, where rectangles represent the raw CSV files, with dashed arrows illustrating the data source, and circles represent nodes, with solid arrows representing relationships.
Jupyter Notebook#
The following sections are an adaptation of the Jupyter Notebook that we created to analyze the Economic Competition Network.
Setup#
ETL#
For ETL, we directly call the appropriate `dlctl` commands for:
- Ingesting the dataset
- Transforming using SQL on top of DuckLake
- Exporting from the data lakehouse into Parquet
- Loading the graph into Kuzu
- Computing general analytics scores
Be sure to uncomment the cell below and run it once.
!dlctl ingest dataset -t atlas \
"The Atlas of Economic Complexity"
!dlctl transform -m +marts.graphs.econ_comp
!dlctl export dataset graphs econ_comp
!dlctl graph load econ_comp
!dlctl graph compute con-score econ_comp Country CompetesWith
Imports#
from pathlib import Path
from string import Template
from textwrap import dedent
from typing import Any, Literal, Optional
import kuzu
import matplotlib.pyplot as plt
import networkx as nx
import numpy as np
import pandas as pd
from scipy.special import expit
import graph.visualization as vis
from shared.settings import LOCAL_DIR, env
Globals#
We set up access to the appropriate Kuzu path, based on the shared `.env` configuration, ensuring the graph exists before running the notebook. Once set up, `conn` will be used to query the graph directly throughout this notebook.
db_path = Path(LOCAL_DIR) / env.str("ECON_COMP_GRAPH_DB")
assert db_path.exists(), \
"You need to create the graph DB using dlctl first"
db = kuzu.Database(db_path)
conn = kuzu.Connection(db)
Constants#
In order to ensure color consistency for our plots, we extract the color palette from matplotlib into `MPL_PALETTE`.
MPL_PALETTE = (
plt.rcParams["axes.prop_cycle"]
.by_key()["color"]
)
We also map a display attribute for each of our node labels, `Country` and `Product`. We’ll use the short names for both when plotting graph visualizations or related charts.
LABEL_PROPS = {
"Country": "country_name_short",
"Product": "product_name_short",
}
Functions#
We create a few reusable functions, where we run Kuzu queries. In a few cases, it was helpful to debug the query with parameters (e.g., using Kuzu Explorer), so we created a helper function for this (note that this doesn’t support string parameters, as we didn’t need them).
def print_query(query: str, params: dict[str, Any]):
    dbg_query = dedent(query).strip()
    dbg_query = Template(dbg_query)
    dbg_query = dbg_query.substitute(params)
    print(dbg_query)
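As a usage illustration, here is a minimal sketch of the same substitution applied to a parameterized query. The `render_query` helper and the example queries are hypothetical, introduced only for this example; the helper returns the text instead of printing it, so that the result can be inspected.

```python
from string import Template
from textwrap import dedent


def render_query(query: str, params: dict) -> str:
    """Same substitution as print_query, but returning the text."""
    return Template(dedent(query).strip()).substitute(params)


query = """
    MATCH (c:Country)
    WHERE c.louvain_id = $prop_value
    RETURN c.country_name_short
    LIMIT $n
"""

rendered = render_query(query, {"prop_value": 5, "n": 10})
print(rendered)

# A string value is pasted verbatim, without the quotes Cypher would need,
# which is why string parameters are not supported by this approach
unquoted = render_query("RETURN $name", {"name": "Spain"})
print(unquoted)
```

The rendered text can then be pasted into a query tool as-is, as long as all parameters are numeric.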
We’ll also cluster nodes using different strategies and compare groups, so we implement a basic Jaccard similarity function.
def jaccard_sim(a: pd.Series, b: pd.Series) -> float:
    a = set(a)
    b = set(b)
    return len(a & b) / len(a | b)
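As a quick sanity check, the same set-based computation can be reproduced without pandas. This is only a sketch: the `jaccard` helper and the product lists are made up for illustration and, unlike `jaccard_sim`, it returns 0.0 for two empty inputs instead of raising a division error, which is an assumption about the desired behavior.

```python
def jaccard(a, b) -> float:
    """Jaccard similarity between two iterables, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    # Assumption: two empty sets are considered to have no overlap
    return len(a & b) / len(union) if union else 0.0


# Hypothetical top-product lists for two clusters
left = ["crude oil", "gold", "vaccines", "cars"]
right = ["crude oil", "maize", "cars", "propane"]

# 2 shared products out of 6 distinct ones
similarity = jaccard(left, right)
print(similarity)
```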
We might want to look at the top x% of traded products, based on their USD amount. The following function will help filter this.
def top_frac(df: pd.DataFrame, col: str, frac: float = 0.25):
    mask = (df[col] / df[col].sum()).cumsum() <= frac
    return df[mask]
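Two properties of this filter are worth keeping in mind: it relies on the rows already being sorted in descending order of `col`, and the row that crosses the threshold is excluded. The following pure-Python sketch mirrors the cumulative-share logic on a plain list; `top_frac_rows` and the amounts are hypothetical, and rounding may differ marginally from the pandas version.

```python
from itertools import accumulate


def top_frac_rows(amounts: list[float], frac: float = 0.25) -> list[float]:
    """Keep the leading amounts whose cumulative share stays <= frac.

    Hypothetical counterpart of top_frac; `amounts` is assumed to be
    sorted in descending order.
    """
    total = sum(amounts)
    return [
        amount
        for amount, running in zip(amounts, accumulate(amounts))
        if running / total <= frac
    ]


amounts = [40, 30, 20, 10]

# Cumulative shares are 0.4, 0.7, 0.9 and 1.0
kept = top_frac_rows(amounts, frac=0.75)
print(kept)

# The largest item alone already exceeds 25%, so nothing is kept
empty = top_frac_rows(amounts)
print(empty)
```

The second call shows a corner case: when a single product dominates the total, the default fraction can yield an empty selection.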
Analysis#
We focus on the `CompetesWith` projection, a relationship given by the Export Similarity Index (ESI). Our graph analysis includes:
- Dynamic competition analysis.
- Dominating and weaker economy identification, based on the CON score for each country.
- Trade basket overlap analysis for top and bottom economies.
- Competition network analysis.
- Community analysis, including community mapping, top traded product identification, and trade alignment study (self-sufficiency, external competitiveness).
- Weak component analysis, following a similar approach to the community analysis—weak components widen community reach.
- Community and weak component comparison.
- Economic pressure analysis.
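To make the `CompetesWith` weight more concrete, here is a minimal sketch of an Export Similarity Index, assuming the commonly used Finger-Kreinin formulation (the sum, over products, of the smaller of the two countries’ export shares). The exact formula used by the pipeline is not stated here, and the `esi` and `export_shares` helpers, product names, and amounts are all made up for illustration.

```python
def export_shares(exports: dict[str, float]) -> dict[str, float]:
    """Convert export amounts per product into shares of total exports."""
    total = sum(exports.values())
    return {product: amount / total for product, amount in exports.items()}


def esi(exports_a: dict[str, float], exports_b: dict[str, float]) -> float:
    """Export similarity in [0, 1], assuming the Finger-Kreinin form."""
    shares_a = export_shares(exports_a)
    shares_b = export_shares(exports_b)
    # Products exported by only one country contribute nothing to the sum
    common = shares_a.keys() & shares_b.keys()
    return sum(min(shares_a[p], shares_b[p]) for p in common)


# Hypothetical export baskets (USD)
oil_exporter = {"crude oil": 80.0, "natural gas": 20.0}
mixed_exporter = {"crude oil": 50.0, "cars": 50.0}
car_exporter = {"cars": 10.0}

overlap = esi(oil_exporter, mixed_exporter)  # only crude oil is shared
disjoint = esi(oil_exporter, car_exporter)  # no shared products
print(overlap, disjoint)
```

Under this assumption, two countries with identical export baskets score 1, and two countries with no product in common score 0.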
Dynamic Competition Analysis#
Top 10 Dominating Economies#
These are highly spread economies, able to compete with several other countries, i.e., with a high number of common out-neighbors (CON).
dom_econ_df = conn.execute(
"""
MATCH (c:Country)
RETURN
c,
c.node_id AS node_id,
c.country_name_short AS country
ORDER BY c.con_score DESC
LIMIT 10
"""
).get_as_df()[["node_id", "country"]]
dom_econ_df.index = pd.RangeIndex(
start=1,
stop=len(dom_econ_df) + 1,
name="rank"
)
dom_econ_df
rank | node_id | country |
---|---|---|
1 | 206 | United States of America |
2 | 55 | Canada |
3 | 34 | United Arab Emirates |
4 | 107 | Netherlands |
5 | 132 | United Kingdom |
6 | 175 | Belgium |
7 | 134 | Italy |
8 | 223 | Spain |
9 | 131 | France |
10 | 145 | Thailand |
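For intuition on the ranking criterion, the sketch below computes a CON score on a toy directed graph, assuming the definition where the score of a node is the total number of out-neighbors it shares with every other node. Whether `dlctl` computes exactly this variant (for instance, whether ESI weights are taken into account) is not stated here, and the `con_scores` helper and the toy graph are hypothetical.

```python
def con_scores(out_neighbors: dict[str, set[str]]) -> dict[str, int]:
    """Sum, for each node, the common out-neighbors with all other nodes."""
    scores = {}
    for u, targets_u in out_neighbors.items():
        scores[u] = sum(
            len(targets_u & targets_v)
            for v, targets_v in out_neighbors.items()
            if v != u
        )
    return scores


# Toy directed graph: node -> set of out-neighbors
toy_graph = {
    "A": {"B", "C", "D"},
    "B": {"C", "D"},
    "C": {"D"},
    "D": set(),
}

scores = con_scores(toy_graph)
print(scores)
```

In this toy graph, nodes A and B share the most out-neighbors with the rest of the network, while D, which has no outgoing edges, necessarily scores zero.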
Top 3 Exports#
Looking at the top exports will help contextualize these economies. We only look at the top 3 products, to keep the visualization clean and readable.
dom_econ_g = conn.execute(
"""
MATCH (c:Country)
WITH c
ORDER BY c.con_score DESC
LIMIT 10
MATCH (c)-[e:Exports]->(p:Product)
MATCH (c2:Country)-[:Exports]->(p)
WITH c, e, p, count(DISTINCT c2) AS exporters
WHERE exporters > 1
WITH c, e, p
ORDER BY c.node_id, e.amount_usd DESC
SKIP 0
WITH c, collect({p: p, e: e}) AS export_list
UNWIND list_slice(export_list, 0, 3) AS r
RETURN c, r.e, r.p
ORDER BY c.node_id, r.p.node_id
"""
).get_as_networkx()
vis.set_labels(dom_econ_g, LABEL_PROPS)
vis.plot(dom_econ_g, scale=1.25, seed=3)
Bottom 10 Weaker Economies#
These are smaller or weaker economies, in the sense that they have a lower competition power. We also find the special `Undeclared` country node at rank 1, showing that only a small number of products are undeclared worldwide.
weak_econ_df = conn.execute(
"""
MATCH (c:Country)
RETURN
c,
c.node_id AS node_id,
c.country_name_short AS country
ORDER BY c.con_score ASC
LIMIT 10
"""
).get_as_df()[["node_id", "country"]]
weak_econ_df.index = pd.RangeIndex(
start=1,
stop=len(weak_econ_df) + 1,
name="rank"
)
weak_econ_df
rank | node_id | country |
---|---|---|
1 | 193 | Undeclared |
2 | 72 | Bouvet Island |
3 | 170 | Wallis and Futuna |
4 | 106 | Norfolk Island |
5 | 167 | Saint Pierre and Miquelon |
6 | 216 | Niue |
7 | 66 | South Georgia and South Sandwich Islds. |
8 | 121 | Northern Mariana Islands |
9 | 161 | Heard and McDonald Islands |
10 | 100 | Western Sahara |
Top 3 Exports#
If we look at the top 3 exports for each country at the bottom of the ranking according to CON scores, we find, as expected, that these are more disconnected economies, mostly focusing on raw materials, or components and machinery.
weak_econ_g = conn.execute(
"""
MATCH (c:Country)
WITH c
ORDER BY c.con_score ASC
LIMIT 10
MATCH (c)-[e:Exports]->(p:Product)
MATCH (c2:Country)-[:Exports]->(p)
WITH c, e, p, count(DISTINCT c2) AS exporters
WHERE exporters > 1
WITH c, e, p
ORDER BY c.node_id, e.amount_usd DESC
SKIP 0
WITH c, collect({p: p, e: e}) AS export_list
UNWIND list_slice(export_list, 0, 3) AS r
RETURN c, r.e, r.p
ORDER BY c.node_id, r.p.node_id
"""
).get_as_networkx()
vis.set_labels(weak_econ_g, LABEL_PROPS)
vis.plot(weak_econ_g, scale=1.25, seed=3)
Dominating vs Weaker Economies#
- Do dominating economies compete in the same markets as weaker economies?
- If so, maybe that’s why those weaker economies are being pushed to the bottom. ☑️
- If not, maybe the products exported by those weaker economies are not the most competitive.
Here, we find that, due to their low export diversity, weaker economies are being crushed by dominating economies. Their position of vulnerability comes mostly from geographical isolation and limited area, leading to fewer competition opportunities, where any competitor becomes a risk to the economy.
Below, country node classes are visually translated to a colored node border and label text. We assign two classes, for the top and bottom 10 economies, with top economies in the center, and the products and bottom economies in the surrounding area. This forms a star layout, where each arm is a weaker economy or a small cluster of weaker economies.
We look at the top 3 most exported products in weaker economies, but relaxing the filter on number of exported products for the weaker economies and looking at more than 3 exported products will reproduce the displayed behavior, with dominating economies still competing for the same products. This doesn’t necessarily mean that both dominating and weaker economies produce the same products, as some of them can simply be re-exported.
dom_vs_weak_econ_g = conn.execute(
"""
MATCH (wea)-[we:Exports]->(p:Product)
MATCH (dom)-[de:Exports]->(p)
WHERE dom.node_id IN $dominating_node_ids
AND wea.node_id IN $weaker_node_ids
WITH wea, we, p, count(DISTINCT dom) AS dom_competitors
WHERE dom_competitors > 0
WITH wea, we, p
ORDER BY wea.node_id, we.amount_usd DESC
SKIP 0
WITH wea, collect({p: p, e: we}) AS export_list
UNWIND list_slice(export_list, 0, 3) AS r
WITH wea, r.p.node_id AS prod_node_id
MATCH (wea)-[we:Exports]
->(prod:Product { node_id: prod_node_id })
MATCH (dom:Country)-[de:Exports]->(prod)
WHERE dom.node_id IN $dominating_node_ids
RETURN wea, we, prod, de, dom
ORDER BY wea.node_id, prod.node_id, dom.node_id
""",
dict(
dominating_node_ids=dom_econ_df.node_id.to_list(),
weaker_node_ids=weak_econ_df.node_id.to_list(),
),
).get_as_networkx()
node_classes = dict(
dominating=dom_econ_df.node_id.to_list(),
weaker=weak_econ_df.node_id.to_list(),
)
# This adjusts the visualization edge weights
# to improve readability
for u, v, data in dom_vs_weak_econ_g.edges(data=True):
    if (
        dom_vs_weak_econ_g.nodes[u]["node_id"]
        in node_classes["dominating"]
        and dom_vs_weak_econ_g.nodes[v]["_label"]
        == "Product"
    ):
        data["vis_weight"] = 1e-5
    if (
        dom_vs_weak_econ_g.nodes[u]["node_id"]
        in node_classes["weaker"]
        and dom_vs_weak_econ_g.nodes[v]["_label"]
        == "Product"
    ):
        data["vis_weight"] = 1e-3
vis.set_labels(dom_vs_weak_econ_g, LABEL_PROPS)
vis.plot(
dom_vs_weak_econ_g,
node_classes=node_classes,
scale=1.25,
seed=5,
)
Competition Network#
Let’s look at the competition network projection for `Country` nodes and `CompetesWith` edges. We first install the `algo` extension for Kuzu and create the `compnet` projection and NetworkX graph for it.
try:
    conn.execute(
        """
        INSTALL algo;
        LOAD algo;
        """
    )
except Exception as e:
    print(e)
try:
    conn.execute(
        """
        CALL drop_projected_graph("compnet")
        """
    )
except Exception as e:
    print(e)
conn.execute(
"""
CALL project_graph(
"compnet",
{"Country": "n.country_name_short <> 'Undeclared'"},
{"CompetesWith": "true"}
)
"""
)
compnet_g = conn.execute(
"""
MATCH (a:Country)-[cw:CompetesWith]->(b:Country)
WHERE a.country_name_short <> "Undeclared"
AND b.country_name_short <> "Undeclared"
RETURN a, cw, b
""",
).get_as_networkx()
Inspection Functions#
The following functions will be useful to plot the cluster and analyze the top exports for a specific cluster ID property:
def plot_cluster(
    prop_name: str,
    prop_value: int,
    kind: Literal["graph", "map"] = "graph",
):
    match kind:
        case "graph":
            compnet_cluster_g = conn.execute(
                f"""
                MATCH (a:Country)-[cw:CompetesWith]->
                    (b:Country)
                WHERE a.country_name_short <> "Undeclared"
                    AND b.country_name_short <> "Undeclared"
                    AND a.`{prop_name}` = $prop_value
                    AND b.`{prop_name}` = $prop_value
                RETURN a, cw, b
                """,
                dict(prop_value=prop_value),
            ).get_as_networkx()
            vis.set_labels(compnet_cluster_g, LABEL_PROPS)
            vis.plot(compnet_cluster_g)
        case "map":
            compnet_cluster_df = conn.execute(
                f"""
                MATCH (c:Country)
                WHERE c.country_name_short <> "Undeclared"
                    AND c.`{prop_name}` = $prop_value
                RETURN
                    c.country_iso3_code AS iso3_code,
                    c.`{prop_name}` AS `{prop_name}`
                """,
                dict(prop_value=prop_value),
            ).get_as_df()
            vis.plot_map(
                compnet_cluster_df,
                code_col="iso3_code",
                class_col=prop_name,
            )
def trade_per_cluster(
    prop_name: str,
    prop_value: int,
    method: Literal["imports", "exports"],
    n: Optional[int] = None,
    debug: bool = False,
) -> pd.DataFrame:
    match method:
        case "exports":
            match_stmt = """
                MATCH (c:Country)-[ie:Exports]->(p:Product)
            """
        case "imports":
            match_stmt = """
                MATCH (c:Country)<-[ie:Imports]-(p:Product)
            """
    if n is None:
        limit_stmt = ""
        limit_param = dict()
    else:
        limit_stmt = "LIMIT $n"
        limit_param = dict(n=n)
    query = f"""
        {match_stmt}
        WHERE c.country_name_short <> "Undeclared"
            AND c.`{prop_name}` = $prop_value
        RETURN
            p.product_name_short AS product,
            sum(ie.amount_usd) AS total_amount_usd
        ORDER BY total_amount_usd DESC
        {limit_stmt}
    """
    params = dict(prop_value=prop_value) | limit_param
    if debug:
        print_query(query, params)
    products_df = conn.execute(query, params).get_as_df()
    return products_df
Partner clusters are clusters that import what a cluster is exporting. These are likely to match all clusters due to high connectivity in the world economy, but it might not always be the case, depending on the clustering criteria.
def partner_clusters(
    prop_name: str,
    prop_value: int,
    include_self: bool = True,
    debug: bool = False,
) -> list[int]:
    include_self_stmt = (
        "" if include_self
        else f"AND c2.`{prop_name}` <> $prop_value"
    )
    query = f"""
        MATCH (c:Country)-[:Exports]-(p:Product)
        MATCH (c2:Country)<-[:Imports]-(p)
        WHERE c.country_name_short <> "Undeclared"
            AND c.`{prop_name}` = $prop_value
            AND c2.`{prop_name}` IS NOT NULL
            {include_self_stmt}
        RETURN DISTINCT c2.`{prop_name}` AS cid
    """
    params = dict(prop_value=prop_value)
    if debug:
        print_query(query, params)
    result = conn.execute(query, params)
    partner_cluster_ids = sorted(
        c[0] for c in result.get_all()
    )
    return partner_cluster_ids
The following functions will help us compute the intra-cluster and inter-cluster trade alignments, i.e., self-sufficiency and external competitiveness, based on cluster-aggregated market share.
def trade_alignment_by_cluster(
    prop_name: str,
    prop_value: int,
    method: Literal["intra", "inter"],
) -> pd.DataFrame:
    exports_df = trade_per_cluster(
        prop_name,
        prop_value,
        method="exports",
    )
    match method:
        case "intra":
            imports_df = trade_per_cluster(
                prop_name,
                prop_value,
                method="imports",
            )
        case "inter":
            imports_df = []
            for partner_cid in partner_clusters(
                prop_name,
                prop_value,
            ):
                partner_imports_df = trade_per_cluster(
                    prop_name,
                    partner_cid,
                    method="imports",
                )
                imports_df.append(partner_imports_df)
            imports_df = (
                pd.concat(imports_df)
                .groupby(["product"])
                .sum()
            )
        case _:
            raise ValueError(
                f"method not supported: {method}"
            )
    trade_df = exports_df.merge(
        imports_df,
        on="product",
        how="right" if method == "intra" else "left",
        suffixes=("_exports", "_imports"),
    ).fillna(0)
    trade_df["sdr"] = (
        trade_df.total_amount_usd_exports
        / trade_df.total_amount_usd_imports
    )
    trade_df = trade_df.sort_values("sdr", ascending=False)
    return trade_df
As a score for measuring either self-sufficiency or external competitiveness, we use the weighted average of the log-transformed Supply-Demand Ratio (SDR), squashed into a 0..1 range with a sigmoid, where weights are the total export amounts (USD) for a given cluster.
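def global_sdr_score(
    trade_df: pd.DataFrame,
    eps=1e-9,
) -> float:
    # Drop products with infinite SDR (exported, but never imported);
    # copy to avoid assigning a new column on a view of trade_df
    df = trade_df[~np.isinf(trade_df.sdr)].copy()
    # Clip to eps so that a zero SDR doesn't produce log(0)
    df["log_sdr"] = np.log(np.clip(df.sdr, eps, None))
    weights = df.total_amount_usd_exports
    # Export-weighted mean of log-SDR, squashed to 0..1
    score = expit(
        (weights * df.log_sdr).sum() / weights.sum()
    )
    return score.item()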
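To get a feel for how this score behaves, the following pure-Python sketch applies the same steps (drop undefined or infinite ratios, clip, export-weighted mean of log-SDR, sigmoid) to a few hand-made rows. The `sdr_score` helper and its inputs are hypothetical and are not part of the notebook.

```python
import math


def sdr_score(rows: list[tuple[float, float]], eps: float = 1e-9) -> float:
    """rows holds (exports_usd, imports_usd) pairs, one per product."""
    weighted_sum = 0.0
    total_weight = 0.0
    for exports, imports in rows:
        if imports == 0:
            # Infinite or undefined SDR: such rows carry no information here
            continue
        log_sdr = math.log(max(exports / imports, eps))
        weighted_sum += exports * log_sdr
        total_weight += exports
    if total_weight == 0:
        raise ValueError("no exports with a finite SDR")
    # Sigmoid of the export-weighted mean log-SDR
    return 1 / (1 + math.exp(-weighted_sum / total_weight))


balanced = sdr_score([(100.0, 100.0), (50.0, 50.0)])  # supply == demand
surplus = sdr_score([(200.0, 100.0)])  # exports are twice the imports
deficit = sdr_score([(100.0, 200.0)])  # exports are half the imports
print(balanced, surplus, deficit)
```

A cluster that exports exactly as much as it imports for every product sits at 0.5, with surpluses pushing the score toward 1 and deficits toward 0.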
Competing Communities#
- Are there any communities representing closely tied competitor clusters?
- If so, maybe there are specific products per cluster? ☑️
- If not, we have a global economy that is fairly homogeneous and diverse.
For each property computed with the `algo` extension, we’ll alter the corresponding node table, recreating the property each time.
conn.execute(
"""
ALTER TABLE Country DROP IF EXISTS louvain_id;
ALTER TABLE Country ADD IF NOT EXISTS louvain_id INT64;
CALL louvain("compnet")
WITH node, louvain_id
SET node.louvain_id = louvain_id;
"""
)
The Louvain method partitions the network by optimizing modularity. It is a greedy heuristic, so it looks for a partition with high modularity rather than a guaranteed optimum. Here, a community is a dense subgraph, i.e., a subgraph where connections among members are more frequent than connections to outside nodes.
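To illustrate what modularity rewards, the sketch below evaluates it for two candidate partitions of a toy graph (two triangles joined by a single edge). It assumes the standard undirected, unweighted formulation, which may differ from the variant used by the `algo` extension; the `modularity` helper and the toy graph are hypothetical.

```python
def modularity(edges: list[tuple[int, int]], communities: list[set[int]]) -> float:
    """Undirected, unweighted modularity of a partition."""
    m = len(edges)
    community_of = {
        node: idx
        for idx, members in enumerate(communities)
        for node in members
    }
    internal = [0] * len(communities)  # edges inside each community
    degree = [0] * len(communities)  # sum of node degrees per community
    for u, v in edges:
        degree[community_of[u]] += 1
        degree[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(
        inside / m - (deg / (2 * m)) ** 2
        for inside, deg in zip(internal, degree)
    )


# Two triangles (0-1-2 and 3-4-5) joined by the edge 2-3
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

q_split = modularity(edges, [{0, 1, 2}, {3, 4, 5}])
q_single = modularity(edges, [{0, 1, 2, 3, 4, 5}])
print(q_split, q_single)
```

Splitting the graph along the bridge yields a higher modularity than keeping all nodes in a single community, which is the kind of partition the method favors.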
compnet_louvain_df = conn.execute(
"""
MATCH (c:Country)
WHERE c.country_name_short <> "Undeclared"
RETURN
c.node_id AS node_id,
c.country_name_short AS label,
c.louvain_id AS louvain_id
"""
).get_as_df()
node_classes = {
k: g.node_id.to_list()
for k, g in compnet_louvain_df.groupby("louvain_id")
}
vis.set_labels(compnet_g, LABEL_PROPS)
vis.plot(
compnet_g,
node_classes=node_classes,
hide_edges=True,
)
In complex networks, it is not uncommon for a huge community to emerge, along with a low number of moderately large communities, and then a lot of smaller communities. This behavior is not particularly pronounced here, but it’s still visible. Below, we inspect the community size distribution.
comm_sizes_df = (
compnet_louvain_df[["louvain_id", "node_id"]]
.groupby("louvain_id")
.count()
.rename(columns=dict(node_id="num_nodes"))
)
comm_sizes_df = comm_sizes_df.reindex(
comm_sizes_df.num_nodes.sort_values(ascending=False).index
)
comm_sizes_df
louvain_id | num_nodes |
---|---|
5 | 67 |
6 | 33 |
4 | 30 |
8 | 26 |
7 | 19 |
1 | 14 |
2 | 14 |
0 | 13 |
3 | 8 |
9 | 7 |
10 | 3 |
fig, ax = plt.subplots(figsize=(18, 3))
comm_sizes_df.plot.bar(xlabel="Community ID", rot=0, ax=ax)
plt.legend(["No. Nodes"])
plt.show()
Let’s also take a look at the members of each community, from largest to smallest.
for louvain_id in comm_sizes_df.index:
    display(f"LOUVAIN ID: {louvain_id}")
    display(
        compnet_louvain_df[
            compnet_louvain_df.louvain_id == louvain_id
        ]
        .drop(columns="louvain_id")
        .sort_values("label")
    )
'LOUVAIN ID: 5'
index | node_id | label |
---|---|---|
115 | 116 | Albania |
205 | 207 | Andorra |
114 | 115 | Anguilla |
173 | 174 | Austria |
126 | 127 | Belarus |
... | ... | ... |
49 | 50 | Tunisia |
232 | 234 | Turkiye |
123 | 124 | Turks and Caicos Islands |
131 | 132 | United Kingdom |
204 | 206 | United States of America |
67 rows × 2 columns
'LOUVAIN ID: 6'
index | node_id | label |
---|---|---|
209 | 211 | Algeria |
171 | 172 | Angola |
51 | 52 | Aruba |
11 | 12 | Azerbaijan |
177 | 178 | Cameroon |
54 | 55 | Canada |
154 | 155 | Chad |
55 | 56 | Colombia |
15 | 16 | Democratic Republic of the Congo |
220 | 222 | Ecuador |
57 | 58 | Egypt |
158 | 159 | Equatorial Guinea |
100 | 101 | Fiji |
75 | 76 | Gabon |
223 | 225 | Greenland |
211 | 213 | Guyana |
17 | 18 | Iran |
89 | 90 | Iraq |
184 | 185 | Kazakhstan |
152 | 153 | Kuwait |
135 | 136 | Libya |
138 | 139 | Nigeria |
139 | 140 | Norway |
26 | 27 | Oman |
219 | 221 | Republic of the Congo |
63 | 64 | Russia |
229 | 231 | Sao Tome and Principe |
64 | 65 | Saudi Arabia |
28 | 29 | South Sudan |
7 | 8 | Timor-Leste |
31 | 32 | Trinidad and Tobago |
112 | 113 | Venezuela |
67 | 68 | Yemen |
'LOUVAIN ID: 4'
index | node_id | label |
---|---|---|
170 | 171 | Afghanistan |
206 | 208 | Australia |
12 | 13 | Benin |
24 | 25 | Bhutan |
207 | 209 | Bolivia |
52 | 53 | Burundi |
116 | 117 | Central African Republic |
101 | 102 | Guinea |
163 | 164 | Kyrgyzstan |
185 | 186 | Liberia |
77 | 78 | Mali |
137 | 138 | Mauritania |
4 | 5 | Mozambique |
165 | 166 | Niger |
5 | 6 | Papua New Guinea |
78 | 79 | Rwanda |
141 | 142 | Senegal |
202 | 204 | Sierra Leone |
142 | 143 | Solomon Islands |
109 | 110 | Somalia |
187 | 188 | Sudan |
230 | 232 | Suriname |
29 | 30 | Syria |
124 | 125 | Tajikistan |
111 | 112 | Tanzania |
110 | 111 | Togo |
167 | 168 | Turkmenistan |
8 | 9 | US Minor Outlying Islands |
99 | 100 | Western Sahara |
216 | 218 | Zambia |
'LOUVAIN ID: 8'
index | node_id | label |
---|---|---|
96 | 97 | Antarctica |
125 | 126 | Bahrain |
176 | 177 | Botswana |
97 | 98 | China |
14 | 15 | Cocos (Keeling) Islands |
210 | 212 | Guam |
160 | 161 | Heard and McDonald Islands |
87 | 88 | Hong Kong |
162 | 163 | Israel |
134 | 135 | Japan |
102 | 103 | Lesotho |
195 | 197 | Macao |
46 | 47 | Malaysia |
62 | 63 | Malta |
164 | 165 | Namibia |
120 | 121 | Northern Mariana Islands |
215 | 217 | Philippines |
47 | 48 | Pitcairn |
233 | 235 | Samoa |
48 | 49 | Singapore |
65 | 66 | South Georgia and South Sandwich Islds. |
76 | 77 | South Korea |
113 | 114 | Taiwan |
79 | 80 | Vatican City |
95 | 96 | Vietnam |
169 | 170 | Wallis and Futuna |
'LOUVAIN ID: 7'
index | node_id | label |
---|---|---|
10 | 11 | Argentina |
127 | 128 | Belize |
217 | 219 | Brazil |
34 | 35 | Burkina Faso |
156 | 157 | Côte d'Ivoire |
94 | 95 | Eswatini |
129 | 130 | Ethiopia |
38 | 39 | Ghana |
86 | 87 | Guatemala |
117 | 118 | Honduras |
151 | 152 | Kenya |
91 | 92 | Malawi |
108 | 109 | New Zealand |
197 | 199 | Nicaragua |
93 | 94 | Paraguay |
50 | 51 | Uganda |
168 | 169 | Uruguay |
146 | 147 | Uzbekistan |
68 | 69 | Zimbabwe |
'LOUVAIN ID: 1'
index | node_id | label |
---|---|---|
69 | 70 | Bangladesh |
16 | 17 | Cabo Verde |
143 | 144 | El Salvador |
1 | 2 | Falkland Islands |
39 | 40 | Haiti |
59 | 60 | Kiribati |
43 | 44 | Maldives |
19 | 20 | Mauritius |
74 | 75 | Micronesia |
107 | 108 | Nauru |
122 | 123 | Seychelles |
136 | 137 | Sri Lanka |
66 | 67 | Tuvalu |
191 | 192 | Vanuatu |
'LOUVAIN ID: 2'
index | node_id | label |
---|---|---|
172 | 173 | Armenia |
218 | 220 | Chile |
181 | 182 | Eritrea |
2 | 3 | Georgia |
150 | 151 | Jordan |
41 | 42 | Lebanon |
42 | 43 | Moldova |
25 | 26 | Mongolia |
225 | 227 | North Macedonia |
140 | 141 | Panama |
92 | 93 | Peru |
32 | 33 | South Africa |
190 | 191 | Ukraine |
33 | 34 | United Arab Emirates |
'LOUVAIN ID: 0'
index | node_id | label |
---|---|---|
0 | 1 | American Samoa |
80 | 81 | Antigua and Barbuda |
13 | 14 | Barbados |
82 | 83 | Curaçao |
179 | 180 | Cyprus |
159 | 160 | Greece |
85 | 86 | Grenada |
58 | 59 | Jamaica |
119 | 120 | Marshall Islands |
214 | 216 | Niue |
60 | 61 | Saint Lucia |
9 | 10 | Saint Vincent and the Grenadines |
35 | 36 | The Bahamas |
'LOUVAIN ID: 3'
index | node_id | label |
---|---|---|
212 | 214 | Cambodia |
56 | 57 | Comoros |
3 | 4 | Laos |
90 | 91 | Madagascar |
226 | 228 | Montenegro |
44 | 45 | Myanmar |
227 | 229 | Pakistan |
186 | 187 | Palau |
'LOUVAIN ID: 9'
index | node_id | label |
---|---|---|
132 | 133 | British Indian Ocean Territory |
36 | 37 | Cook Islands |
182 | 183 | Faroe Islands |
155 | 156 | French Southern and Antarctic Lands |
222 | 224 | Guinea-Bissau |
161 | 162 | Iceland |
201 | 203 | Saint Helena, Ascension and Tristan da Cunha |
'LOUVAIN ID: 10'
index | node_id | label |
---|---|---|
192 | 194 | Costa Rica |
83 | 84 | Dominica |
157 | 158 | Dominican Republic |
largest_louvain_id = comm_sizes_df.index[0].item()
largest_louvain_id
5
smallest_louvain_id = comm_sizes_df.index[-1].item()
smallest_louvain_id
10
Community Subgraphs#
Community subgraphs illustrate clusters where competition is more prevalent among their members than with countries outside of the community. For this graph (our Econ CompNet, or `compnet`), they are almost always (if not always) complete subgraphs. We can plot any cluster by its ID.
plot_cluster("louvain_id", largest_louvain_id)
Community Mapping#
Network visualization is not always the best approach to understand your data, and this is a good example. Since we’re working with a complete (or nearly complete) subgraph, looking at relationships is less helpful than looking at a map of the community, as we can see below.
plot_cluster("louvain_id", largest_louvain_id, kind="map")
Top Exported Products#
- Is there any export overlap between large and small communities?
largest_comm_top_exported = top_frac(
trade_per_cluster(
"louvain_id",
largest_louvain_id,
method="exports"
),
"total_amount_usd",
)
largest_comm_top_exported
index | product | total_amount_usd |
---|---|---|
0 | Commodities not specified, according to kind | 1.506978e+12 |
1 | Oils petroleum, bituminous, distillates | 1.382886e+12 |
2 | Medicaments, doses, nes | 1.110865e+12 |
3 | Blood | 7.572479e+11 |
4 | Petroleum oils, crude | 5.977950e+11 |
5 | Automobiles nes, gas turbine powered | 5.668137e+11 |
6 | Gold in unwrought forms | 5.482768e+11 |
7 | Automobiles, spark ignition, 1500-3000cc | 5.231797e+11 |
8 | Transmit-receive apparatus for radio, TV | 4.899130e+11 |
9 | Monolithic integrated circuits, digital | 4.366876e+11 |
10 | Trade data discrepancies | 3.521978e+11 |
11 | Parts of data processing equipment | 3.358087e+11 |
12 | Automobiles, spark ignition, 1000-1500cc | 2.657018e+11 |
13 | Fixed wing aircraft, >15,000kg | 2.641497e+11 |
14 | Motor vehicle parts nes | 2.565987e+11 |
15 | Vaccines, human | 2.412159e+11 |
16 | Natural gas, liquefied | 2.406005e+11 |
17 | Gold, semi-manufactured forms | 2.394518e+11 |
smallest_comm_top_exported = top_frac(
trade_per_cluster(
"louvain_id",
smallest_louvain_id,
method="exports",
),
"total_amount_usd",
)
smallest_comm_top_exported
index | product | total_amount_usd |
---|---|---|
0 | Instruments for medical science, nes | 1.032487e+10 |
1 | Medical needles, catheters | 8.305035e+09 |
2 | Trade data discrepancies | 7.981437e+09 |
jaccard_sim(
largest_comm_top_exported["product"],
smallest_comm_top_exported["product"]
)
0.05
Top Imported Products#
- Is there any import overlap between large and small communities?
largest_comm_top_imported = top_frac(
trade_per_cluster(
"louvain_id",
largest_louvain_id,
method="imports",
),
"total_amount_usd",
)
largest_comm_top_imported
index | product | total_amount_usd |
---|---|---|
0 | Petroleum oils, crude | 1.933629e+12 |
1 | Commodities not specified, according to kind | 1.411868e+12 |
2 | Oils petroleum, bituminous, distillates | 1.197450e+12 |
3 | Transmit-receive apparatus for radio, TV | 9.829336e+11 |
4 | Medicaments, doses, nes | 8.788548e+11 |
5 | Trade data discrepancies | 7.574121e+11 |
6 | Gold in unwrought forms | 7.455694e+11 |
7 | Blood | 6.453376e+11 |
8 | Automobiles nes, gas turbine powered | 5.992672e+11 |
9 | Natural gas, as gas | 5.284708e+11 |
10 | Parts of data processing equipment | 5.119915e+11 |
11 | Automobiles, spark ignition, 1500-3000cc | 4.830173e+11 |
12 | Monolithic integrated circuits, digital | 4.750667e+11 |
smallest_comm_top_imported = top_frac(
trade_per_cluster(
"louvain_id",
smallest_louvain_id,
method="imports",
),
"total_amount_usd",
)
smallest_comm_top_imported
index | product | total_amount_usd |
---|---|---|
0 | Oils petroleum, bituminous, distillates | 1.373957e+10 |
1 | Commodities not specified, according to kind | 7.539036e+09 |
2 | Transmit-receive apparatus for radio, TV | 3.136932e+09 |
3 | Automobiles, spark ignition, 1500-3000cc | 2.425662e+09 |
4 | Jewellery of precious metal | 2.326848e+09 |
5 | Instruments for medical science, nes | 2.297871e+09 |
6 | Monolithic integrated circuits, digital | 2.243517e+09 |
7 | Maize except seed corn | 2.138190e+09 |
8 | Natural gas, liquefied | 2.103650e+09 |
9 | Petroleum oils, crude | 2.089410e+09 |
10 | Propane, liquefied | 2.061530e+09 |
jaccard_sim(
largest_comm_top_imported["product"],
smallest_comm_top_imported["product"],
)
0.3333333333333333
Trade Alignment#
Trade alignment can be used to determine a cluster’s self-sufficiency by looking at internal country-country trade, or it can be used to determine a cluster’s external competitiveness by looking at inter-cluster country-country trade. We determine both dimensions of trade alignment (intra and inter cluster) based on the supply/demand ratio, more specifically the weighted average of log-SDR, with weights being total amounts (USD) of exports/imports, globally per cluster.
This score is scaled to a 0..1 range using a sigmoid transformation, so 0.5 corresponds to balanced supply and demand (an average log-SDR of zero), and anything above 0.5 is favorable. The log-transformation reduces the skew of the SDR distribution.
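To make the scoring concrete, here is a minimal sketch of a sigmoid-scaled, weighted average of log-SDR. It is not the notebook's `global_sdr_score`: the function name `sdr_score_sketch`, its inputs, and the choice of weights (supply plus demand, in USD, per product) are assumptions made for illustration only.

```python
import math


def sdr_score_sketch(supply_usd, demand_usd):
    """Sigmoid of the weighted average log supply/demand ratio.

    Hypothetical stand-in for the notebook's scoring function; the
    weighting scheme (supply + demand per product) is an assumption.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for supply, demand in zip(supply_usd, demand_usd):
        if supply <= 0 or demand <= 0:
            continue  # log-SDR is undefined unless both sides are positive
        weight = supply + demand
        weighted_sum += weight * math.log(supply / demand)
        total_weight += weight
    if total_weight == 0:
        return 0.5  # no usable data: return the neutral score
    avg_log_sdr = weighted_sum / total_weight
    # Sigmoid maps (-inf, +inf) to (0, 1), with 0 mapped to 0.5
    return 1.0 / (1.0 + math.exp(-avg_log_sdr))


balanced = sdr_score_sketch([100.0, 50.0], [100.0, 50.0])
surplus = sdr_score_sketch([200.0, 80.0], [100.0, 40.0])
deficit = sdr_score_sketch([100.0, 40.0], [200.0, 80.0])
```

With supply equal to demand the score sits exactly at the 0.5 threshold, while supplying twice the demand on every product pushes it to about 0.67, and the reverse pulls it down to about 0.33.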
Self-Sufficiency#
Most communities are self-sufficient or nearly self-sufficient, with only community 5 showing a little more vulnerability.
comm_self_sufficiency_df = pd.DataFrame(
dict(
louvain_id=louvain_id,
score=global_sdr_score(
trade_alignment_by_cluster(
"louvain_id",
louvain_id,
method="intra",
)
),
)
for louvain_id in comm_sizes_df.index
).sort_values("score", ascending=False)
comm_self_sufficiency_df
louvain_id | score | |
---|---|---|
9 | 9 | 0.985065 |
2 | 4 | 0.970089 |
5 | 1 | 0.959150 |
8 | 3 | 0.939180 |
10 | 10 | 0.895896 |
1 | 6 | 0.869495 |
4 | 7 | 0.860508 |
6 | 2 | 0.742791 |
3 | 8 | 0.644520 |
7 | 0 | 0.564580 |
0 | 5 | 0.493976 |
colors = comm_self_sufficiency_df.score.apply(
lambda s: MPL_PALETTE[0] if s >= 0.5 else MPL_PALETTE[1]
)
fig, ax = plt.subplots(figsize=(18, 3))
comm_self_sufficiency_df.plot.bar(
x="louvain_id",
y="score",
xlabel="Community ID",
color=colors,
rot=0,
ax=ax,
)
plt.axhline(
y=0.5,
color=MPL_PALETTE[1],
linestyle="--",
linewidth=2,
)
plt.legend([
"Self-Sufficiency Threshold",
"Global Log-SDR Score"
])
plt.show()
compnet_louvain_df[compnet_louvain_df.louvain_id == 5]
node_id | label | louvain_id | |
---|---|---|---|
6 | 7 | Qatar | 5 |
18 | 19 | Lithuania | 5 |
20 | 21 | Portugal | 5 |
21 | 22 | Palestine | 5 |
22 | 23 | British Virgin Islands | 5 |
... | ... | ... | ... |
221 | 223 | Spain | 5 |
224 | 226 | India | 5 |
228 | 230 | Romania | 5 |
231 | 233 | Slovenia | 5 |
232 | 234 | Turkiye | 5 |
67 rows × 3 columns
External Competitiveness#
Most communities are not particularly competitive externally, but this was to be expected given the clustering criterion: communities are dense subgraphs, which also points to higher internal competition.
comm_external_comp_df = pd.DataFrame(
dict(
louvain_id=louvain_id,
score=global_sdr_score(
trade_alignment_by_cluster(
"louvain_id",
louvain_id,
method="inter",
)
),
)
for louvain_id in comm_sizes_df.index
).sort_values("score", ascending=False)
comm_external_comp_df
louvain_id | score | |
---|---|---|
0 | 5 | 0.360300 |
3 | 8 | 0.304639 |
1 | 6 | 0.200342 |
2 | 4 | 0.121624 |
4 | 7 | 0.089482 |
6 | 2 | 0.072207 |
5 | 1 | 0.055127 |
8 | 3 | 0.033868 |
9 | 9 | 0.018567 |
10 | 10 | 0.012569 |
7 | 0 | 0.010781 |
colors = comm_external_comp_df.score.apply(
lambda s: MPL_PALETTE[0] if s >= 0.5 else MPL_PALETTE[1]
)
fig, ax = plt.subplots(figsize=(18, 3))
comm_external_comp_df.plot.bar(
x="louvain_id",
y="score",
xlabel="Community ID",
color=colors,
rot=0,
ax=ax,
)
plt.axhline(
y=0.5,
color=MPL_PALETTE[1],
linestyle="--",
linewidth=2,
)
plt.legend([
"External Competitiveness Threshold",
"Global SDR Score"
])
plt.show()
compnet_louvain_df[compnet_louvain_df.louvain_id == 8]
node_id | label | louvain_id | |
---|---|---|---|
14 | 15 | Cocos (Keeling) Islands | 8 |
46 | 47 | Malaysia | 8 |
47 | 48 | Pitcairn | 8 |
48 | 49 | Singapore | 8 |
62 | 63 | Malta | 8 |
65 | 66 | South Georgia and South Sandwich Islds. | 8 |
76 | 77 | South Korea | 8 |
79 | 80 | Vatican City | 8 |
87 | 88 | Hong Kong | 8 |
95 | 96 | Vietnam | 8 |
96 | 97 | Antarctica | 8 |
97 | 98 | China | 8 |
102 | 103 | Lesotho | 8 |
113 | 114 | Taiwan | 8 |
120 | 121 | Northern Mariana Islands | 8 |
125 | 126 | Bahrain | 8 |
134 | 135 | Japan | 8 |
160 | 161 | Heard and McDonald Islands | 8 |
162 | 163 | Israel | 8 |
164 | 165 | Namibia | 8 |
169 | 170 | Wallis and Futuna | 8 |
176 | 177 | Botswana | 8 |
195 | 197 | Macao | 8 |
210 | 212 | Guam | 8 |
215 | 217 | Philippines | 8 |
233 | 235 | Samoa | 8 |
Weakly Connected Competitors#
Strongly connected components in our graph would capture mutual competition among peers, cyclical or balanced rivalries, or equivalent strategic positions. However, once we removed the “Undeclared” pseudo-country, we weren’t able to find any strongly connected components that were not singletons.
As such, we compute the weakly connected components instead. These capture the isolated groups of countries competing among themselves, regardless of export amount (which establishes edge direction in our graph).
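As a toy illustration of what a weakly connected component is, the sketch below groups nodes with a union-find structure while ignoring edge direction. The function `weak_components_sketch` and the five-node graph are hypothetical, and this is not how the graph database computes its result; it only shows the idea.

```python
def weak_components_sketch(nodes, edges):
    """Group nodes into weakly connected components (direction ignored)."""
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the root, halving the path as we go
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for src, dst in edges:
        root_src, root_dst = find(src), find(dst)
        if root_src != root_dst:
            parent[root_dst] = root_src  # merge the two groups

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    # Largest component first
    return sorted(groups.values(), key=len, reverse=True)


# Hypothetical toy graph: A and C both point to B, D points to E
nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("C", "B"), ("D", "E")]
components = weak_components_sketch(nodes, edges)
```

Here there is no directed path between A and C, yet both land in the same weak component because they share the neighbor B. No node can reach itself again by following edges, so every strongly connected component of this toy graph is a singleton.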
conn.execute(
"""
ALTER TABLE Country DROP IF EXISTS wcc_id;
ALTER TABLE Country ADD IF NOT EXISTS wcc_id INT64;
CALL weakly_connected_components("compnet")
WITH node, group_id
SET node.wcc_id = group_id;
"""
)
compnet_wcc_df = conn.execute(
"""
MATCH (c:Country)
WHERE c.country_name_short <> "Undeclared"
RETURN
c.node_id AS node_id,
c.country_name_short AS label,
c.wcc_id AS wcc_id
"""
).get_as_df()
node_classes = {
k: g.node_id.to_list()
for k, g in compnet_wcc_df.groupby("wcc_id")
}
vis.set_labels(compnet_g, LABEL_PROPS)
vis.plot(
compnet_g,
node_classes=node_classes,
hide_edges=True,
)
As we can see, there are multiple weakly connected components, but most of them are singletons, each containing a single node. Other than that, there is a large component of 64 countries, followed by two smaller components with over 20 nodes each, which we’ll inspect below.
wcc_sizes_df = (
compnet_wcc_df[["wcc_id", "node_id"]]
.groupby("wcc_id")
.count()
.rename(columns=dict(node_id="num_nodes"))
)
wcc_sizes_df = wcc_sizes_df.reindex(
wcc_sizes_df.num_nodes.sort_values(ascending=False).index
)
wcc_sizes_df
num_nodes | |
---|---|
wcc_id | |
0 | 64 |
1 | 28 |
4 | 24 |
5 | 11 |
2 | 9 |
... | ... |
209 | 1 |
215 | 1 |
226 | 1 |
228 | 1 |
230 | 1 |
68 rows × 1 columns
wcc_sizes_ord_df = wcc_sizes_df.reset_index(drop=True)
wcc_singleton_threshold = (
wcc_sizes_ord_df[wcc_sizes_ord_df.num_nodes <= 1]
.index[0]
.item()
)
fig, ax = plt.subplots(figsize=(30, 5))
wcc_sizes_df.plot.bar(rot=0, ax=ax)
plt.axvline(
x=wcc_singleton_threshold,
color=MPL_PALETTE[1],
linestyle="--",
linewidth=2,
)
plt.legend(["Singleton Threshold", "No. Nodes"])
plt.show()
Let’s take a look at the members of each weak component, from largest to smallest.
for wcc_id in wcc_sizes_df[wcc_sizes_df.num_nodes > 1].index:
display(f"WCC ID: {wcc_id}")
display(
compnet_wcc_df[compnet_wcc_df.wcc_id == wcc_id]
.drop(columns="wcc_id")
)
'WCC ID: 0'
node_id | label | |
---|---|---|
0 | 1 | American Samoa |
6 | 7 | Qatar |
11 | 12 | Azerbaijan |
14 | 15 | Cocos (Keeling) Islands |
18 | 19 | Lithuania |
... | ... | ... |
211 | 213 | Guyana |
215 | 217 | Philippines |
221 | 223 | Spain |
224 | 226 | India |
231 | 233 | Slovenia |
64 rows × 2 columns
'WCC ID: 1'
node_id | label | |
---|---|---|
1 | 2 | Falkland Islands |
16 | 17 | Cabo Verde |
19 | 20 | Mauritius |
26 | 27 | Oman |
39 | 40 | Haiti |
43 | 44 | Maldives |
47 | 48 | Pitcairn |
55 | 56 | Colombia |
59 | 60 | Kiribati |
66 | 67 | Tuvalu |
67 | 68 | Yemen |
69 | 70 | Bangladesh |
74 | 75 | Micronesia |
79 | 80 | Vatican City |
102 | 103 | Lesotho |
122 | 123 | Seychelles |
125 | 126 | Bahrain |
132 | 133 | British Indian Ocean Territory |
155 | 156 | French Southern and Antarctic Lands |
161 | 162 | Iceland |
162 | 163 | Israel |
176 | 177 | Botswana |
177 | 178 | Cameroon |
182 | 183 | Faroe Islands |
191 | 192 | Vanuatu |
201 | 203 | Saint Helena, Ascension and Tristan da Cunha |
220 | 222 | Ecuador |
223 | 225 | Greenland |
'WCC ID: 4'
node_id | label | |
---|---|---|
4 | 5 | Mozambique |
12 | 13 | Benin |
38 | 39 | Ghana |
50 | 51 | Uganda |
52 | 53 | Burundi |
73 | 74 | Finland |
75 | 76 | Gabon |
77 | 78 | Mali |
78 | 79 | Rwanda |
99 | 100 | Western Sahara |
101 | 102 | Guinea |
111 | 112 | Tanzania |
116 | 117 | Central African Republic |
121 | 122 | Sweden |
128 | 129 | Switzerland |
131 | 132 | United Kingdom |
163 | 164 | Kyrgyzstan |
165 | 166 | Niger |
170 | 171 | Afghanistan |
184 | 185 | Kazakhstan |
193 | 195 | Ireland |
205 | 207 | Andorra |
219 | 221 | Republic of the Congo |
230 | 232 | Suriname |
'WCC ID: 5'
node_id | label | |
---|---|---|
5 | 6 | Papua New Guinea |
8 | 9 | US Minor Outlying Islands |
58 | 59 | Jamaica |
60 | 61 | Saint Lucia |
83 | 84 | Dominica |
85 | 86 | Grenada |
120 | 121 | Northern Mariana Islands |
145 | 146 | Tonga |
157 | 158 | Dominican Republic |
192 | 194 | Costa Rica |
210 | 212 | Guam |
'WCC ID: 2'
node_id | label | |
---|---|---|
2 | 3 | Georgia |
3 | 4 | Laos |
27 | 28 | North Korea |
33 | 34 | United Arab Emirates |
44 | 45 | Myanmar |
172 | 173 | Armenia |
174 | 175 | Belgium |
212 | 214 | Cambodia |
226 | 228 | Montenegro |
'WCC ID: 10'
node_id | label | |
---|---|---|
10 | 11 | Argentina |
86 | 87 | Guatemala |
93 | 94 | Paraguay |
117 | 118 | Honduras |
168 | 169 | Uruguay |
197 | 199 | Nicaragua |
217 | 219 | Brazil |
'WCC ID: 9'
node_id | label | |
---|---|---|
9 | 10 | Saint Vincent and the Grenadines |
56 | 57 | Comoros |
90 | 91 | Madagascar |
119 | 120 | Marshall Islands |
186 | 187 | Palau |
'WCC ID: 15'
node_id | label | |
---|---|---|
15 | 16 | Democratic Republic of the Congo |
32 | 33 | South Africa |
92 | 93 | Peru |
181 | 182 | Eritrea |
218 | 220 | Chile |
'WCC ID: 42'
node_id | label | |
---|---|---|
42 | 43 | Moldova |
190 | 191 | Ukraine |
203 | 205 | Serbia |
228 | 230 | Romania |
'WCC ID: 22'
node_id | label | |
---|---|---|
22 | 23 | British Virgin Islands |
36 | 37 | Cook Islands |
81 | 82 | Bermuda |
178 | 179 | Cayman Islands |
'WCC ID: 17'
node_id | label | |
---|---|---|
17 | 18 | Iran |
31 | 32 | Trinidad and Tobago |
158 | 159 | Equatorial Guinea |
'WCC ID: 137'
node_id | label | |
---|---|---|
137 | 138 | Mauritania |
185 | 186 | Liberia |
206 | 208 | Australia |
'WCC ID: 40'
node_id | label | |
---|---|---|
40 | 41 | Indonesia |
232 | 234 | Turkiye |
'WCC ID: 24'
node_id | label | |
---|---|---|
24 | 25 | Bhutan |
216 | 218 | Zambia |
'WCC ID: 94'
node_id | label | |
---|---|---|
94 | 95 | Eswatini |
127 | 128 | Belize |
'WCC ID: 49'
node_id | label | |
---|---|---|
49 | 50 | Tunisia |
61 | 62 | Morocco |
'WCC ID: 110'
node_id | label | |
---|---|---|
110 | 111 | Togo |
141 | 142 | Senegal |
'WCC ID: 109'
node_id | label | |
---|---|---|
109 | 110 | Somalia |
187 | 188 | Sudan |
'WCC ID: 136'
node_id | label | |
---|---|---|
136 | 137 | Sri Lanka |
143 | 144 | El Salvador |
'WCC ID: 21'
node_id | label | |
---|---|---|
21 | 22 | Palestine |
199 | 201 | Poland |
'WCC ID: 167'
node_id | label | |
---|---|---|
167 | 168 | Turkmenistan |
207 | 209 | Bolivia |
'WCC ID: 160'
node_id | label | |
---|---|---|
160 | 161 | Heard and McDonald Islands |
166 | 167 | Saint Pierre and Miquelon |
'WCC ID: 223'
node_id | label | |
---|---|---|
222 | 224 | Guinea-Bissau |
233 | 235 | Samoa |
largest_wcc_id = wcc_sizes_df.index[0].item()
largest_wcc_id
0
smallest_wcc_id = wcc_sizes_df.index[-1].item()
smallest_wcc_id
230
Component Subgraphs#
plot_cluster("wcc_id", largest_wcc_id)
Component Mapping#
plot_cluster("wcc_id", largest_wcc_id, kind="map")
Top Exported Products#
- Is there any export overlap between large and small components?
largest_wcc_top_exported = top_frac(
trade_per_cluster("wcc_id", largest_wcc_id, "exports"),
"total_amount_usd",
)
largest_wcc_top_exported
product | total_amount_usd | |
---|---|---|
0 | Monolithic integrated circuits, digital | 2.880361e+12 |
1 | Petroleum oils, crude | 2.815121e+12 |
2 | Oils petroleum, bituminous, distillates | 2.185186e+12 |
3 | Commodities not specified, according to kind | 1.859842e+12 |
4 | Transmit-receive apparatus for radio, TV | 1.629116e+12 |
5 | Trade data discrepancies | 9.872075e+11 |
6 | Parts of data processing equipment | 7.646240e+11 |
7 | Medicaments, doses, nes | 7.542306e+11 |
smallest_wcc_top_exported = top_frac(
trade_per_cluster("wcc_id", smallest_wcc_id, "exports"),
"total_amount_usd",
)
smallest_wcc_top_exported
product | total_amount_usd | |
---|---|---|
0 | Petroleum oils, crude | 24676511.0 |
jaccard_sim(
largest_wcc_top_exported["product"],
smallest_wcc_top_exported["product"]
)
0.125
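The value above is what the standard set-based Jaccard index gives for the two product lists shown. The sketch below assumes `jaccard_sim` is defined that way; `jaccard_sketch` is a hypothetical re-implementation, not the notebook's helper.

```python
def jaccard_sketch(a, b):
    """Size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


# Product names copied from the two tables above
largest = [
    "Monolithic integrated circuits, digital",
    "Petroleum oils, crude",
    "Oils petroleum, bituminous, distillates",
    "Commodities not specified, according to kind",
    "Transmit-receive apparatus for radio, TV",
    "Trade data discrepancies",
    "Parts of data processing equipment",
    "Medicaments, doses, nes",
]
smallest = ["Petroleum oils, crude"]

score = jaccard_sketch(largest, smallest)  # 1 shared product out of 8
```

The single product exported by the smallest component is also among the eight top exports of the largest one, so the intersection has one element, the union has eight, and the score is 1/8.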
Top Imported Products#
- Is there any import overlap between large and small components?
largest_wcc_top_imported = top_frac(
trade_per_cluster("wcc_id", largest_wcc_id, "imports"),
"total_amount_usd",
)
largest_wcc_top_imported
product | total_amount_usd | |
---|---|---|
0 | Petroleum oils, crude | 3.075323e+12 |
1 | Monolithic integrated circuits, digital | 2.780868e+12 |
2 | Commodities not specified, according to kind | 1.510984e+12 |
3 | Oils petroleum, bituminous, distillates | 1.420618e+12 |
4 | Transmit-receive apparatus for radio, TV | 1.332514e+12 |
5 | Trade data discrepancies | 1.248542e+12 |
6 | Medicaments, doses, nes | 8.050835e+11 |
7 | Parts of data processing equipment | 6.992992e+11 |
8 | Automobiles, spark ignition, 1500-3000cc | 6.571986e+11 |
smallest_wcc_top_imported = top_frac(
trade_per_cluster("wcc_id", smallest_wcc_id, "imports"),
"total_amount_usd",
)
smallest_wcc_top_imported
product | total_amount_usd | |
---|---|---|
0 | Oils petroleum, bituminous, distillates | 88922018.0 |
1 | Cargo vessels, not tanker or refrigerated | 23292230.0 |
2 | Commodities not specified, according to kind | 21288342.0 |
3 | Rice, semi- or wholly-milled | 15654678.0 |
jaccard_sim(
largest_wcc_top_imported["product"],
smallest_wcc_top_imported["product"]
)
0.18181818181818182
Trade Alignment#
Again, trade alignment can be used to determine a cluster’s self-sufficiency by looking at internal country-country trade, or it can be used to determine a cluster’s external competitiveness by looking at inter-cluster country-country trade. We determine both dimensions of trade alignment (intra and inter cluster) based on the supply/demand ratio, more specifically the weighted average of log-SDR, with weights being total amounts (USD) of exports/imports, globally per cluster.
This score is scaled to a 0..1 range using a sigmoid transformation, so 0.5 corresponds to balanced supply and demand (an average log-SDR of zero), and anything above 0.5 is favorable. The log-transformation reduces the skew of the SDR distribution.
Self-Sufficiency#
Most components are self-sufficient or nearly self-sufficient, with only three of them, components 209, 22 and 196, showing a little more vulnerability.
wcc_self_sufficiency_df = pd.DataFrame(
dict(
wcc_id=wcc_id,
score=global_sdr_score(
trade_alignment_by_cluster(
"wcc_id",
wcc_id,
method="intra",
)
),
)
for wcc_id in wcc_sizes_df.index
).sort_values("score", ascending=False)
wcc_self_sufficiency_df
wcc_id | score | |
---|---|---|
20 | 167 | 0.999962 |
57 | 197 | 0.999932 |
25 | 34 | 0.999926 |
23 | 25 | 0.999721 |
52 | 156 | 0.999568 |
... | ... | ... |
2 | 4 | 0.586408 |
0 | 0 | 0.518923 |
63 | 209 | 0.464203 |
9 | 22 | 0.300144 |
56 | 196 | 0.280055 |
68 rows × 2 columns
colors = wcc_self_sufficiency_df.score.apply(
lambda s: MPL_PALETTE[0] if s >= 0.5 else MPL_PALETTE[1]
)
fig, ax = plt.subplots(figsize=(30, 5))
wcc_self_sufficiency_df.plot.bar(
x="wcc_id",
y="score",
xlabel="Weak Component ID",
color=colors,
rot=0,
ax=ax,
)
plt.axhline(
y=0.5,
color=MPL_PALETTE[1],
linestyle="--",
linewidth=2,
)
plt.legend([
"Self-Sufficiency Threshold",
"Global SDR Score"
])
plt.show()
compnet_wcc_df[compnet_wcc_df.wcc_id == 0]
node_id | label | wcc_id | |
---|---|---|---|
0 | 1 | American Samoa | 0 |
6 | 7 | Qatar | 0 |
11 | 12 | Azerbaijan | 0 |
14 | 15 | Cocos (Keeling) Islands | 0 |
18 | 19 | Lithuania | 0 |
... | ... | ... | ... |
211 | 213 | Guyana | 0 |
215 | 217 | Philippines | 0 |
221 | 223 | Spain | 0 |
224 | 226 | India | 0 |
231 | 233 | Slovenia | 0 |
64 rows × 3 columns
External Competitiveness#
Most components are not particularly competitive externally, even less so than communities, with the large majority having an SDR-based score lower than 0.1.
wcc_external_comp_df = pd.DataFrame(
dict(
wcc_id=wcc_id,
score=global_sdr_score(
trade_alignment_by_cluster(
"wcc_id",
wcc_id,
method="inter",
)
),
)
for wcc_id in wcc_sizes_df.index
).sort_values("score", ascending=False)
wcc_external_comp_df
wcc_id | score | |
---|---|---|
0 | 0 | 0.424935 |
11 | 137 | 0.135049 |
2 | 4 | 0.112296 |
7 | 15 | 0.086413 |
5 | 10 | 0.084617 |
... | ... | ... |
59 | 195 | 0.000212 |
21 | 160 | 0.000073 |
67 | 230 | 0.000053 |
43 | 114 | 0.000023 |
26 | 65 | 0.000009 |
68 rows × 2 columns
colors = wcc_external_comp_df.score.apply(
lambda s: MPL_PALETTE[0] if s >= 0.5 else MPL_PALETTE[1]
)
fig, ax = plt.subplots(figsize=(30, 5))
wcc_external_comp_df.plot.bar(
x="wcc_id",
y="score",
xlabel="Weak Component ID",
color=colors,
rot=0,
ax=ax,
)
plt.axhline(
y=0.5,
color=MPL_PALETTE[1],
linestyle="--",
linewidth=2,
)
plt.legend([
"External Competitiveness Threshold",
"Global SDR Score"
])
plt.show()
compnet_louvain_df[compnet_louvain_df.louvain_id == 0]
node_id | label | louvain_id | |
---|---|---|---|
0 | 1 | American Samoa | 0 |
9 | 10 | Saint Vincent and the Grenadines | 0 |
13 | 14 | Barbados | 0 |
35 | 36 | The Bahamas | 0 |
58 | 59 | Jamaica | 0 |
60 | 61 | Saint Lucia | 0 |
80 | 81 | Antigua and Barbuda | 0 |
82 | 83 | Curaçao | 0 |
85 | 86 | Grenada | 0 |
119 | 120 | Marshall Islands | 0 |
159 | 160 | Greece | 0 |
179 | 180 | Cyprus | 0 |
214 | 216 | Niue | 0 |
Communities vs Components#
The two clusterings differ in granularity: weak components are more numerous, and therefore generally smaller, than communities. By matching each cluster from the finer clustering to a cluster from the coarser one, we can run a pairwise cluster comparison:
- Which countries belong to a community, but not the weak component?
- Which countries belong to a weak component, but not the community?
- Which countries belong to both?
- Is there a particular semantic to these countries?
len(wcc_sizes_df), len(comm_sizes_df)
(68, 11)
NN-Clusters#
We compute community to weak component similarities, selecting the nearest-neighbor community for each component. Given the higher number of components when compared to communities, we’ll necessarily have repeated nearest-neighbor communities.
cluster_sim_df = []
for wcc_id, wcc in compnet_wcc_df.groupby("wcc_id"):
for louvain_id, comm in (
compnet_louvain_df.groupby("louvain_id")
):
cluster_sim_df.append(
dict(
wcc_id=wcc_id,
louvain_id=louvain_id,
sim=jaccard_sim(wcc.label, comm.label),
)
)
cluster_sim_df = pd.DataFrame(cluster_sim_df)
cluster_sim_df = cluster_sim_df.loc[
cluster_sim_df
.groupby(["wcc_id"])
.idxmax()
.sim
]
cluster_sim_df
wcc_id | louvain_id | sim | |
---|---|---|---|
5 | 0 | 5 | 0.297030 |
12 | 1 | 1 | 0.354839 |
25 | 2 | 3 | 0.307692 |
37 | 4 | 4 | 0.317073 |
54 | 5 | 10 | 0.272727 |
... | ... | ... | ... |
693 | 215 | 0 | 0.076923 |
713 | 223 | 9 | 0.125000 |
717 | 226 | 2 | 0.071429 |
729 | 228 | 3 | 0.125000 |
743 | 230 | 6 | 0.030303 |
68 rows × 3 columns
For example, community 5 matches with 20 different weak components.
cluster_sim_df.louvain_id.value_counts()
louvain_id
5 20
7 10
4 10
2 7
6 6
8 4
1 3
3 3
0 2
9 2
10 1
Name: count, dtype: int64
cluster_sim_df[cluster_sim_df.louvain_id == 5]
wcc_id | louvain_id | sim | |
---|---|---|---|
5 | 0 | 5 | 0.297030 |
126 | 21 | 5 | 0.029851 |
192 | 40 | 5 | 0.029851 |
225 | 49 | 5 | 0.029851 |
269 | 72 | 5 | 0.014925 |
280 | 84 | 5 | 0.014925 |
313 | 98 | 5 | 0.014925 |
335 | 103 | 5 | 0.014925 |
401 | 114 | 5 | 0.014925 |
423 | 126 | 5 | 0.014925 |
489 | 144 | 5 | 0.014925 |
511 | 147 | 5 | 0.014925 |
544 | 153 | 5 | 0.014925 |
599 | 173 | 5 | 0.014925 |
610 | 195 | 5 | 0.014925 |
632 | 197 | 5 | 0.014925 |
643 | 199 | 5 | 0.014925 |
654 | 201 | 5 | 0.014925 |
676 | 209 | 5 | 0.014925 |
687 | 214 | 5 | 0.014925 |
Set Comparison#
Let’s select a weak component and retrieve its NN community to compare.
## comp_wcc_id = largest_wcc_id
comp_wcc_id = compnet_wcc_df.loc[
compnet_wcc_df.label == "Australia",
"wcc_id"
].item()
comp_comm_id = cluster_sim_df.loc[
cluster_sim_df.wcc_id == comp_wcc_id,
"louvain_id",
].item()
comp_wcc_id, comp_comm_id
(137, 4)
comp_wcc_countries = set(
compnet_wcc_df.loc[
compnet_wcc_df.wcc_id == comp_wcc_id,
"label"
]
)
comp_louvain_countries = set(
compnet_louvain_df.loc[
compnet_louvain_df.louvain_id == comp_comm_id,
"label"
]
)
WCC Exclusive#
pd.Series(
list(comp_wcc_countries - comp_louvain_countries),
name="country",
).sort_values().to_frame()
country |
---|
Community Exclusive#
pd.Series(
list(comp_louvain_countries - comp_wcc_countries),
name="country",
).sort_values().to_frame()
country | |
---|---|
5 | Afghanistan |
26 | Benin |
20 | Bhutan |
14 | Bolivia |
24 | Burundi |
18 | Central African Republic |
4 | Guinea |
1 | Kyrgyzstan |
23 | Mali |
8 | Mozambique |
25 | Niger |
17 | Papua New Guinea |
7 | Rwanda |
21 | Senegal |
2 | Sierra Leone |
13 | Solomon Islands |
0 | Somalia |
15 | Sudan |
12 | Suriname |
10 | Syria |
19 | Tajikistan |
6 | Tanzania |
3 | Togo |
9 | Turkmenistan |
11 | US Minor Outlying Islands |
16 | Western Sahara |
22 | Zambia |
WCC and Community Overlap#
# Intersection (&), not union (|): countries present in both clusters
pd.Series(
sorted(comp_wcc_countries & comp_louvain_countries),
name="country",
).to_frame()
country | |
---|---|
0 | Australia |
1 | Liberia |
2 | Mauritania |
Economic Pressure (PageRank)#
Economic pressure can be measured using PageRank, an iteratively computed metric that converges to a score aggregating the overall incoming competition strength for each country. A country's score increases further when the countries competing with it are themselves under economic pressure.
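For intuition, here is a minimal power-iteration sketch of weighted PageRank on a toy graph. It is not the graph database's implementation: the function `pagerank_sketch`, the damping factor of 0.85, the uniform handling of dangling nodes, and the reading of an edge `(src, dst, w)` as "`src` puts pressure of strength `w` on `dst`" are all assumptions made for illustration.

```python
def pagerank_sketch(nodes, edges, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration on (src, dst, weight) edges."""
    n = len(nodes)
    out_weight = {u: 0.0 for u in nodes}
    for src, _dst, w in edges:
        out_weight[src] += w

    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly
        dangling = sum(rank[u] for u in nodes if out_weight[u] == 0.0)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {u: base for u in nodes}
        for src, dst, w in edges:
            # Each node passes its rank along its edges, split by weight
            new_rank[dst] += damping * rank[src] * w / out_weight[src]
        rank = new_rank
    return rank


# Hypothetical toy graph: A and B both pressure C, and C pressures A
nodes = ["A", "B", "C"]
edges = [("A", "C", 1.0), ("B", "C", 1.0), ("C", "A", 0.5)]
rank = pagerank_sketch(nodes, edges)
```

In this toy graph C ends up with the highest score because it receives pressure from two countries, A comes second because its only competitor is itself highly pressured, and B, which nobody competes with, keeps only the baseline score.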
conn.execute(
"""
ALTER TABLE Country DROP IF EXISTS pagerank;
ALTER TABLE Country ADD IF NOT EXISTS pagerank DOUBLE;
CALL page_rank("compnet", maxIterations := 100)
WITH node, rank
SET node.pagerank = rank
"""
)
Most Pressured Countries#
most_pressured_df = conn.execute(
"""
MATCH (c:Country)
WHERE c.country_name_short <> "Undeclared"
RETURN
c.node_id AS node_id,
c.country_name_short AS label,
c.pagerank AS pagerank
ORDER BY c.pagerank DESC
LIMIT 25
"""
).get_as_df()
fig, ax = plt.subplots(figsize=(5, 8))
(
most_pressured_df.iloc[::-1]
.plot.barh(x="label", y="pagerank", ax=ax)
)
plt.ylabel(None)
plt.legend([])
plt.show()
Least Pressured Countries#
least_pressured_df = conn.execute(
"""
MATCH (c:Country)
WHERE c.country_name_short <> "Undeclared"
RETURN
c.node_id AS node_id,
c.country_name_short AS label,
c.pagerank AS pagerank
ORDER BY c.pagerank ASC
LIMIT 25
"""
).get_as_df()
fig, ax = plt.subplots(figsize=(5, 8))
(
least_pressured_df.iloc[::-1]
.plot.barh(x="label", y="pagerank", ax=ax)
)
plt.ylabel(None)
plt.legend([])
plt.show()
Closing Remarks#
Economies are complex systems, and the complex relations between markets can be captured using a graph. Determining which nodes and relationships to model is crucial to interpretation—our graph focused on competition relationships, and so our metrics and partition approaches illustrated this.
Network analysis tools are usually not as exotic as they are made out to be. Useful graph data science is usually not that complex, particularly now that tooling is widely available, but it can certainly be extremely insightful, especially when the graph is correctly modeled.
This is only a small introduction to the topic, using world economy and trade, which I have been particularly interested in, as an example.
The economy, and the world overall, are suffering. Graphs will help us find solutions to complex problems, but this requires a commitment to always ask yourself: could I do this without a graph? When the answer is yes, you should rethink your approach. If you’re not looking at complex relations, you’re just doing more of the same.
Bottom line, use graphs and use them correctly.