Economic Competition Networks

Summary
#

In this video, we reproduce the approach that predicts Survivor winners and apply it to Economic Competition Networks to better understand world trade and economic leaders. We build a country-to-country competition network based on the Export Similarity Index (ESI), and we use several techniques from network science, such as PageRank, community detection, weak component analysis, and the recent common out-neighbor (CON) score. With these, we study how countries compete with each other within the world economy, identifying dominating or leading economies, as well as their weaker or smaller counterparts.

Dataset
#

We use The Atlas of Economic Complexity dataset, which is summarized in the following table. We only provide a top-level overview of the data here. For a detailed description, click the Download button in each table row of the link above, which opens a popup with detailed information on the fields of each CSV file.

	Title	Description
	Complexity Rankings & Growth Projections	Economic Complexity Index (ECI) and partial growth projections for world economies from 1995 to 2023.
	Country Trade by Product	Exports and imports, per country and product, over the years. Different files provide a different product category granularity based on the number of HS92, HS12 or SITC digits. Different files are also provided for services, using a non-standard classification internal to Growth Lab that also provides different digit-based granularities.
	Total Trade by Country	Total exports and imports, per country, over the years. Different files provide data about products and services. How big is the economy for a country? How did it progress over the last 28 years?
	Total Trade by Product	Total exports and imports, per product, over the years. Again, this is provided at different product granularities based on HS92, HS12 or SITC digits. Different files are also provided for services, using a non-standard classification internal to Growth Lab that also provides different digit-based granularities. How big is the market for a product? How did it progress over the last 28 years?
	Country Trade by Partner	Bilateral exports and imports between pairs of countries, over the years.
✅	Country Trade by Partner and Product	Bilateral exports and imports between pairs of countries, for a given product, over the years. This is provided at 6-digit granularity based on HS92, HS12 or SITC digits. This is partitioned into multiple files in blocks of 10 years (or 5 years only for 1995-1999). A granularity of 4 digits would be enough to distinguish between main product types (e.g., beef vs pork vs poultry, fresh vs frozen; gasoline engines vs diesel engines). With 6 digits we get a lot more detail (e.g., carcasses and half-carcasses of bovine animals, fresh or chilled; engines for aircraft). We use the HS92 data with 6 digits, which is the only one available, but also ideal to capture trade competition between countries, as true competition is only uncovered at a smaller scale. We only look at the 2020-2023 period, for recency, aggregating totals for those four years.
✅	Country Classification	Country metadata.
	Regional Classification	Regional classification for countries—continent it belongs to, political region (e.g., European Union), subregion (e.g., Central America, Western Africa), trade regions (e.g., NAFTA, OPEC), etc.
	HS12 Product Classification	Product metadata according to HS12 codes.
✅	HS92 Product Classification	Product metadata according to HS92 codes. We use this to inspect products traded by salient countries during the analysis.
	Services Product Classification	Services metadata according to a non-standard classification internal to Growth Lab.
	SITC Product Classification	Product metadata according to SITC codes.
	Product Space Related Edges	HS92 4-digit codes for source and target products in the same space (e.g., women’s coats ⇄ sweaters).
	Product Space Layout	HS92 4-digit codes for products along with their 2D embedding, where close products are co-exported by countries.

Here are the citations for the datasets that we use:

Country Trade by Partner and Product:

The Growth Lab at Harvard University, 2025, “International Trade Data (HS92)”, https://doi.org/10.7910/DVN/T4CHWJ, Harvard Dataverse

Country Classification & HS92 Product Classification:

The Growth Lab at Harvard University, 2025, “Classifications Data”, https://doi.org/10.7910/DVN/3BAL1O, Harvard Dataverse

Graph Schema
#

From the three CSV files marked above as being used, we produce the following node and relationship labels:

Nodes
- Country
  - node_id – globally unique node identifier – INT64
  - Properties from all Country Classification columns
- Product
  - node_id – globally unique node identifier – INT64
  - Properties from all HS92 Product Classification columns
Relationships
- (:Country)-[:CompetesWith]->(:Country)
  - ESI – Export Similarity Index – DOUBLE
- (:Country)-[:Exports]->(:Product)
  - amount_usd – exports dollar amount (2020-2023) – INT128
- (:Country)<-[:Imports]-(:Product)
  - amount_usd – imports dollar amount (2020-2023) – INT128

Take a look at the following diagram, where rectangles represent the raw CSV files, with dashed arrows illustrating the data source, and circles represent nodes, with solid arrows representing relationships.
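
The source does not spell out how the ESI stored on CompetesWith is computed. A common definition is the Finger-Kreinin index, which sums, over all products, the smaller of the two countries' export shares. The following is a minimal sketch under that assumption; the function name export_similarity and the toy export baskets are hypothetical, not taken from the pipeline.

```python
def export_similarity(
    exports_a: dict[str, float],
    exports_b: dict[str, float],
) -> float:
    """Finger-Kreinin ESI between two export baskets (USD per product)."""
    total_a = sum(exports_a.values())
    total_b = sum(exports_b.values())

    # Assumption: a country with no exports has no similarity
    if total_a == 0 or total_b == 0:
        return 0.0

    # Sum the overlapping export share of each product in both baskets
    return sum(
        min(exports_a[p] / total_a, exports_b[p] / total_b)
        for p in exports_a.keys() & exports_b.keys()
    )


# Hypothetical baskets: A splits exports evenly, B only exports crude
basket_a = {"crude": 50.0, "gold": 50.0}
basket_b = {"crude": 100.0}

esi = export_similarity(basket_a, basket_b)
```

Under this definition, identical baskets score 1 and baskets with no product in common score 0.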

Jupyter Notebook
#

The following sections are an adaptation of the Jupyter Notebook that we created to analyze the Economic Competition Network.

Setup
#

ETL
#

For ETL, we directly call the appropriate dlctl commands for:

1. Ingesting the dataset
2. Transforming using SQL on top of DuckLake
3. Exporting from the data lakehouse into Parquet
4. Loading the graph into Kuzu
5. Computing general analytics scores

Be sure to uncomment the cell below and run it once.

!dlctl ingest dataset -t atlas \
    "The Atlas of Economic Complexity"
!dlctl transform -m +marts.graphs.econ_comp
!dlctl export dataset graphs econ_comp
!dlctl graph load econ_comp
!dlctl graph compute con-score econ_comp Country CompetesWith

Imports
#

from pathlib import Path
from string import Template
from textwrap import dedent
from typing import Any, Literal, Optional

import kuzu
import matplotlib.pyplot as plt
import networkx as nx
import numpy as np
import pandas as pd
from scipy.special import expit

import graph.visualization as vis
from shared.settings import LOCAL_DIR, env

Globals
#

We set up access to the appropriate Kuzu path, based on the shared .env configuration, ensuring the graph exists before running the notebook. Once set up, conn will be used to query the graph directly throughout this notebook.

db_path = Path(LOCAL_DIR) / env.str("ECON_COMP_GRAPH_DB")
assert db_path.exists(), \
    "You need to create the graph DB using dlctl first"

db = kuzu.Database(db_path)
conn = kuzu.Connection(db)

Constants
#

In order to ensure color consistency for our plots, we extract the color palette from matplotlib into MPL_PALETTE.

MPL_PALETTE = (
    plt.rcParams["axes.prop_cycle"]
    .by_key()["color"]
)

We also map a display attribute for each of our node labels, Country and Product. We’ll use the short names for both when plotting graph visualizations or related charts.

LABEL_PROPS = {
    "Country": "country_name_short",
    "Product": "product_name_short",
}

Functions
#

We create a few reusable functions, where we run Kuzu queries. In a few cases, it was helpful to debug the query with parameters (e.g., using Kuzu Explorer), so we created a helper function for this (note that this doesn’t support string parameters, as we didn’t need them).

def print_query(query: str, params: dict[str, Any]):
    dbg_query = dedent(query).strip()
    dbg_query = Template(dbg_query)
    dbg_query = dbg_query.substitute(params)
    print(dbg_query)
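
To illustrate what this helper prints, here is a small self-contained sketch of the same string.Template substitution, returning the string instead of printing it. The name render_query and the example query are hypothetical. Numbers and lists render as valid literals, while a string value would be pasted without quotes, which is the limitation noted above.

```python
from string import Template
from textwrap import dedent
from typing import Any


def render_query(query: str, params: dict[str, Any]) -> str:
    # Same steps as print_query, but returning the result
    return Template(dedent(query).strip()).substitute(params)


rendered = render_query(
    """
    MATCH (c:Country)
    WHERE c.node_id IN $node_ids
        AND c.louvain_id = $prop_value
    RETURN c
    """,
    dict(node_ids=[206, 55], prop_value=5),
)
```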

We’ll also cluster nodes using different strategies and compare groups, so we implement a basic Jaccard similarity function.

def jaccard_sim(a: pd.Series, b: pd.Series) -> float:
    a = set(a)
    b = set(b)
    return len(a & b) / len(a | b)
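
As a quick sanity check, the sketch below applies the same formula to two toy lists using only built-in types. It also guards against two empty inputs, where the union is empty and the division above would fail; returning 0.0 in that case is our assumption, not something the notebook defines.

```python
from collections.abc import Iterable


def jaccard(a: Iterable, b: Iterable) -> float:
    a, b = set(a), set(b)
    union = a | b
    # Assumption: two empty sets are treated as having no overlap
    return len(a & b) / len(union) if union else 0.0


# 2 shared items out of 4 distinct ones
sim = jaccard(["oil", "gold", "cars"], ["gold", "cars", "wheat"])
```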

We might want to look at the top x% of traded products, based on USD amount. The following function will help filter this.

def top_frac(df: pd.DataFrame, col: str, frac: float = 0.25):
    mask = (df[col] / df[col].sum()).cumsum() <= frac
    return df[mask]
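
Note that the mask is computed in row order, so it relies on the rows already being sorted by descending amount, as they are in the query results used later. The following standard-library sketch mirrors the same cut-off on a toy list; top_share is a hypothetical name.

```python
from itertools import accumulate


def top_share(amounts: list[float], frac: float = 0.25) -> list[float]:
    """Keep the leading amounts whose cumulative share is <= frac."""
    total = sum(amounts)
    cum_shares = [c / total for c in accumulate(amounts)]
    return [a for a, s in zip(amounts, cum_shares) if s <= frac]


# Already sorted in descending order, like the query results
amounts = [50.0, 30.0, 15.0, 5.0]

top = top_share(amounts, frac=0.9)
```

One consequence of this design is that, when the largest item alone exceeds the requested fraction, nothing is returned.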

Analysis
#

We focus on the CompetesWith projection, a relationship given by the Export Similarity Index (ESI). Our graph analysis includes:

Dynamic competition analysis.
1. Dominating and weaker economy identification, based on the CON score for each country.
2. Trade basket overlap analysis for top and bottom economies.
Competition network analysis.
1. Community analysis, including community mapping, top traded product identification, and trade alignment study (self-sufficiency, external competitiveness).
2. Weak component analysis, following a similar approach to the community analysis—weak components widen community reach.
3. Community and weak component comparison.
4. Economic pressure analysis.

Dynamic Competition Analysis
#

Top 10 Dominating Economies
#

These are highly spread economies, able to compete with several other countries, i.e., with a high number of common out-neighbors (CON).

dom_econ_df = conn.execute(
    """
    MATCH (c:Country)
    RETURN
        c,
        c.node_id AS node_id,
        c.country_name_short AS country
    ORDER BY c.con_score DESC
    LIMIT 10
    """
).get_as_df()[["node_id", "country"]]

dom_econ_df.index = pd.RangeIndex(
    start=1,
    stop=len(dom_econ_df) + 1,
    name="rank"
)

dom_econ_df

	node_id	country
rank
1	206	United States of America
2	55	Canada
3	34	United Arab Emirates
4	107	Netherlands
5	132	United Kingdom
6	175	Belgium
7	134	Italy
8	223	Spain
9	131	France
10	145	Thailand
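
The con_score property used for this ranking is computed by dlctl, and its implementation is not shown here. As a rough sketch, assuming the definition where a node's score sums, over every other node, the number of out-neighbors they share, it could be computed as follows. The adjacency dictionary and node names are made up for illustration.

```python
def con_scores(out_neighbors: dict[str, set[str]]) -> dict[str, int]:
    """CON score per node, from a node -> out-neighbors mapping."""
    scores = {}

    for u, out_u in out_neighbors.items():
        # Count out-neighbors shared with every other node
        scores[u] = sum(
            len(out_u & out_v)
            for v, out_v in out_neighbors.items()
            if v != u
        )

    return scores


# Toy directed graph: A and B share two out-neighbors, E shares one
toy = {
    "A": {"C", "D"},
    "B": {"C", "D"},
    "E": {"C"},
    "C": set(),
    "D": set(),
}

scores = con_scores(toy)
```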

Top 3 Exports
#

Looking at the top exports will help contextualize these economies. We only look at the top 3 products, to keep the visualization clean and readable.

dom_econ_g = conn.execute(
    """
    MATCH (c:Country)
    WITH c
    ORDER BY c.con_score DESC
    LIMIT 10

    MATCH (c)-[e:Exports]->(p:Product)
    MATCH (c2:Country)-[:Exports]->(p)

    WITH c, e, p, count(DISTINCT c2) AS exporters
    WHERE exporters > 1
    WITH c, e, p
    ORDER BY c.node_id, e.amount_usd DESC
    SKIP 0

    WITH c, collect({p: p, e: e}) AS export_list

    UNWIND list_slice(export_list, 0, 3) AS r
    RETURN c, r.e, r.p
    ORDER BY c.node_id, r.p.node_id
    """
).get_as_networkx()

vis.set_labels(dom_econ_g, LABEL_PROPS)
vis.plot(dom_econ_g, scale=1.25, seed=3)
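
The query above keeps, per country, the three largest exports among products that have more than one exporter, by sorting, collecting, and slicing the list. A plain-Python sketch of the same logic, on made-up rows and with the hypothetical name top_exports, may help to read it:

```python
from collections import defaultdict


def top_exports(
    rows: list[tuple[str, str, float]],
    n: int = 3,
) -> dict[str, list[str]]:
    """rows are (country, product, amount_usd) tuples."""
    exporters = defaultdict(set)
    for country, product, _ in rows:
        exporters[product].add(country)

    per_country = defaultdict(list)
    # Visit rows by amount, largest first, grouping by country
    for country, product, amount in sorted(
        rows, key=lambda r: r[2], reverse=True
    ):
        # Skip products with a single exporter (no competition)
        if len(exporters[product]) > 1:
            per_country[country].append(product)

    return {c: ps[:n] for c, ps in per_country.items()}


rows = [
    ("X", "oil", 90.0), ("X", "gold", 70.0),
    ("X", "cars", 50.0), ("X", "wheat", 30.0),
    ("X", "fish", 95.0),  # only X exports fish
    ("Y", "oil", 10.0), ("Y", "gold", 20.0),
    ("Y", "cars", 5.0), ("Y", "wheat", 1.0),
]

result = top_exports(rows)
```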

Bottom 10 Weaker Economies
#

These are smaller or weaker economies, in the sense that they have a lower competition power. We also find the Undeclared special country node at rank 1, showing that only a small number of products are undeclared worldwide.

weak_econ_df = conn.execute(
    """
    MATCH (c:Country)
    RETURN
        c,
        c.node_id AS node_id,
        c.country_name_short AS country
    ORDER BY c.con_score ASC
    LIMIT 10
    """
).get_as_df()[["node_id", "country"]]

weak_econ_df.index = pd.RangeIndex(
    start=1,
    stop=len(weak_econ_df) + 1,
    name="rank"
)

weak_econ_df

	node_id	country
rank
1	193	Undeclared
2	72	Bouvet Island
3	170	Wallis and Futuna
4	106	Norfolk Island
5	167	Saint Pierre and Miquelon
6	216	Niue
7	66	South Georgia and South Sandwich Islds.
8	121	Northern Mariana Islands
9	161	Heard and McDonald Islands
10	100	Western Sahara

Top 3 Exports
#

If we look at the top 3 exports for each competing country at the bottom of the ranking according to CON scores, we find, as expected, that these are more disconnected economies, mostly focusing on raw materials, or components and machinery.

weak_econ_g = conn.execute(
    """
    MATCH (c:Country)
    WITH c
    ORDER BY c.con_score ASC
    LIMIT 10

    MATCH (c)-[e:Exports]->(p:Product)
    MATCH (c2:Country)-[:Exports]->(p)

    WITH c, e, p, count(DISTINCT c2) AS exporters
    WHERE exporters > 1
    WITH c, e, p
    ORDER BY c.node_id, e.amount_usd DESC
    SKIP 0

    WITH c, collect({p: p, e: e}) AS export_list

    UNWIND list_slice(export_list, 0, 3) AS r
    RETURN c, r.e, r.p
    ORDER BY c.node_id, r.p.node_id
    """
).get_as_networkx()

vis.set_labels(weak_econ_g, LABEL_PROPS)
vis.plot(weak_econ_g, scale=1.25, seed=3)

Dominating vs Weaker Economies
#

Do dominating economies compete in the same markets as weaker economies?
- If so, maybe that’s why those weaker economies are being pushed to the bottom. ☑️
- If not, maybe the products exported by those weaker economies are not the most competitive.

Here, we find that, due to the small export diversity, weaker economies are being crushed by dominating economies. Their position of vulnerability comes mostly from geographical isolation and limited area, leading to a lower amount of competition opportunities, where any competitor becomes a risk to the economy.

Below, country node classes are visually translated to a colored node border and label text. We assign two classes, for the top and bottom 10 economies, with top economies in the center, and the products and bottom economies in the surrounding area. This forms a star layout, where each arm is a weaker economy or a small cluster of weaker economies.

We look at the top 3 most exported products in weaker economies, but relaxing the filter on number of exported products for the weaker economies and looking at more than 3 exported products will reproduce the displayed behavior, with dominating economies still competing for the same products. This doesn’t necessarily mean that both dominating and weaker economies produce the same products, as some of them can simply be re-exported.

dom_vs_weak_econ_g = conn.execute(
    """
    MATCH (wea)-[we:Exports]->(p:Product)
    MATCH (dom)-[de:Exports]->(p)
    WHERE dom.node_id IN $dominating_node_ids
        AND wea.node_id IN $weaker_node_ids

    WITH wea, we, p, count(DISTINCT dom) AS dom_competitors
    WHERE dom_competitors > 0

    WITH wea, we, p
    ORDER BY wea.node_id, we.amount_usd DESC
    SKIP 0

    WITH wea, collect({p: p, e: we}) AS export_list
    UNWIND list_slice(export_list, 0, 3) AS r

    WITH wea, r.p.node_id AS prod_node_id
    MATCH (wea)-[we:Exports]
        ->(prod:Product { node_id: prod_node_id })
    MATCH (dom:Country)-[de:Exports]->(prod)
    WHERE dom.node_id IN $dominating_node_ids
    RETURN wea, we, prod, de, dom
    ORDER BY wea.node_id, prod.node_id, dom.node_id
    """,
    dict(
        dominating_node_ids=dom_econ_df.node_id.to_list(),
        weaker_node_ids=weak_econ_df.node_id.to_list(),
    ),
).get_as_networkx()

node_classes = dict(
    dominating=dom_econ_df.node_id.to_list(),
    weaker=weak_econ_df.node_id.to_list(),
)

# This adjusts the visualization edge weights
# to improve readability
for u, v, data in dom_vs_weak_econ_g.edges(data=True):
    if (
        dom_vs_weak_econ_g.nodes[u]["node_id"]
          in node_classes["dominating"]
        and dom_vs_weak_econ_g.nodes[v]["_label"]
          == "Product"
    ):
        data["vis_weight"] = 1e-5

    if (
        dom_vs_weak_econ_g.nodes[u]["node_id"]
          in node_classes["weaker"]
        and dom_vs_weak_econ_g.nodes[v]["_label"]
          == "Product"
    ):
        data["vis_weight"] = 1e-3

vis.set_labels(dom_vs_weak_econ_g, LABEL_PROPS)

vis.plot(
    dom_vs_weak_econ_g,
    node_classes=node_classes,
    scale=1.25,
    seed=5,
)

Competition Network
#

Let’s look at the competition network projection for Country nodes and CompetesWith edges. We first install the algo extension for Kuzu and create the compnet projection and NetworkX graph for it.

try:
    conn.execute(
        """
        INSTALL algo;
        LOAD algo;
        """
    )
except Exception as e:
    print(e)

try:
    conn.execute(
        """
        CALL drop_projected_graph("compnet")
        """
    )
except Exception as e:
    print(e)

conn.execute(
    """
    CALL project_graph(
        "compnet",
        {"Country": "n.country_name_short <> 'Undeclared'"},
        {"CompetesWith": "true"}
    )
    """
)

compnet_g = conn.execute(
    """
    MATCH (a:Country)-[cw:CompetesWith]->(b:Country)
    WHERE a.country_name_short <> "Undeclared"
        AND b.country_name_short <> "Undeclared"
    RETURN a, cw, b
    """,
).get_as_networkx()

Inspection Functions
#

The following functions will be useful to plot the cluster and analyze the top exports for a specific cluster ID property:

def plot_cluster(
    prop_name: str,
    prop_value: int,
    kind: Literal["graph", "map"] = "graph",
):
    match kind:
        case "graph":
            compnet_cluster_g = conn.execute(
                f"""
                MATCH (a:Country)-[cw:CompetesWith]->
                    (b:Country)
                WHERE a.country_name_short <> "Undeclared"
                    AND b.country_name_short <> "Undeclared"
                    AND a.`{prop_name}` = $prop_value
                    AND b.`{prop_name}` = $prop_value
                RETURN a, cw, b
                """,
                dict(prop_value=prop_value),
            ).get_as_networkx()

            vis.set_labels(compnet_cluster_g, LABEL_PROPS)
            vis.plot(compnet_cluster_g)

        case "map":
            compnet_cluster_df = conn.execute(
                f"""
                MATCH (c:Country)
                WHERE c.country_name_short <> "Undeclared"
                    AND c.`{prop_name}` = $prop_value
                RETURN
                    c.country_iso3_code AS iso3_code,
                    c.`{prop_name}` AS `{prop_name}`
                """,
                dict(prop_value=prop_value),
            ).get_as_df()

            vis.plot_map(
              compnet_cluster_df,
              code_col="iso3_code",
              class_col=prop_name,
            )

def trade_per_cluster(
    prop_name: str,
    prop_value: int,
    method: Literal["imports", "exports"],
    n: Optional[int] = None,
    debug: bool = False,
) -> pd.DataFrame:
    match method:
        case "exports":
            match_stmt = """
                MATCH (c:Country)-[ie:Exports]->(p:Product)
            """
        case "imports":
            match_stmt = """
                MATCH (c:Country)<-[ie:Imports]-(p:Product)
            """
        case _:
            raise ValueError(
                f"method not supported: {method}"
            )

    if n is None:
        limit_stmt = ""
        limit_param = dict()
    else:
        limit_stmt = "LIMIT $n"
        limit_param = dict(n=n)

    query = f"""
        {match_stmt}
        WHERE c.country_name_short <> "Undeclared"
            AND c.`{prop_name}` = $prop_value
        RETURN
            p.product_name_short AS product,
            sum(ie.amount_usd) AS total_amount_usd
        ORDER BY total_amount_usd DESC
        {limit_stmt}
    """

    params = dict(prop_value=prop_value) | limit_param

    if debug:
        print_query(query, params)

    products_df = conn.execute(query, params).get_as_df()

    return products_df

Partner clusters are clusters that import what a cluster is exporting. These are likely to match all clusters due to high connectivity in the world economy, but it might not always be the case, depending on the clustering criteria.

def partner_clusters(
    prop_name: str,
    prop_value: int,
    include_self: bool = True,
    debug: bool = False,
) -> list[int]:
    include_self_stmt = (
        "" if include_self
        else f"AND c2.`{prop_name}` <> $prop_value"
    )

    query = f"""
        MATCH (c:Country)-[:Exports]->(p:Product)
        MATCH (c2:Country)<-[:Imports]-(p)
        WHERE c.country_name_short <> "Undeclared"
            AND c.`{prop_name}` = $prop_value
            AND c2.`{prop_name}` IS NOT NULL
            {include_self_stmt}
        RETURN DISTINCT c2.`{prop_name}` AS cid
    """

    params = dict(prop_value=prop_value)

    if debug:
        print_query(query, params)

    result = conn.execute(query, params)
    partner_cluster_ids = sorted(
      c[0] for c in result.get_all()
    )

    return partner_cluster_ids

The following functions will help us compute the intra-cluster and inter-cluster trade alignments, i.e., self-sufficiency and external competitiveness, based on cluster-aggregated market share.

def trade_alignment_by_cluster(
    prop_name: str,
    prop_value: int,
    method: Literal["intra", "inter"],
) -> pd.DataFrame:
    exports_df = trade_per_cluster(
        prop_name,
        prop_value,
        method="exports",
    )

    match method:
        case "intra":
            imports_df = trade_per_cluster(
                prop_name,
                prop_value,
                method="imports",
            )

        case "inter":
            imports_df = []

            for partner_cid in partner_clusters(
                prop_name,
                prop_value,
            ):
                partner_imports_df = trade_per_cluster(
                    prop_name,
                    partner_cid,
                    method="imports",
                )
                imports_df.append(partner_imports_df)

            imports_df = (
                pd.concat(imports_df)
                .groupby(["product"])
                .sum()
            )
        case _:
            raise ValueError(
              f"method not supported: {method}"
            )

    trade_df = exports_df.merge(
        imports_df,
        on="product",
        how="right" if method == "intra" else "left",
        suffixes=("_exports", "_imports"),
    ).fillna(0)

    trade_df["sdr"] = (
        trade_df.total_amount_usd_exports
        / trade_df.total_amount_usd_imports
    )

    trade_df = trade_df.sort_values("sdr", ascending=False)

    return trade_df

As a score for measuring either self-sufficiency or external competitiveness, we use the weighted average of the log Supply-Demand Ratio (SDR), where weights are the total export amounts (USD) of each product for a given cluster.

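def global_sdr_score(
    trade_df: pd.DataFrame,
    eps: float = 1e-9,
) -> float:
    # Drop products with no imports (infinite SDR); copy so that
    # adding a column doesn't write into a slice of trade_df
    df = trade_df[~np.isinf(trade_df.sdr)].copy()

    # Clip to eps so that products with zero exports get a
    # finite log (their weight is zero anyway)
    df["log_sdr"] = np.log(np.clip(df.sdr, eps, None))

    weights = df.total_amount_usd_exports
    score = expit(
      (weights * df.log_sdr).sum() / weights.sum()
    )

    return score.item()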

Competing Communities
#

Are there any communities representing closely tied competitor clusters?
- If so, maybe there are specific products per cluster? ☑️
- If not, we have a global economy that is fairly homogeneous and diverse.

For each property computed with the algo extension, we’ll alter the corresponding node table, recreating the property each time.

conn.execute(
    """
    ALTER TABLE Country DROP IF EXISTS louvain_id;
    ALTER TABLE Country ADD IF NOT EXISTS louvain_id INT64;

    CALL louvain("compnet")
    WITH node, louvain_id
    SET node.louvain_id = louvain_id;
    """
)

The Louvain method partitions the network by greedily optimizing modularity, which essentially means it looks for a partition into communities that are dense subgraphs, i.e., subgraphs where connections among members are more frequent than connections to outside nodes. As a heuristic, it finds a good partition, not necessarily the optimal one.
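
To make the notion of modularity concrete, here is a minimal standard-library sketch that scores a given partition of a small undirected, unweighted graph. It only evaluates the objective; it is not the Louvain algorithm itself, nor necessarily the exact variant implemented by Kuzu's algo extension. The graph (two triangles joined by one edge) and the function name are illustrative.

```python
def modularity(
    edges: list[tuple[str, str]],
    community: dict[str, int],
) -> float:
    """Q = sum over communities of L_c / m - (d_c / 2m)^2."""
    m = len(edges)
    internal = {}  # L_c: edges with both endpoints in c
    degree = {}    # d_c: sum of degrees of nodes in c

    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1

    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree.items()
    )


edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),  # first triangle
    ("d", "e"), ("e", "f"), ("d", "f"),  # second triangle
    ("c", "d"),                          # bridge
]

split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
merged = {n: 0 for n in split}

q_split = modularity(edges, split)
q_merged = modularity(edges, merged)
```

Splitting the graph into its two triangles scores higher than keeping everything in a single community, which is the kind of improvement a modularity-based method searches for.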

compnet_louvain_df = conn.execute(
    """
    MATCH (c:Country)
    WHERE c.country_name_short <> "Undeclared"
    RETURN
        c.node_id AS node_id,
        c.country_name_short AS label,
        c.louvain_id AS louvain_id
    """
).get_as_df()

node_classes = {
    k: g.node_id.to_list()
    for k, g in compnet_louvain_df.groupby("louvain_id")
}

vis.set_labels(compnet_g, LABEL_PROPS)

vis.plot(
    compnet_g,
    node_classes=node_classes,
    hide_edges=True,
)

In complex networks, it is not uncommon for a huge community to emerge, along with a low number of moderately large communities, and then a lot of smaller communities. This behavior is not particularly pronounced here, but it’s still visible. Below, we inspect the community size distribution.

comm_sizes_df = (
    compnet_louvain_df[["louvain_id", "node_id"]]
    .groupby("louvain_id")
    .count()
    .rename(columns=dict(node_id="num_nodes"))
)

comm_sizes_df = comm_sizes_df.reindex(
    comm_sizes_df.num_nodes.sort_values(ascending=False).index
)

comm_sizes_df

	num_nodes
louvain_id
5	67
6	33
4	30
8	26
7	19
1	14
2	14
0	13
3	8
9	7
10	3

fig, ax = plt.subplots(figsize=(18, 3))
comm_sizes_df.plot.bar(xlabel="Community ID", rot=0, ax=ax)
plt.legend(["No. Nodes"])
plt.show()

Let’s also take a look at the members of each community, from largest to smallest.

for louvain_id in comm_sizes_df.index:
    display(f"LOUVAIN ID: {louvain_id}")
    display(
        compnet_louvain_df[
            compnet_louvain_df.louvain_id == louvain_id
        ]
        .drop(columns="louvain_id")
        .sort_values("label")
    )

'LOUVAIN ID: 5'

	node_id	label
115	116	Albania
205	207	Andorra
114	115	Anguilla
173	174	Austria
126	127	Belarus
...	...	...
49	50	Tunisia
232	234	Turkiye
123	124	Turks and Caicos Islands
131	132	United Kingdom
204	206	United States of America

67 rows × 2 columns

'LOUVAIN ID: 6'

	node_id	label
209	211	Algeria
171	172	Angola
51	52	Aruba
11	12	Azerbaijan
177	178	Cameroon
54	55	Canada
154	155	Chad
55	56	Colombia
15	16	Democratic Republic of the Congo
220	222	Ecuador
57	58	Egypt
158	159	Equatorial Guinea
100	101	Fiji
75	76	Gabon
223	225	Greenland
211	213	Guyana
17	18	Iran
89	90	Iraq
184	185	Kazakhstan
152	153	Kuwait
135	136	Libya
138	139	Nigeria
139	140	Norway
26	27	Oman
219	221	Republic of the Congo
63	64	Russia
229	231	Sao Tome and Principe
64	65	Saudi Arabia
28	29	South Sudan
7	8	Timor-Leste
31	32	Trinidad and Tobago
112	113	Venezuela
67	68	Yemen

'LOUVAIN ID: 4'

	node_id	label
170	171	Afghanistan
206	208	Australia
12	13	Benin
24	25	Bhutan
207	209	Bolivia
52	53	Burundi
116	117	Central African Republic
101	102	Guinea
163	164	Kyrgyzstan
185	186	Liberia
77	78	Mali
137	138	Mauritania
4	5	Mozambique
165	166	Niger
5	6	Papua New Guinea
78	79	Rwanda
141	142	Senegal
202	204	Sierra Leone
142	143	Solomon Islands
109	110	Somalia
187	188	Sudan
230	232	Suriname
29	30	Syria
124	125	Tajikistan
111	112	Tanzania
110	111	Togo
167	168	Turkmenistan
8	9	US Minor Outlying Islands
99	100	Western Sahara
216	218	Zambia

'LOUVAIN ID: 8'

	node_id	label
96	97	Antarctica
125	126	Bahrain
176	177	Botswana
97	98	China
14	15	Cocos (Keeling) Islands
210	212	Guam
160	161	Heard and McDonald Islands
87	88	Hong Kong
162	163	Israel
134	135	Japan
102	103	Lesotho
195	197	Macao
46	47	Malaysia
62	63	Malta
164	165	Namibia
120	121	Northern Mariana Islands
215	217	Philippines
47	48	Pitcairn
233	235	Samoa
48	49	Singapore
65	66	South Georgia and South Sandwich Islds.
76	77	South Korea
113	114	Taiwan
79	80	Vatican City
95	96	Vietnam
169	170	Wallis and Futuna

'LOUVAIN ID: 7'

	node_id	label
10	11	Argentina
127	128	Belize
217	219	Brazil
34	35	Burkina Faso
156	157	Côte d'Ivoire
94	95	Eswatini
129	130	Ethiopia
38	39	Ghana
86	87	Guatemala
117	118	Honduras
151	152	Kenya
91	92	Malawi
108	109	New Zealand
197	199	Nicaragua
93	94	Paraguay
50	51	Uganda
168	169	Uruguay
146	147	Uzbekistan
68	69	Zimbabwe

'LOUVAIN ID: 1'

	node_id	label
69	70	Bangladesh
16	17	Cabo Verde
143	144	El Salvador
1	2	Falkland Islands
39	40	Haiti
59	60	Kiribati
43	44	Maldives
19	20	Mauritius
74	75	Micronesia
107	108	Nauru
122	123	Seychelles
136	137	Sri Lanka
66	67	Tuvalu
191	192	Vanuatu

'LOUVAIN ID: 2'

	node_id	label
172	173	Armenia
218	220	Chile
181	182	Eritrea
2	3	Georgia
150	151	Jordan
41	42	Lebanon
42	43	Moldova
25	26	Mongolia
225	227	North Macedonia
140	141	Panama
92	93	Peru
32	33	South Africa
190	191	Ukraine
33	34	United Arab Emirates

'LOUVAIN ID: 0'

	node_id	label
0	1	American Samoa
80	81	Antigua and Barbuda
13	14	Barbados
82	83	Curaçao
179	180	Cyprus
159	160	Greece
85	86	Grenada
58	59	Jamaica
119	120	Marshall Islands
214	216	Niue
60	61	Saint Lucia
9	10	Saint Vincent and the Grenadines
35	36	The Bahamas

'LOUVAIN ID: 3'

	node_id	label
212	214	Cambodia
56	57	Comoros
3	4	Laos
90	91	Madagascar
226	228	Montenegro
44	45	Myanmar
227	229	Pakistan
186	187	Palau

'LOUVAIN ID: 9'

	node_id	label
132	133	British Indian Ocean Territory
36	37	Cook Islands
182	183	Faroe Islands
155	156	French Southern and Antarctic Lands
222	224	Guinea-Bissau
161	162	Iceland
201	203	Saint Helena, Ascension and Tristan da Cunha

'LOUVAIN ID: 10'

	node_id	label
192	194	Costa Rica
83	84	Dominica
157	158	Dominican Republic

largest_louvain_id = comm_sizes_df.index[0].item()
largest_louvain_id

smallest_louvain_id = comm_sizes_df.index[-1].item()
smallest_louvain_id

Community Subgraphs
#

Community subgraphs illustrate clusters where competition is more prevalent among members than with countries outside of the community. For this graph (our Econ CompNet, or compnet), they are almost always (if not always) complete subgraphs. We can plot any cluster by its ID.

plot_cluster("louvain_id", largest_louvain_id)

Community Mapping
#

Network visualization is not always the best approach to understand your data, and this is a good example. Since we’re working with a complete (or nearly complete) subgraph, looking at relationships is less helpful, while a map of the community is a lot more informative, as we can see below.

plot_cluster("louvain_id", largest_louvain_id, kind="map")

Top Exported Products
#

Is there any export overlap between large and small communities?

largest_comm_top_exported = top_frac(
    trade_per_cluster(
        "louvain_id",
        largest_louvain_id,
        method="exports"
    ),
    "total_amount_usd",
)
largest_comm_top_exported

	product	total_amount_usd
0	Commodities not specified, according to kind	1.506978e+12
1	Oils petroleum, bituminous, distillates	1.382886e+12
2	Medicaments, doses, nes	1.110865e+12
3	Blood	7.572479e+11
4	Petroleum oils, crude	5.977950e+11
5	Automobiles nes, gas turbine powered	5.668137e+11
6	Gold in unwrought forms	5.482768e+11
7	Automobiles, spark ignition, 1500-3000cc	5.231797e+11
8	Transmit-receive apparatus for radio, TV	4.899130e+11
9	Monolithic integrated circuits, digital	4.366876e+11
10	Trade data discrepancies	3.521978e+11
11	Parts of data processing equipment	3.358087e+11
12	Automobiles, spark ignition, 1000-1500cc	2.657018e+11
13	Fixed wing aircraft, >15,000kg	2.641497e+11
14	Motor vehicle parts nes	2.565987e+11
15	Vaccines, human	2.412159e+11
16	Natural gas, liquefied	2.406005e+11
17	Gold, semi-manufactured forms	2.394518e+11

smallest_comm_top_exported = top_frac(
    trade_per_cluster(
        "louvain_id",
        smallest_louvain_id,
        method="exports",
    ),
    "total_amount_usd",
)
smallest_comm_top_exported

	product	total_amount_usd
0	Instruments for medical science, nes	1.032487e+10
1	Medical needles, catheters	8.305035e+09
2	Trade data discrepancies	7.981437e+09

jaccard_sim(
    largest_comm_top_exported["product"],
    smallest_comm_top_exported["product"]
)

0.05

Top Imported Products
#

Is there any import overlap between large and small communities?

largest_comm_top_imported = top_frac(
    trade_per_cluster(
        "louvain_id",
        largest_louvain_id,
        method="imports",
    ),
    "total_amount_usd",
)
largest_comm_top_imported

	product	total_amount_usd
0	Petroleum oils, crude	1.933629e+12
1	Commodities not specified, according to kind	1.411868e+12
2	Oils petroleum, bituminous, distillates	1.197450e+12
3	Transmit-receive apparatus for radio, TV	9.829336e+11
4	Medicaments, doses, nes	8.788548e+11
5	Trade data discrepancies	7.574121e+11
6	Gold in unwrought forms	7.455694e+11
7	Blood	6.453376e+11
8	Automobiles nes, gas turbine powered	5.992672e+11
9	Natural gas, as gas	5.284708e+11
10	Parts of data processing equipment	5.119915e+11
11	Automobiles, spark ignition, 1500-3000cc	4.830173e+11
12	Monolithic integrated circuits, digital	4.750667e+11

smallest_comm_top_imported = top_frac(
    trade_per_cluster(
        "louvain_id",
        smallest_louvain_id,
        method="imports",
    ),
    "total_amount_usd",
)
smallest_comm_top_imported

	product	total_amount_usd
0	Oils petroleum, bituminous, distillates	1.373957e+10
1	Commodities not specified, according to kind	7.539036e+09
2	Transmit-receive apparatus for radio, TV	3.136932e+09
3	Automobiles, spark ignition, 1500-3000cc	2.425662e+09
4	Jewellery of precious metal	2.326848e+09
5	Instruments for medical science, nes	2.297871e+09
6	Monolithic integrated circuits, digital	2.243517e+09
7	Maize except seed corn	2.138190e+09
8	Natural gas, liquefied	2.103650e+09
9	Petroleum oils, crude	2.089410e+09
10	Propane, liquefied	2.061530e+09

jaccard_sim(
    largest_comm_top_imported["product"],
    smallest_comm_top_imported["product"],
)

0.3333333333333333
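
As a sanity check, the value above can be reproduced by hand from the two tables. The sketch below assumes that `jaccard_sim` computes the standard Jaccard index (intersection size over union size). `jaccard_sketch` is a hypothetical stand-in written for this example, not the notebook's own helper.

```python
def jaccard_sketch(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| for two iterables of labels."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Product names copied from the two import tables above.
largest_imports = [
    "Petroleum oils, crude",
    "Commodities not specified, according to kind",
    "Oils petroleum, bituminous, distillates",
    "Transmit-receive apparatus for radio, TV",
    "Medicaments, doses, nes",
    "Trade data discrepancies",
    "Gold in unwrought forms",
    "Blood",
    "Automobiles nes, gas turbine powered",
    "Natural gas, as gas",
    "Parts of data processing equipment",
    "Automobiles, spark ignition, 1500-3000cc",
    "Monolithic integrated circuits, digital",
]

smallest_imports = [
    "Oils petroleum, bituminous, distillates",
    "Commodities not specified, according to kind",
    "Transmit-receive apparatus for radio, TV",
    "Automobiles, spark ignition, 1500-3000cc",
    "Jewellery of precious metal",
    "Instruments for medical science, nes",
    "Monolithic integrated circuits, digital",
    "Maize except seed corn",
    "Natural gas, liquefied",
    "Petroleum oils, crude",
    "Propane, liquefied",
]

shared = set(largest_imports) & set(smallest_imports)
distinct = set(largest_imports) | set(smallest_imports)
score = jaccard_sketch(largest_imports, smallest_imports)

print(len(shared), len(distinct), score)
```

Six of the 18 distinct products appear in both lists, which gives 6/18 and matches the similarity reported above.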

Trade Alignment
#

Trade alignment can be used to determine a cluster’s self-sufficiency by looking at internal country-country trade, or it can be used to determine a cluster’s external competitiveness by looking at inter-cluster country-country trade. We determine both dimensions of trade alignment (intra and inter cluster) based on the supply/demand ratio, more specifically the weighted average of log-SDR, with weights being total amounts (USD) of exports/imports, globally per cluster.

This score is scaled to the 0..1 range using a sigmoid transformation, so a score above 0.5 corresponds to a positive weighted log-SDR, meaning that supply outweighs demand on balance. The log-transformation keeps the SDR distribution from being heavily skewed.
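
A minimal sketch of the scoring described above, assuming the score is the sigmoid of the amount-weighted mean of log(supply / demand). The function name, its arguments, and the small `eps` guard against zero amounts are illustrative assumptions. The notebook's `global_sdr_score` may differ in its details.

```python
import math


def sdr_score_sketch(supply, demand, weights, eps=1e-9):
    """Sigmoid of the weighted mean of log(supply / demand).

    supply, demand, weights: equal-length sequences of non-negative
    floats, one entry per product (hypothetical inputs, in USD).
    """
    total_weight = sum(weights)
    if total_weight == 0:
        return 0.5  # no trade at all: treat as neutral

    # eps avoids log(0) and division by zero for missing flows.
    log_sdr = [
        math.log((s + eps) / (d + eps))
        for s, d in zip(supply, demand)
    ]
    mean_log_sdr = sum(
        w * x for w, x in zip(weights, log_sdr)
    ) / total_weight

    # Sigmoid maps (-inf, +inf) to (0, 1), with 0 mapped to 0.5.
    return 1.0 / (1.0 + math.exp(-mean_log_sdr))


balanced = sdr_score_sketch([100.0], [100.0], [200.0])
surplus = sdr_score_sketch([200.0], [100.0], [300.0])
deficit = sdr_score_sketch([100.0], [200.0], [300.0])
print(balanced, surplus, deficit)
```

Under these assumptions, a single product with balanced supply and demand scores 0.5, and a single product with twice as much supply as demand scores about 2/3, since the sigmoid of log(r) equals r / (1 + r).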

Self-Sufficiency
#

Most communities are self-sufficient or nearly self-sufficient, with only community 5 showing a little more vulnerability, scoring just below the 0.5 threshold.

comm_self_sufficiency_df = pd.DataFrame(
    dict(
        louvain_id=louvain_id,
        score=global_sdr_score(
            trade_alignment_by_cluster(
                "louvain_id",
                louvain_id,
                method="intra",
            )
        ),
    )
    for louvain_id in comm_sizes_df.index
).sort_values("score", ascending=False)

comm_self_sufficiency_df

	louvain_id	score
9	9	0.985065
2	4	0.970089
5	1	0.959150
8	3	0.939180
10	10	0.895896
1	6	0.869495
4	7	0.860508
6	2	0.742791
3	8	0.644520
7	0	0.564580
0	5	0.493976

colors = comm_self_sufficiency_df.score.apply(
    lambda s: MPL_PALETTE[0] if s >= 0.5 else MPL_PALETTE[1]
)

fig, ax = plt.subplots(figsize=(18, 3))

comm_self_sufficiency_df.plot.bar(
    x="louvain_id",
    y="score",
    xlabel="Community ID",
    color=colors,
    rot=0,
    ax=ax,
)

plt.axhline(
    y=0.5,
    color=MPL_PALETTE[1],
    linestyle="--",
    linewidth=2,
)
plt.legend([
    "Self-Sufficiency Threshold",
    "Global Log-SDR Score"
])
plt.show()

compnet_louvain_df[compnet_louvain_df.louvain_id == 5]

	node_id	label	louvain_id
6	7	Qatar	5
18	19	Lithuania	5
20	21	Portugal	5
21	22	Palestine	5
22	23	British Virgin Islands	5
...	...	...	...
221	223	Spain	5
224	226	India	5
228	230	Romania	5
231	233	Slovenia	5
232	234	Turkiye	5

67 rows × 3 columns

External Competitiveness
#

Most communities are not particularly competitive externally, with every score falling below the 0.5 threshold. This was to be expected given the clustering criterion: communities are dense subgraphs of the competition network, so they mostly point to higher internal competition.

comm_external_comp_df = pd.DataFrame(
    dict(
        louvain_id=louvain_id,
        score=global_sdr_score(
            trade_alignment_by_cluster(
                "louvain_id",
                louvain_id,
                method="inter",
            )
        ),
    )
    for louvain_id in comm_sizes_df.index
).sort_values("score", ascending=False)

comm_external_comp_df

	louvain_id	score
0	5	0.360300
3	8	0.304639
1	6	0.200342
2	4	0.121624
4	7	0.089482
6	2	0.072207
5	1	0.055127
8	3	0.033868
9	9	0.018567
10	10	0.012569
7	0	0.010781

colors = comm_external_comp_df.score.apply(
    lambda s: MPL_PALETTE[0] if s >= 0.5 else MPL_PALETTE[1]
)

fig, ax = plt.subplots(figsize=(18, 3))

comm_external_comp_df.plot.bar(
    x="louvain_id",
    y="score",
    xlabel="Community ID",
    color=colors,
    rot=0,
    ax=ax,
)

plt.axhline(
    y=0.5,
    color=MPL_PALETTE[1],
    linestyle="--",
    linewidth=2,
)
plt.legend([
    "External Competitiveness Threshold",
    "Global SDR Score"
])
plt.show()

compnet_louvain_df[compnet_louvain_df.louvain_id == 8]

	node_id	label	louvain_id
14	15	Cocos (Keeling) Islands	8
46	47	Malaysia	8
47	48	Pitcairn	8
48	49	Singapore	8
62	63	Malta	8
65	66	South Georgia and South Sandwich Islds.	8
76	77	South Korea	8
79	80	Vatican City	8
87	88	Hong Kong	8
95	96	Vietnam	8
96	97	Antarctica	8
97	98	China	8
102	103	Lesotho	8
113	114	Taiwan	8
120	121	Northern Mariana Islands	8
125	126	Bahrain	8
134	135	Japan	8
160	161	Heard and McDonald Islands	8
162	163	Israel	8
164	165	Namibia	8
169	170	Wallis and Futuna	8
176	177	Botswana	8
195	197	Macao	8
210	212	Guam	8
215	217	Philippines	8
233	235	Samoa	8

Weakly Connected Competitors
#

Strongly connected components in our graph would have captured mutual competition among peers, cyclical or balanced rivalries, or equivalent strategic positions. However, once we removed the “Undeclared” pseudo-country, we weren’t able to find any strongly connected components that were not singletons.

As such, we compute the weakly connected components, instead capturing the individual or isolated components of countries competing among themselves, regardless of export amount (which establishes direction, in our graph).

conn.execute(
    """
    ALTER TABLE Country DROP IF EXISTS wcc_id;
    ALTER TABLE Country ADD IF NOT EXISTS wcc_id INT64;

    CALL weakly_connected_components("compnet")
    WITH node, group_id
    SET node.wcc_id = group_id;
    """
)
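
For intuition, weakly connected components ignore edge direction: two countries end up in the same component if any chain of competition edges links them, whichever way those edges point. The sketch below is a plain-Python union-find over a toy edge list. It only illustrates the concept and makes no claim about how Kuzu's `weakly_connected_components` is implemented. The node names, edge list, and function name are made up for the example.

```python
from collections import defaultdict


def weak_components(nodes, edges):
    """Group nodes into weakly connected components via union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Direction is ignored: an edge in either direction merges two groups.
    for src, dst in edges:
        root_src, root_dst = find(src), find(dst)
        if root_src != root_dst:
            parent[root_dst] = root_src

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)

    # Largest components first, like the size table in this section.
    return sorted(groups.values(), key=len, reverse=True)


nodes = ["A", "B", "C", "D", "E", "F"]
edges = [("A", "B"), ("C", "B"), ("D", "E")]  # toy directed edges

components = weak_components(nodes, edges)
print(components)
```

In this toy graph there are no directed cycles, so every strongly connected component would be a singleton, while the weak components still group A, B and C together, D with E, and leave F on its own.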

compnet_wcc_df = conn.execute(
    """
    MATCH (c:Country)
    WHERE c.country_name_short <> "Undeclared"
    RETURN
        c.node_id AS node_id,
        c.country_name_short AS label,
        c.wcc_id AS wcc_id
    """
).get_as_df()

node_classes = {
  k: g.node_id.to_list()
  for k, g in compnet_wcc_df.groupby("wcc_id")
}

vis.set_labels(compnet_g, LABEL_PROPS)

vis.plot(
    compnet_g,
    node_classes=node_classes,
    hide_edges=True,
)

As we can see, there are multiple weakly connected components, but most of them are singletons, with a single node in its own WCC. Other than that, there is a large component of 64 countries, followed by two smaller components with over 20 nodes each, which we’ll inspect below.

wcc_sizes_df = (
    compnet_wcc_df[["wcc_id", "node_id"]]
    .groupby("wcc_id")
    .count()
    .rename(columns=dict(node_id="num_nodes"))
)

wcc_sizes_df = wcc_sizes_df.reindex(
    wcc_sizes_df.num_nodes.sort_values(ascending=False).index
)

wcc_sizes_df

	num_nodes
wcc_id
0	64
1	28
4	24
5	11
2	9
...	...
209	1
215	1
226	1
228	1
230	1

68 rows × 1 columns

wcc_sizes_ord_df = wcc_sizes_df.reset_index(drop=True)

wcc_singleton_threshold = (
    wcc_sizes_ord_df[wcc_sizes_ord_df.num_nodes <= 1]
    .index[0]
    .item()
)

fig, ax = plt.subplots(figsize=(30, 5))

wcc_sizes_df.plot.bar(rot=0, ax=ax)

plt.axvline(
    x=wcc_singleton_threshold,
    color=MPL_PALETTE[1],
    linestyle="--",
    linewidth=2,
)

plt.legend(["Singleton Threshold", "No. Nodes"])

plt.show()

Let’s take a look at the members of each weak component, from largest to smallest.

for wcc_id in wcc_sizes_df[wcc_sizes_df.num_nodes > 1].index:
    display(f"WCC ID: {wcc_id}")
    display(
        compnet_wcc_df[compnet_wcc_df.wcc_id == wcc_id]
        .drop(columns="wcc_id")
    )

'WCC ID: 0'

	node_id	label
0	1	American Samoa
6	7	Qatar
11	12	Azerbaijan
14	15	Cocos (Keeling) Islands
18	19	Lithuania
...	...	...
211	213	Guyana
215	217	Philippines
221	223	Spain
224	226	India
231	233	Slovenia

64 rows × 2 columns

'WCC ID: 1'

	node_id	label
1	2	Falkland Islands
16	17	Cabo Verde
19	20	Mauritius
26	27	Oman
39	40	Haiti
43	44	Maldives
47	48	Pitcairn
55	56	Colombia
59	60	Kiribati
66	67	Tuvalu
67	68	Yemen
69	70	Bangladesh
74	75	Micronesia
79	80	Vatican City
102	103	Lesotho
122	123	Seychelles
125	126	Bahrain
132	133	British Indian Ocean Territory
155	156	French Southern and Antarctic Lands
161	162	Iceland
162	163	Israel
176	177	Botswana
177	178	Cameroon
182	183	Faroe Islands
191	192	Vanuatu
201	203	Saint Helena, Ascension and Tristan da Cunha
220	222	Ecuador
223	225	Greenland

'WCC ID: 4'

	node_id	label
4	5	Mozambique
12	13	Benin
38	39	Ghana
50	51	Uganda
52	53	Burundi
73	74	Finland
75	76	Gabon
77	78	Mali
78	79	Rwanda
99	100	Western Sahara
101	102	Guinea
111	112	Tanzania
116	117	Central African Republic
121	122	Sweden
128	129	Switzerland
131	132	United Kingdom
163	164	Kyrgyzstan
165	166	Niger
170	171	Afghanistan
184	185	Kazakhstan
193	195	Ireland
205	207	Andorra
219	221	Republic of the Congo
230	232	Suriname

'WCC ID: 5'

	node_id	label
5	6	Papua New Guinea
8	9	US Minor Outlying Islands
58	59	Jamaica
60	61	Saint Lucia
83	84	Dominica
85	86	Grenada
120	121	Northern Mariana Islands
145	146	Tonga
157	158	Dominican Republic
192	194	Costa Rica
210	212	Guam

'WCC ID: 2'

	node_id	label
2	3	Georgia
3	4	Laos
27	28	North Korea
33	34	United Arab Emirates
44	45	Myanmar
172	173	Armenia
174	175	Belgium
212	214	Cambodia
226	228	Montenegro

'WCC ID: 10'

	node_id	label
10	11	Argentina
86	87	Guatemala
93	94	Paraguay
117	118	Honduras
168	169	Uruguay
197	199	Nicaragua
217	219	Brazil

'WCC ID: 9'

	node_id	label
9	10	Saint Vincent and the Grenadines
56	57	Comoros
90	91	Madagascar
119	120	Marshall Islands
186	187	Palau

'WCC ID: 15'

	node_id	label
15	16	Democratic Republic of the Congo
32	33	South Africa
92	93	Peru
181	182	Eritrea
218	220	Chile

'WCC ID: 42'

	node_id	label
42	43	Moldova
190	191	Ukraine
203	205	Serbia
228	230	Romania

'WCC ID: 22'

	node_id	label
22	23	British Virgin Islands
36	37	Cook Islands
81	82	Bermuda
178	179	Cayman Islands

'WCC ID: 17'

	node_id	label
17	18	Iran
31	32	Trinidad and Tobago
158	159	Equatorial Guinea

'WCC ID: 137'

	node_id	label
137	138	Mauritania
185	186	Liberia
206	208	Australia

'WCC ID: 40'

	node_id	label
40	41	Indonesia
232	234	Turkiye

'WCC ID: 24'

	node_id	label
24	25	Bhutan
216	218	Zambia

'WCC ID: 94'

	node_id	label
94	95	Eswatini
127	128	Belize

'WCC ID: 49'

	node_id	label
49	50	Tunisia
61	62	Morocco

'WCC ID: 110'

	node_id	label
110	111	Togo
141	142	Senegal

'WCC ID: 109'

	node_id	label
109	110	Somalia
187	188	Sudan

'WCC ID: 136'

	node_id	label
136	137	Sri Lanka
143	144	El Salvador

'WCC ID: 21'

	node_id	label
21	22	Palestine
199	201	Poland

'WCC ID: 167'

	node_id	label
167	168	Turkmenistan
207	209	Bolivia

'WCC ID: 160'

	node_id	label
160	161	Heard and McDonald Islands
166	167	Saint Pierre and Miquelon

'WCC ID: 223'

	node_id	label
222	224	Guinea-Bissau
233	235	Samoa

largest_wcc_id = wcc_sizes_df.index[0].item()
largest_wcc_id

smallest_wcc_id = wcc_sizes_df.index[-1].item()
smallest_wcc_id

Component Subgraphs
#

plot_cluster("wcc_id", largest_wcc_id)

Component Mapping
#

plot_cluster("wcc_id", largest_wcc_id, kind="map")

Top Exported Products
#

Is there any export overlap between large and small components?

largest_wcc_top_exported = top_frac(
    trade_per_cluster("wcc_id", largest_wcc_id, "exports"),
    "total_amount_usd",
)
largest_wcc_top_exported

	product	total_amount_usd
0	Monolithic integrated circuits, digital	2.880361e+12
1	Petroleum oils, crude	2.815121e+12
2	Oils petroleum, bituminous, distillates	2.185186e+12
3	Commodities not specified, according to kind	1.859842e+12
4	Transmit-receive apparatus for radio, TV	1.629116e+12
5	Trade data discrepancies	9.872075e+11
6	Parts of data processing equipment	7.646240e+11
7	Medicaments, doses, nes	7.542306e+11

smallest_wcc_top_exported = top_frac(
    trade_per_cluster("wcc_id", smallest_wcc_id, "exports"),
    "total_amount_usd",
)
smallest_wcc_top_exported

	product	total_amount_usd
0	Petroleum oils, crude	24676511.0

jaccard_sim(
    largest_wcc_top_exported["product"],
    smallest_wcc_top_exported["product"]
)

0.125

Top Imported Products
#

Is there any import overlap between large and small components?

largest_wcc_top_imported = top_frac(
    trade_per_cluster("wcc_id", largest_wcc_id, "imports"),
    "total_amount_usd",
)
largest_wcc_top_imported

	product	total_amount_usd
0	Petroleum oils, crude	3.075323e+12
1	Monolithic integrated circuits, digital	2.780868e+12
2	Commodities not specified, according to kind	1.510984e+12
3	Oils petroleum, bituminous, distillates	1.420618e+12
4	Transmit-receive apparatus for radio, TV	1.332514e+12
5	Trade data discrepancies	1.248542e+12
6	Medicaments, doses, nes	8.050835e+11
7	Parts of data processing equipment	6.992992e+11
8	Automobiles, spark ignition, 1500-3000cc	6.571986e+11

smallest_wcc_top_imported = top_frac(
    trade_per_cluster("wcc_id", smallest_wcc_id, "imports"),
    "total_amount_usd",
)
smallest_wcc_top_imported

	product	total_amount_usd
0	Oils petroleum, bituminous, distillates	88922018.0
1	Cargo vessels, not tanker or refrigerated	23292230.0
2	Commodities not specified, according to kind	21288342.0
3	Rice, semi- or wholly-milled	15654678.0

jaccard_sim(
    largest_wcc_top_imported["product"],
    smallest_wcc_top_imported["product"]
)

0.18181818181818182

Trade Alignment
#

Again, trade alignment can be used to determine a cluster’s self-sufficiency by looking at internal country-country trade, or it can be used to determine a cluster’s external competitiveness by looking at inter-cluster country-country trade. We determine both dimensions of trade alignment (intra and inter cluster) based on the supply/demand ratio, more specifically the weighted average of log-SDR, with weights being total amounts (USD) of exports/imports, globally per cluster.

This score is scaled to the 0..1 range using a sigmoid transformation, so a score above 0.5 corresponds to a positive weighted log-SDR, meaning that supply outweighs demand on balance. The log-transformation keeps the SDR distribution from being heavily skewed.

Self-Sufficiency
#

Most components are self-sufficient or nearly self-sufficient, with only three of them, components 209, 22 and 196, showing a little more vulnerability.

wcc_self_sufficiency_df = pd.DataFrame(
    dict(
        wcc_id=wcc_id,
        score=global_sdr_score(
            trade_alignment_by_cluster(
                "wcc_id",
                wcc_id,
                method="intra",
            )
        ),
    )
    for wcc_id in wcc_sizes_df.index
).sort_values("score", ascending=False)

wcc_self_sufficiency_df

	wcc_id	score
20	167	0.999962
57	197	0.999932
25	34	0.999926
23	25	0.999721
52	156	0.999568
...	...	...
2	4	0.586408
0	0	0.518923
63	209	0.464203
9	22	0.300144
56	196	0.280055

68 rows × 2 columns

colors = wcc_self_sufficiency_df.score.apply(
    lambda s: MPL_PALETTE[0] if s >= 0.5 else MPL_PALETTE[1]
)

fig, ax = plt.subplots(figsize=(30, 5))

wcc_self_sufficiency_df.plot.bar(
    x="wcc_id",
    y="score",
    xlabel="Weak Component ID",
    color=colors,
    rot=0,
    ax=ax,
)

plt.axhline(
    y=0.5,
    color=MPL_PALETTE[1],
    linestyle="--",
    linewidth=2,
)
plt.legend([
    "Self-Sufficiency Threshold",
    "Global SDR Score"
])
plt.show()

compnet_wcc_df[compnet_wcc_df.wcc_id == 0]

	node_id	label	wcc_id
0	1	American Samoa	0
6	7	Qatar	0
11	12	Azerbaijan	0
14	15	Cocos (Keeling) Islands	0
18	19	Lithuania	0
...	...	...	...
211	213	Guyana	0
215	217	Philippines	0
221	223	Spain	0
224	226	India	0
231	233	Slovenia	0

64 rows × 3 columns

External Competitiveness
#

Most components are not particularly competitive externally, even less so than communities, with the large majority having an SDR-based score lower than 0.1.

wcc_external_comp_df = pd.DataFrame(
    dict(
        wcc_id=wcc_id,
        score=global_sdr_score(
            trade_alignment_by_cluster(
                "wcc_id",
                wcc_id,
                method="inter",
            )
        ),
    )
    for wcc_id in wcc_sizes_df.index
).sort_values("score", ascending=False)

wcc_external_comp_df

	wcc_id	score
0	0	0.424935
11	137	0.135049
2	4	0.112296
7	15	0.086413
5	10	0.084617
...	...	...
59	195	0.000212
21	160	0.000073
67	230	0.000053
43	114	0.000023
26	65	0.000009

68 rows × 2 columns

colors = wcc_external_comp_df.score.apply(
    lambda s: MPL_PALETTE[0] if s >= 0.5 else MPL_PALETTE[1]
)

fig, ax = plt.subplots(figsize=(30, 5))

wcc_external_comp_df.plot.bar(
    x="wcc_id",
    y="score",
    xlabel="Weak Component ID",
    color=colors,
    rot=0,
    ax=ax,
)

plt.axhline(
    y=0.5,
    color=MPL_PALETTE[1],
    linestyle="--",
    linewidth=2,
)
plt.legend([
    "External Competitiveness Threshold",
    "Global SDR Score"
])
plt.show()

compnet_louvain_df[compnet_louvain_df.louvain_id == 0]

	node_id	label
0	1	American Samoa
9	10	Saint Vincent and the Grenadines
13	14	Barbados
35	36	The Bahamas
58	59	Jamaica
60	61	Saint Lucia
80	81	Antigua and Barbuda
82	83	Curaçao
85	86	Grenada
119	120	Marshall Islands
159	160	Greece
179	180	Cyprus
214	216	Niue

Communities vs Components
#

By matching each cluster in the finer clustering (the one with more, and therefore smaller, clusters) to a cluster in the coarser one, we can run a pairwise comparison between communities and weak components:

Which countries belong to a community, but not the weak component?
Which countries belong to a weak component, but not the community?
Which countries belong to both?
Is there a particular semantic to these countries?

len(wcc_sizes_df), len(comm_sizes_df)

(68, 11)

NN-Clusters
#

We compute community to weak component similarities, selecting the nearest-neighbor community for each component. Given the higher number of components when compared to communities, we’ll necessarily have repeated nearest-neighbor communities.

cluster_sim_df = []

for wcc_id, wcc in compnet_wcc_df.groupby("wcc_id"):
    for louvain_id, comm in (
        compnet_louvain_df.groupby("louvain_id")
    ):
        cluster_sim_df.append(
            dict(
                wcc_id=wcc_id,
                louvain_id=louvain_id,
                sim=jaccard_sim(wcc.label, comm.label),
            )
        )

cluster_sim_df = pd.DataFrame(cluster_sim_df)
cluster_sim_df = cluster_sim_df.loc[
    cluster_sim_df
    .groupby(["wcc_id"])
    .idxmax()
    .sim
]
cluster_sim_df

	wcc_id	louvain_id	sim
5	0	5	0.297030
12	1	1	0.354839
25	2	3	0.307692
37	4	4	0.317073
54	5	10	0.272727
...	...	...	...
693	215	0	0.076923
713	223	9	0.125000
717	226	2	0.071429
729	228	3	0.125000
743	230	6	0.030303

68 rows × 3 columns

For example, community 5 matches with 20 different weak components.

cluster_sim_df.louvain_id.value_counts()

louvain_id
5     20
7     10
4     10
2      7
6      6
8      4
1      3
3      3
0      2
9      2
10     1
Name: count, dtype: int64

cluster_sim_df[cluster_sim_df.louvain_id == 5]

	wcc_id	louvain_id	sim
5	0	5	0.297030
126	21	5	0.029851
192	40	5	0.029851
225	49	5	0.029851
269	72	5	0.014925
280	84	5	0.014925
313	98	5	0.014925
335	103	5	0.014925
401	114	5	0.014925
423	126	5	0.014925
489	144	5	0.014925
511	147	5	0.014925
544	153	5	0.014925
599	173	5	0.014925
610	195	5	0.014925
632	197	5	0.014925
643	199	5	0.014925
654	201	5	0.014925
676	209	5	0.014925
687	214	5	0.014925

Set Comparison
#

Let’s select a weak component and retrieve its NN community to compare.

## comp_wcc_id = largest_wcc_id
comp_wcc_id = compnet_wcc_df.loc[
    compnet_wcc_df.label == "Australia",
    "wcc_id"
].item()

comp_comm_id = cluster_sim_df.loc[
    cluster_sim_df.wcc_id == comp_wcc_id,
    "louvain_id",
].item()

comp_wcc_id, comp_comm_id

(137, 4)

comp_wcc_countries = set(
    compnet_wcc_df.loc[
        compnet_wcc_df.wcc_id == comp_wcc_id,
        "label"
    ]
)

comp_louvain_countries = set(
    compnet_louvain_df.loc[
        compnet_louvain_df.louvain_id == comp_comm_id,
        "label"
    ]
)

WCC Exclusive
#

pd.Series(
    list(comp_wcc_countries - comp_louvain_countries),
    name="country",
).sort_values().to_frame()

	country

Community Exclusive
#

pd.Series(
    list(comp_louvain_countries - comp_wcc_countries),
    name="country",
).sort_values().to_frame()

	country
5	Afghanistan
26	Benin
20	Bhutan
14	Bolivia
24	Burundi
18	Central African Republic
4	Guinea
1	Kyrgyzstan
23	Mali
8	Mozambique
25	Niger
17	Papua New Guinea
7	Rwanda
21	Senegal
2	Sierra Leone
13	Solomon Islands
0	Somalia
15	Sudan
12	Suriname
10	Syria
19	Tajikistan
6	Tanzania
3	Togo
9	Turkmenistan
11	US Minor Outlying Islands
16	Western Sahara
22	Zambia

WCC and Community Overlap
#

pd.Series(
    list(comp_wcc_countries & comp_louvain_countries),
    name="country",
).sort_values(ignore_index=True).to_frame()

	country
0	Australia
1	Liberia
2	Mauritania

Economic Pressure (PageRank)
#

Economic pressure can be measured using PageRank: it is an iterative metric that converges to a score aggregating the overall incoming competition strength, and a country’s score increases when the countries competing with it are themselves under economic pressure.

conn.execute(
    """
    ALTER TABLE Country DROP IF EXISTS pagerank;
    ALTER TABLE Country ADD IF NOT EXISTS pagerank DOUBLE;

    CALL page_rank("compnet", maxIterations := 100)
    WITH node, rank
    SET node.pagerank = rank
    """
)
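
To make the idea of pressure accumulating along incoming edges concrete, here is a minimal power-iteration sketch of PageRank on a toy directed graph. It is a textbook, unweighted formulation with a damping factor of 0.85 and uniform redistribution from nodes without outgoing edges. Both choices are assumptions for the example, and Kuzu's `page_rank` may use different defaults or handle edges differently. The node names and the function name are made up.

```python
def pagerank_sketch(nodes, edges, damping=0.85, max_iter=100, tol=1e-10):
    """Unweighted PageRank via power iteration; ranks sum to 1."""
    n = len(nodes)
    out_links = {u: [] for u in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not out_links[u])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {u: base for u in nodes}

        # Each node passes an equal share of its rank to its targets.
        for u in nodes:
            if out_links[u]:
                share = damping * rank[u] / len(out_links[u])
                for v in out_links[u]:
                    new_rank[v] += share

        delta = sum(abs(new_rank[u] - rank[u]) for u in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


nodes = ["A", "B", "C", "D"]
edges = [("A", "C"), ("B", "C"), ("C", "D"), ("D", "C")]

ranks = pagerank_sketch(nodes, edges)
for node, value in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(node, round(value, 4))
```

In this toy graph, C receives edges from every other node and ends up with the highest score, D ranks second because its only incoming edge comes from the highly ranked C, and A and B, which have no incoming edges, tie for the lowest score.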

Most Pressured Countries
#

most_pressured_df = conn.execute(
    """
    MATCH (c:Country)
    WHERE c.country_name_short <> "Undeclared"
    RETURN
        c.node_id AS node_id,
        c.country_name_short AS label,
        c.pagerank AS pagerank
    ORDER BY c.pagerank DESC
    LIMIT 25
    """
).get_as_df()

fig, ax = plt.subplots(figsize=(5, 8))
(
  most_pressured_df.iloc[::-1]
  .plot.barh(x="label", y="pagerank", ax=ax)
)
plt.ylabel(None)
plt.legend([])
plt.show()

Least Pressured Countries
#

least_pressured_df = conn.execute(
    """
    MATCH (c:Country)
    WHERE c.country_name_short <> "Undeclared"
    RETURN
        c.node_id AS node_id,
        c.country_name_short AS label,
        c.pagerank AS pagerank
    ORDER BY c.pagerank ASC
    LIMIT 25
    """
).get_as_df()

fig, ax = plt.subplots(figsize=(5, 8))
(
  least_pressured_df.iloc[::-1]
  .plot.barh(x="label", y="pagerank", ax=ax)
)
plt.ylabel(None)
plt.legend([])
plt.show()

Closing Remarks
#

Economies are complex systems, and the complex relations between markets can be captured using a graph. Determining which nodes and relationships to model is crucial to interpretation—our graph focused on competition relationships, and so our metrics and partition approaches illustrated this.

Network analysis tools are usually not as exotic as they are made out to be. Useful graph data science is usually not that complex, particularly now that tooling is widely available, but it can certainly be extremely insightful, especially when the graph is correctly modeled.

This is only a small introduction to the topic, using the world economy and trade, which I have been particularly interested in, as an example.

The economy and the world overall are suffering. Graphs will help us find solutions to complex problems, but this requires a commitment to always ask yourself: could I do this without a graph? When the answer is yes, then you should rethink your approach. If you’re not looking at complex relations, you’re just doing more of the same.

Bottom line, use graphs and use them correctly.

Author

Data Lab Tech

https://youtube.com/@DataLabTechTV
