
Economic Competition Networks

·7168 words·34 mins
Data Science network-science network-analysis economy world-trade market video

Summary
#

In this video, we reproduce the approach that predicts Survivor winners and apply it to Economic Competition Networks to better understand world trade and economic leaders. We build a country to country competition network based on the Export Similarity Index (ESI), and we use several techniques from network science, like PageRank, community detection, weak component analysis, or the recent common out-neighbor (CON) score, to better understand how countries compete with each other within the world economy, identifying dominating or leading economies, as well as their counterpart weaker or smaller economies.

Dataset
#

We use The Atlas of Economic Complexity dataset, which is summarized below. We only provide a top-level overview of the data here. For a detailed description, click the Download button in each row of the dataset page linked above, which opens a popup with detailed information on the fields of each CSV file.

  • Complexity Rankings & Growth Projections: Economic Complexity Index (ECI) and partial growth projections for world economies from 1995 to 2023.
  • Country Trade by Product: Exports and imports, per country and product, over the years. Different files provide different product category granularities, based on the number of HS92, HS12 or SITC digits. Separate files are also provided for services, using a non-standard classification internal to the Growth Lab that also offers different digit-based granularities.
  • Total Trade by Country: Total exports and imports, per country, over the years. Different files provide data about products and services. This answers questions such as:
    • How big is the economy for a country?
    • How did it progress over the last 28 years?
  • Total Trade by Product: Total exports and imports, per product, over the years. Again, this is provided at different product granularities based on HS92, HS12 or SITC digits, with separate files for services using the Growth Lab's non-standard, digit-based classification. This answers questions such as:
    • How big is the market for a product?
    • How did it progress over the last 28 years?
  • Country Trade by Partner: Bilateral exports and imports between pairs of countries, over the years.
  • Country Trade by Partner and Product: Bilateral exports and imports between pairs of countries, for a given product, over the years. This is provided at 6-digit granularity based on HS92, HS12 or SITC digits, and partitioned into multiple files in blocks of 10 years (or only 5 years for 1995-1999).
    • A granularity of 4 digits would be enough to distinguish between main product types (e.g., beef vs pork vs poultry, fresh vs frozen; gasoline engines vs diesel engines). With 6 digits, we get a lot more detail (e.g., carcasses and half-carcasses of bovine animals, fresh or chilled; engines for aircraft). We use the 6-digit HS92 data, 6 digits being the only granularity available for this file, which is also ideal for capturing trade competition between countries, as true competition only surfaces at a finer scale. For recency, we only look at the 2020-2023 period, aggregating totals over the whole period.
  • Country Classification: Country metadata.
  • Regional Classification: Regional classification for countries: continent, political region (e.g., European Union), subregion (e.g., Central America, Western Africa), trade regions (e.g., NAFTA, OPEC), etc.
  • HS12 Product Classification: Product metadata according to HS12 codes.
  • HS92 Product Classification: Product metadata according to HS92 codes.
    • We use this to inspect products traded by salient countries during the analysis.
  • Services Product Classification: Services metadata according to a non-standard classification internal to the Growth Lab.
    • We use this to inspect services traded by salient countries during the analysis.
  • SITC Product Classification: Product metadata according to SITC codes.
  • Product Space Related Edges: HS92 4-digit codes for source and target products in the same product space (e.g., women's coats ⇄ sweaters).
  • Product Space Layout: HS92 4-digit codes for products, along with their 2D embedding, where nearby products are co-exported by countries.
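
HS codes are hierarchical: the first 2 digits identify the chapter, the first 4 the heading, and all 6 the subheading. As a minimal sketch of what the coarser granularity loses, the snippet below rolls hypothetical 6-digit export amounts for a few beef codes up to their 4-digit headings (0201 covers fresh or chilled bovine meat, 0202 frozen bovine meat). The DataFrame, its column names and the amounts are made up for illustration and are not the Atlas schema.

```python
import pandas as pd

# Hypothetical 6-digit HS92 export totals (USD) for a single country
exports = pd.DataFrame(
    {
        "hs6": ["020110", "020120", "020130", "020210", "020230"],
        "amount_usd": [100, 250, 400, 50, 75],
    }
)

# The 4-digit heading is simply the first four digits of the 6-digit code
exports["hs4"] = exports["hs6"].str[:4]

# Aggregating at 4 digits merges the distinct 6-digit products
by_hs4 = exports.groupby("hs4")["amount_usd"].sum()
print(by_hs4)
```

At 4 digits, three distinct fresh-beef products collapse into a single 0201 entry, which is why two countries exporting different cuts would look like direct competitors at that level.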

Here are the citations for the datasets that we use:

Country Trade by Partner and Product:

The Growth Lab at Harvard University, 2025, “International Trade Data (HS92)”, https://doi.org/10.7910/DVN/T4CHWJ, Harvard Dataverse

Country Classification & HS92 Product Classification:

The Growth Lab at Harvard University, 2025, “Classifications Data”, https://doi.org/10.7910/DVN/3BAL1O, Harvard Dataverse

Graph Schema
#

Out of the three CSV files that we identified above as being used (Country Trade by Partner and Product, Country Classification, and HS92 Product Classification), we produce the following node and relationship labels:

  • Nodes
    • Country
      • node_id – globally unique node identifier – INT64
      • Properties from all Country Classification columns
    • Product
      • node_id – globally unique node identifier – INT64
      • Properties from all HS92 Product Classification columns
  • Relationships
    • (:Country)-[:CompetesWith]->(:Country)
      • ESI – Export Similarity Index – DOUBLE
    • (:Country)-[:Exports]->(:Product)
      • amount_usd – exports dollar amount (2020-2023) – INT128
    • (:Country)<-[:Imports]-(:Product)
      • amount_usd – imports dollar amount (2020-2023) – INT128
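
The CompetesWith edges carry an Export Similarity Index (ESI), but this section does not show how it is computed. A common formulation is the Finger-Kreinin export similarity index, which sums, over all products, the smaller of the two countries' export shares, scaled here to [0, 1]: 0 for disjoint export baskets and 1 for identical ones. The minimal sketch below assumes that definition (the actual dlctl transformation may differ, e.g., in normalization or thresholding), and the product IDs and amounts are hypothetical.

```python
def export_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Finger-Kreinin style export similarity between two export baskets.

    Each basket maps a product ID to an export amount (USD). Amounts are
    turned into shares of each country's total exports, and the index is
    the sum of the per-product minimum shares, so it lies in [0, 1].
    """
    total_a = sum(a.values())
    total_b = sum(b.values())
    if total_a == 0 or total_b == 0:
        return 0.0

    products = a.keys() | b.keys()
    return sum(
        min(a.get(p, 0.0) / total_a, b.get(p, 0.0) / total_b)
        for p in products
    )


# Hypothetical baskets: product ID -> exported USD
country_a = {"crude_oil": 80.0, "lng": 20.0}
country_b = {"crude_oil": 50.0, "cars": 50.0}

print(export_similarity(country_a, country_b))  # 0.5
```

Only the overlapping crude oil exports contribute (min(0.8, 0.5) = 0.5), which matches the intuition that two countries compete only where their baskets overlap.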

Take a look at the following diagram, where rectangles represent the raw CSV files, with dashed arrows illustrating the data source, and circles represent nodes, with solid arrows representing relationships.

Econ CompNet Graph Schema


Jupyter Notebook
#

The following sections are an adaptation of the Jupyter Notebook that we created to analyze the Economic Competition Network.


Setup
#

ETL
#

For ETL, we directly call the appropriate dlctl commands for:

  1. Ingesting the dataset
  2. Transforming using SQL on top of DuckLake
  3. Exporting from the data lakehouse into Parquet
  4. Loading the graph into Kuzu
  5. Computing general analytics scores

Be sure to run the cell below once, uncommenting it first if it is commented out.

!dlctl ingest dataset -t atlas \
    "The Atlas of Economic Complexity"
!dlctl transform -m +marts.graphs.econ_comp
!dlctl export dataset graphs econ_comp
!dlctl graph load econ_comp
!dlctl graph compute con-score econ_comp Country CompetesWith

Imports
#

from pathlib import Path
from string import Template
from textwrap import dedent
from typing import Any, Literal, Optional

import kuzu
import matplotlib.pyplot as plt
import networkx as nx
import numpy as np
import pandas as pd
from scipy.special import expit

import graph.visualization as vis
from shared.settings import LOCAL_DIR, env

Globals
#

We set up access to the appropriate Kuzu path, based on the shared .env configuration, ensuring the graph exists before running the notebook. Once set up, conn will be used to query the graph directly throughout this notebook.

db_path = Path(LOCAL_DIR) / env.str("ECON_COMP_GRAPH_DB")
assert db_path.exists(), \
    "You need to create the graph DB using dlctl first"
db = kuzu.Database(db_path)
conn = kuzu.Connection(db)

Constants
#

In order to ensure color consistency for our plots, we extract the color palette from matplotlib into MPL_PALETTE.

MPL_PALETTE = (
    plt.rcParams["axes.prop_cycle"]
    .by_key()["color"]
)

We also map a display attribute to each of our node labels, Country and Product. We'll use the short names for both when plotting graph visualizations or related charts.

LABEL_PROPS = {
    "Country": "country_name_short",
    "Product": "product_name_short",
}

Functions
#

We create a few reusable functions that run Kuzu queries. In a few cases, it was helpful to debug a query with its parameters filled in (e.g., using Kuzu Explorer), so we created a helper function for this (note that it doesn't support string parameters, as we didn't need them).

def print_query(query: str, params: dict[str, Any]):
    dbg_query = dedent(query).strip()
    dbg_query = Template(dbg_query)
    dbg_query = dbg_query.substitute(params)
    print(dbg_query)
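
To illustrate why string parameters are unsupported, here is a sketch with a hypothetical `render_query` variant of `print_query` that returns the string instead of printing it (the query text is made up). `string.Template` pastes values verbatim, so numbers render fine, but strings come out unquoted.

```python
from string import Template
from textwrap import dedent
from typing import Any


def render_query(query: str, params: dict[str, Any]) -> str:
    # Hypothetical variant of print_query that returns the rendered query
    dbg_query = Template(dedent(query).strip())
    return dbg_query.substitute(params)


query = """
    MATCH (c:Country)
    WHERE c.louvain_id = $prop_value
    RETURN c.country_name_short
    LIMIT $n
"""

print(render_query(query, dict(prop_value=5, n=3)))

# String values are pasted without quotes, so this is not valid Cypher:
print(render_query(
    "MATCH (c:Country {country_name_short: $name}) RETURN c",
    dict(name="Chile"),
))
# MATCH (c:Country {country_name_short: Chile}) RETURN c
```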

We’ll also cluster nodes using different strategies and compare groups, so we implement a basic Jaccard similarity function.

def jaccard_sim(a: pd.Series, b: pd.Series) -> float:
    a = set(a)
    b = set(b)
    return len(a & b) / len(a | b)

We might want to look at the top x% of traded products, based on their USD amount. The following function helps filter these.

def top_frac(df: pd.DataFrame, col: str, frac: float = 0.25):
    mask = (df[col] / df[col].sum()).cumsum() <= frac
    return df[mask]
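
Two details of `top_frac` are easy to miss: it assumes the DataFrame is already sorted by `col` in descending order (as `trade_per_cluster` returns it), and it keeps only rows whose cumulative share stays at or below `frac`, so the row that crosses the threshold is dropped. A quick check on made-up totals:

```python
import pandas as pd


def top_frac(df: pd.DataFrame, col: str, frac: float = 0.25) -> pd.DataFrame:
    # Keep rows while the cumulative share of `col` stays within `frac`
    mask = (df[col] / df[col].sum()).cumsum() <= frac
    return df[mask]


# Hypothetical totals, already sorted in descending order
df = pd.DataFrame(
    {
        "product": ["a", "b", "c", "d", "e"],
        "total_amount_usd": [40, 30, 15, 10, 5],
    }
)

# Cumulative shares are 0.40, 0.70, 0.85, 0.95, 1.00
print(top_frac(df, "total_amount_usd", frac=0.75))
print(top_frac(df, "total_amount_usd"))
```

With `frac=0.75`, only products `a` and `b` are kept. With the default `frac=0.25`, the result is empty, since the first product alone already accounts for 40% of the total.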

Analysis
#

We focus on the CompetesWith projection, a relationship given by the Export Similarity Index (ESI). Our graph analysis includes:

  1. Dynamic competition analysis.
    1. Dominating and weaker economy identification, based on the CON score for each country.
    2. Trade basket overlap analysis for top and bottom economies.
  2. Competition network analysis.
    1. Community analysis, including community mapping, top traded product identification, and trade alignment study (self-sufficiency, external competitiveness).
    2. Weak component analysis, following a similar approach to the community analysis—weak components widen community reach.
    3. Community and weak component comparison.
    4. Economic pressure analysis.

Dynamic Competition Analysis
#

Top 10 Dominating Economies
#

These economies have a wide competitive reach, being able to compete with many other countries, i.e., they have a high common out-neighbor (CON) score.
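
The CON score is precomputed by `dlctl graph compute con-score` during ETL. As a rough sketch of the idea, assuming the definition where a node's score is the sum, over every other node, of the number of out-neighbors they share, it can be written with NetworkX on a toy directed graph. The node names are hypothetical, and this is not the dlctl implementation.

```python
import networkx as nx


def con_scores(g: nx.DiGraph) -> dict:
    """Common out-neighbor (CON) score for every node of a directed graph.

    Assumes CON(u) = sum over v != u of |N+(u) & N+(v)|, where N+(x) is
    the set of out-neighbors of x.
    """
    out = {n: set(g.successors(n)) for n in g.nodes}
    return {
        u: sum(len(out[u] & out[v]) for v in g.nodes if v != u)
        for u in g.nodes
    }


# Toy competition digraph over hypothetical countries A-D
g = nx.DiGraph(
    [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D")]
)
print(con_scores(g))  # {'A': 3, 'B': 3, 'C': 2, 'D': 0}
```

Nodes with many out-neighbors that are also targeted by others accumulate a high score, while a node with no out-neighbors (D) scores 0, which mirrors the dominating vs weaker split below.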

dom_econ_df = conn.execute(
    """
    MATCH (c:Country)
    RETURN
        c,
        c.node_id AS node_id,
        c.country_name_short AS country
    ORDER BY c.con_score DESC
    LIMIT 10
    """
).get_as_df()[["node_id", "country"]]

dom_econ_df.index = pd.RangeIndex(
    start=1,
    stop=len(dom_econ_df) + 1,
    name="rank"
)

dom_econ_df

rank  node_id  country
1     206      United States of America
2     55       Canada
3     34       United Arab Emirates
4     107      Netherlands
5     132      United Kingdom
6     175      Belgium
7     134      Italy
8     223      Spain
9     131      France
10    145      Thailand

Top 3 Exports
#

Looking at the top exports will help contextualize these economies. We only look at the top 3 products, to keep the visualization clean and readable.

dom_econ_g = conn.execute(
    """
    MATCH (c:Country)
    WITH c
    ORDER BY c.con_score DESC
    LIMIT 10

    MATCH (c)-[e:Exports]->(p:Product)
    MATCH (c2:Country)-[:Exports]->(p)

    WITH c, e, p, count(DISTINCT c2) AS exporters
    WHERE exporters > 1
    WITH c, e, p
    ORDER BY c.node_id, e.amount_usd DESC
    SKIP 0

    WITH c, collect({p: p, e: e}) AS export_list

    UNWIND list_slice(export_list, 0, 3) AS r
    RETURN c, r.e, r.p
    ORDER BY c.node_id, r.p.node_id
    """
).get_as_networkx()

vis.set_labels(dom_econ_g, LABEL_PROPS)
vis.plot(dom_econ_g, scale=1.25, seed=3)

Top 10 Dominating Economies

Bottom 10 Weaker Economies
#

These are smaller or weaker economies, in the sense that they have lower competitive power. The special Undeclared country node appears at rank 1, suggesting that only a small number of products are undeclared worldwide.

weak_econ_df = conn.execute(
    """
    MATCH (c:Country)
    RETURN
        c,
        c.node_id AS node_id,
        c.country_name_short AS country
    ORDER BY c.con_score ASC
    LIMIT 10
    """
).get_as_df()[["node_id", "country"]]

weak_econ_df.index = pd.RangeIndex(
    start=1,
    stop=len(weak_econ_df) + 1,
    name="rank"
)

weak_econ_df

rank  node_id  country
1     193      Undeclared
2     72       Bouvet Island
3     170      Wallis and Futuna
4     106      Norfolk Island
5     167      Saint Pierre and Miquelon
6     216      Niue
7     66       South Georgia and South Sandwich Islds.
8     121      Northern Mariana Islands
9     161      Heard and McDonald Islands
10    100      Western Sahara

Top 3 Exports
#

If we look at the top 3 exports for each country at the bottom of the CON score ranking, as expected, we find more disconnected economies, mostly focusing on raw materials, or components and machinery.

weak_econ_g = conn.execute(
    """
    MATCH (c:Country)
    WITH c
    ORDER BY c.con_score ASC
    LIMIT 10

    MATCH (c)-[e:Exports]->(p:Product)
    MATCH (c2:Country)-[:Exports]->(p)

    WITH c, e, p, count(DISTINCT c2) AS exporters
    WHERE exporters > 1
    WITH c, e, p
    ORDER BY c.node_id, e.amount_usd DESC
    SKIP 0

    WITH c, collect({p: p, e: e}) AS export_list

    UNWIND list_slice(export_list, 0, 3) AS r
    RETURN c, r.e, r.p
    ORDER BY c.node_id, r.p.node_id
    """
).get_as_networkx()

vis.set_labels(weak_econ_g, LABEL_PROPS)
vis.plot(weak_econ_g, scale=1.25, seed=3)

Bottom 10 Weaker Economies

Dominating vs Weaker Economies
#

  • Do dominating economies compete in the same markets as weaker economies?
    • If so, maybe that’s why those weaker economies are being pushed to the bottom. ☑️
    • If not, maybe the products exported by those weaker economies are not the most competitive.

Here, we find that, given their low export diversity, weaker economies are being crushed by dominating economies that compete for the very same products. Their vulnerability seems to stem mostly from geographic isolation and limited territory, which lead to fewer competition opportunities, so any competitor becomes a risk to the economy.

Below, country node classes are rendered as a colored node border and label text. We assign two classes, for the top and bottom 10 economies, with top economies in the center, and products and bottom economies in the surrounding area. This forms a star layout, where each arm is a weaker economy or a small cluster of weaker economies.

We only look at the top 3 exported products of each weaker economy, but relaxing this limit to include more products reproduces the displayed behavior, with dominating economies still competing for the same products. This doesn't necessarily mean that dominating and weaker economies produce the same products, as some of them might simply be re-exported.

dom_vs_weak_econ_g = conn.execute(
    """
    MATCH (wea)-[we:Exports]->(p:Product)
    MATCH (dom)-[de:Exports]->(p)
    WHERE dom.node_id IN $dominating_node_ids
        AND wea.node_id IN $weaker_node_ids

    WITH wea, we, p, count(DISTINCT dom) AS dom_competitors
    WHERE dom_competitors > 0

    WITH wea, we, p
    ORDER BY wea.node_id, we.amount_usd DESC
    SKIP 0

    WITH wea, collect({p: p, e: we}) AS export_list
    UNWIND list_slice(export_list, 0, 3) AS r

    WITH wea, r.p.node_id AS prod_node_id
    MATCH (wea)-[we:Exports]
        ->(prod:Product { node_id: prod_node_id })
    MATCH (dom:Country)-[de:Exports]->(prod)
    WHERE dom.node_id IN $dominating_node_ids
    RETURN wea, we, prod, de, dom
    ORDER BY wea.node_id, prod.node_id, dom.node_id
    """,
    dict(
        dominating_node_ids=dom_econ_df.node_id.to_list(),
        weaker_node_ids=weak_econ_df.node_id.to_list(),
    ),
).get_as_networkx()

node_classes = dict(
    dominating=dom_econ_df.node_id.to_list(),
    weaker=weak_econ_df.node_id.to_list(),
)

# This adjusts the visualization edge weights
# to improve readability
for u, v, data in dom_vs_weak_econ_g.edges(data=True):
    if (
        dom_vs_weak_econ_g.nodes[u]["node_id"]
          in node_classes["dominating"]
        and dom_vs_weak_econ_g.nodes[v]["_label"]
          == "Product"
    ):
        data["vis_weight"] = 1e-5

    if (
        dom_vs_weak_econ_g.nodes[u]["node_id"]
          in node_classes["weaker"]
        and dom_vs_weak_econ_g.nodes[v]["_label"]
          == "Product"
    ):
        data["vis_weight"] = 1e-3

vis.set_labels(dom_vs_weak_econ_g, LABEL_PROPS)

vis.plot(
    dom_vs_weak_econ_g,
    node_classes=node_classes,
    scale=1.25,
    seed=5,
)

Dominating vs Weaker Economies

Competition Network
#

Let’s look at the competition network projection for Country nodes and CompetesWith edges. We first install the algo extension for Kuzu and create the compnet projection and NetworkX graph for it.

try:
    conn.execute(
        """
        INSTALL algo;
        LOAD algo;
        """
    )
except Exception as e:
    print(e)
try:
    conn.execute(
        """
        CALL drop_projected_graph("compnet")
        """
    )
except Exception as e:
    print(e)

conn.execute(
    """
    CALL project_graph(
        "compnet",
        {"Country": "n.country_name_short <> 'Undeclared'"},
        {"CompetesWith": "true"}
    )
    """
)
compnet_g = conn.execute(
    """
    MATCH (a:Country)-[cw:CompetesWith]->(b:Country)
    WHERE a.country_name_short <> "Undeclared"
        AND b.country_name_short <> "Undeclared"
    RETURN a, cw, b
    """,
).get_as_networkx()

Inspection Functions
#

The following functions will be useful to plot the cluster and analyze the top exports for a specific cluster ID property:

def plot_cluster(
    prop_name: str,
    prop_value: int,
    kind: Literal["graph", "map"] = "graph",
):
    match kind:
        case "graph":
            compnet_cluster_g = conn.execute(
                f"""
                MATCH (a:Country)-[cw:CompetesWith]->
                    (b:Country)
                WHERE a.country_name_short <> "Undeclared"
                    AND b.country_name_short <> "Undeclared"
                    AND a.`{prop_name}` = $prop_value
                    AND b.`{prop_name}` = $prop_value
                RETURN a, cw, b
                """,
                dict(prop_value=prop_value),
            ).get_as_networkx()

            vis.set_labels(compnet_cluster_g, LABEL_PROPS)
            vis.plot(compnet_cluster_g)

        case "map":
            compnet_cluster_df = conn.execute(
                f"""
                MATCH (c:Country)
                WHERE c.country_name_short <> "Undeclared"
                    AND c.`{prop_name}` = $prop_value
                RETURN
                    c.country_iso3_code AS iso3_code,
                    c.`{prop_name}` AS `{prop_name}`
                """,
                dict(prop_value=prop_value),
            ).get_as_df()

            vis.plot_map(
                compnet_cluster_df,
                code_col="iso3_code",
                class_col=prop_name,
            )


def trade_per_cluster(
    prop_name: str,
    prop_value: int,
    method: Literal["imports", "exports"],
    n: Optional[int] = None,
    debug: bool = False,
) -> pd.DataFrame:
    match method:
        case "exports":
            match_stmt = """
                MATCH (c:Country)-[ie:Exports]->(p:Product)
            """
        case "imports":
            match_stmt = """
                MATCH (c:Country)<-[ie:Imports]-(p:Product)
            """
        case _:
            raise ValueError(f"method not supported: {method}")

    if n is None:
        limit_stmt = ""
        limit_param = dict()
    else:
        limit_stmt = "LIMIT $n"
        limit_param = dict(n=n)

    query = f"""
        {match_stmt}
        WHERE c.country_name_short <> "Undeclared"
            AND c.`{prop_name}` = $prop_value
        RETURN
            p.product_name_short AS product,
            sum(ie.amount_usd) AS total_amount_usd
        ORDER BY total_amount_usd DESC
        {limit_stmt}
    """

    params = dict(prop_value=prop_value) | limit_param

    if debug:
        print_query(query, params)

    products_df = conn.execute(query, params).get_as_df()

    return products_df

Partner clusters are clusters that import what a cluster is exporting. These are likely to match all clusters due to high connectivity in the world economy, but it might not always be the case, depending on the clustering criteria.

def partner_clusters(
    prop_name: str,
    prop_value: int,
    include_self: bool = True,
    debug: bool = False,
) -> list[int]:
    include_self_stmt = (
        "" if include_self
        else f"AND c2.`{prop_name}` <> $prop_value"
    )

    query = f"""
        MATCH (c:Country)-[:Exports]->(p:Product)
        MATCH (c2:Country)<-[:Imports]-(p)
        WHERE c.country_name_short <> "Undeclared"
            AND c.`{prop_name}` = $prop_value
            AND c2.`{prop_name}` IS NOT NULL
            {include_self_stmt}
        RETURN DISTINCT c2.`{prop_name}` AS cid
    """

    params = dict(prop_value=prop_value)

    if debug:
        print_query(query, params)

    result = conn.execute(query, params)
    partner_cluster_ids = sorted(
      c[0] for c in result.get_all()
    )

    return partner_cluster_ids
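
The same logic can be expressed over plain tables, which may help when validating the query. In this sketch, the country codes, products, cluster IDs and the `partner_cluster_ids` helper are all hypothetical: we take the products exported by the cluster's members, find who imports them, and collect the importers' cluster IDs.

```python
import pandas as pd

# Hypothetical inputs: cluster membership, exports and imports
clusters = pd.DataFrame(
    {"country": ["AAA", "BBB", "CCC", "DDD"], "cluster": [0, 0, 1, 2]}
)
exports = pd.DataFrame({"country": ["AAA", "BBB"], "product": ["oil", "cars"]})
imports = pd.DataFrame(
    {"country": ["CCC", "AAA", "DDD"], "product": ["oil", "cars", "wheat"]}
)


def partner_cluster_ids(cluster_id: int, include_self: bool = True) -> list[int]:
    # Products exported by any member of the cluster
    members = clusters.loc[clusters.cluster == cluster_id, "country"]
    exported = exports[exports.country.isin(members)][["product"]].drop_duplicates()

    # Countries importing any of those products, mapped to their clusters
    importers = imports.merge(exported, on="product")
    ids = clusters[clusters.country.isin(importers.country)].cluster

    if not include_self:
        ids = ids[ids != cluster_id]
    return sorted(ids.unique().tolist())


print(partner_cluster_ids(0))                      # [0, 1]
print(partner_cluster_ids(0, include_self=False))  # [1]
```

Cluster 2 is not a partner, since DDD only imports wheat, which cluster 0 does not export.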

The following functions will help us compute the intra-cluster and inter-cluster trade alignments, i.e., self-sufficiency and external competitiveness, based on cluster-aggregated market share.

def trade_alignment_by_cluster(
    prop_name: str,
    prop_value: int,
    method: Literal["intra", "inter"],
) -> pd.DataFrame:
    exports_df = trade_per_cluster(
        prop_name,
        prop_value,
        method="exports",
    )

    match method:
        case "intra":
            imports_df = trade_per_cluster(
                prop_name,
                prop_value,
                method="imports",
            )

        case "inter":
            imports_df = []

            for partner_cid in partner_clusters(
                prop_name,
                prop_value,
            ):
                partner_imports_df = trade_per_cluster(
                    prop_name,
                    partner_cid,
                    method="imports",
                )
                imports_df.append(partner_imports_df)

            imports_df = (
                pd.concat(imports_df)
                .groupby(["product"])
                .sum()
            )
        case _:
            raise ValueError(
              f"method not supported: {method}"
            )

    trade_df = exports_df.merge(
        imports_df,
        on="product",
        how="right" if method == "intra" else "left",
        suffixes=("_exports", "_imports"),
    ).fillna(0)

    trade_df["sdr"] = (
        trade_df.total_amount_usd_exports
        / trade_df.total_amount_usd_imports
    )

    trade_df = trade_df.sort_values("sdr", ascending=False)

    return trade_df

As a score for measuring either self-sufficiency or external competitiveness, we use the weighted average of the log Supply-Demand Ratio (SDR), where weights are the total export amounts (USD) of each product for a given cluster, passed through a sigmoid.

def global_sdr_score(
    trade_df: pd.DataFrame,
    eps=1e-9,
) -> float:
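    # Drop products with no matching demand (infinite SDR); copy to avoid
    # pandas' SettingWithCopyWarning when adding the log_sdr column
    df = trade_df[~np.isinf(trade_df.sdr)].copy()

    df["log_sdr"] = np.log(np.clip(df.sdr, eps, None))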

    weights = df.total_amount_usd_exports
    score = expit(
      (weights * df.log_sdr).sum() / weights.sum()
    )

    return score.item()
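
Written out, the quantity computed by `global_sdr_score` is the following, where $X_p$ and $M_p$ are the cluster's export and (intra- or inter-cluster) import totals in USD for product $p$, $\varepsilon$ is the clipping floor (`eps`), $\sigma$ is the logistic sigmoid (`expit`), and products with an infinite SDR (exports but no matching imports) are excluded from both sums:

$$
\mathrm{SDR}_p = \frac{X_p}{M_p},
\qquad
\mathrm{score} = \sigma\!\left(\frac{\sum_p X_p \,\log \max(\mathrm{SDR}_p,\ \varepsilon)}{\sum_p X_p}\right),
\qquad
\sigma(z) = \frac{1}{1 + e^{-z}}
$$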

Competing Communities
#

  • Are there any communities representing closely tied competitor clusters?
    • If so, maybe there are specific products per cluster? ☑️
    • If not, we have a global economy that is fairly homogeneous and diverse.

For each property computed with the algo extension, we’ll alter the corresponding node table, recreating the property each time.

conn.execute(
    """
    ALTER TABLE Country DROP IF EXISTS louvain_id;
    ALTER TABLE Country ADD IF NOT EXISTS louvain_id INT64;

    CALL louvain("compnet")
    WITH node, louvain_id
    SET node.louvain_id = louvain_id;
    """
)

The Louvain method partitions the network by greedily optimizing modularity, which essentially means it searches for a partition into communities with high modularity, a community being a dense subgraph, i.e., a subgraph where connections among members are more frequent than connections to outside nodes.
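
Kuzu's `louvain` runs inside the database. To get a feel for what the method returns, here is a standalone sketch using NetworkX's own Louvain implementation (`nx.community.louvain_communities`, available in recent NetworkX releases) on a toy graph of two dense cliques joined by a single edge. This is a swapped-in library for illustration, not the Kuzu algorithm, and the node labels are made up.

```python
import networkx as nx

# Two 4-node cliques of hypothetical countries, bridged by one edge
g = nx.Graph()
g.add_edges_from(nx.complete_graph(["A", "B", "C", "D"]).edges)
g.add_edges_from(nx.complete_graph(["E", "F", "G", "H"]).edges)
g.add_edge("D", "E")

# A fixed seed makes the (randomized) node visiting order reproducible
communities = nx.community.louvain_communities(g, seed=42)
print(sorted(sorted(c) for c in communities))

# Modularity of the partition that was found
print(round(nx.community.modularity(g, communities), 3))
```

The bridge edge is the only connection between the two dense groups, so splitting there yields much higher modularity than keeping everything in one community (which has modularity 0).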

compnet_louvain_df = conn.execute(
    """
    MATCH (c:Country)
    WHERE c.country_name_short <> "Undeclared"
    RETURN
        c.node_id AS node_id,
        c.country_name_short AS label,
        c.louvain_id AS louvain_id
    """
).get_as_df()

node_classes = {
    k: g.node_id.to_list()
    for k, g in compnet_louvain_df.groupby("louvain_id")
}

vis.set_labels(compnet_g, LABEL_PROPS)

vis.plot(
    compnet_g,
    node_classes=node_classes,
    hide_edges=True,
)

Competition Communities

In complex networks, it is not uncommon for a huge community to emerge, along with a few moderately large communities, and then many smaller communities. This behavior is not particularly pronounced here, but it's still visible. Below, we inspect the community size distribution.

comm_sizes_df = (
    compnet_louvain_df[["louvain_id", "node_id"]]
    .groupby("louvain_id")
    .count()
    .rename(columns=dict(node_id="num_nodes"))
)

comm_sizes_df = comm_sizes_df.reindex(
    comm_sizes_df.num_nodes.sort_values(ascending=False).index
)

comm_sizes_df

louvain_id  num_nodes
5           67
6           33
4           30
8           26
7           19
1           14
2           14
0           13
3           8
9           7
10          3

fig, ax = plt.subplots(figsize=(18, 3))
comm_sizes_df.plot.bar(xlabel="Community ID", rot=0, ax=ax)
plt.legend(["No. Nodes"])
plt.show()

Community Size Distribution

Let’s also take a look at the members of each community, from largest to smallest.

for louvain_id in comm_sizes_df.index:
    display(f"LOUVAIN ID: {louvain_id}")
    display(
        compnet_louvain_df[
            compnet_louvain_df.louvain_id == louvain_id
        ]
        .drop(columns="louvain_id")
        .sort_values("label")
    )
'LOUVAIN ID: 5'

     node_id  label
115  116      Albania
205  207      Andorra
114  115      Anguilla
173  174      Austria
126  127      Belarus
...  ...      ...
49   50       Tunisia
232  234      Turkiye
123  124      Turks and Caicos Islands
131  132      United Kingdom
204  206      United States of America

67 rows × 2 columns

'LOUVAIN ID: 6'

     node_id  label
209  211      Algeria
171  172      Angola
51   52       Aruba
11   12       Azerbaijan
177  178      Cameroon
54   55       Canada
154  155      Chad
55   56       Colombia
15   16       Democratic Republic of the Congo
220  222      Ecuador
57   58       Egypt
158  159      Equatorial Guinea
100  101      Fiji
75   76       Gabon
223  225      Greenland
211  213      Guyana
17   18       Iran
89   90       Iraq
184  185      Kazakhstan
152  153      Kuwait
135  136      Libya
138  139      Nigeria
139  140      Norway
26   27       Oman
219  221      Republic of the Congo
63   64       Russia
229  231      Sao Tome and Principe
64   65       Saudi Arabia
28   29       South Sudan
7    8        Timor-Leste
31   32       Trinidad and Tobago
112  113      Venezuela
67   68       Yemen

'LOUVAIN ID: 4'

     node_id  label
170  171      Afghanistan
206  208      Australia
12   13       Benin
24   25       Bhutan
207  209      Bolivia
52   53       Burundi
116  117      Central African Republic
101  102      Guinea
163  164      Kyrgyzstan
185  186      Liberia
77   78       Mali
137  138      Mauritania
4    5        Mozambique
165  166      Niger
5    6        Papua New Guinea
78   79       Rwanda
141  142      Senegal
202  204      Sierra Leone
142  143      Solomon Islands
109  110      Somalia
187  188      Sudan
230  232      Suriname
29   30       Syria
124  125      Tajikistan
111  112      Tanzania
110  111      Togo
167  168      Turkmenistan
8    9        US Minor Outlying Islands
99   100      Western Sahara
216  218      Zambia

'LOUVAIN ID: 8'

     node_id  label
96   97       Antarctica
125  126      Bahrain
176  177      Botswana
97   98       China
14   15       Cocos (Keeling) Islands
210  212      Guam
160  161      Heard and McDonald Islands
87   88       Hong Kong
162  163      Israel
134  135      Japan
102  103      Lesotho
195  197      Macao
46   47       Malaysia
62   63       Malta
164  165      Namibia
120  121      Northern Mariana Islands
215  217      Philippines
47   48       Pitcairn
233  235      Samoa
48   49       Singapore
65   66       South Georgia and South Sandwich Islds.
76   77       South Korea
113  114      Taiwan
79   80       Vatican City
95   96       Vietnam
169  170      Wallis and Futuna

'LOUVAIN ID: 7'

     node_id  label
10   11       Argentina
127  128      Belize
217  219      Brazil
34   35       Burkina Faso
156  157      Côte d'Ivoire
94   95       Eswatini
129  130      Ethiopia
38   39       Ghana
86   87       Guatemala
117  118      Honduras
151  152      Kenya
91   92       Malawi
108  109      New Zealand
197  199      Nicaragua
93   94       Paraguay
50   51       Uganda
168  169      Uruguay
146  147      Uzbekistan
68   69       Zimbabwe

'LOUVAIN ID: 1'

     node_id  label
69   70       Bangladesh
16   17       Cabo Verde
143  144      El Salvador
1    2        Falkland Islands
39   40       Haiti
59   60       Kiribati
43   44       Maldives
19   20       Mauritius
74   75       Micronesia
107  108      Nauru
122  123      Seychelles
136  137      Sri Lanka
66   67       Tuvalu
191  192      Vanuatu

'LOUVAIN ID: 2'

     node_id  label
172  173      Armenia
218  220      Chile
181  182      Eritrea
2    3        Georgia
150  151      Jordan
41   42       Lebanon
42   43       Moldova
25   26       Mongolia
225  227      North Macedonia
140  141      Panama
92   93       Peru
32   33       South Africa
190  191      Ukraine
33   34       United Arab Emirates

'LOUVAIN ID: 0'

     node_id  label
0    1        American Samoa
80   81       Antigua and Barbuda
13   14       Barbados
82   83       Curaçao
179  180      Cyprus
159  160      Greece
85   86       Grenada
58   59       Jamaica
119  120      Marshall Islands
214  216      Niue
60   61       Saint Lucia
9    10       Saint Vincent and the Grenadines
35   36       The Bahamas

'LOUVAIN ID: 3'

     node_id  label
212  214      Cambodia
56   57       Comoros
3    4        Laos
90   91       Madagascar
226  228      Montenegro
44   45       Myanmar
227  229      Pakistan
186  187      Palau

'LOUVAIN ID: 9'

     node_id  label
132  133      British Indian Ocean Territory
36   37       Cook Islands
182  183      Faroe Islands
155  156      French Southern and Antarctic Lands
222  224      Guinea-Bissau
161  162      Iceland
201  203      Saint Helena, Ascension and Tristan da Cunha

'LOUVAIN ID: 10'

     node_id  label
192  194      Costa Rica
83   84       Dominica
157  158      Dominican Republic

largest_louvain_id = comm_sizes_df.index[0].item()
largest_louvain_id
5
smallest_louvain_id = comm_sizes_df.index[-1].item()
smallest_louvain_id
10
Community Subgraphs
#

Community subgraphs illustrate clusters where competition is more prevalent among members than with countries outside the community. For this graph (our Econ CompNet, or compnet), they are almost always (if not always) complete subgraphs. We can plot any cluster by its ID.
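
One way to check whether communities are (nearly) complete is to compute the density of each community's induced subgraph, which reaches 1.0 only for a complete subgraph. A minimal sketch on a toy directed graph, where `community_densities`, the node names and the membership mapping are hypothetical (in the notebook, membership could be derived from `compnet_louvain_df` and mapped onto the corresponding graph nodes):

```python
import networkx as nx


def community_densities(g: nx.DiGraph, membership: dict) -> dict:
    """Density of the subgraph induced by each community.

    `membership` maps node -> community ID. For a directed graph, density
    is m / (n * (n - 1)), so a community with edges in both directions
    between every pair of members has density 1.0.
    """
    groups: dict = {}
    for node, cid in membership.items():
        groups.setdefault(cid, []).append(node)

    return {
        cid: nx.density(g.subgraph(nodes))
        for cid, nodes in groups.items()
    }


# Toy digraph: community 0 is complete, community 1 lacks the F -> D edge
g = nx.DiGraph()
g.add_edges_from((u, v) for u in "ABC" for v in "ABC" if u != v)
g.add_edges_from(
    [("D", "E"), ("E", "D"), ("E", "F"), ("F", "E"), ("D", "F")]
)
membership = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}

print(community_densities(g, membership))
```

If CompetesWith were stored only once per country pair rather than in both directions, a complete community would only reach 0.5 as a directed graph, so converting with `g.to_undirected()` first would be the fairer check.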

plot_cluster("louvain_id", largest_louvain_id)

Largest Community Subgraph

Community Mapping
#

Network visualization is not always the best approach to understanding your data, and this is a good example. Since we're working with a complete (or nearly complete) subgraph, the relationships carry little visual information, whereas a map of the community is a lot more informative, as we can see below.

plot_cluster("louvain_id", largest_louvain_id, kind="map")

Largest Community Map

Top Exported Products
#
  • Is there any export overlap between large and small communities?
largest_comm_top_exported = top_frac(
    trade_per_cluster(
        "louvain_id",
        largest_louvain_id,
        method="exports"
    ),
    "total_amount_usd",
)
largest_comm_top_exported

    product                                       total_amount_usd
0   Commodities not specified, according to kind  1.506978e+12
1   Oils petroleum, bituminous, distillates       1.382886e+12
2   Medicaments, doses, nes                       1.110865e+12
3   Blood                                         7.572479e+11
4   Petroleum oils, crude                         5.977950e+11
5   Automobiles nes, gas turbine powered          5.668137e+11
6   Gold in unwrought forms                       5.482768e+11
7   Automobiles, spark ignition, 1500-3000cc      5.231797e+11
8   Transmit-receive apparatus for radio, TV      4.899130e+11
9   Monolithic integrated circuits, digital       4.366876e+11
10  Trade data discrepancies                      3.521978e+11
11  Parts of data processing equipment            3.358087e+11
12  Automobiles, spark ignition, 1000-1500cc      2.657018e+11
13  Fixed wing aircraft, >15,000kg                2.641497e+11
14  Motor vehicle parts nes                       2.565987e+11
15  Vaccines, human                               2.412159e+11
16  Natural gas, liquefied                        2.406005e+11
17  Gold, semi-manufactured forms                 2.394518e+11

smallest_comm_top_exported = top_frac(
    trade_per_cluster(
        "louvain_id",
        smallest_louvain_id,
        method="exports",
    ),
    "total_amount_usd",
)
smallest_comm_top_exported

    product                                       total_amount_usd
0   Instruments for medical science, nes          1.032487e+10
1   Medical needles, catheters                    8.305035e+09
2   Trade data discrepancies                      7.981437e+09

jaccard_sim(
    largest_comm_top_exported["product"],
    smallest_comm_top_exported["product"]
)
0.05
Top Imported Products
#
  • Is there any import overlap between large and small communities?
largest_comm_top_imported = top_frac(
    trade_per_cluster(
        "louvain_id",
        largest_louvain_id,
        method="imports",
    ),
    "total_amount_usd",
)
largest_comm_top_imported

    product                                       total_amount_usd
0   Petroleum oils, crude                         1.933629e+12
1   Commodities not specified, according to kind  1.411868e+12
2   Oils petroleum, bituminous, distillates       1.197450e+12
3   Transmit-receive apparatus for radio, TV      9.829336e+11
4   Medicaments, doses, nes                       8.788548e+11
5   Trade data discrepancies                      7.574121e+11
6   Gold in unwrought forms                       7.455694e+11
7   Blood                                         6.453376e+11
8   Automobiles nes, gas turbine powered          5.992672e+11
9   Natural gas, as gas                           5.284708e+11
10  Parts of data processing equipment            5.119915e+11
11  Automobiles, spark ignition, 1500-3000cc      4.830173e+11
12  Monolithic integrated circuits, digital       4.750667e+11

smallest_comm_top_imported = top_frac(
    trade_per_cluster(
        "louvain_id",
        smallest_louvain_id,
        method="imports",
    ),
    "total_amount_usd",
)
smallest_comm_top_imported

    product                                       total_amount_usd
0   Oils petroleum, bituminous, distillates       1.373957e+10
1   Commodities not specified, according to kind  7.539036e+09
2   Transmit-receive apparatus for radio, TV      3.136932e+09
3   Automobiles, spark ignition, 1500-3000cc      2.425662e+09
4   Jewellery of precious metal                   2.326848e+09
5   Instruments for medical science, nes          2.297871e+09
6   Monolithic integrated circuits, digital       2.243517e+09
7   Maize except seed corn                        2.138190e+09
8   Natural gas, liquefied                        2.103650e+09
9   Petroleum oils, crude                         2.089410e+09
10  Propane, liquefied                            2.061530e+09

jaccard_sim(
    largest_comm_top_imported["product"],
    smallest_comm_top_imported["product"],
)
0.3333333333333333
Trade Alignment
#

Trade alignment can measure either a cluster’s self-sufficiency, by looking at trade among its own member countries (intra-cluster), or its external competitiveness, by looking at trade between its members and countries in other clusters (inter-cluster). Both dimensions are based on the supply/demand ratio (SDR): specifically, the average log-SDR weighted by total exports/imports (USD), computed globally per cluster.

This score is mapped to the 0..1 range with a sigmoid, so 0.5 corresponds to a log-SDR of zero (supply matching demand), and anything above 0.5 means supply exceeds demand. The log-transformation makes the ratio symmetric around zero (a 2:1 surplus and a 1:2 deficit become +log 2 and -log 2), which keeps the distribution from being skewed.
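To make the scoring concrete, here is a minimal sketch of a global log-SDR score. It is a hypothetical re-implementation, not the `global_sdr_score` helper used in this analysis: it assumes the input is a list of (supply, demand) USD pairs, for instance one pair per product, and it assumes each pair is weighted by its total traded amount (supply + demand).

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function: maps any real number into (0, 1), with sigmoid(0) == 0.5."""
    return 1.0 / (1.0 + math.exp(-x))


def global_sdr_score_sketch(flows):
    """Hypothetical global log-SDR score for one cluster.

    `flows` is a list of (supply_usd, demand_usd) pairs, e.g. one pair per
    product (an assumption). Each pair contributes log(supply / demand),
    weighted by its total traded amount (supply + demand, also an
    assumption); the weighted mean is squashed into (0, 1) with a sigmoid.
    """
    weighted_sum, total_weight = 0.0, 0.0
    for supply, demand in flows:
        if supply <= 0 or demand <= 0:
            continue  # log-SDR is undefined when either side is zero
        weight = supply + demand
        weighted_sum += weight * math.log(supply / demand)
        total_weight += weight
    if total_weight == 0:
        return float("nan")  # no usable flows
    return sigmoid(weighted_sum / total_weight)


print(global_sdr_score_sketch([(100.0, 100.0)]))  # 0.5 (balanced)
print(global_sdr_score_sketch([(200.0, 100.0), (50.0, 50.0)]))  # above 0.5 (surplus)
```

A balanced cluster scores exactly 0.5, while a 2:1 surplus and a 1:2 deficit land symmetrically at 2/3 and 1/3, which is the symmetry the log-transformation provides.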

Self-Sufficiency
#

Most communities are self-sufficient or nearly self-sufficient, with only community 5 (score 0.49) falling just below the 0.5 threshold.

comm_self_sufficiency_df = pd.DataFrame(
    dict(
        louvain_id=louvain_id,
        score=global_sdr_score(
            trade_alignment_by_cluster(
                "louvain_id",
                louvain_id,
                method="intra",
            )
        ),
    )
    for louvain_id in comm_sizes_df.index
).sort_values("score", ascending=False)

comm_self_sufficiency_df

    louvain_id     score
9            9  0.985065
2            4  0.970089
5            1  0.959150
8            3  0.939180
10          10  0.895896
1            6  0.869495
4            7  0.860508
6            2  0.742791
3            8  0.644520
7            0  0.564580
0            5  0.493976
colors = comm_self_sufficiency_df.score.apply(
    lambda s: MPL_PALETTE[0] if s >= 0.5 else MPL_PALETTE[1]
)

fig, ax = plt.subplots(figsize=(18, 3))

comm_self_sufficiency_df.plot.bar(
    x="louvain_id",
    y="score",
    xlabel="Community ID",
    color=colors,
    rot=0,
    ax=ax,
)

plt.axhline(
    y=0.5,
    color=MPL_PALETTE[1],
    linestyle="--",
    linewidth=2,
)
plt.legend([
    "Self-Sufficiency Threshold",
    "Global Log-SDR Score"
])
plt.show()

Community Self-Sufficiency

compnet_louvain_df[compnet_louvain_df.louvain_id == 5]

     node_id  label                   louvain_id
6    7        Qatar                   5
18   19       Lithuania               5
20   21       Portugal                5
21   22       Palestine               5
22   23       British Virgin Islands  5
...  ...      ...                     ...
221  223      Spain                   5
224  226      India                   5
228  230      Romania                 5
231  233      Slovenia                5
232  234      Turkiye                 5

67 rows × 3 columns

External Competitiveness
#

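Most communities are not particularly competitive externally, and none reaches the 0.5 threshold (the highest, community 5, scores 0.36). This was to be expected given the clustering criterion: communities are dense subgraphs of the competition network, which also points to stronger competition inside each community than with the outside.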

comm_external_comp_df = pd.DataFrame(
    dict(
        louvain_id=louvain_id,
        score=global_sdr_score(
            trade_alignment_by_cluster(
                "louvain_id",
                louvain_id,
                method="inter",
            )
        ),
    )
    for louvain_id in comm_sizes_df.index
).sort_values("score", ascending=False)

comm_external_comp_df

    louvain_id     score
0            5  0.360300
3            8  0.304639
1            6  0.200342
2            4  0.121624
4            7  0.089482
6            2  0.072207
5            1  0.055127
8            3  0.033868
9            9  0.018567
10          10  0.012569
7            0  0.010781
colors = comm_external_comp_df.score.apply(
    lambda s: MPL_PALETTE[0] if s >= 0.5 else MPL_PALETTE[1]
)

fig, ax = plt.subplots(figsize=(18, 3))

comm_external_comp_df.plot.bar(
    x="louvain_id",
    y="score",
    xlabel="Community ID",
    color=colors,
    rot=0,
    ax=ax,
)

plt.axhline(
    y=0.5,
    color=MPL_PALETTE[1],
    linestyle="--",
    linewidth=2,
)
plt.legend([
    "External Competitiveness Threshold",
    "Global SDR Score"
])
plt.show()

Community External Competitiveness

compnet_louvain_df[compnet_louvain_df.louvain_id == 8]

     node_id  label                                    louvain_id
14   15       Cocos (Keeling) Islands                  8
46   47       Malaysia                                 8
47   48       Pitcairn                                 8
48   49       Singapore                                8
62   63       Malta                                    8
65   66       South Georgia and South Sandwich Islds.  8
76   77       South Korea                              8
79   80       Vatican City                             8
87   88       Hong Kong                                8
95   96       Vietnam                                  8
96   97       Antarctica                               8
97   98       China                                    8
102  103      Lesotho                                  8
113  114      Taiwan                                   8
120  121      Northern Mariana Islands                 8
125  126      Bahrain                                  8
134  135      Japan                                    8
160  161      Heard and McDonald Islands               8
162  163      Israel                                   8
164  165      Namibia                                  8
169  170      Wallis and Futuna                        8
176  177      Botswana                                 8
195  197      Macao                                    8
210  212      Guam                                     8
215  217      Philippines                              8
233  235      Samoa                                    8

Weakly Connected Competitors
#

Strongly connected components in our graph would have captured mutual competition among peers: cyclical or balanced rivalries, or equivalent strategic positions. However, once we removed the “Undeclared” pseudo-country, every strongly connected component we found was a singleton.

Instead, we compute weakly connected components, which ignore edge direction (set by export amount in our graph) and capture isolated groups of countries that compete among themselves but not with the rest of the network.
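The difference between strong and weak components is easy to see on a toy graph. The following sketch, in plain Python with made-up node names, is not how Kuzu computes components: it finds strongly connected components naively through mutual reachability, and weakly connected components with union-find over edges treated as undirected. On a graph with no directed cycles, which is what singleton-only strong components imply for the competition graph, every strong component is a single node, while weak components still group nodes together.

```python
from collections import defaultdict


def reachable(adj, start):
    """Return every node reachable from `start` by following directed edges."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def strongly_connected(nodes, edges):
    """Naive SCCs: u and v share a component iff each can reach the other.

    Quadratic, so only suitable for toy graphs; every edge endpoint must be in `nodes`.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    reach = {n: reachable(adj, n) for n in nodes}
    components, assigned = [], set()
    for n in nodes:
        if n in assigned:
            continue
        comp = {m for m in reach[n] if n in reach[m]}
        assigned |= comp
        components.append(comp)
    return components


def weakly_connected(nodes, edges):
    """WCCs via union-find, treating every directed edge as undirected."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)  # merge the two endpoints' sets
    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    return list(groups.values())


nodes = ["A", "B", "C", "D", "E", "F"]
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "E")]  # no directed cycles
print(strongly_connected(nodes, edges))  # six singleton sets
print(weakly_connected(nodes, edges))    # groups: A-B-C, D-E, F
```

Adding a single back edge such as ("C", "A") would close a cycle and merge A, B and C into one strong component.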

conn.execute(
    """
    ALTER TABLE Country DROP IF EXISTS wcc_id;
    ALTER TABLE Country ADD IF NOT EXISTS wcc_id INT64;

    CALL weakly_connected_components("compnet")
    WITH node, group_id
    SET node.wcc_id = group_id;
    """
)
compnet_wcc_df = conn.execute(
    """
    MATCH (c:Country)
    WHERE c.country_name_short <> "Undeclared"
    RETURN
        c.node_id AS node_id,
        c.country_name_short AS label,
        c.wcc_id AS wcc_id
    """
).get_as_df()

node_classes = {
  k: g.node_id.to_list()
  for k, g in compnet_wcc_df.groupby("wcc_id")
}

vis.set_labels(compnet_g, LABEL_PROPS)

vis.plot(
    compnet_g,
    node_classes=node_classes,
    hide_edges=True,
)

Weak Components

As we can see, there are multiple weakly connected components, but most of them are singletons, i.e., countries that form a component on their own. Other than that, there is one large component of 64 countries and two smaller components with over 20 countries each (28 and 24), which we inspect below.

wcc_sizes_df = (
    compnet_wcc_df[["wcc_id", "node_id"]]
    .groupby("wcc_id")
    .count()
    .rename(columns=dict(node_id="num_nodes"))
)

wcc_sizes_df = wcc_sizes_df.reindex(
    wcc_sizes_df.num_nodes.sort_values(ascending=False).index
)

wcc_sizes_df

        num_nodes
wcc_id
0              64
1              28
4              24
5              11
2               9
...           ...
209             1
215             1
226             1
228             1
230             1

68 rows × 1 columns

wcc_sizes_ord_df = wcc_sizes_df.reset_index(drop=True)

wcc_singleton_threshold = (
    wcc_sizes_ord_df[wcc_sizes_ord_df.num_nodes <= 1]
    .index[0]
    .item()
)

fig, ax = plt.subplots(figsize=(30, 5))

wcc_sizes_df.plot.bar(rot=0, ax=ax)

plt.axvline(
    x=wcc_singleton_threshold,
    color=MPL_PALETTE[1],
    linestyle="--",
    linewidth=2,
)

plt.legend(["Singleton Threshold", "No. Nodes"])

plt.show()

Weak Components Size Distribution

Let’s take a look at the members of each non-singleton weak component, from largest to smallest.

for wcc_id in wcc_sizes_df[wcc_sizes_df.num_nodes > 1].index:
    display(f"WCC ID: {wcc_id}")
    display(
        compnet_wcc_df[compnet_wcc_df.wcc_id == wcc_id]
        .drop(columns="wcc_id")
    )
'WCC ID: 0'

     node_id  label
0    1        American Samoa
6    7        Qatar
11   12       Azerbaijan
14   15       Cocos (Keeling) Islands
18   19       Lithuania
...  ...      ...
211  213      Guyana
215  217      Philippines
221  223      Spain
224  226      India
231  233      Slovenia

64 rows × 2 columns

'WCC ID: 1'

     node_id  label
1    2        Falkland Islands
16   17       Cabo Verde
19   20       Mauritius
26   27       Oman
39   40       Haiti
43   44       Maldives
47   48       Pitcairn
55   56       Colombia
59   60       Kiribati
66   67       Tuvalu
67   68       Yemen
69   70       Bangladesh
74   75       Micronesia
79   80       Vatican City
102  103      Lesotho
122  123      Seychelles
125  126      Bahrain
132  133      British Indian Ocean Territory
155  156      French Southern and Antarctic Lands
161  162      Iceland
162  163      Israel
176  177      Botswana
177  178      Cameroon
182  183      Faroe Islands
191  192      Vanuatu
201  203      Saint Helena, Ascension and Tristan da Cunha
220  222      Ecuador
223  225      Greenland
'WCC ID: 4'

     node_id  label
4    5        Mozambique
12   13       Benin
38   39       Ghana
50   51       Uganda
52   53       Burundi
73   74       Finland
75   76       Gabon
77   78       Mali
78   79       Rwanda
99   100      Western Sahara
101  102      Guinea
111  112      Tanzania
116  117      Central African Republic
121  122      Sweden
128  129      Switzerland
131  132      United Kingdom
163  164      Kyrgyzstan
165  166      Niger
170  171      Afghanistan
184  185      Kazakhstan
193  195      Ireland
205  207      Andorra
219  221      Republic of the Congo
230  232      Suriname
'WCC ID: 5'

     node_id  label
5    6        Papua New Guinea
8    9        US Minor Outlying Islands
58   59       Jamaica
60   61       Saint Lucia
83   84       Dominica
85   86       Grenada
120  121      Northern Mariana Islands
145  146      Tonga
157  158      Dominican Republic
192  194      Costa Rica
210  212      Guam
'WCC ID: 2'

     node_id  label
2    3        Georgia
3    4        Laos
27   28       North Korea
33   34       United Arab Emirates
44   45       Myanmar
172  173      Armenia
174  175      Belgium
212  214      Cambodia
226  228      Montenegro
'WCC ID: 10'

     node_id  label
10   11       Argentina
86   87       Guatemala
93   94       Paraguay
117  118      Honduras
168  169      Uruguay
197  199      Nicaragua
217  219      Brazil
'WCC ID: 9'

     node_id  label
9    10       Saint Vincent and the Grenadines
56   57       Comoros
90   91       Madagascar
119  120      Marshall Islands
186  187      Palau
'WCC ID: 15'

     node_id  label
15   16       Democratic Republic of the Congo
32   33       South Africa
92   93       Peru
181  182      Eritrea
218  220      Chile
'WCC ID: 42'

     node_id  label
42   43       Moldova
190  191      Ukraine
203  205      Serbia
228  230      Romania
'WCC ID: 22'

     node_id  label
22   23       British Virgin Islands
36   37       Cook Islands
81   82       Bermuda
178  179      Cayman Islands
'WCC ID: 17'

     node_id  label
17   18       Iran
31   32       Trinidad and Tobago
158  159      Equatorial Guinea
'WCC ID: 137'

     node_id  label
137  138      Mauritania
185  186      Liberia
206  208      Australia
'WCC ID: 40'

     node_id  label
40   41       Indonesia
232  234      Turkiye
'WCC ID: 24'

     node_id  label
24   25       Bhutan
216  218      Zambia
'WCC ID: 94'

     node_id  label
94   95       Eswatini
127  128      Belize
'WCC ID: 49'

     node_id  label
49   50       Tunisia
61   62       Morocco
'WCC ID: 110'

     node_id  label
110  111      Togo
141  142      Senegal
'WCC ID: 109'

     node_id  label
109  110      Somalia
187  188      Sudan
'WCC ID: 136'

     node_id  label
136  137      Sri Lanka
143  144      El Salvador
'WCC ID: 21'

     node_id  label
21   22       Palestine
199  201      Poland
'WCC ID: 167'

     node_id  label
167  168      Turkmenistan
207  209      Bolivia
'WCC ID: 160'

     node_id  label
160  161      Heard and McDonald Islands
166  167      Saint Pierre and Miquelon
'WCC ID: 223'

     node_id  label
222  224      Guinea-Bissau
233  235      Samoa
largest_wcc_id = wcc_sizes_df.index[0].item()
largest_wcc_id
0
smallest_wcc_id = wcc_sizes_df.index[-1].item()
smallest_wcc_id
230
Component Subgraphs
#
plot_cluster("wcc_id", largest_wcc_id)

Largest Component Subgraph

Component Mapping
#
plot_cluster("wcc_id", largest_wcc_id, kind="map")

Largest Component Map

Top Exported Products
#
  • Is there any export overlap between large and small components?
largest_wcc_top_exported = top_frac(
    trade_per_cluster("wcc_id", largest_wcc_id, "exports"),
    "total_amount_usd",
)
largest_wcc_top_exported

   product                                       total_amount_usd
0  Monolithic integrated circuits, digital       2.880361e+12
1  Petroleum oils, crude                         2.815121e+12
2  Oils petroleum, bituminous, distillates       2.185186e+12
3  Commodities not specified, according to kind  1.859842e+12
4  Transmit-receive apparatus for radio, TV      1.629116e+12
5  Trade data discrepancies                      9.872075e+11
6  Parts of data processing equipment            7.646240e+11
7  Medicaments, doses, nes                       7.542306e+11
smallest_wcc_top_exported = top_frac(
    trade_per_cluster("wcc_id", smallest_wcc_id, "exports"),
    "total_amount_usd",
)
smallest_wcc_top_exported

   product                total_amount_usd
0  Petroleum oils, crude  24676511.0
jaccard_sim(
    largest_wcc_top_exported["product"],
    smallest_wcc_top_exported["product"]
)
0.125
Top Imported Products
#
  • Is there any import overlap between large and small components?
largest_wcc_top_imported = top_frac(
    trade_per_cluster("wcc_id", largest_wcc_id, "imports"),
    "total_amount_usd",
)
largest_wcc_top_imported

   product                                       total_amount_usd
0  Petroleum oils, crude                         3.075323e+12
1  Monolithic integrated circuits, digital       2.780868e+12
2  Commodities not specified, according to kind  1.510984e+12
3  Oils petroleum, bituminous, distillates       1.420618e+12
4  Transmit-receive apparatus for radio, TV      1.332514e+12
5  Trade data discrepancies                      1.248542e+12
6  Medicaments, doses, nes                       8.050835e+11
7  Parts of data processing equipment            6.992992e+11
8  Automobiles, spark ignition, 1500-3000cc      6.571986e+11
smallest_wcc_top_imported = top_frac(
    trade_per_cluster("wcc_id", smallest_wcc_id, "imports"),
    "total_amount_usd",
)
smallest_wcc_top_imported

   product                                       total_amount_usd
0  Oils petroleum, bituminous, distillates       88922018.0
1  Cargo vessels, not tanker or refrigerated     23292230.0
2  Commodities not specified, according to kind  21288342.0
3  Rice, semi- or wholly-milled                  15654678.0
jaccard_sim(
    largest_wcc_top_imported["product"],
    smallest_wcc_top_imported["product"]
)
0.18181818181818182
Trade Alignment
#

As with communities, trade alignment measures each component’s self-sufficiency (trade among its own members) and external competitiveness (trade between its members and countries in other components), using the sigmoid-scaled average log-SDR weighted by total exports/imports (USD). Scores above 0.5 indicate supply exceeding demand.

Self-Sufficiency
#

Most components are self-sufficient or nearly self-sufficient. Only three of them, components 209, 22 and 196, fall below the 0.5 threshold, with 22 and 196 scoring around 0.3.

wcc_self_sufficiency_df = pd.DataFrame(
    dict(
        wcc_id=wcc_id,
        score=global_sdr_score(
            trade_alignment_by_cluster(
                "wcc_id",
                wcc_id,
                method="intra",
            )
        ),
    )
    for wcc_id in wcc_sizes_df.index
).sort_values("score", ascending=False)

wcc_self_sufficiency_df

    wcc_id     score
20     167  0.999962
57     197  0.999932
25      34  0.999926
23      25  0.999721
52     156  0.999568
...    ...       ...
2        4  0.586408
0        0  0.518923
63     209  0.464203
9       22  0.300144
56     196  0.280055

68 rows × 2 columns

colors = wcc_self_sufficiency_df.score.apply(
    lambda s: MPL_PALETTE[0] if s >= 0.5 else MPL_PALETTE[1]
)

fig, ax = plt.subplots(figsize=(30, 5))

wcc_self_sufficiency_df.plot.bar(
    x="wcc_id",
    y="score",
    xlabel="Weak Component ID",
    color=colors,
    rot=0,
    ax=ax,
)

plt.axhline(
    y=0.5,
    color=MPL_PALETTE[1],
    linestyle="--",
    linewidth=2,
)
plt.legend([
    "Self-Sufficiency Threshold",
    "Global SDR Score"
])
plt.show()

Component Self-Sufficiency

compnet_wcc_df[compnet_wcc_df.wcc_id == 0]

     node_id  label                    wcc_id
0    1        American Samoa           0
6    7        Qatar                    0
11   12       Azerbaijan               0
14   15       Cocos (Keeling) Islands  0
18   19       Lithuania                0
...  ...      ...                      ...
211  213      Guyana                   0
215  217      Philippines              0
221  223      Spain                    0
224  226      India                    0
231  233      Slovenia                 0

64 rows × 3 columns

External Competitiveness
#

Most components are not particularly competitive externally, even less so than communities: all but three score below 0.1, and none reaches the 0.5 threshold.

wcc_external_comp_df = pd.DataFrame(
    dict(
        wcc_id=wcc_id,
        score=global_sdr_score(
            trade_alignment_by_cluster(
                "wcc_id",
                wcc_id,
                method="inter",
            )
        ),
    )
    for wcc_id in wcc_sizes_df.index
).sort_values("score", ascending=False)

wcc_external_comp_df

    wcc_id     score
0        0  0.424935
11     137  0.135049
2        4  0.112296
7       15  0.086413
5       10  0.084617
...    ...       ...
59     195  0.000212
21     160  0.000073
67     230  0.000053
43     114  0.000023
26      65  0.000009

68 rows × 2 columns

colors = wcc_external_comp_df.score.apply(
    lambda s: MPL_PALETTE[0] if s >= 0.5 else MPL_PALETTE[1]
)

fig, ax = plt.subplots(figsize=(30, 5))

wcc_external_comp_df.plot.bar(
    x="wcc_id",
    y="score",
    xlabel="Weak Component ID",
    color=colors,
    rot=0,
    ax=ax,
)

plt.axhline(
    y=0.5,
    color=MPL_PALETTE[1],
    linestyle="--",
    linewidth=2,
)
plt.legend([
    "External Competitiveness Threshold",
    "Global SDR Score"
])
plt.show()

Component External Competitiveness

compnet_louvain_df[compnet_louvain_df.louvain_id == 0]

     node_id  label                             louvain_id
0    1        American Samoa                    0
9    10       Saint Vincent and the Grenadines  0
13   14       Barbados                          0
35   36       The Bahamas                       0
58   59       Jamaica                           0
60   61       Saint Lucia                       0
80   81       Antigua and Barbuda               0
82   83       Curaçao                           0
85   86       Grenada                           0
119  120      Marshall Islands                  0
159  160      Greece                            0
179  180      Cyprus                            0
214  216      Niue                              0

Communities vs Components
#

By matching each cluster from the clustering with the most clusters (and therefore the smallest ones) to a cluster from the clustering with the fewest, we can run a pairwise comparison between communities and weak components:

  • Which countries belong to a community, but not the weak component?
  • Which countries belong to a weak component, but not the community?
  • Which countries belong to both?
  • Is there a particular semantic to these countries?
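The first three questions map directly onto Python set operations. A minimal sketch, using hypothetical placeholder labels instead of real countries:

```python
def compare_clusters(wcc_members, comm_members):
    """Split a weak component and a community into exclusive and shared members."""
    wcc, comm = set(wcc_members), set(comm_members)
    return {
        "wcc_exclusive": sorted(wcc - comm),        # in the component only
        "community_exclusive": sorted(comm - wcc),  # in the community only
        "both": sorted(wcc & comm),                 # intersection: in both
        "either": sorted(wcc | comm),               # union: in at least one
    }


# Hypothetical placeholder labels, not real countries
result = compare_clusters(["A", "B", "C"], ["B", "C", "D", "E"])
print(result["both"])  # ['B', 'C']
```

Note that `&` (intersection) answers “which countries belong to both?”, while `|` (union) lists every country found in either cluster.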
len(wcc_sizes_df), len(comm_sizes_df)
(68, 11)
NN-Clusters
#

We compute the Jaccard similarity between every weak component and every community, and select the nearest-neighbor (NN) community for each component. Since there are more components than communities, some communities are necessarily the nearest neighbor of several components.
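The matching step can be sketched in plain Python as follows. It assumes `jaccard_sim` is the usual set Jaccard similarity, |A ∩ B| / |A ∪ B|, which is consistent with the similarity values printed earlier; the function names and toy clusters below are hypothetical.

```python
def jaccard(a, b):
    """Set Jaccard similarity |A & B| / |A | B|; 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def nearest_community(components, communities):
    """Map each component ID to (best community ID, Jaccard similarity).

    Both arguments map a cluster ID to its member labels. On ties, the
    first community in iteration order wins, like pandas' idxmax.
    """
    matches = {}
    for wcc_id, members in components.items():
        best_id, best_sim = None, -1.0
        for comm_id, comm_members in communities.items():
            sim = jaccard(members, comm_members)
            if sim > best_sim:  # strict '>' keeps the first maximum
                best_id, best_sim = comm_id, sim
        matches[wcc_id] = (best_id, best_sim)
    return matches


# Hypothetical toy clusters
communities = {0: ["A", "B", "C", "D"], 1: ["E", "F"]}
components = {10: ["A", "B"], 11: ["C"], 12: ["E", "F", "G"]}
print(nearest_community(components, communities))
# {10: (0, 0.5), 11: (0, 0.25), 12: (1, 0.6666666666666666)}
```

Community 0 is the nearest neighbor of two components here, the same kind of repetition that appears when 68 components are matched to 11 communities.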

cluster_sim_df = []

for wcc_id, wcc in compnet_wcc_df.groupby("wcc_id"):
    for louvain_id, comm in (
        compnet_louvain_df.groupby("louvain_id")
    ):
        cluster_sim_df.append(
            dict(
                wcc_id=wcc_id,
                louvain_id=louvain_id,
                sim=jaccard_sim(wcc.label, comm.label),
            )
        )

cluster_sim_df = pd.DataFrame(cluster_sim_df)
cluster_sim_df = cluster_sim_df.loc[
    cluster_sim_df
    .groupby(["wcc_id"])
    .idxmax()
    .sim
]
cluster_sim_df

     wcc_id  louvain_id       sim
5         0           5  0.297030
12        1           1  0.354839
25        2           3  0.307692
37        4           4  0.317073
54        5          10  0.272727
...     ...         ...       ...
693     215           0  0.076923
713     223           9  0.125000
717     226           2  0.071429
729     228           3  0.125000
743     230           6  0.030303

68 rows × 3 columns

For example, community 5 is the nearest-neighbor community for 20 different weak components.

cluster_sim_df.louvain_id.value_counts()
louvain_id
5     20
7     10
4     10
2      7
6      6
8      4
1      3
3      3
0      2
9      2
10     1
Name: count, dtype: int64
cluster_sim_df[cluster_sim_df.louvain_id == 5]

     wcc_id  louvain_id       sim
5         0           5  0.297030
126      21           5  0.029851
192      40           5  0.029851
225      49           5  0.029851
269      72           5  0.014925
280      84           5  0.014925
313      98           5  0.014925
335     103           5  0.014925
401     114           5  0.014925
423     126           5  0.014925
489     144           5  0.014925
511     147           5  0.014925
544     153           5  0.014925
599     173           5  0.014925
610     195           5  0.014925
632     197           5  0.014925
643     199           5  0.014925
654     201           5  0.014925
676     209           5  0.014925
687     214           5  0.014925
Set Comparison
#

Let’s select a weak component, here the one containing Australia, and retrieve its nearest-neighbor community for comparison.

## comp_wcc_id = largest_wcc_id
comp_wcc_id = compnet_wcc_df.loc[
    compnet_wcc_df.label == "Australia",
    "wcc_id"
].item()

comp_comm_id = cluster_sim_df.loc[
    cluster_sim_df.wcc_id == comp_wcc_id,
    "louvain_id",
].item()

comp_wcc_id, comp_comm_id
(137, 4)
comp_wcc_countries = set(
    compnet_wcc_df.loc[
        compnet_wcc_df.wcc_id == comp_wcc_id,
        "label"
    ]
)

comp_louvain_countries = set(
    compnet_louvain_df.loc[
        compnet_louvain_df.louvain_id == comp_comm_id,
        "label"
    ]
)
WCC Exclusive
#
pd.Series(
    list(comp_wcc_countries - comp_louvain_countries),
    name="country",
).sort_values().to_frame()

country
Community Exclusive
#
pd.Series(
    list(comp_louvain_countries - comp_wcc_countries),
    name="country",
).sort_values().to_frame()

    country
5   Afghanistan
26  Benin
20  Bhutan
14  Bolivia
24  Burundi
18  Central African Republic
4   Guinea
1   Kyrgyzstan
23  Mali
8   Mozambique
25  Niger
17  Papua New Guinea
7   Rwanda
21  Senegal
2   Sierra Leone
13  Solomon Islands
0   Somalia
15  Sudan
12  Suriname
10  Syria
19  Tajikistan
6   Tanzania
3   Togo
9   Turkmenistan
11  US Minor Outlying Islands
16  Western Sahara
22  Zambia
WCC and Community Union
#
pd.Series(
    list(comp_wcc_countries | comp_louvain_countries),
    name="country",
).sort_values().to_frame()

    country
17  Afghanistan
9   Australia
14  Benin
10  Bhutan
5   Bolivia
29  Burundi
26  Central African Republic
2   Guinea
1   Kyrgyzstan
27  Liberia
28  Mali
8   Mauritania
19  Mozambique
13  Niger
6   Papua New Guinea
18  Rwanda
11  Senegal
15  Sierra Leone
4   Solomon Islands
0   Somalia
24  Sudan
23  Suriname
21  Syria
7   Tajikistan
3   Tanzania
16  Togo
20  Turkmenistan
22  US Minor Outlying Islands
25  Western Sahara
12  Zambia

Economic Pressure (PageRank)
#

Economic pressure can be measured with PageRank. It aggregates the strength of a country’s incoming competition edges, and its value grows further when the competing countries pointing at it are themselves under economic pressure. PageRank is computed iteratively until the scores converge, here capped at 100 iterations.
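A minimal power-iteration sketch of PageRank on a weighted, directed graph shows how incoming pressure accumulates. This is an illustration, not Kuzu’s `page_rank` implementation: it assumes an edge u → v means u puts competitive pressure on v, that edge weights are positive, and it uses the common damping factor of 0.85. Passing weight 1.0 on every edge gives the unweighted variant.

```python
def pagerank(nodes, edges, damping=0.85, max_iter=100, tol=1e-10):
    """Weighted PageRank by power iteration.

    `edges` holds (source, target, weight) triples with positive weights.
    Each node splits its score over its out-edges in proportion to weight;
    nodes without out-edges (dangling) spread their score over all nodes.
    """
    n = len(nodes)
    out_weight = {u: 0.0 for u in nodes}
    for u, _, w in edges:
        out_weight[u] += w
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[u] for u in nodes if out_weight[u] == 0)
        # Teleport share plus the redistributed dangling mass
        new = {u: (1 - damping) / n + damping * dangling / n for u in nodes}
        for u, v, w in edges:
            new[v] += damping * rank[u] * w / out_weight[u]
        delta = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if delta < tol:  # L1 change small enough: converged
            break
    return rank


# Hypothetical toy graph: an edge u -> v means u pressures v
ranks = pagerank(
    ["A", "B", "C"],
    [("A", "C", 1.0), ("B", "C", 3.0), ("B", "A", 1.0), ("C", "A", 1.0)],
)
print(max(ranks, key=ranks.get))  # C
```

B receives no incoming edges and keeps only the teleport share, while C, which receives most of B’s weight plus all of A’s, ends up the most pressured node.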

conn.execute(
    """
    ALTER TABLE Country DROP IF EXISTS pagerank;
    ALTER TABLE Country ADD IF NOT EXISTS pagerank DOUBLE;

    CALL page_rank("compnet", maxIterations := 100)
    WITH node, rank
    SET node.pagerank = rank
    """
)
Most Pressured Countries
#
most_pressured_df = conn.execute(
    """
    MATCH (c:Country)
    WHERE c.country_name_short <> "Undeclared"
    RETURN
        c.node_id AS node_id,
        c.country_name_short AS label,
        c.pagerank AS pagerank
    ORDER BY c.pagerank DESC
    LIMIT 25
    """
).get_as_df()

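fig, ax = plt.subplots(figsize=(5, 8))

# barh draws the first row at the bottom, so reverse the rows
# to show the most pressured country at the top
most_pressured_df.iloc[::-1].plot.barh(
    x="label",
    y="pagerank",
    legend=False,
    ax=ax,
)
plt.ylabel(None)
plt.show()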

Most Pressured Economies

Least Pressured Countries
#
least_pressured_df = conn.execute(
    """
    MATCH (c:Country)
    WHERE c.country_name_short <> "Undeclared"
    RETURN
        c.node_id AS node_id,
        c.country_name_short AS label,
        c.pagerank AS pagerank
    ORDER BY c.pagerank ASC
    LIMIT 25
    """
).get_as_df()

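fig, ax = plt.subplots(figsize=(5, 8))

# barh draws the first row at the bottom, so reverse the rows
# to show the least pressured country at the top
least_pressured_df.iloc[::-1].plot.barh(
    x="label",
    y="pagerank",
    legend=False,
    ax=ax,
)
plt.ylabel(None)
plt.show()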

Least Pressured Economies

Closing Remarks
#

Economies are complex systems, and the complex relations between markets can be captured using a graph. Determining which nodes and relationships to model is crucial to interpretation. Our graph focused on competition relationships, so our metrics and partitions should be read as statements about competition.

Network analysis tools are not as exotic as they are often made out to be. Useful graph data science is rarely that complex, particularly now that tooling is widely available, but it can be extremely insightful, especially when the graph is modeled correctly.

This was only a short introduction to the topic, using the world economy and trade, a subject I have been particularly interested in, as the example.

The economy, and the world overall, is suffering. Graphs can help us find solutions to complex problems, but only if we commit to always asking ourselves: could I do this without a graph? If the answer is yes, rethink your approach. If you’re not looking at complex relations, you’re just doing more of the same.

Bottom line, use graphs and use them correctly.