Generating Cross-Linguistic Semantic Maps

One of the central features of my Conlanger's Thesaurus is the cross-linguistic semantic maps. While there are some really good ones out there in linguistics journals, they aren't produced very rapidly. It occurred to me that I could, with the help of some good dictionaries, automate the production of these maps somewhat. With a little Python and some extra libraries, it turned out to be very simple.

In most semantic maps I've run across, it looks like the maps were generated by hand. In particular, people followed their own instincts in how they linked together related senses (see the wonderful François 2008 for an example). But I don't trust myself to do that. I won't actually know all the languages I'm drawing on, so I have to trust the dictionary author, who may not have had time to investigate the prototypical meanings of every word in their dictionary. My answer to this is to imagine all the words together as a giant cluster of linked senses.

Using the NetworkX library in Python, this is easy to do. First, the function make_semantic_cluster, which builds a graph of cross-linked senses,

import networkx as nx
import matplotlib.pyplot as plt

def make_semantic_cluster(*senses):
    """Return a complete graph linking every pair of senses."""
    cluster = nx.Graph()
    n = len(senses)
    for i in range(n):
        for j in range(i+1, n):
            cluster.add_edge(senses[i], senses[j])
    return cluster

Here's a graph for the senses of the Ancient Greek word koite "bed,"

>>> koite = make_semantic_cluster("bed", "lair", "quarters", "lot", "chest")
>>> nx.draw_circular(koite, node_color='#A0CBE2', node_size=750)
>>> plt.savefig("koite.png")

Which results in this graph,

Next, I need to be able to find the intersection of any two semantic clusters. That is very simple. Note that I cannot use the NetworkX library intersection, since that requires both graphs to have the same nodes.

def cluster_intersection(G, H):
    """Return a graph with edges common to both graphs."""
    di = nx.Graph()
    for edge in G.edges():
        if H.has_edge(*edge):
            di.add_edge(*edge)
    return di

Now I can grab another word, say Latin cubile, and produce a map of common polysemies, which turns out to be quite restricted in this instance,

>>> koite = make_semantic_cluster("bed", "lair", "quarters", "lot", "chest")
>>> cubile = make_semantic_cluster("bed", "couch", "lair", "kennel")
>>> kc = cluster_intersection(koite, cubile)
>>> kc.edges()
[('lair', 'bed')]

The only shared polysemy between koite and cubile is "bed" and "lair." So, to generate a large list of common cross-linguistic polysemies I have to compare every semantic cluster to every other semantic cluster and merge the results. If I just ran cluster_intersection on each semantic cluster in sequence, a polysemy missing from one early cluster would be discarded before later clusters could confirm it. So, I need to take the intersection of each possible pair, resulting in N(N − 1)/2 graphs, and then merge those. The resulting code is only slightly longer than the previous functions,

def make_semantic_map(*clusters):
    cluster_pairs = []

    # First, make pairwise comparisons of all clusters.
    n = len(clusters)
    for i in range(n):
        for j in range(i+1, n):
            pair = cluster_intersection(clusters[i], clusters[j])
            # Only save if there are common elements.
            if len(pair.nodes()) != 0:
                cluster_pairs.append(pair)

    # We now have in cluster_pairs graphs of every polysemy that
    # occurs at least *twice*.  From this, we can accumulate a union.
    smap = nx.Graph()
    for cp in cluster_pairs:
        smap.add_edges_from(cp.edges())
        # Could tweak this later to accumulate frequency, too.

    return smap

Now I can hand an arbitrary number of semantic clusters to make_semantic_map and get a nice map out the other side. Here, for example, is "bed" across twelve such clusters (I'll want more languages eventually):
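If you want to reproduce a small run without the image, here is a self-contained sketch using the functions above (repeated so the snippet runs on its own) over three clusters. The third cluster is an invented one for illustration, not from my actual data:

```python
import networkx as nx

def make_semantic_cluster(*senses):
    """Return a complete graph linking every pair of senses."""
    cluster = nx.Graph()
    n = len(senses)
    for i in range(n):
        for j in range(i + 1, n):
            cluster.add_edge(senses[i], senses[j])
    return cluster

def cluster_intersection(G, H):
    """Return a graph with edges common to both graphs."""
    di = nx.Graph()
    for edge in G.edges():
        if H.has_edge(*edge):
            di.add_edge(*edge)
    return di

def make_semantic_map(*clusters):
    # Pairwise-intersect every cluster, then union the survivors,
    # so any polysemy found in at least two clusters is kept.
    smap = nx.Graph()
    n = len(clusters)
    for i in range(n):
        for j in range(i + 1, n):
            pair = cluster_intersection(clusters[i], clusters[j])
            smap.add_edges_from(pair.edges())
    return smap

koite = make_semantic_cluster("bed", "lair", "quarters", "lot", "chest")
cubile = make_semantic_cluster("bed", "couch", "lair", "kennel")
# A hypothetical third cluster, purely for illustration.
other = make_semantic_cluster("bed", "couch", "mattress")

smap = make_semantic_map(koite, cubile, other)
print(sorted(smap.edges()))
```

With these three clusters, the map keeps bed–lair (shared by koite and cubile) and bed–couch (shared by cubile and the invented cluster).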

There may be cleverer and tidier ways to make these polysemy maps, but I put a high value on comprehensible programs, at least when they aren't grossly inefficient.

The Maps

To generate these maps yourself, you will need a reasonably current version of Python, the NetworkX library, and an additional library to generate the graphical maps if you want those. I use matplotlib for this page, but there are Python GraphViz libraries that will also work.

The map images below use weighted graphs, so that more frequently occurring polysemies stay closer together. The Python code for a map documents the languages and senses used.
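One way to get those weights, hinted at in the comment in make_semantic_map, is to count how many cluster pairs share each polysemy. This is a sketch of a frequency-accumulating variant, my own adaptation rather than the exact code behind the maps below; the three small clusters are hypothetical:

```python
import networkx as nx
from itertools import combinations

def make_weighted_semantic_map(*clusters):
    """Like make_semantic_map, but count how many cluster
    pairs share each polysemy in an edge 'weight' attribute."""
    smap = nx.Graph()
    for G, H in combinations(clusters, 2):
        for u, v in G.edges():
            if H.has_edge(u, v):
                if smap.has_edge(u, v):
                    smap[u][v]["weight"] += 1
                else:
                    smap.add_edge(u, v, weight=1)
    return smap

# Hypothetical clusters for illustration.
a = nx.Graph([("bed", "lair"), ("bed", "couch")])
b = nx.Graph([("bed", "lair"), ("bed", "kennel")])
c = nx.Graph([("bed", "lair")])

wmap = make_weighted_semantic_map(a, b, c)
print(wmap["bed"]["lair"]["weight"])  # bed-lair appears in all three pairs
```

The resulting weights can then be fed to a layout such as nx.spring_layout, which treats heavier edges as stronger attraction, pulling frequent polysemies closer together.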

Copyright (c) 2006-2017 William S. Annis