Read Graph from multiple files in IGraph (Python)

  igraph, python

I have multiple node lists and edge lists which together form one large graph; let's call it the maingraph. My current strategy is to first read all the node lists and import them with add_vertices. Every node then gets an internal ID that depends on the order in which it was ingested, so it isn't very reliable (as I've read it, if you delete a vertex, the IDs of all vertices with a higher ID change). I give every node a name attribute, which corresponds to the external ID I use to keep track of my nodes across frameworks, and a type attribute.

Now, how do I add the edges? When I read an edge list, it creates a new graph (the subgraph) whose internal IDs start again at 0. Therefore, "merging" the graphs with maingraph.add_edges(subgraph.get_edgelist()) inevitably fails: the subgraph's IDs do not refer to the same vertices in the maingraph.

It is possible to work around this by using the name attribute of both maingraph and subgraph to find out which internal ID each edge's endpoints have in the maingraph:

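def _get_real_source_and_target_id(edge):
    """Takes an edge from the to-be-added subgraph and returns the IDs of the
    corresponding nodes in the maingraph, looked up by their name."""
    # Read each endpoint's name from the subgraph, then search the maingraph for it
    source_id = maingraph.vs.select(name=subgraph.vs[edge[0]]["name"])[0].index
    target_id = maingraph.vs.select(name=subgraph.vs[edge[1]]["name"])[0].index
    return (source_id, target_id)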

And then I tried

edgelist = [_get_real_source_and_target_id(x) for x in subgraph.get_edgelist()]

But that is hoooooorribly slow. The graph has millions of nodes and edges; with the fast but incorrect maingraph.add_edges(subgraph.get_edgelist()) approach it takes 10 seconds to load. With the correct approach explained above, it takes minutes (I usually stop it after 5 minutes or so). I will have to do this tens of thousands of times. I switched from NetworkX to igraph because of the fast loading, but that doesn't really help if I have to do it like this.

Does anybody have a more clever way to do this? Any help much appreciated!


Source: Python Questions

One Reply to “Read Graph from multiple files in IGraph (Python)”

  • Try using the ig.union() function (or the Graph.union() method) to combine your subgraphs. Unions can take some time, but you can split up the task and check for errors between them.