I have multiple node lists and edge lists which together form a large graph; let's call it the
maingraph. My current strategy is to first read all the node lists and import them with
add_vertices. Every node then gets an internal ID which depends on the order in which the nodes are ingested and is therefore not very reliable (as I understand it, if you delete a node, all IDs higher than the deleted one change). I assign every node a
name attribute which corresponds to the external ID I use so I can keep track of my nodes between frameworks and a
Now, how do I add the edges? When I read an edge list, it creates a new graph (
subgraph) whose internal IDs start at 0 again. Therefore, "merging" the graphs with
maingraph.add_edges(subgraph.get_edgelist()) inevitably fails, because the internal IDs of the two graphs do not match.
It is possible to work around this and use the
name attribute from both maingraph and
subgraph to find out which internal ID each edge's incident nodes have in the maingraph:
def _get_real_source_and_target_id(edge):
    '''
    Takes an edge (a pair of subgraph vertex IDs) from the to-be-added subgraph
    and gets the IDs of the corresponding nodes in the maingraph by their name.
    '''
    # Look up the maingraph vertex whose name matches each endpoint of the edge
    source_id = maingraph.vs.select(name_eq=subgraph.vs[edge[0]]["name"])[0].index
    target_id = maingraph.vs.select(name_eq=subgraph.vs[edge[1]]["name"])[0].index
    return (source_id, target_id)
And then I tried:
edgelist = [_get_real_source_and_target_id(x) for x in subgraph.get_edgelist()]
maingraph.add_edges(edgelist)
But that is hoooooorribly slow. The graph has millions of nodes and edges, and it takes 10 seconds to load with the fast but incorrect
maingraph.add_edges(subgraph.get_edgelist()) approach. With the correct approach explained above, it takes minutes (I usually stop it after 5 minutes or so). I will have to do this tens of thousands of times. I switched from NetworkX to igraph because of the fast loading, but that doesn't really help if I have to do it like this.
Does anybody have a more clever way to do this? Any help much appreciated!
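One direction that might avoid searching the maingraph once per edge endpoint is to build a name-to-ID dictionary a single time and translate the whole edge list through it. The following is only a sketch in plain Python, not benchmarked against igraph. The helper names build_name_index and translate_edges are hypothetical, the small lists stand in for maingraph.vs["name"], subgraph.vs["name"] and subgraph.get_edgelist(), and it assumes that names are unique and that the dictionary is rebuilt whenever vertices are added to or deleted from the maingraph.

```python
def build_name_index(names):
    """Map each external name to its position, i.e. its internal vertex ID."""
    return {name: idx for idx, name in enumerate(names)}


def translate_edges(sub_edges, sub_names, name_to_id):
    """Rewrite (source, target) pairs from subgraph IDs to maingraph IDs."""
    # Subgraph ID -> maingraph ID, computed once per subgraph vertex
    sub_to_main = [name_to_id[name] for name in sub_names]
    return [(sub_to_main[s], sub_to_main[t]) for s, t in sub_edges]


# Stand-ins for maingraph.vs["name"], subgraph.vs["name"] and
# subgraph.get_edgelist(); the values are made up for illustration.
main_names = ["n10", "n11", "n12", "n13"]
sub_names = ["n13", "n10", "n12"]
sub_edges = [(0, 1), (1, 2)]

name_to_id = build_name_index(main_names)
edgelist = translate_edges(sub_edges, sub_names, name_to_id)
print(edgelist)
```

With igraph, the translated list would then be passed to maingraph.add_edges(edgelist) in a single call. Each endpoint becomes a dictionary lookup instead of a search over all vertices, which is where the saving would be expected to come from.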