Network Data

Network data files (subsettable files) can be subsetted and analyzed online by using the Dataverse Network application. For analysis, the Dataverse Network offers generic network data analysis. A list of Network Analysis Models is provided.

Note: All subsetting and analysis options for network data assume a network with undirected edges.

After you find the network data set that you want, access the Subset and Analysis options to use the online tools. Then, you can subset data by vertices or edges, download subsets, and apply network measures.

Access Network Subset and Analyze Options

You can subset and analyze network data files before you download the file or your subsets. To access the Subset and Analysis options for a network data set:

  1. Click the title of the study from which you choose to analyze or download a file or subset.
  2. Click the Documentation, Data and Analysis tab for the study.
  3. In the list of study files, locate the network data file that you choose to download, subset, or analyze. You can download data sets for a file only if the file entry includes the subset icon.
  4. Click the Access Subset/Analysis link associated with the selected file. If prompted, check the I accept box and click Continue to accept the Terms of Use.
    You see the Data File page listing data for the file that you choose to subset or analyze.

Subset Network Data

There are two ways to subset network data. You can run a manual query, in which you build a query of specific values for edge or vertex data with which to subset the data. Or, you can select from among the following automatically generated queries with which to subset the data:

  • Largest graph - Subset the <nth> largest connected component of the network. That is, the largest group of nodes that can reach one another by walking across edges.
  • Neighborhood - Subset the <nth> neighborhood of the selected vertices. That is, generate a subgraph of the original network composed of all vertices that are positioned at most <n> steps away from the currently selected vertices in the original network, plus all of the edges that connect them.

You also can successively subset data to isolate specific values progressively.

Continue to the next topics for detailed information about subsetting a network data set.

Subset Manually

Perform a manual query to slice a graph based on the attributes of its vertices or edges. You choose whether to subset the graph based on vertices or edges, then use the Manual Query Builder or free-text Query Workspace fields to construct a query based on that element's attributes. A single query can pertain only to vertices or only to edges, never both. You can perform separate, sequential vertex or edge queries.

When you perform a vertex query, all vertices whose attributes do not satisfy the query are dropped from the graph, in addition to all edges that touch them. When you perform an edge query, all edges whose attributes do not satisfy the criteria are dropped, but all vertices remain unless you enable the Eliminate disconnected vertices check box. Note that enabling this option drops all disconnected vertices whether or not they were disconnected before the edge query.
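The difference between the two query types can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not the Dataverse Network implementation; the function names, the dictionary layout, and the `age` and `weight` attributes are all hypothetical.

```python
def vertex_query(vertices, edges, keep):
    """Keep vertices whose attributes satisfy `keep`.

    Every edge that touches a dropped vertex is dropped as well.
    """
    kept = {v: attrs for v, attrs in vertices.items() if keep(attrs)}
    kept_edges = [e for e in edges if e["source"] in kept and e["target"] in kept]
    return kept, kept_edges


def edge_query(vertices, edges, keep, eliminate_disconnected=False):
    """Keep edges that satisfy `keep`; vertices stay unless the option is set."""
    kept_edges = [e for e in edges if keep(e)]
    if not eliminate_disconnected:
        return dict(vertices), kept_edges
    # Any vertex with no remaining edge is removed, including vertices
    # that were already isolated before this query ran.
    touched = {e["source"] for e in kept_edges} | {e["target"] for e in kept_edges}
    return {v: a for v, a in vertices.items() if v in touched}, kept_edges


# Hypothetical sample data: vertex "d" has no edges from the start.
vertices = {
    "a": {"age": 30},
    "b": {"age": 45},
    "c": {"age": 52},
    "d": {"age": 61},
}
edges = [
    {"source": "a", "target": "b", "weight": 1},
    {"source": "b", "target": "c", "weight": 5},
]

v1, e1 = vertex_query(vertices, edges, lambda attrs: attrs["age"] > 40)
v2, e2 = edge_query(vertices, edges, lambda e: e["weight"] >= 5)
v3, e3 = edge_query(vertices, edges, lambda e: e["weight"] >= 5,
                    eliminate_disconnected=True)
```

In this sketch the vertex query removes "a" together with the a-b edge, the plain edge query keeps all four vertices, and the edge query with the option enabled also removes "d", which was disconnected before the query ran.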

Review the Network Data Tips before you start work with a study's files.

To subset variables within a network data set by using a manually defined query:

  1. In the Data File page, click the Manual Query radio button near the top of the page.
  2. Use the Attribute Set drop-down list and select Vertex to subset by node or vertex values.
    Select Edge to subset by edge values.
  3. Build the first attribute selection value in the Manual Query Builder panel:
    1. Select a value in the Attributes list to assign values on which to subset.
    2. Use the Operators drop-down list to choose the function by which to define attributes for selection in this query.
    3. In the Values field, type the specific values to use for selection of the attribute.
    4. Click Add to Query to complete the attribute definition for selection.
      You see the query string for this attribute in the Query Workspace field.

    Alternatively, you can enter your query directly by typing it into the Query Workspace field.

  4. Continue to add selection values to your query by using the Manual Query Builder tools.
  5. To remove any vertices that do not connect with other data in the set, check the Eliminate disconnected vertices check box.
  6. When you complete construction of your query string, click Run to perform the query.
  7. Scroll to the bottom of the window, and when the query is processed you see a new entry in the Subset History panel that defines your query.

Continue to build a successive subset or download a subset.

Subset Automatically

Perform an Automatic Query to select a subgraph of the network based on structural properties of the network. Remember to review the Network Data Tips before you start work with a study's files.

To subset variables within a network data set by using an automatically generated query:

  1. In the Data File page, click the Automatic Query radio button near the middle of the page.
  2. Use the Function drop-down list and select the type of function with which to select your subset:
    • Largest graph - Subset the <nth> largest group of nodes that can reach one another by walking across edges.
    • Neighborhood - Generate a subgraph of the original network composed of all vertices that are positioned at most <n> steps away from the currently selected vertices in the original network, plus all of the edges that connect them. This is the only query that can (and generally does) increase the number of vertices and edges selected.
  3. In the Nth field, enter the <nth> degree with which to select data using that function.
  4. Click Run to perform the query.
  5. Scroll to the bottom of the window, and when the query is processed you see a new entry in the Subset History panel that defines your query.
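
As a rough illustration of what the two automatic queries select, the following Python sketch computes the nth largest connected component and the nth neighborhood of an undirected graph. It is not the Dataverse Network code; the function names are hypothetical, and it assumes that an Nth value of 1 means the largest component.

```python
from collections import deque


def adjacency(vertices, edges):
    """Build an undirected adjacency map from a vertex list and edge pairs."""
    adj = {v: set() for v in vertices}
    for u, w in edges:
        adj[u].add(w)
        adj[w].add(u)
    return adj


def connected_components(adj):
    """Return all connected components, largest first (breadth-first search)."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            for nbr in adj[queue.popleft()]:
                if nbr not in component:
                    component.add(nbr)
                    queue.append(nbr)
        seen |= component
        components.append(component)
    return sorted(components, key=len, reverse=True)


def nth_largest_component(adj, n=1):
    """Vertices of the nth largest component (assumption: n=1 is the largest)."""
    return connected_components(adj)[n - 1]


def neighborhood(adj, selected, n=1):
    """Vertices at most n steps away from any currently selected vertex."""
    reached = set(selected)
    frontier = set(selected)
    for _ in range(n):
        frontier = {nbr for v in frontier for nbr in adj[v]} - reached
        reached |= frontier
    return reached


adj = adjacency(
    ["a", "b", "c", "d", "e", "f", "g"],
    [("a", "b"), ("b", "c"), ("c", "d"), ("e", "f")],
)
largest = nth_largest_component(adj, 1)
second = nth_largest_component(adj, 2)
two_steps = neighborhood(adj, {"a"}, 2)
```

Note that the neighborhood result always contains the starting selection, which is why this query can grow the selection rather than shrink it.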

Continue to build a successive subset or download a subset.

Build or Restart Subsets

Build a Subset

To build successive subsets and narrow your data selection progressively:

  1. Perform a manual or automatic subset query on a selected data set.
  2. Perform a second query to further narrow the results of your previous subset activity.
  3. When you arrive at the subset with which you choose to work, continue to analyze or download that subset.

Undo Previous Subset

You can reset, or undo, the most recent subsetting action for a data set. Note that you can do this only one time, and only to the most recent subset.

Scroll to the Subset History panel at the bottom of the page and click Undo in the last row of the list of successive subsets.
The last subset is removed, and the previous subset is available for downloading, further subsetting, or analysis.

Restart Subsetting

You can remove all subsetting activity and restore data to the original set.

Scroll to the Subset History panel at the bottom of the page and click Restart in the row labeled Initial State.
The data set is restored to the original condition, and is available for downloading, subsetting, or analysis.

Run Network Measures

When you finish selecting the specific data that you choose to analyze, run a Network Measure analysis on that data. Review the Network Data Tips before you start your analysis.

  1. In the Data File page, click the Network Measure radio button near the bottom of the page.
  2. Use the Attributes drop-down list and select the type of analysis to perform:
    • Page Rank - Determine how much influence comes from a specific actor or node.
    • Degree - Determine the number of relationships or collaborations that exist within a network data set.
    • Unique Degree - Determine the number of collaborators that exist.
    • In Largest Component - Determine the largest component of a network.
    • Bonacich Centrality - Determine the importance of a main actor or node.
  3. In the Parameters field, enter the specific value with which to subset data using that function:
    • Page Rank - Enter a value for the parameter <d>, a proportion, between 0 and 1.
    • Degree - Enter the number of relationships to extract from a network data set.
    • Unique Degree - Enter the number of unique relationships to extract.
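    • In Largest Component - Enter the number of components to extract from a network data set, starting with the largest.
    • Bonacich Centrality - Enter values for the alpha and exo parameters, as described in the Network Data Tips.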
  4. Click Run to perform the analysis.
  5. Scroll to the bottom of the window, and when the analysis is processed you see a new entry in the Subset History panel that contains your analyzed data.
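
To make the distinction between Degree and Unique Degree concrete, here is a small Python sketch. It assumes, based on the descriptions above, that Degree counts every relationship a node takes part in (repeated relationships included) while Unique Degree counts distinct collaborators. The function names and sample data are hypothetical, and the sketch ignores self-loops.

```python
from collections import Counter


def degree(edges):
    """Count every edge endpoint, so repeated relationships count each time."""
    counts = Counter()
    for u, v in edges:
        counts[u] += 1
        counts[v] += 1
    return dict(counts)


def unique_degree(edges):
    """Count distinct neighbors, so repeated relationships count once."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    return {node: len(nbrs) for node, nbrs in neighbors.items()}


# Hypothetical collaboration network: ann and bob collaborated twice.
collaborations = [("ann", "bob"), ("ann", "bob"), ("ann", "cat")]

deg = degree(collaborations)
uniq = unique_degree(collaborations)
```

In this sample, ann takes part in three collaborations but has only two distinct collaborators.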

Continue to download the analyzed subset.

Download Network Subsets or Measures

When you complete subsetting and analysis of a network data set, you can download the final set of data. Network data subsets are downloaded in a zip archive, which has the name subset_<original file name>.zip. This archive contains three files:

  • subset.xml - A GraphML formatted file that contains the final subsetted or analyzed data.
  • verticies.tab - A tabular file that contains all node data for the final set.
  • edges.tab - A tabular file that contains all relationship data for the final set.

Note: Each time you download a subset of a specific network data set, a zip archive is downloaded that has the same name. All three zipped files within that archive also have the same names. Be careful not to overwrite a downloaded data set that you choose to keep when you perform successive downloads.

To download a final set of data:

  1. Scroll to the Subset History panel on the Data File page.
  2. Click Download Latest Results at the bottom of the history list.
  3. Follow your browser's prompts to open or save the data file to your computer's disk drive. Be sure to save the file in a unique location to prevent overwriting an existing downloaded data file.
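
Because every download uses the same archive and file names, one way to keep successive downloads apart is to unpack each archive into its own folder. The following Python sketch shows one possible approach using only the standard library; the function name, the `subsets` folder, and the time-stamp naming scheme are assumptions for illustration, not part of the Dataverse Network.

```python
import zipfile
from datetime import datetime
from pathlib import Path


def extract_subset(archive_path, dest_root="subsets"):
    """Unpack a downloaded subset archive into a new, uniquely named folder."""
    archive_path = Path(archive_path)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    base = Path(dest_root) / f"{archive_path.stem}_{stamp}"

    # If a folder with this name already exists, add a numeric suffix
    # rather than writing over earlier results.
    target, suffix = base, 1
    while target.exists():
        target = base.with_name(f"{base.name}_{suffix}")
        suffix += 1

    target.mkdir(parents=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
    return target


# Example call (the archive name here is a placeholder):
# folder = extract_subset("subset_example.zip")
```

Each call returns the folder that now holds subset.xml, verticies.tab, and edges.tab for that particular download.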

Network Data Tips

Use these guidelines when subsetting or analyzing network data:

  • For a Page Rank network measure, the value for the parameter <d> is a proportion and must be between 0 and 1. Higher values of <d> increase dispersion, while values of <d> closer to zero produce a more uniform distribution. Page Rank values are normalized so that they sum to 1.
  • For a Bonacich Centrality network measure, the alpha parameter is a proportion that must be between -1 and +1. The result is normalized so that all alpha centralities sum to 1.
  • For a Bonacich Centrality network measure, the exo parameter must be greater than 0. A higher value of exo produces a more uniform distribution of centrality, while a lower value allows more variation.
  • For a Bonacich Centrality network measure, the original alpha parameter of alpha centrality takes values only from -1/lambda to 1/lambda, where lambda is the largest eigenvalue of the adjacency matrix. In this Dataverse Network implementation, the alpha parameter is rescaled to be between -1 and 1 and represents the proportion of 1/lambda to be used in the calculation. Thus, entering alpha=1 sets alpha to be 1/lambda. Entering alpha=0.5 sets alpha to be 1/(2*lambda).
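
The alpha rescaling in the last tip can be checked with a short calculation. The Python sketch below estimates lambda, the largest eigenvalue of the adjacency matrix, with a shifted power iteration and then converts the value you enter into the alpha that is actually used. The function names are hypothetical and the eigenvalue method is chosen only for illustration; the Dataverse Network may compute lambda differently.

```python
def largest_eigenvalue(matrix, iterations=500):
    """Estimate the largest eigenvalue of a symmetric 0/1 adjacency matrix.

    Power iteration is run on (A + I) instead of A so that it also settles
    on graphs whose eigenvalues come in +/- pairs; 1 is subtracted at the end.
    """
    n = len(matrix)
    vec = [1.0] * n
    estimate = 0.0
    for _ in range(iterations):
        nxt = [vec[i] + sum(matrix[i][j] * vec[j] for j in range(n))
               for i in range(n)]
        norm = max(abs(x) for x in nxt)
        vec = [x / norm for x in nxt]
        estimate = norm - 1.0
    return estimate


def effective_alpha(user_alpha, lam):
    """Convert the entered proportion (-1 to 1) into the alpha actually used."""
    if not -1.0 <= user_alpha <= 1.0:
        raise ValueError("alpha must be between -1 and 1")
    return user_alpha / lam


# Triangle graph: every vertex is connected to the other two, so lambda is 2.
triangle = [
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
]
lam = largest_eigenvalue(triangle)
full = effective_alpha(1.0, lam)   # 1 / lambda
half = effective_alpha(0.5, lam)   # 1 / (2 * lambda)
```

For the triangle graph, entering alpha=1 corresponds to 1/lambda and entering alpha=0.5 corresponds to 1/(2*lambda), matching the rule stated in the tip.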