Welcome to

Description

ArchaMap is actively looking for contributors. See the help page for instructions. Contact admin@catmapper.org for questions or to contribute.

ArchaMap is an open-source tool designed to aid in the integration of multiple complex data sets with different sources, data ontologies, and resolutions. ArchaMap is designed to save time, increase consistency, and document complex data merging processes among multiple sources. This application stores and suggests past translations to build an ever-expanding list of associations to aid in connecting categorical data across different sources. ArchaMap uses previously uploaded categories to build a database of potential category names and includes contextual information to help users find the appropriate match.

ArchaMap saves substantial time when integrating datasets. Furthermore, every dataset that is integrated contributes alternate names that further improve matching and leaves a permanent record of how the datasets were integrated. This follows open science principles and reduces work for future researchers. An additional benefit is that each category is linked to a dataset. ArchaMap has an explore feature that allows users to find categories through a full-text search and then identify all datasets that use that category. This can help tie datasets together, whether they are publications, online datasets, or archived collections.

An authorized user is required to upload data

Please contact admin@catmapper.org if you would like to become a registered user.

Create new dataset if necessary

Choose:

standard

advanced

Option	Description
Adding new node for every row	Use this if every row in the spreadsheet creates a new node and represents a unique node.
Adding new uses ties (with old or new nodes)	Use this if you are adding new USES ties to existing nodes, if you have a mix of new and existing nodes, or if you have new nodes that are each represented by multiple rows of data. This function will aggregate rows by dataset, SocioMapID or ArchaMapID (if present), and Key.
Updating existing USES only--add or add to properties	Use this if you are updating properties for existing USES ties but not replacing any information.
Updating existing USES only--replace one property	Use this if you are replacing or removing data from a property. This is only valid for a single property.

Under Development: Not ready for use

Build Dataset Stack

Choose datasets to stack

Datasets

Under development

Automated Merge Process

Upload two datasets to merge. Both datasets must have a `datasetID` column with a valid CMID for each row. Both datasets must have the original `Key` columns specified in the database translation that was previously uploaded to the dataset with the matching CMID. If you have not yet translated and uploaded your dataset, please do so now.
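
Before uploading, it may help to check locally that each file carries the required columns. The following is a minimal sketch using only Python's standard library. The helper names `missing_columns` and `rows_without_dataset_id`, the sample CSV text, and the key column names `V131` and `V024` are illustrative assumptions, not part of ArchaMap. The check only confirms that `datasetID` is present and non-empty; it cannot confirm that a CMID actually exists in the database.

```python
import csv
import io

def missing_columns(csv_text, key_columns):
    """Return the required columns that are absent from a CSV header.

    `key_columns` lists the original Key column names used when the
    dataset was translated (supplied by the user; hypothetical here).
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    required = ["datasetID"] + list(key_columns)
    return [col for col in required if col not in header]

def rows_without_dataset_id(csv_text):
    """Return 1-based data-row numbers whose datasetID cell is empty."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [i for i, row in enumerate(reader, start=1)
            if not (row.get("datasetID") or "").strip()]

# Made-up sample data; the datasetID value is a placeholder.
sample = "datasetID,V131,value\nAD1,3,10\n,4,12\n"
print(missing_columns(sample, ["V131", "V024"]))  # columns to add before upload
print(rows_without_dataset_id(sample))            # rows to fix before upload
```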

download results

Specify Datasets to Merge

Datasets

Choose domain

Select category domain

Choose type of merge

Standard: Categories are only equivalent if they point to the same node

Refined: Categories are only equivalent if they point to the same node and are within a specified window of time and distance

Extended: Categories can be equivalent if they point to nodes that are connected by contains ties

Return all categories from all datasets (True) or only return categories present in all datasets (False)
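
As a rough illustration of the Standard merge and the True/False option above, the sketch below groups the keys of each dataset by the node they point to, then optionally keeps only nodes used by every dataset. The function name `standard_merge`, the input layout, and the dataset IDs, keys, and node IDs in the sample are all hypothetical; this is not ArchaMap's own implementation.

```python
def standard_merge(category_sets, return_all=True):
    """Group dataset keys by the node (CMID) they point to.

    `category_sets` maps a dataset ID to a dict of {key: CMID}.
    Returns {CMID: {dataset ID: [keys]}}.
    """
    merged = {}
    for dataset_id, keys in category_sets.items():
        for key, cmid in keys.items():
            merged.setdefault(cmid, {}).setdefault(dataset_id, []).append(key)
    if not return_all:
        # Keep only the categories that every dataset uses.
        merged = {cmid: uses for cmid, uses in merged.items()
                  if len(uses) == len(category_sets)}
    return merged

# Placeholder IDs and keys, for illustration only.
sets = {
    "D1": {"V131 = 3": "C1", "V131 = 4": "C2"},
    "D2": {"eth = 7": "C1", "eth = 9": "C3"},
}
print(standard_merge(sets, return_all=True))
print(standard_merge(sets, return_all=False))
```

With `return_all=True` the result behaves like a full outer join on the node ID; with `return_all=False` it behaves like an inner join.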

download results

Upload Equivalences

Under development

Merging code

choose merging template ID

Download list of datasets

Upload merging template with included file paths

Browse...

Choose syntax output

Download merge files

If you have any questions, please contact us at help@catmapper.org.

Application version 0.6

CatMapper Manual

1. PREAMBLE

2. INTRODUCTION

3. HOW-TO: EXPLORING

4. HOW-TO: TRANSLATING

5. HOW-TO: MERGING

PREAMBLE

This manual includes basic introductory information on CatMapper as well as how-to guides on how to use CatMapper’s main functions.

INTRODUCTION

Objectives. CatMapper assists users in:

Exploring where data is available for complex, evolving categories commonly used in the social sciences (e.g., ethnic, religious, language, geospatial, and archaeological categories). For example, where can I find data on speakers of Yoruba, people who identify as Yoruba, or followers of Isese, the Yoruba religion?
Translating categories from new datasets to categories already stored in CatMapper.
Merging data across diverse, external datasets by these complex categories.
Documenting and Sharing their translations and merges so that other users can check and re-use their work.

Apps. CatMapper currently includes two apps aimed at organizing two kinds of categories. SocioMap organizes sociopolitical categories, such as ethnicities, religions, languages, and administrative districts. ArchaMap organizes categories of material objects used in archaeology, including sites, ceramic types, lithic and projectile point types, and faunal types. Our hope in the future is to extend CatMapper’s capabilities to other classes of complex, dynamic categories.

Users. CatMapper accommodates three kinds of users:

Casual Users can use CatMapper’s explore and translate functions to find potential category matches, and find well-sourced and reliable contextual data on categories. They do not need to login. However, they cannot change the database.
Registered Users can upload new proposed matches to CatMapper, and by May 2023, they will also be able to generate and share merging templates from published work. Please email us at admin@catmapper.org if you would like to sign up as a registered user.
Administrative Users have additional access to a range of tools for fixing database errors by directly changing the structure and content of the graph database (e.g., editing tie and node information, moving ties, and merging nodes).

Key Concepts. CatMapper stores information on categories (Table 1), how they are related to each other, and how they are encoded by diverse datasets. CatMapper stores contextual information about categories through a range of ties (e.g., contains, district_of, language_of, religion_of) (Table 2).

Table 1. Current CatMapper Category Domains

App	Primary Domain	Subdomains
SocioMap	ETHNICITY
SocioMap	LANGUOID	LANGUAGE, DIALECT, FAMILY
SocioMap	RELIGION
ArchaMap	CERAMIC	CERAMIC_TYPE, CERAMIC_WARE
ArchaMap	PROJECTILE_POINT	PROJECTILE_POINT_CLUSTER, PROJECTILE_POINT_TYPE
ArchaMap	PERIOD
All	DISTRICT	ADM0-ADM4, ADME, ADMD, ADMX, PPL, SITE, REGION
All	GENERIC

Table 2. Ties that store contextual information about categories

Tie	Description
X CONTAINS Y	Y is a sub-category of X
X DISTRICT_OF Y	X is a geospatial locale for Y
X LANGUAGE_OF Y	X is a language associated with Y
X RELIGION_OF Y	X is a religion associated with Y
X USES Y	Dataset X uses the category Y

When a dataset uses a specific category, there is a USES tie from the dataset to the category. This USES tie records (1) how the dataset encodes that category (i.e., name, key) and (2) claims that the dataset makes about the category (e.g., geospatial location, population estimate, associated languages and religions, other categories it contains or is contained by). A key specifies how a specific dataset uniquely encodes a specific category. A simple key involves a single variable and value (e.g., V131 = 3). More complex keys can involve combinations of variables and values (e.g., V131 = 3 AND V024 = 1).
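
To make the idea of simple and complex keys concrete, here is a minimal sketch that parses a key written in the style of the examples above and tests it against one data row. The functions `parse_key` and `row_matches` are hypothetical helpers, and the assumption that keys are stored as plain text with `=` and `AND` is taken only from the examples in this manual.

```python
def parse_key(key):
    """Split a key such as 'V131 = 3 AND V024 = 1' into {variable: value}."""
    conditions = {}
    for clause in key.split(" AND "):
        variable, value = clause.split("=", 1)
        conditions[variable.strip()] = value.strip()
    return conditions

def row_matches(row, key):
    """True if every variable named in the key has the stated value in the row."""
    return all(str(row.get(var)) == val for var, val in parse_key(key).items())

print(parse_key("V131 = 3"))
print(row_matches({"V131": 3, "V024": 1}, "V131 = 3 AND V024 = 1"))
```

Values are compared as strings so that a numeric cell such as `3` matches the text `3` in the key.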

A category set is the set of categories in a specific domain encoded by a specific dataset (e.g., ethnicities coded in DHS Guatemala 1995 survey, language spoken coded in WVS Cote D’Ivoire). A merging set is a category set for a dataset that resulted from a merge of multiple datasets. For a clearly defined merge, there must be only many-to-one and one-to-one links from each of the datasets’ category sets to the merging set.
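
The many-to-one requirement can be checked mechanically. The sketch below assumes the links are available as (source category, merging-set category) pairs, which is an illustrative layout rather than a CatMapper format, and reports any source category that links to more than one category in the merging set. The function name and the sample category names are hypothetical.

```python
from collections import defaultdict

def one_to_many_links(links):
    """Return source categories that link to more than one merged category.

    `links` is an iterable of (source_category, merged_category) pairs.
    An empty result means every link is one-to-one or many-to-one.
    """
    targets = defaultdict(set)
    for source, merged in links:
        targets[source].add(merged)
    return {src: sorted(t) for src, t in targets.items() if len(t) > 1}

# Two source categories collapse into one merged category: allowed.
ok = [("Kiche", "Maya"), ("Kaqchikel", "Maya"), ("Ladino", "Ladino")]
# One source category points at two merged categories: not clearly defined.
bad = ok + [("Kiche", "Kiche")]
print(one_to_many_links(ok))
print(one_to_many_links(bad))
```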

A merging template stores all information necessary to create files and scripts for reproducible merging across a specific set of external datasets, including (1) category sets used in each dataset, (2) variables names, variable values and value labels used by each dataset to encode categories, (3) the merging set for the final merged dataset, (4) translations between categories in each external dataset with categories in the merging set, and (5) the operators for aggregating relevant variables over many-to-one mappings from external datasets to the merging set.

Stacks are a key part of a merging template. A stack is a dataset that is joined to other datasets during a merge. A stack may be created by appending multiple datasets.
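
As an illustration of building a stack by appending, the sketch below concatenates the rows of several datasets and tags each row with its source. The function `build_stack`, the `datasetID` field used to record the source, and the sample rows are assumptions made for this example; the files CatMapper generates may be organized differently.

```python
def build_stack(datasets):
    """Append rows from several datasets into one stack.

    `datasets` maps a dataset ID to a list of row dicts. Each output row
    records its source in a `datasetID` field; columns that a dataset
    lacks are filled with None.
    """
    columns = []
    for rows in datasets.values():
        for row in rows:
            for name in row:
                if name not in columns:
                    columns.append(name)
    stack = []
    for dataset_id, rows in datasets.items():
        for row in rows:
            stacked = {"datasetID": dataset_id}
            stacked.update({name: row.get(name) for name in columns})
            stack.append(stacked)
    return stack

# Made-up rows; dataset IDs are placeholders.
stack = build_stack({
    "D1": [{"Key": "a", "pop": 10}],
    "D2": [{"Key": "b"}],
})
print(stack)
```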

To support merging, CatMapper also stores metadata on variables used in stored merging templates and how these variables are encoded in relevant datasets.

Main Pages. In each app, there are three sets of pages corresponding with three functions: (1) explore, (2) translate, and (3) merge.

For exploring, a search page allows users to search for categories, datasets, and variables stored in CatMapper. A user can then navigate from a specific search result to a view page which provides additional information on the search result.

For translating, a propose translation page allows users to upload a spreadsheet with categories from a new dataset, to request proposed best matches that already exist in CatMapper, and to view and download those matches. An upload translation page allows registered users to upload final curated matches to CatMapper.

For merging, a propose merge page allows users to request proposed matches between categories from different datasets. An upload merge page allows users to build and upload key components of a merge to completely specify a merging template. A download merging template page allows users to download the files to replicate a merge specified by an uploaded merging template.

HOW-TO: EXPLORING

How can I find contextual information on ethnicities, languages, districts, religions, and datasets?

Click on the Explore tab and choose Search.
Under Select category type, choose which type of category (e.g., ethnicity, language, district, religion) you would like to search for. ADM0-ADM4 indicate districts at specific levels of the administrative hierarchy (ADM0 = countries, ADM1 = first-level administrative subdistricts).
Choose “Name” as the property to search.
In the Enter search term box, enter the category name you are searching for.
If you would like to limit your search to a particular country, under Optional Filters, choose that country in the Country box.
Once you press the Search button, a set of search results will appear below.
Click on the search result you would like to explore, and a View window will open.
This page will include contextual information about the category, including (1) relevant countries and languages, (2) datasets containing the category with info on population estimates, sample size, geospatial location, and name used by the dataset.

As an exercise, try searching for the ethnic groups Yoruba or Pashtun.

How can I identify which datasets include information on a specific ethnicity, language, district, or religion?

The Sample section of the View page contains a row for each dataset that contains information on a specific category. In some cases, a dataset may contain information on that category from different places or times. In those cases, there may be multiple rows for the same ethnicity from the same dataset.
Alternatively, in the Network Explorer box at the bottom of the View window, choose "USES" in the drop down menu labeled "Choose Relationship to View".
This will show links to all datasets with data relevant to the category.
To explore metadata for a dataset of interest, select the relevant node and click the View Selected Node button. The View page for that dataset will appear.
Note: The Network Explorer only includes a maximum of 10 nodes. To view all available nodes, please look at the drop down list under "Choose node to view relationship."

How can I see how a dataset encodes a specific ethnicity, language, district, or religion?

In the Network Explorer box at the bottom of the View window, choose USES in the drop down menu labeled "Choose Relationship to View"
This will show links to all datasets with data relevant to the category.
Hover over the link to the relevant dataset. It will show the variable and variable value used to encode that category.

How can I identify and explore ethnicities, languages, districts and religions that are related to a specific category?

1. In the Network Explorer at the bottom of the View window, the drop down menu labeled "Choose Relationship to View" gives you the choice of exploring CONTAINS ties (e.g., Maya contains Kiche Maya), LANGUAGE_OF ties, DISTRICT_OF ties, and USES ties, which describe how different datasets encode the category.
2. Once you choose the kind of tie you'd like to explore, the network will be shown in the box.
3. If you click on a node of interest and click the View Selected Node button, the View page for that node will appear.
4. Note: The Network Explorer only includes a maximum of 10 nodes. To view all available nodes, please look at the drop down list under "Choose node to view relationship."

How can I view the entire set of datasets with encodings stored in SocioMap?

Click on the Explore tab and choose Search.
Choose DATASET under Select category type.
Choose “Name” as property to search
Leave Enter search term box blank.
If you would like to limit your search to a particular country, choose that country under the Country dropdown menu (under Optional Filters).
Once you press the Search button, a set of search results will appear below.

How can I see what variables are available for a specific ethnicity, language, district, or religion?

In the Network Explorer box at the bottom of the View window, choose USES ties.
Click on a dataset that uses the category, and choose "Variables" in the drop down menu labeled "Choose Relationship to View".
This will show links to all variables that are available for a specific category.
To view which datasets include that variable for the specific category, hover on the link to the relevant variable.
Note: The network view only includes a maximum of 10 nodes. To view all available nodes, please look at the drop down list under "Choose node to view relationship."

HOW-TO: TRANSLATING

If I have a spreadsheet of category names from a new dataset, how can I find the best matches from CatMapper?

On the Propose Translations page, click “Browse” and choose the spreadsheet located on your computer.
Under “Input column to match”, choose the column in the spreadsheet with the category names.
Under “Select category type”, choose the domain of categories you would like to translate.
Under “Property to search”, choose “Name”
Limiting searches by country can substantially improve specificity of proposals. To do this, make sure to include a column in the spreadsheet with the CatMapper ID for the country associated with each category.
1. Click on “Limit by country”
2. Under “Choose column with country IDs”, choose the appropriate spreadsheet column.
To limit search by other contextual factors (e.g., associated language), make sure to include a column in the spreadsheet with the CatMapper ID for the relevant contextual category
1. Click on “Limit by context”
2. Under “Choose column with context IDs”, choose the appropriate spreadsheet column.
To limit search to only those categories used by a specific dataset, make sure to include a column in the spreadsheet with the CatMapper ID for the relevant dataset
1. Click on “Limit by dataset”
2. Under “Choose column with dataset IDs”, choose the appropriate spreadsheet column.
To limit search to only those categories in a specific time range, add appropriate limits under “from” and “to” under “Time range (years)”.
Press “Search”
Proposed matches will be shown on the right. Color coding indicates non-exact matches that merit attention.
You can download a spreadsheet with proposed matches. After downloading, you can revise and curate the CatMapper-proposed matches to upload.

How can I revise and curate a downloaded spreadsheet of CatMapper-proposed matches?

The spreadsheet of CatMapper-proposed matches includes a column indicating matches that may need attention. These include cases where (1) no match was found, (2) more than one match was found, and (3) a name match was not perfect.
When CatMapper couldn’t automatically find a match, the user can combine outside searches with CatMapper’s explore functions to identify a matching CatMapper category.
1. If an appropriate match is found, the user can copy and paste the corresponding CatMapper ID into the CatMapper ID column in the spreadsheet.
2. If an appropriate match is not found, registered users have access to tools under “upload translations” to add a new category. After the category is created, the user can add its CatMapper ID to the CatMapper ID column in the spreadsheet.
When CatMapper automatically proposes more than one match, the user can combine outside searches with CatMapper’s explore functions to identify the best match. The user keeps the CatMapper ID for the appropriate match and erases the rows for the excluded matches.
For any other match that the user decides to change, they can replace the existing CatMapper ID with the appropriate CatMapper ID.
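
The three attention cases described above can be reproduced on your own data when reviewing a downloaded spreadsheet. The following is a minimal sketch, assuming the proposed matches for one source category are held as a list of (CatMapper ID, matched name) pairs. That layout, the function name `needs_attention`, the sample IDs, and the choice to treat names as matching when they differ only in case or surrounding spaces are all assumptions, not CatMapper's own rules.

```python
def needs_attention(source_name, matches):
    """Classify the proposed matches for one source category.

    `matches` is a list of (cmid, matched_name) pairs (hypothetical layout).
    """
    if not matches:
        return "no match"
    if len(matches) > 1:
        return "multiple matches"
    if matches[0][1].strip().lower() != source_name.strip().lower():
        return "inexact name"
    return "ok"

# Placeholder IDs and names, for illustration only.
print(needs_attention("Yoruba", []))
print(needs_attention("Yoruba", [("ID1", "Yoruba"), ("ID2", "Yoruba (Benin)")]))
print(needs_attention("Yoruba", [("ID1", "Yorouba")]))
print(needs_attention("Yoruba", [("ID1", "yoruba ")]))
```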

How can I upload the spreadsheet of revised and curated matches?

Log in as a registered user. If you are not already a registered user, please email admin@catmapper.org to sign up.
Go to the “upload translations” page
Click “Create New Dataset” and fill out the necessary details for the dataset from which the new categories are coming. Note the Dataset ID number assigned to the new dataset.
Click “Browse” and choose the spreadsheet on your computer.
Select the domain of categories to be uploaded (e.g., ETHNICITY, ADM1…)
Input the dataset ID for the new dataset you just created. If you can’t remember it, you can find this by searching for the dataset by name in the explore function.
Choose which column contains the category names
Choose which column contains the CatMapper ID
Choose which column contains the unique ID used by the dataset for the category. This is essential for recording the Key used by the dataset for the category.
Click “Upload”. Clicking “Upload Status” will provide information about how long the upload has been running.

HOW-TO: MERGING

I would like to bring together variables from several datasets merging them by a specific category domain (e.g., by ethnicity). How do I find corresponding categories across the datasets?

Ensure the categories from each of the datasets have already been translated into CatMapper (see How-To Translation above).
Go to the propose merge page.
Click “Load Datasets”.
Choose the datasets you would like to find matches across. As an example, choose Berezkin Folklore and Ethnographic Atlas.
Choose the category domain you would like to match by (e.g., ETHNICITY, LANGUOID…)
Choose the method and option for matching categories:
1. The Standard method only matches categories from different datasets that point to the same CatMapper category.
2. The Refine option (available in Summer 2023) allows the user to limit matches between datasets based on temporal and geospatial proximity between samples from different datasets.
3. The Extended option (available in Summer 2023) allows users to leverage CONTAINS ties between categories when finding matches. If there is no direct match, CatMapper proposes the closest category that can be found via CONTAINS ties.
4. The Extended-languages option (available Summer 2023) allows users to leverage language_of ties as well as glottolog language trees stored in CatMapper to find matches. If the Standard method and Extended option do not find a match, then CatMapper proposes the closest category that can be found via the language tree.
Click “Submit”, and a window will list the proposed matches along with the keys from each of the datasets.
Click “Download Results” to download a spreadsheet with all of the proposed matches.
The user can use the downloaded spreadsheet (with keys) as a link file for building a merge between the datasets.
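
The fallback described for the Extended option can be pictured as a walk along CONTAINS ties. The sketch below is one possible reading, assuming each category has at most one containing category and that the search only moves upward from the unmatched category; CatMapper's actual search may also move downward or weigh several parents. The function name, the `parent_of` mapping, and the sample categories are hypothetical.

```python
def closest_via_contains(cmid, candidates, parent_of):
    """Walk up CONTAINS ties from `cmid` until a candidate is reached.

    `parent_of` maps a category to the category that CONTAINS it.
    `candidates` is the set of categories used by the other dataset.
    Returns (matched category, number of ties walked), or None.
    """
    seen = set()
    current, steps = cmid, 0
    while current is not None and current not in seen:
        if current in candidates:
            return current, steps
        seen.add(current)  # guards against accidental cycles in the data
        current = parent_of.get(current)
        steps += 1
    return None

# Maya CONTAINS Kiche Maya, as in the explore example in this manual.
parent_of = {"Kiche Maya": "Maya"}
print(closest_via_contains("Kiche Maya", {"Kiche Maya"}, parent_of))
print(closest_via_contains("Kiche Maya", {"Maya"}, parent_of))
print(closest_via_contains("Kiche Maya", {"Yoruba"}, parent_of))
```

A result with zero ties walked corresponds to a Standard match; a larger count indicates a looser match that deserves a manual check.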

How can I download a merging template for a published dataset that someone else has already built?

Go to the search page and choose “DATASET”, “Name”.
Input the name of the dataset and record the dataset ID.
Go to the download merging template page
Under “choose merging template ID”, type the dataset ID
Click “Find Merging Template”
Click “Download list of datasets”, and open downloaded spreadsheet on your computer
Make sure all of the datasets listed in the spreadsheet are stored in a working directory folder on your computer.
Fill out the spreadsheet with the location of the working directory as well as the name and location of each dataset within the working directory.
Click “Browse” and choose the completed spreadsheet for uploading to CatMapper.
Choose syntax to output
1. R (available March-April 2023).
2. SPSS (available in Summer 2023).
3. Stata (available in Fall 2023).
4. SAS (available in Fall 2023).
Click “generate merging files”
Click “download merging files”
Store the downloaded files in the working directory.
For R, open and run the R syntax.

How can I build and share a merging template? Coming Summer 2023.

CatMapper API User Guide

This user guide documents the available endpoints for CatMapper’s API. The API base URL is https://catmapper.org/api.

Examples can be run within a browser or through various API clients by affixing the examples to the base URL (e.g., https://catmapper.org/api/CMID?database=SocioMap&cmid=SM1).

Questions and feedback can be directed to support@catmapper.org.

API User Guide: Search Endpoint

Endpoint Description

The /search endpoint runs the same kind of search as the explore page, using a single search term or no term at all. It accommodates searches in specific databases and can filter results based on parameters such as domain, year range, country, and context.

HTTP Request Method

GET

Resource URL

/search

Query Parameters

database: The name of the CatMapper database where the search will be conducted. Only ‘SocioMap’ and ‘ArchaMap’ are valid values.
term (optional): The search term. If not provided, the search will return all results.
property (optional): Specifies the property to search by, with options including ‘Name’, ‘CMID’, or ‘Key’.
domain (optional): Specifies the domain category within which the search is conducted. Default is ‘CATEGORY’.
yearStart (optional): The earliest year of data collection or existence of the category. A category is returned if its year range intersects the range from yearStart to yearEnd.
yearEnd (optional): The latest year of data collection or existence of the category.
country (optional): CMID of the ADM0 node with a ‘DISTRICT_OF’ tie.
context (optional): CMID of the parent node in the network.
limit (optional): Limits the number of results returned; defaults to 10000 if not specified.
query (optional): If set to ‘true’, returns the cypher query instead of executing it.

Request Examples

GET /search?database=SocioMap&term=Yoruba&domain=ETHNICITY&property=Name

GET /search?database=ArchaMap&term=Grasshopper&domain=SITE&property=Name

Responses

Successful Response

Status Code: 200 OK
Content:
- If query is set to false, returns a JSON array of search results with fields such as CMID, CMName, country, domain, matching, and matchingDistance.
- If query is set to true, returns a JSON object containing the cypher query and relevant parameters.

Error Response

Status Code: 500 Internal Server Error
Content: A JSON object detailing the error encountered during the search operation.
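
The request examples above can also be assembled programmatically. The sketch below builds a /search URL from the documented query parameters with Python's standard library. The helper name `search_url` is hypothetical; the base URL and parameter names are the ones given in this guide.

```python
from urllib.parse import urlencode

BASE_URL = "https://catmapper.org/api"  # base URL given in this guide

def search_url(database, **params):
    """Build a /search request URL, dropping parameters left as None."""
    if database not in ("SocioMap", "ArchaMap"):
        raise ValueError("database must be 'SocioMap' or 'ArchaMap'")
    query = {"database": database}
    query.update({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}/search?{urlencode(query)}"

url = search_url("SocioMap", term="Yoruba", domain="ETHNICITY", property="Name")
print(url)
```

The resulting URL can be opened in a browser or passed to any HTTP client, for example `urllib.request.urlopen(url)` followed by `json.load` on the response.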

API User Guide: Retrieve CMID Details

Endpoint Description

This API endpoint (/CMID) is designed to retrieve comprehensive details about a specific CatMapperID (CMID) from different databases. It fetches both node properties and their relationships associated with the specified CMID. The endpoint supports the GET method and requires the specification of both the database and the CMID.

HTTP Request Method

GET

Resource URL

/CMID

Query Parameters

database: A string specifying which database to search in. Valid options are:
- SocioMap: Targets the SocioMap database.
- ArchaMap: Targets the ArchaMap database.
cmid: The CatMapperID for which details are to be retrieved. This should be a valid identifier that exists within the specified database.

Request Examples

GET /CMID?database=SocioMap&cmid=SM1

GET /CMID?database=ArchaMap&cmid=AM1

Responses

Successful Response

Status Code: 200 OK
Content: A JSON object containing detailed information about the node and its relationships. The response structure is as follows:
- node: An array of objects, each representing a node property:
  - nodeID: The identifier of the node.
  - nodeProperties: The property name of the node.
  - nodeValues: The value associated with the node property.
- relations: An object mapping relationship IDs to their properties:
  - Each relationship ID will have associated properties and values.

Error Response

Status Code: 500 Internal Server Error
Content: A string message detailing the nature of the error, usually related to incorrect or missing parameters.
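
Because the `node` array holds one object per property, it is often convenient to collapse it into a single dictionary. The sketch below does this for a hand-written sample that follows the response structure described above. The helper name `node_properties` is hypothetical, and the property names and values in the sample are invented for illustration rather than taken from a real response.

```python
def node_properties(response):
    """Collapse the `node` array into a {property name: value} dict."""
    return {item["nodeProperties"]: item["nodeValues"]
            for item in response.get("node", [])}

# Hand-written sample in the documented shape; values are invented.
sample = {
    "node": [
        {"nodeID": 1, "nodeProperties": "CMID", "nodeValues": "SM1"},
        {"nodeID": 1, "nodeProperties": "CMName", "nodeValues": "Example"},
    ],
    "relations": {},
}
print(node_properties(sample))
```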

API User Guide: Retrieve Dataset Details

Endpoint Description

This API endpoint (/dataset) is designed to retrieve detailed information about a dataset based on a given CMID (CatMapperID) from the specified database, filtering additionally by domain categories. This endpoint allows for querying dataset relations and properties within specified domains. It uses a GET method and requires specifying the database, CMID, and optionally the domain.

HTTP Request Method

GET

Resource URL

/dataset

Query Parameters

database: A string identifier specifying the database from which to fetch the dataset. Accepted values are:
- SocioMap: Targets the SocioMap database.
- ArchaMap: Targets the ArchaMap database.
cmid: The CatMapperID of the dataset for which information is to be retrieved.
domain (optional): A category to filter dataset relationships. Defaults to “CATEGORY” if not specified.

Request Examples

GET /dataset?database=SocioMap&cmid=SD1&domain=CATEGORY

GET /dataset?database=ArchaMap&cmid=AD1

Responses

Successful Response

Status Code: 200 OK
Content: A JSON array of objects, each representing details of relationships and properties of datasets related to the specified CMID. Typical properties included are:
- datasetName: The name of the dataset.
- datasetID: The CMID of the dataset.
- CMID: The CatMapperID of related entities.
- CMName: The name of related entities.
- type: The type of relationship.
- Key: Key information of the relationship.
- Other dynamic properties based on the dataset’s schema and the specified domain.

Error Response

Status Code: 500 Internal Server Error
Content: A string message indicating the nature of the error, typically related to incorrect database or CMID parameters or issues with database connections.
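
A common use of the /dataset response is to list the keys a dataset uses for each category. The sketch below groups a hand-written sample by `CMID`, using only the `CMID` and `Key` fields listed above. The helper name `keys_by_category` and all sample values are illustrative assumptions, not real API output.

```python
from collections import defaultdict

def keys_by_category(rows):
    """Map each category CMID to the Keys the dataset uses for it."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["CMID"]].append(row.get("Key"))
    return dict(grouped)

# Hand-written sample in the documented shape; values are invented.
rows = [
    {"datasetID": "SD1", "CMID": "SM1", "CMName": "Example A", "Key": "V131 = 3"},
    {"datasetID": "SD1", "CMID": "SM1", "CMName": "Example A", "Key": "V131 = 4"},
    {"datasetID": "SD1", "CMID": "SM2", "CMName": "Example B", "Key": "V131 = 5"},
]
print(keys_by_category(rows))
```

A category with several keys indicates a many-to-one encoding, where the dataset records the same category under more than one code.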

API User Guide: Retrieve All Datasets

Endpoint Description

This API endpoint (/allDatasets) provides a method to retrieve detailed information about datasets from different databases. The endpoint supports a GET method that requires specifying a particular database from which to fetch datasets.

HTTP Request Method

GET

Resource URL

/allDatasets

Query Parameters

database: A string identifier specifying the database from which to retrieve datasets. Accepted values are:
- SocioMap: Retrieves datasets from the SocioMap database.
- ArchaMap: Retrieves datasets from the ArchaMap database.

Request Examples

GET /allDatasets?database=SocioMap

GET /allDatasets?database=ArchaMap

Responses

Successful Response

Status Code: 200 OK
Content: An array of objects, where each object represents a dataset with the following fields:
- nodeID: Identifier of the dataset node.
- CMName: CatMapper name associated with the dataset.
- CMID: CatMapper ID for the dataset.
- shortName: A shorter, more concise name for the dataset.
- project: The project under which the dataset was created or is maintained.
- Unit: The unit the dataset applies to.
- parent: The parent dataset, if any.
- ApplicableYears: The years to which the dataset is applicable.
- DatasetCitation: Citation information for the dataset.
- District: The district covered by the dataset.
- DatasetLocation: URL or other location of the dataset.
- SubNational: Indicates if the dataset is sub-national.
- DatasetVersion: Version information of the dataset.
- DatasetScope: The scope of the dataset.
- Subdistrict: The subdistrict covered by the dataset.
- Note: Additional notes or comments about the dataset.

Error Response

Status Code: 500 Internal Server Error
Content: A string message describing the error, typically related to an invalid database specification or connection issues.
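
To keep a local copy of the dataset list, the array returned by /allDatasets can be written out as CSV. The sketch below selects a few of the fields listed above and ignores the rest. The helper name `datasets_to_csv`, the default field selection, and the sample record are assumptions for illustration.

```python
import csv
import io

def datasets_to_csv(datasets, fields=("CMID", "CMName", "shortName")):
    """Write selected fields of each dataset object to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(fields),
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(datasets)  # missing fields are written as empty cells
    return buffer.getvalue()

# Hand-written sample record; values are invented.
sample = [{"CMID": "SD1", "CMName": "Example dataset", "project": "Example"}]
print(datasets_to_csv(sample))
```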
