Welcome to ArchaMap
ArchaMap is actively looking for contributors. See the help page for instructions. Contact admin@catmapper.org for questions or to contribute.
ArchaMap is an open-source tool designed to aid in the integration of multiple complex data sets with different sources, data ontologies, and resolutions. ArchaMap is designed to save time, increase consistency, and document complex data merging processes among multiple sources. This application stores and suggests past translations to build an ever-expanding list of associations to aid in connecting categorical data across different sources. ArchaMap uses previously uploaded categories to build a database of potential category names and includes contextual information to help users find the appropriate match.
ArchaMap substantially reduces the time needed to integrate datasets. Furthermore, every integrated dataset adds alternate names that further improve matching, and it provides a permanent record of how the datasets were integrated. This follows open science principles and reduces work for future researchers. An additional benefit is that each category is linked to a dataset. ArchaMap's explore feature allows users to find categories through a full-text search and then identify all datasets that use each category. This can help link datasets, whether they are publications, online datasets, or archived collections.
Please enter the CMID of the category that provides additional context for the search term. For example, this could be a district category with a DISTRICT_OF relationship to the desired category. This requires a specific label and does not work with the CATEGORY label.
An authorized user is required to upload data.
Please contact admin@catmapper.org if you would like to become a registered user.
Automated Merge Process
Upload two datasets to merge. Both datasets must have a `datasetID` column containing a valid CMID for each row. Both datasets must also contain the original `Key` columns that were specified in the translation previously uploaded for the dataset with the matching CMID. If you have not yet translated and uploaded your dataset, please do so now.
If you have any questions, please contact us at help@catmapper.org.
Application version 0.6
CatMapper Manual
PREAMBLE
This manual includes basic introductory information on CatMapper as well as how-to guides for CatMapper’s main functions.
INTRODUCTION
Objectives. CatMapper assists users in:
Exploring where data is available for complex, evolving categories commonly used in the social sciences (e.g., ethnic, religious, language, geospatial, and archaeological categories). For example, where can I find data on speakers of Yoruba, people who identify as Yoruba, or followers of Isese, the Yoruba religion?
Translating categories from new datasets to categories already stored in CatMapper.
Merging data across diverse, external datasets by these complex categories.
Documenting and Sharing their translations and merges so that other users can check and re-use their work.
Apps. CatMapper currently includes two apps aimed at organizing two kinds of categories. SocioMap organizes sociopolitical categories, such as ethnicities, religions, languages, and administrative districts. ArchaMap organizes categories of material objects used in archaeology, including sites, ceramic types, lithic and projectile point types, and faunal types. In the future, we hope to extend CatMapper’s capabilities to other classes of complex, dynamic categories.
Users. CatMapper accommodates three kinds of users:
Casual Users can use CatMapper’s explore and translate functions to find potential category matches and well-sourced, reliable contextual data on categories. They do not need to log in, but they cannot change the database.
Registered Users can upload new proposed matches to CatMapper, and by May 2023, they will also be able to generate and share merging templates from published work. Please email us at admin@catmapper.org if you would like to sign up as a registered user.
Administrative Users have additional access to a range of tools for fixing database errors by directly changing the structure and content of the graph database (e.g., editing tie and node information, moving ties, and merging nodes).
Key Concepts. CatMapper stores information on categories (Table 1), how they are related to each other, and how they are encoded by diverse datasets. CatMapper stores contextual information about categories through a range of ties (e.g., contains, district_of, language_of, religion_of) (Table 2).
Table 1. Current CatMapper Category Domains
App | Primary Domain | Subdomains |
---|---|---|
SocioMap | ETHNICITY | |
SocioMap | LANGUOID | LANGUAGE, DIALECT, FAMILY |
SocioMap | RELIGION | |
ArchaMap | CERAMIC | CERAMIC_TYPE, CERAMIC_WARE |
ArchaMap | PROJECTILE_POINT | PROJECTILE_POINT_CLUSTER, PROJECTILE_POINT_TYPE |
ArchaMap | PERIOD | |
All | DISTRICT | ADM0-ADM4, ADME, ADMD, ADMX, PPL, SITE, REGION |
All | GENERIC | |
Table 2. Ties that store contextual information about categories
Tie | Description |
---|---|
X CONTAINS Y | Y is a sub-category of X |
X DISTRICT_OF Y | X is a geospatial locale for Y |
X LANGUAGE_OF Y | X is a language associated with Y |
X RELIGION_OF Y | X is a religion associated with Y |
X USES Y | Dataset X uses the category Y |
When a dataset uses a specific category, there is a USES tie from the dataset to the category. This USES tie records (1) how the dataset encodes that category (i.e., name and key) and (2) claims that the dataset makes about the category (e.g., geospatial location, population estimate, associated languages and religions, and other categories it contains or is contained by). A key specifies how a specific dataset uniquely encodes a specific category. A simple key involves a single variable and value (e.g., V131 = 3). More complex keys can involve combinations of variables and values (e.g., V131 = 3 AND V024 = 1).
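The key examples above can be illustrated with a small Python sketch that checks whether a data row matches a key. It assumes, for illustration only, that a key is written as `variable = value` clauses joined by `AND`, exactly as in the examples; CatMapper's actual stored key format may differ, and the function names are hypothetical.

```python
def parse_key(key: str) -> dict[str, str]:
    """Split a key such as 'V131 = 3 AND V024 = 1' into {'V131': '3', 'V024': '1'}.

    Assumption: clauses are joined by ' AND ' and written as 'variable = value'.
    """
    clauses = {}
    for clause in key.split(" AND "):
        variable, sep, value = clause.partition("=")
        if not sep:
            raise ValueError(f"clause without '=': {clause!r}")
        clauses[variable.strip()] = value.strip()
    return clauses


def row_matches_key(row: dict, key: str) -> bool:
    """Return True if the data row has every variable value required by the key."""
    # Values are compared as strings, so 3 in the data matches '3' in the key.
    return all(str(row.get(variable)) == value
               for variable, value in parse_key(key).items())


# A simple key uses one variable; a complex key combines several.
respondent = {"V131": 3, "V024": 1}
print(row_matches_key(respondent, "V131 = 3"))               # True
print(row_matches_key(respondent, "V131 = 3 AND V024 = 2"))  # False
```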
A category set is the set of categories in a specific domain encoded by a specific dataset (e.g., ethnicities coded in the DHS Guatemala 1995 survey, or languages spoken coded in WVS Cote D’Ivoire). A merging set is the category set of a dataset produced by merging multiple datasets. For a clearly defined merge, there must be only many-to-one and one-to-one links from each dataset’s category set to the merging set.
A merging template stores all information necessary to create files and scripts for reproducible merging across a specific set of external datasets, including (1) the category sets used in each dataset, (2) the variable names, variable values, and value labels used by each dataset to encode categories, (3) the merging set for the final merged dataset, (4) translations between categories in each external dataset and categories in the merging set, and (5) the operators for aggregating relevant variables over many-to-one mappings from external datasets to the merging set.
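The many-to-one requirement and the aggregation operators described above can be sketched in Python. This is a hypothetical illustration, not CatMapper's implementation: a translation is represented as a list of (dataset category, merging-set category) pairs, the labels such as `A1` and `M1` are placeholders, and an operator is any Python function applied to the values that collapse into one merging-set category.

```python
from collections import defaultdict
from typing import Callable, Iterable


def one_to_many_sources(translation: Iterable[tuple]) -> list:
    """Return dataset categories linked to more than one merging-set category.

    A clearly defined merge allows only one-to-one and many-to-one links,
    so for a valid translation this list is empty.
    """
    targets = defaultdict(set)
    for source, merged in translation:
        targets[source].add(merged)
    return [source for source, merged in targets.items() if len(merged) > 1]


def aggregate(values: dict, translation: Iterable[tuple],
              operator: Callable[[list], object]) -> dict:
    """Aggregate one variable from dataset categories up to the merging set.

    values: {dataset category: value of the variable}
    translation: (dataset category, merging-set category) pairs
    operator: applied to the list of values that map to one merging-set category
    """
    translation = list(translation)
    problems = one_to_many_sources(translation)
    if problems:
        raise ValueError(f"categories with more than one target: {problems}")
    mapping = dict(translation)
    groups = defaultdict(list)
    for source, value in values.items():
        if source in mapping:  # categories without a translation are skipped
            groups[mapping[source]].append(value)
    return {merged: operator(group) for merged, group in groups.items()}
```

For example, `aggregate({"A1": 10, "A2": 5, "A3": 7}, [("A1", "M1"), ("A2", "M1"), ("A3", "M2")], sum)` returns `{"M1": 15, "M2": 7}`; passing `statistics.mean` instead of `sum` would average the values rather than add them.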
Stacks are a key part of a merging template. A stack is a dataset that is joined to other datasets during a merge. A stack may be created by appending multiple datasets.
To support merging, CatMapper also stores metadata on variables used in stored merging templates and how these variables are encoded in relevant datasets.
Main Pages. In each app, there are three sets of pages corresponding with three functions: (1) explore, (2) translate, and (3) merge.
For exploring, a search page allows users to search for categories, datasets, and variables stored in CatMapper. A user can then navigate from a specific search result to a view page which provides additional information on the search result.
For translating, a propose translation page allows users to upload a spreadsheet with categories from a new dataset, to request proposed best matches that already exist in CatMapper, and to view and download those matches. An upload translation page allows registered users to upload final curated matches to CatMapper.
For merging, a propose merge page allows users to request proposed matches between categories from different datasets. An upload merge page allows users to build and upload key components of a merge to completely specify a merging template. A download merging template page allows users to download the files needed to replicate a merge specified by an uploaded merging template.
HOW-TO: EXPLORING
How can I find contextual information on ethnicities, languages, districts, religions, and datasets?
Click on the Explore tab and choose Search.
Under Select category type, choose which type of category (e.g., ethnicity, language, district, religion) you would like to search for. ADM0-ADM4 indicate districts at specific levels of the administrative hierarchy (with ADM0 = countries, and ADM1 = 1st level administrative subdistricts).
Choose “Name” as the property to search.
In the Enter search term box, enter the category name you are searching for.
If you would like to limit your search to a particular country, under Optional Filters, choose that country in the Country box.
Once you press the Search button, a set of search results will appear below.
Click on the search result you would like to explore, and a View window will open.
This page will include contextual information about the category, including (1) relevant countries and languages, (2) datasets containing the category with info on population estimates, sample size, geospatial location, and name used by the dataset.
As an exercise, try searching for the ethnic groups Yoruba or Pashtun.
How can I identify which datasets include information on a specific ethnicity, language, district, or religion?
The Sample section of the View page contains a row for each dataset that contains information on a specific category. In some cases, a dataset may contain information on that category from different places or times. In those cases, there may be multiple rows for the same category from the same dataset.
Alternatively, in the Network Explorer box at the bottom of the View window, choose "USES" in the drop down menu labeled "Choose Relationship to View".
This will show links to all datasets with data relevant to the category.
To explore metadata for a dataset of interest, select the relevant node and click the View Selected Node button. The View page for that dataset will appear.
Note: The Network Explorer only includes a maximum of 10 nodes. To view all available nodes, please look at the drop down list under "Choose node to view relationship."
How can I see how a dataset encodes a specific ethnicity, language, district, or religion?
In the Network Explorer box at the bottom of the View window, choose USES in the drop down menu labeled "Choose Relationship to View"
This will show links to all datasets with data relevant to the category.
Hover over the link to the relevant dataset. It will show the variable and variable value used to encode that category.
How can I identify and explore ethnicities, languages, districts and religions that are related to a specific category?
1. In the Network Explorer at the bottom of the View window, the drop down menu labeled "Choose Relationship to View" lets you explore CONTAINS ties (e.g., Maya contains Kiche Maya), LANGUAGE_OF ties, DISTRICT_OF ties, and USES ties, which describe how different datasets encode the category.
2. Once you choose the kind of tie you'd like to explore, the network will be shown in the box.
3. If you click on a node of interest and click the View Selected Node button, the View page for that node will appear.
4. Note: The Network Explorer only includes a maximum of 10 nodes. To view all available nodes, please look at the drop down list under "Choose node to view relationship."
How can I view the entire set of datasets with encodings stored in SocioMap?
Click on the Explore tab and choose Search.
Choose DATASET under Select category type.
Choose “Name” as the property to search.
Leave the Enter search term box blank.
If you would like to limit your search to a particular country, choose that country under the Country dropdown menu (under Optional Filters).
Once you press the Search button, a set of search results will appear below.
How can I see what variables are available for a specific ethnicity, language, district, or religion?
In the Network Explorer box at the bottom of the View window, choose USES ties.
Click on a dataset that uses the category, and choose "Variables" in the drop down menu labeled "Choose Relationship to View".
This will show links to all variables that are available for a specific category.
To view which datasets include that variable for the specific category, hover over the link to the relevant variable.
Note: The network view only includes a maximum of 10 nodes. To view all available nodes, please look at the drop down list under "Choose node to view relationship."
HOW-TO: TRANSLATING
If I have a spreadsheet of category names from a new dataset, how can I find the best matches from CatMapper?
On the Propose Translations page, click “Browse” and choose the spreadsheet located on your computer.
Under “Input column to match”, choose the column in the spreadsheet with the category names.
Under “Select category type”, choose the domain of categories you would like to translate.
Under “Property to search”, choose “Name”
Limiting searches by country can substantially improve specificity of proposals. To do this, make sure to include a column in the spreadsheet with the CatMapper ID for the country associated with each category.
Click on “Limit by country”
Under “Choose column with country IDs”, choose the appropriate spreadsheet column.
To limit search by other contextual factors (e.g., associated language), make sure to include a column in the spreadsheet with the CatMapper ID for the relevant contextual category.
Click on “Limit by context”
Under “Choose column with context IDs”, choose the appropriate spreadsheet column.
To limit search to only those categories used by a specific dataset, make sure to include a column in the spreadsheet with the CatMapper ID for the relevant dataset.
Click on “Limit by dataset”
Under “Choose column with dataset IDs”, choose the appropriate spreadsheet column.
To limit search to only those categories in a specific time range, add appropriate limits under “from” and “to” under “Time range (years)”.
Press “Search”
Proposed matches will be shown on the right. Color coding indicates non-exact matches that merit attention.
You can download a spreadsheet with proposed matches. After downloading, you can revise and curate the CatMapper-proposed matches to upload.
How can I revise and curate a downloaded spreadsheet of CatMapper-proposed matches?
The spreadsheet of CatMapper-proposed matches includes a column indicating matches that may need attention. These include cases where (1) no match was found, (2) more than one match was found, and (3) a name match was not perfect.
When CatMapper couldn’t automatically find a match, the user can combine outside searches with CatMapper’s explore functions to identify a matching CatMapper category.
If an appropriate match is found, the user can copy and paste the corresponding CatMapper ID into the CatMapper ID column in the spreadsheet.
If an appropriate match is not found, registered users have access to tools under “upload translations” to add a new category. After the category is created, the user can add its CatMapper ID to the CatMapper ID column in the spreadsheet.
When CatMapper automatically proposes more than one match, the user can combine outside searches with CatMapper’s explore functions to identify the best match. The user keeps the CatMapper ID for the appropriate match and deletes the rows for the excluded matches.
For any other match that the user decides to change, they can replace the existing CatMapper ID with the appropriate CatMapper ID.
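Before uploading, it may help to check that curation left exactly one CatMapper ID for each category in the new dataset. The Python sketch below is a hypothetical helper, not part of CatMapper: it assumes the curated spreadsheet has been saved as CSV, and that `key` (the dataset's own unique ID for each category) and `CMID` are the column names; adjust them to the actual headers in your file. Grouping by the dataset's unique ID rather than by name avoids flagging different categories that happen to share a name.

```python
import csv
from collections import defaultdict


def load_rows(path: str) -> list[dict]:
    """Read a curated spreadsheet saved as CSV into a list of row dicts."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def review_curation(rows, key_col: str = "key", id_col: str = "CMID"):
    """Return (keys with no CMID, keys with more than one distinct CMID).

    key_col and id_col are hypothetical column names for the dataset's unique
    category ID and the CatMapper ID; both lists should be empty before upload.
    """
    ids_by_key = defaultdict(list)
    for row in rows:
        cmid = (row.get(id_col) or "").strip()
        ids_by_key[row[key_col]].append(cmid)
    missing = [key for key, ids in ids_by_key.items() if not any(ids)]
    multiple = [key for key, ids in ids_by_key.items()
                if len({cmid for cmid in ids if cmid}) > 1]
    return missing, multiple


# Hypothetical file name:
# missing, multiple = review_curation(load_rows("curated_matches.csv"))
```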
How can I upload the spreadsheet of revised and curated matches?
Log in as a registered user. If you are not already a registered user, please email admin@catmapper.org to sign up.
Go to the “upload translations” page
Click “Create New Dataset” and fill out the necessary details for the dataset from which the new categories are coming. Note the Dataset ID number assigned to the new dataset.
Click “Browse” and choose the spreadsheet on your computer.
Select the domain of categories to be uploaded (e.g., ETHNICITY, ADM1…)
Input the dataset ID for the new dataset you just created. If you can’t remember it, you can find this by searching for the dataset by name in the explore function.
Choose which column contains the category names
Choose which column contains the CatMapper ID
Choose which column contains the unique ID used by the dataset for the category. This is essential for recording the Key used by the dataset for the category.
Click “Upload”. Clicking “Upload Status” will provide information about how long the upload has been running.
HOW-TO: MERGING
I would like to bring together variables from several datasets merging them by a specific category domain (e.g., by ethnicity). How do I find corresponding categories across the datasets?
Ensure the categories from each of the datasets have already been translated into CatMapper (see How-To Translation above).
Go to the propose merge page.
Click “Load Datasets”.
Choose the datasets you would like to find matches across. As an example, choose Berezkin Folklore and Ethnographic Atlas
Choose the category domain you would like to match by (e.g., ETHNICITY, LANGUOID…)
Choose the method and option for matching categories:
The Standard method only matches categories from different datasets that point to the same CatMapper category.
The Refine option (available in Summer 2023) allows the user to limit matches between datasets based on temporal and geospatial proximity between samples from different datasets.
The Extended option (available in Summer 2023) allows users to leverage CONTAINS ties between categories when finding matches. If there is no direct match, CatMapper proposes the closest category that can be found via CONTAINS ties.
The Extended-languages option (available in Summer 2023) allows users to leverage LANGUAGE_OF ties as well as Glottolog language trees stored in CatMapper to find matches. If the Standard method and Extended option do not find a match, then CatMapper proposes the closest category that can be found via the language tree.
Click “Submit”, and a window will list the proposed matches along with the keys from each of the datasets.
Click “Download Results” to download a spreadsheet with all of the proposed matches.
The user can use the downloaded spreadsheet (with keys) as a link file for building a merge between the datasets.
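As a minimal illustration of using the link file, the Python sketch below performs an inner join of two datasets through the key pairs it lists. It assumes each dataset has already been loaded as a dictionary from its key to a record of variables, and that the link file has been reduced to (key in the first dataset, key in the second dataset) pairs; the function name and the `a_`/`b_` prefixes are hypothetical. Data-frame tools such as pandas `merge` or R's `merge` do the same job on tables.

```python
def merge_via_links(data_a: dict, data_b: dict, links) -> list[dict]:
    """Inner-join two datasets through a link file of (key_a, key_b) pairs.

    data_a, data_b: {key: {variable: value}} for each dataset.
    Variables are prefixed with 'a_' or 'b_' so that names cannot collide.
    """
    merged = []
    for key_a, key_b in links:
        if key_a not in data_a or key_b not in data_b:
            continue  # a link whose key is absent from either dataset is skipped
        row = {"key_a": key_a, "key_b": key_b}
        row.update({f"a_{var}": val for var, val in data_a[key_a].items()})
        row.update({f"b_{var}": val for var, val in data_b[key_b].items()})
        merged.append(row)
    return merged
```

Because each link produces one output row, a many-to-one link file yields one row per linked pair; aggregating those rows to the merging set is a separate step.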
How can I download a merging template that someone else has already built for a published dataset?
Go to the search page and choose “DATASET”, “Name”.
Input the name of the dataset and record the dataset ID.
Go to the download merging template page
Under “choose merging template ID”, type the dataset ID
Click “Find Merging Template”
Click “Download list of datasets”, and open the downloaded spreadsheet on your computer.
Make sure all of the datasets listed in the spreadsheet are stored in a working directory folder on your computer.
Fill out the spreadsheet with the location of the working directory as well as the name and location of each dataset within the working directory.
Click “Browse” and choose the completed spreadsheet for uploading to CatMapper.
Choose syntax to output
R (available March-April 2023).
SPSS (available in Summer 2023).
Stata (available in Fall 2023).
SAS (available in Fall 2023).
Click “generate merging files”
Click “download merging files”
Store the downloaded files in the working directory.
For R, open and run the R syntax.
How can I build and share a merging template? Coming in Summer 2023.
CatMapper API User Guide
This user guide documents the available endpoints for CatMapper’s API. The API base URL is https://catmapper.org/api.
Examples can be run in a browser or through an API client by appending them to the base URL (e.g., https://catmapper.org/api/CMID?database=SocioMap&cmid=SM1).
Questions and feedback can be directed to support@catmapper.org.
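As a minimal Python sketch using only the standard library, a request URL can be built by URL-encoding the query parameters and appending them, with the endpoint, to the base URL. The helper names `build_url` and `get_json` are hypothetical; the parameter names come from this guide.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://catmapper.org/api"


def build_url(endpoint: str, **params) -> str:
    """Append an endpoint and URL-encoded query parameters to the base URL.

    Parameters set to None are left out, so optional parameters can be skipped.
    """
    query = {name: value for name, value in params.items() if value is not None}
    return f"{BASE_URL}/{endpoint.lstrip('/')}?{urlencode(query)}"


def get_json(url: str, timeout: float = 30):
    """Send a GET request and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


url = build_url("CMID", database="SocioMap", cmid="SM1")
print(url)  # https://catmapper.org/api/CMID?database=SocioMap&cmid=SM1
# details = get_json(url)  # requires network access
```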
API User Guide: Search Endpoint
Endpoint Description
The `/search` endpoint performs database searches for a single search term, or for an empty term, as on the explore page. It supports searches in a specific database and can filter results by parameters such as domain, year range, country, and context.
HTTP Request Method
- GET
Resource URL
/search
Query Parameters
- database: The name of the CatMapper database to search. Only ‘SocioMap’ and ‘ArchaMap’ are valid values.
- term (optional): The search term. If not provided, the search will return all results.
- property (optional): The property to search by: ‘Name’, ‘CMID’, or ‘Key’.
- domain (optional): The domain within which the search is conducted. Default is ‘CATEGORY’.
- yearStart (optional): The earliest year of data collection or existence of the category. A category is returned if its year range intersects the range given by yearStart and yearEnd.
- yearEnd (optional): The latest year of data collection or existence of the category.
- country (optional): The CMID of an ADM0 (country) node with a ‘DISTRICT_OF’ tie to the category.
- context (optional): The CMID of the parent node in the network.
- limit (optional): The maximum number of results returned; defaults to 10000 if not specified.
- query (optional): If set to ‘true’, returns the Cypher query instead of executing it.
Request Examples
GET /search?database=SocioMap&term=Yoruba&domain=ETHNICITY&property=Name
GET /search?database=ArchaMap&term=Grasshopper&domain=SITE&property=Name
Responses
Successful Response
- Status Code: 200 OK
- Content:
  - If query is set to false, returns a JSON array of search results with fields such as CMID, CMName, country, domain, matching, and matchingDistance.
  - If query is set to true, returns a JSON object containing the Cypher query and relevant parameters.
Error Response
- Status Code: 500 Internal Server Error
- Content: A JSON object detailing the error encountered during the search operation.
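A hedged Python sketch of building `/search` requests from the parameters documented above. The function names `search_url` and `summarize` are hypothetical, and `prop` stands for the `property` query parameter so that Python's built-in `property` is not shadowed. Results decoded from the response (for example with `urllib.request.urlopen` and `json.load`) can be passed to `summarize`, which keeps only fields listed in this guide.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://catmapper.org/api"
DATABASES = {"SocioMap", "ArchaMap"}


def search_url(database: str, term: Optional[str] = None,
               prop: Optional[str] = None, domain: Optional[str] = None,
               year_start: Optional[int] = None, year_end: Optional[int] = None,
               country: Optional[str] = None, context: Optional[str] = None,
               limit: Optional[int] = None, query: Optional[bool] = None) -> str:
    """Build a /search request URL; arguments left as None are omitted."""
    if database not in DATABASES:
        raise ValueError(f"database must be one of {sorted(DATABASES)}")
    params = {
        "database": database,
        "term": term,            # omit to return all results
        "property": prop,        # 'Name', 'CMID', or 'Key'
        "domain": domain,        # server default is 'CATEGORY'
        "yearStart": year_start,
        "yearEnd": year_end,
        "country": country,      # CMID of an ADM0 node
        "context": context,      # CMID of a parent node
        "limit": limit,          # server default is 10000
        # Sent as the string 'true' or 'false'.
        "query": None if query is None else str(bool(query)).lower(),
    }
    present = {name: value for name, value in params.items() if value is not None}
    return f"{BASE_URL}/search?{urlencode(present)}"


def summarize(results: list) -> list:
    """Keep the CMID, CMName, and domain of each search result."""
    return [(r.get("CMID"), r.get("CMName"), r.get("domain")) for r in results]


print(search_url("SocioMap", term="Yoruba", domain="ETHNICITY", prop="Name"))
# https://catmapper.org/api/search?database=SocioMap&term=Yoruba&property=Name&domain=ETHNICITY
```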
API User Guide: Retrieve CMID Details
Endpoint Description
This API endpoint (`/CMID`) retrieves comprehensive details about a specific CatMapperID (CMID) in the specified database, including both the properties of the node and its relationships. The endpoint supports the GET method and requires both the database and the CMID.
HTTP Request Method
- GET
Resource URL
/CMID
Query Parameters
- database: A string specifying which database to search in. Valid options are:
  - SocioMap: Targets the SocioMap database.
  - ArchaMap: Targets the ArchaMap database.
- cmid: The CatMapperID for which details are to be retrieved. This should be a valid identifier that exists within the specified database.
Request Examples
GET /CMID?database=SocioMap&cmid=SM1
GET /CMID?database=ArchaMap&cmid=AM1
Responses
Successful Response
- Status Code: 200 OK
- Content: A JSON object containing detailed information about the node and its relationships, structured as follows:
  - node: An array of objects, each representing one node property:
    - nodeID: The identifier of the node.
    - nodeProperties: The name of the property.
    - nodeValues: The value of the property.
  - relations: An object mapping relationship IDs to their properties. Each relationship ID has associated properties and values.
Error Response
- Status Code: 500 Internal Server Error
- Content: A string message detailing the nature of the error, usually related to incorrect or missing parameters.
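Because each node property arrives as a separate entry in the `node` array, it can be convenient to collapse the array into a single dictionary. The Python sketch below uses hypothetical helper names and assumes each property name appears only once in `node`; if a name repeats, the last value wins. The decoded JSON from a request such as `GET /CMID?database=SocioMap&cmid=SM1` could be passed to these functions directly.

```python
def node_properties(response: dict) -> dict:
    """Collapse the 'node' array of a /CMID response into {property: value}.

    If a property name appears more than once, the last value wins.
    """
    return {entry["nodeProperties"]: entry["nodeValues"]
            for entry in response.get("node", [])}


def relationship_ids(response: dict) -> list:
    """Return the relationship IDs (the keys of the 'relations' object)."""
    return list(response.get("relations", {}))
```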
API User Guide: Retrieve Dataset Details
Endpoint Description
This API endpoint (`/dataset`) retrieves detailed information about a dataset, identified by its CMID (CatMapperID), from the specified database, optionally filtered by domain. It returns the dataset's relationships and properties within that domain. The endpoint uses the GET method and requires the database and CMID; the domain is optional.
HTTP Request Method
- GET
Resource URL
/dataset
Query Parameters
- database: A string identifier specifying the database from which to fetch the dataset. Accepted values are:
  - SocioMap: Targets the SocioMap database.
  - ArchaMap: Targets the ArchaMap database.
- cmid: The CatMapperID of the dataset for which information is to be retrieved.
- domain (optional): A category domain used to filter the dataset's relationships. Defaults to “CATEGORY” if not specified.
Request Examples
GET /dataset?database=SocioMap&cmid=SD1&domain=CATEGORY
GET /dataset?database=ArchaMap&cmid=AD1
Responses
Successful Response
- Status Code: 200 OK
- Content: A JSON array of objects, each describing a relationship associated with the specified dataset CMID and its properties. Typical properties include:
  - datasetName: The name of the dataset.
  - datasetID: The CMID of the dataset.
  - CMID: The CatMapperID of the related entity.
  - CMName: The name of the related entity.
  - type: The type of relationship.
  - Key: The Key recorded on the relationship.
  - Other dynamic properties based on the dataset’s schema and the specified domain.
Error Response
- Status Code: 500 Internal Server Error
- Content: A string message indicating the nature of the error, typically caused by incorrect database or CMID parameters or by database connection issues.
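A common use of this endpoint is to see which Keys a dataset uses for each related category. The Python sketch below is illustrative only: `dataset_url` and `keys_by_category` are hypothetical names, and it relies on the `CMID` and `Key` fields listed above. Since a dataset can encode the same category more than once (for example for different places or times), each CMID maps to a list of Keys.

```python
from collections import defaultdict
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://catmapper.org/api"


def dataset_url(database: str, cmid: str, domain: Optional[str] = None) -> str:
    """Build a /dataset request URL; domain is omitted when None (server default 'CATEGORY')."""
    params = {"database": database, "cmid": cmid}
    if domain is not None:
        params["domain"] = domain
    return f"{BASE_URL}/dataset?{urlencode(params)}"


def keys_by_category(rows: list) -> dict:
    """Map each related CMID to the list of Keys the dataset uses for it."""
    keys = defaultdict(list)
    for row in rows:
        if row.get("Key") is not None:  # rows without a Key are skipped
            keys[row.get("CMID")].append(row["Key"])
    return dict(keys)
```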
API User Guide: Retrieve All Datasets
Endpoint Description
This API endpoint (`/allDatasets`) retrieves detailed information about all datasets in a specified database. The endpoint supports the GET method and requires specifying the database from which to fetch datasets.
HTTP Request Method
- GET
Resource URL
/allDatasets
Query Parameters
- database: A string identifier specifying the database from which to retrieve datasets. Accepted values are:
  - SocioMap: Retrieves datasets from the SocioMap database.
  - ArchaMap: Retrieves datasets from the ArchaMap database.
Request Examples
GET /allDatasets?database=SocioMap
GET /allDatasets?database=ArchaMap
Responses
Successful Response
- Status Code: 200 OK
- Content: An array of objects, where each object represents a dataset with the following fields:
  - nodeID: Identifier of the dataset node.
  - CMName: CatMapper name associated with the dataset.
  - CMID: CatMapper ID for the dataset.
  - shortName: A shorter, more concise name for the dataset.
  - project: The project under which the dataset was created or is maintained.
  - Unit: The unit the dataset applies to.
  - parent: The parent dataset, if any.
  - ApplicableYears: The years to which the dataset is applicable.
  - DatasetCitation: Citation information for the dataset.
  - District: The district covered by the dataset.
  - DatasetLocation: URL or other location of the dataset.
  - SubNational: Indicates if the dataset is sub-national.
  - DatasetVersion: Version information of the dataset.
  - DatasetScope: The scope of the dataset.
  - Subdistrict: The subdistrict covered by the dataset.
  - Note: Additional notes or comments about the dataset.
Error Response
- Status Code: 500 Internal Server Error
- Content: A string message describing the error, typically related to an invalid database specification or connection issues.
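The full dataset list can be searched locally, for example to look up a dataset's CMID before calling `/dataset`. A minimal Python sketch, assuming the response has been decoded into a list of dictionaries with the `CMID`, `CMName`, and `shortName` fields listed above; `find_datasets` is a hypothetical name.

```python
def find_datasets(datasets: list, text: str) -> list:
    """Return (CMID, CMName) for datasets whose CMName or shortName contains text.

    Matching is case-insensitive; missing or null fields count as empty strings.
    """
    needle = text.casefold()
    hits = []
    for d in datasets:
        names = (d.get("CMName") or "", d.get("shortName") or "")
        if any(needle in name.casefold() for name in names):
            hits.append((d.get("CMID"), d.get("CMName")))
    return hits
```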