Welcome to ArchaMap
ArchaMap is actively looking for contributors. See the help page for instructions. Contact admin@catmapper.org for questions or to contribute.
ArchaMap is an open-source tool designed to aid in the integration of multiple complex data sets with different sources, data ontologies, and resolutions. ArchaMap is designed to save time, increase consistency, and document complex data merging processes among multiple sources. This application stores and suggests past translations to build an ever-expanding list of associations to aid in connecting categorical data across different sources. ArchaMap uses previously uploaded categories to build a database of potential category names and includes contextual information to help users find the appropriate match.
ArchaMap saves substantial time when integrating datasets. Furthermore, every dataset that is integrated provides additional alternate names to further improve matching and provides a permanent record showing how the datasets were integrated. This follows open science principles and reduces work for future researchers. An additional benefit is that each category is linked to a dataset. ArchaMap has an explore feature that allows users to find categories through a full text search and then identify all datasets that use that category. This can help tie datasets together, whether they are publications, online datasets, or archived collections.
More information can be found in the following citation. You may also use this citation to reference CatMapper and CatMapper applications.
Hruschka, Daniel J., Robert Bischoff, Matt Peeples, Sharon Hsiao, and Mohamed Sarwat
2022 CatMapper: A User-Friendly Tool for Integrating Data across Complex Categories. SocArXiv Papers.
Early development of CatMapper has been supported by:
- ASU’s Institute for Social Science Research seed grant
- School for Human Evolution and Social Change interdisciplinary research grant
- National Science Foundation (BCS-2051369) through the Human Networks and Data Science and Cultural Anthropology programs.
- Arizona State University's Center for Archaeology and Society
An authorized user is required to upload data
Please contact admin@catmapper.org if you would like to become a registered user.
Build Dataset Stack
Choose datasets to stack
Under development
Automated Merge Process
Upload two datasets to merge. Both datasets must have a `datasetID` column with a valid CMID for each row. Both datasets must have the original `Key` columns specified in the database translation that was previously uploaded to the dataset with the matching CMID. If you have not yet translated and uploaded your dataset, please do so now.
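Before uploading, it can help to check locally that each file carries the required columns. The following is a minimal sketch, not part of CatMapper: the function name `check_merge_input` and the `key_columns` parameter are hypothetical, and the check only confirms that the columns exist and are non-empty. It cannot confirm that a `datasetID` value is a valid CMID.

```python
import csv
import io


def check_merge_input(csv_text, key_columns):
    """Return a list of problems found in one dataset; an empty list means it passed.

    key_columns is the list of original key columns from your translation
    (an assumption of this sketch; adapt it to your own dataset).
    """
    required = ["datasetID", *key_columns]
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    problems = [f"missing column: {col}" for col in required if col not in header]
    if problems:
        return problems
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        for col in required:
            if not (row[col] or "").strip():
                problems.append(f"row {row_number}: empty {col}")
    return problems


# Illustrative data only; EXAMPLE_CMID_1 is a placeholder, not a real CMID.
good = "datasetID,V131,population\nEXAMPLE_CMID_1,3,1200\n"
bad = "datasetID,population\nEXAMPLE_CMID_1,1200\n"

print(check_merge_input(good, ["V131"]))
print(check_merge_input(bad, ["V131"]))
```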
If you have any questions, please contact us at help@catmapper.org.
Application version 0.2.11
CatMapper Manual
PREAMBLE
This manual includes basic introductory information on CatMapper as well as how-to guides on how to use CatMapper’s main functions.
INTRODUCTION
Objectives. CatMapper assists users in:
Exploring where data is available for complex, evolving categories commonly used in the social sciences (e.g., ethnic, religious, language, geospatial, and archaeological categories). For example, where can I find data on speakers of Yoruba, people who identify as Yoruba, or followers of Isese, the Yoruba religion?
Translating categories from new datasets to categories already stored in CatMapper.
Merging data across diverse, external datasets by these complex categories.
Documenting and Sharing their translations and merges so that other users can check and re-use their work.
Apps. CatMapper currently includes two apps aimed at organizing two kinds of categories. SocioMap organizes sociopolitical categories, such as ethnicities, religions, languages, and administrative districts. ArchaMap organizes categories of material objects used in archaeology, including sites, ceramic types, lithic and projectile point types, and faunal types. Our hope in the future is to extend CatMapper’s capabilities to other classes of complex, dynamic categories.
Users. CatMapper accommodates three kinds of users:
Casual Users can use CatMapper’s explore and translate functions to find potential category matches, and find well-sourced and reliable contextual data on categories. They do not need to login. However, they cannot change the database.
Registered Users can upload new proposed matches to CatMapper, and by May 2023, they will also be able to generate and share merging templates from published work. Please email us at admin@catmapper.org if you would like to sign up as a registered user.
Administrative Users have additional access to a range of tools for fixing database errors by directly changing the structure and content of the graph database (e.g., editing tie and node information, moving ties, and merging nodes).
Key Concepts. CatMapper stores information on categories (Table 1), how they are related to each other, and how they are encoded by diverse datasets. CatMapper stores contextual information about categories through a range of ties (e.g., contains, district_of, language_of, religion_of) (Table 2).
Table 1. Current CatMapper Category Domains
App | Primary Domain | Subdomains |
---|---|---|
SocioMap | ETHNICITY | |
SocioMap | LANGUOID | LANGUAGE, DIALECT, FAMILY |
SocioMap | RELIGION | |
ArchaMap | CERAMIC | CERAMIC_TYPE, CERAMIC_WARE |
ArchaMap | PROJECTILE_POINT | PROJECTILE_POINT_CLUSTER, PROJECTILE_POINT_TYPE |
ArchaMap | PERIOD | |
All | DISTRICT | ADM0-ADM4, ADME, ADMD, ADMX, PPL, SITE, REGION |
All | GENERIC | |
Table 2. Ties that store contextual information about categories
Tie | Description |
---|---|
X CONTAINS Y | Y is a sub-category of X |
X DISTRICT_OF Y | X is a geospatial locale for Y |
X LANGUAGE_OF Y | X is a language associated with Y |
X RELIGION_OF Y | X is a religion associated with Y |
X USES Y | Dataset X uses the category Y |
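One way to picture the ties in Table 2 is as directed, labeled edges in a graph. The sketch below is an illustration only and does not reflect CatMapper's internal storage; the `Tie` tuple, the helper functions, and the two example surveys are hypothetical.

```python
from collections import namedtuple

# A tie reads "source RELATION target", following the X/Y order in Table 2.
Tie = namedtuple("Tie", ["source", "relation", "target"])

# Illustrative ties; the survey names are made up for this example.
ties = [
    Tie("Maya", "CONTAINS", "Kiche Maya"),
    Tie("Guatemala", "DISTRICT_OF", "Kiche Maya"),
    Tie("Example Survey A", "USES", "Kiche Maya"),
    Tie("Example Survey B", "USES", "Maya"),
]


def datasets_using(category, ties):
    """Datasets with a USES tie pointing at the category."""
    return sorted(t.source for t in ties
                  if t.relation == "USES" and t.target == category)


def subcategories(category, ties):
    """Categories that the given category CONTAINS."""
    return sorted(t.target for t in ties
                  if t.relation == "CONTAINS" and t.source == category)


print(datasets_using("Kiche Maya", ties))
print(subcategories("Maya", ties))
```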
When a dataset uses a specific category, there is a USES tie from the dataset to the category. This USES tie records: (1) how the dataset encodes that category (i.e., name, key), (2) claims that the dataset makes about the category (e.g., geospatial location, population estimate, associated languages and religions, other categories it contains or is contained by). A key specifies how a specific dataset uniquely encodes a specific category. A simple key involves a single variable and value (e.g., V131 = 3). More complex keys can involve combinations of variables and values (e.g., V131 = 3 AND V024 = 1).
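The difference between simple and complex keys can be sketched as follows. This is a hedged illustration, not CatMapper code: a key is modeled as a mapping from variable names to required values, and `matches_key` is a hypothetical helper that tests whether a data record satisfies every condition in the key.

```python
def matches_key(record, key):
    """True when the record has the required value for every variable in the key."""
    return all(record.get(variable) == value for variable, value in key.items())


simple_key = {"V131": 3}                # V131 = 3
compound_key = {"V131": 3, "V024": 1}   # V131 = 3 AND V024 = 1

# Illustrative survey records.
records = [
    {"V131": 3, "V024": 1},
    {"V131": 3, "V024": 2},
    {"V131": 4, "V024": 1},
]

print([matches_key(r, simple_key) for r in records])
print([matches_key(r, compound_key) for r in records])
```

The compound key selects a strict subset of the records selected by the simple key, which is why more complex keys are needed when one variable alone does not identify a category uniquely.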
A category set is the set of categories in a specific domain encoded by a specific dataset (e.g., ethnicities coded in DHS Guatemala 1995 survey, language spoken coded in WVS Cote D’Ivoire). A merging set is a category set for a dataset that resulted from a merge of multiple datasets. For a clearly defined merge, there must be only many-to-one and one-to-one links from each of the datasets’ category sets to the merging set.
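The requirement that a merge use only many-to-one and one-to-one links can be checked mechanically: no category in a dataset's category set may link to more than one category in the merging set. The sketch below assumes links are available as (dataset category, merging category) pairs; the function name and the example category labels are hypothetical.

```python
from collections import defaultdict


def one_to_many_violations(links):
    """Return dataset categories that link to more than one merging-set category."""
    targets = defaultdict(set)
    for dataset_category, merging_category in links:
        targets[dataset_category].add(merging_category)
    return {source: sorted(found)
            for source, found in targets.items() if len(found) > 1}


# Many-to-one (two categories into "Maya") and one-to-one links: acceptable.
valid_links = [("Kiche", "Maya"), ("Kaqchikel", "Maya"), ("Ladino", "Ladino")]

# "Kiche" now points at two merging categories: not a clearly defined merge.
invalid_links = valid_links + [("Kiche", "Other")]

print(one_to_many_violations(valid_links))
print(one_to_many_violations(invalid_links))
```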
A merging template stores all information necessary to create files and scripts for reproducible merging across a specific set of external datasets, including (1) category sets used in each dataset, (2) variable names, variable values, and value labels used by each dataset to encode categories, (3) the merging set for the final merged dataset, (4) translations between categories in each external dataset and categories in the merging set, and (5) the operators for aggregating relevant variables over many-to-one mappings from external datasets to the merging set.
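Item (5) is easiest to see with a small example. The sketch below is an assumption about how aggregation over a many-to-one mapping could work, not the scripts CatMapper generates; the `aggregate` function, the column name `category`, and the variables are all hypothetical.

```python
from collections import defaultdict


def aggregate(rows, translation, operators):
    """Collapse rows onto merging-set categories, one operator per variable.

    rows:        list of dicts, each with a "category" field plus variables
    translation: dataset category -> merging-set category
    operators:   variable name -> function applied to the list of values
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        target = translation[row["category"]]
        for variable in operators:
            grouped[target][variable].append(row[variable])
    return {
        target: {variable: operators[variable](values)
                 for variable, values in variables.items()}
        for target, variables in grouped.items()
    }


rows = [
    {"category": "A1", "population": 100, "sites": 2},
    {"category": "A2", "population": 300, "sites": 5},
    {"category": "B", "population": 50, "sites": 1},
]
translation = {"A1": "A", "A2": "A", "B": "B"}   # A1 and A2 both map to A
operators = {"population": sum, "sites": max}    # chosen per variable

print(aggregate(rows, translation, operators))
```

Storing the operator alongside the translation matters because the right choice depends on the variable: counts are usually summed, while other variables may call for a maximum, a mean, or a weighted mean.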
Stacks are a key part of a merging template. A stack is a dataset that is joined to other datasets during a merge. A stack may be created by appending multiple datasets.
To support merging, CatMapper also stores metadata on variables used in stored merging templates and how these variables are encoded in relevant datasets.
Main Pages. In each app, there are three sets of pages corresponding with three functions: (1) explore, (2) translate, and (3) merge.
For exploring, a search page allows users to search for categories, datasets, and variables stored in CatMapper. A user can then navigate from a specific search result to a view page which provides additional information on the search result.
For translating, a propose translation page allows users to upload a spreadsheet with categories from a new dataset, to request proposed best matches that already exist in CatMapper, and to view and download those matches. An upload translation page allows registered users to upload final curated matches to CatMapper.
For merging, a propose merge page allows users to request proposed matches between categories from different datasets. An upload merge page allows users to build and upload key components of a merge to completely specify a merging template. A download merging template page allows users to download the files to replicate a merge specified by an uploaded merging template.
HOW-TO: EXPLORING
How can I find contextual information on ethnicities, languages, districts, religions, and datasets?
Click on the Explore tab and choose Search.
Under Select category type, choose which type of category (e.g., ethnicity, language, district, religion) you would like to search for. ADM0-ADM4 indicate districts at specific levels of the administrative hierarchy (with ADM0 = countries, and ADM1 = 1st level administrative subdistricts).
Choose “Name” as the property to search.
In the Enter search term box, enter the category name you are searching for.
If you would like to limit your search to a particular country, under Optional Filters, choose that country in the Country box.
Once you press the Search button, a set of search results will appear below.
Click on the search result you would like to explore, and a View window will open.
This page will include contextual information about the category, including (1) relevant countries and languages, (2) datasets containing the category with info on population estimates, sample size, geospatial location, and name used by the dataset.
As an exercise, try searching for the ethnic groups Yoruba or Pashtun.
How can I identify which datasets include information on a specific ethnicity, language, district, or religion?
The Sample section of the View page contains a row for each dataset that contains information on a specific category. In some cases, a dataset may contain information on that category from different places or times. In those cases, there may be multiple rows for the same ethnicity from the same dataset.
Alternatively, in the Network Explorer box at the bottom of the View window, choose "USES" in the drop down menu labeled "Choose Relationship to View".
This will show links to all datasets with data relevant to the category.
To explore metadata for a dataset of interest, select the relevant node and click the View Selected Node button. The View page for that dataset will appear.
Note: The Network Explorer only includes a maximum of 10 nodes. To view all available nodes, please look at the drop down list under "Choose node to view relationship."
How can I see how a dataset encodes a specific ethnicity, language, district, or religion?
In the Network Explorer box at the bottom of the View window, choose USES in the drop down menu labeled "Choose Relationship to View"
This will show links to all datasets with data relevant to the category.
Hover over the link to the relevant dataset. It will show the variable and variable value used to encode that category.
How can I identify and explore ethnicities, languages, districts and religions that are related to a specific category?
1. In the Network Explorer at the bottom of the View window, the drop down menu labeled "Choose Relationship to View" gives you the choice of exploring CONTAINS ties (e.g., Maya contains Kiche Maya), LANGUAGE_OF ties, DISTRICT_OF ties, and USES ties, which describe how different datasets encode the category.
2. Once you choose the kind of tie you'd like to explore, the network will be shown in the box.
3. If you click on a node of interest and click the View Selected Node button, the View page for that node will appear.
4. Note: The Network Explorer only includes a maximum of 10 nodes. To view all available nodes, please look at the drop down list under "Choose node to view relationship."
How can I view the entire set of datasets with encodings stored in SocioMap?
Click on the Explore tab and choose Search.
Choose DATASET under Select category type.
Choose “Name” as the property to search.
Leave the Enter search term box blank.
If you would like to limit your search to a particular country, choose that country under the Country dropdown menu (under Optional Filters).
Once you press the Search button, a set of search results will appear below.
How can I see what variables are available for a specific ethnicity, language, district, religion?
In the Network Explorer box at the bottom of the View window, choose USES ties.
Click on a dataset that uses the category, and choose "Variables" in the drop down menu labeled "Choose Relationship to View".
This will show links to all variables that are available for a specific category.
To view which datasets include that variable for the specific category, hover on the link to the relevant variable.
Note: The network view only includes a maximum of 10 nodes. To view all available nodes, please look at the drop down list under "Choose node to view relationship."
HOW-TO: TRANSLATING
If I have a spreadsheet of category names from a new dataset, how can I find the best matches from CatMapper?
On the Propose Translations page, click “Browse” and choose the spreadsheet located on your computer.
Under “Input column to match”, choose the column in the spreadsheet with the category names.
Under “Select category type”, choose the domain of categories you would like to translate.
Under “Property to search”, choose “Name”
Limiting searches by country can substantially improve specificity of proposals. To do this, make sure to include a column in the spreadsheet with the CatMapper ID for the country associated with each category.
Click on “Limit by country”
Under “Choose column with country IDs”, choose the appropriate spreadsheet column.
To limit search by other contextual factors (e.g., associated language), make sure to include a column in the spreadsheet with the CatMapper ID for the relevant contextual category
Click on “Limit by context”
Under “Choose column with context IDs”, choose the appropriate spreadsheet column.
To limit search to only those categories used by a specific dataset, make sure to include a column in the spreadsheet with the CatMapper ID for the relevant dataset
Click on “Limit by dataset”
Under “Choose column with dataset IDs”, choose the appropriate spreadsheet column.
To limit search to only those categories in a specific time range, add appropriate limits under “from” and “to” under “Time range (years)”.
Press “Search”
Proposed matches will be shown on the right. Color coding indicates non-exact matches that merit attention.
You can download a spreadsheet with proposed matches. After downloading, you can revise and curate the CatMapper-proposed matches to upload.
How can I revise and curate a downloaded spreadsheet of CatMapper-proposed matches?
The spreadsheet of CatMapper-proposed matches includes a column indicating matches that may need attention. These include cases where (1) no match was found, (2) more than one match was found, and (3) a name match was not perfect.
When CatMapper couldn’t automatically find a match, the user can combine outside searches with CatMapper’s explore functions to identify a matching CatMapper category.
If an appropriate match is found, the user can copy and paste the corresponding CatMapper ID into the CatMapper ID column in the spreadsheet.
If an appropriate match is not found, registered users have access to tools under “upload translations” to add a new category. After the category is created, the user can add its CatMapper ID to the CatMapper ID column in the spreadsheet.
When CatMapper automatically proposes more than one match, the user can combine outside searches with CatMapper’s explore functions to identify the best match. The user keeps the CatMapper ID for the appropriate match and deletes the rows for the excluded matches.
For any other match that the user decides to change, they can replace the existing CatMapper ID with the appropriate CatMapper ID.
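The three attention cases described above can also be reproduced locally when reviewing a large spreadsheet. The following is a rough sketch under stated assumptions: the column names `name`, `matched_name`, and `catmapper_id` are hypothetical and may differ from the downloaded file, the IDs are placeholders, and "not perfect" is interpreted here as any difference between the two name strings.

```python
from collections import Counter


def flag_matches(rows):
    """Add a 'status' field to each proposed match row."""
    # Count how many matches were proposed for each input name.
    counts = Counter(row["name"] for row in rows if row["catmapper_id"])
    flagged = []
    for row in rows:
        if not row["catmapper_id"]:
            status = "no match"
        elif counts[row["name"]] > 1:
            status = "multiple matches"
        elif row["name"] != row["matched_name"]:
            status = "imperfect name match"
        else:
            status = "ok"
        flagged.append({**row, "status": status})
    return flagged


# Illustrative rows; ID_1 to ID_4 are placeholders, not real CatMapper IDs.
proposed = [
    {"name": "Yoruba", "matched_name": "Yoruba", "catmapper_id": "ID_1"},
    {"name": "Pashtun", "matched_name": "Pashtuns", "catmapper_id": "ID_2"},
    {"name": "Maya", "matched_name": "Maya", "catmapper_id": "ID_3"},
    {"name": "Maya", "matched_name": "Maya (Yucatec)", "catmapper_id": "ID_4"},
    {"name": "Unknown group", "matched_name": "", "catmapper_id": ""},
]

for row in flag_matches(proposed):
    print(row["name"], "->", row["status"])
```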
How can I upload the spreadsheet of revised and curated matches?
Log in as a registered user. If you are not already a registered user, please email admin@catmapper.org to sign up.
Go to the “upload translations” page
Click “Create New Dataset” and fill out the necessary details for the dataset from which the new categories are coming. Note the Dataset ID number assigned to the new dataset.
Click “Browse” and choose the spreadsheet on your computer.
Select the domain of categories to be uploaded (e.g., ETHNICITY, ADM1…)
Input the dataset ID for the new dataset you just created. If you can’t remember it, you can find this by searching for the dataset by name in the explore function.
Choose which column contains the category names
Choose which column contains the CatMapper ID
Choose which column contains the unique ID used by the dataset for the category. This is essential for recording the Key used by the dataset for the category.
Click “Upload”. Clicking “Upload Status” will provide information about how long the upload has been running.
HOW-TO: MERGING
I would like to bring together variables from several datasets, merging them by a specific category domain (e.g., by ethnicity). How do I find corresponding categories across the datasets?
Ensure the categories from each of the datasets have already been translated into CatMapper (see How-To Translation above).
Go to the propose merge page.
Click “Load Datasets”.
Choose the datasets you would like to find matches across. As an example, choose Berezkin Folklore and Ethnographic Atlas
Choose the category domain you would like to match by (e.g., ETHNICITY, LANGUOID…)
Choose the method and option for matching categories:
The Standard method only matches categories from different datasets that point to the same CatMapper category.
The Refine option (available in Summer 2023) allows the user to limit matches between datasets based on temporal and geospatial proximity between samples from different datasets.
The Extended option (available in Summer 2023) allows users to leverage CONTAINS ties between categories when finding matches. If there is no direct match, CatMapper proposes the closest category that can be found via CONTAINS ties.
The Extended-languages option (available Summer 2023) allows users to leverage language_of ties as well as glottolog language trees stored in CatMapper to find matches. If the Standard method and Extended option do not find a match, then CatMapper proposes the closest category that can be found via the language tree.
Click “Submit”, and a window will list the proposed matches along with the keys from each of the datasets.
Click “Download Results” to download a spreadsheet with all of the proposed matches.
The user can use the downloaded spreadsheet (with keys) as a link file for building a merge between the datasets.
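The logic behind the Standard method and the Extended option can be sketched in a few lines. This is an interpretation for illustration, not CatMapper's implementation: it assumes each dataset's encodings are available as a mapping from key to CatMapper category ID, that each category has at most one CONTAINS parent, and that "closest" means the nearest ancestor reached by walking upward. All function names, keys, and IDs are hypothetical.

```python
def standard_matches(uses_a, uses_b):
    """Pair keys from two datasets that point to the same CatMapper category."""
    keys_by_category = {}
    for key_b, category in uses_b.items():
        keys_by_category.setdefault(category, []).append(key_b)
    return [(key_a, key_b, category)
            for key_a, category in uses_a.items()
            for key_b in keys_by_category.get(category, [])]


def extended_match(category, parent_of, categories_in_b):
    """Walk up CONTAINS ties until reaching a category the other dataset uses."""
    current, seen = category, set()
    while current is not None and current not in seen:
        if current in categories_in_b:
            return current
        seen.add(current)          # guards against cycles in the tie data
        current = parent_of.get(current)
    return None


# Placeholder keys and category IDs.
uses_a = {"V131 = 3": "C_KICHE", "V131 = 4": "C_YORUBA"}
uses_b = {"eth = 12": "C_MAYA", "eth = 7": "C_YORUBA"}
parent_of = {"C_KICHE": "C_MAYA"}  # C_MAYA CONTAINS C_KICHE

print(standard_matches(uses_a, uses_b))
print(extended_match("C_KICHE", parent_of, set(uses_b.values())))
```

Here the Standard method links only the two Yoruba encodings, while the Extended walk additionally proposes the broader Maya category for the Kiche encoding.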
How can I download a merging template for a published dataset that someone else has already built?
Go to the search page and choose “DATASET”, “Name”.
Input the name of the dataset and record the dataset ID.
Go to the download merging template page
Under “choose merging template ID”, type the dataset ID
Click “Find Merging Template”
Click “Download list of datasets”, and open the downloaded spreadsheet on your computer.
Make sure all of the datasets listed in the spreadsheet are stored in a working directory folder on your computer.
Fill out the spreadsheet with the location of the working directory as well as the name and location of each dataset within the working directory.
Click “Browse” and choose the completed spreadsheet for uploading to CatMapper.
Choose syntax to output
R (available March-April 2023).
SPSS (available in Summer 2023).
Stata (available in Fall 2023).
SAS (available in Fall 2023).
Click “generate merging files”
Click “download merging files”
Store the downloaded files in the working directory.
For R, open and run the R syntax.
How can I build and share a merging template? Coming Summer 2023.