A list of digital tools, cribbed from a bunch of resources (5) and put together. Created in collaboration with Dr. Greg Elmer.
This document is organized according to the sort of flow that a digital methods research project would undertake. If you’re crunched for time, your best bet is probably to search for a keyword that you’re looking for (if you’re reading this in a browser, something like Ctrl+F or CMD+F should pull up a search box; if you’re reading this outside of a browser somehow, you probably know how to grep for text).
https://github.com/digitalmethodsinitiative/arpg
This PHP script allows you to enter one or more ASINs and crawl their recommendations up to a user-specified depth.
http://tools.digitalmethods.net/beta/amazon
Provides different analytics for Amazon.com’s book search
https://tools.digitalmethods.net/beta/appTrackers
The DMI’s App Tracker Tracker is a tool to detect a set of predefined fingerprints of known tracking technologies or other software libraries.
.csv Get
https://github.com/fizx/csvget/tree/master
Scrape elements from a website and generate a .csv file.
Use: grab select data like headlines, categories, etc.
http://tools.digitalmethods.net/beta/disqusScraper
This tool scrapes threads and comments from websites implementing the Disqus (http://www.disqus.com/) commenting system.
http://tools.digitalmethods.net/beta/githubOrgsLaunch
Extract the meta-data of organizations on Github
http://tools.digitalmethods.net/beta/githubReposMeta/
Extract the meta-data of Github repositories
https://tools.digitalmethods.net/beta/githubRepos/
Scrape Github for forks of projects
https://tools.digitalmethods.net/beta/github/
Scrape Github for user interactions and user to repository relations
https://tools.digitalmethods.net/beta/githubUserMeta/
Extract meta-data about users on Github
https://tools.digitalmethods.net/beta/githubContributors/
Find out which users contributed source code to Github repositories
http://tools.digitalmethods.net/beta/scrapeGoogle/autocomplete.php
Retrieves autocomplete suggestions from Google
http://tools.digitalmethods.net/beta/googleImages
Query images.google.com with one or more keywords, and/or use images.google.com to query specific sites for images.
http://tools.digitalmethods.net/beta/googlePlaySimilar
DMI Google Play Similar Apps is a simple tool to extract the details of individual apps, collect ‘Similar’ apps, and extract their details.
http://tools.digitalmethods.net/beta/googleReverseImages
Scrape Google for occurrences of images
http://tools.digitalmethods.net/beta/searchEngineScraper/
Batch queries Google. Query the resonance of a particular term, or a series of terms, in a set of Websites.
http://tools.digitalmethods.net/beta/imagesDeep
Scrape images from a single page.
https://tools.digitalmethods.net/beta/instagramLoader/
Easily scrape images from Instagram based on hashtag, location, or user data. If the website asks you for a login, try from a different internet connection.
https://tools.digitalmethods.net/beta/internetArchiveWaybackMachineLinkRipper
Scrapes links from the Wayback Machine
https://tools.digitalmethods.net/beta/waybackNetworkPerYear/
Enter a set of URLs and the archived versions closest to 1 July for a specific year are retrieved. Thereafter links are extracted and a network file is output.
http://tools.digitalmethods.net/beta/itunesStore
Query the iTunes store and grab both tabular and .gdf data regarding results.
https://tools.digitalmethods.net/beta/newsAgencies/
A basic scraper that searches various news agencies for particular keywords and extracts titles, images, dates and full text.
An all-in-one solution for scraping websites, including the ability to scrape platform pages. Closed source, paid, and requires a sign-up, although the website offers a 14-day demo trial.
Using Octoparse for Instagram
Octoparse provides a tutorial for scraping Instagram. It can be found on their website.
https://tools.digitalmethods.net/beta/searchEngineScraper/
A browser extension that allows you to build scrapers, scrape websites, and export data in .csv format. Closed-source, but the browser extension is free.
http://tools.digitalmethods.net/beta/wikitoc/
Scrape Table of Contents for revisions of a wikipedia page and explore the results by moving a slider to browse across chronologically ordered TOCs.
https://tools.digitalmethods.net/beta/wikipediaCategoryAnalysis
Scrape Wikipedia for the categories of articles and the categories of related articles in different languages.
http://tools.digitalmethods.net/beta/wikipedia2geo/
Scrapes Wikipedia history and does IP to Geo for anonymous edits
https://github.com/philbot9/youtube-comment-scraper-cli
Scrape comments from YouTube pages.
Use: uh… scrape comments from YouTube pages.
Create datasets from webforums such as 4chan and Reddit and perform textual analysis on the resulting datasets. Login required.
http://tools.digitalmethods.net/beta/proxies
Check whether a URL is censored in a particular country by using proxies located around the world.
http://tools.digitalmethods.net/beta/expandTinyUrls/
Expands URLs that have been shortened by tools like tinyurl.com or bit.ly.
http://tools.digitalmethods.net/beta/geoIP/
Translates URLs or IP addresses into geographical locations
Login required; contact me at ab {at} anthbrtn.com
An instance of the University of Amsterdam’s Twitter Capture and Analysis toolkit accessible to Ryerson students.
http://tools.digitalmethods.net/beta/linkRipper/
Capture all internal links and/or outlinks from a page.
http://tools.digitalmethods.net/robots
Display a site’s robot exclusion policy.
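To make the idea of a robot exclusion policy concrete, here is a minimal sketch of reading one with Python's standard library. The robots.txt content and the example.org URLs are made up for illustration, and this is not how the DMI tool itself is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a parser that can answer access questions."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Ask whether a generic crawler ("*") may fetch two example pages.
allowed_public = parser.can_fetch("*", "https://example.org/index.html")
allowed_private = parser.can_fetch("*", "https://example.org/private/data.html")
print(allowed_public, allowed_private)
```

For a live site you would typically call set_url() with the site's robots.txt address and then read() instead of parsing inline text.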
http://tools.digitalmethods.net/beta/screenshotGenerator
Produce screenshots for a list of URLs
http://tools.digitalmethods.net/beta/sourceCodeSearch
Loads a URL and searches for patterns in the page’s source code.
http://tools.digitalmethods.net/beta/textRipper
Rip all non-html (i.e. text) from a specified page.
http://tools.digitalmethods.net/beta/timestamp
Rips and displays a web page’s last modification date (using the page’s HTML header). Beware of dynamically generated pages, where the date stamps will be the time of retrieval.
http://tools.digitalmethods.net/beta/triangulate/
Enter two or more lists of URLs or other items to discover commonalities among them. Possible visualizations include a Venn Diagram.
https://tools.digitalmethods.net/netvizz/tumblr/Launch
Analyze co-hashtags and other basic text information from Tumblr posts.
https://twxplorer.knightlab.com/
Search recent tweets and analyze them.
Use: if you want a quick analysis that the TCAT doesn’t provide.
http://tools.digitalmethods.net/beta/wikipediaCrosslingualImageAnalysis
Makes the images of all language versions of a Wikipedia article comparable.
http://tools.digitalmethods.net/beta/wikipediaEntryCheck/
This tool checks if the issues exist as a Wikipedia page, i.e., an article. If it exists it checks whether the organization is mentioned on that page.
https://tools.digitalmethods.net/netvizz/youtube/
A collection of simple tools for extracting data from the YouTube platform via the YouTube API v3.
http://www.secretgeek.net/agnes/twoWay.html
Convert .csv data to .json and vice-versa.
Use: much API data is returned as .json-formatted files.
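If you would rather do this conversion locally, the round trip can be sketched in a few lines of Python using only the standard library. The function names and the sample data are hypothetical, and the sketch assumes flat records (a list of objects with the same keys), which is not true of all API output.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Turn CSV text with a header row into a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, indent=2)


def json_to_csv(json_text: str) -> str:
    """Turn a JSON array of flat objects back into CSV text."""
    rows = json.loads(json_text)
    if not rows:
        return ""
    out = io.StringIO()
    # Assumes every object shares the keys of the first one.
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


sample_csv = "name,followers\nalice,10\nbob,25\n"
as_json = csv_to_json(sample_csv)
round_trip = json_to_csv(as_json)
print(as_json)
```

Note that every value comes back as a string; CSV carries no type information, so numbers have to be converted separately if you need them.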
.csv to Table
https://github.com/vividvilla/csvtotable
Convert .csv files to a searchable and sortable HTML table.
Use: visualize and analyze data formatted in .csv
https://github.com/eBay/tsv-utils
Analyze data saved with tab delimiters, as opposed to the standard comma. Yes, it’s that eBay.
Use: perform maintenance and reading on tab-separated files.
https://github.com/python-mario/mario
Gain access to the Python programming language’s variety of tools and libraries to perform analysis on .csv, .json, .html files and more.
Use: pretty much any analysis and conversion under the sun; it’s a powerful toolkit but requires reading the documentation to figure out your own use-case.
A Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.
Use: analyze graphs and networks and return the results using Python.
A web-based data management, network analysis & visualisation environment.
Use: an all-in-one suite for analyzing, managing and graphing data.
https://www.papaparse.com/demo
A browser-based tool that allows you to parse and analyze .csv data.
Use: look for basic patterns and characteristics of a .csv.
https://github.com/BurntSushi/xsv
A command-line toolkit to analyze and investigate .csv files.
Use: easily find out things like frequencies of data, different values, and correlations.
https://github.com/antonycourtney/tad
Tad is a desktop application for viewing and analyzing tabular data such as .csv files.
Use: easily create “pivot tables” to analyze your data, among other csv functions.
https://github.com/TAPoR-3-Tools/Tapor-Coding-Tools
A collection of coding tools, mostly in Python, to analyze text.
Use: worth exploring to find programming examples for the analysis of text. Many use-cases in the repository.
.csv Editor
https://www.ronsplace.eu/products/ronseditor
Deal with massive .csv files easily.
Use: organize, read, and analyze .csv files that would normally crash a spreadsheet program.
https://github.com/faradayio/scrubcsv
Remove bad lines from a .csv file and normalize the rest.
Use: sometimes .csv files exported from SQL databases have errors, and the output of many tools here, such as the YouTube data tools and the Twitter Capture and Analysis toolkit, is exported this way. This tool discards those error-ridden rows and allows you to read the files.
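The general idea of discarding malformed rows can be sketched in Python as follows. This is only an illustration under a simple assumption (a row is "bad" when its number of fields differs from the header's); it is not scrubcsv's actual algorithm, and the function name and sample data are hypothetical.

```python
import csv
import io


def scrub_rows(csv_text: str):
    """Keep rows whose field count matches the header; count the rest as dropped."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    good_rows = [header]
    dropped = 0
    for row in reader:
        if len(row) == len(header):
            good_rows.append(row)
        else:
            dropped += 1
    return good_rows, dropped


# The third line has an extra field and is treated as malformed.
sample = "id,text\n1,hello\n2,broken,extra\n3,fine\n"
good_rows, dropped = scrub_rows(sample)
print(good_rows, dropped)
```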
A tool for cleaning data; transforming it from one format into another; and extending it with web services and external data. OpenRefine can be used to scrape data from websites or convert data between formats. It also makes it easy to save the processing steps to a file that can be loaded back into the tool at a later time, making it easy to repeat the process again on a different set of data.
http://tools.digitalmethods.net/beta/bubbleline/
Input tags and values to produce relatively sized bubbles. Output is an svg.
https://folk.uib.no/nfylk/concordle/
Generate word clouds and see word correlations in a given text. Calls itself the “not-so-pretty cousin of Wordle” (below).
Use: basic text analysis of word frequencies, along with visualization.
http://tools.medialab.sciences-po.fr/iwanthue/
Generate and refine palettes of optimally distinct colors. (by Sciences-Po)
Datawrapper allows users to create a variety of basic charts and graphs using submitted tabular data.
http://tools.digitalmethods.net/beta/deduplicate
Replicates the tags in a tag cloud by their value
http://tools.digitalmethods.net/beta/dorling/
Input tags and values to produce a Dorling Map (i.e. bubbles). Output is an svg.
http://hyperstudio.mit.edu/software/chronos-timeline/
Chronos allows scholars and students to dynamically present historical data in a flexible online environment.
http://tools.digitalmethods.net/beta/lippmannianDeviceToGephi
This tool allows one to visualize the output of the Lippmannian device as a network with Gephi.
http://tools.digitalmethods.net/beta/tagcloud/
Takes raw text, counts the words and returns an ordered, unordered or alphabetically ordered tagcloud.
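The counting step behind a tag cloud is simple enough to sketch in Python. The tokenization rule here (lowercase runs of letters and apostrophes) is an assumption made for illustration; the DMI tool may split words differently.

```python
import re
from collections import Counter


def count_words(text: str) -> Counter:
    """Count lowercase words; a 'word' here is a run of letters or apostrophes."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def ordered_tags(counts: Counter):
    """Most frequent first, ties broken alphabetically."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def alphabetical_tags(counts: Counter):
    """Tags sorted by word rather than by frequency."""
    return sorted(counts.items())


counts = count_words("The cat saw the other cat. The end.")
by_frequency = ordered_tags(counts)
by_name = alphabetical_tags(counts)
print(by_frequency)
```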
Rawgraphs is an online tabular data processing program that allows users to create advanced charts and graphs using submitted tabular data.
https://scene.knightlab.com/
Create a multimedia story told through 3D “VR” tools.
https://densitydesign.github.io/strumentalia-seealsology/
Create a graph out of the “see also” networks between given Wikipedia pages.
A collection of free, open-source web widgets, mostly for data visualizations.
http://soundcite.knightlab.com/
Stitch together audio from various sources and embed it within a readable text.
http://storyline.knightlab.com/
Easy-to-use tool to build an annotated, interactive line chart.
http://storymap.knightlab.com/
Create a narrative, sequential story that moves through locations on a map.
http://tools.medialab.sciences-po.fr/table2net/
Extract a network from a table. Set a column for nodes and a column for edges. It deals with multiple items per cell. (by Médialab Sciences-Po)
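One way to read "a column for nodes and a column for edges" is that two nodes are linked whenever they share a value in the edge column. The sketch below follows that reading, which is an assumption and may not match Table2Net's actual options; the column names, the ";" separator, and the sample table are all hypothetical.

```python
import csv
import io
from collections import defaultdict
from itertools import combinations


def table_to_edges(csv_text: str, node_col: str, edge_col: str, sep: str = ";"):
    """Link two nodes when they share at least one item in the edge column."""
    groups = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # A cell may hold several items, e.g. "privacy; data".
        for item in row[edge_col].split(sep):
            item = item.strip()
            if item:
                groups[item].add(row[node_col])
    edges = set()
    for nodes in groups.values():
        for a, b in combinations(sorted(nodes), 2):
            edges.add((a, b))
    return sorted(edges)


table = "author,tags\nana,privacy; data\nben,data\ncy,maps\n"
edges = table_to_edges(table, node_col="author", edge_col="tags")
print(edges)
```

The resulting edge list could then be written out in whatever format your network tool expects.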
http://tools.digitalmethods.net/beta/tagCloudCombinator
Enter two or more tag clouds and the values of each tag will be summed.
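Summing tag values across clouds can be sketched with Python's Counter. The representation of a tag cloud as a dictionary of tag to value is an assumption for illustration, as are the sample tags.

```python
from collections import Counter


def combine_tag_clouds(*clouds: dict) -> dict:
    """Sum the values of each tag across any number of tag clouds."""
    total = Counter()
    for cloud in clouds:
        # Counter.update adds counts instead of overwriting them.
        total.update(cloud)
    return dict(total)


cloud_a = {"privacy": 3, "data": 1}
cloud_b = {"data": 4, "web": 2}
combined = combine_tag_clouds(cloud_a, cloud_b)
print(combined)
```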
http://tools.digitalmethods.net/beta/svgcloud/
Input tags and values to produce a tag cloud. Output is in SVG.
http://labs.polsys.net/tools/visual/tagcloud/
Input tags and values in Wordle format to produce an HTML tag cloud or tag list.
http://tools.digitalmethods.net/beta/tagcloudToWordle/
This tool allows one to transform a normal tag cloud into a fancy Wordle one.
A web-based timeline builder
http://timeline.knightlab.com/
Create a visually-appealing annotated timeline.
A tool for creating timelines which can be added to a website or blog.
https://umap.openstreetmap.fr/en/
Create reusable, static, embeddable maps from OpenStreetMap data.
A platform that helps you create customized “views” such as interactive maps and timelines.
An interactive, command-line tool for analyzing and visualizing tabular data.
Use: get quick visualizations and perform other data-scientific methods on tabular data.
Generate word clouds (clouds of words that size the words based on frequency) for a given text.
Use: visualize frequency of words in a given corpus.
http://tools.digitalmethods.net/beta/analyse
Compare two lists of URLs for their commonalities and differences.
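The comparison itself boils down to set operations, sketched here in Python. The function name and example URLs are hypothetical, and the sketch compares URLs as exact strings, so "http://example.org" and "http://example.org/" would count as different.

```python
def compare_url_lists(first, second):
    """Return the URLs shared by both lists and those unique to each."""
    a, b = set(first), set(second)
    return {
        "common": sorted(a & b),
        "only_first": sorted(a - b),
        "only_second": sorted(b - a),
    }


list_one = ["http://example.org/a", "http://example.org/b"]
list_two = ["http://example.org/b", "http://example.org/c"]
comparison = compare_url_lists(list_one, list_two)
print(comparison)
```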
An IPython notebook that walks the user through performing complex sentiment analysis of passages such as tweets. You can download the notebook and run it yourself (which requires JupyterLab), or read the text for an example.
Use: learn how to use Python for sentiment analysis; perform sentiment analysis on texts.
Gephi is a visualization and exploration software for all kinds of graphs and networks.
Use: analyze the .gdf and .gxml files returned by many scraping and collection tools. The most robust tool available, but sometimes slow and hard to configure; an online alternative is Polinode, below.
http://tools.digitalmethods.net/beta/harvestUrls/
Extract URLs from text, source code or search engine results. Produces a clean list of URLs.
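A rough version of URL harvesting can be sketched with a regular expression in Python. The pattern is a deliberately simple assumption (it only looks for http and https links and will miss or mangle some edge cases), and it is not the DMI tool's implementation.

```python
import re

# Simplified pattern: http(s) followed by anything up to whitespace, a quote, or an angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def harvest_urls(text: str):
    """Extract URLs in order of first appearance, without duplicates."""
    found = []
    for match in URL_PATTERN.findall(text):
        url = match.rstrip(".,;:!?)")  # drop trailing sentence punctuation
        if url not in found:
            found.append(url)
    return found


sample = (
    'See <a href="http://example.org/a">this</a> and '
    "https://example.com/b, then http://example.org/a again."
)
urls = harvest_urls(sample)
print(urls)
```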
http://juxtapose.knightlab.com/
Easily compare two images within a frame.
http://tools.digitalmethods.net/beta/text_cat/
Detects language for given URLs. The first 1000 characters on the Web page(s) are extracted, and the language of each page is detected.
https://tools.digitalmethods.net/beta/lippmannianDevice/
The Lippmannian device is named after Walter Lippmann, and provides a coarse means of showing actor partisanship.
NodeXL is a plugin to Microsoft Excel that allows you to visualize and analyze data beyond what the program has normally built in.
Use: visualize and analyze data using Microsoft Excel (although for a faster, lighter, and free alternative, see LibreOffice).
http://hdlab.stanford.edu/palladio/
Various analyses of historical data in tabular format.
Login required
Polinode is an online tool that allows for the opening and basic manipulation of .gdf files.
Use: analyze the .gdf and .gxml files returned by many scraping and collection tools. An online tool that is not as powerful as Gephi, above, but easier to understand and get started with.
http://tools.digitalmethods.net/beta/sentences
Rip text from a specified page and force line breaks between sentences.
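Forcing line breaks between sentences can be approximated in Python as below. The rule used (split on whitespace that follows a period, exclamation mark, or question mark) is a naive assumption that will break on abbreviations such as "Dr."; it only illustrates the idea.

```python
import re


def split_sentences(text: str) -> str:
    """Put each sentence on its own line using a naive punctuation rule."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return "\n".join(part for part in parts if part)


one_per_line = split_sentences("First one. Second one! Third?")
print(one_per_line)
```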
https://medialab.github.io/table2net/
Parse tabular data for relationships and convert it into a network.
http://tools.digitalmethods.net/beta/tldCounts/
Enter URLs, and count the top-level domains.
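Counting top-level domains can be sketched with Python's standard library. This version simply takes the last dot-separated label of each hostname, which is an assumption: it treats "co.uk" as "uk" and does not consult a public suffix list.

```python
from collections import Counter
from urllib.parse import urlparse


def count_tlds(urls) -> Counter:
    """Count the last label of each URL's hostname (e.g. 'net', 'com')."""
    counts = Counter()
    for url in urls:
        host = urlparse(url).hostname
        if host and "." in host:
            counts[host.rsplit(".", 1)[-1]] += 1
    return counts


tld_counts = count_tlds([
    "https://tools.digitalmethods.net/beta/tldCounts/",
    "http://timeline.knightlab.com/",
    "https://umap.openstreetmap.fr/en/",
    "http://storymap.knightlab.com/",
])
print(tld_counts.most_common())
```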
A web-based tool that provides text reading and basic analysis based on copy-pasted text.
http://digitalhumanities.unc.edu/resources/tools/
https://digitalhumanities.duke.edu/tools
https://github.com/dh-tech/awesome-dhtools
https://wiki.digitalmethods.net/Dmi/ToolDatabase
http://tools.medialab.sciences-po.fr/