Researcher
Innumerable books have been published on every imaginable topic concerning the Bible. In the world of data, books and similar documents (including videos, magazine or news articles, emails, tweets, and text messages) are known as "unstructured data". Computers generally have a hard time understanding or analyzing unstructured data. "Structured data", on the other hand, is defined in a way that computers can easily use. Think of structured data as information organized in a spreadsheet, with each kind of information in its own defined column. (For example, all birth dates are in a "birth date" column, and all first names are in a "first name" column.)
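To make the distinction concrete, here is a minimal Python sketch. The names, dates, and the `birth_date_of` helper are illustrative assumptions and are not taken from any of the datasets described on this page. It holds the same facts once as a sentence and once as rows with defined columns, and shows how easily a program can query the second form.

```python
import csv
import io

# Unstructured: a sentence a person can read, but a program cannot easily query.
unstructured = "Ada was born on 1815-12-10 and Alan was born on 1912-06-23."

# Structured: the same facts, with each kind of information in its own column.
# (Illustrative sample data only.)
structured_csv = """first_name,birth_date
Ada,1815-12-10
Alan,1912-06-23
"""

# Each row becomes a dictionary keyed by column name.
rows = list(csv.DictReader(io.StringIO(structured_csv)))


def birth_date_of(first_name):
    """Return the birth date recorded for a first name, or None if absent."""
    for row in rows:
        if row["first_name"] == first_name:
            return row["birth_date"]
    return None


print(birth_date_of("Alan"))
```

Answering the same question from the sentence would require the program to interpret natural language, which is what makes unstructured data hard to analyze.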
As I began writing my second book, I needed some data from the Bible, but I was surprised to find that almost none of the information contained in the Scriptures was available as machine-readable "structured data". I couldn't do the analysis because the data just wasn't available.
When I reached "middle age", instead of having the clichéd "mid-life crisis", I discovered a "mid-life passion": researching the information contained in the Bible and making it available to others as structured data for their studies and projects.
Published Resources
I started with some Christian reference books that are in the public domain, including:
My next challenge was to build a complete dataset of ancient Scriptural texts, including the Leningrad Codex, Codex Alexandrinus, the Samaritan Pentateuch, and Targum Onkelos, along with English translations. This dataset became:
While this was an enjoyable project, I craved something more substantial and more detailed from within Scripture itself. I started a collection of datasets focused on the "who", "what", "when", and "where" details of the Bible. Most of these are still works in progress, but the collection has become:
Stephenson's Bible Data
- Bible Data- Book (Data about each book of the Bible)
- Bible Data- Reference (Data about the chapters and verses within each book)
- Bible Data- Hebrew Words (A dataset of the Hebrew and Aramaic words of the Bible as organized by James Strong in his Hebrew and Chaldee Dictionary)
- Bible Data- Commandments (Data about the traditionally enumerated 613 commandments of the Bible)
- Bible Data- Greek Words (A yet-to-be-started dataset of the Greek words of the Bible)
- Bible Data- Person (A dataset listing each named person in the Bible)
- Bible Data- Person Label (Data about the labels, names, nicknames, and titles for each person)
- Bible Data- Person Relationship (A dataset with relationships like father, mother, son, daughter, husband, wife, master, servant, etc.)
- Bible Data- Person Verse (Data about the specific people named in each verse of the Bible)
- Bible Data- Person Verse Tanakh (Recognizing that Jewish students and scholars may not be interested in Christian literature, this is the Person Verse dataset focused exclusively on the Hebrew Scriptures.)
- Bible Data- Person Verse Apostolic Writings (Christian students and scholars might want to narrow their focus to these Scriptures.)
- Bible Data- Event (A dataset of events that occurred in the Bible, primarily focused on the births and deaths of individuals but also including significant events like Creation, the Flood, and the Exodus.)
- Bible Data- Epoch (Data about periods of time in Scripture: the life of Abraham, the 40-year wandering of Israel in the desert, the reign of King Saul, King David, and King Solomon, etc.)
All of the links above are to the data.world version of the datasets. If you prefer to use GitHub, you can access the same datasets there.
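As a rough illustration of how separate datasets such as Person and Person Relationship might be combined, here is a small Python sketch. The column names (`person_id`, `name`, `relationship_type`, `related_person_id`), the identifiers, and the `related_to` helper are all hypothetical. The published datasets may use different names and layouts, so check their actual headers before adapting this.

```python
import csv
import io

# Hypothetical extracts; real column names and identifiers may differ.
person_csv = """person_id,name
p1,Abraham
p2,Isaac
p3,Jacob
"""

# Assumed meaning of a row: person_id is the <relationship_type> of related_person_id.
relationship_csv = """person_id,relationship_type,related_person_id
p1,father,p2
p2,father,p3
"""

# Map each identifier to a name so relationship rows can be made readable.
people = {
    row["person_id"]: row["name"]
    for row in csv.DictReader(io.StringIO(person_csv))
}
relationships = list(csv.DictReader(io.StringIO(relationship_csv)))


def related_to(name, relationship_type):
    """Return names of the people to whom `name` stands in the given relationship."""
    ids = {pid for pid, person_name in people.items() if person_name == name}
    return [
        people[row["related_person_id"]]
        for row in relationships
        if row["person_id"] in ids and row["relationship_type"] == relationship_type
    ]


print(related_to("Abraham", "father"))
```

Keeping people and relationships in separate tables linked by an identifier means a name is stored only once, and several people who share a name can still be told apart.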
If you have questions about any of these datasets or would like to collaborate on them, please contact me!