About the Texts
The texts in To The Reader have been extracted from the EEBO-TCP dataset. We selected them using a combination of two criteria: each paratext's position within its book, and the XML attributes assigned to each section. The texts were transcribed by hand from the images in the Early English Books Online database. They are generally very accurate, but gaps remain wherever the text was illegible in the source image. These gaps are marked by the placeholder characters • and ▪.
We have added the following metadata to these texts to support the aims of the project:
- Normalized author name (cleaned up from EEBO-TCP’s metadata)
- Language (identified using the fast.ai library)
- Subject (from the Universal Short Title Catalogue, USTC)
- Normalized printer name (derived from the STC index of printers)
- Normalized publisher name (also derived from the STC index of printers)
Each text is placed into one of eleven categories, a system that reflects the main groups of paratexts we found in EEBO-TCP. These categories are buckets into which we have sorted the thousands of 'div types' that EEBO-TCP uses to label sections of books. For a fuller account of this method, see James Misson and Devani Singh, 'Computing Book Parts with EEBO-TCP', Book History, 25, pp. 503–529.
Many of the errata in EEBO-TCP consist only of tables or lists. Because To The Reader focuses on prose paratexts, we have omitted these and kept only errata that are accompanied by prose.