+++
title = "Datasets"
weight = 20
+++

To debug and fuzz Aerogramme, we are looking for email datasets.

## Email datasets

- [stalwartlabs/mail-parser](https://github.com/stalwartlabs/mail-parser/tree/main/tests)
- [basecamp/mail](https://github.com/basecamp/mail/tree/master/spec/fixtures)
- [Enron dataset - 500k entries](https://www.cs.cmu.edu/~enron/)
- [Jeb Bush dataset - 290k entries](https://ab21www.s3.amazonaws.com/JebBushEmails-Text.7z)
- [spambase dataset](https://archive.ics.uci.edu/ml/datasets/spambase) (also contains legitimate emails)
- mailing lists
  - [W3C](https://lists.w3.org/Archives/Public/)
  - [Wikimedia](https://lists.wikimedia.org/hyperkitty/)
  - [Apache](https://commons.apache.org/mail-lists.html): [tomcat](https://lists.apache.org/list.html?dev@tomcat.apache.org), [kafka](https://lists.apache.org/list.html?dev@kafka.apache.org)
  - [Linux](https://marc.info/?l=linux-kernel)
- your own inbox
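
Before feeding one of these datasets to Aerogramme, it can help to check what the corpus actually contains. The following is a minimal sketch, not part of Aerogramme itself: it assumes the dataset has been unpacked into a directory holding one raw message per file (the `./corpus` path and the function names `iter_raw_messages` and `summarize` are hypothetical), and it uses only Python's standard `email` package. Mailing-list archives distributed as mbox files would need `mailbox.mbox` instead.

```python
import email
import email.policy
from pathlib import Path


def iter_raw_messages(root):
    """Yield (path, raw bytes) for every regular file under `root`.

    Assumption: each file holds exactly one raw email message.
    """
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            yield path, path.read_bytes()


def summarize(root):
    """Parse every message and return (total, multipart) counts."""
    total = 0
    multipart = 0
    for _path, raw in iter_raw_messages(root):
        # The parser is lenient: malformed input is recorded as defects
        # on the message object rather than raised as an exception.
        msg = email.message_from_bytes(raw, policy=email.policy.default)
        total += 1
        if msg.is_multipart():
            multipart += 1
    return total, multipart
```

For example, `total, multipart = summarize("./corpus")` gives a rough idea of how many messages exercise MIME multipart handling, which may help when choosing a subset of a large dataset to replay.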