+++
title = "Datasets"
weight = 20
+++

To debug and fuzz Aerogramme, we are looking for email datasets.

## Email datasets

- [stalwartlabs/mail-parser](https://github.com/stalwartlabs/mail-parser/tree/main/tests)
- [basecamp/mail](https://github.com/basecamp/mail/tree/master/spec/fixtures)
- [Enron dataset - 500k entries](https://www.cs.cmu.edu/~enron/)
- [Jeb Bush dataset - 290k entries](https://ab21www.s3.amazonaws.com/JebBushEmails-Text.7z)
- [spambase dataset](https://archive.ics.uci.edu/ml/datasets/spambase) (also contains legitimate emails)
- Mailing lists
  - [W3C](https://lists.w3.org/Archives/Public/)
  - [Wikimedia](https://lists.wikimedia.org/hyperkitty/)
  - [Apache](https://commons.apache.org/mail-lists.html)
    - [tomcat](https://lists.apache.org/list.html?dev@tomcat.apache.org), [kafka](https://lists.apache.org/list.html?dev@kafka.apache.org)
  - [Linux](https://marc.info/?l=linux-kernel)
- Your own inbox
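
As a rough illustration of how such a dataset might be turned into test input, here is a minimal Python sketch. It assumes the dataset has already been downloaded and unpacked so that each file holds one raw message. The names `iter_raw_messages` and `summarize` are hypothetical helpers, not part of Aerogramme, and the Python standard library parser is used only as a sanity check on the input.

```python
import email
import email.policy
from pathlib import Path
from typing import Iterator, Tuple


def iter_raw_messages(root: Path) -> Iterator[Tuple[Path, bytes]]:
    """Yield (path, raw bytes) for every regular file under root.

    Assumption: one file == one raw email message.
    """
    for path in sorted(root.rglob("*")):
        if path.is_file():
            yield path, path.read_bytes()


def summarize(raw: bytes) -> dict:
    """Parse one raw message with the stdlib parser and return a few fields."""
    msg = email.message_from_bytes(raw, policy=email.policy.default)
    return {
        "subject": msg["Subject"],
        "from": msg["From"],
        "is_multipart": msg.is_multipart(),
        "size": len(raw),
    }


if __name__ == "__main__":
    import sys

    # Usage: python scan_dataset.py <dataset-directory>
    for path, raw in iter_raw_messages(Path(sys.argv[1])):
        try:
            print(path, summarize(raw))
        except Exception as exc:  # malformed messages are interesting for fuzzing
            print(path, "failed to parse:", exc)
```

Messages that the reference parser rejects or parses oddly are likely good candidates to feed to Aerogramme first, since they tend to exercise unusual code paths.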