htmlparser2 is the fastest HTML parser, and takes some shortcuts to get there. If you need strict HTML spec compliance, have a look at parse5.
A new phishing and malware distribution toolkit called MatrixPDF allows attackers to convert ordinary PDF files into interactive lures that bypass email security and redirect victims to credential ...