Multipart Email Messages

i was messing around with building emails and extracting information from them a couple of weeks ago, but i never got around to writing up how they actually work.  It's rather annoying and important, so i figure i should do this before i forget and spend a ridiculous amount of time remembering what i was doing.

One of the applications i've written here at Widen is a service that forwards email from one box to a list of recipients.  It grabs all the emails in a POP3 account, filters out the garbage, extracts the information, and sends that information on to the recipients.  Grabbing the emails is pretty easy, as shown on this site: http://www.javaworld.com/javaworld/jw-10-2001/jw-1026-javamail-p2.html

this website helped me out quite a bit, but they neglected to talk about message types.  here's their code for reading a message:


    // -- Get the message part (i.e. the message itself) --
    Part messagePart=message;
    Object content=messagePart.getContent();

    // -- or its first body part if it is a multipart message --
    if (content instanceof Multipart)
    {
        messagePart=((Multipart)content).getBodyPart(0);
        System.out.println("[ Multipart Message ]");
    }

    // -- Get the content type --
    String contentType=messagePart.getContentType();

    // -- If the content is plain text, we can print it --
    System.out.println("CONTENT:"+contentType);

    if (contentType.startsWith("text/plain") || contentType.startsWith("text/html"))
    {
        InputStream is = messagePart.getInputStream();

        BufferedReader reader = new BufferedReader(new InputStreamReader(is));
        String thisLine=reader.readLine();

        while (thisLine!=null)
        {
            System.out.println(thisLine);
            thisLine=reader.readLine();
        }
    }

it grabs the content.  if the content happens to be a multipart, it grabs the part at index 0 of that multipart and uses that as the message.  that usually returns the message in plain text, but i want to send off the email in its original state, which is almost always HTML based.

I looked around on Wikipedia and used Eclipse's debugger quite a bit to figure out how these multipart messages actually work, and here's what i found.


*multipart/mixed  ---->  *multipart/alternative   ---->   *text/plain
                   ->  attachment                ->   *text/html
                   ->  attachment
                   ->  attachment
                   ->  attachment
                   ->  attachment

where attachment can be any file type: image/gif, image/jpeg, application/octet-stream, etc.

whenever you send an email, the body and all the attachments get clumped into one spot.  the text, the html, and every attachment end up as one long String in the email itself, and the content types (plus the boundary markers between the parts) tell the mail client how to read and parse each piece of data.
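to make the "one long String" idea concrete, here's a rough sketch that uses only the JDK (no JavaMail).  the sample message, the boundary value `b1`, and the class and method names are all made up for illustration, and the splitting is deliberately naive compared to what a real MIME parser does:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class RawMultipartDemo {
    // A hand-written, simplified multipart/alternative message (hypothetical sample).
    static final String RAW =
        "Content-Type: multipart/alternative; boundary=\"b1\"\r\n" +
        "\r\n" +
        "--b1\r\n" +
        "Content-Type: text/plain\r\n" +
        "\r\n" +
        "hello\r\n" +
        "--b1\r\n" +
        "Content-Type: text/html\r\n" +
        "\r\n" +
        "<p>hello</p>\r\n" +
        "--b1--\r\n";

    // Naive split on the boundary delimiter; returns each part's Content-Type value.
    static List<String> partTypes(String raw, String boundary) {
        List<String> types = new ArrayList<>();
        String[] chunks = raw.split(Pattern.quote("--" + boundary));
        // chunks[0] holds the top-level headers, so the parts start at index 1.
        for (int i = 1; i < chunks.length; i++) {
            String chunk = chunks[i];
            if (chunk.startsWith("--")) {
                break; // "--b1--" is the closing delimiter
            }
            for (String line : chunk.split("\r\n")) {
                if (line.toLowerCase().startsWith("content-type:")) {
                    types.add(line.substring("content-type:".length()).trim());
                }
            }
        }
        return types;
    }

    public static void main(String[] args) {
        System.out.println(partTypes(RAW, "b1")); // prints [text/plain, text/html]
    }
}
```

the whole message really is just text: the headers say how the body is carved up, and each part carries its own content type.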

an email's message type can be any one of the four entry points on this graph:
- multipart/mixed - if the message has attachments
- multipart/alternative - if the message has a text/plain alternative for its html part
- text/plain - if the message is just text
- text/html - if the message is just html

once i had this graph, building the code was rather easy.  you've got 4 entry points (indicated by the * above) and 2 exit points.  return text/html if it exists, otherwise return text/plain:


    // needs: java.io.IOException, javax.mail.Message, javax.mail.MessagingException,
    //        javax.mail.Multipart, javax.mail.Part
    private BodyMessage getMessageBody(Message msg) throws MessagingException, IOException
    {
        Part messagePart = msg;

        //isMimeType() ignores case and parameters like charset or boundary
        while(messagePart.isMimeType("multipart/mixed"))
        {
            //if the message type is mixed, grab the first child
            //the other children are attachments
            messagePart = ((Multipart) messagePart.getContent()).getBodyPart(0);
        }

        if(messagePart.isMimeType("multipart/alternative"))
        {
            //if the message type is an alternative, prefer the text/html part
            //and fall back to the first part (normally text/plain)
            Multipart alternatives = (Multipart) messagePart.getContent();
            Part chosen = null;
            for(int i = 0; i < alternatives.getCount(); i++)
            {
                Part candidate = alternatives.getBodyPart(i);
                if(chosen == null || candidate.isMimeType("text/html"))
                {
                    chosen = candidate;
                }
            }
            if(chosen != null)
            {
                messagePart = chosen;
            }
        }

        if(messagePart.isMimeType("text/plain"))
        {
            return new BodyMessage(messagePart.getContent().toString(), "text/plain");
        }
        else if(messagePart.isMimeType("text/html"))
        {
            return new BodyMessage(messagePart.getContent().toString(), "text/html");
        }
        return new BodyMessage("No text found in received message.", "text/plain");
    }

    private class BodyMessage
    {
        public String msg;
        public String mimeType;

        public BodyMessage(String msg, String mimeType)
        {
            this.msg = msg;
            this.mimeType = mimeType;
        }
    }
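
if you want to sanity-check the "prefer text/html, fall back to text/plain" traversal without a mail server, here's a small sketch that walks the same graph recursively.  `Node` is a stand-in i made up for a MIME part (it is not a JavaMail class), and the sample tree is invented.  note that this version would also pick up a text/html attachment if the body had no html part, so treat it as a way to play with the idea rather than a drop-in replacement:

```java
import java.util.Arrays;
import java.util.List;

class MimeTreeDemo {
    // Stand-in for a MIME part: a content type plus either text or child parts.
    static final class Node {
        final String type;
        final String text;
        final List<Node> children;

        Node(String type, String text, Node... children) {
            this.type = type;
            this.text = text;
            this.children = Arrays.asList(children);
        }
    }

    // Depth-first search for the first node whose type matches wantedType.
    static Node find(Node node, String wantedType) {
        if (node.type.equalsIgnoreCase(wantedType)) {
            return node;
        }
        for (Node child : node.children) {
            Node hit = find(child, wantedType);
            if (hit != null) {
                return hit;
            }
        }
        return null;
    }

    // Prefer text/html, fall back to text/plain, otherwise null.
    static Node bestBody(Node root) {
        Node html = find(root, "text/html");
        return html != null ? html : find(root, "text/plain");
    }

    public static void main(String[] args) {
        Node message = new Node("multipart/mixed", null,
            new Node("multipart/alternative", null,
                new Node("text/plain", "hello"),
                new Node("text/html", "<p>hello</p>")),
            new Node("image/gif", null));
        Node body = bestBody(message);
        System.out.println(body.type + ": " + body.text); // prints text/html: <p>hello</p>
    }
}
```

all four entry points work with the same call: pass in a bare text/plain or text/html node and it comes straight back out.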

____

interestingly enough, i had come across this same kind of multipart structure before, when developing a file upload application with ASP/VBScript.

Whenever you want to use an HTML form to send a file somewhere you must use the multipart/form-data encoding type for the form, like so:


<form name="form" id="form" method="post" action="fileupload.jsp" enctype="multipart/form-data">

that will grab all the raw binary data from the file and slap it into the body of the request sent to the next page, where it can be read off the Request object.  parts of the internet aren't so mutually exclusively coded after all, are they? ^_^
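
for comparison with the email case, here's a rough sketch of what a request body for a form like that looks like.  it only uses the JDK, and the boundary `XyZ123`, the field name `upload`, and the file `notes.txt` are all made up.  a real browser picks its own boundary and sends the file as raw bytes rather than as a String:

```java
import java.nio.charset.StandardCharsets;

class FormDataDemo {
    // Builds a single-file multipart/form-data body as text.
    // Fine for a small text file; real binary uploads should be written as bytes.
    static String buildBody(String boundary, String field, String filename, String fileText) {
        return "--" + boundary + "\r\n"
            + "Content-Disposition: form-data; name=\"" + field + "\"; filename=\"" + filename + "\"\r\n"
            + "Content-Type: text/plain\r\n"
            + "\r\n"
            + fileText + "\r\n"
            + "--" + boundary + "--\r\n";
    }

    public static void main(String[] args) {
        String body = buildBody("XyZ123", "upload", "notes.txt", "hello");
        // The request headers name the boundary so the server knows where to split.
        System.out.println("Content-Type: multipart/form-data; boundary=XyZ123");
        System.out.println("Content-Length: " + body.getBytes(StandardCharsets.US_ASCII).length);
        System.out.println();
        System.out.print(body);
    }
}
```

it's the same trick as the email: one stream of data, with a boundary string and per-part headers telling the receiver how to cut it back apart.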

also, from stress testing the file upload form: 2 gigs of data is far too large.  the form can't build and send the data without crashing.  such are the limitations of html.  i'd guess anything more than 2^31 bytes is too much.
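
the 2^31 number is only a guess on my part, but it lines up suspiciously well with the largest value a signed 32-bit integer can hold.  here's a quick sketch of the arithmetic, assuming (and this is purely an assumption) that something along the way tracks the upload size in a 32-bit signed int.  the class and method names are just for illustration:

```java
class TwoGigDemo {
    // True if a byte count fits in a signed 32-bit integer.
    static boolean fitsInInt(long byteCount) {
        return byteCount >= 0 && byteCount <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        long twoGiB = 2L * 1024 * 1024 * 1024;       // 2^31 = 2,147,483,648 bytes
        System.out.println(Integer.MAX_VALUE);       // 2147483647, i.e. 2^31 - 1
        System.out.println(fitsInInt(twoGiB - 1));   // true
        System.out.println(fitsInInt(twoGiB));       // false
        System.out.println((int) twoGiB);            // -2147483648 (wraps around)
    }
}
```

so a file of exactly 2 gigs is one byte past what a 32-bit signed length can describe, which would explain a crash right around that size.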