Web Page Printing and Output Format Issues
A well-designed web site looks great! But printing it doesn’t always work so well. There are a lot of options to help solve this problem.
TL;DR: For the vast majority of situations, I recommend:
- Downloadable Static Report: PDF
- Downloadable data for local manipulation/analysis: Excel
On the specific question of Excel versus CSV and similar formats: this is a big problem in the industry, but I firmly believe that Excel output (as much as I may not like Microsoft) is the only practical solution for complex, functional output. The options for output from a web page (technically, from any computer system) currently include:
- “Static” HTML/CSS
- “Dynamic” HTML/CSS/JavaScript
- Static images (JPG, PNG)
- Flash (and similar technologies)
- Plain Text
- CSV
- PDF
- Excel
- JSON and XML
- Other proprietary formats
“Static” HTML/CSS
This is a “traditional” web page.
- Easy to create
- Works on any device
- Printing is problematic – depending on many factors, some of which (browser configuration) are beyond server control, printing will not necessarily produce all the expected output and/or may move things around in unexpected (and undesirable) ways.
- Any changes (e.g., select/deselect an item on a chart) require a full page reload, which is disruptive to the user experience.
“Dynamic” HTML/CSS/JavaScript
Dynamic pages, where JavaScript updates the page in place, are how all but the most basic web pages have operated for many years. This has been taken to an extreme by Google Docs and similar systems, where every mouse move or keystroke results in complex action on the screen, and often in a server round trip (AJAX) to update the server or load new data. Done well, this provides an experience close to that of desktop software but with every detail under server control – i.e., no software is required on the client computer beyond a browser.
- Provides a highly interactive and dynamic user experience
- More work to create than a static/reload-for-changes page
- Printing is even more problematic than with a static HTML/CSS page.
Static images (JPG, PNG)
- Complete server side control – i.e., user actions or browser configuration problems will have absolutely no detrimental effect
- Images can be captured in an application (Word, PowerPoint, desktop publishing, etc.) for other uses.
- Printing is easy. Actually, printing can still have some very minor sizing issues, but that can be handled by capturing in another application and then printing.
- If used as a web page display, no user interaction except via page reload.
- No use of actual data – e.g., if a legend or data table is included, it is just a bitmap and not text that can be manipulated.
Flash (and similar technologies)
Fortunately, we never used Flash for any PMEI products, so no replacement work is needed; it is included here for completeness.
- At one time, this was one of the best ways to generate dynamic web sites.
- Printing is extremely problematic.
Plain Text
A plain text page has absolutely no formatting except tabs and line breaks. It is included for completeness but is normally used only to meet very specific requirements – e.g., simple data to be captured and loaded into another program.
- Extremely easy to produce
- Useful for data capture needed to meet very specific, but simple, requirements.
- Prints great!
- Not a modern user experience, as it has no graphics (not even logos), bold, underline, font sizes, etc.
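As a minimal sketch of the "tabs and line breaks" idea, the following Python produces a plain text page with one record per line and tabs between fields. The function name `to_plain_text` and the sample data are hypothetical, for illustration only.

```python
def to_plain_text(headers, rows):
    """Render rows as plain text: tabs between fields, one record per line."""
    lines = ["\t".join(headers)]
    for row in rows:
        lines.append("\t".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


# Hypothetical sample data, for illustration only.
report = to_plain_text(
    ["Region", "Units", "Revenue"],
    [["North", 120, 1530.5], ["South", 95, 1211.0]],
)
print(report, end="")
```

Nothing beyond the tab and newline characters carries any structure, which is why this prints reliably but cannot convey emphasis, fonts, or images.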
CSV = Comma Separated Values
It is a very simple plain text format but with a minimal amount of structure. That structure is enough to support the spreadsheet concept of rows and columns. For a straight machine-to-machine data transfer it works quite well. However, it does not provide ANY user formatting beyond assignment of data to an X-Y grid. It does not include any bold, underline, images (e.g., logos), column widths, font sizes, etc.
Some high-profile sites (e.g., PayPal) use CSV for their “Excel” reports. That works in the sense that a valid CSV file is, by definition, compatible with Excel. However, those downloaded files do not provide any formatting, so users must apply all formatting themselves, even obvious choices such as “currency format”, on every newly downloaded file.
- Ideal for very simple data transfer – typically first row contains field names and all other rows contain field data.
- Poor user experience. Downloads on a typical desktop will automatically open in a spreadsheet such as Excel, but absolutely no formatting is provided. Any serious spreadsheet user (“analyst”) will need to, at a minimum, resize columns, set decimal places & other field formatting, and add heading information before doing any actual work.
- Printing depends on the user creating reasonable formatting (e.g., if some fields are longer than others, the default load by Excel may result in fields that will be “chopped off” when printed).
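To illustrate the "first row contains field names" layout, here is a small sketch using Python's standard `csv` module. The function name `to_csv` and the sample records are hypothetical.

```python
import csv
import io


def to_csv(field_names, records):
    """Return CSV text: a header row of field names, then one row per record."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(field_names)
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical sample data, for illustration only.
csv_text = to_csv(
    ["Date", "Description", "Amount"],
    [["2024-01-15", "Widget, large", 1250.5], ["2024-01-16", "Gadget", 80]],
)
print(csv_text, end="")

# Reading the file back shows what is lost: every value is now just a string.
rows_read_back = list(csv.reader(io.StringIO(csv_text)))
```

The writer quotes the field that contains a comma, so the grid survives intact, but the file has nowhere to record that the `Amount` column is currency or how wide any column should be.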
PDF
PDF is the ultimate “printing” format. It is designed to provide what-you-see-is-what-you-get output, with fonts, sizes, image locations, etc. all virtually identical across all devices.
- Printing is perfect!
- Display is great. The only problem on display is that while many browsers can display a PDF directly, a PDF does not provide the interactivity or scalability of a regular HTML/CSS web page. So display as a download is perfect but it does not substitute for an actual web page.
- Users can’t easily manipulate the data – i.e., it will stay “as downloaded” except possibly for extreme power users. (i.e., a Pro for security)
- Users can’t easily manipulate the data – i.e., it will stay “as downloaded” except possibly for extreme power users. (i.e., a Con for users who actually need to manipulate/analyze the data)
- More work to generate than simpler formats, though the amount of work is proportional to the level of detail needed, and libraries (e.g., ReportLab for Python) can do quite a bit.
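As a rough sketch of what a very simple server-generated PDF might look like with ReportLab, the following builds a one-page report in memory. It assumes ReportLab is installed; the function name `build_pdf_report`, the one-inch margins, and the font choices are illustrative assumptions, and a real report would likely also need page breaks, tables, and images.

```python
import io


def build_pdf_report(title, lines):
    """Return the bytes of a one-page PDF with a bold title and body lines."""
    # Imported inside the function so this module still loads on systems
    # where ReportLab is not installed.
    from reportlab.lib.pagesizes import letter
    from reportlab.pdfgen import canvas

    buffer = io.BytesIO()
    page = canvas.Canvas(buffer, pagesize=letter)
    _width, height = letter  # page size in points (1/72 inch)

    # Title, one inch in from the left and top edges.
    page.setFont("Helvetica-Bold", 16)
    page.drawString(72, height - 72, title)

    # Body lines, stepping down the page at a fixed line height.
    page.setFont("Helvetica", 11)
    y = height - 108
    for line in lines:
        page.drawString(72, y, line)
        y -= 16

    page.showPage()
    page.save()
    return buffer.getvalue()
```

The returned bytes could be written to a file or sent as a download with an `application/pdf` content type. Every position is given explicitly in points, which is where both the extra work and the predictable printing come from.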
Excel
Excel is a proprietary spreadsheet format from Microsoft. It has become the de facto standard for complex spreadsheet output. Typical libraries (similar for Perl, PHP, Python, etc.) can create native Excel files that include text formatting (bold, underline, italic, font sizes, etc.), colors (both foreground & background), formulas (though often it is easier to do the calculations server-side and store computed values in the spreadsheet), field formatting (decimal places, wrapping, etc.), headings, embedded images (e.g., logos), and much more.
In short, a server-created Excel file can provide a user experience virtually the same as a high-level human-created spreadsheet.
- Printing is perfect!
- Display is great, though only after download as most browsers do not have a built-in Excel viewer.
- Power spreadsheet users can dive into the file immediately without first having to adjust columns and formatting, as is needed with CSV.
- Exporting to other formats (anything supported by Excel) and other data manipulation can be done easily by the users.
- If the ONLY need is save/print then PDF provides a better experience as it prevents most users from making any changes.
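For comparison with CSV, here is a minimal sketch of a formatted Excel file generated in Python. It uses openpyxl, which is simply my choice for illustration among several libraries that write native .xlsx files, and it assumes openpyxl is installed. The function name `build_excel_report` and the two-column layout (description, then amount) are hypothetical.

```python
import io


def build_excel_report(headers, rows):
    """Return .xlsx bytes with a bold header row and a formatted amount column."""
    # Imported inside the function so this module still loads on systems
    # where openpyxl is not installed.
    from openpyxl import Workbook
    from openpyxl.styles import Font

    workbook = Workbook()
    sheet = workbook.active
    sheet.title = "Report"

    # Header row in bold.
    sheet.append(headers)
    for cell in sheet[1]:
        cell.font = Font(bold=True)

    for row in rows:
        sheet.append(row)

    # Assumed layout: column A holds descriptions, column B holds amounts.
    sheet.column_dimensions["A"].width = 30
    for (cell,) in sheet.iter_rows(min_row=2, min_col=2, max_col=2):
        cell.number_format = "#,##0.00"

    buffer = io.BytesIO()
    workbook.save(buffer)
    return buffer.getvalue()
```

The column width, bold headings, and number format here are exactly the adjustments a user would otherwise have to redo by hand on every CSV download.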
JSON and XML
JSON and XML are two formats used primarily by programs for transferring data between systems. They are both technically human-readable, but are not designed for direct human use. I have included them here because many transfers between systems require these formats.
- Clean and simple for machine-to-machine communication (unlike PDF, Excel – though Excel is popular and standardized enough that it can often be used too)
- Far more structured than CSV (e.g., allows for multiple levels of data where CSV is “flat”)
- Easy to create.
- Useless for printing and other direct human interaction
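To show the "multiple levels of data" point concretely, here is a small sketch that serializes the same nested record as JSON and as XML using only the Python standard library. The `order` record and its field names are made up for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical nested record: one order, one customer, several items.
# A flat CSV row could not hold the item list without repeating columns.
order = {
    "order_id": 1001,
    "customer": {"name": "Example Co.", "country": "US"},
    "items": [
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1},
    ],
}

# JSON maps directly onto dicts and lists.
json_text = json.dumps(order, indent=2)

# XML needs the element and attribute layout spelled out by hand.
root = ET.Element("order", id=str(order["order_id"]))
customer = ET.SubElement(root, "customer", country=order["customer"]["country"])
customer.text = order["customer"]["name"]
items = ET.SubElement(root, "items")
for item in order["items"]:
    ET.SubElement(items, "item", sku=item["sku"], qty=str(item["qty"]))
xml_text = ET.tostring(root, encoding="unicode")

print(json_text)
print(xml_text)
```

Either output is easy for another program to parse back into the same structure, and neither is something you would hand to a user to print.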
Other proprietary formats
There are many other formats that have been created over the years.
Some are specific to particular applications but more officially open than Microsoft products (e.g., WordPerfect made their full data file structures available to developers for a nominal fee where Microsoft…), but Excel (for spreadsheets), Word (for documents) and PowerPoint (for presentations) have become de facto standards due to market share, and Excel is the only one of those relevant to typical report/data downloads.
Some have been attempts at “universal” complex data formats, but PDF is the only one that I am aware of that has become a standard, with free readers available for every common platform and numerous libraries (as well as proprietary Adobe products) to create the files.