Web Page Printing and Output Format Issues
A well-designed web site looks great! But printing it doesn’t always work so well. There are a lot of options to help solve this problem.
TL;DR: For the vast majority of situations, I recommend:
- Online viewable: HTML/CSS, generally with Javascript
- Downloadable Static Report: PDF
- Downloadable data for local manipulation/analysis: Excel
The specific issue of Excel versus CSV deserves a mention up front. It is a big problem in the industry, but I firmly believe that Excel output (as much as I may not like Microsoft) is the only practical solution for complex, functional output.
The options for output from a web page (technically, from any computer system) currently include:
- “Static” HTML/CSS
- “Dynamic” HTML/CSS with Javascript
- Static images (JPG, PNG)
- Flash (and similar technologies)
- Plain Text
- CSV
- PDF
- Excel
- JSON and XML
- Other proprietary formats
“Static” HTML/CSS
This is a “traditional” web page.
Pros:
- Easy to create
- Works on any device
Cons:
- Printing is problematic – depending on many factors, some of which (browser configuration) are beyond server control, printing will not necessarily produce all the expected output and/or may move things around in unexpected (and undesirable) ways.
- Any changes (e.g., select/deselect an item on a chart) require a full page reload, which is disruptive to the user experience.
“Dynamic” HTML/CSS with Javascript
This is how all but the most basic web pages have operated for many years. This has been taken to an extreme by Google Docs and similar systems where every mouse move or keystroke results in complex action on the screen, and often in a server round trip (AJAX) to update the server or load new data. Done well, this provides an experience close to that of desktop software but with every detail under server control – i.e., no software required on the client computer beyond a browser.
Pros:
- Works on nearly any device (the limitations of complex Javascript on small devices and differences between HTML/CSS/Javascript implementations on different browsers have essentially disappeared thanks to tools such as jQuery and industry standards such as HTML5).
- Provides a highly interactive and dynamic user experience
Cons:
- More work to create than a static/reload-for-changes page
- Printing is even more problematic than with a static HTML/CSS page.
Static images (JPG, PNG)
In the earliest web pages, a chart would be created by the server as an image file and displayed by the client. This can be used for more complex information as well. However, HTML/CSS/Javascript can now, with some effort, create nearly any image that can be created on the server, with the significant advantage of interactive usage. That being said, there is still one big use for static images – using one server to capture an image of a web page at high-resolution on a virtual client browser so that an image created via a dynamic Javascript process can be used for other purposes, or printed properly.
Pros:
- Complete server side control – i.e., user actions or browser configuration problems will have absolutely no detrimental effect
- Images can be captured in an application (Word, PowerPoint, desktop publishing, etc.) for other uses.
- Printing is easy. Actually, printing can still have some very minor sizing issues, but that can be handled by capturing in another application and then printing.
Cons:
- If used as a web page display, no user interaction except via page reload.
- No use of actual data – e.g., if a legend or data table is included, it is just a bitmap and not text that can be manipulated.
Flash (and similar technologies)
Before advanced, open Javascript libraries and, especially, HTML5, dynamic, highly interactive web pages were often built with Adobe Flash. Some sites – e.g., www.ziphany.com – still are. Flash was very advanced for its time but has been largely replaced by a combination of HTML5 and various Javascript libraries. Many browsers (stock Android, iPhone) don’t support Flash at all (some used to but don’t any more, some never did) and in typical desktop browsers (Firefox, Chrome), Flash is being deprecated and is no longer automatically available and functional. Adobe has officially called for an end to Flash in 2020.
Fortunately, we never used Flash for any PMEI products, so no replacement work is needed, but I am including it here for completeness.
Pros:
- At one time, this was one of the best ways to generate dynamic web sites.
Cons:
- Deprecated/end-of-life
- Printing is extremely problematic.
Plain Text
A plain text page has absolutely no formatting except tabs and line breaks. Included for completeness but normally only used to meet very specific requirements – e.g., simple data to be captured and loaded into another program.
Pros:
- Extremely easy to produce
- Useful for data capture needed to meet very specific, but simple, requirements.
- Prints great!
Cons:
- Not a modern user experience, as it has no graphics (not even logos), bold, underline, font sizes, etc.
CSV
CSV = Comma Separated Values
It is a very simple plain text format but with a minimal amount of structure. That structure is enough to support the spreadsheet concept of rows and columns. For a straight machine-to-machine data transfer it works quite well. However, it does not provide ANY user formatting beyond assignment of data to an X-Y grid. It does not include any: bold, underline, images (e.g., logos), column widths, font sizes, etc.
Some high-profile sites (e.g., PayPal) use CSV for their “Excel” reports. That works in the sense that a valid CSV file is, by definition, compatible with Excel. However, those downloaded files do not provide any formatting, so users must apply all formatting themselves, even obvious choices such as “currency format”, on every newly downloaded file.
Pros:
- Ideal for very simple data transfer – typically first row contains field names and all other rows contain field data.
Cons:
- Poor user experience. Downloads on a typical desktop will automatically open in a spreadsheet such as Excel, but absolutely no formatting is provided. Any serious spreadsheet user (“analyst”) will need to, at a minimum, resize columns, set decimal places & other field formatting and add heading information before doing any actual work.
- Printing is dependent on user creating reasonable formatting (e.g., if some fields are longer than others, default load by Excel may result in fields that will be “chopped off” when printed)
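To make the "structure but no formatting" point concrete, here is a minimal sketch using Python's standard csv module. The field names and values are made up for illustration. It writes a header row followed by data rows, then reads the file back to show that every value, including the amount, returns as a plain string with no currency format, column width, or styling attached.

```python
import csv
import io

# Hypothetical report rows; field names and values are illustrative only.
rows = [
    {"invoice": "1001", "customer": "Acme, Inc.", "amount": 1234.5},
    {"invoice": "1002", "customer": "Widget Co", "amount": 80.0},
]


def to_csv(records):
    """Serialize a list of dicts to CSV text: a header row, then one row per record."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["invoice", "customer", "amount"])
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = to_csv(rows)
print(csv_text)

# Reading the file back: the grid of rows and columns survives, but every
# cell is just text. Nothing says "amount" is currency or how wide a column is.
parsed = list(csv.reader(io.StringIO(csv_text)))
print(parsed[1])
```

Note that the only "formatting" the writer applies is quoting the customer name that contains a comma, which is needed to keep the columns aligned and has nothing to do with presentation.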
PDF
PDF is the ultimate “printing” format. It is designed to provide what-you-see-is-what-you-get output, with fonts, sizes, image locations, etc. all virtually identical across all devices.
Pros:
- Printing is perfect!
- Display is great. The only problem on display is that while many browsers can display a PDF directly, a PDF does not provide the interactivity or scalability of a regular HTML/CSS web page. So display as a download is perfect but it does not substitute for an actual web page.
- Users can’t easily manipulate the data – i.e., it will stay “as downloaded” except possibly for extreme power users. (i.e., a Pro for security)
Cons:
- Users can’t easily manipulate the data – i.e., it will stay “as downloaded” except possibly for extreme power users. (i.e., a Con for users who actually need to manipulate/analyze the data)
- More work to generate than simpler formats, though the amount of work is proportional to the level of detail needed, and libraries (e.g., ReportLab for Python) can do quite a bit.
Excel
Excel is a proprietary spreadsheet format from Microsoft. It has become the de facto standard for complex spreadsheet output. Typical libraries (similar for Perl, PHP, Python, etc.) can create native Excel files that include text formatting (bold, underline, italic, font sizes, etc.), colors (both foreground & background), formulas (though often it is easier to do the calculations server-side and store computed values in the spreadsheet), field formatting (decimal places, wrapping, etc.), headings, embedded images (e.g., logos), and much more.
In short, a server-created Excel file can provide a user experience virtually the same as a high-level human-created spreadsheet.
Pros:
- Printing is perfect!
- Display is great, though only after download as most browsers do not have a built-in Excel viewer.
- Power spreadsheet users can dive into the file immediately without first having to adjust columns and formatting, as is needed with CSV.
- Exporting to other formats (anything supported by Excel) and other data manipulation can be done easily by the users.
Cons:
- If the ONLY need is save/print then PDF provides a better experience as it prevents most users from making any changes.
JSON and XML
JSON and XML are two formats used primarily by programs to transfer data between one another. They are both technically human-readable, but are not designed for direct human use. I have included them here because many transfers between systems require these formats.
Pros:
- Clean and simple for machine-to-machine communication (unlike PDF, Excel – though Excel is popular and standardized enough that it can often be used too)
- Far more structured than CSV (e.g., allows for multiple levels of data where CSV is “flat”)
- Easy to create.
Cons:
- Useless for printing and other direct human interaction
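As a rough illustration of the "multiple levels" point, the sketch below uses only Python's standard library to serialize one hypothetical order (all names and values are assumptions for the example) as JSON and as XML, then shows what has to happen to squeeze the same data into CSV-style flat rows: the parent fields get repeated on every line item.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical nested record: one order containing a list of line items.
order = {
    "order_id": 1001,
    "customer": "Acme, Inc.",
    "items": [
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1},
    ],
}

# JSON keeps the nesting as-is.
json_text = json.dumps(order, indent=2)


def order_to_xml(data):
    """Build an equivalent XML document; the element names are one arbitrary choice."""
    root = ET.Element("order", id=str(data["order_id"]))
    ET.SubElement(root, "customer").text = data["customer"]
    items = ET.SubElement(root, "items")
    for item in data["items"]:
        ET.SubElement(items, "item", sku=item["sku"], qty=str(item["qty"]))
    return ET.tostring(root, encoding="unicode")


xml_text = order_to_xml(order)


def flatten(data):
    """Flatten to CSV-style rows: order-level fields are duplicated per item."""
    return [
        {"order_id": data["order_id"], "customer": data["customer"], **item}
        for item in data["items"]
    ]


flat_rows = flatten(order)

print(json_text)
print(xml_text)
print(flat_rows)
```

The flat version is workable for a spreadsheet, but the receiving system has to regroup rows by order_id to recover the original structure, which JSON and XML carry for free.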
Other proprietary formats
There are many other formats that have been created over the years.
Some are specific to particular applications but more officially open than Microsoft products (e.g., WordPerfect made their full data file structures available to developers for a nominal fee where Microsoft…), but Excel (for spreadsheets), Word (for documents) and PowerPoint (for presentations) have become de facto standards due to market share, and Excel is the only one of those relevant to typical report/data downloads.
Some have been attempts at “universal” complex data formats, but PDF is the only one that I am aware of that has become a standard, with free readers available for every common platform and numerous libraries (as well as proprietary Adobe products) to create the files.