Paged.js: a Demo
Print Books with Browsers
Julie Blanc (@julieblancfr)
Publishing Fair Torino – November 24, 2019
Hello, I'm Julie Blanc. I'm a graphic designer and PhD student at the University of
Paris 8 and EnsadLab. I'm also part of the core team of paged.js, and that's why I'm here.
Today, we read on more and more devices and media.
Responsive web publications
HTML, CSS, JavaScript...
We can publish responsive web publications. To design this we use HTML and CSS.
Electronic books
ePUB (HTML, CSS)
We can also publish electronic books with the ePUB format. This format also uses
HTML and CSS.
Print
For a printed publication, the only way is to create a PDF.
And the most common solution for designing printed publications is InDesign:
proprietary and expensive software,
where only the graphic designer can enter corrections,
with backwards compatibility problems,
and without any possibility of versioning.
I'm going to stop there, but you know...
Web publications → HTML, CSS, JavaScript...
Electronic books → ePUB (HTML, CSS)
Print → InDesign HTML, CSS?
But what happens if we also use HTML and CSS for print?
This: one source of content for all formats.
This slide is very simplified, and the real workflow is much more complex than that. But the idea is
there: it becomes possible to create content once and send it to every output.
I'm going through it quickly because there's a lot to say, but that's not what I came to talk to you
about today.
We will look, very concretely, at how to create a print-ready PDF for books
from HTML content.
Before that, just a quick reminder about web technologies.
The principle is to separate content and presentation using different languages.
HTML is for semantics: the structure and content of a document. With this code, I say "this element is
a title."
CSS is for the visual and design aspects. I say "this element has a font-size of twenty-four points or a
bold weight".
JavaScript is for a lot of things: dynamic transformation of the HTML or the CSS, animation,
functionality... With JavaScript I can say, "on click, this element becomes red".
Automated typesetting and pagination for print
Make PDF outputs of HTML contents from browsers
So, back to print. Today I'm going to talk about automated typesetting and
pagination for print. In other words, how to make PDF outputs of HTML content from browsers.
Flow → Pagination
When you want to print HTML, the main difficulty is to transform the flow into paginated
content.
You need different rules from those used to display content in the browser: for example, you need to
control when and where your content needs a page break.
You also need many specific elements used in printed layout: margins, running headers, page numbers,
a table of contents, and so on.
A lot of solutions and tools exist to make printed publications with web technologies. Some are
better than others, some are open-source, some are proprietary, but none ticked all the boxes we needed.
Problems
Proprietary vs. open-source
(own) Rendering engines
Non-standard properties
No visual preview
1/ First is the choice between open-source and proprietary solutions.
In terms of layout possibilities, the most effective tools are the proprietary ones, but they are
very expensive and you can't hack them if you want a new feature.
2/ Second, most of these tools are based on their own layout engine.
As a result, their CSS support is limited. You can't use recent features like CSS Grid or variable fonts.
3/ It's also common for these solutions to implement non-standard CSS properties with different prefixes. So you can't reuse your stylesheet in other tools.
4/ And last but not least, these tools are used from the command line. That means no visual preview of your layout before generating the PDF. You need to regenerate your output file to check every change.
What we need
Open and free tool(s)
Based on web standards
Visual preview
Automated workflows
Now, let me highlight what we really need to make PDFs from HTML.
1/ We need open and free tools, driven by the community.
2/ These tools must be based on web standards.
3/ We need a visual preview so we can design and debug more easily and quickly.
4/ And these tools must be adaptable to any workflow and easy to use in automated workflows.
A free and open source JavaScript library that paginates content in browser
to create PDF outputs from any HTML content
https://pagedmedia.org/paged-js
Our mission was to develop a free and open-source library that paginates content in the browser to create PDF outputs from any HTML content, based on the W3C specifications. Paged.js was born.
Team
Founder: Adam Hyde
Core team: Fred Chasen, Julie Blanc, Julien Taquet
Funding: Shuttleworth Foundation, Cabbage Tree Labs
Adam Hyde leads the project and the core team is Fred Chasen, Julien Taquet and me. The project was
previously supported by the Shuttleworth Foundation and is now supported by Cabbage Tree Labs.
But we want it to be a community-driven project, and we invite everyone to participate.
Standards
(W3C)
What we want above all is to build tools that respect World Wide Web Consortium standards.
So we looked at what they already had on their shelves for printing with CSS.
These are the modules that are of particular interest to us.
CSS Paged Media Module “describes the page model that partitions a flow into pages.
(…)
It adds functionality for page margins, page size and orientation, trim size.”
CSS Generated Content for Paged Media Module defines the requirements for content
automatically generated from the book content:
running headers and footers, footnotes, generated text for cross-references or tables of
contents.
CSS Fragmentation Module defines how and where the content can be fragmented,
including across page breaks.
CSS Page Floats defines how an element is removed from the flow and placed
in a different position depending on the page type: for example, at the top left of the page.
The idea is to use those modules in a print stylesheet with the media query "print", like you would do
for mobile or tablet. Print is a variant of the responsive web.
The styles declared in this media query will only be applied when the web page is printed from the
browser print dialog to create a PDF.
Here is an extract from CSS for printing. It is quite readable and simple to use.
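As a minimal sketch (the selectors and values here are illustrative assumptions, not the stylesheet from the talk), a print stylesheet of this kind might look like this:

```css
/* Illustrative print stylesheet: every selector and value is an assumption. */
@media print {
  /* Page geometry, only used when the document is printed */
  @page {
    size: A5;
    margin: 20mm 15mm;
  }

  /* Start each top-level title on a new page */
  h1 {
    break-before: page;
  }

  /* Hide navigation that only makes sense on screen */
  nav {
    display: none;
  }
}
```

These rules sit next to the screen styles in the same stylesheet and are ignored until the page is printed.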
Browser developers have already taken some interest in implementing parts of the Paged Media Working
Draft standards. @page rules have partial support in Chrome, Firefox and Edge.
But it's the only feature implemented so far.
The idea came up to create a polyfill to support CSS print modules with current browsers.
In other words, Paged.js simulates CSS features that are not yet available on browsers.
How does paged.js work?
Now that I've told you what we want to do with standards, how does paged.js work in practice?
Semantic content (HTML)
I will start with the display in a browser to better explain. We start with HTML tagged content. If
you open it in a browser, it's just a long flow.
CSS design
You can design this content in CSS, classically, like you do for a web site. If you add the specific
CSS for print, nothing will happen.
Paged.js script
Paginated display of content
You need to add the paged.js script to your page so that it interprets the print-specific CSS. You can now
see your content displayed as pages in your browser.
Web development tools
With the display in the browser, it is possible to use web development tools to get
instant feedback and debug. This is very useful when the document is being designed.
Printing
In a web browser
Once the design is finished, you can print directly from your web browser with its print dialog box.
It is also possible to generate a PDF at that time.
Demo
Now, let's see how the CSS works.
Page size and margins
First, you declare your page size and margins in an @page rule. You can use inches or millimeters, and even
pixels for margins.
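A minimal sketch of such a rule, with dimensions chosen only for illustration:

```css
/* Page size and margins; the numbers are illustrative assumptions. */
@page {
  size: 148mm 210mm;   /* width, then height; "size: 6in 9in" would also work */
  margin-top: 20mm;
  margin-bottom: 25mm;
  margin-left: 15mm;
  margin-right: 15mm;
}
```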
This is the result.
I added some basic CSS so you can see the spread.
Symmetric margins
The left and right pages have the same size, but their margins are typically mirror images of each other,
centered on the gutter.
You can make a distinction between right and left pages with pseudo class selectors for @page rules
and define different margins.
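For instance, assuming a narrower inner margin (next to the gutter) and a wider outer margin, with values chosen only for illustration:

```css
/* Left pages: the gutter is on the right side */
@page :left {
  margin-left: 25mm;   /* outer margin */
  margin-right: 15mm;  /* inner margin */
}

/* Right pages: the gutter is on the left side */
@page :right {
  margin-left: 15mm;   /* inner margin */
  margin-right: 25mm;  /* outer margin */
}
```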
It's better.
Now, you can see here a chapter starting in the middle of the left page.
We want it to start at the top of a right page. Let's add a page break.
Page breaks
You can write it like this to start the chapter on a right page.
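A minimal sketch, assuming each chapter is wrapped in an element with a hypothetical class "chapter":

```css
/* "chapter" is a hypothetical class name for the chapter wrapper */
section.chapter {
  break-before: right;  /* always start on a right-hand page */
}
```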
And here is the result. Note that if the previous chapter ends on a right page,
paged.js will automatically add a blank left page so the new chapter starts on the correct page.
Page breaks
There are different ways to force a break in your content. You can break before or after a fragment.
You can also specify whether you want it to start on the next page or on the next left or right page.
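The variants could be sketched as follows; the class names are hypothetical and only there to show each value:

```css
/* Hypothetical selectors illustrating the different forced breaks */
.part-title {
  break-before: page;   /* start on the next page, left or right */
}

.dedication {
  break-before: left;   /* start on the next left page */
}

.title-page {
  break-before: right;  /* start on the next right page */
}

.colophon {
  break-after: page;    /* whatever follows starts on a new page */
}
```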
Avoid page breaks
And if you want to avoid a break, there are properties for that too.
In the first example, I want to prevent a table from being split across two pages.
In the second example, I want to avoid a page break after a title.
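These two cases might be written like this (a sketch using plain element selectors as an assumption):

```css
/* Keep a table together on one page whenever possible */
table {
  break-inside: avoid;
}

/* Keep a title on the same page as the content that follows it */
h2 {
  break-after: avoid;
}
```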
The CSS Generated Content for Paged Media Module introduces a new way to insert generated content
into your document: you can define it in the margin boxes of your pages.
Page numbers
Here is an example with page numbers. I select the bottom-left margin box of left
pages and I call into it a specific counter named "page".
Also, you can style how the content appears in the margin boxes.
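A sketch of what this could look like; the font settings are illustrative assumptions:

```css
@page :left {
  /* Bottom-left margin box of every left page */
  @bottom-left {
    content: counter(page);  /* the built-in "page" counter */
    font-size: 9pt;          /* margin-box content can be styled */
    font-style: italic;
  }
}
```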
Now we have page numbers. But we also want running headers.
String-set (running headers and footers)
You can use the string-set property and the string() function to generate running headers and
footers.
This code declares, with the string-set property, that every level-two title becomes a running title.
Then I call this string in the bottom-center margin boxes.
The string acts like a variable. As the DOM is read, each time a new level-two title
is encountered, the variable changes from the page where that title appears.
This variable is passed into the margin boxes of that page and of all the following pages
until there is a new title.
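A minimal sketch of this mechanism, where "title" is an arbitrary name chosen for the named string:

```css
/* Store the text of each level-two title in a named string */
h2 {
  string-set: title content(text);
}

/* Print the current value of that string at the bottom center of every page */
@page {
  @bottom-center {
    content: string(title);
  }
}
```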
This will add generated content to all left and right pages, including the pages created by page
breaks. The problem is that we now have generated content on pages without content.
Pseudo class selectors for pages
Fortunately, the W3C has defined pseudo-class selectors for specific pages. We have already seen the left
and right selectors. But there is also a selector for the first page and a selector to target a
specific page number of your document. And there is also a selector for blank pages.
Blank pages
So you select your blank pages and remove the generated content like this.
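For example, assuming the generated content lives in the bottom-left and bottom-center margin boxes:

```css
/* Empty the margin boxes on pages that paged.js inserted as blanks */
@page :blank {
  @bottom-left {
    content: none;
  }
  @bottom-center {
    content: none;
  }
}
```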
Now, there could be pages in your book that need a special layout: different background,
different margins, different fonts.
You can use what is called “named pages” to define this. Based on your HTML, you can bind a
specific layout to any content.
Named pages
Imagine you want a specific layout for the pages of your front matter. This is how you do it in
the code.
You declare that all sections with the class frontmatter will use a page template, named here
"frontmatterLayout".
You also create a specific @page rule with the same name, where you set the properties of the
page.
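A sketch of these two steps; the margin and background values are illustrative assumptions:

```css
/* Bind every front-matter section to the named page */
section.frontmatter {
  page: frontmatterLayout;
}

/* Properties that apply only to pages using that name */
@page frontmatterLayout {
  margin: 30mm 25mm;
  background: #f2f2f2;
}
```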
You can mix pseudo-class selectors and named pages. Here, I define a layout for all my chapters and I
select the first page of each chapter.
I move the page counter from bottom right to bottom left, I set a bigger font-size for the counter
and I remove the running head.
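This could be sketched as follows, assuming a hypothetical "chapter" class and named page, and assuming the running head sits in the bottom-center margin box:

```css
/* "chapter" is a hypothetical class and page name */
section.chapter {
  page: chapter;
}

/* Only the first page of each chapter */
@page chapter:first {
  @bottom-right {
    content: none;            /* the counter no longer appears here */
  }
  @bottom-left {
    content: counter(page);   /* the counter moves to the bottom left */
    font-size: 14pt;          /* and is set larger */
  }
  @bottom-center {
    content: none;            /* no running head on a chapter opening */
  }
}
```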
Here is the result. You can see that the first page of the chapter has a different
layout from the following pages.
Columns layout
Paged.js also allows you to use all the properties already implemented in your browser. Here, I use
columns to create a two-column layout.
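A minimal sketch, with a hypothetical selector and gap value:

```css
/* Standard multi-column layout, handled by the browser itself */
section.chapter {
  columns: 2;
  column-gap: 6mm;
}
```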
You can also use hyphens, drop caps, shape-outside or SVG, as all of this is
implemented in your browser.
Cross references
But let's go back to paged.js. You can also add generated content specific to your book content.
A well-known use case is cross-references.
Here I want to indicate on which page is my figure three. But maybe this page number will change
according to my layout. So I can't write it in my HTML, I have to generate it.
For that I can use the target-counter function. In my HTML, there is a link that
refers to the unique identifier of my figure three.
In CSS, I generate the text of my cross-reference and target the page where the identifier is
located.
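A sketch of the CSS side, assuming the HTML contains a link such as `<a class="figure-ref" href="#figure-3">figure 3</a>` (the class name and identifier are hypothetical):

```css
/* Append the page number of the link's target after the link text */
a.figure-ref::after {
  content: " (page " target-counter(attr(href url), page) ")";
}
```

The number is resolved at layout time, so it stays correct if the figure moves to another page.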
You can use this function to create tables of contents and book indexes, for example.
Collection of scripts
In addition to the polyfill work of paged.js, we also build small scripts to facilitate the creation of
some elements of your document and provide adapted CSS.
For example, here, a script allows you to create a table of contents from scratch based on the
titles of your document. A linked stylesheet allows you to format it and create leaders.
We also have a script for book indexes.
External scripts
It is possible to add other scripts to paged.js. For example, here, I added MathJax to render the math
equations of this article.
This is the real strength of paged.js. You can always add scripts to increase the possibilities of
automated typesetting of your document.
Handlers
To do this, you can use handlers. You can also add your own. These are a sort of hook method that
allows you to run your scripts at different times during rendering: for example, after the layout
of each page.
Bleed and marks
Once you are ready to print, you may want to add bleeds and marks. Like this.
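A minimal sketch using the properties from the CSS Paged Media specification; the sizes are illustrative assumptions:

```css
@page {
  size: A5;
  bleed: 6mm;         /* extra area beyond the trim edge */
  marks: crop cross;  /* crop marks and registration (cross) marks */
}
```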
Patrick Radden Keefe, Addiction sur ordonnance, C&F éditions
Information Literacy and Theological Libraries, an open access publication in PDF
and ePUB from Atla Open Press
Not only textbooks
I'd like to show that paged.js is not only for textbooks.
Web and printed publication for the Musée Saint-Raymond
Here is a project I made for a French museum. It's a web and printed publication.
There are many high-quality images, with interactions like 360-degree viewers.
The whole project uses single-source publishing.
I'm making the printed version with paged.js right now. You can see my process.
I use CSS grid for some pages.
Paged.js is a polyfill, so you can use all the properties available in your browser.
What's next for Paged.js?
We have seen the features that have been implemented in paged.js; now let's look at what's next.
Let's talk about notes. Margin notes, sidenotes and endnotes are very easy to implement.
Endnotes are the easiest: just put them in the right place in your HTML.
Margin notes can be done with absolute positioning.
Sidenotes can be moved with a little script and are easy because you don't need to change the height
of your content area.
Footnotes
Footnotes, on the other hand, are difficult. But they are in the W3C specifications. It's a feature
we haven't implemented yet. We hope to have a first implementation before the end of the year.
Page floats
The same goes for page floats: it's a difficult module to implement, and we need more time and
funding. With page floats you can position some elements in relation to the page.
MIT licence
Documentation - Website
Community (Mattermost)
All the work you have just seen is under MIT license. You can use paged.js as you wish and build
your own tools with it.
We are writing the documentation and a website should be available during the fall.
We also have a Mattermost where you can come and discuss with the community.
What about the future?
Advocate for better support of print-related standards in browser engines
Paged.js is a polyfill, so your files will still be usable when browsers implement printing
features.
By providing more examples of what can be accomplished using print CSS, we hope to advocate for
better support of print-related standards in browser engines. That is our first goal; until then, we will
try to do their job.
https://pagedmedia.org/paged-js
julie-blanc.fr/slides (@julieblancfr)
You can follow the first link for more information and links to our GitLab and our
Mattermost. We are in the process of creating our website.
These slides are available on my website.
I'm here all day if you have any questions.
Thank you for your attention.
Fragmentation of the content (chunker)
Transformation of CSS declarations (polisher)
Preview (previewer)
Paged.js is made of three modules that work together:
- the chunker fragments the content into discrete pages,
- the polisher transforms the CSS declarations,
- and the previewer calls up the preview of your book in the browser.
So, Paged.js takes all your rendered document content, meaning your content with all the design
rules applied to it. It puts this content in a box that has the size of your content area.
It tries to fit everything into this container and looks for the overflowing content.
Then the script creates a new box and puts the overflowing content in it, and looks for the next
overflowing content.
Paged.js does this over and over until the book is done.
Paged.js also builds new boxes to create pages and places your content boxes on these pages.
In parallel, the script reads your CSS file to get the information about the print styles and
transforms your @page rules into classes that your browser understands today.
You write CSS that conforms to the standards; paged.js does the work of applying it to the transformed
DOM.
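To give an idea of this transformation, here is an illustrative sketch. The class and custom-property names in the second rule are assumptions about the generated output, and the actual rules produced by paged.js may differ:

```css
/* What you write: standard CSS that browsers only partly support */
@page :left {
  margin-left: 25mm;
}

/* Roughly what ends up being applied (illustrative assumption):
   an ordinary class on the element that represents each left page */
.pagedjs_left_page {
  --pagedjs-margin-left: 25mm;
}
```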
Here is an example of the paged.js implementation.
On the left are the specifications defined by the W3C: the margins are divided into sixteen boxes
where you can put generated content (like page numbers and running titles).
On the right is a page rendered with paged.js. We use CSS and flexbox to create the margin boxes.
This is what a document looks like when you inspect the DOM in the browser.
You can see the HTML elements created by paged.js.
These elements exist for rendering only; nothing changes in your initial HTML file.
Post Processing
pdf-lib (parsing and editing the structure)
hummus (visual updates)
Ghostscript (images and color management)
Once we’ve generated a PDF, we can start post processing the file and adding missing features:
metadata or color management.