Components of Web Browsers
The web browser is one of the pieces of software that changed the world, and its usage keeps growing as the internet reaches every corner of the world.
A web browser is a software program that allows users to locate, access and view web pages. Its primary functions are information retrieval, rendering web pages and navigation.
Understanding the browser's components and the role of each one can help developers write faster, better-optimized code.
The high-level components of a web browser include:
- User Interface
- Browser Engine
- Rendering Engine
- Networking
- UI Backend
- JavaScript Interpreter
- Data Storage
Figure : High-level Components of Browser
User Interface
The user interface of the browser includes all the elements of the browser window except the page content: the Home button, Back and Forward buttons, Reload and Stop buttons, address bar, site information button and details popup, bookmark button, menu, web inspector, etc.
Browser Engine
The browser engine acts as the bridge between the user interface and the rendering engine. It provides methods for each interactive user interface component of the browser, e.g. the Back and Forward buttons and the Home button.
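To make the "bridge" role concrete, here is a minimal Python sketch, assuming hypothetical `BrowserEngine` and `RenderingEngine` classes (not any real browser's API): the Back, Forward and Home buttons call methods on the browser engine, which keeps the session history and tells the rendering engine what to load.

```python
from typing import List


class RenderingEngine:
    """Hypothetical stand-in: records every URL it is asked to load."""

    def __init__(self) -> None:
        self.loaded: List[str] = []

    def load(self, url: str) -> None:
        self.loaded.append(url)


class BrowserEngine:
    """Hypothetical browser engine: UI buttons call these methods."""

    def __init__(self, renderer: RenderingEngine, home_url: str) -> None:
        self.renderer = renderer
        self.home_url = home_url
        self.history: List[str] = []
        self.index = -1  # position of the current page in the history

    def navigate(self, url: str) -> None:
        # Visiting a new page discards any "forward" entries.
        del self.history[self.index + 1:]
        self.history.append(url)
        self.index += 1
        self.renderer.load(url)

    def home(self) -> None:  # Home button
        self.navigate(self.home_url)

    def back(self) -> None:  # Back button
        if self.index > 0:
            self.index -= 1
            self.renderer.load(self.history[self.index])

    def forward(self) -> None:  # Forward button
        if self.index < len(self.history) - 1:
            self.index += 1
            self.renderer.load(self.history[self.index])
```

Calling `home()`, then `navigate(...)`, then `back()` asks the rendering engine to load the home page, the new page, and then the home page again; the UI never talks to the rendering engine directly.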
Rendering Engine
The rendering engine renders the contents of the web page. By default, it can render HTML, XML and images. It renders the content based on the MIME type of the response received for the requested URL. For example, if the MIME type is “text/html”, the rendering engine parses the HTML and CSS and renders the result. Other types of content are rendered using plugins and extensions; for example, a PDF is rendered by a PDF viewer.
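A minimal sketch of choosing a handler from the response's MIME type. The handler tables and the `choose_handler` name are hypothetical, and falling back to a download for unknown types is an assumption made for illustration.

```python
# Hypothetical dispatch tables: built-in types go to the rendering engine,
# other types to a plugin or extension.
BUILT_IN = {
    "text/html": "rendering engine: parse HTML and CSS",
    "text/xml": "rendering engine: parse XML",
    "application/xml": "rendering engine: parse XML",
    "image/png": "rendering engine: decode image",
    "image/jpeg": "rendering engine: decode image",
}
PLUGINS = {
    "application/pdf": "PDF viewer",
}


def choose_handler(content_type: str) -> str:
    # Drop parameters such as "; charset=utf-8" and normalize case.
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime in BUILT_IN:
        return BUILT_IN[mime]
    if mime in PLUGINS:
        return PLUGINS[mime]
    return "download"  # assumed fallback when nothing can display the type
```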
Rendering engines are also called layout engines or web browser engines, and different browsers use different implementations. Chrome and Opera (from version 15) use Blink (a fork of WebKit), Opera versions 7 to 12 used Presto, Safari uses WebKit, Firefox uses Gecko, Internet Explorer uses Trident, and the original Edge used EdgeHTML (the current, Chromium-based Edge uses Blink).
Networking
The networking stack's responsibilities include socket management, connection limits, formatting network requests, sandboxing individual applications from one another, dealing with proxies, caching, request prioritization, protocol negotiation and much more.
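Two of these responsibilities, connection limits and request prioritization, can be sketched together as a scheduler. This is a hypothetical illustration, not any browser's networking code: `RequestScheduler` is an invented name, a lower number means a more urgent request, and the default of 6 connections per host is an assumption in the spirit of common HTTP/1.1 browser limits.

```python
import heapq
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List
from urllib.parse import urlsplit

MAX_CONNECTIONS_PER_HOST = 6  # assumed default limit


@dataclass(order=True)
class Request:
    priority: int  # lower number = more urgent
    seq: int  # tie-breaker: FIFO order within the same priority
    url: str = field(compare=False)


class RequestScheduler:
    def __init__(self, limit: int = MAX_CONNECTIONS_PER_HOST) -> None:
        self.limit = limit
        self.pending: List[Request] = []  # min-heap ordered by (priority, seq)
        self.active: Dict[str, int] = defaultdict(int)  # open connections per host
        self.seq = 0

    def enqueue(self, url: str, priority: int) -> None:
        heapq.heappush(self.pending, Request(priority, self.seq, url))
        self.seq += 1

    def dispatch(self) -> List[str]:
        """Start as many pending requests as the per-host limits allow, most urgent first."""
        started, deferred = [], []
        while self.pending:
            req = heapq.heappop(self.pending)
            host = urlsplit(req.url).hostname or ""
            if self.active[host] < self.limit:
                self.active[host] += 1
                started.append(req.url)
            else:
                deferred.append(req)  # host is saturated; try again later
        for req in deferred:
            heapq.heappush(self.pending, req)
        return started

    def finish(self, url: str) -> None:
        # A completed request frees a connection slot for its host.
        self.active[urlsplit(url).hostname or ""] -= 1
```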
UI Backend
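This component draws basic user interface widgets such as select boxes, inputs, radio buttons, checkboxes, color pickers, date and time pickers, alert boxes, confirm boxes and prompts. It exposes a generic, platform-independent interface and uses the operating system's user interface methods underneath.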
JavaScript Interpreter
The JavaScript interpreter parses and executes JavaScript code and sends the output to the rendering engine. Different browsers use different JavaScript engines: Chrome uses V8, Safari uses JavaScriptCore (marketed as Nitro), Internet Explorer 9+ and the original Edge use Chakra, and Firefox uses SpiderMonkey.
Data Storage
The data storage component is used to store data in the browser. The storage options include cookies, Web Storage (localStorage and sessionStorage), Web SQL Database (now deprecated), IndexedDB and file access.
A web browser comprises many other components, including process management, security sandboxes, optimization caches, sensors, audio and video, etc.
Main Flow of Rendering Engine
The rendering engine gets the contents of the document from the networking layer, usually in 8 KB chunks, and carries out the following tasks on them. Rendering is a gradual process: the rendering engine starts as soon as it begins receiving the content of the document, without waiting for all of it to arrive.
Content Tree Construction
HTML elements are converted to DOM nodes.
Render Tree Construction
Styles from style elements and external stylesheets are parsed and combined with the content tree to generate the render tree.
Layout Process
Each node of the render tree is assigned a position (and size) on the screen.
Painting Process
Each node of the render tree is painted using the UI backend.
Figure : WebKit Main Flow
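Because rendering starts before the whole document has arrived, each 8 KB chunk from the networking layer has to be decoded as it comes in. The sketch below assumes a UTF-8 document and uses hypothetical helper names (`network_chunks`, `decode_incrementally`); it relies on Python's incremental decoder, which buffers a multi-byte character that is split across two chunks. A real engine would hand each decoded piece straight to its tokenizer.

```python
import codecs
from typing import Iterable, Iterator, List

CHUNK_SIZE = 8 * 1024  # 8 KB chunks


def network_chunks(payload: bytes, size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Hypothetical stand-in for the networking layer: yields the body chunk by chunk."""
    for start in range(0, len(payload), size):
        yield payload[start:start + size]


def decode_incrementally(chunks: Iterable[bytes], encoding: str = "utf-8") -> List[str]:
    """Decode each chunk as soon as it arrives; returns the text produced per chunk."""
    decoder = codecs.getincrementaldecoder(encoding)()
    pieces = []
    for chunk in chunks:
        # Incomplete trailing bytes are kept inside the decoder until the next chunk.
        pieces.append(decoder.decode(chunk))
    pieces.append(decoder.decode(b"", final=True))  # flush at end of input
    return pieces
```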
Parsing
Parsing is a key process carried out by the rendering engine. Parsing breaks the input down into smaller elements in order to convert it into some other format. The parsing process produces a structure that represents the document, usually a tree of nodes called a parse tree.
For example, after parsing, the expression 2 + 3 - 1 looks like this:
Figure : mathematical expression tree node
A document is parsed according to the vocabulary and syntax rules of the language or format it is written in. These rules form a context-free grammar, and the code must follow them in order to be parsed.
Figure : Document to Parse Tree
Parsing comprises two processes: lexical analysis and syntax analysis.
In lexical analysis, the code is broken down into tokens, which are the valid elements of the language's vocabulary. It is carried out by a lexer (or tokenizer), so the process is also known as tokenization. In syntax analysis, the syntax rules of the language are applied to the tokens returned by the lexer.
Parsing is an iterative process. The parser asks the lexer for a token and tries to match it with one of the syntax rules. If a rule matches, a node for the token is added to the parse tree and the parser asks for the next token. If no rule matches, the parser stores the token internally and keeps asking for tokens until it finds a rule that matches all the internally stored tokens. If no such rule is found, the parser raises an exception, which means that the document was not valid according to the context-free grammar.
Representation of Vocabulary and Syntax
The Vocabulary of the language is expressed in the form of Regular Expressions.
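For the 2 + 3 - 1 example, the vocabulary is integers, plus signs and minus signs. A minimal lexer sketch, assuming the token names INTEGER, PLUS and MINUS for illustration, expresses each token type as a regular expression and combines them into one pattern:

```python
import re
from typing import List, Tuple

# Vocabulary of the toy expression language: one regular expression per token type.
TOKEN_SPEC = [
    ("INTEGER", r"\d+"),
    ("PLUS", r"\+"),
    ("MINUS", r"-"),
    ("SKIP", r"\s+"),  # whitespace separates tokens but is not a token itself
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source: str) -> List[Tuple[str, str]]:
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"invalid character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```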
A context-free grammar is usually defined in Backus–Naur form (BNF), a notation used to describe the syntax of languages used in computing, such as programming languages, document formats, instruction sets and communication protocols.
Types of Parsers
There are two types of parsers - Top-down Parsers and Bottom-up Parsers. Top-down parsers examine the high-level structure of the syntax and try to find a rule match. Bottom-up parsers start with the input and gradually transform it into the syntax rules, starting from the low-level rules until high-level rules are met.
Let's see how the two types of parsers will parse our example.
The top-down parser will start from the higher level rule: it will identify 2 + 3 as an expression. It will then identify 2 + 3 - 1 as an expression (the process of identifying the expression evolves, matching the other rules, but the start point is the highest level rule).
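A recursive-descent parser is a common way to write a top-down parser by hand. The sketch below assumes the grammar in its comments: a left-recursive rule such as expression ::= expression operation term is rewritten as a loop, because recursive descent cannot handle left recursion directly. It builds a left-associative tree for 2 + 3 - 1, represented as nested tuples.

```python
import re

# Grammar assumed for this sketch (EBNF-style, left recursion removed):
#   expression ::= term (operation term)*
#   operation  ::= "+" | "-"
#   term       ::= INTEGER


def parse(source: str):
    tokens = re.findall(r"\d+|[+-]|\S", source)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def term():
        nonlocal pos
        tok = peek()
        if tok is None or not tok.isdecimal():
            raise SyntaxError(f"expected INTEGER, got {tok!r}")
        pos += 1
        return int(tok)

    def expression():
        nonlocal pos
        node = term()
        while peek() in ("+", "-"):
            op = tokens[pos]
            pos += 1
            node = (op, node, term())  # left-associative: (2 + 3) - 1
        return node

    tree = expression()  # start from the highest-level rule
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree
```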
The bottom-up parser will scan the input until a rule is matched. It will then replace the matching input with the rule. This will go on until the end of the input. The partly matched expression is placed on the parser's stack. This type of bottom-up parser is called a shift-reduce parser, because the input is shifted to the right (imagine a pointer pointing first at the input start and moving to the right) and is gradually reduced to syntax rules.
Stack | Input
(empty) | 2 + 3 - 1
term | + 3 - 1
term operation | 3 - 1
expression | - 1
expression operation | 1
expression | (empty)
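The trace in the table can be reproduced with a small shift-reduce sketch. It assumes the grammar in its comments (an INTEGER is reduced to a term as soon as it is shifted) and returns the stack and the remaining input after each step:

```python
import re
from typing import List, Tuple

# Grammar assumed for this sketch:
#   expression ::= term operation term | expression operation term
#   operation  ::= "+" | "-"
#   term       ::= INTEGER
REDUCIBLE = (["term", "operation", "term"], ["expression", "operation", "term"])


def shift_reduce(source: str) -> List[Tuple[List[str], str]]:
    """Return (stack, remaining input) after every step."""
    tokens = re.findall(r"\d+|[+-]|\S", source)
    stack: List[str] = []
    trace = [(list(stack), " ".join(tokens))]
    for i, tok in enumerate(tokens):
        # Shift: move the next token onto the stack as a grammar symbol.
        if tok.isdecimal():
            stack.append("term")  # INTEGER is reduced to term immediately
        elif tok in ("+", "-"):
            stack.append("operation")
        else:
            raise SyntaxError(f"unknown token {tok!r}")
        # Reduce: replace a matched right-hand side with its rule name.
        if stack[-3:] in REDUCIBLE:
            stack[-3:] = ["expression"]
        trace.append((list(stack), " ".join(tokens[i + 1:])))
    if stack != ["expression"]:
        raise SyntaxError(f"input does not reduce to an expression: {stack}")
    return trace
```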
HTML Parser
The HTML parser converts HTML markup into a parse tree. HTML grammar and syntax are defined in the HTML specification (the W3C HTML5 syntax specification, now maintained as the WHATWG HTML Living Standard).
HTML cannot easily be defined by the context-free grammar that conventional parsers need. There is a formal format for defining HTML, the DTD (Document Type Definition), but it is not a context-free grammar, so HTML cannot easily be parsed by conventional parsers. HTML also cannot be parsed by XML parsers, because HTML is forgiving: for example, some end tags may be omitted, which is not well-formed XML.
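The difference is easy to see with Python's standard library, used here only as a stand-in for browser parsers: an XML parser rejects an HTML fragment whose optional `</p>` end tags are omitted, while `html.parser` tokenizes it without complaint. Note that `html.parser` is only a tokenizer; it does not implement the HTML5 tree-construction rules.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from typing import List

# Valid HTML: the </p> end tags are optional, so the second <p> implicitly closes the first.
markup = "<p>first paragraph<p>second paragraph"


def xml_accepts(text: str) -> bool:
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:  # not well-formed XML
        return False


class TagCollector(HTMLParser):
    """Records the start tags reported by the HTML tokenizer."""

    def __init__(self) -> None:
        super().__init__()
        self.start_tags: List[str] = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


collector = TagCollector()
collector.feed(markup)
collector.close()
```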
HTML DTD
HTML is defined with a Document Type Definition (DTD), the format used to define languages of the SGML family. The format contains definitions of all allowed elements, their attributes and their hierarchy. The HTML DTD doesn't form a context-free grammar.
There are a few variations of the DTD. The strict mode conforms solely to the specification, while the other modes also support markup used by browsers in the past, for backward compatibility with older content. HTML5, however, is not based on SGML and therefore does not require a reference to a DTD.
The Document Object Model (DOM) is a platform- and language-neutral interface that allows programs and scripts to dynamically access and update the content, structure and style of documents. The parse tree is a tree of DOM element and attribute nodes.
Figure : DOM Tree
HTML Parsing Process
Figure : HTML Parsing Process
Given an encoding, the bytes in the input stream must be converted to Unicode characters for the tokenizer; this is done by the byte stream decoder and the input stream preprocessor. Tokenization is the lexical analysis step, which parses the input into tokens. HTML tokens include start tags, end tags, attribute names and attribute values. The tokenizer recognizes a token, passes it to the tree constructor, and consumes the next character to recognize the next token, and so on until the end of the input.
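The split between tokenizer and tree constructor can be sketched with Python's `html.parser`, whose `HTMLParser` tokenizes markup and reports each start tag, end tag and text run through callbacks. The `TreeConstructor` and `Node` classes below are hypothetical and far simpler than the HTML5 tree-construction algorithm: they keep a stack of open elements, treat a few void elements specially, and pop back to the matching element on an end tag.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser

VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}  # subset; never have children


@dataclass
class Node:
    name: str  # tag name, "#text" or "#document"
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)
    text: str = ""


class TreeConstructor(HTMLParser):
    """Receives tokens from html.parser's tokenizer and builds a DOM-like tree."""

    def __init__(self) -> None:
        super().__init__()
        self.document = Node("#document")
        self.open_elements = [self.document]  # stack of currently open elements

    def handle_starttag(self, tag, attrs):
        node = Node(tag, dict(attrs))
        self.open_elements[-1].children.append(node)
        if tag not in VOID_ELEMENTS:
            self.open_elements.append(node)

    def handle_endtag(self, tag):
        # Simplification: pop up to the matching open element; ignore stray end tags.
        for i in range(len(self.open_elements) - 1, 0, -1):
            if self.open_elements[i].name == tag:
                del self.open_elements[i:]
                break

    def handle_data(self, data):
        if data.strip():  # skip whitespace-only text for brevity
            self.open_elements[-1].children.append(Node("#text", text=data))
```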
CSS Parser
The CSS bytes are converted into characters, then tokens, then nodes, and finally they are linked into a tree structure known as the "CSS Object Model" (CSSOM):
Figure : CSS Parsing Process
When computing the final set of styles for any object on the page, the browser starts with the most general rule applicable to that node (for example, if it is a child of a body element, then all body styles apply) and then recursively refines the computed styles by applying more specific rules; that is, the rules "cascade down."
Figure : CSSOM Tree
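The "cascade down" idea can be sketched as a recursive walk: each node starts from the inheritable styles of its parent and then applies the rules that target it directly. This is a simplified illustration, assuming a hypothetical tag-only stylesheet (`rules` maps a tag name to its declarations) and a small, hand-picked set of inherited properties.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

INHERITED = {"color", "font-family", "font-size", "font-weight"}  # assumed subset


@dataclass
class Element:
    tag: str
    children: List["Element"] = field(default_factory=list)
    computed: Dict[str, str] = field(default_factory=dict)


def compute_styles(node: Element, rules: Dict[str, Dict[str, str]],
                   parent_style: Optional[Dict[str, str]] = None) -> None:
    parent_style = parent_style or {}
    # Start from the more general styles inherited from the ancestors...
    style = {k: v for k, v in parent_style.items() if k in INHERITED}
    # ...then refine them with the rules that target this node directly.
    style.update(rules.get(node.tag, {}))
    node.computed = style
    for child in node.children:
        compute_styles(child, rules, style)
```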
The CSS grammar is defined in the W3C CSS 2.1 specification (Appendix G, Grammar of CSS 2.1).
WebKit uses Flex (Fast Lexical Analyzer Generator) to generate the scanner (lexer) and Bison to generate the parser, both from CSS grammar files. Bison generates a bottom-up shift-reduce parser. Firefox uses a top-down parser written manually. In both cases, each CSS file is parsed into a StyleSheet object containing CSS rules. The CSS rule objects contain selector and declaration objects, and other objects corresponding to the CSS grammar.
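The structure described above (a StyleSheet containing rules, each with selectors and declarations) can be sketched with a small hand-written parser. It is neither WebKit's generated parser nor Firefox's; the class names are hypothetical, and it handles only flat rules (no at-rules, nested blocks, or semicolons inside strings).

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Declaration:
    name: str
    value: str


@dataclass
class Rule:
    selectors: List[str]
    declarations: List[Declaration]


@dataclass
class StyleSheet:
    rules: List[Rule]


RULE_RE = re.compile(r"([^{}]+)\{([^{}]*)\}")  # selector-list { declarations }


def parse_stylesheet(css: str) -> StyleSheet:
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)  # drop comments
    rules = []
    for selector_text, body in RULE_RE.findall(css):
        selectors = [s.strip() for s in selector_text.split(",") if s.strip()]
        declarations = []
        for item in body.split(";"):
            if ":" in item:  # skip empty items after a trailing semicolon
                name, value = item.split(":", 1)
                declarations.append(Declaration(name.strip().lower(), value.strip()))
        rules.append(Rule(selectors, declarations))
    return StyleSheet(rules)
```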
Order of Loading External Resources
Since stylesheets don’t change the DOM tree, they are loaded in parallel with document parsing. There is an edge case, though: a script that requests style values before the stylesheets are loaded would get wrong values. To avoid this, WebKit blocks a script while stylesheets are still being loaded and parsed, but only when the script tries to access style properties that may be affected by them. Firefox always blocks scripts while any stylesheet is still being loaded and parsed.
Render Tree Construction
While the DOM tree is being constructed, the rendering engine also constructs the render tree. The CSSOM and DOM trees are combined into a render tree, which is then used to compute the layout of each visible element and serves as an input to the paint process that renders the pixels to the screen. The render tree contains only the nodes required to render the page.
Figure : Render Tree
To construct the render tree, the browser roughly does the following:
1. Starting at the root of the DOM tree, traverse each visible node.
- Some nodes are not visible (for example, script tags and meta tags) and are omitted, since they are not reflected in the rendered output.
- Some nodes are hidden via CSS and are also omitted from the render tree; for example, the span node in the figure above is missing from the render tree because an explicit rule sets the "display: none" property on it.
2. For each visible node, find the matching CSSOM rules and apply them.
3. Emit the visible nodes with their content and computed styles.
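The three steps can be sketched as one recursive function. The `DomNode` and `RenderNode` classes, the list of non-visual tags and the tiny CSSOM (keys are either a tag name or a single `.class`) are assumptions made for illustration; real style matching and inheritance are far richer.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

NON_VISUAL = {"head", "link", "meta", "script", "style", "title"}  # assumed list
CSSOM = Dict[str, Dict[str, str]]  # hypothetical: "tag" or ".class" -> declarations


@dataclass
class DomNode:
    tag: str
    classes: Set[str] = field(default_factory=set)
    children: List["DomNode"] = field(default_factory=list)


@dataclass
class RenderNode:
    tag: str
    style: Dict[str, str]
    children: List["RenderNode"] = field(default_factory=list)


def matching_style(node: DomNode, cssom: CSSOM) -> Dict[str, str]:
    style = dict(cssom.get(node.tag, {}))
    for cls in sorted(node.classes):
        style.update(cssom.get("." + cls, {}))  # class rules override tag rules
    return style


def build_render_tree(node: DomNode, cssom: CSSOM) -> Optional[RenderNode]:
    if node.tag in NON_VISUAL:
        return None  # step 1: nodes that are never rendered are skipped
    style = matching_style(node, cssom)  # step 2: apply the matching rules
    if style.get("display") == "none":
        return None  # hidden via CSS: the node and its whole subtree are omitted
    render = RenderNode(node.tag, style)
    for child in node.children:
        child_render = build_render_tree(child, cssom)
        if child_render is not None:
            render.children.append(child_render)
    return render  # step 3: emit the visible node with its style
```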
Changing the DOM (adding or removing elements, or changing attributes or classes) or running animations causes the browser to recalculate element styles and, in many cases, to lay out (reflow) the page or parts of it. This style recalculation is called computed style calculation.
The first part of computing styles is to create a set of matching selectors, which is essentially the browser figuring out which classes, pseudo-selectors and IDs apply to a given element. The second part takes all the style rules from the matching selectors and works out the final styles of the element.
As Rune Lillesveen explains in Style Invalidation in Blink, roughly 50% of the time spent calculating an element's computed style goes into matching selectors, and the other half goes into constructing the RenderStyle (the computed style representation) from the matched rules.
There are two ways to reduce this cost:
- Reduce the complexity of your selectors; use a class-centric methodology like BEM.
- Reduce the number of elements for which styles must be calculated.
Stylesheet Cascade Order & Specificity
The cascading algorithm determines how to find the value to apply for each property of each document element.
- It first filters all the rules from the different sources to keep only the rules that apply to a given element. That means rules whose selector matches the given element and which are part of an appropriate media at-rule.
- Then it sorts these rules by their importance (whether or not they are followed by !important) and by their origin. The table below lists these levels in ascending order of precedence; for example, !important values from a user-defined style sheet take precedence over normal values from a user-agent style sheet.
- In case of equality, the specificity of a value is considered to choose one or the other.
Origin | Importance
user agent | normal
user | normal
author | normal
CSS Animations | see below
author | !important
user | !important
user agent | !important
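The ordering in the table can be turned into a sort key: first the origin and importance level, then specificity, then source order for ties. In this sketch the `Declaration` class is hypothetical, specificity is assumed to be already computed as a tuple, and CSS animations are left out.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Origin/importance levels from the table, lowest precedence first (animations omitted).
PRECEDENCE = [
    ("user agent", False),
    ("user", False),
    ("author", False),
    ("author", True),
    ("user", True),
    ("user agent", True),
]


@dataclass
class Declaration:
    value: str
    origin: str  # "user agent", "user" or "author"
    important: bool
    specificity: Tuple[int, int, int]
    order: int  # position in the source; the later one wins a tie


def cascade(declarations: List[Declaration]) -> str:
    """Pick the winning value among declarations of one property for one element."""
    winner = max(
        declarations,
        key=lambda d: (PRECEDENCE.index((d.origin, d.important)), d.specificity, d.order),
    )
    return winner.value
```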
CSS animations, using @keyframes at-rules, define animations between states. Keyframes don't cascade, meaning that at any given time CSS takes values from only one single @keyframes, and never mixes multiple ones together.
When several @keyframes rules are applicable, CSS uses the one defined last in the most important document; they are never combined.
Also, note that values within @keyframes at-rules overwrite all normal values but are overwritten by !important values.
Specificity is the means by which browsers decide which CSS property values are the most relevant to an element and, therefore, will be applied. Specificity is based on the matching rules, which are composed of different sorts of CSS selectors. It is a weight applied to a given CSS declaration, determined by the number of each selector type in the matching selector. When multiple declarations have equal specificity, the last declaration found in the CSS is applied to the element. Specificity only applies when the same element is targeted by multiple declarations. Under CSS rules, styles that directly target an element always take precedence over styles the element inherits from its ancestors.
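Specificity is commonly written as a triple (IDs, classes/attributes/pseudo-classes, type selectors/pseudo-elements) that is compared left to right. The regex-based sketch below is only an approximation for simple selectors: it drops parenthesized arguments, so a selector like :not(.x) counts as one ordinary pseudo-class, which differs from the specification.

```python
import re
from typing import Tuple


def specificity(selector: str) -> Tuple[int, int, int]:
    """Approximate (IDs, classes/attributes/pseudo-classes, types/pseudo-elements)."""
    counts = {}
    s = re.sub(r"\([^)]*\)", "", selector)  # drop arguments such as :nth-child(2n+1)
    for kind, pattern in [
        ("attr", r"\[[^\]]*\]"),
        ("pseudo_element", r"::[\w-]+"),
        ("pseudo_class", r":[\w-]+"),
        ("id", r"#[\w-]+"),
        ("class", r"\.[\w-]+"),
    ]:
        counts[kind] = len(re.findall(pattern, s))
        s = re.sub(pattern, " ", s)  # remove what was counted before the next pass
    types = len(re.findall(r"[A-Za-z][\w-]*", s))  # the remaining names are element types
    return (
        counts["id"],
        counts["class"] + counts["attr"] + counts["pseudo_class"],
        types + counts["pseudo_element"],
    )
```

Because the result is a tuple, Python's ordinary tuple comparison already implements the left-to-right comparison, e.g. `specificity("#nav a") > specificity(".a .b .c")`.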
Layout Process
After the render tree is constructed, the browser knows which nodes should be visible and what their computed styles are. What is still missing is the position and size of each element within the device's viewport; these are calculated in the layout process, also known as reflow.
Layout is a recursive process. It starts at the root renderer (corresponding to the HTML element) and continues recursively through some or all of the renderers in the frame hierarchy, computing geometric information for each renderer that requires it.
Browsers use a dirty bit system to avoid recalculating the entire layout for every small change. A renderer that is added or changed marks itself and its children as “dirty”, meaning they need layout. There are two flags: “dirty”, set when the renderer itself needs layout, and “children are dirty”, set when the renderer itself is unchanged but at least one of its children has been changed or added and needs layout.
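A minimal sketch of the two flags, assuming a hypothetical `Renderer` class: marking a renderer dirty also marks its subtree and sets "children are dirty" on its ancestors, so layout can skip every clean subtree.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Renderer:
    name: str
    parent: Optional["Renderer"] = field(default=None, repr=False, compare=False)
    children: List["Renderer"] = field(default_factory=list)
    dirty: bool = True  # this renderer needs layout
    children_dirty: bool = False  # "children are dirty": some descendant needs layout

    def add(self, child: "Renderer") -> "Renderer":
        child.parent = self
        self.children.append(child)
        child.mark_dirty()
        return child

    def mark_dirty(self) -> None:
        """Called when this renderer is added or changed."""
        stack = [self]
        while stack:  # the renderer and all of its children need layout
            node = stack.pop()
            node.dirty = True
            stack.extend(node.children)
        ancestor = self.parent
        while ancestor is not None and not ancestor.children_dirty:
            ancestor.children_dirty = True
            ancestor = ancestor.parent


def layout(renderer: Renderer, laid_out: List[str]) -> None:
    if not (renderer.dirty or renderer.children_dirty):
        return  # clean subtree: skip it entirely
    if renderer.dirty:
        laid_out.append(renderer.name)  # stand-in for computing geometry
    for child in renderer.children:
        layout(child, laid_out)
    renderer.dirty = renderer.children_dirty = False
```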
Global layout happens when layout is triggered on the entire render tree, for example as a result of a global style change such as a font size change, or when the window is resized. Global layout is usually done synchronously.
Incremental layout happens when layout is triggered only on dirty renderers, as a result of a style change in a particular renderer or its children, or the addition of a DOM node. Incremental layout is usually done asynchronously, except in some cases; for example, when a script requests style values, layout may be carried out synchronously.
The steps in the layout process are as follows:
1. The parent renderer determines its own width.
2. The parent goes over its children and:
- places each child renderer (sets its x and y position);
- calls the child's layout if required, for example when the child is dirty or during a global layout; this calculates the child's height.
3. The parent uses the accumulated heights of its children, plus margins and padding, to set its own height. This height is in turn used by the parent's own parent renderer.
4. The renderer sets its dirty bit to false.
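The four steps can be sketched for simple stacked block boxes. This is a hypothetical model: leaf heights are given up front, margins are vertical only and never collapse, and the "children are dirty" flag is left out, so incremental layout only revisits children that are themselves dirty.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Box:
    name: str
    height: float = 0.0  # leaf boxes: assumed content height; parents: computed below
    margin: float = 0.0  # vertical margin above and below (no margin collapsing)
    padding: float = 0.0  # same padding on every side
    children: List["Box"] = field(default_factory=list)
    x: float = 0.0
    y: float = 0.0
    width: float = 0.0
    dirty: bool = True


def layout(box: Box, available_width: float, global_layout: bool = True) -> None:
    # 1. The parent determines its own width (here: all the width it is offered).
    box.width = available_width
    # 2. It goes over its children, placing each one and laying it out if required.
    cursor_y = box.padding
    for child in box.children:
        child.x = box.padding
        child.y = cursor_y + child.margin
        if global_layout or child.dirty:
            layout(child, box.width - 2 * box.padding, global_layout)
        cursor_y = child.y + child.height + child.margin
    # 3. Accumulated child heights, margins and padding give the parent's height.
    if box.children:
        box.height = cursor_y + box.padding
    # 4. The box is laid out, so its dirty bit is cleared.
    box.dirty = False
```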
The width of a renderer is calculated from its container block's width, the renderer's own width style, and its margins, borders and padding. Width calculation in the browser works as follows:
- The container width is the maximum of the container's available width and 0. Here the available width is the contentWidth, calculated as clientWidth() - paddingLeft() - paddingRight(), where clientWidth() is the width of the element excluding its border and scrollbar.
- The element's width is taken from its width style attribute. A value given in px is used as is; a percentage is calculated relative to the contentWidth of its container.
- Finally, the horizontal borders and padding are added.
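The same calculation as a small sketch. The snake_case helpers are hypothetical stand-ins for clientWidth(), paddingLeft() and paddingRight(), and only px and percentage widths are handled:

```python
def content_width(client_width: float, padding_left: float, padding_right: float) -> float:
    # clientWidth excludes border and scrollbar; the container's padding is removed too.
    return max(0.0, client_width - padding_left - padding_right)


def renderer_width(style_width: str, container_content_width: float,
                   padding_left: float = 0.0, padding_right: float = 0.0,
                   border_left: float = 0.0, border_right: float = 0.0) -> float:
    if style_width.endswith("%"):
        # Percentages are resolved against the container's contentWidth.
        width = float(style_width[:-1]) / 100 * container_content_width
    elif style_width.endswith("px"):
        width = float(style_width[:-2])  # absolute values are used as is
    else:
        raise ValueError(f"unsupported width: {style_width!r}")
    # Finally, add the horizontal padding and borders.
    return width + padding_left + padding_right + border_left + border_right
```

For a container with clientWidth() 800 and 20px of padding on each side, the contentWidth is 760, so a child with width 50%, 10px of padding and 1px borders on each side ends up 402 pixels wide.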
Painting Process
Painting is the process that converts each node in the render tree into actual pixels on the screen; it is also called rasterizing. When layout is complete, the browser issues a “Paint” event, and the content is drawn on the screen using the browser's UI infrastructure component.
Like layout, painting can be global or incremental. In global painting, the entire tree is painted; in incremental painting, only some renderers and their children are painted. A modified renderer invalidates its rectangle on the screen, which causes the OS to treat that rectangle as a “dirty region” and generate a “Paint” event. In Chrome this is more complicated, because the renderer runs in a separate process from the main process, so Chrome simulates the OS behavior to some extent. The presentation layer listens to these events and delegates the message to the render root. The tree is traversed until the relevant renderer is reached, and that renderer repaints itself and, if required, its children.
The painting order is defined by the CSS2 specification and follows the order in which elements are stacked in the stacking contexts. This order matters because stacks are painted from back to front. For a block renderer, the stacking order is: background color, background image, border, children, outline.
Firefox goes over the render tree and builds a display list for the painted rectangle, containing the renderers that will actually be visible in that rectangle, in the right painting order. This way the tree is traversed only once per repaint.
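A sketch combining both ideas: one traversal of a hypothetical renderer tree that emits paint operations for a dirty rectangle in the block order listed above (background color, background image, border, children, outline). Stacking contexts and z-index are ignored, and children are always visited because they may overflow their parent.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Rect:
    x: int
    y: int
    w: int
    h: int

    def intersects(self, other: "Rect") -> bool:
        return (self.x < other.x + other.w and other.x < self.x + self.w
                and self.y < other.y + other.h and other.y < self.y + self.h)


@dataclass
class Renderer:
    name: str
    rect: Rect
    children: List["Renderer"] = field(default_factory=list)


def build_display_list(renderer: Renderer, dirty: Rect,
                       out: Optional[List[str]] = None) -> List[str]:
    """Single traversal that collects paint operations in back-to-front order."""
    if out is None:
        out = []
    hit = renderer.rect.intersects(dirty)
    if hit:
        for phase in ("background-color", "background-image", "border"):
            out.append(f"{phase}:{renderer.name}")
    for child in renderer.children:  # children may overflow, so always visit them
        build_display_list(child, dirty, out)
    if hit:
        out.append(f"outline:{renderer.name}")  # outline is painted last
    return out
```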
Before repainting, WebKit saves the old rectangle as a bitmap and then paints only the difference between the old and new rectangles.
Browsers try to do the minimal possible work in response to a change. If an element's color changes, the browser only repaints that element. If an element's position changes, the browser lays out and repaints the element, its children and possibly its siblings. If a DOM node is added, the browser lays out and repaints that node. If a major change happens, such as a change to the font size of the root element, all layout caches are invalidated and the entire tree is laid out and repainted again.