Learn to Program HTML in 21 Minutes


Hardcopy featured a 10-year-old boy one night back in the
mid-1990s. His psychotic mother wouldn’t take her meds and was beating
him up. He wanted to live with his father but the judge wouldn’t change
his custody arrangement. So the 10-year-old kid built a Web site to
encourage Internetters to contact the judge in support of a change in
custody.

If you think that you need professional help to build a static HTML Web
site, tell yourself “The abused 10-year-old got his site to
work; I think I can, too.”

You May Already Have Won $1 Million

Then again, maybe not. But at least you already know how to write legal HTML:

My Samoyed is really hairy.

That is a perfectly acceptable HTML document. Type it up in a text
editor, save it as index.html, and put it on your Web server. A Web
server can serve it. A user with Netscape Navigator can view it. A
search engine can index it.

Suppose you want something more expressive. You want the word
really to be in italic type:

My Samoyed is <I>really</I> hairy.

HTML stands for Hypertext Markup Language. The <I> is markup. It tells the browser to start rendering words in italics. The </I> closes the <I> element and stops the italics. If you want to be more tasteful, you can tell the browser to emphasize the word really:

My Samoyed is <EM>really</EM> hairy.

Most browsers use italics to emphasize, but some use boldface and browsers for ancient ASCII terminals (e.g., Lynx) have to ignore this tag or come up with a clever rendering method. A picky user with the right browser program can even customize the rendering of particular tags.

There are a few dozen more tags in HTML. You can learn them by choosing
View Source from a Web browser when visiting sites whose formatting you
admire. You can also work through a comprehensive HTML guide, e.g.,
http://www.w3schools.com/html/html_reference.asp (Web) and
HTML & XHTML: The Definitive Guide by Musciano and Kennedy (O’Reilly, 2002; print).

Document Structure

[Photo: Rollercoaster. Santa Cruz, California]

Armed with a big pile of tags, you can start strewing them among your
words more or less at random. Though browsers are extremely forgiving of
technically illegal markup, it is useful to know that an HTML document
officially consists of two pieces: the head and the body.
The head contains information about the document as a whole, such as the
title. The body contains information to be displayed by the user’s
browser.

Another structure issue is that you should try to make sure that you
close every element that you open. So if your document has a
<BODY> it should have a </BODY> at the end. If you
start an HTML table with a <TABLE> and don’t have a
</TABLE>, a Web browser may display nothing. Elements can
nest, but you should close the most recently opened one before the
rest, e.g., for something both boldface and italic:

My Samoyed is <B><I>really</I></B> hairy.
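
If you would rather have a computer check your nesting than squint at angle brackets, here is a minimal sketch in Python using the standard library's html.parser module. The names NestingChecker and check_nesting, and the short list of elements that are allowed to go unclosed, are assumptions made for illustration. A real validator also knows about the many other optional closing tags (LI, TD, and so on), which this sketch would flag.

```python
from html.parser import HTMLParser

# Elements that take no closing tag, plus P, whose closing tag is optional.
# This list is an illustrative assumption, not the full HTML rule set.
NO_CLOSE_NEEDED = {"br", "hr", "img", "meta", "link", "input", "p"}


class NestingChecker(HTMLParser):
    """Record tags that are closed out of order or never closed."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open elements
        self.problems = []   # human-readable complaints

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands us tag names already lowercased.
        if tag not in NO_CLOSE_NEEDED:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in NO_CLOSE_NEEDED:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()  # closes the most recently opened element
        else:
            self.problems.append(f"unexpected </{tag}>")


def check_nesting(html_text):
    """Return a list of nesting problems; an empty list means none found."""
    checker = NestingChecker()
    checker.feed(html_text)
    checker.close()
    return checker.problems + [f"<{t}> never closed" for t in checker.open_tags]


print(check_nesting("My Samoyed is <B><I>really</I></B> hairy."))
print(check_nesting("My Samoyed is <B><I>really</B></I> hairy."))
```

The first call reports no problems. The second complains because the B element is closed while the I element opened inside it is still open.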

Something that confuses a lot of new users is that the <P> element
used to surround a paragraph has an optional closing tag </P>.
Browsers by convention assume that an open <P> element is
implicitly closed by the next <P> element. This leads a lot of
publishers (including lazy old me) to use <P> elements as
paragraph separators.

Here’s the HTML template from which documents at
philip.greenspun.com start out:


<html>
 <head>
  <title>New Doc</title>
 </head>
 <body bgcolor=white text=black>
  <h2>New Doc</h2>
  by <a href="https://r.search.aol.com/">Philip Greenspun</a>, revised April 1, 2003
  <hr>
  introductory text
  <h3>First Subhead</h3>
  more text
  <p>
  yet more text
  <h3>Second subhead</h3>
  concluding text
  <hr>
  <a href="mailto:philg@mit.edu">
   <address>philg@mit.edu</address>
  </a>
 </body>
</html>

Let’s go through this document piece by piece (load it into a
browser to see how it looks rendered).

The <HTML> element at the top says “I’m an HTML document”. Note
that this tag is closed at the end of the document. It turns out that
this tag is unnecessary. We’ve saved the document in the file
“basic.html”. When a user requests this document, the Web server looks
at the file’s “.html” extension and adds a MIME header to tell the user’s browser that
this document is of type “text/html”.
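
The extension-to-MIME-type lookup is simple enough to sketch in a few lines of Python. This uses the standard library's mimetypes module as a stand-in; it is not how any particular Web server is configured, and the function name content_type_header is hypothetical.

```python
import mimetypes


def content_type_header(filename):
    """Guess the Content-Type header a server might send for a file."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    if mime_type is None:
        # No recognizable extension: fall back to a generic binary type.
        mime_type = "application/octet-stream"
    return f"Content-Type: {mime_type}"


print(content_type_header("basic.html"))
print(content_type_header("README"))
```

The point is that the server decides the type from the file name, not from anything inside the file, which is why the HTML element at the top of the document is redundant.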

The <HEAD> element’s primary purpose in this document is so that
one can legally use the <TITLE> element to give this document a
name. Whatever text is placed between <TITLE> and </TITLE>
will appear at the top of the user’s browser window, on the menu that
pops up when the user clicks on the Back button, and in his bookmarks
menu should he bookmark this page. After closing the head with a
</HEAD>, the body of the document is opened with a <BODY>
element, to which are added some optional parameters to set the
background to white and the text to black. Some Web browsers default to
a gray background, and the resulting lack of contrast between background
and text is sufficiently offensive that it may be worth changing the
default colors. This is a violation of some of the principles
articulated in this book because it potentially introduces an
inconsistency in the user’s experience of the Web. However, one need
not feel too guilty about it because (1) a lot of browsers use a white
background by default, (2) enough other publishers set a white
background that white pages won’t seem inconsistent, and (3) it doesn’t
affect the core user interface the way that setting custom link colors
would.

Just below the body, there is a headline, size 2, wrapped in an
<H2> element. This will be displayed to the user at the top of the
page. One could alternatively use <H1> but browsers typically
render that in a ridiculously huge font. Underneath the headline, it
makes sense to indicate authorship, link to a parent work, and specify
the revision date. The authorship link shows that someone is taking
responsibility for the content. The link to the parent work, e.g., a
book table of contents if the file is one chapter, helps users who’ve
landed on this page from a public search engine. The revision date is
important because Web pages often linger forgotten by the author but
still available to the public long after they are obsolete.
Notice in this example that the authorship phrase “Philip
Greenspun” is a hypertext anchor which is why it is wrapped in an
A element. The <A HREF= says “this is a hyperlink.” If the
reader clicks anywhere from here up to the </A> the browser should
send him to the root page on the server (“https://r.search.aol.com/”).

After the headline, author, and optional navigation, the template adds a
horizontal rule tag: <HR>. Don’t overuse these big lines across
the window: Real graphic designers use whitespace for separation. This
template uses <H3> headlines in the text to separate sections and
<HR>s at the very top to separate the document contents from the
headline information and at the very bottom to separate the document
contents from the author’s email link.

Underneath the last <HR>, the document is signed with
“philg@mit.edu”. The <ADDRESS> element usually results
in an italics rendering. Readers expect that they can scroll to the
bottom of a browser window and find out who is responsible for what
they’ve just read. Note that this one is wrapped in an anchor tag. If
the user clicks on the anchor text (my email address), the browser will
pop up a “send mail to philg@mit.edu” window. It is generally a good
idea to wrap every email address on a Web page in a “mailto” tag. Sadly,
in the Age of Spam it may not be a good idea to put any email address on a
Web page. An alternative to the author’s personal email address would
be a form that a reader could use to send a message to the author or
editor.

Tarting Up Your Pages


More than a decade of browser development and committee meetings have
introduced a whole raft of tags with which you can tart up your pages.
Instead of saying “this is a headline, level 3” you can say “stick this
in 18-point Helvetica Bold and make it red”. Instead of “emphasize
this”, you can say “stick this in 14-point Times Italic”. There are a
bunch of problems with filling up your document with tags such as FONT:

  • Older browsers on PCs will ignore them; every browser knows how to
    render a headline, level 3. Not every browser understands a directive
    to use a specific Microsoft font that ships with the Windows operating
    system.
  • Newer browsers will ignore them; mobile phones and palmtops are some
    of the most interesting devices attached to the Web and they only
    understand basic HTML.
  • When you change your graphic designer, you have to edit 10,000
    .html documents.

If you can’t “just say no” to formatting your documents instead of
working on the content, you might want to consider developing a
site-wide cascading style sheet. Here’s the cascading style sheet for
the online version of this book (http://philip.greenspun.com/panda/ ):


body {margin-left: 3% ; margin-right: 3%}
P { margin-top: 0pt; text-indent : 0.2in }
P.stb { margin-top: 12pt }
P.mtb { margin-top: 24pt; text-indent : 0in}
P.ltb { margin-top: 36pt; text-indent : 0in}
p.marginnote { background-color: #E0E0E0 }
p.paperonly { background-color: #E0E0E0 }
li.separate { margin-top: 12pt }

Each line of the style sheet gives formatting instructions for one HTML
element and/or a subclass of an HTML element. The first directive adds
a bit of whitespace between the browser window frame and the text within
a chapter. This small separation, a modification to the BODY tag,
should make reading easier. The next directive tells browsers not to
separate paragraphs with blank lines (“margin-top: 0pt”), but rather
simply to indent the first line of a new paragraph by 0.2 inches (I
tried “3em” but it didn’t look right). Thus the paragraphs within
chapters will be mushed together like those in a printed book or
magazine. Books and magazines do sometimes use whitespace, however,
mostly to show thematic breaks in the text. This stylesheet therefore
defines three classes of thematic breaks and tells browsers how to
render them. The first, “stb” (for “small thematic break”) will insert
12 points of white space. A paragraph of class “stb” will inherit the
0.2 inch first-line indent of the regular P element. For medium and
large thematic breaks, more whitespace is specified and overrides the
first-line indent.

How does one use this style sheet? Park it somewhere on a Web server in
a file with the extension “.css”. This extension will tell the Web
server program to MIME-type it “text/css”. Inside each document that
uses the cascading style sheet, put the following LINK element inside
the document HEAD, just above the TITLE:

<LINK REL=STYLESHEET HREF="/books/philg.css" TYPE="text/css">

Note the leading “/” at the beginning of the HREF. This causes the
user’s browser to come back and request
“http://philip.greenspun.com/books/philg.css” before rendering any of
the page. Note that this will slow down page viewing a bit, although if
many pages refer to the same site-wide style sheet, users’ browsers
should be smart enough to cache it. If you read 17 chapters from this
book on-line with Microsoft Internet Explorer 6.0 for example, the
browser would request the philg.css style sheet only once.

Okay, now the browser knows where to get the style sheet and that a small
thematic break should be rendered with an extra bit of whitespace. How
do we tell the browser that a particular paragraph is “of class stb”?
Instead of “<P>”, we use

<P CLASS="stb">

right before the text that starts a new theme.

Book designers have all kinds of clever ways of setting off margin
notes, body notes, and footnotes. Not being a book designer or
especially clever, I simply defined a couple of styles that get rendered
with a gray background (“p.marginnote { background-color: #E0E0E0 }”).
This alerts readers that margin notes aren’t part of the
main text.

The final new subclass (“li.separate { margin-top: 12pt }”)
is directed at making lists with whitespace between each bullet item.
It worked nicely in Microsoft Internet Explorer circa 1998 but failed in
Netscape Navigator (if you’re under the age of 20, ask your parents about
Netscape) so the book doesn’t use it (instead the chapters use two
line-break tags, <BR><BR>).

For a complete guide to all the Cascading Style Sheet directives,
look in HTML & XHTML: The Definitive Guide and Cascading Style
Sheets: The Definitive Guide (Meyer 2000; O’Reilly).

Now That You Know How to Write HTML, Don’t

“Owing to the neglect of our defences and the mishandling of the
German problem in the last five years, we seem to be very near the bleak
choice between War and Shame. My feeling is that we shall choose Shame,
and then have War thrown in a little later, on even more adverse terms
than at present.”

— Winston Churchill in a letter to Lord Moyne, 1938 (Churchill:
A Life; Gilbert 1991)

HTML represents the worst of two worlds. We could have taken a
formatting language and added hypertext anchors so that users had
beautifully designed documents on their desktops. We could have
developed a powerful document structure language so that browsers could
automatically do intelligent things with Web documents. What we actually
have with HTML is a hybrid: ugly documents without formatting
or structural information.

Eventually the Web will work like a naïve user would expect it
to. You ask your computer to find you the cheapest pair of blue jeans
being hawked on the World Wide Web and ten seconds later you’re staring
at a photo of the product and being asked to confirm the purchase. You
see an announcement for a concert and click a button on your Web browser
to add the date to your calendar; the information gets transferred
automatically. More powerful formatting isn’t far off,
either. Eventually there will be browser-independent ways to render the
average novel readably.

None of this will happen without radical changes to HTML, however. We’ll
need semantic tags so that publishers can say, in a way that a
computer can understand, “This page sells blue jeans,” and
“The price of these jeans is $25 U.S.” Whether we need them or
not, we are sure to get new formatting tags with every new generation of
browser. (Personally I can’t wait to be able to caption photographs and
figures, a common feature of word processing programs in the 1960s.)

Back in 1994, a lowly graduate student wrote a paper titled
“We have Chosen Shame and Will Get War” (http://philip.greenspun.com/research/shame-and-war)
presenting a scheme for embedding semantic markup in HTML documents so
that it wouldn’t break old browsers (e.g., NCSA Mosaic!). More
importantly, the paper suggested that we needed to develop a common set of
document classes, e.g., “advertisement”, “novel”,
“daily-newspaper-article”, so that programmers could write software to
make life easier for authors and readers.

This paper was rejected from the Web Consortium’s 1994 conference,
apparently because the idea was too brilliant, radical, and
forward-looking for its time. The idea of semantic markup in documents
had barely been tested. Charles Goldfarb, Raymond Lorie, and Edward
Mosher tried it out in 1969 with Generalized Markup Language (GML).
They got their company to use it for about 90 percent of its document
production. But this was only at one little company so not too many Web
standards experts would have noticed. Oh yes, the company name was
“International Business Machines.”

The American National Standards Institute (ANSI) published its first
draft of Standard Generalized Markup Language (SGML) in 1980. A few
small organizations, such as the United States Department of Defense,
the Internal Revenue Service, and the Securities and Exchange
Commission, began using the semantic markup features of the new
language.

The most bizarre thing about HTML is that it borrows the
(uninteresting) syntax of SGML:

<element> ...  stuff being marked up ... </element>

but it doesn’t have the (interesting) semantic markup or document type
definition capability of SGML.

To Web publishers and Web users who read “We have Chosen Shame and Will
Get War”, it seemed natural that the folks who set the Web
standards would see the importance of semantic markup and machine
processing of documents on behalf of users. A student of Max Weber,
however, would not have been surprised that this paper was rejected and
that the whole semantic markup issue was ignored for six years. People
who write Web standards and go to Web conferences are not doing it
because they have a passion for Web publishing or Web surfing. They
have a passion for sitting on conference committees, sitting on
standards committees, and escaping the boredom of their hometowns by
going on company-paid trips to wherever these committee meetings happen
to be taking place. The people who are passionate about publishing are
busy building on-line applications. The people who are passionate about
surfing are at home with their cable modems.

It has been nine years since the “Shame and War” paper was published.
Has there been any progress since then? Yes and no. The Extensible
Markup Language (XML) has been standardized by the World Wide Web
Consortium (W3C). Described by Dan Lyke as “the subset of SGML that
Microsoft’s developers could understand”, XML addresses the need for
semantic markup but not the requirement that publishers agree on a
common set of classes for semantic markup to be useful. With XML, each
publisher or community of publishers can agree on some new document
types and concomitant sets of tags. Internet Explorer can render XML.
A variety of server-side tools are available for parsing XML, generating
XML from databases, converting XML to HTML, and authoring XML. What
do all of these XML tags mean, though? More or less nothing,
which is why there is another project at the Web Consortium: The Semantic
Web (http://www.w3.org/2001/sw/).

If you are publishing structured data, does it make sense to use HTML or
XML files? Neither. XML will let you store and exchange structured
data. But that doesn’t mean it addresses the same problems as
database management systems. With XML, you can certainly keep a
catalog of products for sale in a file system directory and easily write
a computer program to pull out the price of an item. But you can’t
easily build an index to facilitate rapid retrieval of all the blue
items or all the items available in size 6. XML lets you store how many
items are left in inventory but you won’t get any support for writing a
program that subtracts 1 when an order is placed (and, more importantly,
making sure that 10 simultaneous subtractions from different users won’t
collide). An XML document is like one record or a series of records in
a database management system. XML is therefore useful if you want to
ship a record from one database to another, but it doesn’t really help
you build the entire database.
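
To make the contrast concrete, here is a rough sketch in Python. The catalog format, element names, and SKUs are invented for illustration, and SQLite (via the standard sqlite3 module) is only a stand-in for whatever database management system you actually use. Pulling a price out of the XML is the easy part; the index and the safe decrement come from the database.

```python
import sqlite3
import xml.etree.ElementTree as ET

# A made-up catalog; nothing here follows a real document type.
CATALOG_XML = """<catalog>
  <item sku="j1"><color>blue</color><size>6</size><price>25</price><stock>3</stock></item>
  <item sku="j2"><color>black</color><size>8</size><price>30</price><stock>5</stock></item>
</catalog>"""


def price_from_xml(xml_text, sku):
    """Easy with XML alone: scan the records and pull out one price."""
    for item in ET.fromstring(xml_text).iter("item"):
        if item.get("sku") == sku:
            return int(item.findtext("price"))
    return None


def load_into_database(xml_text):
    """Ship the XML records into an in-memory table that has an index."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE items (sku TEXT PRIMARY KEY, color TEXT, "
               "size INTEGER, price INTEGER, stock INTEGER)")
    db.execute("CREATE INDEX items_by_color ON items (color)")
    for item in ET.fromstring(xml_text).iter("item"):
        db.execute("INSERT INTO items VALUES (?, ?, ?, ?, ?)",
                   (item.get("sku"), item.findtext("color"),
                    int(item.findtext("size")), int(item.findtext("price")),
                    int(item.findtext("stock"))))
    db.commit()
    return db


def place_order(db, sku):
    """Subtract 1 from stock in a single statement; False if sold out."""
    cursor = db.execute(
        "UPDATE items SET stock = stock - 1 WHERE sku = ? AND stock > 0",
        (sku,))
    db.commit()
    return cursor.rowcount == 1


db = load_into_database(CATALOG_XML)
print(price_from_xml(CATALOG_XML, "j1"))
print([row[0] for row in db.execute("SELECT sku FROM items WHERE color = 'blue'")])
print(place_order(db, "j1"))
```

Because the subtraction and the "is there any left?" test happen in one UPDATE statement, the database rather than your program is responsible for keeping simultaneous orders from colliding.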

For most publishers it is most sensible to keep their information in
whatever database management system they’re accustomed to and write
scripts to generate either HTML or XML pages. With such an architecture
a change in language standards or publishing requirements could be met
by rewriting a couple of scripts rather than editing thousands of XML or
HTML files. A “database”? Does that mean a relational database
management system as discussed later in this book? No. If you aren’t
updating your data in real-time, an ordinary text file is fine.

For example, suppose that you are putting a company phone directory on
the Web. You can define a structured format like this:

first name|last name|department|office number|home number|location

There is one line for each person in the directory. Fields are separated
by vertical bars. So a file at MIT might look like this:


Philip|Greenspun|eecs|253-8574|864-6832|ne43-414
Rajeev|Surati|athletics|253-8581|555-1212|dupont gym
...

In less than an hour, you can write a simple computer program to read
this file and generate

  • A public Web service offering names and office phone numbers for
    everyone at the university
  • A public Web page for each department showing names and office phone numbers
  • A private Web page for each department showing names and home phone numbers

If you decide to start using a new HTML feature, you don’t have to edit
all these pages manually. You just need to change a few lines in the
computer program and then run it again to regenerate the HTML pages.
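
Here is one way such a program might look, sketched in Python. The field names, function names, and output file names are assumptions made for illustration, and the page layout simply borrows from the template earlier in this chapter. It covers only the two per-department pages; the university-wide service is left out.

```python
from collections import defaultdict
from html import escape

# Field order follows the vertical-bar format defined above.
FIELDS = ["first_name", "last_name", "department",
          "office_number", "home_number", "location"]

SAMPLE = """Philip|Greenspun|eecs|253-8574|864-6832|ne43-414
Rajeev|Surati|athletics|253-8581|555-1212|dupont gym
"""


def parse_directory(text):
    """Turn vertical-bar-separated lines into a list of dicts."""
    people = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            people.append(dict(zip(FIELDS, line.split("|"))))
    return people


def department_page(department, people, phone_field):
    """Render one department's listing as a small HTML document."""
    rows = "\n".join(
        f"<li>{escape(p['first_name'])} {escape(p['last_name'])}: "
        f"{escape(p[phone_field])}"
        for p in people
    )
    title = escape(f"{department} phone list")
    return (f"<html>\n<head><title>{title}</title></head>\n"
            f"<body bgcolor=white text=black>\n<h2>{title}</h2>\n"
            f"<ul>\n{rows}\n</ul>\n</body>\n</html>\n")


def generate_pages(text):
    """Return {filename: html} with a public and a private page per department."""
    by_department = defaultdict(list)
    for person in parse_directory(text):
        by_department[person["department"]].append(person)
    pages = {}
    for department, people in by_department.items():
        pages[f"{department}-public.html"] = department_page(
            department, people, "office_number")
        pages[f"{department}-private.html"] = department_page(
            department, people, "home_number")
    return pages


for filename, html_text in generate_pages(SAMPLE).items():
    print(filename, len(html_text), "bytes")
```

All of the HTML lives in department_page, so adopting a new HTML feature means editing that one function and running the program again.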

When the XML wave has finally broken on the beach and someone comes up
with a document type for phone listings, you can generate a set of
private and public XML files containing names and phone numbers. People
downloading an XML file will be able to tell their computer to dial the
phone number automatically, since the number will be encased in a
VOICE_PHONE_NUMBER element.
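
As a sketch of what that might look like, the Python below writes and reads such a file with the standard xml.etree.ElementTree module. Only VOICE_PHONE_NUMBER comes from the text above; the PHONE_LISTING, PERSON, and NAME elements and both function names are invented here, since no such document type has been agreed upon.

```python
import xml.etree.ElementTree as ET


def phone_listing_xml(people):
    """Build an XML phone listing from (first, last, office_number) tuples."""
    root = ET.Element("PHONE_LISTING")  # hypothetical document element
    for first, last, office_number in people:
        person = ET.SubElement(root, "PERSON")
        ET.SubElement(person, "NAME").text = f"{first} {last}"
        ET.SubElement(person, "VOICE_PHONE_NUMBER").text = office_number
    return ET.tostring(root, encoding="unicode")


def phone_numbers(xml_text):
    """What a downloading program would do: find every number to dial."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter("VOICE_PHONE_NUMBER")]


listing = phone_listing_xml([("Philip", "Greenspun", "253-8574"),
                             ("Rajeev", "Surati", "253-8581")])
print(listing)
print(phone_numbers(listing))
```

The reading side never has to guess which string of digits is a phone number, which is the whole attraction of semantic markup.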

The high level message here is that you should think about the structure
of the information you are publishing first. Then think about the best
way to build an investment in that structure and preserve it. Finally,
devote a bit of time to the formatting of the final HTML or XML that you
generate and ship to users over the Web.

It’s Hard to Mess Up a Simple Page

[Photo: Great Sand Dunes National Monument. Mosca, Colorado.]

People with limited time, money, and experience usually build fairly
usable Web sites. However, there is no publishing concept so simple that
money, knowledge of HTML arcana, and graphic design can’t make slow,
confusing, and painful for users. After you’ve tarted up your site with
frames, graphics, and color, check the server log to see how much
traffic has fallen. Then ask yourself whether you shouldn’t have thought
about user interface stability.

CD-ROMs are faster, cheaper, more reliable, and a more engaging
audio/visual experience than the Web. Why then do they sit on the shelf
while users greedily surf the slow, unreliable, expensive Web? Stability
of user interface.

There are many things wrong with HTML. It is primitive as a formatting
language and it is almost worthless for defining document
structure. Nonetheless, the original Web/HTML model has one big
advantage: All Web pages look and work more or less the same. You see
something black, you read it. You see something gray, that’s the
background. You see something blue (or underlined), you click on it.

[Photo: Splash. Las Vegas, Nevada.]

When you use a set of traditional Web sites, you don’t have to learn
anything new. Every CD-ROM, on the other hand, has a sui generis user
interface. Somebody thought it would be cute to put a little navigation
cube at the bottom right of the screen. Somebody else thought it would
be neat if you clicked on the righthand page of an open book to take you
to the next page. Meanwhile, you sit there for 15 seconds feeling
frustrated, with no clue that you are supposed to do anything with that
book graphic on the screen. The CD-ROM goes back on the shelf.

The beauty of the browsers built since 1995 is that they allow the
graphic designers behind Web sites to make their sites just as opaque
and hard to use as CD-ROMs. Graphic designers are not user interface
designers. If you read a book such as Macintosh Human Interface
Guidelines (Apple Computer, Inc.; Addison-Wesley, 1993), you will
appreciate what kind of thought goes
into a well-designed user interface. Most of it has nothing to do with
graphics and appearance. Pull-down menus are not better than pop-up
menus because they look prettier; they are better because you always
know exactly where to find the Print command.

Some of the bad things a graphic designer can do with a page were
possible even way back in the days of Netscape 1.1. A graphic designer
might note that most of the text on a page was hyperlinks and decide
just to make all the text black (text=#000000, link=#000000,
vlink=#000000). Alternatively, he or she might choose a funky color for a
background and then three more funky colors for text, links, and visited
links. Either way, users have no way of knowing what is a hyperlink and
what isn’t. Often designers get bored and change these colors even for
different pages on the same site.

There is probably a place in this world for Web sites that are pretty
rather than functional. Nonetheless it is worth weighing the prettiness
of a new design against the cold shock of unfamiliar user interface that
greets the user.

Java and Flash — The BLINK Tag Writ Large

[Photo: Fascination. Santa Cruz, California]

A Fortune 500 company executive told us not to be deceived by his
company’s lack of Web presence. “We’re going to have a great site soon.
Customers will be able to see products before their retail launch and
upload audio comments on each item. We’ll have Java applets and
Flash animations.”

“Glad to hear that your company is so profitable,” we responded. “Since
you’re able to hire 50 tech support people for your Web site then you
must be raking in the bucks.”

“What do you mean 50 tech support people?!?!” he asked.

“If people can’t get a plug-in to work on their Windows machine or can’t
figure out how to record from a microphone on their PCs then surely you
must have a plan for dealing with all the support emails that they’ll be
sending to your webmaster,” we said.

“Uh… well, I guess we have to think about that…,” he mumbled as he
wandered off.

Before you spend money on animation, Java, or authoring content for a
plug-in, think about whether you couldn’t buy the on-line rights to an
interesting book on your Web site’s subject. Remember that search
engines don’t recognize animations, Java applets, or graphic
design. Search engines index text (Google does photos too but that is a
separate and comparatively seldom-used application). Therefore an
on-line book is going to pull a tremendous number of people into your
site.

Maybe you have infinite money and can buy the book plus a raft of
multimedia authors. It still might be worth remembering what brought
users to the Web in the first place: control and depth. Software such as
Java and Flash enables you to lead users around by the nose. Flash them
a graphic here, play them a sound there, roll the credits, and so
on. But is that really why they came to your site? If they want to be
passive, how come they aren’t watching TV or going to a lecture?

This may seem like an obvious point, but it is worth mentioning because there
are so many tools to convert PowerPoint presentations into Web
sites. The whole point of a PowerPoint-style presentation is that you
have a room full of people all of whose thoughts are to be herded in a
common direction by the speaker. Ideas are condensed to the barest bones
because there is such limited time and space available and because the
speaker is going to embroider them. The whole point of the Web is that
each reader finds his own path through a site. There is unlimited time
and space for topics in which the reader has a burning interest.

A Java applet can make a good site great in the following situations:

  • You need a richer user interface than you can get with HTML forms.
  • You need to respond to user input without network delays (mouse
    movements, for example).
  • You need to give the user real-time updates.

Richer User Interface

[Photo: French Roast, 6th Avenue and 11th, Manhattan 1995.]

A richer user interface is always harder to learn. Your readers don’t
want to learn how to use new programs. They already learned how
to use a Web browser and probably also word processing, spreadsheet, and
drawing programs. However, it is possible that you can come up with a
Java applet delivering such great benefits that people will invest
in learning your user interface.

Don’t get too excited by the possibility of offering a rich
custom user interface with Java. Adobe PhotoShop has a beautiful user
interface but it took Adobe hundreds of person-years to perfect it. It
takes Adobe hundreds of person-years to test each new version. It costs
Adobe millions of dollars to write documentation and prepare
tutorials. It takes users hours to learn how to use the program. You
don’t have a huge staff of programmers to concentrate on a single
application. You don’t have a full-time quality assurance staff. You
don’t have a budget for writing documentation and tutorials. Even if you
did have all of those things, your users don’t have extra hours to spend
learning how to use the Web site that you build. Either they are
experienced Web users and they want something that works like other
sites or they are naïve users who want their effort in learning
to use their browser and your site to pay off when they visit other sites.

Real-time Response

[Photo: Burning car. New Jersey 1995.]

Some of the user-interface devices on a computer are just not
well-suited to the stateless request-response HTTP protocol. Even a
continuous network connection might not be good enough unless the Web
server and Web client are physically close to each other. Examples of
user-interface devices that require real-time response are the mouse,
the tablet, and the joystick.

You would not want to use a drawing tool that needed to go out to the
network to add a line. An HTML forms-based game might be fun for your
brain but it probably won’t have the visceral excitement of a
first-person shooter game on Xbox. Anything remotely like a video game
requires code executing on the user’s local processor.

Real-time Updates

Obvious candidates for Java include stock tickers, newsfeeds, and chat
systems. The user can launch an applet that spends the whole day
connected to a quote or headline server and then scrolls text around the
screen. Though obvious, these are applications where Java isn’t
essential. The information provider could just as easily have written a
“client-pull” HTML document by adding the following element to
the HEAD:

<META HTTP-EQUIV=REFRESH CONTENT="60; URL=update.cgi">

The user’s browser will fetch “update.cgi” 60 seconds after
grabbing the page with this element.
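
If the page is generated by a program anyway, adding the element costs one line. The Python below is a minimal sketch; the function name client_pull_page, its parameters, and the page title are assumptions for illustration, and “update.cgi” is just the URL from the example above.

```python
from html import escape


def client_pull_page(headline, seconds=60, next_url="update.cgi"):
    """Build a page that asks the browser to fetch next_url after a delay."""
    refresh = f"{int(seconds)}; URL={next_url}"
    return (
        "<html>\n<head>\n"
        f'<META HTTP-EQUIV=REFRESH CONTENT="{escape(refresh)}">\n'
        "<title>Headlines</title>\n</head>\n"
        f"<body>\n{escape(headline)}\n</body>\n</html>\n"
    )


print(client_pull_page("Dow up 3 points"))
```

If update.cgi returns a page built the same way, the browser keeps polling for as long as the user leaves the window open.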

In 1995 some folks at Boston Children’s Hospital built a Web-based
real-time patient monitoring system using Java applets. Data from
instruments attached to people in the intensive care unit (ICU) would be
streamed back to doctors who could be in another area of the ICU,
working at another institution, or relaxing at home. See “A real time
patient monitoring system on the World Wide Web” by K. Wang, I. Kohane,
et al. in Proceedings AMIA Annual Fall Symposium 1996:729-32.

Oh yes, it will crash the user’s browser

[Photo: Surfer. Santa Cruz, California]

“Java on the client doesn’t work, and we at Netscape have done an
about-turn on client-side Java in recent months.”

— Marc Andreessen, VP Products at Netscape (Quoted in a trade journal,
July 1998)

Java is often promoted as a safe and reliable language. Unfortunately,
since those safe and reliable Java applets run on top of the unreliable
substrate of Java Virtual Machine + Java window system, it is inevitable
that your Java applet will eventually crash the user’s browser.
Plug-ins have similar drawbacks. If you read the chapter on
server-side programming you might decide that you can do everything
with programs that run on the server.

Why Graphic Designers Just Don’t Get It

[Photo: Venice Beach, California.]

Most of what I’ve said in this chapter goes against conventional wisdom
as observed on big corporate sites and in books on Web page design. One
possible explanation is that graphic designers get interfaces so wrong
because they never figured out that they aren’t building CD-ROMs. With a
CD-ROM, the designer can control the user’s access to the content. In a
classic book of the dotcom era, David Siegel’s Creating Killer Web
Sites (Hayden Books 1997), readers are urged to
build an “entry tunnel” of three pages. Each of these three
pages should contain only a single image that “takes no more than 45
seconds to load”. Then there should be an “exit tunnel” with
three more full-page images. In between, there are a handful of
“content” pages that constitute the site per se.

Siegel is making some implicit assumptions: that there are no users with
text-only browsers; that users are willing to wait several minutes
before getting to the content of a site; that there is some obvious
place to put these tunnels on a site with thousands of pages. Even if
all of those things are true, if the internal pages do indeed contain
any content, the public search engines will roar through and wreck
everything. People aren’t going to enter the site by typing in
“http://www.greedy.com” and then let themselves be led around
by the nose by a designer. They will find the site by using a search
engine and typing a query string that is of interest to them. Google
does not think a Dave Siegel (TM) “entry tunnel” is
“killer”. In fact, it might not even bother to index a page
that is just one image (search engines can’t read text that is inside
GIF or JPEG image files).

If you intend to get radical by putting actual content on your Web
server, then it is probably a good idea to make each URL stand on its
own. Making a URL stand on its own has implications for site navigation
design. Each document will need a link to the page’s author, the
service home page, and the next page in a sequence if the document is
part of a linear work. Remember, the Web is not there so that publishers
can impose what they think is cool on readers. Each reader has his own
view of the Web. Maybe that view is returned by a search engine in
response to a query string. Maybe that view is links from a friend’s
home page. Maybe that view is a link from a personalization service that
sweeps the Internet every night to find links and stories that fit the
reader’s interest profile.

Our task as Web publishers is to produce works that will fit seamlessly
not into the Web as we see it, but into the many Webs that our readers
see.
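
The navigation requirement above can be sketched in a few lines of Python. This is only an illustration of the idea that every page carries its own links to the author, the service home page, and the next page of a linear work; the function name and the URLs in the usage note are made up for the example.

```python
from html import escape
from typing import Optional


def nav_footer(author_url: str, home_url: str,
               next_url: Optional[str] = None) -> str:
    """Return an HTML fragment linking a page to its author, the service
    home page, and (for linear works only) the next page in the sequence."""
    links = [
        f'<a href="{escape(author_url, quote=True)}">Author</a>',
        f'<a href="{escape(home_url, quote=True)}">Home</a>',
    ]
    # Only documents that are part of a linear work get a "Next" link.
    if next_url is not None:
        links.append(f'<a href="{escape(next_url, quote=True)}">Next</a>')
    return "<hr>\n<p>" + " | ".join(links) + "</p>"
```

Appending `nav_footer("/about/author.html", "/", "/book/chapter6.html")` to every generated page means a reader dropped onto that page by a search engine can still find the rest of the site.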

An Information Designer Who Got It

Everything important about Web design is on pages 146 to 149 of Edward
Tufte’s Visual Explanations: Images and Quantities, Evidence and
Narrative (Graphics Press, 1997). Here are Tufte’s points:

  • The screen should contain information, not navigation or
    administration icons. The information should become the interface,
    i.e., clicking on a word that was itself informational should take you to
    a screen with more detailed information.
  • Give users broad flat overviews of the information (e.g., tables of
    contents) rather than forcing them through sequential screens of choices.
  • Organize your data according to expected user interest rather than
    mimicking the internal structure of your organization [see the
    university research lab example in Chapter 1].
  • Why use icons for navigation when words are clearer and take up less
    screen space?

What is truly impressive is that Tufte wasn’t even writing about the
Web. He was explaining his design for a guide kiosk at Washington’s
National Gallery. Moreover, because the pages on which he articulates
these ideas happen to be mostly given over to illustrations of kiosk
screens, Tufte actually gets these fundamental ideas across in less than
two pages of text.

The Alexander Nevsky of the Long-Suffering Users

A friend of ours was called in to design a Web site for a
multibillion-dollar company. The new president had called everyone
responsible for the old site into a conference room. He plugged in a
laptop with a 14.4 modem and handed the Web group leader a stopwatch.

“Time how long it takes to download the home page.”

63 seconds.

“Now time how long it takes to get the first results back from a
search.”

90 seconds.

“What do you guys plan to do about this?” asked the president.

“Uh… Well… we could get a faster server,” responded the Web expert.

“Great. Thanks. You’re all fired.”

Our friend was specifically asked by the president to do a site with no
animation and no Java. The focus would be entirely on a fast search of
a server-side database.

Multi-Page Design and Flow

Most of this chapter has been about the design of individual pages;
most of this book is about the design of multi-page Web applications.
The bad design of a single page will offend a user; the bad design of
the page-to-page flow of a site will defeat a user.

Let’s look at general design principles that can be applied to different
kinds of Web applications.

One of the things that users love about the Web is the way in which
computation is discretized. A desktop application is generally a
complex miasma in which the state of the project is only partially
visible. Despite software vendors having added multiple-level Undo
commands to many popular desktop programs, the state of those programs
remains opaque to users.

The first general principle is Don’t break the browser’s Back menu.
Users should be able to go forward and back at any time in their session
with a site. For example, consider the following flow of pages:

  • choose a book
  • enter shipping address
  • enter credit card number
  • confirm
  • thank-you

A user who notices a typo in the shipping address on the confirm
page should be able to return to the shipping address entry form with
either the Go menu or the Back button, correct the address and proceed
from there. It would be nice if the server prefilled the credit card
entry form with any previously entered credit card data, but this is
not as essential as making sure that arbitrarily moving back and forth
does not result in an error. This idea sounds simple but it can be
difficult to implement, especially as publishers’ ambitions become
grander, as more session state is kept by the server, and as publishers
start using JavaScript.
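
One way to keep the Back button working is to avoid a server-side “current step” counter entirely. The Python sketch below is an assumption about how such a flow might be built, not a description of any particular site: each form carries the values entered so far in hidden fields, and the server chooses the next page from the submitted data alone. The field names, the /order/ URLs, and the "confirm" placeholder are all hypothetical.

```python
from html import escape

# Fields collected during the order flow, in order (hypothetical names).
STEPS = ["book", "shipping_address", "card_number"]


def render_form(step: str, data: dict) -> str:
    """Render the form for `step`, prefilled with any earlier answer and
    carrying every other known value forward in hidden fields."""
    hidden = "".join(
        f'<input type="hidden" name="{k}" value="{escape(v, quote=True)}">'
        for k, v in data.items() if k in STEPS and k != step
    )
    current = escape(data.get(step, ""), quote=True)
    return (
        f'<form method="post" action="/order/{step}">{hidden}'
        f'<input type="text" name="{step}" value="{current}">'
        '<input type="submit" value="Continue"></form>'
    )


def next_page(data: dict) -> str:
    """Pick the page to show from the submitted data alone. The server
    remembers no position in the sequence, so Back and Forward clicks
    cannot put the session into an invalid state."""
    for step in STEPS:
        if not data.get(step):
            return render_form(step, data)
    return "confirm"  # placeholder for rendering the confirm page
```

A user who backs up from the confirm page, fixes the shipping address, and resubmits simply gets the credit card form again. That is a small annoyance, but it is never an error. A production site would likely keep sensitive values such as the card number out of the page source; that is outside the scope of this sketch.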

The second general principle is Have users pick the object first and
then the verb. For example, consider the customer service area of an
ecommerce site. Assume that Jane Consumer has already identified
herself to the server. The merchant can show Jane a list of all the
items that she has ever purchased. Jane clicks on an item (picking the
object) and gets a page with a list of choices, e.g., “return for
refund” or “exchange”. Jane clicks on “exchange” (picking the verb) and
gets a page with instructions on how to schedule a pickup of the
unwanted item and pages offering replacement goods.
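
The object-then-verb flow maps naturally onto URLs. Here is a rough Python sketch, assuming a made-up /service/item/ URL scheme and a hypothetical "allowed" list stored with each purchase; a real merchant’s data model will differ.

```python
# Hypothetical catalog of verbs and the labels shown to the user.
VERBS = {
    "return": "return for refund",
    "exchange": "exchange",
}


def item_links(purchases: list) -> list:
    """Step 1: list the objects. Each purchase links to its own page."""
    return [(p["name"], f"/service/item/{p['id']}") for p in purchases]


def verb_links(item: dict) -> list:
    """Step 2: for the chosen object, offer only the verbs that apply."""
    return [
        (label, f"/service/item/{item['id']}/{verb}")
        for verb, label in VERBS.items()
        if verb in item["allowed"]
    ]
```

Because the verbs are computed from the object, Jane never sees a choice that cannot apply to the item she clicked on.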

How original is this principle? It is lifted straight from the Apple
Macintosh circa 1984 and is explicated clearly in Macintosh Human
Interface Guidelines (Apple Computer, Inc.; Addison-Wesley, 1993).
Originality is valorized in modern creative
culture but it was not a value for medieval authors and it does not help
users. The Macintosh was enormously popular to begin with and then
Microsoft went on to monopolize the desktop with a copy of the
Macintosh. Web publishers can be sure that the vast majority of their
users will be intimately familiar with the “pick the object then the
verb” style of interface. Sticking with a familiar user interface cuts
down on user time and confusion at a site.

What happens when publishers ignore these guidelines?

Date: Wed, 5 Aug 1998 23:20:33 -0400 (EDT)
From: Garrett Wollman <wollman@khavrinen.lcs.mit.edu>
To: philg@MIT.EDU
Subject: Another bad Web user interface example

I thought it was a really great idea when BankBoston [ed: now Fleet]
replaced their clunky modem-only terminal-based home banking system with
one that works over the Internet.  Not only is the user interface
painfully slow (downloading images that are different for every page
over a 33.6 modem and an encrypted connection), but it totally
disregards the perfectly good user interface built into my browser.  In
particular, if you try to actually navigate anywhere and then do
something, you get this:

    Error Description 2300004 - Screen Error
    You only need to click once on your selection. Please do not
    double-click, use the buttons on your browser, or open a second
    window while logged on to BankBoston. You can use the
    buttons on the screen or Short Cut to move around the system
    easily.

Oops!

What's worse, once you get it into this sort of a state, it is totally
unable to unwedge itself, and going back to the login screen gives only
the helpful message:

    Error Description 1501002 - Invalid Card Number
    This Card Number is currently logged on. Please make sure
    you have logged off the system and try again.

Um, hello?!

The only way to communicate with them is through the feedback function,
which I can't use because their system won't talk to me.  (Of course,
the whole thing is run under Windows NT, judging by the file names.  I
think I should probably be very concerned about that.)

Eventually, I was able to ``log in'' again, and got to the `feedback'
section to send them a message.  I spent about ten minutes composing
my diatribe, hit the ``images'' button to figure out which inline
image was hiding the ``send'' function, and then hit it.  It comes
back with another error message -- the date I had given it did not fit
its simple-minded notions of what a date should look like, so it gave
me another error message.  Of course, I didn't want to lose my
carefully composed diatribe, so I hit Meta-Back to get back to the
form.  (I then added another paragraph about what kind of idiot would
give users a blank text field to enter a date without any indication
that the simple-minded program would only accept one form.)  You can
of course guess what happened: I changed the format of the date to the
one it wanted, hit ``send'', and it gave me the original error message
again.  Oops... better find something to waste ten minutes doing until
it times out again!

All in all, it took me a good hour to finally send my message, and I
never did manage to pay my bills.

According to the July 27, 1997 issue of PC Week, BankBoston’s Web
service [now Fleet Bank] was developed by
Sapient, a consulting firm, using virtually the entire panoply of
technologies that were fashionable in corporate IT departments at the
time: WebObjects, Windows NT, and a C++ and CORBA middleware layer.
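
One failure in the message above, a blank date field that silently accepts only one format, is cheap to avoid on the server. The following Python sketch tries several common forms and, when all fail, tells the user which forms it understands. The list of formats is our assumption (including the U.S. month/day ordering); it is not what BankBoston’s system accepted.

```python
from datetime import date, datetime

# Formats this sketch accepts; the list is an assumption for illustration.
ACCEPTED_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y", "%d %b %Y"]


def parse_date(text: str) -> date:
    """Try each accepted format in turn. If none matches, raise a
    ValueError whose message names the forms the user can type."""
    cleaned = text.strip()
    for fmt in ACCEPTED_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue  # not this format; try the next one
    raise ValueError(
        f"Could not read {text!r} as a date. Try a form such as "
        "1998-08-05, 08/05/1998, August 5, 1998, or 5 Aug 1998."
    )
```

Printing the same list of accepted forms next to the input field would spare the user the round trip altogether.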

Summary

Here’s what you might have learned in this chapter:

  • Learning basic HTML shouldn’t take more than a few minutes.
  • The more HTML you know, the uglier and harder to use your site is
    likely to be.
  • HTML is not powerful enough to express the most interesting
    structural characteristics of your documents.
  • XML is powerful enough to represent structure, but XML documents
    represent records in a database, not a database management system.
  • You may want to keep your content in a database management system of
    some kind instead and generate HTML and XML pages programmatically.
  • If you have a limited budget, spend it on content that search
    engines can index rather than style and flash.
  • Don’t forget that using Java applets or plug-ins will get you into
    the business of educating and supporting users.
  • Because a search engine can send users to any document at your site,
    every document on your site should have navigation links to the rest
    of your content.

philg@mit.edu

Reader’s Comments

The point about a consistent user interface is very important. As a university student I have come into contact with WebCT Vista. This program is a web based program that is supposed to enable students and teachers to interact and to enable students to find information (at least I think that is what it is meant to do). It is a god awful program that uses Java and Javascript and other nasties, breaks the user interface of the web browser (for example the back button) and is not internally consistent. An example of an internal inconsistency (regarding the user interface), there are two separate home buttons that are not distinguishable , one that links to the students home on WebCT Vista, and one to the home of the subject. I hate the program and have only talked to one person who likes it. Rather they tolerate it ’cause they are blind and have been taught to use it.

— Anonymous Smith, January 20, 2007

I’m interested in marking-up existing HTML documents for my own use for my research.–highlight, add comment, etc. I can’t afford acrobat, and have also tried to save the page as a pdf and import to word. Any suggestions?

— Ellen Frick, February 6, 2007


