Introduction to Electronic Publishing

Electronic publishing is a publishing method in which the final version of a document or other media is released electronically, for example on a CD-ROM, on the Internet, or by e-mail.

Telling stories with digital tools can be as easy or as complex as you wish. In its most basic form, digital storytelling can be as simple as a student-published book or poem that is either printed or shared online. Once you and your students gain experience, you can move on to creating stories with video, animation, and sound. Let’s begin with the basics – stories told in print with digital photos and illustrations. To get started, you’ll want to explore a bit and see which tools you have available. Here’s a list of suggested tools to get you started:

Computer and Word Processing Software

The good news is that you don’t need the latest and greatest to tell stories in print. A computer that includes a word processor like Microsoft Word, AppleWorks, or Pages is a good place to start. TextEdit for Mac OS X or WordPad for PC are basic word processors included for free on your computer.

Photo-Editing Programs

All Apple computers sold in the last five years have included iPhoto, a handy photography program that also includes a great book creation tool. PC users with Windows XP may want to check out Picasa and Picaboo as excellent free alternatives to edit photos and create books.

Drawing Programs

While programs like AppleWorks and Word have basic drawing tools, you might want to look at more kid-friendly alternatives such as Kid Pix. There are also two great free drawing programs that work on Mac and PC computers: TuxPaint is a software program that is perfect for younger students – it includes several fun drawing tools and stamps to help kids create personal works of art. For older students, check out ArtRage from Ambient Design.

Other Options

If you want to incorporate photographs but don’t have the time or equipment to have students take their own pictures, try using one of these free online photography sites (PDF).

Advantages of electronic publishing

Low cost affair – The first and foremost benefit of electronic publishing is the low cost associated with it. The publisher need not spend extra bucks to get the books or papers printed. Also, the sum of expenses involved in setting up a few state-of-the-art computers and offering online access to readers is way less compared to printing volumes of material. Moreover, the distribution cost is absolutely zero in case of electronic publishing.

Environment-friendly – Electronic publishing is a more eco-friendly method, as there is negligible use of paper. In traditional printing, reels of paper are wasted in printing the first copy, the proofread copy and, finally, the printable version; electronic publishing handles these stages with publishing tools and shares the drafts by e-mail or on the Web.

Improved quality – The final product offered to the reader through electronic publishing is of a premium quality not only in terms of the animations, live links, interactive content, videos, and graphics used, but also from the point of view of layout. A well-completed digital magazine engages a reader more than a physical magazine in hand, thanks to the appealing look of the published document.

Easy to update and edit documents – Articles and journals often need regular edits to have a better impact on readers. A number of documents, such as repair manuals, also need to be kept up to date so that they are of help to the person referring to them. With print manuals, it is extremely expensive and tedious to keep the content updated. With electronic publishing, however, you just have to edit portions of an existing manual and your document is ready to go live.

Wider reach – Printed materials are limited in number and reach only a specific number of readers. In the case of electronically published documents, however, the number of readers multiplies extensively. In other words, digital publishing means an ever-growing reader base, provided the published document is worth a read.

Disadvantages of electronic publishing

Marketing efforts are high – Although the distribution costs associated with digital publishing are almost zero, the effort required to market the publication is quite high. Social media marketing now plays a pivotal role here, but growing a readership may still take time.

Chances of plagiarism – Most of the electronically published articles are available easily on the Internet. As such, several authors remain apprehensive of losing their copyright on the content. There is no dearth of people who are ready to plagiarize quality material, and electronic publishing, in an indirect way, makes way for it.


Publishing Basics

Self-Publishing Basics

I’m assuming that you are either considering self-publishing, getting ready to self-publish, or approaching that point. If that’s not the case, and you’re just procrastinating, get back to writing. Go, scoot. You can worry about this stuff when you are ready to publish.

While we are on the topic, by “ready to publish”, I mean that your manuscript is finished, you have printed it out, read it, read it aloud, made all the necessary changes, and given it to at least one fellow writer for an objective critique.

I’m assuming they gave it a thorough going over, that you incorporated their suggestions (or batted them away with compelling counter-logic), and they have given the thumbs up to the final version. I’m also hoping that there was at least one point between the last draft and being ready to publish where you let the manuscript sit for a few weeks, so that you could read it with fresh eyes.

Okay, so you’re ready. Now you need to turn that unedited manuscript into a professional-looking e-book – meaning you need an editor, a cover designer, and either to learn how to format e-books yourself or to hire a professional.

Self-Publishing Service Companies

Before we get into the details, some of you may be tempted to use one of the many self-publishing service companies that have sprung up. In short, avoid. Out of the three main steps – writing, publishing, and marketing – publishing is by far the easiest and these companies will give you no help with writing or marketing. And anyway, the value of the assistance they provide with publishing is questionable. Most of these companies overcharge for basic services both through a hefty upfront fee, and by taking a big chunk of your royalties.

Avoiding these companies doesn’t mean you have to do everything yourself; you can hire help at every step along the way, for much more reasonable fees, and with far better results. Finally, you really want to do the actual uploading yourself.

Not doing so will mean that your account is in the control of a third party, and they, rather than you, will have access to crucial data such as your live sales figures, and will be telling Amazon where to send the monthly checks. If you really, really must use one of these services (and I really don’t think it’s necessary), at least go with someone like BookBaby, who only charge a (comparatively reasonable) upfront fee, and don’t touch your royalties.


The publishing landscape is changing continually; it’s essential to keep in touch with developments so that you can exploit opportunities as they arise. The following blogs are must-reads: Dean Wesley Smith, Joe Konrath, and The Passive Voice. The Creative Penn and Lindsay Buroker’s blog are also highly recommended.

The Kindle Boards Writer’s Cafe is the most popular hang-out for self-publishers – a real mixture of those starting out and those that have already sold tens or hundreds of thousands of books. It’s a great place to find editors, cover designers, formatters, artists, and to get advice on all aspects of self-publishing. I have found that the best way to get recommendations for any service provider is to ask a fellow writer.


Your book’s cover is the face it shows the world. You want to make a good first impression, don’t you? A smart, professional cover makes all the difference. People really do judge a book by its cover. I can hear the complaints: this stuff shouldn’t matter; it’s all about the writing. Look, the world is unfair.

Get over it. If you want your book to stand out from the crowd, if you want to send the reader a signal that you have taken as much care with the inside of the book, you better make sure the outside looks good. In short, get a professional to design your cover.

Joel Friedlander has an excellent post on common mistakes book cover designers make, and I have a post here covering design basics and the process I go through with my designer. I recommend reading them both (note: Joel’s site has an astonishing number of fantastic posts on all aspects of self-publishing. He also runs monthly cover design awards, providing expert commentary on most entries).


Editing might be where self-publishers skimp most of all. Unfortunately for them, readers will spot the errors straight away. But even if they have eliminated the obvious stuff, such as typos or grammar issues, there may well be deeper problems.

If you aren’t planning to hire a professional editor, I strongly urge you to reconsider, and to read this article on the importance of editing, as well as this example of how much an edit can alter and improve a story. The following three posts by experienced editors (here, here, and here) are also essential reading. I hope we’re agreed now that you need an editor. If not, please re-read these posts. Editing is crucial.

When you dip your toe into the indie marketplace, you will notice many editors charging surprisingly low rates. I’m all for a good deal, but I would urge extreme caution here. An editor that is charging $200 is probably only going to do some quick proofing. Self-publishers get dinged in reviews about editing more than anything else. By employing a qualified, experienced editor to give your manuscript a real edit, you will be ahead of the pack.

Every time I get an MS back from my editor, her suggestions have improved the work immeasurably. But much more importantly, I learn something. You aren’t just investing in your book, but in yourself as a writer. If you don’t engage a professional editor, you will regret it. Most readers sample a work first and most e-book retailers allow them to download a chunk for free to decide if they want to buy. The size of the sample is around 10% on Amazon, but can be larger on the other sites.

I know what the main objection will be – price. But you must consider it an investment rather than a cost. If you can’t afford it, find a way. Save, barter, crowdfund, agree a payment plan with your editor, give up that diamante-encrusted ham you are so fond of – whatever it takes (although I would draw the line at getting into debt and/or an elaborate heist).


Once your manuscript is edited, and your cover is ready, you will need to turn that into a neatly formatted e-book that will wrap and flow and resize, and have nifty features like a clickable table of contents (and links to your other books). If you don’t own an e-reader, and you have never read an e-book, I recommend downloading the Kindle software so you can play around with it (available for any computer, tablet or smartphone).

Grab some free books while you are at it, and look at the formatting. Watch how the text reflows when you make the font bigger and smaller. If it is formatted correctly, it will all be quite neat.

Formatting can be a painful process the first time, but it gets much easier after that. I think my first book took a few days to figure everything out. My next one took a few hours. Now, after four books, I can even format a complicated non-fiction book very quickly. Essentially, it involves downloading some free software and playing around with basic HTML. Some of the most non-technical people I know have been able to master it, but if it’s too much, you can hire someone cheaply (for between $100 and $200).

What you absolutely mustn’t do is use one of the shortcuts that many self-publishers employ. Some of these “tricks” involve simply saving your MS as an HTML document and then uploading it directly to Amazon. I can’t emphasize enough what a mistake this is. Your book will most likely contain serious formatting errors. And even if it looks fine on the Kindle, it may not look okay on an iPad or iPhone.

Many self-publishers who used this shortcut were surprised when their books were a mess on the new Kindle Fire. I wasn’t, as this shortcut will leave all sorts of hidden code in your book that can cause problems as it is interpreted differently by different devices. To do things the right way, to learn how to format your books so they look perfect on every device, or to get some recommendations on paid formatting services, check out my formatting page.
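To make “basic HTML” and “a clickable table of contents” a little more concrete, here is a minimal sketch in Python that writes a small, clean HTML document by hand rather than exporting it from a word processor. It is an illustration only: the function name build_ebook_html, the ch1, ch2, ... anchor names and the sample book are assumptions made for this example, and a real e-book still has to be converted and checked with proper formatting tools.

```python
from html import escape


def build_ebook_html(title, chapters):
    """Return one HTML document with a linked table of contents.

    chapters is a list of (heading, paragraphs) pairs. The ch1, ch2, ...
    anchor ids are an arbitrary naming choice for this sketch.
    """
    toc_items = []
    sections = []
    for number, (heading, paragraphs) in enumerate(chapters, start=1):
        anchor = f"ch{number}"
        # Each table-of-contents entry links to the id of its chapter heading.
        toc_items.append(f'<li><a href="#{anchor}">{escape(heading)}</a></li>')
        body = "\n".join(f"<p>{escape(text)}</p>" for text in paragraphs)
        sections.append(f'<h2 id="{anchor}">{escape(heading)}</h2>\n{body}')
    return "\n".join([
        "<!DOCTYPE html>",
        '<html lang="en">',
        f'<head><meta charset="utf-8"><title>{escape(title)}</title></head>',
        "<body>",
        f"<h1>{escape(title)}</h1>",
        "<ul>",
        *toc_items,
        "</ul>",
        *sections,
        "</body>",
        "</html>",
    ])


if __name__ == "__main__":
    # Hypothetical sample content, just to show the shape of the output.
    print(build_ebook_html("My Book", [("Chapter One", ["It began."]),
                                       ("Chapter Two", ["It ended."])]))
```

Because the markup contains nothing but headings, paragraphs, and links, there is no hidden word-processor code for different devices to interpret differently.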


Since December, self-publishers have a big decision to make when they upload: should they go exclusive with Amazon? The community is divided on this topic, and I tried to present both sides on my blog. Here is my initial post outlining the pros and cons of KDP Select, and explaining why I wasn’t participating. Here’s one author’s guest post on their success with the program, and here’s another.

Finally, here’s one from an author who has seen sales increase by staying out. You really have to make your own mind up (and this is where keeping in touch with the latest developments on the above blogs and the Kindle Boards forum comes in handy).

If you don’t decide to go exclusive, the main sites you need to upload to are Amazon (details below), and Smashwords (who will distribute your work to Barnes & Noble, Sony, Diesel, iTunes and all the global Apple sites, as well as Kobo and all their partners, such as FNAC and WHSmith). Additional sites such as DriveThruFiction, AllRomance/OmniLit, and Xinxii (who will also distribute to Casa del Libro in Spain) are worth considering, but please note the first two require ISBNs (Smashwords provide free ISBNs and Amazon and Xinxii don’t require them). I’ve detailed the steps below for Amazon, but all the sites are quite similar.

If you have decided to go exclusive with Amazon, then you only have one site to upload to. All you need is your cover and your formatted e-book file. Setting up an account is simple, and only takes a minute or two. You will need to fill out your name and address and your payment information for royalties (if you are in the UK, you will be paid by bank transfer for UK sales, but by check for everything else).

When you click “Upload New Title” you will be taken to a new page, where you will fill out all the information about your book: the title, the author’s name, the description/blurb (nifty advice here), and the publisher’s name – you can leave that blank, enter your own name or that of your publishing company. Whether you set up your own publishing company or not is something you must decide for yourself, and there is some advice on that, and other practicalities such as tax, copyright, and ISBNs here.

Next you must enter the book’s categories. You only get two, so choose them wisely. Try and drill down as much as possible. For example, under Fiction, you will find a sub-section for Romance and further sub-sections for Regency, Historical, Time-Travel etc.

Picking Regency will automatically include you in the general Romance and Fiction categories. You also get to pick seven keywords. Try not to make these too generic, as these decide whether your book shows up on searches on Amazon. As such “fiction” won’t be much use to you. “Paranormal romance” or “cozy mystery” will be much better. There is more detailed advice on the importance of choosing the right categories here.

The last item on the page is the box to upload your book. I strongly urge you not to enable DRM. It does nothing to prevent piracy and only antagonizes readers (some boycott books with DRM).

On the second and final page, you must select which territories you wish to sell your book in (which is usually all, but some self-publishers may have sold UK or US rights to a publisher and will then have to exclude that territory).

Next you set your price. Amazon pay 70% royalties on books priced between $2.99 and $9.99 (and £1.49 and £6.49 in the UK) and 35% outside that. Note that the 70% royalty rate only applies to sales in the US, Canada, the UK, and the countries served by the German, French, Italian, and Spanish Kindle Stores. Sales in all other countries, like Australia, Ireland, or South Africa, will only pay 35% royalties, no matter what price you set. For detailed advice on pricing, please read this extensive post.
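The royalty rules above boil down to a little arithmetic, which can be sketched as follows. This is a simplified illustration of the rates quoted in this section only: the two-letter territory codes and the function names are assumptions made for the example, and any other deductions a retailer may apply are ignored.

```python
# Territories where, according to the text, the 70% rate can apply.
# The two-letter codes are an illustrative shorthand, not an official list.
SEVENTY_PERCENT_TERRITORIES = {"US", "CA", "UK", "DE", "FR", "IT", "ES"}

# Price bands quoted in the text for the 70% rate.
PRICE_BANDS = {"USD": (2.99, 9.99), "GBP": (1.49, 6.49)}


def royalty_rate(price, territory, currency="USD"):
    """Return 0.70 or 0.35 following the rules described in the text."""
    low, high = PRICE_BANDS[currency]
    in_band = low <= price <= high
    if in_band and territory in SEVENTY_PERCENT_TERRITORIES:
        return 0.70
    return 0.35


def royalty_per_sale(price, territory, currency="USD"):
    """Royalty earned on a single sale, rounded to two decimal places."""
    return round(price * royalty_rate(price, territory, currency), 2)


if __name__ == "__main__":
    for price, territory in [(0.99, "US"), (2.99, "US"), (4.99, "AU")]:
        print(price, territory, royalty_per_sale(price, territory))
```

Running the numbers like this shows why the $2.99 price point matters: a book priced just below the band earns half the rate on every sale.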

The final box refers to e-book lending (Kindle owners can lend books to each other for two-week periods, but can’t read the book themselves while it is lent out). I recommend enabling this feature. And then you are done. The book will take anything between a few hours and three days to go live on Amazon, but lately it takes less than 24 hours. When it does appear, congratulate yourself: you are now a published author.

Mailing List

One of the most crucial (and under-used) tools is the mailing list. I have a clickable link at the back of all my books which takes readers to a newsletter sign-up. I only mail this list when I have a new release, and now it’s responsible for significant sales every time I launch a book. Get it going from the start. I use MailChimp – it produces very pretty emails, you can track who opens your email or clicks on your links, and it’s free. It’s also a great way to announce your newly published book to all your friends, family, and colleagues. Don’t forget to include links to the free Amazon apps, as many people don’t know they can enjoy e-books on their smartphones and laptops.

Marketing and Promotion

This is the stage which has most writers throwing their hands in the air. All those hours whittling prose can render one somewhat anti-social and technophobic. Some writers won’t even contemplate self-publishing because they want to avoid this stuff, and shoot for a traditional publishing deal which they think will allow them to just write. That, I’m afraid, is a myth. Publishers these days expect writers to be active on social media. Writers are expected to shoulder the burden of connecting with readers, whether they self-publish or not.

For those averse to this side of things, I have some good news. The most important marketing you can do is to ensure you have a professional product on the market: a great story, an arresting cover, a professionally edited book, and proper formatting.

On top of that, many successful self-publishers such as Bob Mayer advise that promotion is more-or-less pointless until you have a few titles out, as you are unlikely to get the return on your efforts to justify the time spent. This makes sense. If readers enjoy your work, they tend to purchase everything you have published. If you only have one book out, they can’t buy anything else.

Social Media

There are plenty of people out there who will tell you that blogging is essential, or you’ll never make it without a huge number of Twitter followers, or without a slick website, or an active Facebook Page, or without being active on Tumblr or Pinterest, or whatever the latest fad is. Frankly, there are too many counter-examples to agree with sweeping statements like that. There are plenty of bestselling self-publishers who built themselves up from nothing by doing very little of all that stuff.

Print Versions

Many self-publishers don’t bother with print versions as it’s next-to-impossible to get their work into bookstores. This is a mistake. The overwhelming majority of readers still read print books.

Keep Writing!

Much of the above might sound like hard work, but it really doesn’t have to be. Focus on the basics: a well-written story, a smart cover, a proper edit, clean formatting, and an enticing blurb. You only have to do that stuff once, and then you can get back to working on the next book.


Document Layout

In computer vision, document layout analysis is the process of identifying and categorizing the regions of interest in the scanned image of a text document. A reading system requires the segmentation of text zones from non-textual ones and the arrangement in their correct reading order.

Detection and labeling of the different zones (or blocks) as text body, illustrations, math symbols, and tables embedded in a document is called geometric layout analysis. But text zones play different logical roles inside the document (titles, captions, footnotes, etc.) and this kind of semantic labeling is the scope of the logical layout analysis.

Document layout analysis is the union of geometric and logical labeling. It is typically performed before a document image is sent to an OCR engine, but it can also be used to detect duplicate copies of the same document in large archives, or to index documents by their structure or pictorial content.

Overview of methods

There are two main approaches to document layout analysis. Firstly, there are bottom-up approaches which iteratively parse a document based on the raw pixel data. These approaches typically first parse a document into connected regions of black and white, then these regions are grouped into words, then into text lines, and finally into text blocks. Secondly, there are top-down approaches which attempt to iteratively cut up a document into columns and blocks based on white space and geometric information.

The bottom-up approaches are the traditional ones, and they have the advantage that they require no assumptions on the overall structure of the document. On the other hand, bottom-up approaches require iterative segmentation and clustering, which can be time consuming.

Top-down approaches are newer, and have the advantage that they parse the global structure of a document directly, thus eliminating the need to iteratively cluster together the possibly hundreds or even thousands of characters/symbols which appear on a document. They tend to be faster, but in order for them to operate robustly they typically require a number of assumptions to be made about the layout of the document.
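As an illustration of the top-down idea, the sketch below recursively cuts a page at its widest band of white space, using row and column projection profiles. This particular scheme is often called a recursive X-Y cut; it is one example of a top-down method chosen for this illustration, not the only one, and the min_gap parameter and function names are assumptions. The page is a list of rows of 0 (white) and 1 (black) pixels.

```python
def _widest_gap(profile, min_gap):
    """Return (start, end) of the widest run of zeros of length >= min_gap, or None."""
    best, start = None, None
    for i, value in enumerate(profile):
        if value == 0:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_gap and (best is None or i - start > best[1] - best[0]):
                best = (start, i)
            start = None
    return best


def xy_cut(page, top, bottom, left, right, min_gap=2):
    """Split the region at white gaps; return leaf boxes as (top, left, bottom, right)."""
    # Shrink the region to the bounding box of its black pixels.
    rows = [r for r in range(top, bottom) if any(page[r][left:right])]
    cols = [c for c in range(left, right) if any(page[r][c] for r in range(top, bottom))]
    if not rows or not cols:
        return []
    top, bottom, left, right = rows[0], rows[-1] + 1, cols[0], cols[-1] + 1

    # Projection profiles: number of black pixels in each row and column.
    row_profile = [sum(page[r][left:right]) for r in range(top, bottom)]
    col_profile = [sum(page[r][c] for r in range(top, bottom)) for c in range(left, right)]
    row_gap = _widest_gap(row_profile, min_gap)
    col_gap = _widest_gap(col_profile, min_gap)
    if row_gap is None and col_gap is None:
        return [(top, left, bottom, right)]  # no white band left: this is a block

    # Cut along whichever white band is wider (horizontal wins ties).
    if col_gap is None or (row_gap is not None
                           and row_gap[1] - row_gap[0] >= col_gap[1] - col_gap[0]):
        start, end = row_gap
        return (xy_cut(page, top, top + start, left, right, min_gap)
                + xy_cut(page, top + end, bottom, left, right, min_gap))
    start, end = col_gap
    return (xy_cut(page, top, bottom, left, left + start, min_gap)
            + xy_cut(page, top, bottom, left + end, right, min_gap))


def layout_blocks(page, min_gap=2):
    """Run the recursive cut over the whole page."""
    return xy_cut(page, 0, len(page), 0, len(page[0]), min_gap)
```

Note how the assumption mentioned above shows up in the code: the method only works when blocks are separated by straight, unbroken bands of white space at least min_gap pixels wide.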

There are two issues common to any approach at document layout analysis: noise and skew. Noise refers to image noise, such as salt and pepper noise or Gaussian noise. Skew refers to the fact that a document image may be rotated in a way so that the text lines are not perfectly horizontal. It is a common assumption in both document layout analysis algorithms and optical character recognition algorithms that the characters in the document image are oriented so that text lines are horizontal. Therefore, if there is skew present then it is important to rotate the document image so as to remove it.

It follows that the first steps in any document layout analysis code are to remove image noise and to come up with an estimate for the skew angle of the document.
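For the noise-removal part, one widely used filter for salt-and-pepper noise is the median filter. The sketch below is a plain-Python 3x3 version working on a grayscale image stored as a list of rows; it is an illustrative choice, not a filter prescribed by any particular layout analysis system, and the function name is an assumption.

```python
def median_filter_3x3(image):
    """Replace each pixel by the median of its 3x3 neighborhood.

    image is a list of rows of gray values. Neighborhoods are clipped at
    the image edges, so border pixels use fewer than nine values.
    """
    height, width = len(image), len(image[0])
    filtered = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            window = sorted(
                image[yy][xx]
                for yy in range(max(0, y - 1), min(height, y + 2))
                for xx in range(max(0, x - 1), min(width, x + 2))
            )
            # Middle element of the sorted window (upper median when even).
            filtered[y][x] = window[len(window) // 2]
    return filtered
```

An isolated dark or bright pixel is outvoted by its neighbors and disappears. The same property means that very small marks, such as periods and commas at low resolution, can be removed too, so the window size needs care.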

Example of a bottom-up approach

In this section we will walk through the steps of a bottom-up document layout analysis algorithm developed in 1993 by O’Gorman. The steps in this approach are as follows:

  1. Preprocess the image to remove Gaussian and salt-and-pepper noise. Note that some noise removal filters may consider commas and periods as noise, so some care must be taken.
  2. Convert the image into a binary image, i.e. convert each pixel value to completely white or completely black.
  3. Segment the image into connected components of black pixels. These are the symbols of the image. For each symbol, compute a bounding box and centroid.
  4. For each symbol, determine its k nearest neighbors where k is an integer greater than or equal to four. O’Gorman suggests k=5 in his paper as a good compromise between robustness and speed. The reason to use at least k=4 is that for a symbol in a document, the two or three nearest symbols are the ones right next to it on the same text line. The fourth-nearest symbol is typically on a line right above or below, and it is important to include these symbols in the nearest neighbor calculation for the following.
  5. Each nearest neighbor pair of symbols is related by a vector pointing from one symbol’s centroid to the other symbol’s centroid. If these vectors are plotted for every pair of nearest neighbor symbols, then one gets what is called the docstrum for the document (See figure below). One can also use the angle Θ from the horizontal and distance D between two nearest neighbor symbols and create a nearest-neighbor angle and nearest-neighbor distance histogram.
  6. Using the nearest-neighbor angle histogram, the skew of the document can be calculated. If the skew is acceptably low, continue to the next step. If it is not, rotate the image so as to remove the skew and return to step 3.
  7. The nearest-neighbor distance histogram has several peaks, and these peaks typically represent between-character spacing, between-word spacing, and between-line spacing. Calculate these values from the histogram and set them aside.
  8. For each symbol, look at its nearest neighbors and flag any of them that are a distance away which is within some tolerance of the between-character spacing distance or between-word spacing distance. For each nearest neighbor symbol which is flagged, draw a line segment connecting their centroids.
  9. Symbols connected to their neighbors by line segments form text lines. Using all the centroids in a text line, one can compute an actual line segment representing the text line with linear regression. This is important because it is unlikely that all the centroids of symbols in a text line are actually collinear.
  10. For each pair of text lines, one can compute a minimum distance between their corresponding line segments. If this distance is within some tolerance of the between-line spacing calculated in step 7, then the two text lines are grouped into the same text block.
  11. Finally, one can calculate a bounding box for each text block, and the document layout analysis is complete.
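Steps 4 to 7 above can be sketched in a few lines of Python. The code below is a simplified illustration, not O’Gorman’s implementation: it takes the symbol centroids as given, uses a brute-force neighbor search, and reads the skew and the between-character spacing off the tallest histogram bin. The function names and bin widths are assumptions made for this example.

```python
import math
from collections import Counter


def nearest_neighbor_pairs(centroids, k=5):
    """Return (angle_in_degrees, distance) for each symbol and its k nearest neighbors."""
    pairs = []
    for i, (x0, y0) in enumerate(centroids):
        neighbors = sorted(
            (math.hypot(x1 - x0, y1 - y0), x1 - x0, y1 - y0)
            for j, (x1, y1) in enumerate(centroids) if j != i
        )
        for distance, dx, dy in neighbors[:k]:
            angle = math.degrees(math.atan2(dy, dx))
            # Fold into (-90, 90] so left and right neighbors give the same angle.
            if angle > 90:
                angle -= 180
            elif angle <= -90:
                angle += 180
            pairs.append((angle, distance))
    return pairs


def histogram_peak(values, bin_width):
    """Return the center of the most populated histogram bin."""
    counts = Counter(round(value / bin_width) for value in values)
    return counts.most_common(1)[0][0] * bin_width


def estimate_skew(pairs, bin_width=1.0):
    """Skew estimate: the peak of the nearest-neighbor angle histogram."""
    return histogram_peak([angle for angle, _ in pairs], bin_width)


def estimate_character_spacing(pairs, bin_width=1.0):
    """Between-character spacing: the tallest peak of the distance histogram."""
    return histogram_peak([distance for _, distance in pairs], bin_width)
```

The brute-force search compares every pair of symbols, which is fine for a sketch but slow for a full page; a spatial index would be the usual replacement. Finding the between-word and between-line peaks would need a proper peak search over the distance histogram rather than just the tallest bin.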

Layout analysis software

  • OCRopus – A free document layout analysis and OCR system, implemented in C++ and Python, for FreeBSD, Linux, and Mac OS X. This software supports a plug-in architecture which allows the user to select from a variety of different document layout analysis and OCR algorithms.
  • OCRFeeder – An OCR suite for Linux, written in Python, which also supports document layout analysis. This software is actively being developed, and is free and open-source.


Document Enhancement

When you scan documents, you can sharpen the text and increase accuracy by using the Text Enhancement feature in Epson Scan.

See one of these sections for instructions on text enhancement.

Text Enhancement Using the PDF Button

Office Mode: Text Enhancement in Office Mode

Home Mode: Text Enhancement in Home Mode

Text Enhancement Using the PDF Button

Place your document on the document table. See Placing Documents or Photos for instructions.

Press the PDF button on the scanner.

When you see the Scan to PDF window, click Settings. You see the Scan to PDF Settings window.

In the Scan to PDF Settings window, select the Image Type and Destination settings. For details, see Scanning to a PDF File Using the PDF Button.

Click the Text Enhancement check box.

Make any other necessary image adjustments. See Adjusting the Color and Other Image Settings for details.

Click File Save Settings. Make file save settings as necessary and click OK. See Scanning to a PDF File Using the PDF Button for details.

Click Close to close the Scan to PDF window, then click Scan or press the Start button on the scanner. Epson Scan scans your page.

When you are finished scanning all of your pages, click Finish or press the PDF button on the scanner. Your document is saved as a PDF file in the Pictures or My Pictures folder, or in the location you selected in the File Save Settings window.

Text Enhancement in Office Mode

Place your document on the document table. See Placing Documents or Photos for instructions.

Start Epson Scan. See Starting Epson Scan for instructions.

In the Office Mode window, select the Image Type, Document Source, Size, and Resolution settings. For details, see Scanning in Office Mode.

Click the Text Enhancement check box.

Click Preview to preview your document, then select your scan area. For details, see Previewing and Adjusting the Scan Area.

Make any other necessary image adjustments. See Adjusting the Color and Other Image Settings for details.

Click Scan. The File Save Settings window appears.

Make File Save and PDF settings as necessary. See Scanning to a PDF File in Office Mode for details.

Click OK.

Text Enhancement in Home Mode

Place your document on the document table. See Placing Documents or Photos for instructions.

Start Epson Scan. See Starting Epson Scan for instructions.

In the Home Mode window, select the Document Type, Image Type, and Destination settings. For details, see Scanning in Home Mode.

Click the Text Enhancement check box.

Click Preview to preview your document, then select your scan area. For details, see Previewing and Adjusting the Scan Area.

Make any other necessary image adjustments. See Adjusting the Color and Other Image Settings for details.

Click Scan. The File Save Settings window appears.

Make File Save and PDF settings as necessary. See Scanning to a PDF File in Home Mode for details.

Click OK. Epson Scan scans your document.

The enhancement techniques in document capture

Two main concerns for any document imaging exercise are image quality and file size. You will want the best possible image quality while keeping the file size to a minimum. Image enhancement has therefore become an essential step in a well-defined capture workflow. The purpose of image enhancement (also called image cleanup or image processing) is to make the images more readable and to remove unwanted noise, which also reduces the storage requirements. This is especially important for forms processing and OCR applications, where it improves character recognition. There are a number of image enhancement techniques available today. Described below are 8 such image processing techniques.

1. Deskewing

In a production scanning setup, document pre-processing is the most time-consuming step. One objective of this step is to arrange the documents correctly by rotating incorrectly filed documents and aligning them together. The de-skew facility in production capture applications helps to reduce this effort by automatically de-skewing misaligned images. The de-skew process can straighten pages that were misaligned during the document feeding process, within a specified range of degrees.

A more advanced feature is available with Kofax VRS called content based rotation. VRS can analyze the content of the image and correct the orientation accordingly.

2. Black border cropping & removing

Cropping refers to the removal of the outer parts of an image. In document scanning, black border cropping is one technique used to remove unnecessary black borders from an image. Border cropping removes the black borders completely, which also reduces the image height and width. However, it does not reduce the resolution of the image.

The other technique, called black border removal, is to replace the black pixels in the borders with white pixels. Unlike cropping, this does not reduce the image size.
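The difference between the two techniques is easy to see in code. The following sketch works on a grayscale image stored as a list of rows, with 0 for black and 255 for white. Treating a row or column as border when at least 90% of its pixels are black is an assumption made for this illustration (as are the function names); commercial capture software uses more robust detection.

```python
BLACK, WHITE = 0, 255


def find_content_box(image, min_fraction=0.9):
    """Return (top, bottom, left, right) of the area inside the black border."""
    height, width = len(image), len(image[0])

    def row_is_border(r):
        return sum(p == BLACK for p in image[r]) >= min_fraction * width

    top = 0
    while top < height and row_is_border(top):
        top += 1
    bottom = height
    while bottom > top and row_is_border(bottom - 1):
        bottom -= 1

    rows = range(top, bottom)  # judge columns only on the non-border rows

    def col_is_border(c):
        return sum(image[r][c] == BLACK for r in rows) >= min_fraction * len(rows)

    left = 0
    while left < width and col_is_border(left):
        left += 1
    right = width
    while right > left and col_is_border(right - 1):
        right -= 1
    return top, bottom, left, right


def crop_border(image, min_fraction=0.9):
    """Border cropping: cut the border away, so the image gets smaller."""
    top, bottom, left, right = find_content_box(image, min_fraction)
    return [row[left:right] for row in image[top:bottom]]


def remove_border(image, min_fraction=0.9):
    """Border removal: paint the border white, keeping the image size."""
    top, bottom, left, right = find_content_box(image, min_fraction)
    return [[pixel if top <= r < bottom and left <= c < right else WHITE
             for c, pixel in enumerate(row)]
            for r, row in enumerate(image)]
```

Both functions share the same border detection; they differ only in whether the pixels outside the content box are discarded or repainted.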

3. De-speckling / Noise reduction

When scanning old documents, we usually get unwanted dots (speckles) in the background. These can take two forms: black speckles on a white background and white speckles on a black background. This is also known as salt-and-pepper noise. (Illustration: an image with salt-and-pepper noise)

Whatever the form, this noise hampers image compression and increases the file size. De-speckling (also known as noise reduction) is the process of removing such unwanted speckles from the image background. (Illustration: noise removal)
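A very simple form of de-speckling can be sketched like this. It assumes a black & white image stored as rows of 0 (white) and 1 (black) and only removes single-pixel speckles; production tools typically let the operator set a maximum speckle size. The function name is hypothetical.

```python
def despeckle(image):
    """Flip every pixel whose neighbours all have the opposite colour.

    This handles both black speckles on white ("pepper") and white
    speckles on black ("salt"). Strokes two or more pixels wide or long
    are left alone.
    """
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]  # read from image, write to the copy
    for y in range(height):
        for x in range(width):
            neighbours = [image[ny][nx]
                          for ny in range(max(0, y - 1), min(height, y + 2))
                          for nx in range(max(0, x - 1), min(width, x + 2))
                          if (ny, nx) != (y, x)]
            if neighbours and all(n != image[y][x] for n in neighbours):
                result[y][x] = 1 - image[y][x]
    return result
```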

4. Colour drop out

Colour dropout is a proven technique for forms processing applications such as census projects. The idea is to discard the text boxes and lines printed on the form from the scanned image, which increases the OCR recognition rate. Earlier scanners used specific coloured lamps to achieve this (e.g. the Blue Imaging Color Drop-Out Element for Kodak 9520/9500). Today this is achieved in software.

Colour dropout accuracy depends directly on the printing quality of the forms. Only selected colours (shades of red, blue and green) can be dropped, and the supported shades vary from scanner to scanner. It is therefore essential to print the forms in the recommended Pantone colours (e.g. the Fujitsu PANTONE Dropout Confirmation Listing).
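One simple way software can imitate a coloured lamp is to keep only the colour channel that matches the form's ink, since ink of that colour looks almost as bright as the paper in that channel. The sketch below illustrates this under the assumption that pixels are (red, green, blue) tuples with values 0 to 255; it is not the algorithm of any specific scanner, and the names are hypothetical.

```python
CHANNEL = {"red": 0, "green": 1, "blue": 2}


def drop_colour(rgb_image, colour="red"):
    """Build a grayscale image from the dropout colour's channel only.

    Red form lines are bright in the red channel, so they blend into the
    white paper, while black handwriting stays dark in every channel.
    """
    index = CHANNEL[colour]
    return [[pixel[index] for pixel in row] for row in rgb_image]


def to_black_and_white(gray_image, threshold=128):
    """Mark pixels darker than the threshold as black (1), others white (0)."""
    return [[1 if value < threshold else 0 for value in row]
            for row in gray_image]
```

For a row containing white paper, a red form line and black ink, the red channel keeps only the ink dark, whereas the green channel would keep the form line as well. This is also why the shade used to print the form matters: an ink that is not bright enough in the chosen channel will not drop out.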

5. Thresholding

Thresholding is a technique used when scanning grayscale images and saving them as black & white. A grayscale image typically has 8 bits per pixel (representing 256 shades of gray), although some scanners capture 16 bits per pixel (65,536 shades), while a black & white image has 1 bit per pixel (representing either black or white). When converting from grayscale to black & white (for example, scanning a photograph in black & white mode), each pixel, whatever its shade of gray, must be converted into either black or white. The point of separation is called the threshold. Changing the threshold value changes the output image quality.
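The conversion itself is a single comparison per pixel. A minimal sketch, assuming gray values from 0 (black) to 255 (white) and an output of 1 for black and 0 for white (the function name is hypothetical):

```python
def threshold_fixed(gray_image, threshold=128):
    """Convert grayscale to black & white using one threshold for all pixels.

    Pixels darker than the threshold become black (1), the rest white (0).
    """
    return [[1 if value < threshold else 0 for value in row]
            for row in gray_image]
```

Raising the threshold turns more mid-gray pixels black, which thickens faint text but also darkens the background; lowering it does the opposite.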

As shown in the illustration above, this is fixed thresholding, in which a single threshold value is applied to the whole image. It is ideal for separating solid colours (e.g. text) from the background. However, for images with various shades of gray, an advanced version called adaptive thresholding is used. In adaptive thresholding, the threshold value is calculated independently for each pixel based on the local contrast. Different scanner manufacturers and capture applications have come up with many different technologies and algorithms for this, such as Kodak iThresholding (developed on Adaptive Threshold Processing, ATP).
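Vendor algorithms are proprietary, but the general idea can be illustrated with a generic local-mean threshold: each pixel is compared with the average brightness of its own neighbourhood rather than with one global value. The sketch below assumes gray values from 0 to 255; the function name, `radius` and `offset` are hypothetical parameters chosen for illustration.

```python
def threshold_adaptive(gray_image, radius=1, offset=10):
    """Convert grayscale to black & white with a per-pixel threshold.

    A pixel becomes black (1) only if it is darker than the mean of the
    surrounding (2 * radius + 1) square window by more than `offset`.
    """
    height, width = len(gray_image), len(gray_image[0])
    result = []
    for y in range(height):
        out_row = []
        for x in range(width):
            window = [gray_image[ny][nx]
                      for ny in range(max(0, y - radius),
                                      min(height, y + radius + 1))
                      for nx in range(max(0, x - radius),
                                      min(width, x + radius + 1))]
            local_mean = sum(window) / len(window)
            out_row.append(1 if gray_image[y][x] < local_mean - offset else 0)
        result.append(out_row)
    return result
```

Because the comparison is local, a mark that is only slightly darker than a bright background can still be picked up, while an evenly shaded dark background is not turned black as a whole.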

6. Line Removal

Line removal is a very useful feature, especially for OCR applications. It is used to remove unwanted lines from scanned images. These lines could be either actual content or noise. Most application forms (for credit cards, account opening, etc.) consist of text boxes. Although such lines are actual content of the document, they interfere with the character recognition process and are therefore unwanted.

Also, when scanning documents that are folded or when scanning fax copies, there is a high possibility of getting unwanted lines, often horizontal, in the scanned image. Such lines, especially vertical ones, can interfere with the OCR process. If any text intersects with these lines, the characters appear broken in the scanned image, resulting in incorrect text recognition.

When line removal is used, these unwanted lines are not included in the scanned image, resulting in a clean image optimized for character recognition. Characters that are broken by horizontal lines are also corrected. Furthermore, line removal reduces the image file size.
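A basic version of horizontal line removal can be sketched as follows. It assumes a black & white image stored as rows of 0 (white) and 1 (black) and lines that are one pixel thick; it keeps a line pixel when black pixels sit directly above and below it, as a crude way to avoid breaking characters that cross the line. The function name and `min_length` are hypothetical.

```python
WHITE, BLACK = 0, 1  # assumption: bitonal image as a list of rows


def remove_horizontal_lines(image, min_length=8):
    """Erase horizontal runs of black pixels at least min_length long."""
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]
    for y in range(height):
        x = 0
        while x < width:
            if image[y][x] != BLACK:
                x += 1
                continue
            start = x
            while x < width and image[y][x] == BLACK:
                x += 1
            if x - start < min_length:
                continue  # short run: probably part of a character
            for cx in range(start, x):
                above = y > 0 and image[y - 1][cx] == BLACK
                below = y < height - 1 and image[y + 1][cx] == BLACK
                # Keep pixels where a stroke crosses the line vertically.
                if not (above and below):
                    result[y][cx] = WHITE
    return result
```

Vertical lines could be handled the same way by scanning columns instead of rows.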

7. Punch Hole filling

When filed documents with punched holes are scanned, most of the images will show these holes as black spots. In addition to the distracting appearance of the image, this results in two main problems. The first is that if the file contains a large number of documents and the left margin is not adequate, these black spots could interfere with the actual content of the document.

The second issue is that such black spots on blank pages could interfere with automatic blank page deletion, since they could be recognized as actual content. Earlier, these black marks were removed manually, which required a lot of time and effort. With the advancement of image processing applications such as Kofax VRS, this can now be automated.

This feature replaces the colour of such black spots with the surrounding image colour. Most such applications take into consideration the dimensions and locations of the black spots and compare them with different manufacturer specifications and standards.
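The idea of checking the dimensions and location of each black spot can be sketched as below. This is a simplified illustration, not how Kofax VRS or any other product works: it assumes a black & white image stored as rows of 0 (white) and 1 (black), holes along the left margin, and a white page background. The function name and the size limits (in pixels) are hypothetical.

```python
from collections import deque

WHITE, BLACK = 0, 1  # assumption: bitonal image as a list of rows


def fill_punch_holes(image, margin_width=6, min_size=3, max_size=5):
    """Whiten black blobs in the left margin that are about hole-sized."""
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]
    seen = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(min(margin_width, width)):
            if image[y][x] != BLACK or seen[y][x]:
                continue
            # Collect the connected blob that contains this pixel.
            blob, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                blob.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and image[ny][nx] == BLACK and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            ys = [p[0] for p in blob]
            xs = [p[1] for p in blob]
            blob_height = max(ys) - min(ys) + 1
            blob_width = max(xs) - min(xs) + 1
            inside_margin = max(xs) < margin_width
            hole_sized = (min_size <= blob_height <= max_size
                          and min_size <= blob_width <= max_size)
            if inside_margin and hole_sized:
                for by, bx in blob:
                    result[by][bx] = WHITE  # assumed surrounding colour
    return result
```

Blobs that are too small, too large, or that extend beyond the margin are left untouched, so ordinary text near the edge is not erased.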

8. Blank Page Deletion

Blank page deletion is useful when scanning in duplex mode, where only some documents contain information on both sides; without it, the scanner operator has to delete the blank pages manually. Automatic blank page deletion deletes pages based on a specified threshold value (in bytes).

When a page's file size is less than the specified threshold value, it is considered a blank page and is automatically deleted. Selecting this value depends on the document type and the scanner being used, and is usually done after some testing with a few experimental values. For blank page removal to be effective, it is essential to use some of the features described above, such as black border removal, de-speckling, line removal and punch hole filling.

A common issue faced when using blank page deletion is the bleed-through effect, where content on one side of the paper shows through on the other side, especially with very thin paper. Because of this, the blank page is mistakenly recognized as having actual content. Advanced capture applications such as Kofax VRS try to address this by differentiating between actual content and bleed-through.
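The size check itself is straightforward. A minimal sketch, assuming each scanned page is available as the bytes of its stored (compressed) image file; the function name and the dictionary layout are hypothetical:

```python
def split_blank_pages(page_files, threshold_bytes):
    """Separate pages whose stored size is below the threshold.

    page_files maps a page name to the bytes of its image file.
    Returns (kept, deleted): the pages to keep, and the names of the
    pages treated as blank.
    """
    kept, deleted = {}, []
    for name, data in page_files.items():
        if len(data) < threshold_bytes:
            deleted.append(name)  # too small to hold real content
        else:
            kept[name] = data
    return kept, deleted
```

This works because a clean blank page compresses to very few bytes, which is why leftover borders, speckles, lines or punch holes can push a blank page above the threshold.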



Data generated by a computer is referred to as output. This includes data produced at a software level, such as the result of a calculation, or at a physical level, such as a printed document. A basic example of software output is a calculator program that produces the result of a mathematical operation. A more complex example is the results produced by a search engine, which compares keywords to millions of pages in its Web page index.

Devices that produce physical output from the computer are creatively called output devices. The most commonly used output device is the computer’s monitor, which displays data on a screen. Devices such as the printer and computer speakers are some other common output devices.

The opposite of output is input, which is data that is entered into the computer. Input and output devices are collectively referred to as I/O devices.

Web Publishing

Web publishing, or “online publishing,” is the process of publishing content on the Internet. It includes creating and uploading websites, updating webpages, and posting blogs online. The published content may include text, images, videos, and other types of media.

In order to publish content on the web, you need three things: 1) web development software, 2) an Internet connection, and 3) a web server. The software may be a professional web design program like Dreamweaver or a simple web-based interface like WordPress. The Internet connection serves as the medium for uploading the content to the web server. Large sites may use a dedicated web host, but many smaller sites reside on shared servers, which host multiple websites. Most blogs are published on public web servers through a free service like Blogger.

Since web publishing doesn’t require physical materials such as paper and ink, it costs almost nothing to publish content on the web. Therefore, anyone with the three requirements above can be a web publisher. Additionally, the audience is limitless since content posted on the web can be viewed by anyone in the world with an Internet connection. These advantages of web publishing have led to a new era of personal publishing that was not possible before.

NOTE: Posting updates on social networking websites like Facebook and Twitter is generally not considered web publishing. Instead, web publishing generally refers to uploading content to unique websites.

A publisher requires three things to publish content on the Internet:

  • Website development software
  • Internet connection
  • A web server to host the website

