Navigating the World of Document Management: Learning the Lingo Part 1

Jack Arnston
Sep 1, 2016

Updated: Jul 10, 2018

Are you looking to improve your company’s workflow or eliminate some of the paper clutter lying around the office? Maybe you’ve tried researching document management before and found yourself in a muddled state of mind from all of the industry jargon.

The document management industry has always prided itself on its use of acronyms and unique words and phrases. That’s fine, but it can be confusing for people who are new to the industry or who are looking to implement document management technology in their business. To help you navigate the industry’s unique phraseology, the following paragraphs decipher its most common terms. Think of the following definitions as your enigma machine.

We will be splitting this crash course in document management lingo into two posts. Our discussion today will focus on what document management is and the processes that go into document capture.

DM vs. EDMS vs. ECM

Document management (DM) refers to the scanning and storing of paper documents. These documents are usually simple forms, which are scanned and then organized as digital information. This generally requires software to capture index data, otherwise known as metadata, which consists of unique characters (typically words and/or numbers) used to save and retrieve the document.
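To make the idea of metadata concrete, here is a minimal sketch of how index values attach to a scanned document and are later used to find it. The `Document` and `DocumentStore` classes, the field names and the file paths are all hypothetical; a real DM system keeps this information in a database rather than in memory.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """A scanned document plus the index values (metadata) used to find it."""
    doc_id: int
    image_path: str  # hypothetical location of the scanned image
    metadata: dict[str, str] = field(default_factory=dict)


class DocumentStore:
    """Toy in-memory store; a real DM system would persist this in a database."""

    def __init__(self) -> None:
        self._docs: dict[int, Document] = {}

    def save(self, doc: Document) -> None:
        self._docs[doc.doc_id] = doc

    def find(self, **criteria: str) -> list[Document]:
        """Return every document whose metadata matches all given field/value pairs."""
        return [
            d for d in self._docs.values()
            if all(d.metadata.get(k) == v for k, v in criteria.items())
        ]


store = DocumentStore()
store.save(Document(1, "scans/0001.pdf", {"doc_type": "invoice", "vendor": "Acme", "invoice_no": "10442"}))
store.save(Document(2, "scans/0002.pdf", {"doc_type": "invoice", "vendor": "Globex", "invoice_no": "88"}))

# Retrieve by index value instead of hunting through a filing cabinet.
print([d.image_path for d in store.find(vendor="Acme")])  # prints ['scans/0001.pdf']
```

The image itself never has to be searched; only the metadata is, which is why choosing good index fields matters so much.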

Enterprise Document Management Systems (EDMS) and Enterprise Content Management Systems (ECM) are interchangeable terms that refer to a collection of technologies, typically hardware and software, that when integrated create a complete and comprehensive solution for scanning, indexing, storing, routing, retrieving and disposing of digital records and information.
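One way to picture how those integrated pieces fit together is as stages a single record passes through. The sketch below is illustrative only: the stage names, their strict order and the rule that a record can be retrieved only between storage and disposal are assumptions made for this example, not how any particular EDMS or ECM product works.

```python
from enum import Enum


class Stage(Enum):
    """Illustrative lifecycle stages; the names are this sketch's, not a standard."""
    CAPTURED = 1  # scanned or imported
    INDEXED = 2   # metadata assigned
    STORED = 3    # saved to the repository
    ROUTED = 4    # sent through a workflow for review or approval
    DISPOSED = 5  # deleted when its retention period ends


class Record:
    def __init__(self, name: str) -> None:
        self.name = name
        self.stage = Stage.CAPTURED

    def advance(self) -> Stage:
        """Move the record to the next stage in order."""
        if self.stage is Stage.DISPOSED:
            raise ValueError(f"{self.name} has already been disposed of")
        self.stage = Stage(self.stage.value + 1)
        return self.stage

    def retrieve(self) -> str:
        """Assumed rule: retrieval works any time after storage and before disposal."""
        if self.stage.value < Stage.STORED.value or self.stage is Stage.DISPOSED:
            raise ValueError(f"{self.name} is not available (stage: {self.stage.name})")
        return self.name


record = Record("invoice-10442.pdf")
record.advance()  # INDEXED
record.advance()  # STORED
print(record.retrieve())
```

Retrieval is modeled as an action rather than a stage because, in practice, a stored record can be looked up many times before it is eventually disposed of.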

Document Capture

The first step in bringing your documents into a DM, EDMS or ECM solution is to identify which ones you wish to save and work with. Years ago this was primarily limited to paper copies, but today it includes a variety of electronic files as well. If you already receive your information electronically, such as e-forms, PDF files, TIFF documents, JPEG images, email or web pages, there is no need to print it and then rescan it. Today’s systems allow you to drag and drop files into an application, whose software intelligently recognizes the information they contain and stores it. To make further sense of this procedure, let’s take a look at some of the most-used industry terms.
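Once a file has been dropped into the application, the system has to decide how to handle it. The sketch below assumes a simple rule based on file extension; the `HANDLERS` table and the handling steps it names are hypothetical, and real products typically inspect file contents as well (an image-only PDF, for example, still needs OCR).

```python
from pathlib import Path

# Hypothetical mapping from file extension to the capture step that handles it.
HANDLERS = {
    ".pdf": "extract text and index",
    ".tif": "OCR then index",
    ".tiff": "OCR then index",
    ".jpg": "OCR then index",
    ".jpeg": "OCR then index",
    ".eml": "parse headers and attachments",
    ".html": "save page and index",
}


def route_dropped_file(path: str) -> str:
    """Decide how an electronic file enters the system, with no print-and-rescan step."""
    ext = Path(path).suffix.lower()  # case-insensitive: ".PDF" and ".pdf" are treated alike
    return HANDLERS.get(ext, "hold for manual review")


for name in ["Invoice.PDF", "receipt.jpeg", "notes.docx"]:
    print(name, "->", route_dropped_file(name))
```

Anything the rules don’t recognize falls through to manual review rather than being silently discarded.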

Scanning: This requires a mechanical device, typically a desktop scanner, an office multi-function printer (MFP) or a copier, to scan paper. The device usually has a feeder that holds multiple sheets of paper and pulls them through one at a time. Once scanned, the paper is converted to a digital format, usually a PDF file or TIFF document.

Forms Processing: Refers to capturing data from forms, the structured documents you enter data into. The form can be either paper or electronic, and can be complex, such as a tax return, or simple, such as a form you fill out on a website. The system can “read” and interpret the information on the form, eliminating the need for a person to rekey it into the system. This is especially useful when you need to use the data in the form to accomplish a task. The extracted information can be used to index and save the form image in the DM system, or it can be shared with an outside system, such as accounting software or a line-of-business application. Electronic forms, or e-forms, are becoming increasingly useful to organizations because they limit the information gathered to only what is necessary, while offering a look and feel that is appealing and easy to use for the person filling them in. Completing a form typically triggers an additional action, which can include a response back to the submitter and the start of a workflow process to complete the request.
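Here is a rough sketch of what happens after an e-form is submitted: the required fields are checked, some values become index data for the DM system, and the same data is packaged for an outside system. The field names, the choice of CSV as the hand-off format and the `process_form` function are all assumptions for illustration; real forms products define their own schemas and integrations.

```python
import csv
import io

# Hypothetical e-form definition: only the fields the organization actually needs.
REQUIRED_FIELDS = ("employee_id", "request_type", "amount")
INDEX_FIELDS = ("employee_id", "request_type")


def process_form(submission: dict[str, str]) -> tuple[dict[str, str], str]:
    """Validate a submitted form, then return (index values, CSV row for an outside system)."""
    missing = [f for f in REQUIRED_FIELDS if not submission.get(f, "").strip()]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")

    # Values used to index and save the form image in the DM system.
    index_values = {f: submission[f].strip() for f in INDEX_FIELDS}

    # The same data shared with another system (e.g. accounting) as one CSV line.
    buf = io.StringIO()
    csv.writer(buf).writerow([submission[f].strip() for f in REQUIRED_FIELDS])
    return index_values, buf.getvalue()


index_values, export_row = process_form(
    {"employee_id": "E12", "request_type": "expense", "amount": "42.50"}
)
print(index_values)
print(export_row, end="")
```

Extra fields the form didn’t ask for are simply ignored, which mirrors the point that e-forms limit collection to what is necessary. A real system would then kick off the follow-up action, such as a confirmation to the submitter.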

Optical Character Recognition: Typically referred to as OCR, this is software that converts a static image into machine-readable characters. OCR is used to automate the manual process of keying in index values. Advanced OCR features can recognize specific forms and look at specific places on each form to read and extract data. This requires either a template, which tells the software where to look on the page, or regular expressions. Regular expressions look for key words or phrases on a document such as an invoice, for example “Purchase Order Number” or “P.O. Number,” and then read the value that follows that phrase to capture the number.
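The regular-expression approach can be sketched in a few lines of Python, assuming the OCR engine has already turned the invoice image into plain text. The pattern below is a hypothetical example: it accepts the two labels mentioned above plus a couple of common variants (“PO No.”, “PO #”), is case-sensitive, and only reads a value on the same line as the label.

```python
import re
from typing import Optional

# Assumed label variants: "Purchase Order Number", "P.O. Number", "PO No.", "PO #".
# The value must follow the label on the same line ([ \t] rather than \s).
PO_PATTERN = re.compile(
    r"\b(?:Purchase[ \t]+Order|P\.?[ \t]*O\.?)[ \t]*(?:Number|No\.?|#)"
    r"[ \t]*[:#]?[ \t]*([A-Za-z0-9-]+)"
)


def find_po_number(ocr_text: str) -> Optional[str]:
    """Return the value after the first purchase-order label, or None if there isn't one."""
    match = PO_PATTERN.search(ocr_text)
    return match.group(1) if match else None


ocr_text = "Invoice 2231\nP.O. Number: 4500-7781\nTotal: $1,200.00"
print(find_po_number(ocr_text))
```

Unlike a template, this doesn’t care where on the page the label appears, which is why it suits invoices that arrive in many different layouts.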

Import Tools: These vary slightly depending on the DM solution, but they all do essentially the same thing: import data or images into the system. Tools that import data typically pull it from a delimited file. Think in terms of an Excel spreadsheet and you will get the idea. The delimiter can be a pipe (“|”), a comma or another unique character. The imported data is typically added to the database as new records or as updates to existing metadata.
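A data import of this kind can be sketched with Python’s standard `csv` module, which handles any single-character delimiter, including the pipe. The `doc_id` key column and the in-memory dictionary standing in for the database are assumptions for this example; a real import tool writes to the DM system’s own database.

```python
import csv
import io


def import_delimited(text: str, metadata: dict[str, dict[str, str]], delimiter: str = "|") -> tuple[int, int]:
    """Apply a delimited file to a metadata table keyed by a hypothetical doc_id column.

    Returns (rows added, rows updated)."""
    added = updated = 0
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)  # first line holds the column names
    for row in reader:
        doc_id = row.pop("doc_id")
        if doc_id in metadata:
            metadata[doc_id].update(row)  # update existing metadata
            updated += 1
        else:
            metadata[doc_id] = dict(row)  # add a new record
            added += 1
    return added, updated


existing = {"1001": {"vendor": "Acme", "status": "open"}}
data = "doc_id|vendor|status\n1001|Acme|paid\n1002|Globex|open\n"
print(import_delimited(data, existing))
print(existing)
```

Switching to a comma-delimited file is just a matter of passing `delimiter=","`.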

Import tools can also bring images into the system. These are typically images tagged with a specific file name or metadata, print files from a host system, or system reports. The images are saved to the system while the metadata populates the database with index values.
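When images are tagged through their file names, the import tool essentially parses each name back into index values. The naming convention below (`<DOCTYPE>_<NUMBER>_<VENDOR>.<ext>`) is purely hypothetical; every organization defines its own, and the regular expression would change to match it.

```python
import re
from pathlib import Path
from typing import Optional

# Hypothetical naming convention: <DOCTYPE>_<NUMBER>_<VENDOR>.<ext>, e.g. INV_10442_Acme.tif
NAME_PATTERN = re.compile(r"^(?P<doc_type>[A-Z]+)_(?P<number>\d+)_(?P<vendor>[A-Za-z0-9]+)$")


def metadata_from_filename(path: str) -> Optional[dict[str, str]]:
    """Return index values encoded in the file name, or None if it doesn't follow the convention."""
    match = NAME_PATTERN.match(Path(path).stem)  # stem drops the folder and the extension
    return match.groupdict() if match else None


for name in ["incoming/INV_10442_Acme.tif", "incoming/scan001.tif"]:
    print(name, "->", metadata_from_filename(name))
```

Files that don’t follow the convention come back as `None`, so they can be set aside for someone to index by hand instead of being saved with missing metadata.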

This wraps up our first post about the vocabulary of document management. In our next post, we will delve a little deeper into ways that you can make document management work for your company. In the meantime, let us know if you have any questions or comments below, and if you liked what you read today, be sure to share it and follow us on Facebook, Google+ and LinkedIn.
