What it does
Textual analysis software that uses natural language processing to analyse verbatim responses to surveys. It will categorise or group responses, find latent associations and perform classification or coding, if required.
Ease of use
Compatibility with other software
Value for money
One-off costs: standalone user £2,794; optional annual maintenance £559; single concurrent network user: £6,985 software, plus maintenance £1,397
- Flexible – can use it to discover and review your verbatims individually, or to produce coded data automatically under your supervision
- User interface is simple, straightforward and productive to use, once you are familiar with the concepts
- Lets you relate your open-ended data to closed data from other questions or demographics
- Easy import from and export to SPSS data formats or Microsoft Excel
- This is an expert system which requires time and effort to understand
- System relies on dictionaries, which need to be adjusted for different subject domains
- Rules-based approach for defining coded data requires learning and using some syntax
One of the greatest logistical issues with online research is handling the deluge of open-ended responses that often arrive. While much of the rest of the survey process can be automated, analysing verbatim responses to open questions remains laborious and costly. If anything, the problem gets worse with Web 2.0-style research. A lot of good data gets wasted simply because it takes too long and costs too much to analyse – which is where this ingenious software comes in.
PASW Text Analytics for Surveys (TAfS) operates as either an add-on to the PASW statistical suite – the new name for the entire range of software from SPSS (see box) – or as a standalone module. It is designed to work with case data from quantitative surveys containing a mixture of open and closed questions, and will help you produce a dazzling array of tables and charts directly on your verbatim data, or provide you with automatically coded data.
A wizard helps you to start a new project. First, you specify a data source, which can be data directly from PASW Statistics or PASW Data Collection (the new name for Dimensions), an ODBC database, or an Excel file (via PASW Statistics). Next, you select the variables you wish to work with, which can be a combination of verbatim questions, for text analysis, and ‘reference questions’, which are any other closed questions you would like to use in comparisons, to classify responses or to discover latent relationships between text and other answers. Another early decision in the process is the selection of a ‘text analysis package’ or TAP.
SPSS designed TAfS around the natural language processing (NLP) method of text analysis. This is based on recognising words or word stems, and uses their proximity to other word fragments to infer concepts. The method has been developed and researched extensively in the field of computer-based linguistics, and can perform as well as, if not better than, human readers and classifiers, if used properly.
A particular disadvantage of using NLP with surveys is the amount of set-up that must be done. It needs a lexicon of words or phrases and also a list of synonyms so that different ways of expressing the same idea converge into the same concept for analysis. If you wish to then turn all the discovered phrases and synonyms into categorised data, you need to have classifiers. The best way to think of an individual classifier is as a text label that describes a concept – and behind it, the set of computer rules used to determine whether an individual verbatim response falls into that concept or not.
TAfS overcomes this disadvantage by providing you with ready-built lexicons (it calls them ‘type’ dictionaries), not only in English, but in Dutch, French, German, Spanish and Japanese. It also provides synonym dictionaries (called ‘substitution’ dictionaries) in all six supported tongues, and three pre-built sets of classifiers – one for customer satisfaction surveys, another for employee surveys and a third for consumer product research. SPSS has developed these by performing a meta-analysis of verbatim responses in hundreds of actual surveys.
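To make the three ingredients concrete (a lexicon, a synonym list and classifiers), here is a minimal Python sketch. It is not how TAfS is implemented and does not use its dictionary formats: the word lists, the labels and the function names are all invented for illustration, and the matching is plain word lookup rather than the stemming and proximity analysis that real NLP engines apply.

```python
import re

# Hypothetical toy dictionaries, invented for this sketch (not TAfS content).
# Synonym ("substitution") list: different wordings converge on one concept.
SYNONYMS = {
    "cheap": "price", "expensive": "price", "cost": "price",
    "staff": "service", "helpful": "service", "rude": "service",
    "late": "delivery", "courier": "delivery",
}

# Lexicon ("type" dictionary): the concepts the analysis recognises.
LEXICON = {"price", "service", "delivery"}

# Classifiers: a text label plus a rule. Here the rule is simply the set
# of concepts that must all be present in a response.
CLASSIFIERS = {
    "Price": {"price"},
    "Customer service": {"service"},
    "Delivery": {"delivery"},
}


def extract_concepts(text):
    """Return the set of lexicon concepts found in one verbatim response."""
    tokens = re.findall(r"[a-z']+", text.lower())
    concepts = set()
    for token in tokens:
        concept = SYNONYMS.get(token, token)  # map synonyms to their concept
        if concept in LEXICON:
            concepts.add(concept)
    return concepts


def classify(text):
    """Return the labels of every classifier whose rule the response meets."""
    concepts = extract_concepts(text)
    return sorted(
        label for label, required in CLASSIFIERS.items()
        if required <= concepts
    )


coded = classify("The staff were rude and it was too expensive")
uncoded = classify("Arrived on time")
```

In this toy version the first response is coded to both ‘Customer service’ and ‘Price’, while the second falls through with no codes at all, which is exactly the kind of unclassified answer you would go looking for when tuning a dictionary.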
Out of the box, these packages may not do a perfect job, but you will be able to use the analytical tools the software offers to identify answers that are not getting classified, or those that appear to be misclassified, and use these to fine-tune the packages or even develop your own domain-specific ones. Selecting dictionaries and classifiers takes just a couple more clicks in the wizard; the software then processes your data and you are ready to start analysing the verbatims.
The main screen is divided into different regions. One region lets you select categories into which the answers have been grouped, another lets you review the ‘features’ or words and phrases identified, and in the largest region, there appears a long scrolling list of all your verbatim responses to the currently selected category or feature. All of the extracted phrases are highlighted and colour coded. The third panel shows the codeframe or classifiers, which is a hierarchical list. As you click on any section of it, the main window is filtered to show just those responses relating to that item. However, it also shows you all of the cross-references to the other answers, which is very telling. There is much to be learned about your data just from manipulating this screen, but TAfS has much more up its sleeve.
One potentially useful feature is sentiment analysis, in which each verbatim is analysed according to whether it is a positive or a negative comment. Interface was not able to test the practical reliability of this, but SPSS claim that it works particularly well with customer satisfaction type studies. In this version, sentiment analysis is limited to the positive/negative dichotomy, though the engine SPSS uses is capable of other kinds of sentiment analysis too.
The software also lets you use ‘semantic networks’ to uncover connections within the data and build prototype codeframes from your data, simply by analysing the frequency of responses to words and phrases and combinations of words and phrases, rather like performing a cluster analysis on your text data – except it is already working at the conceptual level, having sorted out the words and phrases into concepts.
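The frequency analysis behind this idea can be sketched in a few lines of Python. This is an illustration of simple co-occurrence counting only, not SPSS's algorithm; the function name and the sample concept sets are invented, and it assumes each response has already been reduced to a set of concepts.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(concept_sets):
    """Count how often each pair of concepts appears in the same response."""
    pairs = Counter()
    for concepts in concept_sets:
        # Sort so that each pair is always stored in the same order.
        for a, b in combinations(sorted(concepts), 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical concepts already extracted from four verbatim responses.
responses = [
    {"price", "service"},
    {"price", "delivery"},
    {"price", "service", "delivery"},
    {"service"},
]

links = cooccurrence(responses)
strongest = links.most_common(2)  # the most frequently linked concept pairs
```

Pairs that turn up together often are candidates for a shared code or a combined category in a prototype codeframe.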
You can build codeframes with or without help from semantic networks. It’s a fairly straightforward process, but it does involve building some rules using some syntax. I was concerned about how transparent and how maintainable these would be as you handed a project from one researcher to another.
Another very useful tool, which takes you beyond anything you would normally consider doing with verbatim data, looks for latent connections between different answers, and even between the textual answers and closed data, such as demographics or other questions.
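At its simplest, relating text to closed data means tabulating the concepts found in each verbatim against a closed question. The Python sketch below shows that idea only; the field names, the age bands and the data are hypothetical, and TAfS's own method for finding latent connections is not documented here.

```python
from collections import defaultdict


def crosstab(rows, closed_field):
    """Count responses per concept, broken down by a closed question."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        for concept in row["concepts"]:
            table[concept][row[closed_field]] += 1
    # Convert to plain dicts so the result is easy to inspect and compare.
    return {concept: dict(counts) for concept, counts in table.items()}


# Hypothetical cases: one closed demographic plus extracted concepts.
rows = [
    {"age_group": "18-34", "concepts": {"price"}},
    {"age_group": "18-34", "concepts": {"price", "service"}},
    {"age_group": "55+", "concepts": {"service"}},
    {"age_group": "55+", "concepts": {"service", "delivery"}},
]

table = crosstab(rows, "age_group")
```

In this invented data, price is mentioned only by the younger group and delivery only by the older one, the sort of pattern that would prompt a closer look at the underlying verbatims.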
This may be a tool for coding data, but it is not something you can hand over to the coding department – the tool expects the person in control to have domain expertise and moreover, to possess not a little understanding of how NLP works, otherwise you will find yourself making some fundamental errors. If you put in a little effort, though, this tool not only has the potential to save hours and hours of work, but to let you dig up those elusive nuggets of insight you probably long suspected were in the heaps of verbatims, if only you could get at them.
A version of this review first appeared in Research, the magazine of the Market Research Society, June 2009, Issue 517