Text and Data Processing

 

Introducing the Best Text and Data Processing in the World

It's a Bold Claim, But We Can't Help It If It's True

Text processing is one of LiveCode’s greatest strengths. Unique among multi-platform programming languages, LiveCode understands text the same way you do: as characters, words and lines. Read on to learn how this core strength can save you a lot of time, money and effort.

Text and Data – at the Heart of Business

Text and data processing is a component of virtually every application or IT solution. As you may be well aware, in many applications it's the core functionality, a major component or one of the main outputs. That's hardly surprising when you consider that Merrill Lynch estimates that more than 85 percent of all business information exists as unstructured data. Unstructured data comes from many sources, commonly appearing in e-mails, news, Web pages, memos, notes from call centers and support operations, chat logs, system logs, surveys, user forums, reports, white papers, marketing material, research and presentations.
Unstructured data is not always easy to find, access, analyze or use. Yet LiveCode makes it effortless.

Let's Talk Text

Imagine retrieving words from a paragraph, inserting text after a line, or checking whether text begins with a particular phrase, all using simple natural language. That's exactly how LiveCode works. It’s hard to appreciate how useful this is until you’ve tried it: the code you create is easier to write, read, debug, maintain, update and hand off to another team member. To give you some idea of the power of this approach, here are some examples.

Text and Data Processing Examples

Does a Piece of Text Contain a Phrase?

Check if a phrase is found within a piece of text that has been stored in a variable. This sort of test comes up all the time when writing applications or IT systems that use data in any way.
In LiveCode:
put "Hello" is in theText
The put command in LiveCode places something into a container such as a variable, file or web page. To choose a destination, add “into”, e.g. put x into y. Note that you don’t need to declare the theText variable or specify its type. This command could be compared in a number of contexts, but here we compare it as used in LiveCode Server, where a put with no destination is written directly into the web page being loaded. That is why the JavaScript version uses document.write.
Compared with JavaScript:
document.write(theText.match("Hello") != null);
If you’re experienced with JavaScript, the second example may look more familiar to you. Any talented programmer can construct JavaScript like this with ease, but familiarity with something doesn’t make it the best way to do it. Consider how much more complex this line is to construct, and how much more prone it is to error: leave out a parenthesis or a quote and you’ll have to go back and correct it. And because match() treats its argument as a regular expression, a phrase containing characters such as "." or "?" will not be matched literally. It also takes considerably more effort to read and maintain later. And this is a very simple example.

Retrieve Words From a Paragraph

In this example we retrieve words 3 to 6 from a paragraph. This problem comes up in applications that pull together and format data from different sources. For example, you might use code like this to write an email merge application, a log processing utility or a data analysis tool.
In LiveCode:
put word 3 to 6 of "The quick brown fox jumped over the lazy dog." into theVariable
Compared with Java:
String [] theWords = "The quick brown fox jumped over the lazy dog.".split(" ");
String theVariable = "";
for(int i = 2; i <= 5; i++) {
    theVariable += theWords[i] + " ";
}
theVariable = theVariable.trim(); // drop the trailing space added by the loop
As well as being longer and more complex, the Java example contains many more symbols. Generally speaking, symbols detract from the readability of code, and every extra symbol is another opportunity to make a mistake before you even run it.
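To make the behavior of LiveCode's word chunk more concrete, here is a minimal Java sketch of what word 3 to 6 of some text returns. It is an approximation under stated assumptions, not how LiveCode implements chunks: a word is taken to be any run of non-whitespace characters (real LiveCode also treats a double-quoted string as a single word, which this ignores), the original spacing between the selected words is kept, and an empty string is returned when the requested words don't exist. The names WordChunk and wordRange are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper approximating LiveCode's "word first to last of text".
// Assumption: a word is any run of non-whitespace characters; double-quoted
// strings are NOT grouped into one word as they are in LiveCode.
final class WordChunk {
    private static final Pattern WORD = Pattern.compile("\\S+");

    static String wordRange(String text, int first, int last) {
        Matcher m = WORD.matcher(text);
        int count = 0;
        int start = -1;
        int end = -1;
        while (m.find()) {
            count++;
            if (count == first) {
                start = m.start(); // where the first requested word begins
            }
            if (count >= first && count <= last) {
                end = m.end(); // extend to the end of the latest word in range
            }
            if (count == last) {
                break;
            }
        }
        // Assumption: no words in range yields "" rather than an error.
        if (start < 0) {
            return "";
        }
        // Substring keeps the original separators between the chosen words.
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(wordRange("The quick brown fox jumped over the lazy dog.", 3, 6));
    }
}
```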

See if Text Begins with a Phrase

In this example we are checking to see if some text begins with a phrase. This problem comes up when checking if the format of unstructured data matches what you are expecting.
In LiveCode:
theText begins with "Hello"
In JavaScript:
theText.substring(0, "Hello".length).match("Hello") != null;
Even an experienced programmer will need a moment to construct that line in JavaScript. It isn’t very readable, so if you plan to maintain your code or hand it to another developer, you will need to add a comment explaining it. Most developers spend more time reading code than they do writing it.
Now look at what happens when we add a little more detail. The specification changes and you realize you want to check if line 3 of the text begins with “Hello”, rather than the text as a whole. In LiveCode this comes out exactly as you would expect:
line 3 of theText begins with "Hello"
Now look at what happens to the JavaScript:
var theLines = theText.split("\n");
theLines[2].substring(0, "Hello".length).match("Hello") != null;
Even an experienced programmer may not always write that correctly the first time. And if you need to change that code in the future, you will spend at least a moment studying it to understand what each of those function calls is doing before you can change it.
Those of you who are experienced with JavaScript may spot an additional problem with this example: it throws an error if the text has fewer than three lines, because theLines[2] is then undefined. The LiveCode example doesn’t; it simply returns “false”, as you would expect. So you should really add a check for the number of lines to the JavaScript version, at which point it makes more sense to write it as a JavaScript function:
function beginsWith(theText, theSearch, theLine) {
    var theLines = theText.split("\n");
    // Compare literally; match() would treat theSearch as a regular expression
    return (theLines.length >= theLine && theLines[theLine - 1].substring(0, theSearch.length) == theSearch);
}
At this point we would argue that LiveCode is starting to run rings around JavaScript. Whether you need to produce a quick utility that parses data in a few text files or a complex, scalable enterprise IT solution, it could be much faster and easier (and even more fun!) to write it in LiveCode.
For those of you who are familiar with PHP, here is the equivalent in that language:
function beginsWith($theText, $theSearch, $theLine) {
    $theLines = explode("\n", $theText);
    // isset() guards against asking for a line that doesn't exist
    return isset($theLines[$theLine - 1])
        && substr($theLines[$theLine - 1], 0, strlen($theSearch)) === $theSearch;
}
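For comparison with the other sections, here is a sketch of the same check in Java. The names LineBeginsWith and beginsWith are hypothetical. Line numbers are 1-based as in LiveCode, and a missing line yields false rather than an exception, mirroring the LiveCode behavior described above. String.startsWith compares literally, so there is no regular-expression pitfall.

```java
// Hypothetical Java counterpart to the JavaScript and PHP beginsWith functions.
final class LineBeginsWith {
    static boolean beginsWith(String theText, String theSearch, int theLine) {
        // A limit of -1 keeps trailing empty lines, so line numbers match the text.
        String[] lines = theText.split("\n", -1);
        // Out-of-range line numbers return false instead of throwing.
        if (theLine < 1 || theLine > lines.length) {
            return false;
        }
        // startsWith is a literal comparison, not a regular expression.
        return lines[theLine - 1].startsWith(theSearch);
    }

    public static void main(String[] args) {
        System.out.println(beginsWith("first\nsecond\nHello world", "Hello", 3));
    }
}
```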

Check for an Exact Match of a Line, Word or Phrase

In this example we demonstrate how to check whether a word or phrase exactly matches an entry in text that is stored as a list on multiple lines. For example, if we have a file containing a list of email addresses, one per line, this type of matching could be used to add new email addresses to the list while checking for duplicates. Alternatively, it might be used to search for an entire cell within a CSV file (a text file whose fields are separated by commas).
In LiveCode:
put theText is among the lines of theList
This code returns true if any line within theList is exactly the same, from start to finish, as theText. You can also change the search delimiter: for example, to search cells within a CSV file, use item instead of line. You can even change what constitutes an item from the default comma to something else by setting the itemDelimiter.
In PHP:
function among($theText, $theLines) {
    $theLines = explode("\n", $theLines);
    // strict = true compares as strings, so "10" won't match "1e1"
    return in_array($theText, $theLines, true);
}
In JavaScript:
function among(theText, theLines) {
    theLines = theLines.split("\n");
    for(var i = 0; i < theLines.length; i++)
        if(theLines[i] == theText) return true;
    return false;
}
In Java:
public static boolean among(String theText, String theLines) {
    String [] lines = theLines.split("\n");
    for(int i = 0; i < lines.length; i++)
        if(lines[i].equals(theText)) return true;
    return false;
}
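The itemDelimiter variation described above, searching cells rather than lines, can be sketched in Java as well. This is an illustrative helper with hypothetical names, assuming a single line of simple delimited text: it splits on a literal delimiter and looks for an exact match, and it does not handle quoted CSV fields that themselves contain commas.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper approximating "target is among the items of text"
// with a configurable item delimiter (LiveCode's itemDelimiter).
final class ItemSearch {
    static boolean isAmongItems(String target, String text, String itemDelimiter) {
        // Pattern.quote makes the delimiter literal, so "|" or "." work as expected;
        // a limit of -1 keeps empty trailing items.
        String[] items = text.split(Pattern.quote(itemDelimiter), -1);
        // Exact, whole-item match, like the whole-line match above.
        return Arrays.asList(items).contains(target);
    }

    public static void main(String[] args) {
        System.out.println(isAmongItems("bob@example.com", "alice@example.com,bob@example.com", ","));
    }
}
```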

Count the Number of Words in Some Text

Many applications need to be able to perform a word count. For example, this feature might be needed in a custom text editor, email client or print layout program. In this example we are going to count the number of words in the fifth line of a piece of text. You might want to do this when validating entry to a web form.
In LiveCode:
put the number of words in line 5 of theText
In PHP:
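function line_word_count($theText, $theLine) {
    $theText = explode("\n", $theText);
    // A missing or blank line has no words
    if(!isset($theText[$theLine - 1]) || trim($theText[$theLine - 1]) === "") return 0;
    // Split on runs of whitespace so repeated spaces don't inflate the count
    else return count(preg_split('/\s+/', trim($theText[$theLine - 1])));
}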
In JavaScript:
function line_word_count(theText, theLine) {
    theText = theText.split("\n");
    var line = (theText[theLine - 1] || "").trim();
    // Split on runs of whitespace so repeated spaces don't inflate the count
    if(line !== "") return line.split(/\s+/).length;
    else return 0;
}
In Java:
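public static int line_word_count(String theText, int theLine) {
    String [] lines = theText.split("\n");
    if(theLine < 1 || theLine > lines.length) return 0;
    String line = lines[theLine - 1].trim();
    // Split on runs of whitespace so repeated spaces don't inflate the count
    if(line.isEmpty()) return 0;
    else return line.split("\\s+").length;
}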

Insert Text after a Specific Word

In this example we modify “The fox jumped over the lazy dog.” to include “quick brown” after the first word. This sort of text manipulation is typically performed when formatting data from one data source before inserting it into a different system that requires another format.
In LiveCode:
put "The fox jumped over the lazy dog." into theText
put " quick brown" after word 1 of theText
In PHP:
$theText = "The fox jumped over the lazy dog.";
$theText = explode(" ", $theText, 2);
$theText = implode(" ", array_slice($theText, 0, 1)) . " quick brown " . implode(" ", array_slice($theText, 1));
In JavaScript:
var theText = "The fox jumped over the lazy dog.";
theText = theText.split(" ");
theText = theText.slice(0, 1).join(" ") + " quick brown " + theText.slice(1).join(" ");
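The PHP and JavaScript versions above both split on single spaces. For completeness, here is a minimal Java sketch of the same insertion. The helper name is hypothetical, a word is assumed to be any run of non-whitespace characters, and returning the text unchanged when the requested word doesn't exist is this sketch's own choice rather than documented LiveCode behavior.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: insert text immediately after word number wordNumber
// (1-based). Assumes a word is a run of non-whitespace characters.
final class InsertAfterWord {
    private static final Pattern WORD = Pattern.compile("\\S+");

    static String insertAfterWord(String text, int wordNumber, String insertion) {
        Matcher m = WORD.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
            if (count == wordNumber) {
                // m.end() is the index just past the last character of the word
                return text.substring(0, m.end()) + insertion + text.substring(m.end());
            }
        }
        // Assumption: if the word doesn't exist, return the text unchanged
        return text;
    }

    public static void main(String[] args) {
        System.out.println(insertAfterWord("The fox jumped over the lazy dog.", 1, " quick brown"));
    }
}
```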

Sort text

In the final example, we’ll look at LiveCode’s sort command. Sorting is a vital component of many applications. For example, you may need to sort the results of a file listing or the contents of a table, and there are many possible criteria you might need to apply. In this example we’ll assume you want to sort the lines of a block of text in descending order by the third item (piece of text separated by a comma) in each line.
In LiveCode:
sort lines of theText descending by item 3 of each
In JavaScript:
theText = theText.split("\n");
theText = theText.sort(sort_item_3).join("\n");
function sort_item_3(line1, line2) {
    line1 = line1.split(",");
    line2 = line2.split(",");
    if(line1[2] == line2[2]) return 0;
    else if(line1[2] > line2[2]) return -1;
    else return 1;
}
In PHP:
$theText = explode("\n", $theText);
usort($theText, "sort_item_3");
$theText = implode("\n", $theText);
function sort_item_3($line1, $line2) {
    $line1 = explode(",", $line1);
    $line2 = explode(",", $line2);
    if($line1[2] == $line2[2]) return 0;
    else if($line1[2] > $line2[2]) return -1;
    else return 1;
}
In Java:
import java.util.*;

class Item3Comparator implements Comparator<String> {
    // Compare lines by their third comma-separated item, in descending order
    public int compare(String line1, String line2) {
        String [] line1Items = line1.split(",");
        String [] line2Items = line2.split(",");
        return line2Items[2].compareTo(line1Items[2]);
    }
}

public class StringSort {

    public static void main(String args[]) {
        String theText = "some, itemized, text, to\nsort, for, testing, purposes";
        String [] lines = theText.split("\n");
        Arrays.sort(lines, new Item3Comparator());
        theText = join(lines, "\n");
        System.out.println(theText);
    }

    // Join items with glue between them (no trailing glue after the last item)
    public static String join(String [] items, String glue) {
        String theText = "";
        for(int i = 0; i < items.length; i++) {
            if(i > 0) theText += glue;
            theText += items[i];
        }
        return theText;
    }

}
Now you could argue that in Java you would import a pre-existing sorting class to do this, which would shorten the amount of code you have to write. While this is true, we think it is missing the point for a couple of reasons.
First, this example is a very specific type of sort, which you might not easily find in a pre-existing library. A third party class probably exists and of course you would write a set of classes to do this if you were doing it regularly. But when you start to look at the flexibility of LiveCode’s built-in sort function you’re looking at a lot of Java code, a lot of classes to import and some of it you are going to have to write.
Second, like the other text processing examples above, the sort command is built right into the LiveCode language. It’s part of the core feature set, you don’t need to install anything or add anything. Take the LiveCode example and add another 5 or 10 lines of processing using some of the other text processing features and you have a utility that's typically good enough to reformat some pretty complex data. Do the same with Java and you’ll be defining classes, writing code and debugging for a long time – the whole process is an order of magnitude more complex.
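To be fair to the counter-argument above, since Java 8 the standard library alone can express this particular sort more compactly, using Comparator.comparing and String.join. The sketch below (class name hypothetical) sorts lines by their third comma-separated item in descending order using case-sensitive String comparison; like the longer example, it assumes every line has at least three items. Even so, you still have to split the text into lines, extract the item yourself and join the result back together.

```java
import java.util.Arrays;
import java.util.Comparator;

// Sketch using only the Java 8+ standard library: sort lines by their third
// comma-separated item, descending. Lines with fewer than three items would
// throw ArrayIndexOutOfBoundsException, as in the longer example.
final class SortByItem3 {
    static String sortByItem3Descending(String theText) {
        String[] lines = theText.split("\n");
        // The explicit (String line) type lets .reversed() infer Comparator<String>.
        Arrays.sort(lines,
                Comparator.comparing((String line) -> line.split(",")[2]).reversed());
        return String.join("\n", lines);
    }

    public static void main(String[] args) {
        System.out.println(sortByItem3Descending("a,b,1\nc,d,3\ne,f,2"));
    }
}
```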
Don’t get us wrong: Java is a great language and there are times when it’s going to be the perfect choice for your project. But we do invite you to take a look at your next project and see if LiveCode might be a better fit or if it could fit in as part of the overall solution. If that's the case, it could save you a great deal of time and effort.

Limitations

We should mention that currently it’s perfectly possible to build applications that use Unicode in LiveCode, but there are some limitations. These can make this type of application a little harder to develop. Please check out this article for more information. We’re hard at work adding beautiful, seamless and complete Unicode support for a future version so please check back soon if you’re interested in that.

Other Text and Data Processing Routines

LiveCode comes with a plethora of additional text and data processing routines. These include syntax for replacing text, performing filtering, an XML library, commands to split larger data sets into arrays, array manipulation commands, and full support for Perl-compatible regular expressions. Generally we recommend using LiveCode’s chunk expressions instead of regular expressions, as they are much easier to construct and to read, but there are times when it's useful to have regular expressions around. When working with larger data sets, LiveCode’s arrays really speed up processing, and we have full support for nested arrays so you can easily represent and work with complex data structures.
We invite you to take a look at our lessons portal or the language dictionary for more information.