It is really interesting to note how far simple plain text has come.
Ancient humans recorded numbers to keep track of their inventories. With the rise of written language (text) around 3200 BC, humanity became able to record its memories and knowledge and to pass information and knowledge on to later generations.
Gutenberg’s printing press, around 1439, was a notable achievement that started the book and print revolution. With the launch of the IBM PC in 1981 and its later popularization in businesses and homes, new challenges came up in finding, keeping and publishing information, including over the Internet from the ’80s onward.
And now, in the information age, we are living through a “BOOM!!!”: everybody has at least 3 connected devices, a PC that is more comfortable for work, an inseparable smartphone used mostly to communicate through text applications, and a tablet for reading, gaming and so on… Of course, it is important to mention that almost 60% of the world’s population has no Internet access at all, from any device, and some people do not even have electric power!!! This means that a lot of people will start engaging with technology in the coming decades.
Text files can be encoded in a number of standards, such as EBCDIC on mainframes, ASCII as the lingua franca of the PC age, and the sophisticated Unicode, which covers the beauty and elegance of the characters of many natural languages, such as Greek “αγάπη” and Japanese “愛” (love). The cost of Unicode in encodings such as UTF-16 (2 or 4 bytes per character) or UTF-32 (4 bytes per character) is that each character takes 2 to 4 bytes instead of ASCII’s single byte, so 2 terabytes of ASCII data would grow to 4 to 8 terabytes to tell the same story.
But this can be mitigated with UTF-8, a variable-length Unicode encoding that uses a single byte for ASCII characters and two to four bytes for the others, or, for Latin-script languages, with the single-byte ISO-8859-1 standard, which covers the characters of Portuguese, Spanish and French, for example. These make it possible to elegantly write computação (computation in Portuguese), mañana (tomorrow in Spanish) and accélérées (accelerated in French), and, with UTF-8, symbols such as “☮” too. With UTF-8, plain ASCII text still takes one byte per character, and text in practically any language on our planet can be stored as plain text while preserving its original symbols.
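To make the size trade-off concrete, here is a minimal Python sketch that encodes the sample words from this text in several encodings and counts the bytes. The helper names are hypothetical; the encodings are standard Python codec names, and `utf-16-le`/`utf-32-le` are used so that no byte-order mark is added to the count.

```python
import unicodedata
from typing import Dict, List, Optional

# Sample words from the text above, plus the English word "love"
SAMPLES = ["love", "computação", "mañana", "accélérées", "αγάπη", "愛", "☮"]
# Standard Python codec names; the -le variants avoid adding a byte-order mark
ENCODINGS = ["ascii", "latin-1", "utf-8", "utf-16-le", "utf-32-le"]


def encoded_size(text: str, encoding: str) -> Optional[int]:
    """Bytes needed for text in encoding, or None if it cannot be represented."""
    # Normalize to NFC so accented letters are single precomposed characters
    normalized = unicodedata.normalize("NFC", text)
    try:
        return len(normalized.encode(encoding))
    except UnicodeEncodeError:
        return None


def size_table(samples: List[str], encodings: List[str]) -> Dict[str, Dict[str, Optional[int]]]:
    """Map each sample word to its size in every encoding."""
    return {word: {enc: encoded_size(word, enc) for enc in encodings} for word in samples}


if __name__ == "__main__":
    table = size_table(SAMPLES, ENCODINGS)
    print("text".ljust(12) + "".join(enc.rjust(11) for enc in ENCODINGS))
    for word, sizes in table.items():
        cells = ["n/a" if n is None else str(n) for n in sizes.values()]
        print(word.ljust(12) + "".join(cell.rjust(11) for cell in cells))
```

In the resulting table, ASCII words keep one byte per character in UTF-8, the Latin words need one extra byte per accented letter, “愛” takes three bytes, and “☮” cannot be stored in ISO-8859-1 at all.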
Many UNIX and Linux books, when they start explaining what the OS is, state that the text editor is an important part of it. At first glance the text editor seems to be just an application running on the OS, but that is not true. The text editor is essential for writing code in C, C++ and other languages, including the source code of the OS itself: a paradox, isn’t it? Moreover, we need a text editor to edit the configuration files that set up the OS, as well as lots of shell scripts and applications. With text editors such as Vi/Vim or Emacs we can use regexps (regular expressions) to find, change and validate text data filled in by users, or to check whether the syntax of a text is correct as expected.
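The same kind of regular expressions used in Vi/Vim or Emacs exist in most programming languages. As a rough illustration with Python’s `re` module (whose syntax differs in details from Vim’s), this sketch validates a hypothetical user-entered date field, searches for a word, and performs a global substitution on a configuration-style text, similar in spirit to Vim’s `:%s/old/new/g`.

```python
import re

# Hypothetical rule: a date typed by a user must have the shape YYYY-MM-DD.
# [0-9] is used instead of \d because \d also matches non-ASCII digits.
DATE_FIELD = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_valid_date_field(value: str) -> bool:
    """True if the whole value has the YYYY-MM-DD shape (not a calendar check)."""
    return DATE_FIELD.fullmatch(value) is not None


def find_words(text: str, pattern: str) -> list:
    """Return every match of pattern in text, like searching with /pattern in Vim."""
    return re.findall(pattern, text)


def substitute_all(text: str, pattern: str, replacement: str) -> str:
    """Replace every match, similar in spirit to Vim's :%s/pattern/replacement/g."""
    return re.sub(pattern, replacement, text)


if __name__ == "__main__":
    print(is_valid_date_field("2016-05-21"))  # True
    print(is_valid_date_field("21/05/2016"))  # False
    config = "Port 22\nPermitRootLogin yes\n"
    # (?m) makes ^ and $ match at the start and end of every line
    print(substitute_all(config, r"(?m)^PermitRootLogin yes$", "PermitRootLogin no"))
```

Note that the pattern only checks the shape of the field; a value such as `2016-13-45` would still pass and needs a separate calendar check.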
Any new music student will be able to play the Beatles song “Hello, Goodbye” on their instrument very easily with this text:
So, with text you can write your lyrics and chords!! If you want sheet music to play the song with an orchestra, things get trickier, but Unicode has a whole block of musical symbols (notes, clefs, rests and more) for writing it down!
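Unicode’s Musical Symbols block (U+1D100 to U+1D1FF) can be explored from code. This minimal Python sketch looks symbols up by their official Unicode names with the standard `unicodedata` module; whether they display correctly depends on the fonts installed, and a handful of characters is of course not a complete notation system.

```python
import unicodedata

# Official Unicode character names from the Musical Symbols block
NAMES = [
    "MUSICAL SYMBOL G CLEF",
    "MUSICAL SYMBOL F CLEF",
    "MUSICAL SYMBOL WHOLE NOTE",
    "MUSICAL SYMBOL HALF NOTE",
    "MUSICAL SYMBOL QUARTER NOTE",
    "MUSICAL SYMBOL EIGHTH NOTE",
]


def musical_symbols(names):
    """Return (name, character, code point label) for each Unicode name."""
    result = []
    for name in names:
        char = unicodedata.lookup(name)  # raises KeyError for unknown names
        result.append((name, char, f"U+{ord(char):05X}"))
    return result


if __name__ == "__main__":
    for name, char, code in musical_symbols(NAMES):
        # These characters lie outside the Basic Multilingual Plane,
        # so UTF-8 needs 4 bytes for each of them
        print(code, char, name, len(char.encode("utf-8")), "bytes in UTF-8")
```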
A large part of the good work that Google, Bing and other search engines do to make our lives simpler is indexing the text of the Web.
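Real search engines are far more sophisticated, but the core data structure behind text search is commonly an inverted index, which maps each word to the documents that contain it. Here is a toy Python sketch with hypothetical page names and a deliberately naive tokenizer (no stemming, ranking or stop words):

```python
import re
from collections import defaultdict


def tokenize(text):
    # \w is Unicode-aware in Python 3, so "computação" stays one word
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return dict(index)


def search(index, query):
    """Ids of documents containing every word of the query (an AND query)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))  # copy so the index is not modified
    for word in words[1:]:
        result &= index.get(word, set())
    return result


if __name__ == "__main__":
    pages = {
        "page1": "Text is the king!",
        "page2": "Plain text files travel by FTP.",
        "page3": "Binary data can be encoded as text.",
    }
    index = build_index(pages)
    print(sorted(search(index, "text")))
    print(sorted(search(index, "plain text")))
```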
This is the Linux mascot, Tux, a vector drawing stored in a plain-text SVG file.
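Because SVG is plain text, a program can produce a drawing with ordinary string operations. This minimal Python sketch (not the real Tux file; the shapes are a hypothetical example) builds a small SVG document and uses the standard `xml.etree.ElementTree` parser to check that the result is well-formed XML.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def circle(cx, cy, r, fill):
    """One SVG circle element as text (inputs assumed trusted, no escaping)."""
    return f'<circle cx="{cx}" cy="{cy}" r="{r}" fill="{fill}"/>'


def ellipse(cx, cy, rx, ry, fill):
    """One SVG ellipse element as text (inputs assumed trusted, no escaping)."""
    return f'<ellipse cx="{cx}" cy="{cy}" rx="{rx}" ry="{ry}" fill="{fill}"/>'


def make_svg(width, height, shapes):
    """Wrap shape elements in an <svg> root element."""
    body = "\n".join(f"  {shape}" for shape in shapes)
    return (
        f'<svg xmlns="{SVG_NS}" width="{width}" height="{height}" '
        f'viewBox="0 0 {width} {height}">\n{body}\n</svg>\n'
    )


def count_shapes(svg_text):
    """Parse the text as XML and count the direct children of the root."""
    root = ET.fromstring(svg_text)
    return len(list(root))


if __name__ == "__main__":
    # A very rough penguin-like silhouette: body, belly and two eyes
    drawing = make_svg(120, 160, [
        ellipse(60, 90, 50, 65, "black"),
        ellipse(60, 105, 32, 45, "white"),
        circle(45, 45, 8, "white"),
        circle(75, 45, 8, "white"),
    ])
    print(drawing)
```

Saving the printed text with an `.svg` extension is enough for a web browser to display it.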
In the beginning, to transfer data between systems running on different servers, we used CSV, tab-separated or fixed-width TXT files, mostly over FTP, and this technique is still widely used today. This experience produced some simple but clever text conventions, such as a header record (BOF, Begin Of File) and a trailer record (EOF, End Of File). Adding a BOF line at the beginning of a text file and an EOF line at the end makes it possible to verify, before loading the file into the target application, that the whole file was received, and avoids partially loading the data if the transfer was interrupted or the file was broken for any other reason. These data transfers were usually done as nightly batch processes using shell scripts.
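Here is a minimal Python sketch of the header/trailer check. The exact layout is a hypothetical assumption: the first line is `BOF`, the last line is `EOF;` followed by the number of data rows (the row count is an extra safeguard, not something described above), and the CSV fields contain no embedded line breaks.

```python
import csv
import io


def wrap_with_markers(rows):
    """Serialize rows as CSV between a BOF header and an EOF;<count> trailer."""
    buf = io.StringIO()
    buf.write("BOF\n")
    csv.writer(buf, lineterminator="\n").writerows(rows)
    buf.write(f"EOF;{len(rows)}\n")
    return buf.getvalue()


def read_if_complete(text):
    """Return the data rows, or raise ValueError if the file looks incomplete."""
    lines = text.splitlines()
    if len(lines) < 2 or lines[0] != "BOF":
        raise ValueError("missing BOF header")
    if not lines[-1].startswith("EOF;"):
        raise ValueError("missing EOF trailer: transfer probably interrupted")
    expected = int(lines[-1].split(";", 1)[1])
    rows = list(csv.reader(lines[1:-1]))
    if len(rows) != expected:
        raise ValueError(f"trailer announces {expected} rows, found {len(rows)}")
    return rows


if __name__ == "__main__":
    data = [["1", "Alice", "10.50"], ["2", "Bob", "7.25"]]
    text = wrap_with_markers(data)
    print(read_if_complete(text))          # complete file: rows are loaded
    truncated = text[: text.index("EOF")]  # simulate an interrupted transfer
    try:
        read_if_complete(truncated)
    except ValueError as err:
        print("not loaded:", err)
```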
Now, in the age of machines, we can’t wait for a nightly batch run to get updated information; sending files is not enough. We need to make calls between distinct systems dynamically, all the time, and in addition we need to communicate across different platforms, according to each company’s architecture:
Company A has Windows systems with .NET applications written in C# and a SQL Server database;
Company B has everything on Linux, with PHP and a MySQL database;
Company C uses Java with an Oracle database;
Company D uses Node.js with CouchDB.
Let’s do the integration with a robust SOA layer that uses XML and SOAP messages, which are all text!! The messages can be sent at any time of day, and very high message volumes can be processed. Whatever your preferred platform, the systems can talk through text messages using a common, well-defined standard.
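To show that a SOAP message really is just text, here is a minimal Python sketch that builds and reads a SOAP 1.1 envelope with the standard `xml.etree.ElementTree` module. The service namespace `urn:example:customers` and the `GetCustomer` operation are hypothetical, and sending the text over HTTP is left out.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:customers"                      # hypothetical service namespace

# Prefixes used when the tree is serialized back to text
ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("cus", SERVICE_NS)


def build_get_customer(customer_id):
    """Return a request for the hypothetical GetCustomer operation as a string."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    operation = ET.SubElement(body, f"{{{SERVICE_NS}}}GetCustomer")
    ET.SubElement(operation, f"{{{SERVICE_NS}}}CustomerId").text = str(customer_id)
    return ET.tostring(envelope, encoding="unicode")


def read_customer_id(message):
    """Parse the text of a request and extract the customer id."""
    root = ET.fromstring(message)
    path = f"{{{SOAP_ENV}}}Body/{{{SERVICE_NS}}}GetCustomer/{{{SERVICE_NS}}}CustomerId"
    node = root.find(path)
    if node is None:
        raise ValueError("not a GetCustomer request")
    return node.text


if __name__ == "__main__":
    request = build_get_customer(42)
    print(request)                    # plain XML text, readable on any platform
    print(read_customer_id(request))  # 42
```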
But now I have lots of IoT devices and smartphones whose limited processing capabilities can’t cope with a SOA integration layer! No problem! Let’s use a lighter, simpler approach to EAI (Enterprise Application Integration) to exchange data and make system calls: REST with JSON messages. Well, this is the text solution again!
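For comparison, a REST/JSON exchange carries much less markup. Here is a minimal Python sketch with the standard `json` module; the field names `deviceId` and `temperatureC` are hypothetical, and a real device would send this text as the body of an HTTP request with the `Content-Type: application/json` header.

```python
import json


def build_reading(device_id, temperature_c):
    """Serialize a hypothetical IoT sensor reading as compact JSON text."""
    payload = {"deviceId": device_id, "temperatureC": temperature_c}
    # separators without spaces keep the message as small as possible
    return json.dumps(payload, separators=(",", ":"))


def parse_reading(text):
    """Turn the JSON text back into Python values."""
    data = json.loads(text)
    return data["deviceId"], data["temperatureC"]


if __name__ == "__main__":
    message = build_reading("sensor-7", 21.5)
    print(message, len(message), "characters")
    print(parse_reading(message))
```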
Now our new social network has exploded, lots of users are on it, and we need to scale up! A standard SQL RDBMS does not seem to be a good fit for the unstructured nature of feed messages, and we also need a full-text search engine. We can store in the backend the same JSON data structure used on the client side (the browser), without having to normalize the data or recreate it as objects in the middleware. CouchDB can do it! It accepts the plain-text JSON message created in the browser, stores and manages it as a document, and we can retrieve the data with MapReduce views or full-text search. Alternatively, we can adopt MongoDB, which stores a binary representation of the JSON message called BSON. JSON is closer to the unstructured nature of Web data, and these NoSQL databases organize the data as documents.
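CouchDB views are normally written in JavaScript and run inside the database. The following is only a plain Python emulation of the map/reduce idea over JSON documents, not CouchDB’s API; the document fields (`type`, `author`, `text`) are hypothetical, and the view counts posts per author.

```python
import json
from collections import defaultdict

# JSON text as it might arrive from the browser (hypothetical feed documents)
RAW_DOCS = [
    '{"type": "post", "author": "ana", "text": "Text is the king!"}',
    '{"type": "post", "author": "bruno", "text": "Hello, Goodbye"}',
    '{"type": "comment", "author": "ana", "text": "Nice!"}',
    '{"type": "post", "author": "ana", "text": "God save the electronic text!"}',
]


def map_posts_by_author(doc):
    """Map step: emit (author, 1) for every post, ignoring other documents."""
    if doc.get("type") == "post":
        yield doc["author"], 1


def reduce_sum(values):
    """Reduce step: add up the values emitted for one key."""
    return sum(values)


def run_view(docs, map_fn, reduce_fn):
    """Group emitted values by key and reduce each group; keys come out sorted."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return {key: reduce_fn(values) for key, values in sorted(groups.items())}


if __name__ == "__main__":
    docs = [json.loads(raw) for raw in RAW_DOCS]
    print(run_view(docs, map_posts_by_author, reduce_sum))  # {'ana': 2, 'bruno': 1}
```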
Just as a reminder, many developers are surprised when we sniff an unencrypted relational database connection and capture the SQL query sent to the database and the returned data set… all in plain text. Depending on the database, there are some control characters and protocol fields mixed in, but the text data is all there, human readable. Many companies still do not encrypt the traffic between the database and the application server.
Text can also be used to store engineering designs and projects. DXF (Drawing Exchange Format), in its ASCII form, is a text format used to exchange vector drawings among different CAD programs, and it is possible to build an application that creates CAD projects by generating DXF text files.
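A DXF file is a sequence of group-code/value pairs, each on its own line. This minimal Python sketch writes LINE entities in a stripped-down, R12-style file that contains only an ENTITIES section; group code 8 is the layer, 10/20/30 the start point and 11/21/31 the end point. Whether a given CAD program accepts such a minimal file is an assumption to verify, since some programs expect HEADER and TABLES sections too.

```python
def dxf_text(pairs):
    """Write (group code, value) pairs as alternating lines of text."""
    return "".join(f"{code}\n{value}\n" for code, value in pairs)


def line_entity(x1, y1, x2, y2, layer="0"):
    """Group codes for a 2D LINE: 8 = layer, 10/20/30 = start, 11/21/31 = end."""
    return [
        (0, "LINE"), (8, layer),
        (10, float(x1)), (20, float(y1)), (30, 0.0),
        (11, float(x2)), (21, float(y2)), (31, 0.0),
    ]


def minimal_dxf(entities):
    """Wrap entities in an ENTITIES section and terminate the file with EOF."""
    pairs = [(0, "SECTION"), (2, "ENTITIES")]
    for entity in entities:
        pairs.extend(entity)
    pairs += [(0, "ENDSEC"), (0, "EOF")]
    return dxf_text(pairs)


if __name__ == "__main__":
    # A 100 x 50 rectangle drawn as four lines
    corners = [(0, 0), (100, 0), (100, 50), (0, 50)]
    lines = [line_entity(*corners[i], *corners[(i + 1) % 4]) for i in range(4)]
    print(minimal_dxf(lines), end="")
```

Saving the printed text with a `.dxf` extension produces the file; the `0`/`EOF` pair at the end plays the same role as the EOF trailer described earlier for data transfers.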
Binary data can be encoded as text too. In the early days of e-mail, attaching and sending a binary file was tricky, and so uuencode came along: it converts any binary file into a block of text that can be sent by e-mail. Similarly, JSON-based NoSQL databases can store binary files such as JPG or PNG images using Base64, an encoding that, like uuencode, converts binary data into text.
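Both encodings are available in Python’s standard library. This minimal sketch round-trips the 8-byte PNG file signature through Base64, and arbitrary bytes through uuencoded lines with `binascii.b2a_uu`, which handles at most 45 bytes per line; the `begin`/`end` lines of a complete uuencoded file are omitted. Base64 turns every 3 bytes into 4 characters, so the text is about a third larger than the binary.

```python
import base64
import binascii

# The fixed 8-byte signature that starts every PNG file
PNG_SIGNATURE = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])


def to_base64_text(data):
    """Binary -> ASCII text, 4 characters for every 3 bytes (padded with '=')."""
    return base64.b64encode(data).decode("ascii")


def from_base64_text(text):
    """ASCII Base64 text -> original bytes."""
    return base64.b64decode(text)


def to_uu_lines(data):
    """Binary -> uuencoded text lines, at most 45 input bytes per line."""
    return [binascii.b2a_uu(data[i:i + 45]).decode("ascii") for i in range(0, len(data), 45)]


def from_uu_lines(lines):
    """uuencoded text lines -> original bytes."""
    return b"".join(binascii.a2b_uu(line) for line in lines)


if __name__ == "__main__":
    print(to_base64_text(PNG_SIGNATURE))
    print("".join(to_uu_lines(PNG_SIGNATURE)), end="")
```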
Text messages and files can be compressed, and they usually reach a high compression ratio, typically 3:1 to 4:1 (about 67% to 75% smaller) or even more when the text has lots of redundancy, as books and contracts do. But for an encoded binary image file, which is usually already compressed, the ratio is very low. In addition, there are network appliances for application acceleration (WAN optimization). They are deployed in pairs, one at each end of a telecommunication link; they learn about your traffic, build an index of repetitive data, and then send the index number instead of the recurring text block, so that the appliance at the other end can rebuild the data.
Once again, the text file saves the day!!
Text is king! It can be used to write books, contracts or melodies, simple HTML pages or sophisticated, well-designed portals with vector graphics. With it we create mobile apps and 3D games, and exchange data between companies, enabling system-to-system communication. It supports almost every language we know and preserves their elegant symbols. Furthermore, in places around the world with limited technological resources, basic “dumb” phones allow people to communicate by text messages such as SMS and to use text-based financial apps too.
God save the electronic text!