PostgreSQL: Book of The Elephant
PostgreSQL, lovingly referred to as Postgres (pronounced post-gress), is an open source database management system. It's the beautiful love-child between the Ingres project formed at the University of California, Berkeley and the open source community that took hold of the reins starting in 1994.
Who Uses Postgres?
Postgres is being used for large and small projects everywhere. Here is a look at five you are probably familiar with:
- Reddit – This massive social networking community uses Postgres in conjunction with Python.
- MySpace – MyWho? This failed contender in the social profile scene uses an unmodified version of Postgres.
- Skype – Your favorite VoIP application uses Postgres.
- IMDB – The Internet Movie Database serves up flame wars from a Postgres database.
- WhitePages.com – The database holding the name and number of everyone who is anyone. They handle over 20 million unique users per month.
It's nice to see Internet giants using Postgres, as it shows it's a package that can scale if and when an application requires it.
It's also important to note who doesn't use Postgres. You'll have to support the likes of MySQL for projects such as WordPress or Joomla!, but many large open source projects try to include support for multiple database systems. Beyond these two, you likely won't need another database management system.
Postgres Features at a Glance
Postgres is known for being feature-rich. Indeed, from the initial concept of the database system it was intended to be packed to the brim with features.
Some of the major features Postgres is known for are below:
- Data types – Data types aren't usually a feature, but an integral part of the database management system. Postgres excels in this area by offering a wider variety of data types than alternatives. Don't see what you need? Postgres also lets developers create their own data types to support anything from geographical data to IP addresses.
- Subqueries – Subqueries allow developers to embed a SELECT query inside another one. This is useful for a number of reasons. Using subqueries, one is able to bypass JOIN queries, use logic in the query, or return multiple rows and columns in a single query. It's useful in simplifying code and also accomplishing results not previously possible.
- User-defined Functions – Postgres comes with PL/pgSQL, which is a common programming language for Postgres functions in particular. Using custom functions saves processing power when calling between multiple tables, enacting multiple queries, and also organizes logic.
- Triggers – Triggers are events that signal a certain action or function to take place. An example would be with an INSERT statement that triggers a validation function. Also available are “rules” that manipulate queries or generates additional queries.
- Replication – This is the act of duplicating, or replicating, a database for backup or for use on another server. Postgres has many third-party plugins that may handle replication. Each have their benefits and flaws, and some are even commercial. Not every package is the right fit for the job; an example is with Slony-l, which has gained attention as a poor choice for clustered servers due to considerably slow execution.
- Reliability – Data is guaranteed in PostgreSQL, as data is checked to be valid before it is allowed entry into the database. This sounds trivial, although alternatives to Postgres prove that data integrity sometimes come back to bite them. Postgres also offers a rollback feature to undo transactions. Much like Windows XP and its restore program, this feature allows accidents to be glossed over.
- Front-end Administration – Postgres has a few great free applications for administration the database with a GUI. The most popular would be pgAdmin and phpPgAdmin. Open Office, which ships with operating systems like Ubuntu, is also able to be used as a front-end. For the command line geeks, psql is the command-line program that allows SQL queries to be entered directly.
PostgreSQL does have features that seem to be missing compared to other relational database management systems. A prime example would be with checking if a row exists in the database, and if not, then to create the row. This is known as the INSERT IGNORE and REPLACE statement in MySQL.
Generally, anything that Postgres lacks can be recreated with triggers and user-defined functions. The previous example with the INSERT IGNORE and REPLACE statements is easily duplicated by looking at community examples.
Performance Issues
PostgreSQL started with the goal of creating rich features, while alternatives like MySQL originally started out focusing on speed. Although in the early days Postgres was seen as bloated in comparison, major releases since then have dramatically improved performance.
There are several performance-boosting features that Postgres has implemented over past releases the help in minimizing overhead. This involves caching, compressing data on the fly, and offering support for scaling. PostgreSQL is quickly becoming a competitor to MySQL in terms of speed, and in some areas, showing faster benchmarks.
Note that there are some apparent flaws that are pointed out. Most notable is the count(); performance when compared to MySQL, which caches the count for a quick value return. PostgreSQL is not able to do this because of the MVCC design, but a workaround involving PL/pgSQL functions and triggers is available. It's best to benchmark applications before putting them on a production server to catch little “gotchas” like this.
pgAdmin is a powerful tool that allows for queries to be analyzed and optimized. It's recommended all newcomers to PostgresQL get familiar with this GUI interface, or the command-line, in order to better troubleshoot the cost of queries in an application. pgAdmin is a powerful tool that is essential in learning best practice of accomplishing different tasks.
Culture and Support
Postgres is completely supported and operated by the community. It's considered an open source project, as compared to an open source product such as MySQL would be. Postgres has a large number of supporters, ranging from corporate to individual support.
The best place to get commercial support would be with local development and hosting companies. Individuals are able to freely find support documents, tutorials, and personalized help on forums and IRC chat rooms. Hopping in the #postgresql room on the freenode server is a good place to start for questions.
The Postgres culture is as rich as its features. Itis a fierce “not for sale” project that is released under the MIT-style license. It is verified under the Copyfree license, which states the project is free to use, distribute, modify, and fork as needed.
Perhaps one of the most noble mascots, Postgres also features Slonik: the elephant also known as BEH (Blue Elephant-Head). It's been rumored that Slonik also has a diet of dolphin when the need to feed comes on.
Moving Forward
No relational database management system is perfect. You will see that PostgreSQL does have its flaws, despite having a large number of great selling points. It's important to investigate several databases in your career as a web developer. Form your own opinions on which you like best, which to use in certain scenarios, and why.
There are plenty of flame wars on the Internet and in communities. Feel free to skip those and look through more Postgres tutorials. Once you are able to graduate an expert, give the same opportunity to several other relational database management systems. After all, for every major website using Postgres, there are hundreds more using MySQL, SQLite, Oracle, NoSQL, and even a few using MS Access. (Feel free to shudder at that thought and attempt to move on)
Further Reading
Installing PostgreSQL On a Linux Server
A guide for installing PostgreSQL on several flavors of Linux distributions. Includes CentOS, Ubuntu, Debian, Red Hat, Fedora, Gentoo, and SuSE. Get Postgres running and join in on the fun!