Saturday, November 8, 2008
register_shutdown_function possible use cases
Eirik Hoem, on his blog, provides an overview of PHP's register_shutdown_function, and suggests using it for cases when your Web page runs out of memory or hits a fatal error, and you don't want to show the users a blank page.
register_shutdown_function is also useful for command-line PHP scripts. Quite frequently a script has to do some task like parse a large XML file, and the test examples used when it was originally written did not account for the XML file possibly being huge. The script therefore dies at, say, 23% completion, and you're left with 23% of the XML file parsed. Not ideal, but a quick duct-tape-style fix is to register a shutdown function that calls system(), passing it the script itself.
If you happen to keep track of which line you're on while parsing, you can pass that line number as the first parameter to your own script, and make it start off after the 23% mark, or wherever it died. The script is initially launched with 0 passed as the first parameter. It runs out of memory, dies, and triggers the shutdown function, which launches a fresh copy of the script (while the original process shuts down cleanly) with the new line number, and the cycle repeats.
Again, this is a duct-tape approach to PHP memory consumption issues while working with large data sets.
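The duct-tape restart described above can be sketched roughly as follows. This is an illustration, not tested production code: the file name large.xml, the line-by-line "parsing", and the restart-on-E_ERROR check are all assumptions made for the example.

```php
<?php
// Track progress in a global so the shutdown function can see it.
$GLOBALS['line_no'] = isset($argv[1]) ? (int) $argv[1] : 0;

function restart_on_fatal() {
    $err = error_get_last();
    if ($err !== null && $err['type'] === E_ERROR) { // fatal, e.g. memory limit hit
        // Relaunch this very script, resuming at the last processed line.
        // Beware: if it keeps dying on the same line, this loops forever.
        system('php ' . escapeshellarg(__FILE__) . ' ' . $GLOBALS['line_no']);
    }
}
register_shutdown_function('restart_on_fatal');

$fh = fopen('large.xml', 'r');
for ($i = 0; ($line = fgets($fh)) !== false; $i++) {
    if ($i < $GLOBALS['line_no']) { continue; } // skip what a previous run finished
    // ... parse $line here ...
    $GLOBALS['line_no'] = $i + 1;
}
fclose($fh);
```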
24 Web site performance tips
Yahoo! Developer Network blog had an entry by Stoyan Stefanov with his presentation from the PHP Quebec conference. A few points to take away, in case you don't feel like going through the 76-slide presentation:
1. An extra 100ms of page rendering time costs Amazon 1% in sales. An extra 500 ms means 20% less traffic for Google.
2. Make fewer HTTP requests - combine CSS and JS files into single downloads. Minify both JS and CSS.
3. Combine images into CSS sprites.
4. Bring static content closer to the users. That usually means CDNs like Akamai or Limelight, but sometimes a co-location facility or data center in a foreign country is the only option.
5. Static content should have Expires: headers set way into the future, so that it's never re-requested.
6. Dynamic content should have a Cache-Control: header.
7. Offer content gzip’ed.
8. Stoyan claims nothing will be rendered in the browser till the last piece of CSS has been served, and therefore it’s critical to send CSS as early in the process as possible. I happen to have a document with CSS declared at the very end, and disagree with this statement - at least the content seems to render OK without CSS, and then self-corrects when CSS finally loads.
9. Move the scripts all the way to the bottom to avoid the download block - Stoyan's example shows placing the javascript includes right before the closing </body> and </html> tags, although it's possible to place them even further down (well, you'd break XHTML purity, I suppose, if you declare your documents to be XHTML).
10. Avoid CSS expressions.
11. Consider placing the minified CSS and JS files on separate servers to fight browser’s default pipelining settings - not everybody has FasterFox or tweaked pipeline settings.
12. For super-popular pages consider inlining JS for fewer HTTP requests.
13. Even though placing content on external servers with different domains will help you with HTTP pipelining, don’t go crazy with various domains - they all require DNS lookups.
14. Every 301 redirect is a wasted HTTP request.
15. For busy backend servers consider PHP’s flush().
16. Use GET over POST any time you have a choice.
17. Analyze your cookies - large number of them could substantially increase the number of TCP packets.
18. For faster JavaScript and DOM parsing, reduce the number of DOM elements.
19. document.getElementsByTagName('*').length will give you the total number of elements. Look for pages that abuse element count.
20. Any missing JS file is a significant performance penalty - the browser will parse the 404 page you generate, trying to see if it contains valid JavaScript.
FirePHP for PHP and AJAX development
FirePHP is a package consisting of a Firefox extension and server-side PHP library for quick PHP development on top of Firebug. It allows you to include the PHP library, and issue logging calls like
fb('Log message' ,FirePHP::LOG);
fb('Info message' ,FirePHP::INFO);
fb('Warn message' ,FirePHP::WARN);
fb('Error message',FirePHP::ERROR);
This is visible only to the Firefox version that has FirePHP installed on top of Firebug. You can also dump entire array and objects to the fb() function call, and have them displayed in Firebug UI.
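For example, dumping an array with a label might look like this (a sketch that assumes the FirePHP core library is included; the $user variable is invented):

```php
// Assumes fb() from the included FirePHP library.
$user = array('id' => 42, 'name' => 'Alice', 'roles' => array('admin'));
fb($user, 'Current user'); // rendered as an expandable array in the Firebug console
```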
PHP Top scalability mistakes
John Coggeshall, CTO of Automotive Computer Services, and author of the Zend PHP Certification Practice Book and PHP 5 Unleashed, gave a talk at OSCON 2008 on the top 10 scalability mistakes. I wasn't there, but he posted the slides for everybody to follow. Here are some lessons learned.
1. Define the scalability goals for your application. If you don’t know how many requests you’re shooting for, you don’t know whether you’ve built something that works, and how long it’s going to last you.
2. Measure everything. CPU usage, memory usage, disk I/O, network I/O, requests per second, with the last one being the most important. If you don’t know the baseline, you don’t know whether you’ve improved.
3. Design your database with scalability in mind. Assume you’ll have to implement replication.
4. Do not rely on NFS for code sharing on a server farm. It's slow and it has locking issues. While the idea of keeping one copy of the code, and letting the rest of the servers load it via NFS, might seem very convenient, it doesn't work in practice. Stick to tried-and-true practices like rsync. Keep the code local to the machine serving it, even if that means a longer push process.
5. Play around with I/O buffers. If you’ve got tons of memory, play with TCP buffer size - your defaults are likely to be set conservatively. See your tax dollars at work and use this Linux TCP Tuning guide. If your site is written in PHP, use output buffering functions.
6. Use RAM disks for any data that's disposable - but you do need a lot of spare RAM lying around.
7. Optimize bandwidth consumption by enabling compression via mod_deflate, setting zlib.output_compression to true for PHP sites, or Tidy content reduction for PHP+Tidy sites.
8. Configure PHP for speed. Turn off the following: register_globals, auto_globals_jit, magic_quotes_gpc, expose_php, register_argc_argv, always_populate_raw_post_data, session.use_trans_sid, session.auto_start. Set session.gc_divisor to 10,000 and output_buffering to 4096, per John's example.
9. Do not use blocking I/O, such as reading another remote page via curl. Make all the calls non-blocking, otherwise the wait is something you can’t really optimize against. Rely on background scripts to pull down the data necessary for processing the request.
10. Don’t underestimate caching. If a page is cached for 5 minutes, and you get even 10 requests per second for a given page, that’s 3,000 requests your database doesn’t have to process.
11. Consider a PHP op-code cache. This will be available off-the-shelf with PHP 6.
12. For content sites consider taking static stuff out of dynamic context. Let’s say you run a content site, where the article content remains the same, while the rest of the page is personalized for each user, as it has My Articles section, and so on. Instead of getting everything dynamically from the DB, consider generating yet another PHP file on the first request, where the article text would be stored in raw HTML, and dynamic data pulled for logged-in users. This way the generated PHP file will only pull out the data that’s actually dynamic.
13. Pay great attention to database design. Learn indexes and know how to use them properly. InnoDB outperforms MyISAM in almost all contexts, but doesn't do full-text searching. (Use Sphinx if your search needs get out of control.)
14. Design PHP applications in an abstract way, so that the app never needs to know the IP address of the MySQL server. Something like ‘mysql-writer-db’, and ‘mysql-reader-db’ will be perfectly ok for a PHP app.
15. Run external scripts monitoring the system health. Have the scripts change the HOSTS if things get out of control.
16. Do not do database connectivity decision-making in PHP. Don’t spend time doing fallbacks if your primary DB is down. Consider running MySQL Proxy for simplifying DB connectivity issues.
17. For super-fast reads consider SQLite. But don’t forget that it’s horrible with writes.
18. Use Keepalive properly. Use it when both static and dynamic files are served off the same server, and you can control the timeouts, so that a bunch of Keep-alive requests don’t overwhelm your system. John’s rule? No Keep-alive request should last more than 10 seconds.
19. Monitor via familiar Linux commands. Such as iostat and vmstat. The iostat command is used for monitoring system input/output device loading by observing the time the devices are active in relation to their average transfer rates. The iostat command generates reports that can be used to change system configuration to better balance the input/output load between physical disks. vmstat reports information about processes, memory, paging, block IO, traps, and cpu activity.
20. Make sure you’re logging relevant information right away. Otherwise debugging issues is going to get tricky.
21. Prioritize your optimizations. Optimization by 50% of the code that runs on 2% of the pages will result in 1% total improvement. Optimizing 10% of the code that runs on 80% of the pages results in 8% overall improvement.
22. Use profilers. They draw pretty graphs, they’re generally easy to use.
23. Keep track of your system performance. Keep a spreadsheet of some common stats you’re tracking, so that you can authoritatively say how much of performance gain you got by getting a faster CPU, installing extra RAM, or upgrading your Linux kernel.
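Point 12 above (taking static article content out of the dynamic context) can be sketched like this; the paths and the fetch_article_html() helper are hypothetical:

```php
<?php
// Hypothetical helper; a real version would SELECT the article body from the DB.
function fetch_article_html($id) {
    return "<p>Article $id body</p>";
}

$id = isset($_GET['id']) ? (int) $_GET['id'] : 1;
$cache_file = "cache/article_$id.php"; // generated on the first request

if (!file_exists($cache_file)) {
    // First hit only: bake the static article HTML into its own file.
    file_put_contents($cache_file, fetch_article_html($id));
}
include $cache_file; // raw HTML; only truly per-user data still hits the DB
```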
Sphinx search ported to PECL
Anthony Dovgal reported on the addition of Sphinx, an open-source SQL full-text search engine, to PECL. The documentation is available on the PHP site, and the engine becomes available once you include sphinxapi.php in your application.
You know the usual InnoDB vs. MyISAM trade-off, where the former is faster, but the latter has full-text search? Sphinx is a free open-source full-text search engine that works with many RDBMSs, and is now pretty easy to incorporate into PHP. A simple example of calling Sphinx is available here:
$s = new SphinxClient;
$s->setServer("localhost", 6712);
$s->setMatchMode(SPH_MATCH_ANY);
$s->setMaxQueryTime(3);
$result = $s->query("test");
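Walking the result set might look like the following sketch (assuming the PECL extension is loaded and a searchd daemon is listening on the port configured above):

```php
// query() returns false on failure, or an array with a 'matches' key on success.
if ($result !== false && !empty($result['matches'])) {
    foreach ($result['matches'] as $doc_id => $info) {
        echo "document $doc_id, weight {$info['weight']}\n";
    }
}
```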
Best practices in PHP development
1. Use source control
1. First, choose between distributed and non-distributed
2. Then, if you chose non-distributed, choose between CVS and SVN
3. In Subversion, use trunk/ for ongoing development and bug fixes, branches/ for ongoing large projects that later need to be merged in, and tags/ for releases
4. Use svn externals to connect to remote repositories
5. Subversion supports pre-commit and post-commit hooks for better code maintainability and checks
2. Implement coding standards
1. Develop class, variable, function, package, etc. naming conventions
2. Agree on common formatting as far as spacing, braces, etc.
3. Implement comment standards
4. PHP_CodeSniffer can run on pre-commit to check whether the commit adheres to the standards
5. Don’t forget to enforce coding standards on any outsourced projects
3. Unit testing and code coverage
1. Use PHPUnit for unit testing
2. For continuous integration, check out phpUnderControl
3. For integration testing, check out Selenium, a general Web application testing suite
4. Documentation
1. Don’t invent your own standards, see what phpDocumentor has to offer. Doxygen also supports phpDoc tags
2. For documenting the software project, try DocBook - XML-based format that allows you to quickly publish a PDF document, or a Website with documentation
5. Deployment
1. Have a standard deployment process that a rookie can familiarize with quickly
2. Support 3 environments - development, staging, and production
3. Deploy code only from repository tags, don’t run trunk, or allow code editing on server
4. Check out a new tag from SVN, point the symlink to it. If something goes wrong during release, change the symlink back to the previous version - easy rollback strategy
5. Everything that needs to be done on the production servers needs to be automated
6. You can do another Selenium test after the release is deployed
7. Check out Monit and Supervisord for deployment monitoring
12 PHP optimization tips
1. If a method can be static, declare it static. Speed improvement is by a factor of 4.
2. Avoid magic like __get, __set, __autoload
3. require_once() is expensive
4. Use full paths in includes and requires, less time spent on resolving the OS paths.
5. If you need to find out the time when the script started executing, $_SERVER['REQUEST_TIME'] is preferred to time()
6. See if you can use strncasecmp, strpbrk and stripos instead of regex
7. str_replace is faster than preg_replace, but strtr is faster than str_replace by a factor of 4
8. If a function, such as a string replacement function, accepts both arrays and single characters as arguments, and your argument list is not too long, consider writing a few redundant replacement statements, passing one character at a time, instead of one line of code that accepts arrays as search and replace arguments.
9. Error suppression with @ is very slow.
10. $row['id'] is 7 times faster than $row[id]
11. Error messages are expensive
12. Do not call functions inside a for loop's condition, such as for ($x = 0; $x < count($array); $x++) - the count() function gets called on every iteration.
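Hoisting the count() call out of the loop condition fixes that:

```php
<?php
$array = array('apple', 'banana', 'cherry');
$n = count($array); // computed once, not on every iteration
for ($x = 0; $x < $n; $x++) {
    echo $array[$x], "\n";
}
```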
Thursday, October 30, 2008
Searching for all the titles inside a directory using PHP
We saw in Part 1 of this tutorial how to read the title tags of an HTML file. Now we will develop a script that reads the title tags of all the files present inside a directory. The basic script remains the same; we will simply keep it within a while loop that lists all the files present inside the directory.
You should read how, in Part 1, we developed the code to open a file in read mode and collect the text between the title tags. Also read how the directory handler works to display all the files.
Here is the code to handle the directory listing.
$path="../dir-name/";// Write your directory path here
$handle=opendir($path);
while (($file_name = readdir($handle))!==false) {
We can restrict this to files of a particular type by checking the extension. Here we use one if condition to include or exclude different types of files (read more on stristr()).
if(stristr($file_name,".php")){ // read the file now }
Rest of the code is same as part 1. So here is the complete code.
<?php
/////////////// function my_strip ///////////
function my_strip($start, $end, $total) {
    $total = stristr($total, $start);
    $f2 = stristr($total, $end);
    return substr($total, strlen($start), -strlen($f2));
}
///////////////// End of function my_strip ///

///////////// Reading of file content ////
$i = 0;
$path = "../dir-name/"; // Write your directory path here
$handle = opendir($path);
while (($file_name = readdir($handle)) !== false) {
    if (stristr($file_name, ".php")) {
        $url = $path . $file_name;
        $contents = "";
        $fd = fopen($url, "r"); // opening the file in read mode
        while ($buffer = fread($fd, 1024)) {
            $contents .= $buffer;
        }
        fclose($fd); // closing the file pointer
        /////// End of reading file content ////////
        //////// Collect the title part ///////
        $t = my_strip("<title>", "</title>", $contents);
        echo $t;
        echo "<br>";
        $i = $i + 1;
    }
}
closedir($handle); // closing the directory handle
echo $i;
?>
Article Source
Wednesday, October 29, 2008
Searching for title inside a file using PHP
We can collect the text written between two landmarks inside a file. These landmarks can be the opening and closing of HTML tags, so whatever is written within a tag can be copied or collected for further processing. Before we go for an example, read the tutorial on how to get part of a string by using one starting and one ending string.
Let us try to understand this with an example. We will develop a script that searches for and collects the text written within the title tags of a page. Read here if you want to know more about title tags in an HTML page. Here is an example of a title tag.
<title>This is the title text of a page</title>
As you can see, within the page we can use the starting and ending title tags (or any pair of tags) as two landmarks and collect the characters or string within them.
Now let us learn how to open a file and read its content. Here is the code part to do that.
$url = "../dir-name/index.php";
$contents = "";
$fd = fopen($url, "r"); // opening the file in read mode
while ($buffer = fread($fd, 1024)) {
    $contents .= $buffer;
}
fclose($fd); // closing the file pointer
Now that we have the content of the file stored in the variable $contents, we will use our function my_strip (read details about the my_strip function here) to collect only the title part from the variable and print it to the screen.
$t=my_strip("<title>","</title>",$contents);
echo $t;
With this we can give any URL and see the title of the file. In the same way as the title tag, we can read any other tag of a page - meta keywords, meta description, the body tag, etc. Many applications can be developed using this, but let us try to develop a few more things from it.
First: reading all the files of a directory and displaying the titles of all the files inside that directory.
Second: developing a hyperlink from these titles to query Google for them (think why?).
We will discuss these two pieces of code in the next section. Before that, here is the full code as discussed in the tutorial above.
<?php
/////////////// function my_strip ///////////
function my_strip($start, $end, $total) {
    $total = stristr($total, $start);
    $f2 = stristr($total, $end);
    return substr($total, strlen($start), -strlen($f2));
}
///////////////// End of function my_strip ///

///////////// Reading of file content ////
$url = "../dir-name/index.php"; // Write your file path here
$contents = "";
$fd = fopen($url, "r"); // opening the file in read mode
while ($buffer = fread($fd, 1024)) {
    $contents .= $buffer;
}
fclose($fd); // closing the file pointer
/////// End of reading file content ////////
//////// Collect the title part ///////
$t = my_strip("<title>", "</title>", $contents);
echo $t;
?>
Once we know the title text, we can use the str_ireplace command to replace the old title with a new title, and then write the content back to the file.
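As a sketch, reusing the $url, $t, and $contents variables from the code above (the replacement title is made up for the example):

```php
// Replace the old title case-insensitively, then write the file back.
$new_contents = str_ireplace("<title>$t</title>",
                             "<title>My new title</title>", $contents);
$fw = fopen($url, "w"); // reopening the same file, this time in write mode
fwrite($fw, $new_contents);
fclose($fw);
```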
Article Source
Getting the last updated time of the file in PHP
We can get the last updated date of any file by using the filemtime() function in PHP. This function returns the date as a UNIX timestamp, and we can convert it to the format we need by using the date() function.
filemtime() uses the server's file system, so it works for local files only; we can't use it to get the modified time of a file on a remote system.
Here is the code to get the last modified date of a file. We are checking an existing file (test.php).
echo date("m/d/Y", filemtime("test.php"));
The above code will display the modified date in month/day/year format.
Note that we have used the date() function to convert the UNIX timestamp returned by the filemtime() function.
Article Source
How to get the file name of the currently running script using PHP?
We can get the name of the file currently executing the code by using SCRIPT_NAME. This gives us the path from the server root, so the name of the current directory will also be included. Here is the code.
$file = $_SERVER["SCRIPT_NAME"];
echo $file;
The above lines will print the present file name along with the directory name. For example, if our current file is test.php and it is running inside the my_file directory, then the output of the above code will be:
/my_file/test.php
We will add some more code to the above to get only the file name. We will use the explode command to break the string using the delimiter "/".
As the output of the explode command is an array, we will take the last element of this array to get our file name. The index of the last element is the total number of elements minus one, because array indexes start from 0 (not from 1). So: index of the last element = total number of elements - 1.
Here is the code to get the last element of the array with the explode command to get the array.
$break = explode('/', $file);
$pfile = $break[count($break) - 1];
Here $pfile is the variable which holds the present file name.
We can use $pfile in different applications where the current file name is required.
Here is the complete code.
$file = $_SERVER["SCRIPT_NAME"];
$break = explode('/', $file);
$pfile = $break[count($break) - 1];
echo $pfile;
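As a side note, PHP's built-in basename() function collapses the explode()/count() steps into a single call:

```php
<?php
$file  = "/my_file/test.php"; // what $_SERVER["SCRIPT_NAME"] might return
$pfile = basename($file);     // "test.php"
echo $pfile;
```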
Article Source
How to delete all the files in a directory using PHP?
We have seen how a file can be deleted using the unlink function in PHP. The same function can be used along with the directory handler to list and delete all the files present inside a directory. We have already discussed how to display all the files present inside a directory. Now let us develop a function that takes the directory name as a parameter, and uses the unlink command to remove each file while looping through all the files of the directory.
Here is the code to this.
function EmptyDir($dir) {
    $handle = opendir($dir);
    while (($file = readdir($handle)) !== false) {
        if ($file == '.' || $file == '..') continue; // skip the directory entries
        echo "$file<br>";
        @unlink($dir . '/' . $file);
    }
    closedir($handle);
}
EmptyDir('images');
Here images is the name of the directory we want to empty.
Article Source
How to delete a file using PHP?
We can delete a file by giving its URL or path in PHP, using the unlink command. This command will work only if write permission is given on the folder or file; without it, the delete command will fail. Here is the command to delete the file.
unlink($path);
Here $path is the path of the file relative to the executing script. Here is an example of deleting a file by using a relative path.
$path="images/all11.css";
if(unlink($path)) echo "Deleted file ";
We have used an if condition to check whether the file delete command succeeded. The command below, however, will not work.
$path="http://domainname/file/red.jpg";
if(unlink($path)) echo "Deleted file ";
The warning message will say unlink() [function.unlink]: HTTP does not allow unlinking
Article Source
How to write to a file using PHP
We can write to a file by using the fwrite() function in PHP. Please note that we have to open the file in write mode, and we can open it in write mode only if write permission is there. If the file does not exist, a new file will be created. We can change the permissions of the file as well. You can read the content of a file by using the fopen() function in PHP. This is the way to write entries for a guestbook, a counter, and many other scripts if you are not using a database for storing data. Here we will see how to write to a file.
<?php
$body_content = "This is my content"; // Store some text to enter inside the file
$file_name = "test_file.txt"; // file name
$fp = fopen($file_name, "w");
// Open the file in write mode; if the file does not exist it will be created.
fwrite($fp, $body_content); // writing data to the file
fclose($fp); // closing the file pointer
chmod($file_name, 0777); // changing the file permission
?>
PHP File open to read internal file
We can open a file or a URL for reading by using the fopen() function of PHP. While opening, we can give the mode of the file open (read, write, etc.). By using fopen() we can also read any external URL. We can write to a file by using the fwrite function. Let us start with reading one internal file (of the same site). We have a file named delete.htm. We will use the command fopen() to open the file in read mode.
We will be using the fread() function to read the content by using a file pointer. fread() reads up to length bytes from the file pointer referenced by $fd. Reading stops when length bytes have been read or EOF is reached, whichever comes first.
We have also used the function filesize() to get the size of the file, and used it in the fread() call.
We will use all these functions to read the content of another file and print the content as output. Here is the code.
<?php
$filename = "delete.htm"; // This is at the root of the site using this script.
$fd = fopen($filename, "r"); // opening the file in read mode
$contents = fread($fd, filesize($filename)); // reading the content of the file
fclose($fd); // closing the file pointer
echo $contents; // printing the content of the file
?>
Article Source
Saturday, September 20, 2008
How Perfect PHP Pagination Works?
Pagination is a topic that has been done to death -- dozens of articles and reference classes can be found for the management of result sets ... however (and you knew there was a "however" coming there, didn't you?) I've always been disgruntled with the current offerings to date. In this article I offer an improved solution.
Some pagination classes require parameters, such as a database resource and an SQL string or two, to be passed to the constructor. Classes that utilize this approach are lacking in flexibility - what if you require a different formatting of page numbers at the top and bottom of your pages, for example? Do you then have to modify some output function, or subclass the entire class, just to override that one method? These potential "solutions" are restrictive and don't encourage code reuse.
This tutorial is an attempt to further abstract a class for managing result pagination, thereby removing its dependencies on database connections and SQL queries. The approach I'll discuss provides a measure of flexibility, allowing the developer to create his or her very own page layouts, and simply register them with the class through the use of an object oriented design pattern known as the Strategy Design Pattern.
What Is the Strategy Design Pattern?
Consider the following: you have on your site a handful of web pages for which the results of a query are paged. Your site uses a function or class that handles the retrieval of your results and the publishing of your paged links.
This is all well and good until you decide to change the layout of the paged links on one (or all) of the pages. In doing so, you're most likely going to have to modify the method to which this responsibility was delegated.
A better solution would be to create as many layouts as you like, and dynamically choose the one you desire at runtime. The Strategy Design Pattern allows you to do this. In a nutshell, the Strategy Design Pattern is an object oriented design pattern used by a class that wants to swap behavior at run time.
Using the polymorphic capabilities of PHP, a container class (such as the Paginated class that we'll build in this article) uses an object that implements an interface, and defines concrete implementations for the methods defined in that interface.
While an interface cannot be instantiated, it can reference implementing classes. So when we create a new layout, we can let the strategy or interface within the container (the Paginated class) reference the layouts dynamically at runtime. Calls that produce the paged links will therefore produce a page that's rendered with the currently referenced layout.
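To make the idea concrete, here is a minimal, self-contained sketch of the pattern. The class and method names are invented for illustration; they are not the article's actual Paginated code.

```php
<?php
// The strategy interface: every page-link layout must know how to paint itself.
interface PageLayout {
    public function paint($current, $total);
}

// One concrete strategy; others could render drop-downs, arrows, etc.
class PlainLayout implements PageLayout {
    public function paint($current, $total) {
        $out = '';
        for ($i = 1; $i <= $total; $i++) {
            $out .= ($i == $current) ? "[$i] " : "$i ";
        }
        return trim($out);
    }
}

// The container delegates rendering to whichever layout is registered.
class Paginated {
    private $layout;
    public function setLayout(PageLayout $layout) { $this->layout = $layout; }
    public function fetchPagedLinks($current, $total) {
        return $this->layout->paint($current, $total);
    }
}

$p = new Paginated();
$p->setLayout(new PlainLayout()); // swap in a different layout at runtime
echo $p->fetchPagedLinks(2, 4);   // 1 [2] 3 4
```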
http://www.sitepoint.com/article/perfect-php-pagination/
What's new in PHP 5.3 ?
PHP 6 is just around the corner, but for developers who just can't wait, there's good news -- many of the features originally planned for PHP 6 have been back-ported to PHP 5.3, a final stable release of which is due in the first half of this year.
This news might also be welcomed by those who wish to use some of the new features, but whose hosting providers will not be upgrading to version 6 for some time -- hosting providers have traditionally delayed major version updates while acceptance testing is performed (read: the stability has been proven elsewhere first). Many hosting companies will probably delay upgrading their service offerings until version 6.1 is released. A minor upgrade from 5.2.x to 5.3, however, will be less of a hurdle for most hosting companies.
This article introduces the new features, gives examples of where they might be useful, and provides demo code to get you up and running with the minimum of fuss. It doesn't cover topics such as installing PHP 5.3 -- the latest development release of which is currently available. If you'd like to play along with the code in this article, you should install PHP 5.3, then download the code archive. An article on installing PHP 5.3 can be found on the Melbourne PHP Users Group web site.
Namespaces
Before the days of object oriented PHP, many application developers made use of verbose function names in order to avoid namespace clashes. Wordpress, for example, implements functions such as wp_update_post and wp_create_user. The wp_ prefix denotes that the function pertains to the Wordpress application, and reduces the chance of it clashing with any existing functions.
In an object oriented world, namespace clashes are less likely. Consider the following example code snippet, which is based on a fictional blogging application:
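The article's snippet isn't reproduced here, but a minimal sketch of PHP 5.3 namespaces (Blog\Entry is an invented class, not the article's code) looks like this:

```php
<?php
namespace Blog;

// An invented class standing in for the fictional blogging application.
class Entry {
    public $title;
    public function __construct($title) {
        $this->title = $title;
    }
}

// The fully qualified name cannot clash with an Entry class defined elsewhere.
$post = new \Blog\Entry('Hello namespaces');
echo $post->title;
```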
Read More Here
Friday, August 15, 2008
Web 3.0: Versions 4, 5, 6...
Is it too early to talk about Web 4.0? Of course not.
According to Danish editor Jens Roland, who's been tracking the increasingly common practice of assigning version numbers to the World Wide Web, at least one Internet pundit is already discussing Web 38.0. Roland hastens to point out that this discussion is most likely tongue-in-cheek. But even as Web 2.0 continues to mature and an assortment of ideas called Web 3.0 hits our collective consciousness, some people are actually giving serious thought to version 4.0. Go ahead. Google it.
Web 3.0 : An Introduction
Web 3.0: Tim, Lucy, and The Semantic Web
Web 3.0 :The Other Semantic Web
Web 3.0 :Semantics and Search
Web 3.0 : A Web Beyond Words
Web 3.0 : Tomorrow's Web, Today
Web 3.0 : An Idiot's Guide to Web 3.0
Web 3.0 : Questions of Semantics
Web 3.0 : Look, Ma, No Keywords!
One of the first and most visible Web 4.0 pundits is Seth Godin, a technology-minded marketing guru with seven books to his name, including Unleashing the Ideavirus, billed as the most popular e-book ever. What does a marketing guru have to do with the future of Net? Everything. After all, these Web-wide version numbers have so much to do with spin.
Godin envisions Web 4.0, or Web4, as a place where you have even tighter online connections to your friends, family, and colleagues. "There are so many things the Web can do for me if it knows who my friends are, where they are, what they're doing, what they're interested in, how they can help me—and vice versa," he says.
On his future Web, if you start typing an e-mail proposing a particular business deal with Apple, a window pops up, telling you that one of your colleagues is already in talks with Apple. If you miss an airplane flight and book a new one with your cell phone, it automatically sends messages to the friends you're meeting for dinner, letting them know you'll be late. It sounds a lot like the Semantic Web—with less privacy. Will this actually happen? Will people relinquish that much information about their private lives? Who knows? It's just an idea. Of course, people like Seth Godin know a thing or two about spreading ideas.
Web 3.0 : Look, Ma, No Keywords!
Three new Web services reinvent the way we look for music and images.
You won't search for media with keywords in the future—you'll search for media with media. To find an image, you'll supply another image. To find a song, you'll supply another song. Don't believe it? Three new services—image-crunchers Like.com and Polar Rose, and music-matchmaker Pandora—have already taken the first steps toward this new breed of media search.
Today, when you search the Web for music and images, you're merely searching for the words that surround them. When you visit Google Image Search and type in "Steve Jobs," you aren't really looking for photos of Apple's CEO. You're looking for filenames and captions that carry those keywords—"Steve" and "Jobs"—hoping the right photos are somewhere nearby.
There's a sizable difference between the two. On any given image search, Google turns up countless photos completely unrelated to your query, even as it misses out on countless others that may be a perfect match. In the end, you're relying on Web publishers to annotate their images accurately, and that's a hit-or-miss proposition.
The situation is much the same with MP3s, podcasts, and other sound files. When trolling Web-based music services, you can run a search on "Elvis" or "Jailhouse Rock." But what if you're looking for music that sounds like Elvis? Wouldn't it be nice if you could use one song to find other similar songs?
Ojos and Polar Rose are tackling the image side of the problem. Last spring, Ojos unveiled a Web-based photo-sharing tool called Riya, which automatically tags your pictures using face recognition. Rather than manually adding "Mom" tags to all your photos of Mom, you can show Riya what she looks like, and it adds the tags for you. The service is surprisingly accurate, gaining a huge following from the moment it hit the Web, but Ojos quickly realized that the Riya face-recognition engine—which also identifies objects and words—could be used for Web-wide image search.
That's a mammoth undertaking, but, with an alpha service called Like.com, the company is already offering a simple prototype. Today, Like.com is little more than a shopping engine. You select a photo of a product that best represents what you're looking for, and the service shows all sorts of similar products. But it's an excellent proof-of-concept.
Meanwhile, Polar Rose (www.polarrose.com) recently introduced a browser plug-in that does face recognition with any photo posted to any Web site. For the moment, it's just a means of tagging images automatically—much like Riya. But unlike Riya, it already works across the length and breadth of the Net.
The closest equivalent when it comes to audio is Pandora, from a group of "musicians and music-loving technologists" called the Music Genome Project. Since its inception in 2000, the group has analyzed songs from over 10,000 artists, carefully notating the musical makeup of each track. Using this data and a list of your favorite artists, Pandora can instantly construct a new collection of songs that suit your tastes. Again, this is hardly a Web-wide search engine, and unlike the image services from Ojos and Polar Rose, it relies heavily on up-front human input. But it's a step in the right direction. True media search is closer than you think.
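The Music Genome idea—hand-annotated attributes per track, then matching by similarity—can be sketched as nearest-neighbor search over feature vectors. Everything below (song titles, attribute names, and values) is invented for illustration; Pandora's actual attributes and matching algorithm are proprietary.

```python
import math

# Hypothetical per-track annotations (0.0-1.0), in the spirit of the
# Music Genome Project's notation -- all values here are made up.
songs = {
    "Jailhouse Rock": [0.9, 0.8, 0.2],  # e.g. rock influence, vocal energy, acoustic feel
    "Hound Dog":      [0.8, 0.9, 0.3],
    "Moon River":     [0.1, 0.3, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two attribute vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(seed, catalog):
    """Rank every other song in the catalog by similarity to the seed track."""
    return sorted(
        ((title, cosine(catalog[seed], vec))
         for title, vec in catalog.items() if title != seed),
        key=lambda pair: pair[1],
        reverse=True,
    )

# Searching with a song instead of a keyword:
ranked = most_similar("Jailhouse Rock", songs)
```

The human input Pandora depends on is the annotation step: someone still has to fill in those vectors for every track.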
Web 3.0 : Questions of Semantics
Tim Berners-Lee isn't the only man behind the Semantic Web. His 2001 Scientific American article, which introduced the concept to the world, was actually written in collaboration with two other eminent researchers, Ora Lassila and Jim Hendler. Six years on, we tracked down Professor Hendler, now director of the Joint Institute for Knowledge Discovery at the University of Maryland and still one of the driving forces behind this next-generation Internet.
Q: Does the Semantic Web idea predate your now-famous Scientific American article—or was that the first mention?
A: That's the first time the term was coined and printed in a fairly accessible place. Recently, we've been looking for the absolute earliest use of the term Semantic Web, and it seems to go a bit further back, to a few small things Tim had written. He and some colleagues were using it locally within MIT and the surrounding community in the late nineties.
Q: The Semantic Web can be a difficult concept to grasp. How do you define it?
A: What the traditional Web does for the text documents in our lives, the Semantic Web does for all our data and information. Today, on my Web page, I can build a pointer to another Web page. But I can't link data together in the way I can link pages together. I can't point from a value in one database to some other value in some other database. To use a simple example, if your driver's license number is in one place and your vehicle identification number is in another, there should be a way of linking those two things together. There should be a way for machines to understand that those two things are related.
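Hendler's driver's-license example can be sketched with a toy triple store: each fact is a (subject, predicate, object) triple, and linking happens when facts in different databases share an identifier. The URIs and values below are invented for illustration; real Semantic Web data would use RDF and standardized vocabularies.

```python
# Facts from two notionally separate databases, expressed as triples.
# All URIs, names, and numbers here are made up.
triples = [
    # From the licensing authority's database:
    ("http://example.org/person/jane", "http://example.org/vocab#hasLicense", "D123-4567"),
    # From the vehicle registry's database:
    ("http://example.org/person/jane", "http://example.org/vocab#ownsVehicle", "http://example.org/car/42"),
    ("http://example.org/car/42", "http://example.org/vocab#hasVIN", "1HGBH41JXMN109186"),
]

def query(subject=None, predicate=None, obj=None):
    """Match triples against a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)]

# Because both databases use the same URI for Jane, a machine can walk
# from her license record to her car's VIN:
jane = "http://example.org/person/jane"
car = query(jane, "http://example.org/vocab#ownsVehicle")[0][2]
vin = query(car, "http://example.org/vocab#hasVIN")[0][2]
```

The shared URI is the whole trick: it is the data-level equivalent of a hyperlink between pages.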
Q: Why is this so necessary?
A: Right now, it's very difficult to browse data on the Web. I can use a search engine that gives me the results of a query and draws them as a list, but I can't click on one of those values and see what it really means and what it's really related to. Today's social networking is trying to improve this, with things like tagging. But if you typed "polish" and I typed "polish," how do we know we're talking about the same thing? You might be talking about a language and I might be talking about something that goes on furniture. On the other hand, if those two names are precisely identified, they don't accidentally overlap and it's easier to understand the data we've published. So the technology of the Semantic Web is, in a sense, the technology of precise vocabularies.
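The "polish" collision illustrates why precise vocabularies matter: two tags that are identical as strings can be kept distinct by qualifying them with a namespace. The namespace URIs below are invented for illustration.

```python
# Two hypothetical vocabularies -- the URIs are made up.
LANGUAGES    = "http://example.org/vocab/language#"
HOUSEKEEPING = "http://example.org/vocab/housekeeping#"

tag_a = LANGUAGES + "polish"     # the language
tag_b = HOUSEKEEPING + "polish"  # the stuff that goes on furniture

# Same word, so plain-string tagging would conflate them...
assert tag_a.split("#")[-1] == tag_b.split("#")[-1]
# ...but as namespaced identifiers they never accidentally overlap.
assert tag_a != tag_b
```

This is the same mechanism RDF uses: every term is a full URI, so vocabularies from different publishers can mix without colliding.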
Q: And this, in turn, would allow a machine to go out across the Web and find the things we're looking for?
A: Yes. It's very hard for this to happen with just language descriptions. Our idea is to have machine-readable information shadowing the human-readable stuff. So if I have a page that says, "My name is Jim Hendler. Here's a picture of my daughter," the machine realizes that I'm a person, that I have a first name and a last name, that I'm the father of another person, and that she's a female person. The level of information a machine needs would vary from application to application, but just a little of this could go a long way—as long as it can all be linked together. And the linking is the Web part of the Semantic Web. This is all about adding meaning to the stuff we put on the Web—and then linking that meaning together.
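The machine-readable "shadow" Hendler describes can be sketched with a few facts and one inference rule. The property names and the daughter's identifier are invented here (the article never names her); real Semantic Web data would use vocabularies like FOAF and a proper reasoner.

```python
# A machine-readable shadow of "My name is Jim Hendler. Here's a picture
# of my daughter." -- identifiers and property names are made up.
facts = {
    "person:jim": {"type": "Person", "firstName": "Jim", "lastName": "Hendler",
                   "daughter": "person:d1"},
    "person:d1":  {"type": "Person"},
}

# One tiny inference rule: anyone listed as someone's daughter
# is a female person.
for props in list(facts.values()):
    child = props.get("daughter")
    if child is not None:
        facts[child]["gender"] = "female"
```

After the rule fires, the machine "knows" the daughter is a female person even though no page ever said so explicitly; that is the little-goes-a-long-way effect Hendler describes.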
Thanks,
Webnology Blog Administrator