Demystifying comments migration from Disqus to WordPress

Some time ago, I switched from a self-hosted WordPress blog to a statically generated (Jekyll) blog hosted on GitHub Pages. For a commenting system, Disqus was an easy choice at the time: it was zero hassle for site owners, and Disqus did all the heavy lifting, from filtering the comments to storing and displaying them.

But as time went on, I started realizing that a static blog was not quite the right fit. Firstly, there were privacy concerns around Disqus, which discouraged many of my readers from commenting. Secondly, the concept of a “static site” itself felt constraining, as I couldn’t implement things like a contact form or a questionnaire to interact with my readers. As a result, I decided to switch back to a plain old self-hosted WordPress blog.

Importing the posts was quite straightforward using the RSS feed generated by Jekyll. In case your Jekyll blog doesn’t have one, it’s very easy to write using a Liquid template. Just create a file named rss.xml in your root folder with the contents below (the front matter block at the top, the two lines of dashes around layout: null, is what tells Jekyll to process the Liquid tags in the file):

---
layout: null
---
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
 <channel>
  <title>{{ site.name | xml_escape }} - Articles</title>
  <description>{% if site.description %}{{ site.description | xml_escape }}{% endif %}</description>
  <link>{{ site.url }}</link>
  {% for post in site.posts %}
  {% unless post.link %}
  <item>
   <title>{{ post.title | xml_escape }}</title>
   <description>{{ post.content | xml_escape }}</description>
   <pubDate>{{ post.date | date: "%a, %d %b %Y %H:%M:%S %z" }}</pubDate>
   <link>{{ site.url }}{{ post.url }}</link>
   <guid isPermaLink="true">{{ site.url }}{{ post.url }}</guid>
  </item>
  {% endunless %}
  {% endfor %}
 </channel>
</rss>

Once you regenerate the blog, you can import all the posts into WordPress by referring to the /rss.xml URL on your existing Jekyll blog.

However, the bigger issue was importing the Disqus comments. While Disqus does allow you to export a dump of your site’s comments, its XML format is pretty unusual and isn’t the standard one used by WordPress and other blogging systems. As a result, there aren’t many ready-made tools for importing comments from this format into any other system.

So I had to write my own WordPress importer tool. Since I did not want to go through the hassle of learning to create an “admin plugin” with all the bells and whistles, I decided to write a simple PHP console script to import the XML as mentioned here.

All I needed were two scripts: a parser that reads the XML of the Disqus comments dump, and a WordPress handler that loops through the parsed comments and imports them one by one, matching each comment to a post by its url attribute and calling wp_new_comment() to insert it. (You can also use the older wp_insert_comment(), but it’s not the recommended way according to the WordPress Codex.)

Below is the source code for both files. The first, console.php, is the WordPress handler that you run, passing it the path of the Disqus comments dump file. The second, disqus_parse.php, is the parser, which console.php calls internally. Copy these two files anywhere inside your WP folder structure (I copied them to the /wp-content/plugins/test/ folder), and run console.php from the command line, for example as php console.php /path/to/disqus-dump.xml (the dump path here is only a placeholder).

console.php:

<?php
/*
 * Tool to import disqus comments to the wordpress system.
 * 
 * @author Prahlad Yeri<prahladyeri@yahoo.com>
 * @date 2017-09-06
 * */
require_once('disqus_parse.php');
if( php_sapi_name() !== 'cli' ) {die("Meant to be run from command line");}
if (count($argv) < 2) { print("Incorrect arguments. Provide path to file\n"); exit; }
$comments = parse_disqus_file($argv[1]);
// $exp = 0;
// $added = [];
// foreach($comments as $comm) {
	// $url = $comm['url'];
	// if (strpos($url, 'prahladyeri.com') && array_search($url, $added)===false) {
		// echo $url."\n";
		// array_push($added, $url);
	// }
	// $exp++;
// }
// echo "exp: $exp\n";
// exit();
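function find_wordpress_base_path() {
	$dir = dirname(__FILE__);
	while (true) {
		//it is possible to check for other files here
		if( file_exists($dir."/wp-config.php") ) {
			return $dir;
		}
		$parent = dirname($dir);
		// dirname() returns the same path at the filesystem root, so stop
		// there instead of looping forever when wp-config.php is missing
		if ($parent === $dir) {
			return null;
		}
		$dir = $parent;
	}
}
$wp_base = find_wordpress_base_path();
if ($wp_base === null) { die("Could not find wp-config.php in any parent folder\n"); }
define( 'BASE_PATH', $wp_base."/" );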
define('WP_USE_THEMES', false);
global $wp, $wp_query, $wp_the_query, $wp_rewrite, $wp_did_header;
require(BASE_PATH . 'wp-load.php');
print(BASE_PATH.'wp-load.php'."\n");
print("Ready.\n");
#print_r($wpdb);
$q = new WP_Query(array('post_type'=>'post', 'posts_per_page' => -1));
$cnt = 0;
$processed = [];
while($q->have_posts()) {
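	$q->the_post(); // sets the global $post; the method itself returns nothing
	$t_title = get_the_title();
	//$t_title = the_title();
	$t_url = get_permalink();
	// strip the scheme, whether the site runs on http or https
	$t_url = preg_replace('#^https?://#', '', $t_url);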
	$t_url = str_replace("/apps/wp", "", $t_url);
	$idx = strpos($t_url, "/");
	$t_url = substr($t_url, $idx);
	array_push($processed, $t_url);
	
	echo 'processing '. $t_title  ."\n";
	$arr = array();
	foreach($comments as $comment){
		$idx = strpos($comment['url'], "/");
		$c_url = substr($comment['url'],$idx);
		if ($c_url === $t_url) {
			//echo "Match found:\n";
			//echo $t_url."\n";
			//echo $c_url."\n";
			//echo $post->ID."\n";
			// key that identifies this comment within the current post
			$key = $post->ID.$comment['email'].$comment['body'];
			// in_array() instead of !array_search(): array_search() returns
			// index 0 for the first element, which is falsy in a condition
			if (!in_array($key, $arr, true)) {
				wp_new_comment(array(
					'comment_post_ID' => $post->ID,
					'comment_author' => $comment['name'],
					'comment_author_email' => $comment['email'],
					'comment_author_url' => '',
					'comment_content' => $comment['body'],
					'comment_type' => '',//empty for regular comments, 'pingback' for pingbacks, 'trackback' for trackbacks
					'comment_parent' => 0, //0 if it's not a reply to another comment;
					'comment_date' => $comment['created_at'],
					//'user_id' => $current_user->ID,
				));
				$cnt++; // count only the comments that were actually inserted
			}
			array_push($arr, $key);
			//sleep(1);
		}
		else {
			if (strpos($c_url,'mars') && strpos($t_url,'mars')) {
				echo "Match not found:\n";
				echo $t_url."\n";
				echo $c_url."\n\n";
				//$cnt++;
			}
		}
	}
}
echo "$cnt comments imported.\n";
// foreach ($added as $url) {
	// $url = str_replace("www.prahladyeri.com","",$url);
	// if (!array_search($url, $processed)) {
		// echo "Not processed: $url\n"."\n";
	// }
// }
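
The URL matching in console.php depends on both sides reducing to the same path. As a rough sketch of that idea only (normalize_path() and its $strip_prefix parameter are hypothetical names, not part of the scripts above), a single helper could handle http, https, scheme-less Disqus identifiers and a trailing slash in one place:

```php
<?php
// Hypothetical helper, not part of the original scripts: reduce a full URL
// or a scheme-less "host/path" string to a comparable path.
function normalize_path(string $url, string $strip_prefix = ''): string {
    // Give scheme-less values such as "www.example.com/blog/a/" a scheme so
    // that parse_url() treats the first segment as the host.
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . ltrim($url, '/');
    }
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return '/';
    }
    // Drop an install subfolder such as "/apps/wp" when it leads the path.
    if ($strip_prefix !== '' && strpos($path, $strip_prefix) === 0) {
        $path = substr($path, strlen($strip_prefix));
    }
    // Treat "/blog/a" and "/blog/a/" as the same page.
    $trimmed = trim($path, '/');
    return $trimmed === '' ? '/' : '/' . $trimmed . '/';
}

echo normalize_path('https://example.com/apps/wp/blog/a/', '/apps/wp'), "\n";
echo normalize_path('www.example.com/blog/a'), "\n";
```

With a helper like this, the permalink and the Disqus URL would both go through the same function before being compared, assuming your Disqus identifiers really are URLs as they were on my blog.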

disqus_parse.php:

<?php
/*
 * Tool to parse disqus comments xml file.
 * 
 * @author Prahlad Yeri<prahladyeri@yahoo.com>
 * @date 2017-09-06
 * */
 
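function find_url($root, $thid) {
	foreach($root->thread as $thread) {
		// the thread id lives in the namespaced dsq:id attribute
		$curr_id = (string) $thread->attributes("dsq", true)['id'];
		if (strcmp((string) $thid, $curr_id) === 0) {
			// cast so that callers get a plain string, not a SimpleXMLElement
			return (string) $thread->id;
		}
	}
	return null;
}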
function parse_disqus_file($filename) {
	$xmlstr = file_get_contents($filename);
	$root = new SimpleXMLElement($xmlstr);
	$posts = $root->post;
	$cnt = 0;
	$result = [];
	
	foreach($posts as $post) {
		$thid = "";
		foreach($post->thread->attributes("dsq",true) as $a=>$b) {
			$thid = $b;
			break;
		}
		$url = find_url($root, $thid);
		$values = array(
			// cast to string: SimpleXML hands back element objects, not plain strings
			"body"=>strip_tags((string) $post->message),
			"name"=>(string) $post->author->name,
			"email"=>(string) $post->author->email,
			//"created_at"=>date("Y-m-dTH:i:sZ", $post->created_at),
			//"created_at"=>date_create($post->createdAt),
			"created_at"=>(string) $post->createdAt,
			"url"=>(string) $url,
		);
		array_push($result, $values);
		
		$cnt++;
	}
	return $result;
}
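
The parser above scans every thread again for each comment, which is fine for a small blog but gets slow on a large dump. One possible alternative, sketched here under assumptions, is to build a thread id to URL map once and look comments up in it. build_thread_map() is a hypothetical name, and the XML is a hand-written sample that only mimics the shape of a Disqus export (the namespace URIs in it do not matter to this sketch, since the lookup goes by the dsq prefix):

```php
<?php
// Hand-written sample; a real Disqus dump has many more elements.
$xml = <<<XML
<disqus xmlns="http://disqus.com" xmlns:dsq="http://disqus.com/disqus-internals">
  <thread dsq:id="101"><id>www.example.com/blog/first/</id></thread>
  <thread dsq:id="102"><id>www.example.com/blog/second/</id></thread>
  <post dsq:id="9001">
    <message>Nice post</message>
    <thread dsq:id="102"/>
  </post>
</disqus>
XML;

// Hypothetical helper: map each thread's dsq:id to the text of its <id> element.
function build_thread_map(SimpleXMLElement $root): array {
    $map = [];
    foreach ($root->thread as $thread) {
        $id = (string) $thread->attributes('dsq', true)['id'];
        $map[$id] = (string) $thread->id;
    }
    return $map;
}

$root = new SimpleXMLElement($xml);
$map = build_thread_map($root);

foreach ($root->post as $post) {
    // Each post points at its thread through the same dsq:id attribute.
    $thid = (string) $post->thread->attributes('dsq', true)['id'];
    $url = $map[$thid] ?? null; // null when the thread is missing from the dump
    echo $thid, ' => ', $url, "\n";
}
```

The map is built in one pass over the threads, so each comment lookup becomes a plain array access instead of another loop.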

Finally, keep one thing in mind before running console.php. WordPress may stop the import with an error if it detects too many comments being inserted in quick succession. To prevent that, add the following line of code to the end of your theme’s functions.php to disable the comment flood filter:

add_filter('comment_flood_filter', '__return_false');

Of course, remember to comment out that line once you are done importing the comments. Also, let me know through the comments below how your migration went.
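
If you would rather not touch the theme at all, one option (a sketch I have not run as part of the original import) is to register the same filter inside console.php, right after the line that loads wp-load.php. The filter then lives only for the duration of the script, so there is nothing to undo afterwards. The function_exists() guard and the $flood_filter_disabled variable are my additions for illustration:

```php
<?php
// Sketch: place this after require(BASE_PATH . 'wp-load.php') in console.php.
// The function_exists() guard only keeps the snippet harmless when it is run
// outside WordPress, where add_filter() is not defined.
if (function_exists('add_filter')) {
    // __return_false is a WordPress helper that always returns false, so the
    // flood check never treats the import as flooding.
    add_filter('comment_flood_filter', '__return_false');
    $flood_filter_disabled = true;
} else {
    $flood_filter_disabled = false;
}
echo $flood_filter_disabled ? "Flood filter disabled for this run\n" : "WordPress not loaded\n";
```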

Prahlad Yeri

Prahlad is a freelance software developer working on web and mobile application development. He also likes to blog about programming and contribute to opensource.