Details of partial degradation of pool on 14 September 2019

We believe that our communications with miners should be based on honesty and transparency; this is essential for building a long-term relationship with our users.

We want to share some details about the events that led to the partial degradation of the pool on 14 September 2019, which affected the shares sent in by miners.

Mining rewards for the lost shares were distributed to all affected miners on 17 September to compensate for the losses.

We will provide more background to give you an idea of what happened and how our team addressed it.

14 September 2019 at 06:58 am (GMT+3)

At this time, the pool experienced the first errors saving the shares sent by miners. Mining itself was not interrupted, which is why miners' rigs did not fail over to a reserve pool. The cause of the saving errors was insufficient disk space.

Two mistakes that led to the incident

On 2 August 2019, our monitoring system sent a notification that disk usage was approaching a critical level. To address this, we adjusted the share storage rules, shortening the retention period for shares. This was meant to delete unnecessary old data and free up disk space for the operating system. But that didn't happen, and the space was still occupied by the database: PostgreSQL does not return disk space to the OS immediately; instead, it marks the freed pages as available for reuse.


To return the disk space to the operating system, we would have to compress the table, which means it would be unavailable for some time, and thus mining would be paused.
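The behavior described above is easy to reproduce. The sketch below uses SQLite (via Python's `sqlite3`) instead of PostgreSQL, purely so it is runnable and self-contained; the table name and row contents are illustrative. Deleting rows does not shrink the database file, while a full rewrite (`VACUUM`, roughly analogous to compressing the table in PostgreSQL) returns the space to the OS.

```python
import os
import sqlite3
import tempfile

# Illustration of the disk-space behavior described above, using SQLite
# as a stand-in for PostgreSQL: deleting rows marks their pages as
# reusable inside the database file, but the file itself does not shrink
# until the database is rewritten.
path = os.path.join(tempfile.mkdtemp(), "shares.db")
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE shares (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO shares (payload) VALUES (?)",
    [("x" * 1000,) for _ in range(10_000)],
)
conn.commit()
size_full = os.path.getsize(path)

# Delete the "old" shares: the space becomes reusable, but is not
# returned to the operating system.
conn.execute("DELETE FROM shares")
conn.commit()
size_after_delete = os.path.getsize(path)

# Only a full rewrite of the file gives the space back to the OS.
conn.execute("VACUUM")
size_after_vacuum = os.path.getsize(path)
conn.close()

print(size_full, size_after_delete, size_after_vacuum)
```

The catch, as noted above, is that the rewrite step takes the table offline for its duration, which is exactly the trade-off we were trying to avoid.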

We didn’t want to interfere with your mining process, and that was the first mistake.

We also didn't lower the critical disk space threshold in our monitoring system, and that was the second mistake.
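A minimal sketch of the kind of check involved, assuming a hypothetical 15% free-space warning threshold (this is not our actual monitoring configuration):

```python
import shutil

# Minimal free-disk-space check. The 15% threshold and the monitored
# path are illustrative assumptions, not the pool's real configuration.
WARN_FREE_FRACTION = 0.15

def disk_space_ok(path: str = "/", warn_at: float = WARN_FREE_FRACTION) -> bool:
    """Return True while the fraction of free space is above the threshold."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total > warn_at
```

The lesson for us was that a threshold like `warn_at` must be revisited whenever retention rules change: a value tuned for the old write rate can stay silent for far too long.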

14 September 2019 at 08:41 am (GMT+3)

Since the monitoring system stayed quiet, we only learned about the incident from our miners in our Telegram channel at 08:41 (GMT+3) and immediately started investigating the root cause.

14 September 2019 at 09:06 am (GMT+3)

We found what was causing the error and freed enough space for new shares to minimize the impact on our miners. After that, we started the table compression. Our team transferred the data for the past 2 days to a newly created table and deleted the old one. This freed up more than 50% of disk capacity with minimal effort.
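The table-swap trick can be sketched as follows. For a self-contained, runnable example the sketch again uses SQLite instead of PostgreSQL, and the `shares` table layout and column names are assumptions; the SQL pattern (copy the recent rows into a fresh table, drop the bloated one, rename) is the same.

```python
import sqlite3

# Sketch of the "copy recent rows to a fresh table, drop the bloated one"
# approach described above. Table and column names are illustrative; an
# in-memory SQLite database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shares (id INTEGER PRIMARY KEY, miner TEXT, submitted_at TEXT);
    INSERT INTO shares (miner, submitted_at) VALUES
        ('miner_a', datetime('now', '-10 days')),
        ('miner_b', datetime('now', '-1 day')),
        ('miner_c', datetime('now'));

    -- Keep only the last 2 days of shares in a newly created table...
    CREATE TABLE shares_new AS
        SELECT * FROM shares
        WHERE submitted_at >= datetime('now', '-2 days');

    -- ...then drop the bloated table and swap the new one into its place.
    DROP TABLE shares;
    ALTER TABLE shares_new RENAME TO shares;
""")
remaining = conn.execute("SELECT COUNT(*) FROM shares").fetchone()[0]
print(remaining)  # prints 2: only the shares from the last 2 days survive
conn.close()
```

Because the new table contains only a few days of data, creating it is fast, and dropping the old table returns its disk space to the OS immediately, which is why this was much cheaper than compressing the full table in place.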

14 September 2019 at 10:06 am (GMT+3)

Table compression was completed, and the pool returned to fully operational mode.

Lessons learned

We are committed to delivering a smooth mining process and user experience for our miners; therefore, we will soon enhance our monitoring and incident management systems based on best practices and new approaches.

Please feel free to give us your feedback and comment on this article in our Telegram channel:

Increase your profit with EZIL mining: ZIL+ETH / ZIL+ETC. ZIL staking node operator.
