Exploring UltiHash: Innovations in Byte-Level Data Deduplication and Its Competitive Edge

UltiHash: A Competitive Edge in Data Infrastructure and Deduplication Technology

UltiHash is a startup from Germany that has developed a unique byte-level deduplication algorithm. The company was established by Tom Lüdersdorf. Their technology is designed to eliminate redundant data across various infrastructures, which improves storage efficiency. This allows businesses to store more data without the need to expand their physical resources.

UltiHash competes in the data infrastructure and deduplication space with companies such as Siapp, Eonia Labs, and Leap. These companies offer data infrastructure products that could serve as alternatives to UltiHash’s technology. The competitive landscape is dynamic and can shift quickly with technological advances and market trends.

However, UltiHash has several advantages that potentially make it stand out from its competitors:

Byte-Level Deduplication: UltiHash says its byte-level deduplication algorithm removes more redundant data than existing alternatives while also providing faster data access.

Storage Efficiency: The company’s lossless deduplication is claimed to cut storage costs by up to 50 percent, an overall 2:1 dedupe ratio.

Speed Advantages: UltiHash also claims speed advantages, saying its software is up to 50 percent faster on reads (GETs) than Amazon S3 when benchmarked on TIFF files.

Scalability: UltiHash’s deduplication software is deployed in an S3-compatible object storage cluster that can run on-premises or in AWS. The cluster can scale horizontally with variable-sized data nodes supporting petabyte-scale volumes.

Integration and Adoption: UltiHash integrates with S3-native applications and services, helping with its adoption.

Built-in Features: It has built-in features for data backup and recovery, ensuring high availability and business continuity. The software supports multi-tenant environments with robust access control, ensuring secure segmentation and user management.

Monitoring: UltiHash provides monitoring for real-time insights into storage usage, performance, and operational trends.

The technology developed by UltiHash is particularly beneficial for high-performance applications such as machine learning, AI, product engineering, and analytics. The company claims that its technology can decrease volume-dependent infrastructure needs by up to 50%.

According to the company, the technology not only maximizes storage efficiency but also delivers significant resource savings while maintaining performance.

The company recently secured $2.5 million in pre-seed funding. The team at UltiHash is working towards making data growth sustainable and transforming data storage. They aim to unlock the potential of data by significantly streamlining storage, enabling industries to harness the power of data more effectively. This includes enhancing medical research through data-rich healthcare insights, advancing manufacturing analytics, and fueling AI innovations.

In terms of major milestones, UltiHash has made significant strides since its inception. The company has developed a high-performance deduplication technology that increases the sustainability of global data. It has also secured $2.5M in funding to further its mission of making data storage simple and sustainable. The company’s technology has been recognized for its efficiency, sustainability, and high performance, particularly in applications such as machine learning, AI, product engineering, and analytics.

The concept of a deduplication algorithm

Deduplication is a process used in computing to eliminate duplicate copies of repeating data. This technique is used to improve storage utilization, which can lower costs by reducing the amount of storage media required to meet storage capacity needs.

Here’s a simple example to illustrate how it works: Imagine you have a music playlist with multiple copies of the same song. Deduplication is like removing those extra copies and keeping just one. Now, whenever you want to play that song, you refer back to the single saved copy instead of having multiple copies taking up unnecessary space.

In terms of its application in data storage, deduplication works by analyzing data ‘chunks’ or ‘byte patterns’, which are unique, contiguous blocks of data. These chunks are identified and stored during a process of analysis, and compared to other chunks within existing data. Whenever a match occurs, the redundant chunk is replaced with a small reference that points to the stored chunk. This greatly reduces the amount of data that must be stored or transferred.
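
The chunk-and-reference idea described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the `ChunkStore` class, the 4-byte `CHUNK_SIZE`, and the use of SHA-256 digests as references are all assumptions chosen to keep the example small.

```python
import hashlib

CHUNK_SIZE = 4  # tiny illustrative chunk size in bytes (an assumption)


class ChunkStore:
    """Hypothetical store that keeps each unique chunk once, keyed by digest."""

    def __init__(self):
        self.chunks = {}  # digest -> chunk bytes

    def put(self, data: bytes) -> list:
        """Split data into fixed-size chunks and return a list of references."""
        refs = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            # A chunk seen before is not stored again; only the reference is kept.
            self.chunks.setdefault(digest, chunk)
            refs.append(digest)
        return refs

    def get(self, refs) -> bytes:
        """Rebuild the original data by following the references."""
        return b"".join(self.chunks[r] for r in refs)


store = ChunkStore()
refs_a = store.put(b"AAAABBBBAAAACCCC")  # chunks: AAAA, BBBB, AAAA, CCCC
refs_b = store.put(b"BBBBCCCCDDDD")      # chunks: BBBB, CCCC, DDDD

print(len(refs_a) + len(refs_b), "chunks referenced,", len(store.chunks), "stored")
```

Seven chunks are written across the two inputs, but only four distinct chunks end up in the store, and both inputs can still be rebuilt exactly from their reference lists.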

Deduplication is different from data compression algorithms. While compression algorithms identify redundant data inside individual files and encode this redundant data more efficiently, the intent of deduplication is to inspect large volumes of data and identify large sections – such as entire files or large sections of files – that are identical, and replace them with a shared copy.
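
A small, hedged sketch can make the contrast concrete. It assumes two identical files whose contents are pseudo-random bytes, so per-file compression has little internal redundancy to exploit; the file names and the whole-file SHA-256 comparison are illustrative choices, not a description of any particular product.

```python
import hashlib
import random
import zlib

# Pseudo-random payload: almost nothing for a compressor to exploit inside it.
rng = random.Random(0)
payload = bytes(rng.getrandbits(8) for _ in range(4096))

# Two files with identical content (hypothetical names).
files = {"a.bin": payload, "b.bin": payload}

# Compression works inside each file, so both copies are still stored.
compressed_total = sum(len(zlib.compress(data)) for data in files.values())

# Whole-file deduplication keeps one shared copy of identical content.
unique = {hashlib.sha256(data).hexdigest(): data for data in files.values()}
deduped_total = sum(len(data) for data in unique.values())

print("compressed separately:", compressed_total, "bytes")
print("deduplicated:", deduped_total, "bytes")
```

In practice the two techniques are complementary: a system can deduplicate first and then compress the unique data that remains.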

In summary, deduplication is a powerful tool in data management that helps to improve storage efficiency, reduce costs, and optimize data transfers.

UltiHash’s byte-level deduplication algorithm

UltiHash’s byte-level deduplication algorithm works by analyzing data at the byte level and dynamically splitting it into fragments of different sizes. The process involves the following steps:

  1. Files and folders are analyzed at the byte level, and dynamically split into fragments of different sizes.
  2. All repeated fragments in the dataset are losslessly deduplicated to save space, leaving only unique fragments. If a fragment has been stored in the cluster before, it is matched; brand-new fragments are added as normal.
  3. Incoming data is scanned, and repeated variable-sized byte sequences are replaced by markers. UltiHash says its dedupe operates both within and across datasets and is independent of data type, whether structured, semi-structured or unstructured: text, images, videos, audio files, database records and so forth.
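
The source does not describe how UltiHash chooses its fragment boundaries, so the sketch below should not be read as its algorithm. It shows a generic, publicly known technique for producing variable-sized fragments, Gear-style content-defined chunking, in which a rolling hash over recent bytes decides where to cut. The function names, the seed values, and the `min_size`, `max_size` and `bits` parameters are all assumptions made for illustration.

```python
import random

# Gear table: one pseudo-random 64-bit value per byte value (seed is arbitrary).
_table_rng = random.Random(42)
GEAR = [_table_rng.getrandbits(64) for _ in range(256)]
MASK64 = (1 << 64) - 1


def cdc_chunks(data: bytes, min_size=16, max_size=256, bits=6):
    """Split data wherever a rolling hash of recent bytes hits a target pattern."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & MASK64
        length = i - start + 1
        # Cut when the top `bits` bits of the hash are zero, or at max_size.
        at_boundary = length >= min_size and (h >> (64 - bits)) == 0
        if at_boundary or length >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def fixed_chunks(data: bytes, size=64):
    """Baseline for comparison: cut every `size` bytes regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


data_rng = random.Random(7)
original = bytes(data_rng.getrandbits(8) for _ in range(20000))
shifted = b"X" + original  # the same data with one byte inserted at the front

shared_cdc = set(cdc_chunks(original)) & set(cdc_chunks(shifted))
shared_fixed = set(fixed_chunks(original)) & set(fixed_chunks(shifted))
print("shared chunks, content-defined:", len(shared_cdc))
print("shared chunks, fixed-size:", len(shared_fixed))
```

Because the cut points depend on content rather than on absolute offsets, inserting a single byte only disturbs the fragments near the insertion. Fixed-size chunking, by contrast, shifts every later boundary, so previously stored chunks no longer match.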

The deduplication software is deployed in an S3-compatible object storage cluster that can run on-premises or in AWS. The cluster has a head node and data nodes. Clusters can scale horizontally with variable sized data nodes supporting petabyte-scale volumes.

UltiHash integrates with S3-native applications and services, helping with its adoption. It says it has built-in features for data backup and recovery, ensuring high availability and business continuity. The software supports multi-tenant environments with robust access control, ensuring secure segmentation and user management, and it provides monitoring for real-time insights into storage usage, performance, and operational trends.
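
For an S3-compatible store in general, integration usually amounts to pointing an existing S3 client at a different endpoint. The sketch below assumes the third-party boto3 library; the endpoint URL, credentials, bucket and key are hypothetical placeholders, not values from UltiHash documentation.

```python
def s3_client_kwargs(endpoint_url: str, access_key: str, secret_key: str) -> dict:
    """Build keyword arguments for boto3.client('s3', ...) with a custom endpoint."""
    return {
        "endpoint_url": endpoint_url,
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }


def make_client(**kwargs):
    """Create the S3 client; boto3 is imported lazily so the sketch loads without it."""
    import boto3  # third-party dependency

    return boto3.client("s3", **kwargs)


# Hypothetical endpoint and credentials for a cluster running locally.
kwargs = s3_client_kwargs("http://localhost:8080", "ACCESS_KEY", "SECRET_KEY")

# With a reachable cluster, the usual S3 calls would then apply unchanged:
# client = make_client(**kwargs)
# client.put_object(Bucket="demo", Key="scan.tiff", Body=b"example bytes")
# body = client.get_object(Bucket="demo", Key="scan.tiff")["Body"].read()
```

The application code that issues PUTs and GETs stays the same; only the endpoint and credentials change.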

The company’s lossless dedupe is claimed to cut storage costs by up to 50 percent, an overall 2:1 dedupe ratio. UltiHash also claims speed advantages, saying its software is up to 50 percent faster on reads (GETs) than Amazon S3 when benchmarked on TIFF files. However, the performance picture versus S3 places UltiHash at a disadvantage with writes (PUTs) for RAW, TIFF, CSV, PNG and XML files.
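
The link between the 2:1 ratio and the 50 percent figure is simple arithmetic: a dedupe ratio of r:1 means stored data is 1/r of the raw data, so the saving is 1 - 1/r. The helper names below are illustrative.

```python
def savings_from_ratio(ratio: float) -> float:
    """Fraction of raw capacity saved at a dedupe ratio of `ratio`:1."""
    return 1 - 1 / ratio


def ratio_from_savings(savings: float) -> float:
    """Dedupe ratio implied by a given fractional saving."""
    return 1 / (1 - savings)


print(savings_from_ratio(2.0))   # 0.5, i.e. 50 percent
print(ratio_from_savings(0.5))   # 2.0, i.e. 2:1
```

The relationship is not linear: moving from 2:1 to 4:1 raises the saving from 50 to 75 percent, not to 100 percent.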

In terms of CPU usage, additional CPU is used only during the “write” phase. UltiHash generally separates “write” and “read” activities and can provide dedicated nodes for each, which is standard practice for any high-load I/O solution. For on-premises deployments, this means only one CPU-heavy machine is needed, and the rest can be more general-purpose nodes.

Some potential applications and uses of UltiHash’s byte-level deduplication algorithm

Data Storage: The algorithm can be used to improve the efficiency of data storage systems by eliminating redundant data, thereby allowing more data to be stored without expanding physical resources.

High-Performance Applications: The technology is particularly beneficial for high-performance applications such as machine learning, AI, product engineering, and analytics.

Data Backup and Recovery: UltiHash’s technology has built-in features for data backup and recovery, ensuring high availability and business continuity.

Multi-Tenant Environments: The software supports multi-tenant environments with robust access control, ensuring secure segmentation and user management.

Real-Time Monitoring: It provides monitoring for real-time insights into storage usage, performance, and operational trends.

Remote Synchronization: The algorithm can be used in remote synchronization applications to ensure data consistency across different systems or locations.

Backup Storage Systems: It can be used in backup storage systems to reduce the storage space required for backups by eliminating redundant data.

Large-Scale Storage Systems: With a shared storage pool, different users no longer need to provision their own spare capacity. This can significantly reduce the volume of backup data stored, lowering capacity requirements, physical space, and energy consumption.

Opportunity for The Magnificent Seven

Several tech giants could potentially be interested in the advantages offered by UltiHash’s byte-level deduplication algorithm, especially those involved in data storage, cloud services, and high-performance computing. Here are a few examples:

  1. Amazon Web Services (AWS): Given that UltiHash’s software is compatible with AWS’s S3 storage service, AWS could potentially leverage UltiHash’s technology to enhance its own data storage efficiency and performance.
  2. Google Cloud: Google Cloud could benefit from the storage efficiency and high-speed data access provided by UltiHash’s technology, enhancing its cloud storage offerings.
  3. Microsoft Azure: As a major player in cloud services and AI, Microsoft could potentially use UltiHash’s technology to improve the efficiency of its data storage and high-performance applications.
  4. IBM: With its focus on enterprise-level solutions, IBM could potentially use UltiHash’s technology to enhance its data storage solutions and high-performance computing offerings.
  5. Oracle: Known for its database solutions, Oracle could potentially benefit from UltiHash’s technology to enhance the efficiency and performance of its data storage.

Please note that this is speculative and based on the potential fit between these companies’ current focus areas and the benefits of UltiHash’s technology. The actual interest of these companies would depend on various factors including their current technology stack, strategic focus, and business needs. It’s always a good idea to check the latest news for the most current information.
