
18 posts tagged "fury"


· 3 min read
Shawn Yang

The Apache Fury team is pleased to announce the 0.9.0 release. This is a major release that includes 34 PRs from 14 distinct contributors. See the Install Page to learn how to get the libraries for your platform.

Highlights

  • Optimized serializers for Fury Kotlin support
  • Highly optimized UTF-8 string encoding implementation for Java (2x faster than the JDK's UTF-8 encoding)
  • Reduced metastring hashcode payload for small strings (<= 16 bytes)
  • Support for building C++ libraries on Windows

Features

Bug Fix

Other Improvements

New Contributors

Full Changelog: https://github.com/apache/fury/compare/v0.8.0...v0.9.0

Acknowledgements

Thanks @effigies @apupier @wywen @mandrean @HuangXingBo @pjfanning @chaokunyang @penguin-wwy @An-DJ @Forchapeatl @orisgarno @zhaommmmomo @caicancai @Aliothmoon

A big thank you to all our contributors who have worked hard on this release. Your contributions, whether through code, documentation, or issue reporting, are really appreciated.

· 2 min read
Shawn Yang

The Apache Fury team is pleased to announce the 0.8.0 release. This is a major release that includes 23 PRs from 7 distinct contributors. See the Install Page to learn how to get the libraries for your platform.

Highlights

  • Support GraalVM 17/21/22 native image
  • Release Fury-optimized serializers for Scala collections
  • Reduce Scala collection class name serialization cost using dict encoding

Features

Bug Fix

Other Improvements

New Contributors

Full Changelog: https://github.com/apache/fury/compare/v0.7.1...v0.8.0

Acknowledgements

Thanks @jiacai2050 @fink-arthur @sh-cho @pjfanning @chaokunyang @yoohaemin

A big thank you to all our contributors who have worked hard on this release. Your contributions, whether through code, documentation, or issue reporting, are really appreciated.

· 2 min read
Shawn Yang

The Apache Fury team is pleased to announce the 0.7.1 release. This is a major release that includes 20 PRs from 8 distinct contributors. See the Install Page to learn how to get the libraries for your platform.

Features

Bug Fix

Other Improvements

New Contributors

Full Changelog: https://github.com/apache/fury/compare/v0.7.0...v0.7.1

Acknowledgements

Thanks @jiacai2050 @chaokunyang @theweipeng @funky-eyes @Forchapeatl @zhaommmmomo @yuluo-yx @LiangliangSui @LofiSu

A big thank you to all our contributors who have worked hard on this release. Your contributions, whether through code, documentation, or issue reporting, are really appreciated.

· 4 min read
Shawn Yang

The Apache Fury team is pleased to announce the 0.7.0 release. This is a major release that includes 24 PRs from 7 distinct contributors. See the Install Page to learn how to get the libraries for your platform.

Highlights

Experimental

A fast object deep copy framework for Java is implemented:

Fury fury = Fury.builder().withRefCopy(true).build();
fury.register(SomeClass.class);
SomeClass a = xxx;
SomeClass copied = fury.copy(a);

Benchmark result:

Benchmark               objectType     Score        Error          Units
fury_copy               MEDIA_CONTENT  1243297.690  ± 451828.452   ops/s
fury_copy               SAMPLE         2670545.816  ± 1378536.021  ops/s
fury_copy               STRUCT         2673356.422  ± 202288.322   ops/s
fury_copy               STRUCT2        1943587.774  ± 392513.707   ops/s
fury_copy_int_map       int map        1470264.733  ± 1021875.257  ops/s
fury_copy_list          int list       3556892.276  ± 127410.724   ops/s
fury_copy_object_array  array          4430589.112  ± 25366.893    ops/s
fury_copy_string_map    string map     1736145.327  ± 377806.877   ops/s
kryo_copy               MEDIA_CONTENT  804208.092   ± 27429.069    ops/s
kryo_copy               SAMPLE         717669.608   ± 71093.370    ops/s
kryo_copy               STRUCT         1076048.642  ± 223194.146   ops/s
kryo_copy               STRUCT2        141374.767   ± 14150.535    ops/s
kryo_copy_int_map       int map        546203.187   ± 54669.173    ops/s
kryo_copy_list          int list       843643.496   ± 312306.921   ops/s
kryo_copy_object_array  object array   1593267.344  ± 1721824.436  ops/s
kryo_copy_string_map    string map     574809.875   ± 47316.340    ops/s

Features

Bug Fix

Other Improvements

New Contributors

Full Changelog: https://github.com/apache/fury/compare/v0.6.0...v0.7.0

Acknowledgements

Thanks @komamitsu @pjfanning @chaokunyang @weijiang157152688 @kitty-eu-org @urlyy @zhaommmmomo

A big thank you to all our contributors who have worked hard on this release. Your contributions, whether through code, documentation, or issue reporting, are really appreciated.


· 5 min read
Shawn Yang

The Apache Fury team is pleased to announce the 0.6.0 release. This is a major release that includes 35 PRs from 12 distinct contributors. See the Install Page to learn how to get the libraries for your platform.

Highlights

In this release, we introduced a scoped meta share mode for schema evolution in Java and enabled it by default when CompatibleMode is set to Compatible:

  • This mode is 50% faster than the previous KV compatible mode, and its serialized payload is only 1/6 the size.
  • It is 4x faster than protobuf, with less than 1/2 the serialized size of protobuf for complex objects.

[Figures: Performance, Size]

Protobuf/JSON write message field meta and values in a KV layout, so serializing a list of messages has two issues:

  • Meta is written multiple times even when the messages are of the same type.
  • The KV layout is dispersive, which is not friendly to compression.

With meta share, the field name and type meta of a struct is written only once for multiple objects of the same type, which saves space and improves performance compared to protobuf. The meta can also be encoded into binary in advance and written with a single memory copy, which is much faster.
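To make the space argument concrete, here is a minimal Java sketch that writes the same 1000 eight-field structs in two toy layouts: one that repeats every field name per object, and one that writes the names once up front. The class `MetaShareSketch`, its methods and both layouts are assumptions made for illustration; they do not reproduce Fury's or protobuf's actual wire formats.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Illustrative layouts only: neither method reproduces Fury's or protobuf's wire format.
final class MetaShareSketch {
    static final String[] FIELD_NAMES = {"f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8"};

    // Builds `count` structs, each holding the values 1..8.
    static int[][] sample(int count) {
        int[][] objects = new int[count][FIELD_NAMES.length];
        for (int[] fields : objects) {
            for (int i = 0; i < fields.length; i++) {
                fields[i] = i + 1;
            }
        }
        return objects;
    }

    // KV-style layout: every object repeats each field name before its value.
    static byte[] writeKeyValue(int[][] objects) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int[] fields : objects) {
                for (int i = 0; i < fields.length; i++) {
                    out.writeUTF(FIELD_NAMES[i]);
                    out.writeInt(fields[i]);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Shared-meta layout: field names are written once, then only values follow.
    static byte[] writeSharedMeta(int[][] objects) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (String name : FIELD_NAMES) {
                out.writeUTF(name);
            }
            for (int[] fields : objects) {
                for (int value : fields) {
                    out.writeInt(value);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        int[][] objects = sample(1000);
        System.out.println("KV layout bytes:          " + writeKeyValue(objects).length);
        System.out.println("shared-meta layout bytes: " + writeSharedMeta(objects).length);
    }
}
```

In this toy setup the KV layout takes 64000 bytes and the shared-meta layout 32032 bytes. The absolute numbers say nothing about real formats, but the gap grows with the number of objects of the same type, which is the effect described above.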

Serialize data

public static class NumericStruct {
  public int f1;
  public int f2;
  public int f3;
  public int f4;
  public int f5;
  public int f6;
  public int f7;
  public int f8;

  public static NumericStruct build() {
    NumericStruct struct = new NumericStruct();
    struct.f1 = 1;
    struct.f2 = 2;
    struct.f3 = 3;
    struct.f4 = 4;
    struct.f5 = 5;
    struct.f6 = 6;
    struct.f7 = 7;
    struct.f8 = 8;
    return struct;
  }
}

public static class NumericStructList {
  public List<NumericStruct> list;

  // Builds a list of 1000 identical structs, used as the benchmark payload.
  public static NumericStructList build() {
    NumericStructList structList = new NumericStructList();
    structList.list = new ArrayList<>(1000);
    for (int i = 0; i < 1000; i++) {
      structList.list.add(NumericStruct.build());
    }
    return structList;
  }
}

Result

Performance:

Benchmark                       Mode   Cnt  Score      Error       Units
fury_deserialize                thrpt  30   49667.900  ± 3004.061  ops/s
fury_kv_compatible_deserialize  thrpt  30   33014.595  ± 3716.199  ops/s
fury_kv_compatible_serialize    thrpt  30   23915.260  ± 3968.119  ops/s
fury_serialize                  thrpt  30   63146.826  ± 2930.505  ops/s
protobuf_deserialize            thrpt  30   14156.610  ± 685.272   ops/s
protobuf_serialize              thrpt  30   10060.293  ± 706.064   ops/s

Size:

Lib         Serialized Payload Size
fury        8077
furystrict  8009
furykv      48028
protobuf    18000

Feature

Bug Fix

Others

New Contributors

Full Changelog: https://github.com/apache/fury/compare/v0.5.1...v0.6.0

· 3 min read
Shawn Yang

The Apache Fury team is pleased to announce the 0.5.1 release. This is a minor release that includes 36 PRs from 7 distinct contributors. See the Install Page to learn how to get the libraries for your platform.

Feature

Bug Fix

Misc

New Contributors

Full Changelog: https://github.com/apache/fury/compare/v0.5.0...v0.5.1

· 7 min read
Shawn Yang

Background

In RPC/serialization systems, we often need to send namespace/path/filename/fieldName/packageName/moduleName/className/enumValue strings between processes.

Those strings are mostly ASCII. To transfer them between processes, we encode them using UTF-8, which takes one byte for every char. This is not space efficient.

If we take a deeper look, we find that most chars are lowercase letters, ., $ and _, which can be expressed in a much smaller range, 0~31. One byte can represent the range 0~255, so the significant bits are wasted, and this cost is not negligible. In a dynamic serialization framework, such meta can take considerable space compared to the actual data.

So we proposed a new string encoding algorithm, which we call meta string encoding in Fury. It encodes most chars using 5 bits instead of the 8 bits used by UTF-8, which saves 37.5% of the space compared to UTF-8.

Meta String Introduction

The meta string encoding algorithm is mainly used to encode meta strings such as field names, namespace, packageName, className, path and filename. Such strings are enumerable and limited in number, so encoding performance is not important, since we can cache the encoding result.

Meta string encoding uses 5 or 6 bits for every char instead of the 8 bits used by UTF-8. With 5 bits per char it saves 37.5% of the space compared to UTF-8, so the encoded binary is smaller, uses less storage and makes network transfer faster.

More details about the meta string spec can be found in the Fury xlang serialization specification.
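The 37.5% figure follows directly from the bit widths: 1 - 5/8 = 0.375 for the 5-bit alphabet, and 1 - 6/8 = 0.25 for the 6-bit one. The small Java sketch below checks whether a string fits the 5-bit alphabet and computes those ratios. The class and method names are hypothetical and are not part of Fury's API.

```java
// Hypothetical helper for illustration; not part of Fury.
final class FiveBitAlphabetCheck {
    // True if every char is one of a-z . _ $ | (30 symbols, which fit in 5 bits).
    static boolean fitsFiveBits(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean lower = c >= 'a' && c <= 'z';
            if (!lower && c != '.' && c != '_' && c != '$' && c != '|') {
                return false;
            }
        }
        return true;
    }

    // Fraction of per-char bits saved by using bitsPerChar bits instead of 8.
    static double savings(int bitsPerChar) {
        return 1.0 - bitsPerChar / 8.0;
    }

    public static void main(String[] args) {
        System.out.println(fitsFiveBits("org.apache.fury.benchmark.data")); // lowercase and dots only
        System.out.println(fitsFiveBits("MediaContent")); // contains uppercase chars
        System.out.println(savings(5) + " " + savings(6));
    }
}
```

The per-char ratio ignores the leading flag bit and byte padding described in the next section, so real savings for short strings are slightly lower.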

Encoding Algorithms

String binary encoding algorithm:

Algorithm                  Pattern      Description
LOWER_SPECIAL              a-z._$|      every char is written using 5 bits, a-z: 0b00000~0b11001, ._$|: 0b11010~0b11101; one bit is prepended at the start to indicate whether to strip the last char, since the last byte may have 7 redundant bits (1 indicates strip last char)
LOWER_UPPER_DIGIT_SPECIAL  a-zA-Z0~9._  every char is written using 6 bits, a-z: 0b000000~0b011001, A-Z: 0b011010~0b110011, 0~9: 0b110100~0b111101, ._: 0b111110~0b111111; one bit is prepended at the start to indicate whether to strip the last char, since the last byte may have 7 redundant bits (1 indicates strip last char)
UTF-8                      any chars    UTF-8 encoding

If we use LOWER_SPECIAL/LOWER_UPPER_DIGIT_SPECIAL, we must add a strip-last-char flag to the encoded data. Every char is encoded using 5/6 bits, so the last byte may have 1~7 bits that are unused by the encoding. Such bits may cause an extra char to be read, which we must strip off.
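A short worked example of when the flag is needed: with one flag bit plus n chars of b bits each, the payload is n*b + 1 bits, rounded up to whole bytes. If the leftover padding is at least b bits, a decoder that only sees the byte length would read one char too many, so the flag must be set. The Java sketch below computes this; the class and method names are assumptions for illustration only.

```java
// Hypothetical helper for illustration; mirrors the arithmetic described above.
final class StripFlagSketch {
    // Bytes needed for one flag bit plus numChars chars of bitsPerChar bits each.
    static int byteLength(int numChars, int bitsPerChar) {
        return (numChars * bitsPerChar + 1 + 7) / 8;
    }

    // Padding bits left unused at the end of the last byte.
    static int spareBits(int numChars, int bitsPerChar) {
        return byteLength(numChars, bitsPerChar) * 8 - (numChars * bitsPerChar + 1);
    }

    // The flag is needed when the padding is wide enough to look like one more char.
    static boolean needsStripFlag(int numChars, int bitsPerChar) {
        return spareBits(numChars, bitsPerChar) >= bitsPerChar;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 8; n++) {
            System.out.println(n + " chars at 5 bits: " + byteLength(n, 5) + " bytes, "
                + spareBits(n, 5) + " spare bits, strip flag = " + needsStripFlag(n, 5));
        }
    }
}
```

For example, 2 chars at 5 bits occupy 11 bits in 2 bytes and leave 5 spare bits, so the flag is set, while 3 chars occupy exactly 16 bits and need no flag.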

Here is the encoding code snippet in Java; see org.apache.fury.meta.MetaStringEncoder#encodeGeneric(char[], int) for more details:

private byte[] encodeGeneric(char[] chars, int bitsPerChar) {
  int totalBits = chars.length * bitsPerChar + 1;
  int byteLength = (totalBits + 7) / 8; // Calculate number of needed bytes
  byte[] bytes = new byte[byteLength];
  int currentBit = 1;
  for (char c : chars) {
    int value =
        (bitsPerChar == 5) ? charToValueLowerSpecial(c) : charToValueLowerUpperDigitSpecial(c);
    // Encode the value in bitsPerChar bits
    for (int i = bitsPerChar - 1; i >= 0; i--) {
      if ((value & (1 << i)) != 0) {
        // Set the bit in the byte array
        int bytePos = currentBit / 8;
        int bitPos = currentBit % 8;
        bytes[bytePos] |= (byte) (1 << (7 - bitPos));
      }
      currentBit++;
    }
  }
  boolean stripLastChar = bytes.length * 8 >= totalBits + bitsPerChar;
  if (stripLastChar) {
    bytes[0] = (byte) (bytes[0] | 0x80);
  }
  return bytes;
}

private int charToValueLowerSpecial(char c) {
  if (c >= 'a' && c <= 'z') {
    return c - 'a';
  } else if (c == '.') {
    return 26;
  } else if (c == '_') {
    return 27;
  } else if (c == '$') {
    return 28;
  } else if (c == '|') {
    return 29;
  } else {
    throw new IllegalArgumentException("Unsupported character for LOWER_SPECIAL encoding: " + c);
  }
}

private int charToValueLowerUpperDigitSpecial(char c) {
  if (c >= 'a' && c <= 'z') {
    return c - 'a';
  } else if (c >= 'A' && c <= 'Z') {
    return 26 + (c - 'A');
  } else if (c >= '0' && c <= '9') {
    return 52 + (c - '0');
  } else if (c == specialChar1) {
    return 62;
  } else if (c == specialChar2) {
    return 63;
  } else {
    throw new IllegalArgumentException(
        "Unsupported character for LOWER_UPPER_DIGIT_SPECIAL encoding: " + c);
  }
}

Here is the decoding code snippet in Go; see go/fury/meta/meta_string_decoder.go:70 for more details:

func (d *Decoder) decodeGeneric(data []byte, algorithm Encoding) ([]byte, error) {
    bitsPerChar := 5
    if algorithm == LOWER_UPPER_DIGIT_SPECIAL {
        bitsPerChar = 6
    }
    // Retrieve 5 bits every iteration from data, convert them to characters, and save them to chars
    // "abc" encodedBytes as [00000] [000,01] [00010] [0, corresponding to three bytes, which are 0, 68, 0
    // Take the highest digit first, then the lower, in order

    // here access data[0] before entering the loop, so we had to deal with empty data in Decode method
    // totChars * bitsPerChar <= totBits < (totChars + 1) * bitsPerChar
    stripLastChar := (data[0] & 0x80) >> 7
    totBits := len(data)*8 - 1 - int(stripLastChar)*bitsPerChar
    totChars := totBits / bitsPerChar
    chars := make([]byte, totChars)
    bitPos, bitCount := 6, 1 // first highest bit indicates whether strip last char
    for i := 0; i < totChars; i++ {
        var val byte = 0
        for i := 0; i < bitsPerChar; i++ {
            if data[bitCount/8]&(1<<bitPos) > 0 {
                val |= 1 << (bitsPerChar - i - 1)
            }
            bitPos = (bitPos - 1 + 8) % 8
            bitCount++
        }
        ch, err := d.decodeChar(val, algorithm)
        if err != nil {
            return nil, err
        }
        chars[i] = ch
    }
    return chars, nil
}
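To see encoding and decoding fit together, here is a self-contained Java round-trip sketch restricted to the 5-bit LOWER_SPECIAL alphabet. It follows the same bit layout as the snippets above (one leading strip-last-char flag bit, then 5 bits per char, most significant bit first), but the class `LowerSpecialRoundTrip` and its methods are written for this illustration and are not Fury's actual classes.

```java
// Illustrative round trip for the 5-bit alphabet only; names are hypothetical, not Fury's API.
final class LowerSpecialRoundTrip {
    // Index in this string is the 5-bit value: a-z = 0..25, then . _ $ | = 26..29.
    private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz._$|";

    static byte[] encode(String s) {
        int totalBits = s.length() * 5 + 1;
        byte[] bytes = new byte[(totalBits + 7) / 8];
        int bit = 1; // bit 0 is reserved for the strip-last-char flag
        for (char c : s.toCharArray()) {
            int value = ALPHABET.indexOf(c);
            if (value < 0) {
                throw new IllegalArgumentException("Unsupported character: " + c);
            }
            for (int i = 4; i >= 0; i--, bit++) {
                if ((value & (1 << i)) != 0) {
                    bytes[bit / 8] |= (byte) (1 << (7 - bit % 8));
                }
            }
        }
        // Set the flag when the padding is wide enough to be mistaken for a char.
        if (bytes.length * 8 >= totalBits + 5) {
            bytes[0] |= (byte) 0x80;
        }
        return bytes;
    }

    static String decode(byte[] data) {
        if (data.length == 0) {
            return "";
        }
        int strip = (data[0] & 0x80) >>> 7;
        int numChars = (data.length * 8 - 1 - strip * 5) / 5;
        StringBuilder sb = new StringBuilder(numChars);
        int bit = 1;
        for (int n = 0; n < numChars; n++) {
            int value = 0;
            for (int i = 0; i < 5; i++, bit++) {
                value = (value << 1) | ((data[bit / 8] >> (7 - bit % 8)) & 1);
            }
            if (value >= ALPHABET.length()) {
                throw new IllegalArgumentException("Invalid 5-bit value: " + value);
            }
            sb.append(ALPHABET.charAt(value));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String name = "org.apache.fury.benchmark.data";
        byte[] encoded = encode(name);
        System.out.println(encoded.length + " bytes, round trip ok: " + decode(encoded).equals(name));
    }
}
```

With this layout, "ab" needs the flag (11 payload bits in 2 bytes leave 5 spare bits), while "abc" fills exactly 16 bits and does not.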

Select Best Encoding

For strings made up of lowercase characters, meta string uses 5 bits to encode every char. For a string containing uppercase chars, meta string tries to convert the string into a lowercase representation by inserting markers, compares the number of bytes used with the 6-bit encoding, and then selects the encoding with the smaller encoded size.

Here is the common encoding selection strategy:

Encoding Flag              Pattern                                                Encoding Algorithm
LOWER_SPECIAL              every char is in a-z._|                                LOWER_SPECIAL
FIRST_TO_LOWER_SPECIAL     every char is in a-z._ except first char is upper case  replace first upper case char to lower case, then use LOWER_SPECIAL
ALL_TO_LOWER_SPECIAL       every char is in a-zA-Z._                              replace every upper case char by | + lower case, then use LOWER_SPECIAL; use this encoding if it's smaller than encoding LOWER_UPPER_DIGIT_SPECIAL
LOWER_UPPER_DIGIT_SPECIAL  every char is in a-zA-Z0~9._                           use LOWER_UPPER_DIGIT_SPECIAL encoding if it's smaller than encoding FIRST_TO_LOWER_SPECIAL
UTF8                       any utf-8 char                                         use UTF-8 encoding
Compression                any utf-8 char                                         lossless compression

For package names, module names or namespaces, LOWER_SPECIAL will mostly be used. ALL_TO_LOWER_SPECIAL can be used too, since it can represent the same chars as LOWER_SPECIAL without using more bits, while also supporting strings with uppercase chars.

For className, FIRST_TO_LOWER_SPECIAL will mostly be used. If there are multiple uppercase chars, ALL_TO_LOWER_SPECIAL will be used instead. If a string contains digits, LOWER_UPPER_DIGIT_SPECIAL will be used.

Finally, UTF-8 is the fallback encoding if the string contains chars that are not in the range a-z0-9A-Z or the special chars listed above.
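The selection rules above can be sketched in Java as follows. This is one reading of the strategy, under stated assumptions: the sizes compared include the leading flag bit, the | marker costs one extra 5-bit char per uppercase char, and the Compression flag is left out. The class `EncodingChooser` and its methods are hypothetical and may differ from what Fury's encoder actually does.

```java
// A sketch of the selection strategy described above; not Fury's implementation.
final class EncodingChooser {
    enum Encoding {
        LOWER_SPECIAL, FIRST_TO_LOWER_SPECIAL, ALL_TO_LOWER_SPECIAL, LOWER_UPPER_DIGIT_SPECIAL, UTF8
    }

    // Bytes for one flag bit plus numChars chars of bitsPerChar bits each.
    static int packedBytes(int numChars, int bitsPerChar) {
        return (numChars * bitsPerChar + 1 + 7) / 8;
    }

    static Encoding choose(String s) {
        int upper = 0;
        int digits = 0;
        boolean fiveBitOk = true; // a-z . _ $ |
        boolean sixBitOk = true;  // a-z A-Z 0-9 . _
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean isLower = c >= 'a' && c <= 'z';
            boolean isUpper = c >= 'A' && c <= 'Z';
            boolean isDigit = c >= '0' && c <= '9';
            boolean dotOrUnderscore = c == '.' || c == '_';
            if (isUpper) upper++;
            if (isDigit) digits++;
            if (!(isLower || dotOrUnderscore || c == '$' || c == '|')) fiveBitOk = false;
            if (!(isLower || isUpper || isDigit || dotOrUnderscore)) sixBitOk = false;
        }
        if (fiveBitOk) return Encoding.LOWER_SPECIAL;
        if (!sixBitOk) return Encoding.UTF8;
        if (digits == 0) {
            char first = s.charAt(0);
            if (upper == 1 && first >= 'A' && first <= 'Z') {
                return Encoding.FIRST_TO_LOWER_SPECIAL;
            }
            // Each uppercase char becomes '|' + lowercase, i.e. one extra 5-bit char.
            if (packedBytes(s.length() + upper, 5) < packedBytes(s.length(), 6)) {
                return Encoding.ALL_TO_LOWER_SPECIAL;
            }
        }
        return Encoding.LOWER_UPPER_DIGIT_SPECIAL;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"org.apache.fury.benchmark.data", "Fury", "MediaContent", "Struct2"}) {
            System.out.println(s + " -> " + choose(s));
        }
    }
}
```

Under these assumptions, MediaContent becomes 14 five-bit chars (9 bytes) instead of 12 six-bit chars (10 bytes), so ALL_TO_LOWER_SPECIAL wins for it.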

Encoding Flags and Data Jointly

  • Depending on the case, one can encode flags and data jointly, using 3 bits of the first byte for flags and the other bytes for data. This can be useful because some holes remain in the last byte, so adding flags to the data doesn't always increase the serialized size.
  • Or one can use a header to encode such flags together with other meta such as the encoded size. This is what Fury does in https://github.com/apache/fury/pull/1556
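As a rough illustration of the second option, the Java sketch below packs a 3-bit encoding flag and an encoded length into a single integer header. The bit layout here (flag in the low 3 bits, length in the remaining bits) is purely an assumption for the example and is not taken from the linked pull request. The class and method names are hypothetical.

```java
// Hypothetical header layout for illustration: low 3 bits = encoding flag, upper bits = length.
final class FlagHeaderSketch {
    static int pack(int encodedLength, int flag) {
        if (flag < 0 || flag > 7) {
            throw new IllegalArgumentException("flag must fit in 3 bits: " + flag);
        }
        if (encodedLength < 0 || encodedLength > (Integer.MAX_VALUE >>> 3)) {
            throw new IllegalArgumentException("length out of range: " + encodedLength);
        }
        return (encodedLength << 3) | flag;
    }

    static int flag(int header) {
        return header & 0b111;
    }

    static int length(int header) {
        return header >>> 3;
    }

    public static void main(String[] args) {
        int header = pack(19, 0); // e.g. 19 encoded bytes with flag value 0
        System.out.println("header=" + header + " length=" + length(header) + " flag=" + flag(header));
    }
}
```

Sharing one header between the flag and the size means the flag costs no extra byte as long as the length still fits in the remaining bits.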

Benchmark

UTF-8 encoding uses 30 bytes for the string org.apache.fury.benchmark.data, while Fury meta string uses only 19 bytes. UTF-8 encoding uses 12 bytes for the string MediaContent, while Fury meta string uses only 9 bytes.

// utf8 uses 30 bytes, we use only 19 bytes
assertEquals(encoder.encode("org.apache.fury.benchmark.data").getBytes().length, 19);
// utf8 uses 12 bytes, we use only 9 bytes.
assertEquals(encoder.encode("MediaContent").getBytes().length, 9);

· 6 min read
Shawn Yang

We're excited to announce the release of Fury v0.5.0. This release incorporates a myriad of improvements, bug fixes, and new features across multiple languages including Java, Golang, Python and JavaScript. It further refines Fury's performance, compatibility, and developer experience.

New Features

Specification

  • Introduced fury cross-language serialization specification (#1413, #1508)
  • Introduced xlang type mapping (#1468)
  • Introduced fury java specification (#1240)
  • Introduced meta string encoding specification (#1565, #1513, #1517)

Java

  • Support for compatible mode with GraalVM (#1586, #1587).
  • Support nonexistent array/enum classes and enable deserializeUnexistedClass by default (#1569, #1575).
  • meta string encoding algorithm in java (#1514, #1568, #1516, #1565)
  • Support meta string encoding for classname and package name (#1527)
  • native streaming mode deserialization (#1451, #1551)
  • native channel stream reader (#1483)
  • Support registration in thread safe fury (#1280)
  • Implement fury logger and remove slf4j library (#1485, #1494, #1506, #1492)
  • Support adjust logger level dynamically (#1557)
  • Support jdk proxy serialization for graalvm (#1379)
  • Specify JPMS module names (#1343)
  • Align string array to collection protocol v2 (#1228)

JavaScript

  • Align implementation to new Xlang protocol (#1487)
  • Implement Xlang map (#1549)
  • Implemented xlang map code generator (#1571)
  • Added magic number feature for better serialization control (#1550).
  • Support oneof (#1348)
  • create zero-copy buffer when convert (#1386)
  • Implement the collection protocol (#1337)
  • Implement Enum (#1321)
  • compress numbers (#1290)

C++

  • Support optional fields/elements in RowEncoder (#1223)
  • Support mapping types for RowEncodeTrait (#1247)

Golang

  • Implemented Fury meta string encoding algorithm (#1566).
  • concat meta string len with flags (#1517)

Enhancements

Java

  • Improved buffer growth strategy to support larger data sizes for serialization (#1582).
  • Performance optimizations for MetaStringDecoder and various serialization processes (#1568, #1511, #1493).
  • concat write classname flag with package name (#1523)
  • concat meta string len with flags (#1517)
  • fastpath for read/write small varint in range [0,127] (#1503)
  • optimize read float/double for jvm jit inline (#1472)
  • replace Guava's TypeToken with self-made (#1553)
  • Remove basic guava API usage (#1244)
  • optimize fury creation speed (#1511)
  • optimize string serialization by concat coder and length (#1486)
  • carry read objects when deserialization fails for better troubleshooting (#1420)
  • implement define_class instead of using javassist (#1422)
  • avoid recompilation when gc happens for memory pressure (#1411, #1585)
  • Fix immutable collection ref tracking (#1403)
  • reduce fury caller stack (#1496)
  • Extract BaseFury interface (#1382)
  • refine collection builder util (#1334)
  • disable async compilation for graalvm (#1222)
  • refine endian check code size in buffer (#1501)
  • generate list fori loop instead of iterator loop for list serialization (#1493)
  • Reduce unsafeWritePositiveVarLong bytecode size. (#1491)
  • Reduce unsafePutPositiveVarInt bytecode size. (#1490, #1489)
  • optimize read char/short jvm jit inline (#1471)
  • reduce code size of read long to optimize jvm jit inline (#1470)
  • reduce readInt/readVarInt code size for jvm jit inline (#1469)
  • refactor readVarUint32 algorithm (#1462)
  • rewrite readVarUint64 algorithm (#1463)

JavaScript

  • Make PlatformBuffer available if has Buffer polyfill (#1373)
  • enhance performance for 64-bit numbers (#1320)
  • Refactor & Compress Long (#1313)
  • Improve tag write performance (#1241)
  • Add more methods for BinaryReader (#1231)
  • Implements tuple serializer (#1216)

Python

  • concat meta string len with flags (#1517)

Bug Fix

Java

  • Fix bytebuffer no such method error (#1580)
  • Prevent exception in ObjectArray.clearObjectArray() (#1573)
  • Fix slf4j on graalvm (#1432)
  • Fix illegal classname caused by negative hash (#1436)
  • Fix BigDecimal serializer (#1431)
  • Fix BigInteger serialization (#1479)
  • Fix type conflict in method split (#1371)
  • Fix CodeGen Name conflicts when omitting java.lang prefix #1363 (#1366)
  • Fix ClassLoader npe in loadOrGenCodecClass (#1346)
  • Fix big buffer trunc (#1402)
  • Also perform blacklist detection when a class is registered. (#1398)
  • avoid big object graph cause buffer take up too much memory (#1397)
  • Fix get static field by unsafe (#1380)
  • Fix javax package for accessor codegen (#1388)
  • Fix nested collection cast for scala/java (#1333)
  • Fix References within InvocationHandler (#1365)
  • Allow partial read of serialized size from InputStream (#1391)
  • add potential missing bean class-loader (#1381)
  • Fix polymorphic array serialization (#1324)
  • Fix nested collection num elements (#1306)
  • Fix collection init size typo (#1342)
  • Clear extRegistry.getClassCtx if generate serializer class failed (#1221)

Rust

  • Fix memory errors caused by casting (#1372)
  • Fix incorrect cast (#1345)

Miscellaneous

  • Numerous code cleanups, refactorings, and internal improvements across all supported languages to enhance code quality and maintainability.
  • Moved various utilities into more appropriate packages to improve code organization and readability (#1584, #1583, #1578).
  • rename MemoryBuffer read/write/put/getType with read/write/put/getTypeNumber (#1480, #1464, #1505, #1500)
  • extract public Fury methods to BaseFury (#1467)
  • Optimize Class ID allocation. (#1406)
  • refine Collection util data structure (#1287) (#1288)
  • Improve Status by using unique_ptr (#1234)
  • Improve FormatTimePoint by removing sstream (#1233)
  • Drop optional chaining expression (#1338)

New Contributors

Acknowledgements

Thanks @chaokunyang @theweipeng @PragmaTwice @LiangliangSui @nandakumar131 @Munoon @qingoba @vesense @liuxiaocs7 @mtf90 @bowin @cn-at-osmit @Maurice-Betzel @phogh @laglangyue @tommyettinger @huisman6 @pixeeai

A big thank you to all our contributors who have worked hard on this release. Your contributions, whether through code, documentation, or issue reporting, are really appreciated.

Full Changelog: https://github.com/apache/fury/compare/v0.4.1...v0.5.0

· 12 min read
Shawn Yang

Apache Fury (incubating) is a multi-language serialization framework powered by JIT dynamic compilation and zero copy. It implements multi-language SDKs: Java, Python, Golang, JavaScript, Rust, C++. It provides automatic multi-language object serialization and a 170x speedup compared to JDK serialization.

· 2 min read
Shawn Yang
Info

This release was made before Fury joined the Apache Incubator and thus it's a non-ASF release.

I'm pleased to announce the 0.4.1 release of Fury: https://github.com/alipay/fury/releases/tag/v0.4.1. With this release, Fury now supports the Rust row format. The C++ row format has been enhanced too: iterable types can now be encoded to the Fury row format. Please try it out and share your feedback with us.

Author: chaokunyang


Highlight

  • [Rust] Support row format
  • [C++] Support iterable types for RowEncoder
  • [JavaScript] Support partial record
  • [Java] Fix JIT error in corner case, now Fury can generate serializer for every class

What's Changed

New Contributors

Full Changelog: https://github.com/alipay/fury/compare/v0.4.0...v0.4.1