Collective operations are building blocks for interaction patterns, that are often used in SPMD algorithms in the parallel programming context. Hence, there is an interest in efficient realizations of these operations. A realization of the collective operations is provided by the Message Passing Interface (MPI). == Definitions == In all asymptotic runtime functions, we denote the latency α {\displaystyle \alpha } (or startup time per message, independent of message size), the communication cost per word β {\displaystyle \beta } , the number of processing units p {\displaystyle p} and the input size per node n {\displaystyle n} . In cases where we have initial messages on more than one node we assume that all local messages are of the same size. To address individual processing units we use p i ∈ { p 0 , p 1 , … , p p − 1 } {\displaystyle p_{i}\in \{p_{0},p_{1},\dots ,p_{p-1}\}} . If we do not have an equal distribution, i.e. node p i {\displaystyle p_{i}} has a message of size n i {\displaystyle n_{i}} , we get an upper bound for the runtime by setting n = max ( n 0 , n 1 , … , n p − 1 ) {\displaystyle n=\max(n_{0},n_{1},\dots ,n_{p-1})} . A distributed memory model is assumed. The concepts are similar for the shared memory model. However, shared memory systems can provide hardware support for some operations like broadcast (§ Broadcast) for example, which allows convenient concurrent read. Thus, new algorithmic possibilities can become available. == Broadcast == The broadcast pattern is used to distribute data from one processing unit to all processing units, which is often needed in SPMD parallel programs to dispense input or global values. Broadcast can be interpreted as an inverse version of the reduce pattern (§ Reduce). Initially only root r {\displaystyle r} with i d {\displaystyle id} 0 {\displaystyle 0} stores message m {\displaystyle m} . During broadcast m {\displaystyle m} is sent to the remaining processing units, so that eventually m {\displaystyle m} is available to all processing units. Since an implementation by means of a sequential for-loop with p − 1 {\displaystyle p-1} iterations becomes a bottleneck, divide-and-conquer approaches are common. One possibility is to utilize a binomial tree structure with the requirement that p {\displaystyle p} has to be a power of two. When a processing unit is responsible for sending m {\displaystyle m} to processing units i . . j {\displaystyle i..j} , it sends m {\displaystyle m} to processing unit ⌈ ( i + j ) / 2 ⌉ {\displaystyle \left\lceil (i+j)/2\right\rceil } and delegates responsibility for the processing units ⌈ ( i + j ) / 2 ⌉ . . j {\displaystyle \left\lceil (i+j)/2\right\rceil ..j} to it, while its own responsibility is cut down to i . . ⌈ ( i + j ) / 2 ⌉ − 1 {\displaystyle i..\left\lceil (i+j)/2\right\rceil -1} . Binomial trees have a problem with long messages m {\displaystyle m} . The receiving unit of m {\displaystyle m} can only propagate the message to other units, after it received the whole message. In the meantime, the communication network is not utilized. Therefore pipelining on binary trees is used, where m {\displaystyle m} is split into an array of k {\displaystyle k} packets of size ⌈ n / k ⌉ {\displaystyle \left\lceil n/k\right\rceil } . The packets are then broadcast one after another, so that data is distributed fast in the communication network. Pipelined broadcast on balanced binary tree is possible in O ( α log p + β n ) {\displaystyle {\mathcal {O}}(\alpha \log p+\beta n)} , whereas for the non-pipelined case it takes O ( ( α + β n ) log p ) {\displaystyle {\mathcal {O}}((\alpha +\beta n)\log p)} cost. == Reduce == The reduce pattern is used to collect data or partial results from different processing units and to combine them into a global result by a chosen operator. Given p {\displaystyle p} processing units, message m i {\displaystyle m_{i}} is on processing unit p i {\displaystyle p_{i}} initially. All m i {\displaystyle m_{i}} are aggregated by ⊗ {\displaystyle \otimes } and the result is eventually stored on p 0 {\displaystyle p_{0}} . The reduction operator ⊗ {\displaystyle \otimes } must be associative at least. Some algorithms require a commutative operator with a neutral element. Operators like s u m {\displaystyle sum} , m i n {\displaystyle min} , m a x {\displaystyle max} are common. Implementation considerations are similar to broadcast (§ Broadcast). For pipelining on binary trees the message must be representable as a vector of smaller object for component-wise reduction. Pipelined reduce on a balanced binary tree is possible in O ( α log p + β n ) {\displaystyle {\mathcal {O}}(\alpha \log p+\beta n)} . == All-Reduce == The all-reduce pattern (also called allreduce) is used if the result of a reduce operation (§ Reduce) must be distributed to all processing units. Given p {\displaystyle p} processing units, message m i {\displaystyle m_{i}} is on processing unit p i {\displaystyle p_{i}} initially. All m i {\displaystyle m_{i}} are aggregated by an operator ⊗ {\displaystyle \otimes } and the result is eventually stored on all p i {\displaystyle p_{i}} . Analog to the reduce operation, the operator ⊗ {\displaystyle \otimes } must be at least associative. All-reduce can be interpreted as a reduce operation with a subsequent broadcast (§ Broadcast). For long messages a corresponding implementation is suitable, whereas for short messages, the latency can be reduced by using a hypercube (Hypercube (communication pattern) § All-Gather/ All-Reduce) topology, if p {\displaystyle p} is a power of two. All-reduce can also be implemented with a butterfly algorithm and achieve optimal latency and bandwidth. All-reduce is possible in O ( α log p + β n ) {\displaystyle {\mathcal {O}}(\alpha \log p+\beta n)} , since reduce and broadcast are possible in O ( α log p + β n ) {\displaystyle {\mathcal {O}}(\alpha \log p+\beta n)} with pipelining on balanced binary trees. All-reduce implemented with a butterfly algorithm achieves the same asymptotic runtime. == Prefix-Sum/Scan == The prefix-sum or scan operation is used to collect data or partial results from different processing units and to compute intermediate results by an operator, which are stored on those processing units. It can be seen as a generalization of the reduce operation (§ Reduce). Given p {\displaystyle p} processing units, message m i {\displaystyle m_{i}} is on processing unit p i {\displaystyle p_{i}} . The operator ⊗ {\displaystyle \otimes } must be at least associative, whereas some algorithms require also a commutative operator and a neutral element. Common operators are s u m {\displaystyle sum} , m i n {\displaystyle min} and m a x {\displaystyle max} . Eventually processing unit p i {\displaystyle p_{i}} stores the prefix sum ⊗ i ′ <= i {\displaystyle \otimes _{i'<=i}} m i ′ {\displaystyle m_{i'}} . In the case of the so-called exclusive prefix sum, processing unit p i {\displaystyle p_{i}} stores the prefix sum ⊗ i ′ < i {\displaystyle \otimes _{i'
Olio (app)
Olio is a mobile app for sharing by giving away, getting, borrowing or lending things in your community for free, aiming to reduce household and food waste. It does this by connecting neighbours with spare food or household items to others nearby who wish to pick up those items. The food must be edible; it can be raw or cooked, sealed or open. Non-food items often listed on Olio include books, clothes and furniture. Those donating surplus food can be individuals or companies such as food retailers, restaurants, corporate canteens, food photographers etc., and donations can take place on an ad-hoc or recurrent basis. For example, some supermarket chains in the UK, including Tesco, the Midcounties Co-operative, Morrisons, Sainsbury's and Iceland have piloted Olio as an 'online food bank' to donate food and to reduce their waste. In March 2022, Olio partnered with Pandamart in Singapore. First launched in early 2015 by Tessa Clarke and Saasha Celestial-One, by October 2017 the company had raised $2.2 million in funding. Olio subsequently performed a series A funding round of $6 million in 2018 and a Series B of $43 million. Notable investors include Accel, Octopus Ventures and VNV Global. The Olio app had around 7 million registered users as of May 2023.
Transmission security
Transmission security (TRANSEC) is the component of communications security (COMSEC) that results from the application of measures designed to protect transmissions from interception and exploitation by means other than cryptanalysis. Goals of transmission security include: Low probability of interception (LPI) Low probability of detection (LPD) Antijam — resistance to jamming (EPM or ECCM) This involves securing communication links from being compromised by techniques like jamming, eavesdropping, and signal interception. TRANSEC includes the use of frequency hopping, spread spectrum and the physical protection of communication links to obscure the patterns of transmission. It is particularly vital in military and government communication systems, where the security of transmitted data is critical to prevent adversaries from gathering intelligence or disrupting operations. TRANSEC is often implemented alongside COMSEC (Communications Security) to form a comprehensive approach to communication security. Methods used to achieve transmission security include frequency hopping and spread spectrum where the required pseudorandom sequence generation is controlled by a cryptographic algorithm and key. Such keys are known as transmission security keys (TSK). Modern U.S. and NATO TRANSEC-equipped radios include SINCGARS and HAVE QUICK.
Sharenting
"Sharenting" is a portmanteau of "sharing" and "parenting", describing the practice of parents publicizing a large amount of potentially sensitive content about their children on internet platforms, most notably on social media. While the term was coined as recently as 2010, sharenting has become an international phenomenon with widespread presence in the United States, Spain, France, and the United Kingdom. Proponents of sharenting frame the practice as a natural expression of parental pride in their children and argue that critics take sharenting-related posts out of context. Detractors find that it violates child privacy and hurts a parent–child relationship. Academic research has been conducted over the potential social motivations for sharenting and legal frameworks to balance child privacy with this parental practice. Researchers have conducted several psychological surveys, outlining social media accessibility, parental self-identification with children, and social pressure as potential causes for sharenting. Legal scholars have identified international human rights laws, labor protections, and recent online child privacy statutes as potential legal standards to check sharenting abuses. == History == The origins of the term "sharenting" have been attributed to the Wall Street Journal, where they called it "oversharenting," a portmanteau of "oversharing" and "parenting." Priya Kumar suggests that recording life moments of children rearing is not a new practice: people have been using diaries, scrapbooks and baby log books as the media of documentation for centuries. Scholars assert that sharenting has become popular as a result of social media, which has made many people more comfortable with sharing their lives and those of their children online. The trend of oversharing on social media has raised public attention in the 2010s and become the focus of a number of editorials and academic research projects. It was also added to Times Word of the Day in February 2013 and Collins English Dictionary in 2016 given its influence. == Popularity == Several studies describe sharenting as an international phenomenon with widespread prevalence across households. In the United States, researchers at the University of Michigan C.S. Mott Children's Hospital found that almost 75% of American parents were familiar with someone who over-shared information about their child on social media, and an AVG survey determined that 92% of all American two-year-olds had some presence on the internet. In Australia, Fisher-Price conducted a survey which revealed that 90% of Australian parents admitted to over-sharing. In Spain and Czech Republic, a survey of approximately 1,500 parents found that 70-80% participated in sharenting. In the United Kingdom, France, Germany, and Italy, a Research Now report revealed that almost three-quarters of surveyed parents said that they were "willing to share images of their infants". Some claim that sharenting presents a violation of child privacy, and this backlash includes anti-sharenting sites and apps that block baby pictures. One particular outlet of protest was the blog STFU Parents, founded in 2009 to criticize parental oversharing on social media. Some parents felt that these criticisms of sharenting often took posts out of context and neglected some positive aspects of the practice, including advancing a stronger sense of online community. Others, while acknowledging the potential privacy violations of sharenting, suggested a more tailored approach that would only permit posting under certain conditions, notwithstanding audience and identification restrictions for social media posts. == Motivations == Research has suggested that sharenting is associated with a mix of parent self-identification with children, mothering pressures, and the accessibility of social media. Conducting 17 interviews with mothers in the United Kingdom, a London School of Economics study found that parent bloggers often re-explained their sharing practices in terms of expressing their own personal identity, representing their own child as part of themselves. In particular, the report surveyed the use of blogs as a networking vehicle to connect parents with similar family situations and found that sharenting parents, by filtering self-presentation through their parent-child relationship, adopted a more relational identity on social media websites. This included identifying oneself in terms of parental circumstances, whether it be raising a child with a disability or being a single mother. Alternatively, some have suggested that these online expressions indicate the infiltration of individual pride into the sphere of parenting, as family photography becomes a means to "show off" one's children to the others and strengthens a parent's sense of individuated self. Addressing the prevalence of mothers engaging in sharenting, those who purport this view argue that the rise of digital communication has pressured mothers into performing the role of a "good" parent on social media platforms. They claim that these developments may reinforce a dominant vision of a "normal" family, as sharenting posts could be motivated by the need to converge to a normative interpretation of family. == Controversy == While some people assert that online platforms enable parents to establish a community and seek parenting support, others are concerned about the children's data privacy and their lack of informed consent. Sharing content may not only embarrass children but also creates an initial digital footprint, a history of online activity, that the children themselves have no control over. This might bring some negative consequences, such as being ridiculed at school or leaving a negative impression on future employers. === Parental benefits === Many parents use social media to seek parenting advice and share information about their children. With the convenience of online platforms, parent bloggers can easily connect with other people in similar situations as well as those who are willing to contribute meaningful advice. By forming a community, parents can receive encouragement from empathetic peers and assistance from experts in children rearing. Parents whose children need special educational accommodations or have disabilities often found themselves detached from the mainstream parenting style. Therefore, they regard online blogs as a means to gain support from others and support back. Online blogging enables parents of children with disabilities and special needs to connect with other parents. The advice from similarly situated families can open up new possibilities that help the parents "negotiate the complexities of social services, health care, and schools". However, in some cases, posting online about a parent's struggles can cause a backlash, as advocates may accuse the parent of presenting people with that condition in a bad light, or wonder how the child will feel, if they later read these posts and see how much their parents struggled to care for them. Such advantages of social media are not limited to particular groups of parents. In general, most parents benefit from exchanging parenting experience. Statistically speaking, 72% of parents rate social media useful for emotional connection and affirmations, and 74% of them receive support about parenting from friends on social media. Sharenting also plays a role in fostering interpersonal relationships. As the images and words about children's lives initiate conversations, parents use sharenting to stay connected with distant friends and relatives. In particular, mothers, as a research study reveals, are willing to engage in sharenting since they believe that the positive contents can help avoid digital conflicts and maintain close relations with those in their social circles. Researchers also found that female participants in this study carefully chose photos and phrases to express love and present laudable behaviors of children in their updates, which indicates their intention to convey positive messages. These messages also promote a close social network for a child as the parents invites supportive family members and friends into daily life. === Children's privacy === Given the potential misuse of digital data, people are critical about sharenting, and the majority of parents are cautious about the wrongdoing with online posts. The disclosure of minors' personal information, such as geographic location, name, date of birth, pictures, and the schools they attend, might expose them to illegal practices by recipients with malicious intentions. Sharented information is often abused for "identity theft", when imposters manage to track, stalk, commit fraud against children, or even blackmail the family. According to Barclays, online fraud targeting the young generation will contribute to a loss of £670 million (approximately $790 million) by 2030, and two-thirds of identity fraud will be related to s
Data deduplication
In computing, data deduplication is a technique for eliminating duplicate copies of repeating data. Successful implementation of the technique can improve storage utilization, which may in turn lower capital expenditure by reducing the overall amount of storage media required to meet storage capacity needs. It can also be applied to network data transfers to reduce the number of bytes that must be sent. The deduplication process requires comparison of data 'chunks' (also known as 'byte patterns') which are unique, contiguous blocks of data. These chunks are identified and stored during a process of analysis, and compared to other chunks within existing data. Whenever a match occurs, the redundant chunk is replaced with a small reference that points to the stored chunk. Given that the same byte pattern may occur dozens, hundreds, or even thousands of times (the match frequency is dependent on the chunk size), the amount of data that must be stored or transferred can be greatly reduced. A related technique is single-instance (data) storage, which replaces multiple copies of content at the whole-file level with a single shared copy. While possible to combine this with other forms of data compression and deduplication, it is distinct from newer approaches to data deduplication (which can operate at the segment or sub-block level). Deduplication is different from data compression algorithms, such as LZ77 and LZ78. Whereas compression algorithms identify redundant data inside individual files and encodes this redundant data more efficiently, the intent of deduplication is to inspect large volumes of data and identify large sections – such as entire files or large sections of files – that are identical, and replace them with a shared copy. == Functioning principle == For example, a typical email system might contain 100 instances of the same 1 MB (megabyte) file attachment. Each time the email platform is backed up, all 100 instances of the attachment are saved, requiring 100 MB storage space. With data deduplication, only one instance of the attachment is actually stored; the subsequent instances are referenced back to the saved copy for deduplication ratio of roughly 100 to 1. Deduplication is often paired with data compression for additional storage saving: Deduplication is first used to eliminate large chunks of repetitive data, and compression is then used to efficiently encode each of the stored chunks. In computer code, deduplication is done by, for example, storing information in variables so that they don't have to be written out individually but can be changed all at once at a central referenced location. Examples are CSS classes and named references in MediaWiki. == Benefits == Storage-based data deduplication reduces the amount of storage needed for a given set of files. It is most effective in applications where many copies of very similar or even identical data are stored on a single disk. In the case of data backups, which routinely are performed to protect against data loss, most data in a given backup remain unchanged from the previous backup. Common backup systems try to exploit this by omitting (or hard linking) files that haven't changed or storing differences between files. Neither approach captures all redundancies, however. Hard-linking does not help with large files that have only changed in small ways, such as an email database; differences only find redundancies in adjacent versions of a single file (consider a section that was deleted and later added in again, or a logo image included in many documents). In-line network data deduplication is used to reduce the number of bytes that must be transferred between endpoints, which can reduce the amount of bandwidth required. See WAN optimization for more information. Virtual servers and virtual desktops benefit from deduplication because it allows nominally separate system files for each virtual machine to be coalesced into a single storage space. At the same time, if a given virtual machine customizes a file, deduplication will not change the files on the other virtual machines—something that alternatives like hard links or shared disks do not offer. Backing up or making duplicate copies of virtual environments is similarly improved. == Classification == === Post-process versus in-line deduplication === Deduplication may occur "in-line", as data is flowing, or "post-process" after it has been written. With post-process deduplication, new data is first stored on the storage device and then a process at a later time will analyze the data looking for duplication. The benefit is that there is no need to wait for the hash calculations and lookup to be completed before storing the data, thereby ensuring that store performance is not degraded. Implementations offering policy-based operation can give users the ability to defer optimization on "active" files, or to process files based on type and location. One potential drawback is that duplicate data may be unnecessarily stored for a short time, which can be problematic if the system is nearing full capacity. Alternatively, deduplication hash calculations can be done in-line: synchronized as data enters the target device. If the storage system identifies a block which it has already stored, only a reference to the existing block is stored, rather than the whole new block. The advantage of in-line deduplication over post-process deduplication is that it requires less storage and network traffic, since duplicate data is never stored or transferred. On the negative side, hash calculations may be computationally expensive, thereby reducing the storage throughput. However, certain vendors with in-line deduplication have demonstrated equipment which performs in-line deduplication at high rates. Post-process and in-line deduplication methods are often heavily debated. === Data formats === The SNIA Dictionary identifies two methods: Content-agnostic data deduplication – a data deduplication method that does not require awareness of specific application data formats. Content-aware data deduplication – a data deduplication method that leverages knowledge of specific application data formats. === Source versus target deduplication === Another way to classify data deduplication methods is according to where they occur. Deduplication occurring close to where data is created, is referred to as "source deduplication". When it occurs near where the data is stored, it is called "target deduplication". Source deduplication ensures that data on the data source is deduplicated. This generally takes place directly within a file system. The file system will periodically scan new files creating hashes and compare them to hashes of existing files. When files with same hashes are found then the file copy is removed and the new file points to the old file. Unlike hard links however, duplicated files are considered to be separate entities and if one of the duplicated files is later modified, then using a system called copy-on-write a copy of that changed file or block is created. The deduplication process is transparent to the users and backup applications. Backing up a deduplicated file system will often cause duplication to occur resulting in the backups being bigger than the source data. Source deduplication can be declared explicitly for copying operations, as no calculation is needed to know that the copied data is in need of deduplication. This leads to a new form of link on file systems, called a reference-counted link, or reflink, in some systems (e.g. Linux), or a cloned file on macOS, where one or more inodes (file information entries) are made to share some or all of their data. It is named analogously to hard links, which work at the inode level, and symbolic links, which work at the filename level.The individual entries have a copy-on-write behavior that is non-aliasing, i.e. changing one copy afterwards will not affect other copies. Microsoft's ReFS also supports this operation. Target deduplication is the process of removing duplicates when the data was not generated at that location. Example of this would be a server connected to a SAN/NAS, The SAN/NAS would be a target for the server (target deduplication). The server is not aware of any deduplication, the server is also the point of data generation. A second example would be backup. Generally this will be a backup store such as a data repository or a virtual tape library. === Deduplication methods === One of the most common forms of data deduplication implementations works by comparing chunks of data to detect duplicates. For that to happen, each chunk of data is assigned an identification, calculated by the software, typically using cryptographic hash functions. In many implementations, the assumption is made that if the identification is identical, the data is identical, even though this cannot be true in all cases due to the pigeonhole principle; other implementations do not as
Equalized odds
Equalized odds, also referred to as conditional procedure accuracy equality and disparate mistreatment, is a measure of fairness in machine learning. A classifier satisfies this definition if the subjects in the protected and unprotected groups have equal true positive rate and equal false positive rate, satisfying the formula: P ( R = + | Y = y , A = a ) = P ( R = + | Y = y , A = b ) y ∈ { + , − } ∀ a , b ∈ A {\displaystyle P(R=+|Y=y,A=a)=P(R=+|Y=y,A=b)\quad y\in \{+,-\}\quad \forall a,b\in A} For example, A {\displaystyle A} could be gender, race, or any other characteristics that we want to be free of bias, while Y {\displaystyle Y} would be whether the person is qualified for the degree, and the output R {\displaystyle R} would be the school's decision whether to offer the person to study for the degree. In this context, higher university enrollment rates of African Americans compared to whites with similar test scores might be necessary to fulfill the condition of equalized odds, if the "base rate" of Y {\displaystyle Y} differs between the groups. The concept was originally defined for binary-valued Y {\displaystyle Y} . In 2017, Woodworth et al. generalized the concept further for multiple classes.
Memory-hard function
In cryptography, a memory-hard function (MHF) is a function that costs a significant amount of memory to efficiently evaluate. It differs from a memory-bound function, which incurs cost by slowing down computation through memory latency. MHFs have found use in key stretching and proof of work as their increased memory requirements significantly reduce the computational efficiency advantage of custom hardware over general-purpose hardware compared to non-MHFs. == Introduction == MHFs are designed to consume large amounts of memory on a computer in order to reduce the effectiveness of parallel computing. In order to evaluate the function using less memory, a significant time penalty is incurred. As each MHF computation requires a large amount of memory, the number of function computations that can occur simultaneously is limited by the amount of available memory. This reduces the efficiency of specialised hardware, such as application-specific integrated circuits and graphics processing units, which utilise parallelisation, in computing a MHF for a large number of inputs, such as when brute-forcing password hashes or mining cryptocurrency. == Motivation and examples == Bitcoin's proof-of-work uses repeated evaluation of the SHA-256 function, but modern general-purpose processors, such as off-the-shelf CPUs, are inefficient when computing a fixed function many times over. Specialized hardware, such as application-specific integrated circuits (ASICs) designed for Bitcoin mining, can use 30,000 times less energy per hash than x86 CPUs whilst having much greater hash rates. This led to concerns about the centralization of mining for Bitcoin and other cryptocurrencies. Because of this inequality between miners using ASICs and miners using CPUs or off-the shelf hardware, designers of later proof-of-work systems utilised hash functions for which it was difficult to construct ASICs that could evaluate the hash function significantly faster than a CPU. As memory cost is platform-independent, MHFs have found use in cryptocurrency mining, such as for Litecoin, which uses scrypt as its hash function. They are also useful in password hashing because they significantly increase the cost of trying many possible passwords against a leaked database of hashed passwords without significantly increasing the computation time for legitimate users. == Measuring memory hardness == There are various ways to measure the memory hardness of a function. One commonly seen measure is cumulative memory complexity (CMC). In a parallel model, CMC is the sum of the memory required to compute a function over every time step of the computation. Other viable measures include integrating memory usage against time and measuring memory bandwidth consumption on a memory bus. Functions requiring high memory bandwidth are sometimes referred to as "bandwidth-hard functions". == Variants == MHFs can be categorized into two different groups based on their evaluation patterns: data-dependent memory-hard functions (dMHF) and data-independent memory-hard functions (iMHF). As opposed to iMHFs, the memory access pattern of a dMHF depends on the function input, such as the password provided to a key derivation function. Examples of dMHFs are scrypt and Argon2d, while examples of iMHFs are Argon2i and catena. Many of these MHFs have been designed to be used as password hashing functions because of their memory hardness. A notable problem with dMHFs is that they are prone to side-channel attacks such as cache timing. This has resulted in a preference for using iMHFs when hashing passwords. However, iMHFs have been mathematically proven to have weaker memory hardness properties than dMHFs.